Microsoft is working to address one of the biggest pain points with PDFs, filing a paper that outlines a method for converting non-editable PDFs to editable ones.
Virtually every computer user has been frustrated by trying to edit a PDF, only to discover that it is a non-editable file. While any number of solutions exist to extract the information from a PDF and convert it to an editable format, Microsoft highlights the issues with that approach.
While some mechanisms exist for converting some non-editable documents to editable documents (e.g., converting PDFs to Word documents), the current mechanisms do not take into account and accurately convert elements such as fonts, color schemes, and layouts. As such, there are no existing mechanisms that are able to automatically replicate text style, color scheme, objects and the like from a non-editable document to an editable document. Thus, there exists a technical problem of lack of mechanisms for automatically detecting features of a non-editable document to enable automatic conversion of a non-editable document to an editable document.
Microsoft plans to use AI and machine learning (ML) to analyze a document and convert it to an editable one, all while preserving the original layout, fonts, and colors.
A system and method for generating an editable document from a non-editable document includes automatically identifying a layout for a plurality of objects in the non-editable document, where the layout includes bounding boxes of one or more of the objects, and automatically identifying a font for a text object of the plurality of objects. The method and system also include selecting a color scheme for one or more of the plurality of objects, the color scheme corresponding to one or more color values associated with the plurality of objects in the non-editable document, before automatically generating the editable document in accordance with at least one of the identified layout, identified font and selected color scheme.
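To make the described flow concrete, here is a minimal Python sketch of its last two steps: selecting a color scheme and assembling an editable representation from objects that have already been detected. It is an illustration under stated assumptions, not Microsoft's implementation. The names BoundingBox, DetectedObject, select_color_scheme, and generate_editable are hypothetical, the ML-based layout and font detection is assumed to have run beforehand, and "most frequent color values" is a stand-in heuristic for whatever selection method the paper actually uses.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Color = Tuple[int, int, int]  # RGB triple, each channel 0-255


@dataclass
class BoundingBox:
    """Position and size of an object on the page (hypothetical units)."""
    x: float
    y: float
    width: float
    height: float


@dataclass
class DetectedObject:
    """One object found in the non-editable document.

    Assumption: an upstream detector (not shown) has already filled in
    the bounding box, the font name, and the sampled color values.
    """
    kind: str                      # e.g. "text" or "image" (hypothetical labels)
    box: BoundingBox
    text: str = ""
    font: Optional[str] = None     # None when no font was identified
    colors: List[Color] = field(default_factory=list)


def select_color_scheme(objects: List[DetectedObject], size: int = 3) -> List[Color]:
    """Pick the `size` most frequent color values across all objects.

    Frequency counting is a stand-in heuristic, not the paper's method.
    """
    counts = Counter(color for obj in objects for color in obj.colors)
    return [color for color, _ in counts.most_common(size)]


def to_hex(color: Color) -> str:
    """Format an RGB triple as a CSS-style hex string."""
    return "#{:02x}{:02x}{:02x}".format(*color)


def generate_editable(objects: List[DetectedObject], scheme: List[Color]) -> Dict:
    """Build a simple editable structure that keeps layout, font, and color."""
    fallback = scheme[0] if scheme else (0, 0, 0)
    elements = []
    # Sort top-to-bottom, then left-to-right, to approximate reading order.
    for obj in sorted(objects, key=lambda o: (o.box.y, o.box.x)):
        color = obj.colors[0] if obj.colors else fallback
        elements.append({
            "kind": obj.kind,
            "position": (obj.box.x, obj.box.y),
            "size": (obj.box.width, obj.box.height),
            "text": obj.text,
            "font": obj.font or "default",
            "color": to_hex(color),
        })
    return {"color_scheme": [to_hex(c) for c in scheme], "elements": elements}
```

In this sketch, a caller would pass the detector's output to select_color_scheme, then hand both the objects and the resulting scheme to generate_editable. A real converter would emit a Word or similar document rather than a dictionary, but the same three inputs (layout, font, color scheme) would drive it.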
The full paper, describing the process in detail, can be found here.
If Microsoft is able to successfully leverage ML to accomplish the goals outlined in the paper, it could solve one of the biggest challenges with document sharing and editing that individuals and businesses alike face.
from WebProNews https://ift.tt/NJZPkdM