Question: How does OCR work in Google Drive?
I have noticed that Google Drive recognizes text in PDFs (and in other files such as images and text documents). Out of curiosity, I want to know what they did to make img tags selectable and searchable. For instance, when I inspect a Google Drive document in Chrome Developer Tools, each page is an image, yet it doesn't behave like one because the text is selectable. Also, when I zoom in, another image with a higher resolution seems to be loaded. I think that's the same trick that Scribd is using.
I also read that Google has been improving tesseract-ocr and that the Google Books team helped with the OCR implementation in Google Drive, but I'm not sure what process they use to generate the img tags this way.
What is going on behind the scenes?
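To make the behaviour I'm describing concrete, here is a minimal sketch of one way to get selectable text on top of a page image. It is only my guess at the general idea, not a description of what Google Drive actually does. It assumes an OCR engine has already produced words with pixel bounding boxes. The `WORDS` data, the `text_layer_html` function, and the file name `page-1.png` are all made up for illustration.

```python
from html import escape

# Hypothetical OCR output: each recognized word with its bounding box in
# image pixels. Real OCR engines report comparable data, but this exact
# structure is an assumption made for the example.
WORDS = [
    {"text": "Hello", "x": 40, "y": 30, "w": 120, "h": 32},
    {"text": "world", "x": 170, "y": 30, "w": 115, "h": 32},
]


def text_layer_html(image_url, width, height, words):
    """Return HTML for a page image with transparent, selectable word spans."""
    spans = []
    for word in words:
        # Position each word exactly over its place in the scanned image.
        style = (
            f"position:absolute;left:{word['x']}px;top:{word['y']}px;"
            f"width:{word['w']}px;height:{word['h']}px;"
            f"font-size:{word['h']}px;line-height:{word['h']}px;"
            "color:transparent;white-space:pre;"
        )
        # The trailing space keeps copied words from running together.
        spans.append(f'<span style="{style}">{escape(word["text"])} </span>')
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">'
        f'<img src="{escape(image_url, quote=True)}" '
        f'width="{width}" height="{height}" '
        'style="position:absolute;left:0;top:0;user-select:none">'
        + "".join(spans)
        + "</div>"
    )


page = text_layer_html("page-1.png", 800, 1000, WORDS)
print(page)
```

In this sketch the reader sees only the image, while the browser selects and searches the invisible spans layered over it. Swapping the `src` for a higher-resolution render on zoom would leave the text layer untouched, as long as the boxes are scaled by the same factor.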
Sep 13, 2013
Sep 12, 2013