
Scholarly Tech: OCR and Smartphones

Commercial Camera Company Photostat ad from the July 1, 1920 issue of American Machinist.

Optical character recognition (OCR) is a little-considered research tool that can prove both handy and valuable for students and scholars. This is especially so if you combine its abilities with the ubiquitous smartphone.

It wasn’t that long ago that scholars commonly resorted to taking notes in pencil on index cards when examining books that could not be checked out of a library or material held in a government archive. Some libraries had Photostat machines, which let researchers “photocopy” material, until cheaper and more convenient electrostatic copiers from Xerox supplanted them.

If you visit a special collections room at a college or university library today, the device researchers most often seem to use to copy material is the camera in their smartphone or tablet. Compared with arranging to have Xerox copies made (the number of which might be limited by library policy), the process is much simpler, more convenient, and certainly cheaper. (It is also more environmentally sustainable.) As a result, researchers like me can gather much more material much more quickly.

This raises the question of what to do with all this new material. One way to handle the bounty is to use OCR software. Scaled-down OCR programs usually come bundled with multifunction printer-scanners. Their quality varies, so if you want to do OCR on the cheap, I recommend looking for a scanner that bundles software from ABBYY. ABBYY seems to be the gold standard for OCR, having been used in scanning books from major libraries for Google Books.

However, if you want to do it right, consider buying ABBYY FineReader Professional 12 (Windows), which I use to handle all sorts of text documents, including those captured with my smartphone. (There are also corporate and Mac versions.) This version is especially valuable because it is optimized to handle smartphone images. Once FineReader has processed the images, it can spit out a document in several formats, including Microsoft Word and Adobe Acrobat (PDF), from which you can copy and paste the text you want.

Adobe Acrobat Pro DC (and other PDF programs) can also perform good-quality OCR on photocopied material, though the initial conversion is to PDF; from there, they can convert it further to Microsoft Word.
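For readers comfortable with a little scripting, the free, open-source Tesseract engine offers a do-it-yourself alternative to the commercial programs above. This Python sketch is not the ABBYY or Acrobat workflow described here; it assumes Tesseract is installed along with the third-party pytesseract and Pillow packages, and the function names collect_images and ocr_folder are hypothetical. It runs OCR on a folder of phone photos in file-name order and writes the recognized text, labeled by source image, to a single file.

```python
"""Batch-OCR a folder of smartphone photos into one text file.

Sketch only: assumes the open-source Tesseract engine is installed,
plus the third-party Python packages pytesseract and Pillow.
"""
from pathlib import Path

# Common image extensions from phone cameras (assumption: HEIC photos
# would need converting to JPEG first, since they are not listed here).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def collect_images(folder):
    """Return the image files in `folder`, sorted by path.

    Phone cameras usually number photos sequentially (e.g. IMG_0001.jpg),
    so name order normally matches the order the pages were shot.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder, out_file, lang="eng"):
    """OCR every image in `folder` and write all the text to `out_file`."""
    # Imported here so collect_images() works even without these packages.
    import pytesseract
    from PIL import Image

    with open(out_file, "w", encoding="utf-8") as out:
        for path in collect_images(folder):
            with Image.open(path) as img:
                text = pytesseract.image_to_string(img, lang=lang)
            # Label each page with its source photo to help with citations.
            out.write(f"=== {path.name} ===\n{text}\n")
```

For example, `ocr_folder("archive_photos", "notes.txt")` would produce one searchable plain-text file. The trade-off for doing it for free is that plain text discards the page layout that a Word or PDF export would preserve.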