Crowdsourced Digitization – Library Tech Notes

A recent post to the librarians’ listserv got me thinking about crowdsourced digitization efforts. Is there a way to obtain good, or at least workable, images from a large number of people using the mobile technology they already own? I did some experimentation…

The first test was to find out whether the iOS app Tiny Scanner (free) could capture images and export them as a multi-page PDF document. I took photos of a small multi-page document. The pages were easily captured, and it was possible to drag the corners of each image to compensate for the curvature of the page. (Some other apps magnify the corners, making them easier to place precisely. Tiny Scanner did not, so it was difficult to get them exact.)

The next test was to determine whether the captured images were of good enough quality for OCR. For this test, I did not attempt to find an all-in-one app that goes from PDF capture to OCR. Instead, I used my personal copy of ABBYY FineReader Express for Mac. Despite receiving an error on each page encouraging me to scan at 300 DPI or higher, I believe the Tiny Scanner document was OCR’d to the same level as the document scanned with the Epson V330 Photo scanner.
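To put the 300 DPI message in context, here is a rough sketch of how the effective resolution of a phone photo could be estimated. The camera resolution and page size below are assumptions for illustration only (I did not record the actual pixel dimensions of my captures), and the function name `effective_dpi` is hypothetical.

```python
def effective_dpi(pixels_w, pixels_h, page_w_in, page_h_in):
    """Estimate the DPI of a photo in which the page fills the whole frame."""
    # Resolution along each axis is pixels divided by inches;
    # the smaller of the two is the limiting value.
    return min(pixels_w / page_w_in, pixels_h / page_h_in)


# Assumed values: an 8-megapixel portrait photo (2448 x 3264 pixels)
# of a US Letter page (8.5 x 11 inches) that fills the frame exactly.
dpi = effective_dpi(2448, 3264, 8.5, 11.0)
print(f"Effective resolution: {dpi:.0f} DPI")
print("Meets 300 DPI" if dpi >= 300 else "Below 300 DPI")
```

Under these assumptions the photo lands just under 300 DPI, and any margin around the page in the frame lowers the figure further. That would be consistent with OCR software complaining about resolution while still producing usable text.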

So, how do the scans compare? Without a doubt, the scanner did a better job on the images. Each page came out exactly the same size, and the pages with photos are much, much cleaner. However, the Tiny Scanner document, when added to Google Drive, rendered nicely with exact image sizes for all pages. (The uneven page sizes in the Tiny Scanner capture are something that could be fixed by taking each photo from the same height. Oops.)

Compare the two documents:

Tiny Scanner Document

Epson Scanner Document

For text documents, it might be worth trying a crowdsourced digitization project!
