ComSoc Community

Users helping users!

Scanning a paperback book
* May 08, 2010, 05:22:44 pm
I have a very old (out-of-print) paperback book I want to scan.  I have an HP C4480 all-in-one that works under Windows and Linux (but has no OCR under Linux).  I am running Windows XP fully updated and I have OmniPage 15 and the HP Imaging software (including IRIS OCR) that came with my printer.  It would be OK to just copy the book as images, but it would be really great if I could put the whole thing through OCR.  Then I could search it or even add my own index to it.

When I try to scan from OmniPage in grayscale, sometimes it works.  Other times it shows the HP scanning splash image and then an error message, repeats that sequence once more, and finally reports an unknown scanning error in OmniPage.  I have no idea what's going wrong; probably dueling drivers.  I think it always fails this way when I select color scanning.

Ideally, I would like OmniPage to scan each page with a zone that excludes the facing page on the other side of the binding.  It would also be great if I could position the next page and just press the scan button on the scanner.  It's difficult to reach over to the notebook and click something with the mouse while keeping the book pressed down and positioned.

Right now, when I load the HP scanning monitor and press the scan button on the scanner, it sends the page into the HP Photosmart photo management software.  I don't see an option to configure it to send the page to OmniPage or IRIS instead.

I was thinking it might make the workflow smoother to scan the whole book as images first and run them through OCR later, but I'm not sure how to do that.  I'd guess that scanning everything as single-page or multi-page TIFFs would work.  Hopefully OmniPage, IRIS, or some other program could then read those files and OCR them.  Then I would just be sitting at my notebook and wouldn't have to handle the physical book at all, except to compare the original text to the OCR output when something didn't make sense.

I expect to do a fair amount of scanning over time, so I could buy newer software if that would really help.  I paid around $80 for OmniPage 15 (when it first came out) and haven't used it until now, so I'm not too enthusiastic about paying that much or more again unless it's necessary.




* May 09, 2010, 12:05:44 am
Sounds like you have a few options, Joe.  You could run OmniPage and/or IRIS OCR under Windows, or via WINE under Linux, whichever is more stable and efficient, and use the OCR functions built into the scanning software.  Or you could scan to images under whichever of Windows or Linux you prefer, and then run the images through your preferred OCR software under Windows.

If you can script the OCR process to work from image files, that would be preferable: you could scan efficiently and then walk away during the recognition step, which can take long enough that watching the screen is a real waste of time.
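
To make the "scan first, script the OCR later" idea a bit more concrete, here is a minimal Python sketch.  It is only an illustration under some assumptions: that the scans are saved as one TIFF per page in a single folder, and that a command-line OCR engine is installed and on the PATH.  I've used the open source Tesseract engine as the example (its basic usage is `tesseract imagefile outputbase -l lang`); this is not how OmniPage or IRIS are driven, and the folder and function names are made up for the example.

```python
import re
import subprocess
from pathlib import Path


def natural_key(path):
    """Sort key so that page2.tif comes before page10.tif."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", path.name)]


def build_ocr_commands(scan_dir, out_dir, lang="eng"):
    """Return one tesseract command per TIFF in scan_dir, in page order."""
    scan_dir, out_dir = Path(scan_dir), Path(out_dir)
    pages = sorted(
        (p for p in scan_dir.iterdir() if p.suffix.lower() in {".tif", ".tiff"}),
        key=natural_key,
    )
    # tesseract appends ".txt" to the output base name on its own.
    return [["tesseract", str(p), str(out_dir / p.stem), "-l", lang]
            for p in pages]


def run_ocr(scan_dir, out_dir, lang="eng"):
    """Run OCR on every page; return the image paths that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in build_ocr_commands(scan_dir, out_dir, lang):
        # Assumes the tesseract executable is installed and on the PATH.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(cmd[1])
    return failed
```

Calling something like `run_ocr("scans", "text")` (hypothetical folder names) would then work through the pages unattended and hand back a list of any images the engine could not process, so only those need a second look.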

I believe there are Linux OCR options, so do a search and see what's out there that might be F/OSS to give yourself another possibility.



* May 09, 2010, 12:18:17 am

A quick search for free and open source OCR software turned up these, which you can try if interested:

As I mentioned in my other post, if any or all of these applications support scripting, that might let you do this project with far less trouble and inconvenience than doing it all gooey in Windows.



* May 09, 2010, 05:53:12 pm
Thanks.  I'll check those out.

It's been a while, but last I checked, OCR and voice recognition for dictation (like what Dragon NaturallySpeaking can do, not just limited voice control applications) were two fiefdoms that were thoroughly defended against invasion by Linux.

I was wondering if anyone had experience with OmniPage or something similar.  It does quite a bit more than just scan text.  It can survive multiple columns, forms, and other structured text that a simple recognizer would make mincemeat of.  It will probably do what I want nicely if I can figure out how to tell it what I want and convince it to play nicely with the rest of my system.