<VV> Scanning the Communique

Guus de Haan guusdehaan at mac.com
Wed Dec 23 05:14:50 EST 2009


On 23 Dec 2009, at 08:14, Tony Underwood wrote:

>> Keep in mind that scanning to PDF is very convenient but has its own limits. PDFs (usually) have a fixed format. Reading a PDF formatted for Legal (A4 for us ;-) on a big screen is perfect, but it's not very convenient if you try to do so on a small laptop or even an iPhone. OCR software is good but usually not flawless. Try to find a way to check the output from an OCR scan.
> 
> 
> I proofread everything I scan, just to make sure. It's also why I scan everything at 300 dpi, since that gives the OCR software a better shot at recognizing and reading small text.

Don't forget the format aspect I mentioned. You might want to look into e-book formats like EPUB. It is a reflowable format that can be made from any Word document, and it can be read on computers with a free product like Adobe Digital Editions and on other devices (iPhone/iPod, e-readers) too.

>> Many people OCR-ing PDFs have no idea that words are missing and/or incorrect words are added to the invisible text layer. When searching on a computer, all these errors will show up.
> 
> 
> The program I use to scan the Communiques, OmniPage Pro, is pretty good and seldom makes any errors. I'm rather pleased with it; I had not used it much until this scan project came along.

Do you extract the text from the PDF in some way? There are tools to index text files (Word, RTF, TXT) so they can be checked for "strange" words.
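
To illustrate the kind of check I mean, here is a minimal Python sketch. It is not what any particular indexing tool does; the function name strange_words, the tiny word list and the sample sentence are all made up for the example. It simply lists words that are not in a known-word list and that appear only rarely, which is where OCR misreads tend to end up.

```python
import re
from collections import Counter

def strange_words(text, known_words, max_count=1):
    """Return words not in known_words that occur at most max_count times.

    Hypothetical helper: rare, unknown words are candidates for OCR errors.
    """
    # Split the text into lowercase words (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    return sorted(w for w, n in counts.items()
                  if w not in known_words and n <= max_count)

# Made-up sample: the second sentence mimics typical i/l OCR confusion.
sample = "The Corvair engine is air cooled. The Corvalr englne is alr cooled."
known = {"the", "corvair", "engine", "is", "air", "cooled"}
print(strange_words(sample, known))
```

In practice the known-word list would come from a real dictionary file plus club-specific terms, and the flagged words still need a human look.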

I use the Fujitsu ScanSnap S1500M myself. This is a dedicated document scanner and I almost fell off my chair the first time I saw it in action: it's FAST. Of course most of the time goes into the OCR-ing. The best would be to use the original electronic documents for the new formats. I guess the issues have been produced electronically for some time now.

Do you already have an idea about the size of the different scanned items?
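
For a rough feel of the numbers, this small Python sketch estimates the uncompressed size of one scanned page. The page size (US Letter, 8.5 x 11 inch) is my assumption, the 300 dpi is the setting Tony mentioned, and the function name raw_scan_bytes is made up. Real files will be much smaller after compression, so treat this only as an upper bound.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes (ignores headers and row padding)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumed page: US Letter, scanned at 300 dpi in three common colour depths.
for label, bpp in [("black and white", 1), ("greyscale", 8), ("colour", 24)]:
    size = raw_scan_bytes(8.5, 11, 300, bpp)
    print(f"{label}: {size / 1_000_000:.1f} MB per page")
```

Multiplying by the number of pages per issue gives a first estimate of what a pure image scan would cost in storage.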

>> I would also like to point out that there are Mac users out there, so please don't get locked into a Windows-only solution!
> 
> I thought about that as well, which is why I was pushing for PDFs. Well, that and the size advantage a multi-format PDF can offer over a total image scan of a document, which tends to get rather large. Likewise, a .DOC file certainly looks good, but I'm not sure if a Mac would be happy with a multi-format DOC file. PDF, no problem, and multi-format PDFs (ASCII text with JPG images) can be pretty small. I'm not Mac literate, so somebody correct me if I err, Apple-wise.

Macs can read DOC, RTF and TXT. The new DOCX is no problem if you have Office for Mac or Pages. PDF is the standard printing format on the Mac, so that's always good. Document formats are hardly ever a problem. Should CORSA decide to store the information in a special program, then platform compatibility becomes more of a consideration (Linux too).

> tony..  
> 
> PS:   Guus, I'm ready to listen to any suggestions you may offer.   :)

Just one question, is there a coordinator for this stuff?

Guus de Haan
The Netherlands
'65 Corsa Turbo-Charged Cvt


