<VV> Scanning the Communique

Tony Underwood tony.underwood at cox.net
Wed Dec 23 02:14:46 EST 2009


At 09:23 AM 12/22/2009, Guus de Haan wrote:
>I'd like to add my 0,02 cents worth to the scanning discussion if I may.
>
>For work and study I've done a lot of paper research and information 
>retrieval. It's also something that has my personal interest.
>
>Try to look at the information from a "use" point of view. Many 
>people and organizations are busy bringing paper content into the 
>digital world in the way it was originally published. That is not 
>always useful. The way a newspaper is made suits a newspaper but it 
>won't work for a website as this is much smaller. If you have 
>articles covered in several Communiques, it's a lot easier to make 
>this content so it can be read as a whole. It has been mentioned 
>before, some articles are time related, some are "timeless". It can 
>be very useful to separate these two. Maybe you want to consider a 
>timeline for content that is indeed time related. These new angles 
>can make "old" information very interesting again.


Good point.   With PDFs in raw searchable form such articles could be 
rearranged, and I'm perfectly willing to allow whoever has the last 
word the privilege of slicing up anything I scan...  whichever way it 
may go to make the project more useful.



>Keep in mind that scanning to PDF is very convenient but has its 
>own limits. PDFs (usually) have a fixed format. Reading a PDF 
>formatted for Legal (A4 for us ;-) on a big screen is perfect, but 
>it's not very convenient if you try to do so on a small laptop or 
>even iPhone. OCR software is good but usually not flawless. Try to 
>find a way to check the output from an OCR scan.


I proofread everything I scan, just to make sure.   It's also why I 
scan everything at 300 dpi since that does help the OCR software have 
a better shot at recognizing and reading small text.
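
A rough sketch of how a first-pass check on OCR output could be automated before proofreading by eye. It is only an illustration, not what OmniPage or any other OCR program does internally: the function name `suspicious_tokens`, the tiny word list, and the sample sentence are all made up for the example. It flags tokens that mix letters and digits (a common OCR slip, such as "1" for "i") and alphabetic tokens missing from a known-word list.

```python
import re

def suspicious_tokens(text, known_words):
    """Return tokens from OCR text that deserve a second look.

    A token is flagged if it mixes letters and digits, or if it is
    purely alphabetic and not found in known_words (lowercase set).
    """
    flagged = []
    for token in re.findall(r"[A-Za-z0-9']+", text):
        has_digit = any(c.isdigit() for c in token)
        has_alpha = any(c.isalpha() for c in token)
        mixed = has_digit and has_alpha
        unknown = token.isalpha() and token.lower() not in known_words
        if mixed or unknown:
            flagged.append(token)
    return flagged

# Hypothetical word list and OCR output, for illustration only.
known = {"the", "engine", "runs", "cool", "with", "a", "new", "fan", "belt"}
sample = "The eng1ne runs c00l with a new fan bclt"
print(suspicious_tokens(sample, known))
```

In practice the word list would need to be a full dictionary plus the club's own vocabulary (part names, member names), and a check like this only narrows down where to look; it does not replace proofreading.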


>Many people OCR-ing PDFs have no idea that words are missing and/or 
>incorrect words are added to the invisible layer. When searching on 
>a computer all these errors will show up.


The program I use to scan the Communiques, OmniPage Pro, is pretty 
good and seldom makes any errors.    I'm rather pleased with it, 
though I had not used it much until this scan project came along.


>I also would like to point out that there are Mac users out there, 
>please don't get locked into a Windows-only solution!



I thought about that as well, which is why I was pushing for 
PDFs.   Well, that and the size advantage a multi-format PDF can 
offer over a total image scan of a document, which tends to get rather 
large.    A .DOC file certainly looks good too, but I'm not sure a 
Mac would be happy with a multi-format DOC file.   PDF, no problem, 
and multi-format (ASCII text with JPEG images) PDFs can be pretty 
small.   I'm not Mac literate so somebody correct me if I err, 
Apple-wise.
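
To put a rough number on that size difference, here is a back-of-the-envelope sketch. It assumes a US Letter page (8.5 x 11 inches) scanned at 300 dpi and counts raw, uncompressed pixel data only; the figure of 3000 characters per page is a guess for illustration, and real files will be smaller once the images are compressed. The function name `raw_scan_bytes` is made up for the example.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scanned page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# US Letter page at 300 dpi: 2550 x 3300 pixels.
color_page = raw_scan_bytes(8.5, 11, 300, 24)  # 24-bit color
bw_page = raw_scan_bytes(8.5, 11, 300, 1)      # 1-bit black and white

# Assumed text-only page: about 3000 characters at one byte each.
text_page = 3000

print(f"24-bit color scan: {color_page:,} bytes")
print(f"1-bit B&W scan:    {bw_page:,} bytes")
print(f"plain text:        {text_page:,} bytes")
```

Even the 1-bit scan carries a few hundred times more raw data than the recognized text, which is why a PDF that stores pages as text plus a few JPEG images can come out so much smaller than a pure image scan.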



tony..

PS:   Guus, I'm ready to listen to any suggestions you may offer.   :)  

