Hi, I've been using ocrad (http://www.gnu.org/software/ocrad/ocrad.htm) for OCR work, driven by a script.

Originally I had to convert some PDFs to text files, and I ended up writing a bash script to do it.

I haven't got a copy of the script on hand (it took me a while to perfect, too); it's been lost in the ether of time.

I basically did the following:

1. Strip the image file out of the PDF (the PDF just contained an image file).

2. Run the image file through some filters. (Later I discovered that gimp can filter images from the command line, yep, without the UI.)

3. Convert the image file to PBM format.

4. Run the file through the ocrad program. (gocr is also pretty good; read my note at the bottom.)

5. Run the text file through a bash text filter (rip out one- or two-character fragments surrounded by lots of white space, ignore upper-case words, etc.).

6. Finally, run the whole text file through a spell checker set to ignore upper-case words.
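
The text-filter step above could look roughly like the sketch below. It is not the lost script, just one way to read the description. The function name clean_ocr_text is made up, "lots of white space" is assumed to mean three or more spaces, and rejoining the surviving pieces with single spaces is also an assumption.

```shell
# Sketch of the OCR text-filter step; clean_ocr_text is a hypothetical name.
# Assumption: "lots of white space" means a run of three or more spaces.
clean_ocr_text() {
    awk '
    {
        # Split the line wherever there is a gap of 3+ spaces
        # (the regex is two literal spaces followed by " +").
        n = split($0, seg, /   +/)
        out = ""
        for (i = 1; i <= n; i++) {
            s = seg[i]
            if (s == "") continue            # gap at the start or end of the line
            # Noise: a fragment of one or two characters, unless it is an
            # all-upper-case word, which the filter leaves alone.
            if (length(s) <= 2 && s !~ /^[A-Z]+$/) continue
            out = (out == "" ? s : out " " s)
        }
        print out                            # lines that were all noise come out empty
    }'
}
```

Since ocrad writes its text to standard output by default, the filter can sit straight after it in a pipe, for example `ocrad scan.pbm | clean_ocr_text > scan.txt`. Short lower-case words that genuinely stand alone between wide gaps get dropped too, so the threshold is worth tuning against real output.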

It worked pretty well: I got about 95% accuracy on really crappy scans of documents that the Windows OCR clients used to spit the dummy on completely.

It took me three days to get it working. I played with it for a few weeks, then forgot about it once I'd run the script over all the files that needed converting.

One thing I must note about gocr: the intermediate file format is VERY important. Feed the wrong file type to gocr and, even though it still runs, the results look horrible.

ocrad is pretty good as long as the scanned image is good; it also depends on your image-to-PBM conversion tools.
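
For the image side, here is a rough sketch of the extract, convert and OCR steps. The post doesn't say which tools were used, so pdfimages (from xpdf/poppler) and ImageMagick's convert are assumptions. The name ocr_pdf and the 50% threshold are also made up for illustration.

```shell
# Sketch only: pdfimages and ImageMagick's convert are assumed tool choices,
# and ocr_pdf is a made-up name. Usage: ocr_pdf input.pdf output.txt
ocr_pdf() {
    local pdf="$1" out="$2" workdir img status=0
    workdir=$(mktemp -d) || return 1

    # Pull every embedded image out of the PDF into files named page-NNN.*
    if ! pdfimages "$pdf" "$workdir/page"; then
        rm -rf "$workdir"
        return 1
    fi

    : > "$out"    # start with an empty output file
    for img in "$workdir"/page-*; do
        [ -e "$img" ] || continue    # nothing was extracted
        # Greyscale, then hard-threshold to a 1-bit PBM.
        # 50% is a guess; tune it per batch of scans.
        convert "$img" -colorspace Gray -threshold 50% "$workdir/clean.pbm" &&
            ocrad "$workdir/clean.pbm" >> "$out" || status=1
    done

    rm -rf "$workdir"
    return "$status"
}
```

ocrad reads PBM, PGM and PPM input directly, so the convert call is there to control how the scan is reduced to black and white rather than to satisfy a format requirement. That reduction is where most of the tuning goes.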

There are commercial OCR clients for Linux (I read about one in one of the Linux rags). Someone may have a link, or just search Google.


On Tue, 2004-09-14 at 19:17, Nick Croft wrote:
* Michael Lake ([EMAIL PROTECTED]) wrote:
> 
> I have not found anything that is anywhere near the ability of the 
> commercial ocr packages that come with scanner software on Windows :-(
> 
This is the only reason why I run an emulator -- for TextBridge.

Nick
Regards
Richard Neal

**************************************************************
Of course, it is very important to be sober when you take an exam.
Many worthwhile careers in the street-cleansing, fruit-picking and
subway-guitar-playing industries have been founded on a lack of
understanding of this simple fact.
(Moving Pictures)
***************************************************************
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
