I have a bunch of PDF files that have had an OCR package run against them.
The problem is that it adds the text to the normal page content and tries to 
position the recognized text at the location in the image where it was found,
so the text is mixed with lots of positioning and other layout information.
I'd like to extract all the text as one block and add it back to the file as a 
single item, probably an annotation.
There are lots of tools to extract text from a PDF, but they are all web-based 
or use a GUI to do one file at a time.
I want to run this against a directory full of PDFs and have it process all of 
them.

Does anyone know of such a tool, or have one already written?
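
To make the batch part concrete, here is a rough sketch of the kind of driver
I have in mind. It only covers walking one directory and collecting the text
per file. The class name BatchTextExtract and the pluggable "extractor"
function are hypothetical names made up for illustration. The actual text
extraction is left to whatever is plugged in (I assume something built on
PDFBox's PDFTextStripper could fill that role), and adding the text back as an
annotation is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical batch driver: applies a caller-supplied text extractor to
// every PDF file directly inside one directory (no recursion).
final class BatchTextExtract {

    // Returns a map from each PDF path to the text the extractor produced.
    // The extractor is an assumption: it stands in for the real PDF library call.
    static Map<Path, String> extractAll(Path dir, Function<Path, String> extractor)
            throws IOException {
        Map<Path, String> textByFile = new TreeMap<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(BatchTextExtract::isPdf)
                   .forEach(p -> textByFile.put(p, extractor.apply(p)));
        }
        return textByFile;
    }

    // Matches on the file extension only, ignoring case (.pdf, .PDF).
    static boolean isPdf(Path p) {
        return p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

Keeping the extractor as a parameter means the directory handling can be
tried out without any PDF library on the classpath.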
