I think it has already been OCRed because, as I wrote, I can search in the PDF
with Adobe Reader and it also highlights the result. But what I see is a
scanned page, so I guess there is a text layer "behind" it. Is that possible?
Paco Avila wrote:
You can write a text extractor that performs OCR.
On Mon, Jan 26, 2009 at 5:25 PM, Péterfi Balázs <[email protected]> wrote:
Hello,
I'm developing an application that uses Jackrabbit, and I have a problem
with searching in PDF files. When I search in a PDF that was generated from
a Word document, it works. When I search in a PDF that contains a scanned
document, my application returns no results, even though I can search through
its contents from within Adobe Reader (some sort of optical character
recognition). I don't know how this kind of PDF works, but I need to search
in it. Does Jackrabbit support it?
Thank you!
Balazs