Re: searching in OCRed pdf

Paco Avila Mon, 26 Jan 2009 08:53:39 -0800

Jackrabbit's PDF text extractor uses PDFBox. If Adobe Reader can search
the text, then PDFBox should be capable of extracting it, but that is
only my opinion.
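
One way to test this would be to run the PDF through PDFBox directly and look at what comes back. Below is a minimal sketch, assuming the extracted text is already available as a String (for example from PDFBox's PDFTextStripper.getText on a loaded PDDocument). The class TextLayerCheck, its method names and the threshold of 20 characters are hypothetical and not part of Jackrabbit or PDFBox.

```java
import java.util.Locale;

// Hypothetical helper, not part of Jackrabbit or PDFBox.
final class TextLayerCheck {

    // Assumed threshold: fewer letters/digits than this is treated as "no text layer".
    static final int MIN_ALNUM_CHARS = 20;

    // Counts letters and digits (by code point) in the extracted text.
    static int countAlnum(String extracted) {
        if (extracted == null) {
            return 0;
        }
        return (int) extracted.codePoints()
                .filter(Character::isLetterOrDigit)
                .count();
    }

    // True if the extraction produced enough real characters to be searchable.
    static boolean hasSearchableText(String extracted) {
        return countAlnum(extracted) >= MIN_ALNUM_CHARS;
    }

    // Case-insensitive check that a search term appears in the extracted text.
    static boolean contains(String extracted, String term) {
        if (extracted == null || term == null || term.isEmpty()) {
            return false;
        }
        return extracted.toLowerCase(Locale.ROOT)
                .contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // In a real check this string would come from PDFBox rather than
        // from the command line.
        String extracted = args.length > 0 ? args[0] : "";
        System.out.println("alnum chars: " + countAlnum(extracted));
        System.out.println("searchable:  " + hasSearchableText(extracted));
    }
}
```

If the extracted text is empty or only whitespace, the PDF probably has no text layer that PDFBox can read, and the index will have nothing to match against. If it contains the words you searched for in Adobe Reader, the problem is more likely in the indexing configuration than in the PDF.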


On Mon, Jan 26, 2009 at 5:47 PM, Péterfi Balázs <[email protected]> wrote:
> I think it has already been OCRed because, as I wrote, I can search in the pdf with
> Adobe Reader and it also selects the result. But what I see is a scanned
> page, and I guess there is a text layer "behind" it. Is that possible?
>
> Paco Avila wrote:
>>
>> You can write a text extractor that performs OCR.
>>
>> On Mon, Jan 26, 2009 at 5:25 PM, Péterfi Balázs <[email protected]>
>> wrote:
>>
>>>
>>> Hello,
>>>
>>> I'm developing an application that uses Jackrabbit and have a problem
>>> with searching in PDF files. When I search in a PDF that was generated
>>> from a Word document, it works. When I search in a PDF that contains a
>>> scanned document, I can search through its contents from within Adobe
>>> Reader (some sort of Optical Character Recognition), but my application
>>> does not return any results. I don't know how this kind of PDF works,
>>> but I need to search in it. Does Jackrabbit support it?
>>>
>>> Thank you!
>>> Balazs
>>>
>>>
>>>
>>
>>
>>
>>
>



-- 
Paco Avila
GIT Consultors
tel: +34 971 498310
fax: +34 971496189
e-mail: [email protected]
http://www.git.es
