Re: Text extractors doesn't work correctly

Fabiano Nunes Sun, 19 Jul 2009 14:22:08 -0700

PDFBox 0.72 doesn't work properly with some PDF documents. See
https://issues.apache.org/jira/browse/PDFBOX-361 for details.
So I wrote an extractor (in fact, a copy of the original) based on the trunk
version of PDFBox. The trunk version is also faster than 0.72.


On Sun, Jul 19, 2009 at 5:35 PM, Vjger <[email protected]> wrote:

>
> Hi to all.
> I'm using Jackrabbit 1.5.5, and I have
> jackrabbit-text-extractors-1.5.0.jar on my classpath.
>
> Well, I noticed two problems.
>
> 1) The plain text extractor depends on the file extension: in my
> workspace I have two nt:file nodes, one with a .txt extension and the other
> with a .sql extension. The SQL contains function finds only the first, even
> though the two files are identical (apart from the extension).
>
> 2) The PDF extractor did not work correctly: with two different PDF files
> it did not find the searched text.
>
> Any suggestions?
>
> Thanks in advance
> --
> View this message in context:
> http://www.nabble.com/Text-extractors-doesn%27t-work-correctly-tp24560696p24560696.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
>
>
