PDFBox 0.72 doesn't work properly with some PDF documents; see https://issues.apache.org/jira/browse/PDFBOX-361 for details. So I wrote an extractor (in fact, a copy of the original one) based on the trunk version of PDFBox. The trunk version is also faster than 0.72.

On Sun, Jul 19, 2009 at 5:35 PM, Vjger <[email protected]> wrote:
>
> Hi to all.
> I'm using JackRabbit 1.5.5 and in my classpath I've
> jackrabbit-text-extractors-1.5.0-jar
>
> Well, I noticed two problems.
>
> 1) The plain text text extractors depends by the file extension: in fact, in
> my workspace I've two nt:file node one as .txt extension the other as .sql
> extension. The SQL contains function found only the first even if the two
> file are identical (apart of the extension).
>
> 2) The pdf extractor has not worked correctly: with two different pdf files
> it has not found the searched text
>
> Any suggests?
>
> Thanks in advance
> --
> View this message in context:
> http://www.nabble.com/Text-extractors-doesn%27t-work-correctly-tp24560696p24560696.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
