Yes, I have tika-parsers-1.10.jar on the classpath. I wonder if this could be caused by also having pdfbox-2.0.0-SNAPSHOT on the classpath?
Our project depends on PDFBox 2.0.0 and I see that tika-parsers depends on PDFBox 1.8.10.

On 14 October 2015 at 18:59, Allison, Timothy B. <talli...@mitre.org> wrote:
> File works with Tika trunk. What's on your classpath: tika-app or just
> tika-core? Is there a chance that you don't have tika-parsers on your cp?
>
>
> -----Original Message-----
> From: Adam Retter [mailto:adam.ret...@googlemail.com]
> Sent: Wednesday, October 14, 2015 12:14 PM
> To: user@tika.apache.org
> Subject: Tika unable to extract PDF Text
>
> I have a PDF which was created using Apache PDF Box 2.0.0-SNAPSHOT.
> Unfortunately Tika 1.10 seems unable to extract any text from the PDF, I
> don't get any exceptions or errors. The code is as simple as:
>
> new Tika().parseToString(new FileInputStream(f))
>
> Tika is always returning just the empty string.
>
> The PDF is available here - http://static.adamretter.org.uk/adam-1.pdf
>
> Any ideas?
>
> --
> Adam Retter
>
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk

--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
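
A quick way to test the classpath suspicion raised in this message is to ask the JVM which jar each class is actually loaded from. The following is only a rough sketch: the class name ClasspathCheck and the helper locate are made up for illustration, and it uses reflection so that it compiles without Tika or PDFBox present. The two class names it looks up (org.apache.tika.parser.pdf.PDFParser and org.apache.pdfbox.pdmodel.PDDocument) are the ones assumed to matter here.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Tika or PDFBox.
public class ClasspathCheck {

    /** Returns the location a class was loaded from, or a short note if it cannot be determined. */
    public static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself have no code source
            return src == null ? "(no code source; bootstrap class)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.tika.parser.pdf.PDFParser",   // shipped in tika-parsers
            "org.apache.pdfbox.pdmodel.PDDocument"    // shipped in pdfbox
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run with the same classpath as the application. If PDDocument resolves to the pdfbox-2.0.0-SNAPSHOT jar rather than the 1.8.10 jar that tika-parsers 1.10 declares as its dependency, that would be consistent with the version-conflict theory, though it does not prove it.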