Re: Tika unable to extract PDF Text

Adam Retter Thu, 15 Oct 2015 08:54:51 -0700

Yes, I have tika-parsers-1.10.jar on the classpath. I wonder if the
problem could be caused by also having pdfbox-2.0.0-SNAPSHOT on the
classpath?


Our project depends on PDFBox 2.0.0 and I see that tika-parsers
depends on PDFBox 1.8.10.
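
One way to test that hypothesis is to ask the JVM which jar each class is
actually loaded from. The snippet below is only a rough sketch: the class
name WhichJar and the helper locate are made up for illustration, and the
two class names passed to it are assumed from the usual PDFBox and Tika
package layout. It uses only standard JDK calls, so it can be dropped into
the project as-is.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the location a class was loaded from, or a short note if that is unknown. */
    static String locate(String className) {
        try {
            // Load without initialising the class, using the same loader as this class.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source reported for " + className + ")";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Assumed class names; adjust if your versions use different packages.
        System.out.println("PDFBox: " + locate("org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println("Tika:   " + locate("org.apache.tika.parser.pdf.PDFParser"));
    }
}
```

If the PDFBox line points at the 2.0.0-SNAPSHOT jar rather than 1.8.10, then
Tika's PDF parser is running against a PDFBox version it was not built for,
which would be consistent with the classpath-conflict theory.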

On 14 October 2015 at 18:59, Allison, Timothy B. <[email protected]> wrote:
> File works with Tika trunk.  What's on your classpath: tika-app or just 
> tika-core?  Is there a chance that you don't have tika-parsers on your cp?
>
>
> -----Original Message-----
> From: Adam Retter [mailto:[email protected]]
> Sent: Wednesday, October 14, 2015 12:14 PM
> To: [email protected]
> Subject: Tika unable to extract PDF Text
>
> I have a PDF which was created using Apache PDFBox 2.0.0-SNAPSHOT.
> Unfortunately, Tika 1.10 seems unable to extract any text from the PDF,
> and I don't get any exceptions or errors. The code is as simple as:
>
> new Tika().parseToString(new FileInputStream(f))
>
> Tika is always returning just the empty string.
>
> The PDF is available here - http://static.adamretter.org.uk/adam-1.pdf
>
> Any ideas?
>
> --
> Adam Retter
>
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk



-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
