Also interesting: if I use this simple example program:

    import java.io.File;
    import java.io.FileInputStream;
    import org.apache.tika.Tika;

    public class ExtractTest {
        // parseToString declares checked exceptions (IOException, TikaException)
        public static void main(String args[]) throws Exception {
            String text = new Tika().parseToString(new FileInputStream(new File("/tmp/adam-1.pdf")));
            System.out.println("'" + text + "'");
        }
    }

1) And run it like this:

    java -cp /Users/aretter/Downloads/tika-core-1.10.jar:/Users/aretter/Downloads/tika-parsers-1.10.jar:/Users/aretter/Downloads/pdfbox-1.8.10.jar ExtractTest

Then I get an empty string as the result.

2) However, if I run:

    java -cp /Users/aretter/Downloads/tika-app-1.10.jar ExtractTest

Then I get the expected text extracted from the PDF as the result.

So, what dependencies am I missing from my classpath in (1) to be able to extract text from a PDF?

Also, is there any way to get Tika to complain or throw an exception if it doesn't have the dependencies that it needs?

On 14 October 2015 at 18:59, Allison, Timothy B. <talli...@mitre.org> wrote:
> File works with Tika trunk. What's on your classpath: tika-app or just
> tika-core? Is there a chance that you don't have tika-parsers on your cp?
>
>
> -----Original Message-----
> From: Adam Retter [mailto:adam.ret...@googlemail.com]
> Sent: Wednesday, October 14, 2015 12:14 PM
> To: user@tika.apache.org
> Subject: Tika unable to extract PDF Text
>
> I have a PDF which was created using Apache PDF Box 2.0.0-SNAPSHOT.
> Unfortunately Tika 1.10 seems unable to extract any text from the PDF, I
> don't get any exceptions or errors. The code is as simple as:
>
> new Tika().parseToString(new FileInputStream(f))
>
> Tika is always returning just the empty string.
>
> The PDF is available here - http://static.adamretter.org.uk/adam-1.pdf
>
> Any ideas?
>
> --
> Adam Retter
>
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk

--
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
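
One way to narrow down what is missing in case (1) is a small classpath probe, run with the same -cp as ExtractTest (plus the directory holding the probe class). This is only a sketch using the plain JDK: ClasspathProbe is a hypothetical helper, not part of Tika. The class names listed are assumptions: one representative class each for tika-parsers, pdfbox, and the libraries PDFBox 1.8.x normally depends on (fontbox, jempbox, commons-logging). It will not say why Tika returned an empty string, only which of these candidate classes cannot be found.

```java
import java.util.Arrays;
import java.util.List;

public class ClasspathProbe {

    // Returns true if the named class can be located on the current classpath.
    // initialize=false avoids running static initializers while probing.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed candidates: one representative class per jar that may be needed.
        List<String> candidates = Arrays.asList(
            "org.apache.tika.parser.pdf.PDFParser",   // tika-parsers
            "org.apache.pdfbox.pdmodel.PDDocument",   // pdfbox
            "org.apache.fontbox.cmap.CMap",           // fontbox
            "org.apache.jempbox.xmp.XMPMetadata",     // jempbox
            "org.apache.commons.logging.LogFactory"   // commons-logging
        );
        for (String name : candidates) {
            System.out.println((isPresent(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

Any line reported as MISSING points at a jar worth adding to the classpath in (1) before retrying ExtractTest.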