Also, interestingly, if I use this simple example program:

import java.io.File;
import java.io.FileInputStream;
import org.apache.tika.Tika;

public class ExtractTest {
  public static void main(String[] args) throws Exception {
    // parseToString declares IOException and TikaException
    String text = new Tika().parseToString(new FileInputStream(new File("/tmp/adam-1.pdf")));
    System.out.println("'" + text + "'");
  }
}

1) And run it like this:

java -cp /Users/aretter/Downloads/tika-core-1.10.jar:/Users/aretter/Downloads/tika-parsers-1.10.jar:/Users/aretter/Downloads/pdfbox-1.8.10.jar ExtractTest

Then I get an empty string as the result.

2) However if I run:

java -cp /Users/aretter/Downloads/tika-app-1.10.jar ExtractTest

Then I get the expected text extracted from the PDF as the result. So,
what dependencies am I missing from my classpath in (1) to be able to
extract text from a PDF? Also, is there any way to get Tika to complain
or throw an exception if it doesn't have the dependencies that it
needs?
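
To narrow down (1), a rough probe like the one below might help. It is
only a sketch using the plain JDK, not anything Tika provides: it tries
to load a handful of classes and prints the ones that are not on the
classpath. The list of class names is my assumption about what PDF
parsing could need (the Tika PDF parser, PDFBox, and the FontBox,
JempBox and Commons Logging libraries that PDFBox 1.8 relies on); it is
not a confirmed or complete dependency list, and ClasspathProbe is a
made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClasspathProbe {

  // Returns the names from classNames that cannot be loaded from the
  // classpath this program was started with.
  static List<String> missing(List<String> classNames) {
    List<String> notFound = new ArrayList<String>();
    ClassLoader loader = ClasspathProbe.class.getClassLoader();
    for (String name : classNames) {
      try {
        // initialize=false: only check that the class can be loaded,
        // without running its static initializers.
        Class.forName(name, false, loader);
      } catch (ClassNotFoundException e) {
        notFound.add(name);
      } catch (LinkageError e) {
        // e.g. NoClassDefFoundError when a superclass is absent
        notFound.add(name);
      }
    }
    return notFound;
  }

  public static void main(String[] args) {
    // ASSUMPTION: a guess at classes that PDF extraction may need;
    // adjust the list to whatever you want to check.
    List<String> assumed = Arrays.asList(
        "org.apache.tika.parser.pdf.PDFParser",
        "org.apache.pdfbox.pdmodel.PDDocument",
        "org.apache.fontbox.cmap.CMap",
        "org.apache.jempbox.xmp.XMPMetadata",
        "org.apache.commons.logging.LogFactory");
    for (String name : missing(assumed)) {
      System.out.println("MISSING: " + name);
    }
  }
}
```

Running it with the same -cp value as in (1), plus the directory that
holds ClasspathProbe.class, should list whichever of those classes the
three jars do not supply.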

On 14 October 2015 at 18:59, Allison, Timothy B. <talli...@mitre.org> wrote:
> File works with Tika trunk.  What's on your classpath: tika-app or just 
> tika-core?  Is there a chance that you don't have tika-parsers on your cp?
>
>
> -----Original Message-----
> From: Adam Retter [mailto:adam.ret...@googlemail.com]
> Sent: Wednesday, October 14, 2015 12:14 PM
> To: user@tika.apache.org
> Subject: Tika unable to extract PDF Text
>
> I have a PDF which was created using Apache PDF Box 2.0.0-SNAPSHOT.
> Unfortunately Tika 1.10 seems unable to extract any text from the PDF, I 
> don't get any exceptions or errors. The code is as simple as:
>
> new Tika().parseToString(new FileInputStream(f))
>
> Tika is always returning just the empty string.
>
> The PDF is available here - http://static.adamretter.org.uk/adam-1.pdf
>
> Any ideas?
>
> --
> Adam Retter
>
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk



-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
