Hi,
 
-----Original message-----
> From: kiran chitturi <chitturikira...@gmail.com>
> Sent: Thu 25-Oct-2012 20:49
> To: user@nutch.apache.org
> Subject: Nutch 2.x Eclipse: Can't retrieve Tika parser for mime-type 
> application/pdf
> 
> Hi,
> 
> I have built Nutch 2.x in Eclipse using this tutorial (
> http://wiki.apache.org/nutch/RunNutchInEclipse), with some modifications.
> 
> It is able to parse HTML files successfully, but when it comes to PDF files it
> says: 2012-10-25 14:37:05,071 ERROR tika.TikaParser - Can't retrieve Tika
> parser for mime-type application/pdf
> 
> Is there anything wrong with my Eclipse configuration? I am looking to
> debug some things in Nutch, so I am working with Eclipse and Nutch.
> 
> Do I need to point Eclipse to any libraries for it to recognize the Tika
> parsers for the application/pdf type?
> 
> What exactly is the reason for this type of error to appear only for PDF
> files and not for HTML files? I am using a recent Nutch 2.x, which has Tika
> upgraded to 1.2.

This is possible if the PDFBox dependency is not found anywhere, or if it is
wrongly mapped in the Tika plugin's plugin.xml. The same error can also occur
if, for some reason, a tika-parsers-VERSION.jar has ended up in your
runtime/local/lib directory.
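
If it helps to narrow this down, here is a minimal sketch of how both
conditions could be checked from a small Java class run inside the same
Eclipse project. The class TikaClasspathCheck and its methods are hypothetical
helpers, not part of Nutch or Tika, and the default directory is simply the
runtime/local/lib path mentioned above. Note that Nutch loads plugins through
its own class loaders, so the classpath check only reflects the launch
classpath, not what the plugin itself can see.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not part of Nutch or Tika.
class TikaClasspathCheck {

    // Returns true if the named class can be loaded from the current classpath.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Lists the names of files matching tika-parsers-*.jar directly inside libDir.
    // Returns an empty list if libDir is not a directory.
    static List<String> findTikaParsersJars(Path libDir) throws IOException {
        List<String> found = new ArrayList<>();
        if (!Files.isDirectory(libDir)) {
            return found;
        }
        try (DirectoryStream<Path> jars =
                 Files.newDirectoryStream(libDir, "tika-parsers-*.jar")) {
            for (Path jar : jars) {
                found.add(jar.getFileName().toString());
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the lib directory is passed as the first argument,
        // otherwise the relative path runtime/local/lib is used.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "runtime/local/lib");
        System.out.println("PDFBox on classpath: "
            + isOnClasspath("org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println("Tika PDFParser on classpath: "
            + isOnClasspath("org.apache.tika.parser.pdf.PDFParser"));
        System.out.println("tika-parsers jars in " + libDir + ": "
            + findTikaParsersJars(libDir));
    }
}
```

If the PDFBox check prints false, or the last line lists a tika-parsers jar,
that would point to one of the two causes described above.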

> 
> I would like some help here, and would like to know if anyone has
> encountered a similar problem with Eclipse, Nutch 2.x and parsing
> application/pdf files.
> 
> Many Thanks,
> -- 
> Kiran Chitturi
> 
