Thank you very much. This worked great and resolved the issue of finding the parser.
One interesting thing: out of 10 PDF files, it crawled 2 and reported the others as unsuccessful. This has happened about 10 times so far. I really need to debug this and add more detailed error messages than just 'unable to successfully parse content ..'

Thanks again,
Kiran.

On Fri, Oct 26, 2012 at 4:16 AM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> > Is there anything wrong with my eclipse configuration? I am looking to
> > debug some things in nutch, so i am working with eclipse and nutch.
>
> easier to follow the steps in Remote Debugging in Eclipse from
> http://wiki.apache.org/nutch/RunNutchInEclipse
>
> it will save you all sorts of classpath issues etc... note that this works
> in local mode only
>
> HTH
>
> Julien
>
> On 25 October 2012 19:44, kiran chitturi <chitturikira...@gmail.com>
> wrote:
>
> > Hi,
> >
> > i have built Nutch 2.x in eclipse using this tutorial (
> > http://wiki.apache.org/nutch/RunNutchInEclipse) and with some
> > modifications.
> >
> > It's able to parse html files successfully but when it comes to pdf files it
> > says 2012-10-25 14:37:05,071 ERROR tika.TikaParser - Can't retrieve Tika
> > parser for mime-type application/pdf
> >
> > Is there anything wrong with my eclipse configuration? I am looking to
> > debug some things in nutch, so i am working with eclipse and nutch.
> >
> > Do i need to point any libraries for eclipse to recognize tika parsers for
> > application/pdf type ?
> >
> > What exactly is the reason for this type of error to appear for only pdf
> > files and not html files ? I am using recent nutch 2.x which has tika
> > upgraded to 1.2
> >
> > I would like some help here and would like to know if anyone has
> > encountered similar problem with eclipse, nutch 2.x and parsing
> > application/pdf files ?
> >
> > Many Thanks,
> > --
> > Kiran Chitturi
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--
Kiran Chitturi