On 14 January 2013 16:12, paddz <[email protected]> wrote:
>
> Hi Lewis,
>
> i am using nutch 1.5.1
> I get no specific log output or errors.
>
> I am expecting nutch to crawl pdfs with no file extension e.g.
> /output/mypdffile, actually nutch is only crawling/parsing pdfs which look
> like this /output/mypdffile*.pdf*
[...]

Just a thought: Is your PDF content being served with
mimetype="application/pdf"?

Regards,
Gora
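
One way to check the mimetype question from outside Nutch is to look at the Content-Type header the server returns for an extensionless URL. The following is only a minimal sketch using the Python standard library; the function names are made up for illustration and nothing here is part of Nutch itself.

```python
from urllib.request import Request, urlopen


def media_type(content_type_header):
    """Return the bare media type from a Content-Type value, lowercased."""
    # Drop parameters such as "; charset=binary" and normalise the case.
    return content_type_header.split(";", 1)[0].strip().lower()


def is_served_as_pdf(content_type_header):
    """True if the header value declares the body as application/pdf."""
    return media_type(content_type_header) == "application/pdf"


def fetch_content_type(url):
    """Send a HEAD request and return the Content-Type header ('' if absent)."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")
```

Calling `is_served_as_pdf(fetch_content_type(url))` with the full URL of a document such as /output/mypdffile would show whether the server labels it as a PDF. If it comes back as something like application/octet-stream or text/plain, that might be why only the *.pdf URLs get parsed.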

