On 14 January 2013 16:12, paddz <[email protected]> wrote:
>
> Hi Lewis,
>
> I am using Nutch 1.5.1.
> I get no specific log output or errors.
>
> I am expecting Nutch to crawl PDFs with no file extension, e.g.
> /output/mypdffile. In practice, Nutch is only crawling/parsing PDFs which
> look like this: /output/mypdffile*.pdf*
[...]

Just a thought: Is your PDF content being served with
mimetype="application/pdf"?
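
One quick way to check this is to look at the Content-Type header the server returns for an extensionless URL. Below is a minimal sketch in Python, not anything Nutch-specific. The function names are made up for illustration, and the URL in the usage comment is a hypothetical placeholder built from your example path; it also assumes the server answers HEAD requests.

```python
from urllib.request import Request, urlopen


def is_pdf_content_type(header_value):
    """Return True if a Content-Type header value names application/pdf."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=binary" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return the raw Content-Type header, or None."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


# Usage (hypothetical URL, replace with one of your own):
#   header = fetch_content_type("http://example.com/output/mypdffile")
#   print(header, is_pdf_content_type(header))
```

If the header comes back as something generic such as application/octet-stream or text/plain, that would be worth fixing on the server side before looking further at the crawler configuration.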

Regards,
Gora
