Re: Include parent URL in pdf data - nutch

2018-09-27 Thread Sebastian Nagel
Hi, could you explain in detail what is meant by "parent URL"?
- the page the PDF document is linked from
- a redirect pointing to the PDF doc
- the "directory" of the PDF URL (clip URL after last "/")
- ...
Nutch indexes all successfully fetched pages but not redirects, 404s, etc. Of course, pag
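
For illustration only, the third reading listed above (the "directory" of the PDF URL, i.e. the URL clipped after its last "/") could be sketched in plain Java roughly as follows. The class name ParentUrlSketch and the method parentOf are hypothetical and not part of Nutch; how such a value would be attached to the indexed document in Nutch is not shown here.

```java
// Hypothetical helper, not a Nutch API: clips a URL after the last "/" of its path.
final class ParentUrlSketch {

    static String parentOf(String url) {
        // Drop query and fragment first, so a "/" inside them
        // is not mistaken for a path separator.
        int cut = url.length();
        int q = url.indexOf('?');
        if (q >= 0) cut = Math.min(cut, q);
        int h = url.indexOf('#');
        if (h >= 0) cut = Math.min(cut, h);
        String base = url.substring(0, cut);

        // Skip the "//" of "scheme://" when searching for the last path slash.
        int schemeEnd = base.indexOf("://");
        int pathStart = schemeEnd >= 0 ? schemeEnd + 3 : 0;
        int lastSlash = base.lastIndexOf('/');

        if (lastSlash < pathStart) {
            // No path at all (e.g. "http://example.com"): use the site root.
            return base + "/";
        }
        return base.substring(0, lastSlash + 1);
    }

    public static void main(String[] args) {
        // example.com is a placeholder host used only for this sketch.
        System.out.println(parentOf("http://example.com/docs/report.pdf"));
    }
}
```

This covers only the "directory" interpretation; the linking page or a redirect source cannot be derived from the PDF URL alone.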

Include parent URL in pdf data - nutch

2018-09-27 Thread UMA MAHESWAR
I am using Nutch 1.x for website crawling and indexing in Solr (5.5.0). I am trying to include the parent URL along with the PDF data. Can someone please suggest a way to do it? Thanks in advance for your comments and suggestions. -- Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f60