Hi,
could you explain in detail what is meant by "parent URL"?
- the page the PDF document is linked from
- a redirect pointing to the PDF doc
- the "directory" of the PDF URL (clip URL after last "/")
- ...
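
If it is the third option, clipping the URL after the last "/" is plain string handling. Below is a minimal sketch in Python, not Nutch code: the function name parent_directory_url and the example.com URLs are made up for illustration, and it assumes the query string and fragment should be dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directory_url(url: str) -> str:
    """Return the URL cut off after the last "/" of its path.

    Hypothetical helper for illustration only; it is not part of Nutch.
    The query string and fragment are dropped (an assumption).
    """
    parts = urlsplit(url)
    path = parts.path
    # Keep everything up to and including the last "/"; fall back to "/"
    # when the URL has no path at all.
    directory = path[: path.rfind("/") + 1] or "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


if __name__ == "__main__":
    print(parent_directory_url("http://example.com/docs/reports/annual.pdf"))
```

For the other two options (linking page or redirect source) string handling is not enough, because that information has to come from the crawl data.
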
Nutch indexes all successfully fetched pages but not redirects,
404s, etc. Of course, pag
I am using Nutch 1.x for website crawling and indexing in Solr (5.5.0).
I am trying to include the parent URL along with the PDF data.
Can someone please suggest a way to do it?
Thanks in advance for your comments and suggestions.
--
Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f60