I want to crawl a full website in order to index specific file types.
Assume I have a website with many pages, some of which contain links to PDF
files. I want to index only these files.
If I use the suffix filter to filter out anything but .pdf, then nothing is
fetched. I assume that is obvious: the pages that contain the PDF
links are not .pdf files themselves, so they are rejected too.
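To make the problem concrete, here is a toy model of a crawl loop. It is plain Python, not Nutch code, and the site map and the `crawl`, `only_pdf` and `allow_all` names are hypothetical. It keeps the filter that decides what gets *fetched* separate from the filter that decides what gets *indexed*:

```python
from collections import deque

# Hypothetical site: each URL maps to the URLs it links to.
SITE = {
    "http://example.com/": ["http://example.com/docs.html", "http://example.com/a.pdf"],
    "http://example.com/docs.html": ["http://example.com/b.pdf", "http://example.com/"],
    "http://example.com/a.pdf": [],
    "http://example.com/b.pdf": [],
}


def crawl(seed, url_filter, index_filter):
    """Breadth-first crawl: url_filter decides which URLs are fetched
    (and therefore whose outlinks are followed); index_filter decides
    which fetched documents end up in the index."""
    indexed = []
    seen = {seed}
    queue = deque([seed] if url_filter(seed) else [])
    while queue:
        url = queue.popleft()
        if index_filter(url):
            indexed.append(url)
        for link in SITE.get(url, []):
            if link not in seen and url_filter(link):
                seen.add(link)
                queue.append(link)
    return indexed


def only_pdf(url):
    return url.endswith(".pdf")


def allow_all(url):
    return True


# Fetch-time filter that accepts only .pdf: the seed page itself is
# rejected, so no outlinks are ever discovered.
print(crawl("http://example.com/", only_pdf, only_pdf))   # []
# Fetch every page, but index only the .pdf documents.
print(crawl("http://example.com/", allow_all, only_pdf))
# ['http://example.com/a.pdf', 'http://example.com/b.pdf']
```

In this model the HTML pages have to pass the fetch filter so their links can be followed, and the "only PDFs" rule belongs at indexing time instead.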

Is it possible to tell Nutch: go through the given URL, take only the links
to .pdf files, and then index only those files, without indexing the pages
themselves?
