I want to crawl an entire website in order to index only specific file types. Assume I have a website with many pages, some of which contain links to PDF files, and I want to index only those files. If I use suffix-filter to filter out anything but .pdf, then nothing is fetched. I assume the reason is obvious: the pages that contain the PDF links do not themselves end in .pdf, so they are filtered out before their links can be followed.
Is it possible to tell Nutch to crawl from the given URL, follow the pages only in order to find links to .pdf files, and then index only those files, without indexing the pages themselves?
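To make the behaviour I am after concrete, here is a toy sketch in plain Python. It is not Nutch code and does not use any Nutch API: the `SITE` map, the example.com URLs, and the `crawl`, `fetch_filter`, and `index_filter` names are all made up for illustration. It only shows the difference between filtering at fetch time (what I get now) and filtering at index time (what I want).

```python
from collections import deque

# Hypothetical toy site: URL -> outgoing links. PDFs have no outgoing links.
SITE = {
    "http://example.com/": [
        "http://example.com/docs.html",
        "http://example.com/about.html",
    ],
    "http://example.com/docs.html": [
        "http://example.com/a.pdf",
        "http://example.com/b.pdf",
    ],
    "http://example.com/about.html": ["http://example.com/c.pdf"],
}


def is_pdf(url):
    """True if the URL ends in .pdf (case-insensitive)."""
    return url.lower().endswith(".pdf")


def crawl(seed, fetch_filter, index_filter):
    """Breadth-first crawl from seed.

    fetch_filter decides which discovered links are fetched at all;
    index_filter decides which fetched URLs end up in the index.
    """
    seen, queue, indexed = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        if index_filter(url):
            indexed.append(url)
        for link in SITE.get(url, []):
            if link not in seen and fetch_filter(link):
                seen.add(link)
                queue.append(link)
    return indexed


SEED = "http://example.com/"

# What happens now: the .pdf-only rule is applied when fetching, so the
# HTML pages are dropped and the PDF links are never discovered.
pdf_only_fetch = crawl(SEED, fetch_filter=is_pdf, index_filter=is_pdf)

# What I want: fetch every page, but keep only the PDFs in the index.
pdf_only_index = crawl(SEED, fetch_filter=lambda url: True, index_filter=is_pdf)

print(pdf_only_fetch)
print(pdf_only_index)
```

In the first call the index stays empty, because the HTML pages that link to the PDFs are rejected before they are ever fetched. In the second call the pages are fetched but only the three PDF URLs are indexed, which is the result I would like to get from Nutch.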

