Is it possible to go through a whole website but index only a specific file type?

Dan Volfman Tue, 24 Jan 2012 09:22:48 -0800

I want to crawl a full website in order to index specific file types.
Assume I have a website with many pages, some of which contain links to PDF
files. I want to index only those files.
If I use suffix-filter to filter out everything but .pdf, then nothing is
fetched. I assume that is expected: the pages that contain the PDF links are
not .pdf themselves, so they are never fetched and the links are never found.


Is it possible to tell Nutch to go through the given URL, take only the links
to .pdf files, and then index only those files, without indexing the pages
themselves?
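A minimal sketch of the distinction being asked about, in plain Python rather than Nutch's own API: the decision "should this URL be fetched and its links followed?" is kept separate from "should this URL be indexed?". The site map, the example.com URLs, and the `crawl` and `is_pdf` helpers are all hypothetical, made up for illustration. Using the .pdf check as the fetch filter rejects the seed page, so nothing is discovered; fetching everything and applying the check only at index time yields just the PDFs.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical toy site: each URL maps to the outlinks found on that page.
SITE: Dict[str, List[str]] = {
    "http://example.com/": ["http://example.com/docs.html", "http://example.com/about.html"],
    "http://example.com/docs.html": ["http://example.com/a.pdf", "http://example.com/b.pdf"],
    "http://example.com/about.html": ["http://example.com/c.pdf"],
    "http://example.com/a.pdf": [],
    "http://example.com/b.pdf": [],
    "http://example.com/c.pdf": [],
}


def is_pdf(url: str) -> bool:
    """Hypothetical suffix check, standing in for a suffix-style URL filter."""
    return url.lower().endswith(".pdf")


def crawl(seed: str,
          fetch_filter: Callable[[str], bool],
          index_filter: Callable[[str], bool]) -> List[str]:
    """Breadth-first crawl of SITE; returns the URLs that would be indexed."""
    indexed: List[str] = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if not fetch_filter(url):
            # Rejected URLs are never fetched, so their outlinks are never discovered.
            continue
        if index_filter(url):
            indexed.append(url)
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


# Filtering at fetch time: the seed page is not .pdf, so nothing is fetched.
print(crawl("http://example.com/", is_pdf, is_pdf))  # []
# Fetch every page, but index only the PDFs.
print(crawl("http://example.com/", lambda u: True, is_pdf))
```

The second call walks the HTML pages only to collect their links, which is the behaviour the question is after; in Nutch terms that would mean letting the URL filters accept HTML pages and excluding non-PDF documents somewhere in the indexing step instead.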

