Hello,
From a list of start URLs (each associated with a regular expression), I'd like to get, for each start URL, all URLs that belong to the same domain and match the expression. I don't want to analyse or index the URLs, just write them to a flat file.
Example:
start URL: http://www.mydomain.com
regular expression: /files/*.html
gives:
- http://www.mydomain.com/files/index.html
- http://www.mydomain.com/files/a.html
- http://www.mydomain.com/files/a01.html
- http://www.mydomain.com/files/b.html
- ...
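
To make the expected behaviour concrete, here is a minimal Python sketch of the filtering I have in mind. It is not Nutch code and does no crawling: it only takes URLs that a crawler has already discovered and keeps those on the same host whose path matches the expression. The function names `matching_urls` and `write_flat_file` are hypothetical, and I am assuming the `*` in my example is meant as a wildcard, which as a true regular expression would be written `/files/[^/]*\.html`.

```python
import re
from urllib.parse import urlparse


def matching_urls(start_url, path_regex, candidates):
    """Return the candidates on the same host as start_url whose path
    fully matches path_regex, in their original order."""
    host = urlparse(start_url).netloc.lower()
    pattern = re.compile(path_regex)
    kept = []
    for url in candidates:
        parts = urlparse(url)
        # Same host (case-insensitive) and the whole path must match.
        if parts.netloc.lower() == host and pattern.fullmatch(parts.path):
            kept.append(url)
    return kept


def write_flat_file(urls, path):
    """Write one URL per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

Calling `matching_urls("http://www.mydomain.com", r"/files/[^/]*\.html", discovered)` on a list of discovered URLs and passing the result to `write_flat_file` would produce the flat file described above.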
How can I do that simply with Nutch, without reinventing the wheel? Should I extend an existing class or develop a plugin? Could you give me some tips, please?
Thanks a lot for this useful forum!
Fabrice
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers