Hello,

From a list of start URLs (each associated with a regular expression), I'd like to get, for each start URL, all URLs that come from the same domain and match the expression. I don't want to analyse or index the URLs, just write them to a flat file.

Example:
start URL: http://www.mydomain.com
regular expression: /files/*.html

gives:
- http://www.mydomain.com/files/index.html
- http://www.mydomain.com/files/a.html
- http://www.mydomain.com/files/a01.html
- http://www.mydomain.com/files/b.html
- ...
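
To make the intended behaviour concrete, here is a minimal sketch in plain Java of the filter I have in mind. It does not use any Nutch API: the class name UrlFilterSketch, the methods accept and filter, and the output file urls.txt are all hypothetical names chosen for illustration, and the candidate URLs are hard-coded where a crawler would normally supply them. Note that /files/*.html as written above is really a shell-style wildcard; as a Java regular expression applied to the URL path, I assume it would be written /files/.*\.html.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: plain JDK, no Nutch classes involved.
class UrlFilterSketch {

    /** True if candidate is on the same host as startUrl and its path matches pathPattern. */
    static boolean accept(String startUrl, Pattern pathPattern, String candidate) {
        try {
            URI start = URI.create(startUrl);
            URI url = URI.create(candidate);
            String startHost = start.getHost();
            String host = url.getHost();
            String path = url.getPath();
            if (startHost == null || host == null || path == null) {
                return false; // e.g. mailto: links or relative URLs
            }
            return startHost.equalsIgnoreCase(host) && pathPattern.matcher(path).matches();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: skip it
        }
    }

    /** Keeps only the candidates accepted for the given start URL, preserving order. */
    static List<String> filter(String startUrl, Pattern pathPattern, List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (accept(startUrl, pathPattern, candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        String startUrl = "http://www.mydomain.com";
        // Assumed regex form of the wildcard /files/*.html
        Pattern pathPattern = Pattern.compile("/files/.*\\.html");

        // Stand-in for the URLs a crawler would discover.
        List<String> discovered = Arrays.asList(
                "http://www.mydomain.com/files/index.html",
                "http://www.mydomain.com/files/a.html",
                "http://www.mydomain.com/images/logo.png",
                "http://www.otherdomain.com/files/a.html");

        // One URL per line in a flat file (hypothetical file name).
        Files.write(Paths.get("urls.txt"), filter(startUrl, pathPattern, discovered));
    }
}
```

With the sample input above, only the two URLs under http://www.mydomain.com/files/ would end up in the file. What I am missing is the Nutch side: how to get the list of discovered URLs without analysing or indexing the pages.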

How can I do that simply with Nutch, without "reinventing the wheel"? Should I extend an existing class or develop a plugin? Could you give me some tips, please?

Thanks a lot for this useful forum!

Fabrice


Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
