Re: Nutch Topical / Focused Crawl

2009-10-06 Thread MyD
I just found an interesting thesis that explains how to modify Nutch into a focused (topical) crawler. It helped me a lot and may be useful to others: http://wing.comp.nus.edu.sg/publications/theses/2009/markusHaenseThesis.pdf MyD wrote: > > Hi @ all, > > I'd like to turn Nutc

Authenticity of URLs from DMOZ

2009-10-06 Thread Gaurang Patel
Hey, can anyone tell me what could be causing the following, which happened while fetching data using bin/nutch fetch? My AVG Antivirus is detecting virus threats while Nutch fetches pages from the URLs in the *crawldb*. I injected DMOZ Open Directory URLs into the crawldb. Antivirus already detected