How do you know that you are not using a URL filter file? "0 records selected for fetching" means that either no URL passed the filters or the seed URLs themselves are wrong. Look for the error in every config file that mentions a domain name, for example:

    nutch-1.0/conf/regex-urlfilter.txt
    nutch-1.0/conf/prefix-urlfilter.txt
    nutch-1.0/conf/crawl-urlfilter.txt
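To see why an untouched filter file can leave the generator with nothing to fetch, here is a rough Python sketch of how I understand the regex URL filter to behave: rules are read top to bottom, the first rule whose pattern is found in the URL decides (+ accepts, - rejects), and a URL that matches no rule is dropped. The rule text below is my approximation of the stock crawl-urlfilter.txt template with its MY.DOMAIN.NAME placeholder, not a copy of your file, and the seed URLs and function names are made up for illustration. Python regexes also differ slightly from the Java ones Nutch uses, so treat this as a sanity check only.

```python
import re

# Approximation of a stock filter template (assumed, not copied from a real install).
SAMPLE_RULES = r"""
# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# skip URLs containing certain characters as probable queries
-[?*!@=]
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
# skip everything else
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule wins; a URL that matches nothing is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Hypothetical seed list standing in for the contents of the urls/ directory.
seeds = ["http://www.example.com/", "http://www.example.org/news/"]

stock = parse_rules(SAMPLE_RULES)
print(sum(accepts(stock, u) for u in seeds), "seed(s) pass the stock template")

# Same rules with the placeholder replaced by a real domain.
edited = parse_rules(SAMPLE_RULES.replace("MY.DOMAIN.NAME", "example.com"))
print(sum(accepts(edited, u) for u in seeds), "seed(s) pass the edited rules")
```

With the placeholder left in, every seed falls through to the final "-." rule, which would match the "0 records selected" symptom. Paste the rules from your own conf files and your real seeds into the sketch to check which ones survive.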
Try checking those files first.

On Mon, Mar 7, 2011 at 8:49 AM, chidu r [via Lucene] wrote:
> Hi all
>
> I am trying to setup nutch 1.2 on Hadoop and used the instructions at
> http://wiki.apache.org/nutch/NutchHadoopTutorial, it has been very useful.
>
> However, I find that when I execute the command:
>
> $bin/nutch crawl urls -dir crawl -depth 4 -topN 50
>
> The crawler stops at the generator stage with the message:
> 2011-03-06 17:23:49,538 WARN crawl.Generator - Generator: 0 records
> selected for fetching, exiting ...
>
> I have configured the following plugins in nutch-site.xml
>
> protocol-http|parse-(text|html|js)|urlnormalizer-(pass|regex|basic)|urlfilter-regex|index-(basic|anchor)
>
> I am not using crawl-urlfilter.txt or regex-urlfilter.txt to filter URLs. I
> initiated the crawl with 10 seed urls from popular sites on internet.
>
> Any pointers to what I am missing here?
>
> regards
> Chidu

--
Kumar Anurag

View this message in context: http://lucene.472066.n3.nabble.com/Help-Crawl-returns-no-URLs-tp2644587p2645916.html
Sent from the Nutch - User mailing list archive at Nabble.com.

