Hi all

I am trying to set up Nutch 1.2 on Hadoop using the instructions at
http://wiki.apache.org/nutch/NutchHadoopTutorial, which have been very useful.

However, I find that when I execute the command:

$ bin/nutch crawl urls -dir crawl -depth 4 -topN 50

The crawler stops at the generator stage with the message:
2011-03-06 17:23:49,538 WARN  crawl.Generator - Generator: 0 records selected for fetching, exiting ...

I have configured the following plugins in nutch-site.xml:
 
protocol-http|parse-(text|html|js)|urlnormalizer-(pass|regex|basic)|urlfilter-regex|index-(basic|anchor)
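
For reference, this list is the value of the plugin.includes property. A sketch of the corresponding nutch-site.xml entry, assuming the value is set exactly as quoted above and that the file contains no other overrides (the rest of the file is not shown here):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed layout: the plugin list quoted above, set as the plugin.includes value -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|parse-(text|html|js)|urlnormalizer-(pass|regex|basic)|urlfilter-regex|index-(basic|anchor)</value>
  </property>
</configuration>
```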

I am not using crawl-urlfilter.txt or regex-urlfilter.txt to filter URLs. I
initiated the crawl with 10 seed URLs from popular sites on the Internet.

Any pointers to what I am missing here?


regards
Chidu
