Nutch 1.6 - sequence in which crawler works its way to a URL

A Laxmi Wed, 31 Jul 2013 07:56:51 -0700

Hello,

For example, I have a single *seed* URL, say "http://nutch.apache.org/", and
I am crawling it "n" times. At the end of the crawl, I have 1220 new
URLs generated/fetched/updated from that single seed URL. Looking at
these 1220 new URLs, I would like to know how a particular one, e.g.
"www.abc/xy.com", was reached. A better way to put the question: in what
sequence did the crawler work its way to a particular URL such as "www.abc/xy.com"?
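
To make concrete what I am after: assuming the crawl's link graph could be
exported as (source, target) URL pairs, a breadth-first search from the seed
would give one shortest chain of links to the target. Below is a minimal
sketch in Python. The `links` data and the function name `crawl_path` are
made up for illustration; they are not Nutch output or Nutch API, and the
chain found is only one possible route, not necessarily the one the crawler
actually followed.

```python
from collections import deque


def crawl_path(links, seed, target):
    """Return one shortest chain of URLs from seed to target, or None."""
    # Build an adjacency map: page -> list of pages it links to.
    outlinks = {}
    for src, dst in links:
        outlinks.setdefault(src, []).append(dst)

    # Breadth-first search, remembering which page first led to each URL.
    parent = {seed: None}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url == target:
            # Walk the parent pointers back to the seed, then reverse.
            path = []
            while url is not None:
                path.append(url)
                url = parent[url]
            return path[::-1]
        for nxt in outlinks.get(url, []):
            if nxt not in parent:
                parent[nxt] = url
                queue.append(nxt)
    return None  # target is not reachable from the seed


# Hypothetical link pairs, invented for this example.
links = [
    ("http://nutch.apache.org/", "http://example.org/a"),
    ("http://nutch.apache.org/", "http://example.org/b"),
    ("http://example.org/a", "www.abc/xy.com"),
]

print(crawl_path(links, "http://nutch.apache.org/", "www.abc/xy.com"))
```

What I am looking for is the equivalent of this chain for a real Nutch crawl.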


Thanks for your help!
