On 11:44:00 17/Sep, Piotr Kosiorowski wrote:
> Yes - depth means, in fact, the number of iterations of the
> generate/fetch/update cycle.
ok, now it's clear :)
> nutch generate - will include already fetched pages in a new segment for
> fetching after some time (I think the default is 30 days and you can
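For reference, the refetch window Piotr mentions is configurable. In Nutch 0.x it is, as far as I recall, the `db.default.fetch.interval` property (value in days); treat the property name as an assumption and verify it against conf/nutch-default.xml in your install. A minimal override sketch for conf/nutch-site.xml:

```xml
<!-- conf/nutch-site.xml: override the default refetch interval.
     Property name assumed from Nutch 0.x; check nutch-default.xml. -->
<nutch-conf>
  <property>
    <name>db.default.fetch.interval</name>
    <value>7</value>  <!-- refetch pages after 7 days instead of 30 -->
  </property>
</nutch-conf>
```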
Daniele Menozzi wrote:
> ok, so the depth value is only used to stop the crawling at a certain
> point, and proceed with the indexing, right?
Yes - depth means, in fact, the number of iterations of the
generate/fetch/update cycle.
> But, another thing: how can I refresh old pages? What class do I have to
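Piotr's point that depth is just the number of generate/fetch/update rounds can be sketched as a shell loop. This is a sketch only, following the Nutch 0.x tutorial commands, not the literal CrawlTool internals; it assumes bin/nutch on the PATH, a WebDB in ./db, and segment directories under ./segments:

```shell
#!/bin/sh
# Sketch: "depth" = number of generate/fetch/update rounds.
# Assumes a Nutch 0.x install (bin/nutch), a WebDB in ./db,
# and segment directories created under ./segments.
depth=3
i=0
while [ $i -lt $depth ]; do
  bin/nutch generate db segments      # build a fetchlist in a new segment
  s=`ls -d segments/2* | tail -1`     # newest segment directory
  bin/nutch fetch $s                  # fetch the pages in that segment
  bin/nutch updatedb db $s            # fold fetched links back into the WebDB
  i=`expr $i + 1`
done
```

When $i reaches the chosen depth the loop stops, which is exactly the "stop the crawling at a certain point and proceed with the indexing" behaviour asked about above.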
On 19:33:57 16/Sep, Piotr Kosiorowski wrote:
> bin/nutch updatedb db $s1
> command updates WebDB with links you fetched in segment $s1.
ok, so the depth value is only used to stop the crawling at a certain
point, and proceed with the indexing, right?
But, another thing: how can I refresh old pages? What class do I have to
bin/nutch updatedb db $s1
command updates WebDB with links you fetched in segment $s1.
Regards
Piotr
Daniele Menozzi wrote:
> Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I do
> not really understand what the relationship is between depth, segments,
> and fetching..
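To place the updatedb command in context: in the 0.x tutorial it is the final step of one fetch round, with $s1 capturing the segment that generate just created. A sketch, assuming bin/nutch on the PATH and the tutorial's db/segments layout:

```shell
#!/bin/sh
# One complete round from the Nutch 0.x tutorial (sketch).
bin/nutch generate db segments     # write a new fetchlist segment
s1=`ls -d segments/2* | tail -1`   # the segment generate just created
bin/nutch fetch $s1                # fetch its pages
bin/nutch updatedb db $s1          # update WebDB with links fetched in $s1
```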
Take a look at this good nutch doc:
http://wiki.apache.org/nutch/DissectingTheNutchCrawler
Michael Ji
--- Daniele Menozzi <[EMAIL PROTECTED]> wrote:
> Hi all, I have questions regarding
> org.apache.nutch.tools.CrawlTool: I do
> not really understand what the relationship is
> between depth, segments, and fetching..
Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I do
not really understand what the relationship is between depth, segments,
and fetching..
Take for example the tutorial; I understand these 2 steps:
bin/nutch admin db -create
bin/nutch inject db -dmozfile conte
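For readers following along, those two steps create and seed the WebDB. A sketch with the flags as quoted above; the DMOZ filename is truncated in the original message, so a placeholder variable is used here:

```shell
#!/bin/sh
# Step 1: create an empty WebDB (page/link database) in ./db.
bin/nutch admin db -create

# Step 2: seed the WebDB with URLs from a DMOZ content dump.
# DMOZ_FILE is a placeholder; the real filename is truncated
# in the original message.
bin/nutch inject db -dmozfile "$DMOZ_FILE"
```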