On Aug 19, 2009, at 8:42 PM, alx...@aim.com wrote:
Hi,
Thanks. What if the URLs in my seed file do not have outlinks, say
.pdf files? Should I still specify the topN variable? All I need is
to index all the URLs in my seed file, and there are about 1 M of them.
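To put rough numbers on the 1 M seed question, here is a minimal sketch. It assumes topN simply caps how many of the highest-scoring URLs go into each generated segment, and that no new outlinks are discovered between rounds. This is a simplified model, not Nutch's actual generator code; `rounds_needed` and `simulate` are hypothetical helper names.

```python
def rounds_needed(total_urls: int, top_n: int) -> int:
    """Generate/fetch rounds needed to cover total_urls when each round
    selects at most top_n URLs (simplified model, no new outlinks)."""
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_urls // top_n)


def simulate(scored_urls: dict[str, float], top_n: int) -> list[list[str]]:
    """Repeatedly take the top_n highest-scoring unfetched URLs,
    producing one list (one 'segment') per round."""
    remaining = dict(scored_urls)
    segments = []
    while remaining:
        # Highest score first; keep at most top_n URLs for this round.
        batch = sorted(remaining, key=remaining.get, reverse=True)[:top_n]
        segments.append(batch)
        for url in batch:
            del remaining[url]
    return segments


if __name__ == "__main__":
    print(rounds_needed(1_000_000, 1000))       # rounds with topN=1000
    print(rounds_needed(1_000_000, 1_000_000))  # rounds with topN >= seed count
```

Under these assumptions, a seed list with no outlinks is covered in a single round only if topN is at least the number of seed URLs; a smaller topN just spreads the same URLs over more segments.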
topN means that your generated
@lucene.apache.org
Sent: Thu, Aug 20, 2009 12:17 am
Subject: Re: topN value in crawl
On Aug 19, 2009, at 8:42 PM, alx...@aim.com wrote:
Hi,
Thanks. What if the URLs in my seed file do not have outlinks, say .pdf
files? Should I still specify the topN variable? All I need
Hi,
I have read a few tutorials on running Nutch to crawl the web. However, I still
do not understand the meaning of the topN variable in the crawl command. The
tutorials suggest creating 3 segments and fetching them with topN=1000. What if
I create 100 segments, or only one? What would be
@lucene.apache.org
Sent: Wed, Aug 19, 2009 11:02 am
Subject: Re: topN value in crawl
On Wed, Aug 19, 2009 at 12:13 PM, alx...@aim.com wrote:
Hi,
I have read a few tutorials on running Nutch to crawl the web. However, I still
do not understand the meaning of the topN variable in the crawl command