On 26.7.2011 at 21:55, Markus Jelsma wrote:
We have the injector for that ;)
What will the injector do if the injected URL is already in the database? Will it be
injected with priority 1.0 and rescheduled for immediate fetch?
...@openindex.io
Sent: Tue, Jun 7, 2011 1:16 pm
Subject: Re: keeping index up to date
Hi,
To add to Markus' comments, if you take a look at the script it is written
in such a way that if run in safe mode it protects us against an error which
may occur. If this is the case we can recover segments
-Original Message-
From: Julien Nioche lists.digitalpeb...@gmail.com
To: user user@nutch.apache.org
Sent: Wed, Jun 1, 2011 12:59 am
Subject: Re: keeping index up to date
You should use the adaptive fetch schedule. See
http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/ for
details.
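To illustrate the idea behind an adaptive fetch schedule, here is a minimal Python sketch. It is not Nutch's actual code: the function name `next_interval` and the default rates and bounds are assumptions chosen for illustration. The point is only that a page found unchanged gets a longer re-fetch interval, a page found changed gets a shorter one, and the result is kept within fixed bounds.

```python
def next_interval(interval, changed, inc_rate=0.4, dec_rate=0.2,
                  min_interval=60.0, max_interval=31536000.0):
    """Return the next re-fetch interval in seconds.

    All parameter names and default values here are illustrative
    assumptions, not values read from a Nutch installation.
    """
    if changed:
        # Page changed since the last fetch: come back sooner.
        interval *= (1.0 - dec_rate)
    else:
        # Page unchanged: wait longer before fetching it again.
        interval *= (1.0 + inc_rate)
    # Clamp the result to the allowed range.
    return max(min_interval, min(interval, max_interval))


if __name__ == "__main__":
    # A document that never changes (such as a static PDF) drifts
    # toward the maximum interval over successive fetches.
    interval = 30 * 24 * 3600.0  # start at 30 days
    for fetch in range(1, 6):
        interval = next_interval(interval, changed=False)
        print(f"after fetch {fetch}: {interval / 86400:.1f} days")
```

With a rule like this, documents that never change are fetched less and less often, which is the behaviour asked for in the original question.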
On 1 June 2011 07:18, alx...@aim.com wrote:
Hello,
I use nutch-1.2 to index about 3000 sites. One of them has about 1500 PDF files
which do not change over time.
I wondered if there is a way of configuring Nutch not to fetch unchanged
documents again and again, but to keep the old index for them.
Thanks.
Alex.