Re: keeping index up to date

2011-08-26 Thread Radim Kolar
On 26.7.2011 21:55, Markus Jelsma wrote: We have the injector for that ;) What will the injector do if the injected URL is already in the database? Will it be injected with priority 1.0 and rescheduled for immediate fetch?

Re: keeping index up to date

2011-07-26 Thread alxsss
...@openindex.io Sent: Tue, Jun 7, 2011 1:16 pm Subject: Re: keeping index up to date Hi, To add to Markus' comments, if you take a look at the script it is written in such a way that if run in safe mode it protects us against an error which may occur. If this is the case we can recover segments

Re: keeping index up to date

2011-07-26 Thread Markus Jelsma
@nutch.apache.org; markus.jelsma markus.jel...@openindex.io Sent: Tue, Jun 7, 2011 1:16 pm Subject: Re: keeping index up to date Hi, To add to Markus' comments, if you take a look at the script it is written in such a way that if run in safe mode it protects us against an error which may

Re: keeping index up to date

2011-06-07 Thread alxsss
. -Original Message- From: Julien Nioche lists.digitalpeb...@gmail.com To: user user@nutch.apache.org Sent: Wed, Jun 1, 2011 12:59 am Subject: Re: keeping index up to date You should use the adaptive fetch schedule. See http://pascaldimassimo.com/2010/06/11/how-to-re-crawl

Re: keeping index up to date

2011-06-07 Thread Markus Jelsma
: Re: keeping index up to date You should use the adaptive fetch schedule. See http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/ for details On 1 June 2011 07:18, alx...@aim.com wrote: Hello, I use

Re: keeping index up to date

2011-06-07 Thread lewis john mcgibbney
@nutch.apache.org Sent: Wed, Jun 1, 2011 12:59 am Subject: Re: keeping index up to date You should use the adaptive fetch schedule. See http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/ for details On 1

keeping index up to date

2011-06-01 Thread alxsss
Hello, I use nutch-1.2 to index about 3000 sites. One of them has about 1500 pdf files which do not change over time. I wondered if there is a way of configuring nutch not to fetch unchanged documents again and again, but keep the old index for them. Thanks. Alex.

Re: keeping index up to date

2011-06-01 Thread Julien Nioche
You should use the adaptive fetch schedule. See http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/ for details On 1 June 2011 07:18, alx...@aim.com wrote: Hello, I use nutch-1.2 to index about 3000 sites. One
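
For readers following the thread, the idea behind an adaptive fetch schedule can be sketched roughly as below. This is only an illustration of the general technique, not Nutch's actual code: the class name, method name, and all rate and interval values are hypothetical and chosen for the example. The assumption is that a page found unchanged gets a longer re-fetch interval, a page found changed gets a shorter one, and the result is clamped to a configured range.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSchedule:
    """Hypothetical sketch of an adaptive re-fetch interval (not Nutch's API)."""

    inc_rate: float = 0.4                  # illustrative: growth factor when unchanged
    dec_rate: float = 0.2                  # illustrative: shrink factor when changed
    min_interval: float = 60.0             # illustrative lower bound, in seconds
    max_interval: float = 365 * 86400.0    # illustrative upper bound, in seconds

    def next_interval(self, interval: float, changed: bool) -> float:
        """Return the next re-fetch interval in seconds."""
        if changed:
            # Page changed since the last fetch: come back sooner.
            interval *= 1.0 - self.dec_rate
        else:
            # Page unchanged: back off and fetch less often.
            interval *= 1.0 + self.inc_rate
        # Keep the interval inside the configured bounds.
        return max(self.min_interval, min(self.max_interval, interval))


if __name__ == "__main__":
    schedule = AdaptiveSchedule()
    interval = 30 * 86400.0  # start at 30 days
    for _ in range(5):
        # Simulate a static PDF that never changes between fetches.
        interval = schedule.next_interval(interval, changed=False)
        print(f"next fetch in {interval / 86400.0:.1f} days")
```

Under these assumptions, documents that never change (such as the static PDF files in the original question) drift toward the maximum interval and are fetched only rarely, while frequently changing pages are revisited more often.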