Re: Nutch Incremental Crawl

2013-03-05 Thread feng lu
Aurl) url is re-fetched (it shows in log) but at Solr level: for that url (aurl): content field and title field didn't get updated. Why? should I do any configuration for this to make sol

Re: Nutch Incremental Crawl

2013-03-05 Thread David Philip
Added new url to the crawling site. The url got indexed - This is success. So interested to know why the above case failed? What configuration needs to be made?

Re: Nutch Incremental Crawl

2013-03-04 Thread feng lu
. So interested to know why the above case failed? What configuration need to be made? Thanks - David *PS:* Apologies that I am still

Re: Nutch Incremental Crawl

2013-03-04 Thread feng lu
nd good way for incremental crawl so trying different approaches. Once I am clear I will blog this and share it. Thanks lot for replies from mailer. On W

Re: Nutch Incremental Crawl

2013-03-04 Thread David Philip
On Wed, Feb 27, 2013 at 4:06 PM, Markus Jelsma wrote:
> You can simply reinject the records. You can overwrite and/or update the current record. See the db.injector.update and overwrite

Re: Nutch Incremental Crawl

2013-03-04 Thread feng lu
s from mailer.
On Wed, Feb 27, 2013 at 4:06 PM, Markus Jelsma wrote:
> You can simply reinject the records. You can overwrite and/or update the current record. See the db.injector.update and overwrite settings.

Re: Nutch Incremental Crawl

2013-03-04 Thread David Philip
ds. You can overwrite and/or update the current record. See the db.injector.update and overwrite settings.
-Original message-
From: David Philip
Sent: Wed 27-Feb-2013 11:23
To: user@nutch.apache.org
Subject: Re: Nutch Incremental Crawl

RE: Nutch Incremental Crawl

2013-02-27 Thread Markus Jelsma
You can simply reinject the records. You can overwrite and/or update the current record. See the db.injector.update and overwrite settings.
-Original message-
From: David Philip
Sent: Wed 27-Feb-2013 11:23
To: user@nutch.apache.org
Subject: Re: Nutch Incremental C
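A minimal sketch of the reinjection described in this message, assuming the two settings are the properties db.injector.update and db.injector.overwrite and that they can be passed to bin/nutch inject as -D options. The crawl/crawldb and urls paths are placeholders that do not come from the thread.

```shell
#!/bin/sh
# Sketch only: re-inject seed URLs so that existing CrawlDb records are replaced.
# CRAWLDB and SEED_DIR are assumed paths; adjust them to your own layout.
CRAWLDB="crawl/crawldb"
SEED_DIR="urls"

# MODE=overwrite replaces the existing record with the injected one.
# MODE=update merges the injected record into the existing one instead.
MODE="overwrite"
PROP="db.injector.$MODE=true"

CMD="bin/nutch inject -D $PROP $CRAWLDB $SEED_DIR"
echo "$CMD"

# Run the command only when a Nutch installation is present in this directory.
if [ -x bin/nutch ]; then
  $CMD
fi
```

Setting the same properties in conf/nutch-site.xml should have the same effect if you prefer not to pass them on the command line.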

Re: Nutch Incremental Crawl

2013-02-27 Thread David Philip
dFetchInterval=86400
-Original message-
From: David Philip
Sent: Wed 27-Feb-2013 06:21
To: user@nutch.apache.org
Subject: Re: Nutch Incremental Crawl
Hi all, Thank you very much for the replies. Ve

RE: Nutch Incremental Crawl

2013-02-27 Thread Markus Jelsma
To: user@nutch.apache.org
Subject: Re: Nutch Incremental Crawl
Hi all, Thank you very much for the replies. Very useful information to understand how incremental crawling can be achieved. Dear Markus: Can you please tell me how I can override this fetch

Re: Nutch Incremental Crawl

2013-02-26 Thread feng lu
-Original message-----
From: kemical
Sent: Thu 14-Feb-2013 10:15
To: user@nutch.apache.org
Subject: Re: Nutch Incremental Crawl
Hi David, You can also consider setting shorter fe

Re: Nutch Incremental Crawl

2013-02-26 Thread David Philip
Sent: Thu 14-Feb-2013 10:15
To: user@nutch.apache.org
Subject: Re: Nutch Incremental Crawl
Hi David, You can also consider setting a shorter fetch interval time with nutch inject. This way you'll set highe

RE: Nutch Incremental Crawl

2013-02-14 Thread Markus Jelsma
If you want records to be fetched at a fixed interval it's easier to inject them with a fixed fetch interval.
nutch.fixedFetchInterval=86400
-Original message-
From: kemical
Sent: Thu 14-Feb-2013 10:15
To: user@nutch.apache.org
Subject: Re: Nutch Incremental C
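The fixed-interval injection suggested here could look roughly like the sketch below. It writes a seed file in which the URL is followed by tab-separated key=value metadata, which is how the Nutch injector reads per-URL settings. The metadata key is copied verbatim from the message and its exact spelling should be checked against the Injector documentation for your Nutch version. The seeds directory, the example URL and the crawl/crawldb path are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: build a seed list whose entry carries a fixed fetch interval.
# SEED_DIR, the URL and crawl/crawldb are assumed names, not from the thread.
SEED_DIR="seeds"
KEY="nutch.fixedFetchInterval"   # as written in the message; verify for your version
INTERVAL=86400                   # seconds, i.e. one day

mkdir -p "$SEED_DIR"

# Seed line format: <url><TAB><key>=<value>
printf '%s\t%s=%s\n' "http://example.com/page.html" "$KEY" "$INTERVAL" \
  > "$SEED_DIR/seed.txt"

# Inject only when a Nutch installation is present in this directory.
if [ -x bin/nutch ]; then
  bin/nutch inject crawl/crawldb "$SEED_DIR"
fi
```

Keeping the interval in the seed file rather than in the global configuration lets different URLs carry different intervals.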

Re: Nutch Incremental Crawl

2013-02-14 Thread kemical
the right choice. Best, Mike -- View this message in context: http://lucene.472066.n3.nabble.com/Nutch-Incremental-Crawl-tp4037903p4040400.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Nutch Incremental Crawl

2013-02-05 Thread David Philip
Hi Sebastian, Thank you for the reply; the steps mentioned in the previous email worked. Thanks. One last question about incremental crawl: My understanding is that when the crawler is run on a daily basis (cron job), it should check each url in its fetch list for the last modified date and if it is

Re: Nutch Incremental Crawl

2013-02-04 Thread Sebastian Nagel
Hi David, the first steps are right but maybe it's easier to run the Java classes via bin/nutch:
  bin/nutch freegen urls2/ freegen_segments/
  # generated: freegen_segments/123
  bin/nutch fetch freegen_segments/123
  bin/nutch parse freegen_segments/123   (if fetcher.parse == false)
  bin/nutch updat

Re: Nutch Incremental Crawl

2013-02-04 Thread David Philip
Hi Sebastian, Thank you for the reply. I executed the following steps; please correct me if I am wrong. I do not see the changes updated. Run:
- org.apache.nutch.tools.FreeGenerator *arguments*: urls2 crawl/segments [urls2/seed.txt - url of the page that was modified]
- org.apache.nut

Re: Nutch Incremental Crawl

2013-02-01 Thread Sebastian Nagel
Hi David,
> So even If there is any modification made on a fetched page before this interval and the crawl job is run, it will still not be re-fetched/updated unless this interval is crossed.
Yes. That's correct.
> is there any way to do immediate update?
Yes, provided that you know which doc
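The interval rule confirmed in this reply can be illustrated with a small conceptual check. This is not Nutch's scheduling code, only a sketch of the idea that a page becomes eligible for re-fetching once the fetch interval has elapsed since its last fetch. The function name is_due and the timestamps are made up for illustration.

```shell
#!/bin/sh
# Conceptual sketch only, not Nutch's implementation.
# is_due LAST_FETCH INTERVAL NOW
# Succeeds (exit status 0) when NOW has reached LAST_FETCH + INTERVAL.
# All three arguments are in seconds.
is_due() {
  [ "$3" -ge $(( $1 + $2 )) ]
}

LAST_FETCH=1000000   # hypothetical time of the previous fetch
INTERVAL=86400       # one day

# Half a day later the page is not yet due, even if it changed in the meantime.
if is_due "$LAST_FETCH" "$INTERVAL" $(( LAST_FETCH + 43200 )); then
  HALF_DAY="due"
else
  HALF_DAY="not due"
fi

# A full day later the page is due again.
if is_due "$LAST_FETCH" "$INTERVAL" $(( LAST_FETCH + 86400 )); then
  FULL_DAY="due"
else
  FULL_DAY="not due"
fi

echo "after half a day: $HALF_DAY"
echo "after a full day: $FULL_DAY"
```

A page modified before the interval has elapsed therefore stays untouched unless it is fetched explicitly, which is the case the rest of the reply addresses.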