Re: re-Crawl re-fetch all pages each time

2012-11-19 Thread vetus
Hello, can you help me? I cannot solve it! Thanks.
--
View this message in context: http://lucene.472066.n3.nabble.com/re-Crawl-re-fetch-all-pages-each-time-tp4020464p4020998.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: re-Crawl re-fetch all pages each time

2012-11-16 Thread vetus
No, I'm using the default Nutch code, downloaded from the web. I only set the Gora properties to use the MySQL driver, and I modified the seed and url-filter files. I also modified the agent properties (name, etc.) in nutch-site.xml. Thanks a lot.

Re: re-Crawl re-fetch all pages each time

2012-11-16 Thread vetus
No, I'm new to Nutch, but I don't think I'm using any backend.

Re: re-Crawl re-fetch all pages each time

2012-11-15 Thread Lewis John Mcgibbney
Hi,
Are you using the gora-cassandra backend with Nutch 2.1?
On Thu, Nov 15, 2012 at 5:49 PM, vetus wrote:
> Thank you for your response, but it still re-fetches all web pages...
>
> This is the code that I'm using...
>

RE: re-Crawl re-fetch all pages each time

2012-11-15 Thread vetus
Thank you for your response, but it still re-fetches all web pages... This is the code that I'm using...
  status.put(Nutch.STAT_PHASE, "generate " + i);
  jobRes = runTool(GeneratorJob.class, args);
  if (jobRes != null) {
    subTools.put("generate " + i, jobRes);
  }
  sta

RE: re-Crawl re-fetch all pages each time

2012-11-15 Thread Markus Jelsma
Hi - this should not happen. The only thing I can imagine is that the update step doesn't succeed, but that would mean nothing is going to be indexed either. You can inspect a URL using the readdb tool: check it before and after.
-Original message-
> From:vetus
> Sent: Thu 15-Nov-2012 15:
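To illustrate why a failed update step could explain the symptom, here is a minimal sketch of the general idea, not Nutch's actual code. It assumes each URL record carries a next-fetch time that the update step pushes into the future after a successful fetch, and that the generate step only selects records whose fetch time has passed. The names RefetchSketch, PageRecord, isDue and applyUpdate are hypothetical, and the 30-day interval is only an illustrative value.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of fetch scheduling; none of these names are Nutch classes.
final class RefetchSketch {

    /** Hypothetical per-URL record holding the next scheduled fetch time. */
    static final class PageRecord {
        final String url;
        long fetchTimeMs; // page is due for fetching once "now" reaches this value

        PageRecord(String url, long fetchTimeMs) {
            this.url = url;
            this.fetchTimeMs = fetchTimeMs;
        }
    }

    /** Generate step (assumed behaviour): select a page only if its fetch time has passed. */
    static boolean isDue(PageRecord page, long nowMs) {
        return page.fetchTimeMs <= nowMs;
    }

    /** Update step (assumed behaviour): after a fetch, schedule the next one an interval later. */
    static void applyUpdate(PageRecord page, long nowMs, long intervalMs) {
        page.fetchTimeMs = nowMs + intervalMs;
    }

    public static void main(String[] args) {
        long intervalMs = TimeUnit.DAYS.toMillis(30); // illustrative re-fetch interval
        long firstCrawlMs = 0L;
        long secondCrawlMs = firstCrawlMs + TimeUnit.DAYS.toMillis(1);

        PageRecord updated = new PageRecord("http://example.com/a", firstCrawlMs);
        PageRecord notUpdated = new PageRecord("http://example.com/b", firstCrawlMs);

        // Only the first record gets its update applied after the first crawl.
        applyUpdate(updated, firstCrawlMs, intervalMs);

        // One day later, the updated page is skipped; the other is selected again.
        System.out.println(updated.url + " due again: " + isDue(updated, secondCrawlMs));
        System.out.println(notUpdated.url + " due again: " + isDue(notUpdated, secondCrawlMs));
    }
}
```

Under these assumptions, a record whose update was never written keeps its old fetch time and is selected on every crawl, which is why comparing a single URL's stored fetch time before and after a crawl cycle is a useful check.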