[Nutch-dev] Add segment/index procedure

Christophe Noel Mon, 07 Mar 2005 08:11:05 -0800

--- Begin Message ---
Hello,
Here are some easy but interesting questions. Thanks for your help.
I crawled some pages with bin/nutch crawl. Some pages were not fetched (because of timeouts from a really slow web server).

To get the unfetched (but existing) pages, I ran "bin/nutch generate db segments -addays 30" to get a full fetchlist.
Question 1: were the unfetched URLs still present in the DB?
Then I refetched everything to add more URLs to the DB:
"bin/nutch crawl fetch newsegment"
"bin/nutch crawl update db newsegment"
To get the newly fetched URLs indexed, I ran the following:
"bin/nutch crawl index newsegment"
Question 2: is that enough for the Nutch searcher to pick up the new URLs (after restarting the Tomcat server)?

Then, with the newly fetched URLs, I got new outlinks to fetch, so I ran "generate", "fetch", "update" and "index" again.
Question 3: am I wrong?
However, this doesn't seem to work...
Note: when I tried the "merge" command, I ran "bin/nutch merge index segment_to_add" and the new segment overwrote everything. Can you tell me more about this too?
Thanks for your time.
--- End Message ---
