--- Begin Message ---
Hello,
Here are some easy but interesting questions... Thanks for the help.
I crawled some pages with "bin/nutch crawl". Some pages were not fetched
(because of timeouts from a really slow web server).
To get the unfetched (but existing) pages, I ran "bin/nutch generate db
segments -adddays 30" to get a full fetchlist.
Question 1: were the unfetched URLs still present in the db?
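(To check this myself, I was thinking of dumping the webdb roughly as below; this assumes the "readdb" tool in my version supports the -stats and -dumppageurl options, and "db" is my webdb directory.)

  # Overall page/link counts in the webdb
  bin/nutch readdb db -stats

  # List every page URL the webdb knows about, fetched or not
  bin/nutch readdb db -dumppageurl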
Then I refetched everything to add more URLs to the db:
"bin/nutch crawl fetch newsegment"
"bin/nutch crawl update db newsegment"
To get the newly fetched URLs indexed, I ran the following:
"bin/nutch crawl index newsegment"
Question 2: is that enough for the Nutch searcher to pick up the new
URLs (after restarting the Tomcat server)?
Then, with the newly fetched URLs, I got new outlinks to fetch, so I ran
"generate", "fetch", "updatedb" and "index" again, roughly as sketched below.
Question 3: am I doing something wrong?
In any case, this doesn't seem to work...
Note: when I tried the "merge" command, I ran "bin/nutch merge index
segment_to_add", and the new segment's index overwrote everything that
was already there. Can you tell me more about this too?
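(My reading of the merge tool is that it takes the output index directory first and then the segment directories to merge, so I expected something like the sketch below; "merged-index" is just a name I made up, and I may well have the arguments wrong.)

  # Merge the per-segment indexes into one output index directory
  bin/nutch merge merged-index segments/*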
Thanks for your time.
--- End Message ---