--- Begin Message ---
Hello,

Here are some easy but interesting questions. Thanks for your help.

I crawled some pages with bin/nutch crawl. Some pages were not fetched because of timeouts from a really slow web server.


To get the unfetched (but existing) pages, I ran "bin/nutch generate db segments -addays 30" to produce a full fetchlist.
Question 1: Were the unfetched URLs still present in the DB?


Then I refetched everything to add more URLs to the DB:
"bin/nutch crawl fetch newsegment"
"bin/nutch crawl update db newsegment"

To get the newly fetched URLs indexed, I ran the following:
"bin/nutch crawl index newsegment"
Question 2: Is this enough for the Nutch searcher to pick up the new URLs (after restarting the Tomcat server)?


The newly fetched URLs gave me new outlinks to fetch, so I ran "generate", "fetch", "update" and "index" again.
Question 3: Am I doing this wrong?


However, this doesn't seem to work.

Note: when I tried the "merge" command, I ran "bin/nutch merge index segment_to_add" and the new segment overwrote everything. Can you tell me more about this too?

Thanks for your time.

--- End Message ---

