Michael,
You DON'T need to copy the segments or db to the root of tomcat, but you DO
need to start tomcat from the directory directly above the segments
directory (or from the crawl directory if you've done intranet crawling).
e.g. if you have /usr/local/nutch/segments, you might type:
cd /
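The reason the launch directory matters can be sketched in a few lines of Java. The sketch assumes something the email does not state: that the search webapp looks for a relative `segments` directory. Java resolves such a path against the JVM's working directory (the `user.dir` system property), which is the directory Tomcat was started from. `SegmentsLocator` and `resolveSegments` are hypothetical names used only for illustration. They are not part of Nutch or Tomcat.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Illustrative sketch only. ASSUMPTION: the webapp opens a relative
 * "segments" path, so it is resolved against the JVM working directory.
 */
public class SegmentsLocator {

    // Hypothetical helper: resolve the relative name "segments" against a
    // given working directory, as a relative path would be at runtime.
    static Path resolveSegments(String workingDir) {
        return Paths.get(workingDir).resolve("segments").normalize();
    }

    public static void main(String[] args) {
        // The JVM's working directory is the shell's current directory at launch.
        String cwd = System.getProperty("user.dir");
        Path segments = resolveSegments(cwd);
        System.out.println("Would look for segments at: " + segments);
        System.out.println("Directory exists: " + Files.isDirectory(segments));
    }
}
```

Under that assumption, starting Tomcat from /usr/local/nutch makes the relative name resolve to /usr/local/nutch/segments. After an intranet crawl, starting from the crawl directory has the same effect for its segments subdirectory.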
The way I do it is thus:
When hits.totalIsExact(), the final page can be found simply from
hits.getTotal().
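In the exact case the arithmetic is a simple ceiling division. The minimal Java sketch below assumes a fixed number of results per page. The page size and the names `LastPage`, `pageCount` and `lastPageStart` are hypothetical. The `total` argument stands in for the value returned by hits.getTotal(). The sketch does not cover the inexact case.

```java
/**
 * Sketch for the exact-total case only: when hits.totalIsExact() is true,
 * the total is the real match count, so the final page follows directly.
 * ASSUMPTION: a fixed, hypothetical number of hits per page.
 */
public class LastPage {

    // Number of pages needed to show `total` hits, `pageSize` per page.
    static long pageCount(long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (total + pageSize - 1) / pageSize; // ceiling division
    }

    // Zero-based offset of the first hit on the final page (0 if no hits).
    static long lastPageStart(long total, int pageSize) {
        long pages = pageCount(total, pageSize);
        return pages == 0 ? 0 : (pages - 1) * pageSize;
    }

    public static void main(String[] args) {
        long total = 95;   // example value standing in for hits.getTotal()
        int pageSize = 10; // hypothetical results per page
        System.out.println("pages=" + pageCount(total, pageSize)
                + " lastPageStart=" + lastPageStart(total, pageSize));
    }
}
```

With 95 hits at 10 per page, this prints `pages=10 lastPageStart=90`. The final page therefore starts at hit offset 90 and holds the last 5 results.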
When NOT hits.totalIsExact(), I run the query again, this time retrieving,
say, 1000 URLs (the maximum number of results I allow to be returned). Using a
loop (increment counter by number of res
I did a fetch of some 150,000 pages last night and
updated the db without a hitch, but the system appears to be at a standstill
running the "bin/nutch index " command. So far the process has
been running for over 5 hours, with essentially no disk I/O
the whole time. The last (only) mes