Michael,
You DON'T need to copy the segments or db to the root of tomcat, but you DO
need to start tomcat from the directory directly above the segments
directory (or from the crawl directory if you've done intranet crawling).
e.g. if you have /usr/local/nutch/segments, you might type:
cd /usr/local/nutch
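(and then start tomcat from there; assuming a stock Tomcat install under /usr/local/tomcat, which is an assumption, something like:)
"
# tomcat's location is an assumption; any start script works as long
# as the working directory stays /usr/local/nutch
/usr/local/tomcat/bin/catalina.sh start
"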
Hi Fredrik:
After I did the crawling in Nutch, I copied the segments to the root
of tomcat.
I wonder if I need to do the same thing for the index and
db directories.
thanks,
Michael,
--- Fredrik Andersson <[EMAIL PROTECTED]>
wrote:
> No, I think you're right that indexing is done
> automatically after
> intranet crawls.
Hi Fredrik:
the command
"
bin/nutch crawl * -dir * -depth d
"
does it only work for intranet crawling, i.e. can it only fetch
within one particular domain?
I want to do a global fetch, but restricted to a limited
list of web sites. Should I use the command set of
"bin/nutch admin db -create
...
"
instead?
but,
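(For reference, the whole-web command set from the Nutch 0.x tutorial goes roughly like this; the urls.txt name is an assumption and the inject flags varied between releases, so treat it as a sketch. Which sites actually get fetched is controlled by the URL filters under conf/.)
"
bin/nutch admin db -create            # create an empty web db
bin/nutch inject db -urlfile urls.txt # seed it with your url list
bin/nutch generate db segments        # generate a fetchlist segment
s1=`ls -d segments/2* | tail -1`      # pick the newest segment
bin/nutch fetch $s1                   # fetch it
bin/nutch updatedb db $s1             # fold the results back into the db
bin/nutch index $s1                   # index the fetched segment
"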
No, I think you're right that indexing is done automatically after
intranet crawls. Just try "bin/nutch index yourSegment", if it says
that 'index.done exists already', then well... you get the point. I
don't know what platform you're using, but try doing a "grep -r *". The grep command should match
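(A sketch of that index.done check; the crawl directory and segment name here are made up:)
"
# re-running index on an already-indexed segment reports that
# index.done exists; you can also just look for the file
bin/nutch index crawl-s/segments/20050801120000
ls crawl-s/segments/20050801120000/index.done
"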
Hi Fredrik:
Actually, I used the nutch crawl command as follows:
"
bin/nutch crawl urls -dir crawl-s -depth 1 >&
crawl-s.log
"
I guess I don't need to run index explicitly after
the crawl. Is that right?
My sample crawl doesn't go deep; it stops at
the home page of the URL.
I guess the -depth is
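(For what it's worth, -depth is the number of fetch rounds, so -depth 1 fetches only the seed URLs and never follows links. A deeper run, as a sketch:)
"
# depth 3 = the seeds plus two rounds of followed links
bin/nutch crawl urls -dir crawl-s -depth 3 >& crawl-s.log
"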
Hi Michael.
Have you indexed the crawl/segment? Easy to forget sometimes : ) Also,
check the crawler-tools.xml or whatever it's called, so that ASP pages
aren't blocked or anything. By default, the Nutch crawler doesn't
handle parameters (committees.asp?viewPerson=Ji), I guess that could
be an issue.
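(In Nutch 0.x the file is conf/crawl-urlfilter.txt; the stock rule below is what skips probable-query URLs, so relaxing it is one way to let those ASP pages through. A sketch, with MY.DOMAIN.NAME as the usual placeholder:)
"
# stock rule in conf/crawl-urlfilter.txt: skip URLs that look like
# queries; comment it out to allow ?name=value pages
# -[?*!@=]
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
"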