Michael,
You DON'T need to copy the segments or db to the root of tomcat, but you DO
need to start tomcat from the directory directly above the segments
directory (or from the crawl directory if you've done intranet crawling).
e.g. if you have /usr/local/nutch/segments, you might type:
cd /usr/local/nutch
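(and then start tomcat from there; assuming a stock Tomcat install under /usr/local/tomcat, which is an assumption, something like:)
"
# tomcat's location is an assumption; any start script works as long
# as the working directory stays /usr/local/nutch
/usr/local/tomcat/bin/catalina.sh start
"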
Hi Fredrik:
After I did the crawling in Nutch, I copied the segments to the root
of tomcat.
I wonder if I need to do the same thing for the index and
db directories.
thanks,
Michael,
--- Fredrik Andersson <[EMAIL PROTECTED]>
wrote:
> No, I think you're right that indexing is done
> automatically after
> intranet crawls.
Hi Fredrik:
the command
"
bin/nutch crawl * -dir * -depth d
"
does it only work for intranet crawling, i.e. can it only fetch
within one particular domain?
I want to do a global fetch, but restricted to a limited
list of web sites. Should I use the command set of
"bin/nutch admin db -create
...
"
instead?
but,
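(For reference, the whole-web command set from the Nutch 0.x tutorial goes roughly like this; the urls.txt name is an assumption and the inject flags varied between releases, so treat it as a sketch. Which sites actually get fetched is controlled by the URL filters under conf/.)
"
bin/nutch admin db -create            # create an empty web db
bin/nutch inject db -urlfile urls.txt # seed it with your url list
bin/nutch generate db segments        # generate a fetchlist segment
s1=`ls -d segments/2* | tail -1`      # pick the newest segment
bin/nutch fetch $s1                   # fetch it
bin/nutch updatedb db $s1             # fold the results back into the db
bin/nutch index $s1                   # index the fetched segment
"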
No, I think you're right that indexing is done automatically after
intranet crawls. Just try "bin/nutch index yourSegment", if it says
that 'index.done exists already', then well... you get the point. I
don't know what platform you're using, but try doing a "grep -r *". The grep command should match
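(A sketch of that index.done check; the crawl directory and segment name here are made up:)
"
# re-running index on an already-indexed segment reports that
# index.done exists; you can also just look for the file
bin/nutch index crawl-s/segments/20050801120000
ls crawl-s/segments/20050801120000/index.done
"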
Hi Fredrik:
Actually, I used the nutch crawl command as follows:
"
bin/nutch crawl urls -dir crawl-s -depth 1 >&
crawl-s.log
"
I guess I don't need to run index explicitly after
the crawl. Is that right?
My sample crawl doesn't go deep; it stops at
the home page of the URL.
I guess the -depth is
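(For what it's worth, -depth is the number of fetch rounds, so -depth 1 fetches only the seed URLs and never follows links. A deeper run, as a sketch:)
"
# depth 3 = the seeds plus two rounds of followed links
bin/nutch crawl urls -dir crawl-s -depth 3 >& crawl-s.log
"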
Hi Michael.
Have you indexed the crawl/segment? Easy to forget sometimes : ) Also,
check the crawler-tools.xml or whatever it's called, so that ASP pages
aren't blocked or anything. By default, the Nutch crawler doesn't
handle parameters (committees.asp?viewPerson=Ji), I guess that could
be an issue.
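(In Nutch 0.x the file is conf/crawl-urlfilter.txt; the stock rule below is what skips probable-query URLs, so relaxing it is one way to let those ASP pages through. A sketch, with MY.DOMAIN.NAME as the usual placeholder:)
"
# stock rule in conf/crawl-urlfilter.txt: skip URLs that look like
# queries; comment it out to allow ?name=value pages
# -[?*!@=]
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
"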