Re: [Nutch-dev] Which tutorial to use for getting Nutch 9.12 up and running on a single machine?

2006-09-19 Thread Richard Braman
Jp Mutch wrote:
> My questions are regarding crawling and testing/searching:
> Due to my local requirements, initially I just need to run all of nutch on a single machine in its local filesystem, without really needing Hadoop or DFS [I don't mind if they are running "under the hood"].

[Nutch-dev] CrawlDatum.modifiedTime ?

2006-09-19 Thread Kim, Greg
From looking at the code, it doesn't look like anyone is setting the modifiedTime in the CrawlDatum. Is this a bug? I guess we can kinda derive the modifiedTime by looking at the fetchTime and possibly fetchInterval based on status. But if the modifiedTime field is there in CrawlDatum I

[Nutch-dev] [jira] Resolved: (NUTCH-367) DistributedSearch thown ClassCastException

2006-09-19 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-367?page=all ]
Sami Siren resolved NUTCH-367.
Fix Version/s: 0.9.0
Resolution: Fixed
Assignee: Sami Siren
I just committed a fix for this together with testcase, thanks for reporting it.
> Distr

Re: [Nutch-dev] [jira] Commented: (NUTCH-368) Message queueing system

2006-09-19 Thread Sami Siren
Andrzej Bialecki (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-368?page=comments#action_12435710 ]
>
> Andrzej Bialecki commented on NUTCH-368:
>
>> IMO a place for stuff like this is in hadoop more than nutch and

[Nutch-dev] [jira] Commented: (NUTCH-364) Javascript parser creates some fairly bogus URLs

2006-09-19 Thread Doug Cook (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-364?page=comments#action_12435945 ]
Doug Cook commented on NUTCH-364:
I've been looking into this a little bit. I see two problems: (1) The current "two pass" heuristic URL-like string extractor has

[Nutch-dev] [jira] Resolved: (NUTCH-105) Network error during robots.txt fetch causes file to be ignored

2006-09-19 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-105?page=all ]
Sami Siren resolved NUTCH-105.
Resolution: Fixed
This is now committed, thanks!
> Network error during robots.txt fetch causes file to be ignored
> ---

[Nutch-dev] [jira] Commented: (NUTCH-368) Message queueing system

2006-09-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-368?page=comments#action_12435710 ]
Andrzej Bialecki commented on NUTCH-368:
> IMO a place for stuff like this is in hadoop more than nutch and i would like to see this implemented there.