[Nutch-dev] ignore eclipse .project and .classpath

2006-02-07 Thread Chris Mattmann
Hi Folks, Just wondering if someone could add the files .classpath and .project to the svn:ignore property for Nutch. I happen to use Eclipse for Nutch development and always ignore these files in my other Eclipse projects as well. Cheers, Chris

[Nutch-dev] No node available for block errors

2006-02-07 Thread Chris Schneider
Gang, At the risk of incurring cross-posting ire (and based on a suggestion from Stefan), I'm posting this to nutch-dev as well: We're now running into "No node available for block " errors, which are killing our MapReduce-based crawling jobs. I did some digging through our logs after one of

[Nutch-dev] [jira] Updated: (NUTCH-196) lib-xml and lib-log4j plugins

2006-02-07 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-196?page=all ] Jerome Charron updated NUTCH-196: Attachment: NUTCH-196.lib-log4j.patch My two cents: this patch * provides a lib-log4j plugin (based on log4j 1.2.11) * removes log4j jars from pars

[Nutch-dev] [jira] Closed: (NUTCH-149) outlinks not shown properly in cached.jsp

2006-02-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-149?page=all ] Chris A. Mattmann closed NUTCH-149: Closed at request of reporter: not a bug. > outlinks not shown properly in cached.jsp > Key: NUTCH-

[Nutch-dev] [jira] Resolved: (NUTCH-149) outlinks not shown properly in cached.jsp

2006-02-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-149?page=all ] Chris A. Mattmann resolved NUTCH-149: Resolution: Invalid. Closed at request of the reporter: not a bug. > outlinks not shown properly in cached.jsp

[Nutch-dev] [jira] Commented: (NUTCH-158) Process Sitemap data in text, rss or xml format as well as OAI-PMH

2006-02-07 Thread raghavendra prabhu (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-158?page=comments#action_12365483 ] raghavendra prabhu commented on NUTCH-158: This is an important thing. We should automatically be able to insert the links parsed out of the site map into the webdb. But curr

[Nutch-dev] Re: tool to mount nutch filesystem

2006-02-07 Thread John X
Hi, Mike, On Tue, Feb 07, 2006 at 10:18:11AM -0800, Michael Cafarella wrote: > John, > This is a pretty awesome idea. Do you have any performance numbers or experience with it you can share? No numbers yet. I just created it for my immediate use of browsing and moving around files. It u

[Nutch-dev] Re: [OT] Mailing lists

2006-02-07 Thread Doug Cutting
Andrew McNabb wrote: Now that Hadoop is branched off, and since I'm more interested in Hadoop than in web indexing, I was just curious whether or not there are plans to branch off a hadoop-dev mailing list. Anything to reduce email. :) http://lucene.apache.org/hadoop/mailing_lists.html Doug

[Nutch-dev] Re: [jira] Created: (NUTCH-206) search server throws InstantiationException

2006-02-07 Thread Jimmy Forrester
unsubscribe

[Nutch-dev] [jira] Commented: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2006-02-07 Thread Rod Taylor (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-207?page=comments#action_12365462 ] Rod Taylor commented on NUTCH-207: Code was by Radu Mateescu with additional kibitzing by myself. > Bandwidth target for fetcher rather than a thread count

[Nutch-dev] [jira] Created: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2006-02-07 Thread Rod Taylor (JIRA)
Bandwidth target for fetcher rather than a thread count. Key: NUTCH-207; URL: http://issues.apache.org/jira/browse/NUTCH-207; Project: Nutch; Type: New Feature; Components: fetcher; Versions: 0.8-dev R

[Nutch-dev] [jira] Updated: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2006-02-07 Thread Rod Taylor (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-207?page=all ] Rod Taylor updated NUTCH-207: Attachment: ratelimit.patch > Bandwidth target for fetcher rather than a thread count > Key: NUTCH
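NUTCH-207 proposes capping the fetcher by a bandwidth target instead of a fixed thread count. The contents of ratelimit.patch are not shown in this archive, so the following is only a minimal sketch of one common way to enforce a byte-rate target, a token bucket. The class name `BandwidthLimiter`, its parameters, and the choice to pass the clock in explicitly (so the logic can be tested without sleeping) are all hypothetical.

```java
/**
 * Hypothetical token-bucket limiter that caps aggregate fetch throughput
 * in bytes per second. This is an illustration, not the ratelimit.patch code.
 */
final class BandwidthLimiter {
    private final long bytesPerSecond; // target bandwidth
    private final long capacity;       // maximum burst, in bytes
    private double available;          // bytes that may be read now (may go negative)
    private long lastRefillNanos;      // clock value at the last refill

    BandwidthLimiter(long bytesPerSecond, long burstBytes, long nowNanos) {
        if (bytesPerSecond <= 0 || burstBytes <= 0) {
            throw new IllegalArgumentException("rate and burst must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.capacity = burstBytes;
        this.available = burstBytes;
        this.lastRefillNanos = nowNanos;
    }

    /** Credits the bytes earned since the last refill, never exceeding capacity. */
    private void refill(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed > 0) {
            available = Math.min(capacity, available + (double) elapsed * bytesPerSecond / 1e9);
            lastRefillNanos = nowNanos;
        }
    }

    /**
     * Records that {@code bytes} were just read and returns how many nanoseconds
     * the caller should sleep so the average rate stays at the target.
     */
    synchronized long acquire(long bytes, long nowNanos) {
        refill(nowNanos);
        available -= bytes;
        if (available >= 0) {
            return 0L; // still within budget
        }
        // Time needed to earn back the deficit at the target rate.
        return (long) Math.ceil(-available * 1e9 / bytesPerSecond);
    }
}
```

A fetcher thread would call `acquire(bytesRead, System.nanoTime())` after each read and sleep for the returned duration, for example with `TimeUnit.NANOSECONDS.sleep(wait)`. Letting the balance go negative, rather than blocking until enough tokens accumulate, keeps a single read larger than the burst size from stalling forever.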

[Nutch-dev] [jira] Commented: (NUTCH-205) Wrong 'fetch date' for non available pages

2006-02-07 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-205?page=comments#action_12365434 ] Andrzej Bialecki commented on NUTCH-205: This is a design choice, not a bug. The errors you see are due to improper configuration: some threads cannot access the hos

[Nutch-dev] Re: [jira] Created: (NUTCH-206) search server throws InstantiationException

2006-02-07 Thread Dan Pothier
unsubscribe

[Nutch-dev] [jira] Created: (NUTCH-206) search server throws InstantiationException

2006-02-07 Thread jimmy (JIRA)
search server throws InstantiationException. Key: NUTCH-206; URL: http://issues.apache.org/jira/browse/NUTCH-206; Project: Nutch; Type: Bug; Components: searcher; Versions: 0.8-dev; Environment: windows 2003 cygwin

[Nutch-dev] [OT] Mailing lists

2006-02-07 Thread Andrew McNabb
Now that Hadoop is branched off, and since I'm more interested in Hadoop than in web indexing, I was just curious whether or not there are plans to branch off a hadoop-dev mailing list. Anything to reduce email. :) -- Andrew McNabb http://www.mcnabbs.org/andrew/

[Nutch-dev] [jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

2006-02-07 Thread Mike Cafarella (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365458 ] Mike Cafarella commented on NUTCH-193: It should be noted that the name "Nutch" also comes from one of Doug's children. They seem to have a proud future in advertising

[Nutch-dev] [jira] Commented: (NUTCH-205) Wrong 'fetch date' for non available pages

2006-02-07 Thread M.Oliver Scheele (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-205?page=comments#action_12365446 ] M.Oliver Scheele commented on NUTCH-205: Thanks for the comment. I'm using the standard properties in my configuration (which shouldn't be improper by default ;)): fetcher.

[Nutch-dev] Re: tool to mount nutch filesystem

2006-02-07 Thread Michael Cafarella
John, This is a pretty awesome idea. Do you have any performance numbers or experience with it you can share? --Mike On Thu, 2006-02-02 at 23:19, John X wrote: > On Sat, Jan 21, 2006 at 09:23:01AM -0800, John X wrote: > > Hi, Sami, > > > > On Sat, Jan 21, 2006 at 05:32:37PM +0200, Sami

[Nutch-dev] Re: Some bugs I'm trying to characterize....

2006-02-07 Thread Michael Cafarella
Hi Bryan, On Thu, 2006-02-02 at 12:06, Bryan A. Pendleton wrote: > > 1) If you fill up the space of a datanode, it appears to fail with the wrong > exception and reload. This, combined with the currently simple > block-allocation method (random), means that one "full" node can cause a big > dr
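Bryan's report describes a purely random block-allocation method that can keep selecting a datanode that is already full. As a hypothetical illustration only (this is not the NDFS code, and `DataNodeInfo` and `BlockPlacer` are invented names), random placement can be restricted to the nodes that report enough free space for the block:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical snapshot of a datanode: its name and remaining free bytes. */
final class DataNodeInfo {
    final String name;
    final long freeBytes;

    DataNodeInfo(String name, long freeBytes) {
        this.name = name;
        this.freeBytes = freeBytes;
    }
}

/** Hypothetical placer: random choice, but only among nodes with room for the block. */
final class BlockPlacer {
    private final Random random;

    BlockPlacer(Random random) {
        this.random = random;
    }

    /** Returns a node with at least {@code blockSize} free bytes, or null if none has room. */
    DataNodeInfo choose(List<DataNodeInfo> nodes, long blockSize) {
        List<DataNodeInfo> candidates = new ArrayList<>();
        for (DataNodeInfo node : nodes) {
            if (node.freeBytes >= blockSize) {
                candidates.add(node); // full nodes are never considered
            }
        }
        if (candidates.isEmpty()) {
            return null; // no datanode has room for this block
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

Filtering before the random draw keeps the load-spreading effect of random placement while guaranteeing that a single full node is never chosen; a real implementation would also depend on each node reporting its free space accurately and promptly.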

[Nutch-dev] [jira] Commented: (NUTCH-192) meta data support for CrawlDatum

2006-02-07 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12365450 ] Doug Cutting commented on NUTCH-192: Sorry, I misspoke and overstated things too. There are problems, but not with MapWritable, rather with WritableName: this refers to so

[Nutch-dev] [jira] Created: (NUTCH-205) Wrong 'fetch date' for non available pages

2006-02-07 Thread M.Oliver Scheele (JIRA)
Wrong 'fetch date' for non available pages. Key: NUTCH-205; URL: http://issues.apache.org/jira/browse/NUTCH-205; Project: Nutch; Type: Bug; Components: fetcher; Versions: 0.7, 0.7.1; Environment: JDK 1.4.2_09 / Windows

[Nutch-dev] [jira] Commented: (NUTCH-192) meta data support for CrawlDatum

2006-02-07 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12365413 ] Andrzej Bialecki commented on NUTCH-192: I have a different opinion on this (I think MapWritable is a sufficiently general-purpose data structure that would be useful

[Nutch-dev] [jira] Updated: (NUTCH-81) Webapp only works when deployed in root

2006-02-07 Thread Michael Nebel (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-81?page=all ] Michael Nebel updated NUTCH-81: Attachment: fix-faq-url.diff With the move from SF to Apache, the old FAQ isn't accessible any more. This patch changes the link from http://www.nutch.org/faq.html t