RE: Halloween Joke at Google

2005-11-04 Thread Fuad Efendi
Andrzej, I am trying to restore a human-oriented web-site tree using anchor text! As a sample, a page with anchor text "Motherboards" has many linked pages with concrete motherboards, etc.; we can group information in many cases. Anchor text is the true subject of the page, but only within the same domain. BTW

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote: > Rod Taylor wrote: > > Here you go. local filesystem and a single job tracker on another > > machine. When the tasktracker and jobtracker are on the same box there > > isn't a problem. When they are on different machines it runs into > > issue

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: Here you go. local filesystem and a single job tracker on another machine. When the tasktracker and jobtracker are on the same box there isn't a problem. When they are on different machines it runs into issues. This is using mapred.local.dir on the local machine (not sharedd be

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 22:57 -0500, Rod Taylor wrote: > On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote: > > Rod Taylor wrote: > > > I tried running one datanode per machine connecting back to the same SAN > > > but it seemed pretty clunky. A crash of any datanode would take down > > > the en

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote: > Rod Taylor wrote: > > I tried running one datanode per machine connecting back to the same SAN > > but it seemed pretty clunky. A crash of any datanode would take down > > the entire system (no data replication since it's a common data-store

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote: I tried running one datanode per machine connecting back to the same SAN but it seemed pretty clunky. A crash of any datanode would take down the entire system (no data replication since it's a common data-store in the end). Reducing it to a single datanode did not have this im

[jira] Created: (NUTCH-124) protocol-httpclient does not follow redirects when fetching robots.txt

2005-11-04 Thread Doug Cutting (JIRA)
protocol-httpclient does not follow redirects when fetching robots.txt -- Key: NUTCH-124 URL: http://issues.apache.org/jira/browse/NUTCH-124 Project: Nutch Type: Bug Components: fetcher Versi

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 19:15 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > There is only a single datanode and there are 20 hosts.
>
> That's a lot of load on one datanode. I typically run a datanode on
> every host, accessing the local drives on that host.

I tried running one datanode per

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote:
> There is only a single datanode and there are 20 hosts.

That's a lot of load on one datanode. I typically run a datanode on every host, accessing the local drives on that host.

Doug

[jira] Updated: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS

2005-11-04 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ] Paul Baclace updated NUTCH-116: --- Attachment: comments_msgs_and_local_renames_during_TestNDFS.patch > TestNDFS a JUnit test specifically for NDFS > --- > >

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: > Rod Taylor wrote: > > Every segment that I fetch seems to be missing a part when stored on the > > filesystem. The stranger thing is it is always the same part (very > > reproducible). > > This sounds strange. Are the datanode errors always

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: > Rod Taylor wrote: > > Every segment that I fetch seems to be missing a part when stored on the > > filesystem. The stranger thing is it is always the same part (very > > reproducible). > > This sounds strange. Are the datanode errors always

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Rod Taylor
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote: > Rod Taylor wrote: > > Every segment that I fetch seems to be missing a part when stored on the > > filesystem. The stranger thing is it is always the same part (very > > reproducible). > > This sounds strange. Are the datanode errors always

[jira] Updated: (NUTCH-116) TestNDFS a JUnit test specifically for NDFS

2005-11-04 Thread Paul Baclace (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ] Paul Baclace updated NUTCH-116: --- Attachment: required_by_TestNDFS_v3.patch I found and fixed a problem with a standalone DataNode process exiting too early (this was not detected by the current u

Re: mapred questions

2005-11-04 Thread Doug Cutting
Ken van Mulder wrote:
> First is that the fetcher slows down over time and continues to use more and more memory as it goes (which I think is eventually hanging the process).

What parser plugins do you have enabled? These are usually the culprit. Try using 'kill -QUIT' to see what various thr

Re: mapred bug -- bad part calculation?

2005-11-04 Thread Doug Cutting
Rod Taylor wrote:
> Every segment that I fetch seems to be missing a part when stored on the filesystem. The stranger thing is it is always the same part (very reproducible).

This sounds strange. Are the datanode errors always on the same host? How many hosts are you running this on?

Doug

[jira] Created: (NUTCH-123) Cache.jsp some times generate NullPointerException

2005-11-04 Thread JIRA
Cache.jsp some times generate NullPointerException -- Key: NUTCH-123 URL: http://issues.apache.org/jira/browse/NUTCH-123 Project: Nutch Type: Bug Components: web gui Environment: All systems Reporter: Lutischán

Re: nutch cluster questions.

2005-11-04 Thread Stefan Groschupf
Please do not cross-post questions! Check out the map reduce branch in svn. Map reduce will do everything you are looking for, and it works well for me. Stefan On 04.11.2005 at 14:32, Arsen Popovyan wrote: At the moment we are using nutch-nightly (nutch-2005-07-20). We are not plea

nutch cluster questions.

2005-11-04 Thread Arsen Popovyan
At the moment we are using nutch-nightly (nutch-2005-07-20). We are not pleased with the performance of fetching, parsing, indexing, analyzing and scoring... information. Our spider currently retrieves approx. 25,000 new results per day. All processes are now running on one computer (machine) and we are using

Re: Hiring a Nutch Developer

2005-11-04 Thread Nathan Gwilliam
I actually have several projects, but let's start with the first. We need to create a search engine that crawls about 20 adoption-related sites that we are affiliated with, such as: adoption.com, fosterparenting.com, crisispregnancy.com, adoption.org, adopting.org, 123adoption.com (which includes a

Re: Hiring a Nutch Developer

2005-11-04 Thread Arun Kumar Sharma
Hi Nathan, Please send me more details. Nathan Gwilliam <[EMAIL PROTECTED]> wrote: We're looking for a Nutch developer we can hire to build a Nutch search engine for our sites. Are any of you doing side projects? Nathan Gwilliam Adoption.com & Families.com [EMAIL PROTECTED] >> >> WITH WA

Hiring a Nutch Developer

2005-11-04 Thread Nathan Gwilliam
We're looking for a Nutch developer we can hire to build a Nutch search engine for our sites. Are any of you doing side projects? Nathan Gwilliam Adoption.com & Families.com [EMAIL PROTECTED]