Re: Too many fetch failures AND Shuffle error

2008-07-19 Thread brainstorm
Got this problem too, and fixed it just 5 minutes ago... The nodes had wrong IP entries referring to the frontend, which was slowing down the reduce process *a lot*. In numbers, using the wordcount example: Wrong hosts file: 3hrs, 45mins, 41sec (4 minutes map, the rest reduce). Right hosts

Re: Too many fetch failures AND Shuffle error

2008-06-30 Thread Tarandeep Singh
I am getting this error as well. As Sayali mentioned in his mail, I updated the /etc/hosts file with the slave machines' IP addresses, but I am still getting this error. Amar, which is the URL that you were talking about in your mail - There will be a URL associated with a map that the reducer try

Re: Too many fetch failures AND Shuffle error

2008-06-30 Thread Amar Kamat
Tarandeep Singh wrote: I am getting this error as well. As Sayali mentioned in his mail, I updated the /etc/hosts file with the slave machines' IP addresses, but I am still getting this error. Amar, which is the URL that you were talking about in your mail - There will be a URL associated with a

Re: Too many fetch failures AND Shuffle error

2008-06-20 Thread Amar Kamat
Yeah. With 2 nodes the reducers will go up to 16% because the reducers are able to fetch maps from the same machine (locally) but fail to copy them from the remote machine. A common reason in such cases is *restricted machine access* (firewall etc). The web-server on a machine/node hosts map
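A quick way to test the restricted-access theory above is to check, from each node, whether a TCP connection to the other nodes' web-server port can be opened at all. The following is a minimal sketch, not part of the original thread: the function names are hypothetical, and the port you pass in is an assumption you must match to your own cluster configuration (50060 is assumed here to be the TaskTracker HTTP port in a default setup of that era).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake;
        # a firewall drop shows up as a timeout, a closed port as a refusal.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_nodes(hosts, port, timeout=3.0):
    """Return the subset of `hosts` whose `port` cannot be reached from this machine."""
    return [h for h in hosts if not can_connect(h, port, timeout)]


# Hypothetical usage (host names and port are assumptions, adjust to your cluster):
#   unreachable_nodes(["slave1", "slave2"], 50060)
```

Run it on every node rather than only on the master: a reducer fetches map output directly from the node that ran the map, so each pair of nodes needs to be able to reach each other.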

Re: Too many fetch failures AND Shuffle error

2008-06-20 Thread Sayali Kulkarni
Hi! My problem of Too many fetch failures as well as shuffle error was resolved when I added the list of all the slave machines to the /etc/hosts file. Earlier, on every slave, I had only the entries for the master and the slave's own machine in the /etc/hosts file. But now I have updated all the /etc/hosts
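The fix described above (every node listing every other node in /etc/hosts, with the right addresses) can be checked mechanically. The sketch below is an illustration only, assuming the usual hosts-file layout of an IP address followed by one or more names, with `#` starting a comment. The function names, host names, and addresses are all hypothetical.

```python
def parse_hosts(text):
    """Map each hostname or alias found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry for a name, since later duplicates are
            # normally shadowed by the earlier one.
            mapping.setdefault(name, ip)
    return mapping


def check_hosts(hosts_text, expected):
    """Compare hosts-file text against `expected` (a dict of name -> IP).

    Returns a list of (name, expected_ip, found_ip) tuples for every name that
    is missing (found_ip is None) or mapped to a different address.
    """
    known = parse_hosts(hosts_text)
    problems = []
    for name, ip in expected.items():
        found = known.get(name)
        if found != ip:
            problems.append((name, ip, found))
    return problems


# Hypothetical usage on one node:
#   with open("/etc/hosts") as f:
#       print(check_hosts(f.read(), {"master": "10.0.0.1", "slave1": "10.0.0.2"}))
```

An empty result on every node means each machine can resolve all the others to the intended addresses, which covers both the missing-entry case and the wrong-IP case reported in this thread.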

Too many fetch failures AND Shuffle error

2008-06-19 Thread Sayali Kulkarni
Hello, I have been getting Too many fetch failures (in the map operation) and shuffle error (in the reduce operation) and am unable to complete any job on the cluster. I have 5 slaves in the cluster. So I have the following values in the hadoop-site.xml file: <name>mapred.map.tasks</name>

Re: Too many fetch failures AND Shuffle error

2008-06-19 Thread Amar Kamat
Sayali Kulkarni wrote: Hello, I have been getting Too many fetch failures (in the map operation) and shuffle error (in the reduce operation) Can you post the reducer logs? How many nodes are there in the cluster? Are you seeing this for all the maps and reducers? Are the reducers