Poor IO performance on a 10 node cluster.

2011-05-30 Thread Gyuribácsi
Hi, I have a 10-node cluster (IBM blade servers, 48GB RAM, 2x500GB disks, 16 HT cores). I've uploaded 10 files to HDFS, each 10GB in size. I used the streaming jar with 'wc -l' as mapper and 'cat' as reducer. I use a 64MB block size and the default replication (3). The wc on the 100 GB took…
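
A minimal sketch of the job described (the streaming jar path and the HDFS paths are assumptions; they vary by release):

    # run 'wc -l' over the uploaded files via Hadoop Streaming
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
        -input /user/gyuri/input \
        -output /user/gyuri/output \
        -mapper 'wc -l' \
        -reducer cat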

Eclipse Hadoop Plugin Error creating New Hadoop location ....

2011-05-30 Thread Praveen Sripati
Hi, I am trying to run Hadoop from Eclipse using the Eclipse Hadoop Plugin and am stuck on the following problem. First I copied hadoop-0.21.0-eclipse-plugin.jar to the Eclipse plugin folder, started Eclipse and switched to the Map/Reduce perspective. In the Map/Reduce Locations view, when I try…
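
For reference, the location dialog's Map/Reduce Master and DFS Master fields have to match what the cluster's own configuration declares; a quick way to check (paths and ports below are example assumptions):

    # the DFS Master field should match fs.default.name:
    grep -A 1 'fs.default.name' $HADOOP_HOME/conf/core-site.xml
    #   e.g. <value>hdfs://localhost:9000</value>
    # the Map/Reduce Master field should match mapred.job.tracker:
    grep -A 1 'mapred.job.tracker' $HADOOP_HOME/conf/mapred-site.xml
    #   e.g. <value>localhost:9001</value>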

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread praveen.peddi
That's because you are assuming the processing time for mappers and reducers to be 0. Counting words is processor intensive, and it's likely that a lot of those 220 seconds are spent in processing, not just reading the file. On May 30, 2011, at 8:28 AM, ext Gyuribácsi bogyo...@gmail.com wrote:
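
Back-of-the-envelope, using the figures from the thread (100 GB read in roughly 220 s):

    100 GB ≈ 102,400 MB
    102,400 MB / 220 s   ≈ 465 MB/s aggregate
    465 MB/s / 10 nodes  ≈ 47 MB/s per node

A 2011-era SATA disk streams roughly 100 MB/s sequentially, so with two disks per node the raw reads alone should not need anywhere near 220 s; that is consistent with much of the time going to CPU and task overhead rather than IO.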

Hadoop Hackathon, Berlin, 9 June 2011

2011-05-30 Thread Oliver B. Fischer
Dear all, after the Berlin Buzzwords Conference (http://www.berlinbuzzwords.de/), we will have a one-day Hadoop Hackathon on 9 June 2011 in Berlin. If you attend Berlin Buzzwords and are interested in Hadoop, please have a look at http://berlinbuzzwords.de/wiki/hadoop-hackathon Bye

Hadoop Jar Files

2011-05-30 Thread Praveen Sripati
Hi, I have extracted the hadoop-0.20.2, hadoop-0.20.203.0 and hadoop-0.21.0 releases. In the hadoop-0.21.0 folder there are hadoop-hdfs-0.21.0.jar, hadoop-mapred-0.21.0.jar and hadoop-common-0.21.0.jar files. But in the hadoop-0.20.2 and the hadoop-0.20.203.0 releases the same files are…
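
For comparison, a sketch of the layout difference as I remember those releases (exact file names may vary between 0.20.2 and 0.20.203.0):

    # the 0.20 line ships a single combined jar instead:
    ls hadoop-0.20.2/hadoop-0.20.2-core.jar
    # 0.21.0 split the project into common, hdfs and mapred jars:
    ls hadoop-0.21.0/hadoop-common-0.21.0.jar \
       hadoop-0.21.0/hadoop-hdfs-0.21.0.jar \
       hadoop-0.21.0/hadoop-mapred-0.21.0.jar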

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread Brian Bockelman
On May 30, 2011, at 7:27 AM, Gyuribácsi wrote: Hi, I have a 10 node cluster (IBM blade servers, 48GB RAM, 2x500GB Disk, 16 HT cores). I've uploaded 10 files to HDFS. Each file is 10GB. I used the streaming jar with 'wc -l' as mapper and 'cat' as reducer. I use 64MB block size and

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread Boris Aleksandrovsky
Ljddfjfjfififfifjftjiifjfjjjffkxbznzsjxodiewisshsudddudsjidhddueiweefiuftttoitfiirriifoiffkllddiririiriioerorooiieirrioeekroooeoooirjjfdijdkkduddjudiiehs On May 30, 2011 5:28 AM, Gyuribácsi bogyo...@gmail.com wrote: Hi, I have a 10 node cluster (IBM blade servers, 48GB RAM, 2x500GB Disk,

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread James Seigel
Not sure that will help ;) Sent from my mobile. Please excuse the typos. On 2011-05-30, at 9:23 AM, Boris Aleksandrovsky balek...@gmail.com wrote:

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread Harsh J
Psst. The cats speak in their own language ;-) On Mon, May 30, 2011 at 10:31 PM, James Seigel ja...@tynt.com wrote: Not sure that will help ;) Sent from my mobile. Please excuse the typos. On 2011-05-30, at 9:23 AM, Boris Aleksandrovsky balek...@gmail.com wrote:

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread Jason Rutherglen
That's a small town in Iceland. On Mon, May 30, 2011 at 10:01 AM, James Seigel ja...@tynt.com wrote: Not sure that will help ;) Sent from my mobile. Please excuse the typos. On 2011-05-30, at 9:23 AM, Boris Aleksandrovsky balek...@gmail.com wrote:

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread He Chen
Hi Gyuribácsi, I would suggest you divide the MapReduce program's execution time into 3 parts: a) Map stage. In this stage, the framework splits the input data and generates map tasks. Each map task processes one block (by default; you can change this in FileInputFormat.java). As Brian said, if you have a larger block size,…
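
A hedged sketch of acting on that advice (property names are the 0.20-era ones; the 256 MB sizes and paths are examples):

    # re-upload with a larger block size, so each map task reads more data:
    hadoop fs -D dfs.block.size=268435456 -put file10g /user/gyuri/input/
    # or raise the minimum split size per job, without re-uploading:
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
        -D mapred.min.split.size=268435456 \
        -input /user/gyuri/input -output /user/gyuri/out2 \
        -mapper 'wc -l' -reducer cat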

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread jagaran das
Your font block size got increased dynamically, check in core-site :) :) - Jagaran From: He Chen airb...@gmail.com To: common-user@hadoop.apache.org Sent: Mon, 30 May 2011 11:39:35 AM Subject: Re: Poor IO performance on a 10 node cluster. Hi Gyuribácsi I…

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread Lance Norskog
I'm sorry, but she's with me now. On Mon, May 30, 2011 at 8:22 AM, Boris Aleksandrovsky balek...@gmail.com wrote: Ljddfjfjfififfifjftjiifjfjjjffkxbznzsjxodiewisshsudddudsjidhddueiweefiuftttoitfiirriifoiffkllddiririiriioerorooiieirrioeekroooeoooirjjfdijdkkduddjudiiehs On May 30, 2011 5:28

Re: Is it safe to manually copy BLK files?

2011-05-30 Thread Joey Echeverria
The short answer is no. If you want to decommission a datanode, the safest way is to put the hostnames of the datanodes you want to shut down into a file on the namenode. Next, set the dfs.hosts.exclude parameter to point to that file. Finally, run hadoop dfsadmin -refreshNodes. As an FYI, I think you…
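
A sketch of those steps on the namenode (the excludes file path and hostname are assumptions):

    # 1. list the nodes to retire, one hostname per line:
    echo "datanode07.example.com" >> /etc/hadoop/conf/excludes
    # 2. dfs.hosts.exclude in hdfs-site.xml must point at that file:
    #      <name>dfs.hosts.exclude</name>
    #      <value>/etc/hadoop/conf/excludes</value>
    # 3. tell the namenode to re-read it:
    hadoop dfsadmin -refreshNodes
    # 4. wait until 'hadoop dfsadmin -report' shows the node going from
    #    "Decommission in progress" to "Decommissioned" before stopping it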