Re: tuning performance

2009-03-16 Thread Scott Carey
Yes, I am referring to HDFS taking multiple mount points and automatically round-robining block allocation across them. A single file block will only exist on a single disk, but the extra speed you can get with RAID-0 within a block can't be used effectively by almost any mapper or reducer anyway.
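
To make the round-robin idea concrete, here is a minimal Python sketch. It is only an illustration of the behaviour described above, not HDFS's actual volume-selection code; the function name and the mount point paths are made up for the example.

```python
from collections import Counter
from itertools import cycle


def place_blocks(num_blocks, mount_points):
    """Assign each block to exactly one mount point, rotating through the list.

    Hypothetical helper: it models round-robin allocation only and ignores
    free space, disk failures, and everything else a real datanode considers.
    """
    rotation = cycle(mount_points)
    return [next(rotation) for _ in range(num_blocks)]


# Assumed mount points for a 4-disk node (illustrative paths).
mounts = ["/data1", "/data2", "/data3", "/data4"]

placement = place_blocks(10, mounts)
per_disk = Counter(placement)

# Each block lives on a single disk; the blocks as a whole spread over all disks.
for mount in mounts:
    print(mount, per_disk[mount])
```

With several tasks reading different blocks at once, each disk serves its own share of the requests, which is where the aggregate throughput comes from without striping.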

Re: tuning performance

2009-03-14 Thread Vadim Zaliva
Scott, thanks for the interesting information. By JBOD, I assume you mean just listing multiple partition mount points in the Hadoop config? Vadim

Re: tuning performance

2009-03-13 Thread Scott Carey
On 3/13/09 11:56 AM, "Allen Wittenauer" wrote:
On 3/13/09 11:25 AM, "Vadim Zaliva" wrote:
>> When you stripe you automatically make every disk in the system have the same speed as the slowest disk. In our experiences, systems are more likely to have a 'slow' disk than a dead one a

Re: tuning performance

2009-03-13 Thread Allen Wittenauer
On 3/13/09 11:25 AM, "Vadim Zaliva" wrote:
>> When you stripe you automatically make every disk in the system have the same speed as the slowest disk. In our experiences, systems are more likely to have a 'slow' disk than a dead one and detecting that is really really hard. I

Re: tuning performance

2009-03-13 Thread Vadim Zaliva
> When you stripe you automatically make every disk in the system have the same speed as the slowest disk. In our experiences, systems are more likely to have a 'slow' disk than a dead one and detecting that is really really hard. In a distributed system, that multiplier effect can h

Re: tuning performance

2009-03-12 Thread Allen Wittenauer
On 3/12/09 7:13 PM, "Vadim Zaliva" wrote:
> The machines have 4 disks each, striped.
> However I do not see disks being a bottleneck.
When you stripe you automatically make every disk in the system have the same speed as the slowest disk. In our experiences, systems are more likely to have a 'slow' disk than a dead one and detecting that is really really hard. In a distributed system, that multiplier effect can ha
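
The "slowest disk" point can be put into numbers with a deliberately simplified model. The sketch below assumes that a striped set moves data at the rate of its slowest member on every disk, while independently mounted disks each run at their own rate; the throughput figures are invented for illustration and real arrays will behave less cleanly.

```python
def striped_throughput(disk_mb_per_s):
    """Simplified RAID-0 model: every stripe touches every disk, so each
    disk effectively runs at the slowest disk's rate."""
    return len(disk_mb_per_s) * min(disk_mb_per_s)


def independent_throughput(disk_mb_per_s):
    """Simplified JBOD model: disks are used independently, so the
    aggregate is the sum of the individual rates."""
    return sum(disk_mb_per_s)


# Made-up example: three healthy disks and one 'slow' disk, in MB/s.
disks = [80, 80, 80, 20]

striped = striped_throughput(disks)
independent = independent_throughput(disks)

print("striped:", striped, "MB/s")
print("independent:", independent, "MB/s")
```

Under these assumptions one slow disk drags the whole stripe down, whereas with separate mount points only the blocks on that disk are affected.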

Re: tuning performance

2009-03-12 Thread jason hadoop
For a simple test, set the replication on your entire cluster to 6:
hadoop dfs -setrep -R -w 6 /
This will triple your disk usage and probably take a while, but then you are guaranteed that all data is local. You can also get a rough idea from the Job Counters, 'Data-local map tasks' total field
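
The arithmetic behind this suggestion, using the 6 machines and replication factor of 2 reported for this cluster elsewhere in the thread, can be checked with a short Python sketch. The helper names are hypothetical; the only HDFS behaviour relied on is that a datanode holds at most one replica of any given block.

```python
def raw_usage_multiplier(old_replication, new_replication):
    """Factor by which raw disk usage grows when the replication changes."""
    return new_replication / old_replication


def every_node_has_every_block(num_nodes, replication):
    """A datanode stores at most one replica of a block, so once the
    replication factor reaches the node count, every node has a copy."""
    return replication >= num_nodes


nodes = 6                 # cluster size reported in the thread
current_replication = 2   # replication factor reported in the thread
test_replication = 6      # value suggested for the locality test

print(raw_usage_multiplier(current_replication, test_replication))
print(every_node_has_every_block(nodes, test_replication))
```

With every block present on every node, any map task can be scheduled next to its input, so whatever network traffic remains is not caused by non-local reads.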

Re: tuning performance

2009-03-12 Thread Vadim Zaliva
The machines have 4 disks each, striped. However, I do not see disks being a bottleneck. Monitoring system activity shows that CPU is utilized 2-70%, disk usage is moderate, while network activity seems to be quite high. In this particular cluster we have 6 machines and the replication factor is 2. I wa

Re: tuning performance

2009-03-12 Thread Aaron Kimball
Xeon vs. Opteron is likely not going to be a major factor. More important than this is the number of disks you have per machine. Task performance is proportional to both the number of CPUs and the number of disks. You are probably using way too many tasks. Adding more tasks/node isn't necessarily

tuning performance

2009-03-11 Thread Vadim Zaliva
Hi! I have a question about fine-tuning Hadoop performance on 8-core machines. I have 2 machines I am testing. One is an 8-core Xeon and the other is an 8-core Opteron, with 16 GB RAM each. They both run MapReduce and DFS nodes. Currently I've set up each of them to run 32 map and 8 reduce tasks. Also, HADOOP