Yes, I am referring to HDFS taking multiple mount points and automatically
round-robining block allocation across them.
A single file block will only exist on a single disk, but the extra speed you
can get with raid-0 within a block can't be used effectively by almost any
mapper or reducer anyway.
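For illustration, a JBOD setup in hadoop-site.xml is just a comma-separated
list of directories, one per disk. The /mnt/d* paths below are made-up
examples, so adapt them to your own mount points:

  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/d0/hdfs/data,/mnt/d1/hdfs/data,/mnt/d2/hdfs/data,/mnt/d3/hdfs/data</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/d0/mapred/local,/mnt/d1/mapred/local,/mnt/d2/mapred/local,/mnt/d3/mapred/local</value>
  </property>

The datanode then round-robins new blocks across the dfs.data.dir entries,
which is the behavior described above.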
Scott,
Thanks for the interesting information. By JBOD, I assume you mean just
listing multiple partition mount points in the Hadoop config?
Vadim
On Fri, Mar 13, 2009 at 12:48, Scott Carey wrote:
> On 3/13/09 11:56 AM, "Allen Wittenauer" wrote:
>> On 3/13/09 11:25 AM, "Vadim Zaliva" wrote:
>>> …
On 3/13/09 11:56 AM, "Allen Wittenauer" wrote:

On 3/12/09 7:13 PM, "Vadim Zaliva" wrote:
> The machines have 4 disks each, striped.
> However I do not see disks being a bottleneck.

When you stripe you automatically make every disk in the system have the
same speed as the slowest disk. In our experience, systems are more likely
to have a 'slow' disk than a dead one, and detecting that is really,
really hard. In a distributed system, that multiplier effect can h…
For a simple test, set the replication on your entire cluster to 6:

  hadoop dfs -setrep -R -w 6 /

This will triple your disk usage and probably take a while, but then you are
guaranteed that all data is local (with 6 machines and replication 6, every
node holds a copy of every block).
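(When the test is done you can drop replication back to your normal factor,
e.g. "hadoop dfs -setrep -R 2 /" if 2 is your usual setting, as it is in the
cluster described below.)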
You can also get a rough idea from the Job Counters: look at the
'Data-local map tasks' total field.
The machines have 4 disks each, striped.
However I do not see disks being a bottleneck. Monitoring system activity
shows that CPU utilization ranges from 2-70%, disk usage is moderate, while
network activity seems to be quite high. In this particular cluster we have
6 machines and the replication factor is 2.
Xeon vs. Opteron is likely not going to be a major factor. More important
than this is the number of disks you have per machine. Task performance is
proportional to both the number of CPUs and the number of disks.
You are probably using way too many tasks. Adding more tasks/node isn't
necessarily faster.
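As a sketch (the values are illustrative, not a recommendation), the
per-node slot counts are capped in each node's hadoop-site.xml:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- e.g. roughly one map per core on an 8-core box -->
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>

Starting near the core/disk count and measuring is usually better than
jumping straight to 32 maps per node.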
Hi!
I have a question about fine-tuning hadoop performance on 8-core machines.
I have 2 machines I am testing. One is an 8-core Xeon and the other an
8-core Opteron, with 16 GB RAM each. They both run mapreduce and dfs nodes.
Currently I've set up each of them to run 32 map and 8 reduce tasks.