ng such disk contention in Hadoop? Is HDFS smart enough to
> serialize major disk access?
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Wednesday, October 24, 2012 6:51 PM
To: user@hadoop.apache.org
Subject: Re: How do map tasks get assigned efficiently?
So...
Data locality only works when you actually have data on the cluster itself.
Otherwise, how can the data be local?
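To make the locality point concrete, here is a toy sketch in Python. It is not Hadoop's actual scheduler code: the function name, split ids, and node names are all made up for illustration, and it ignores rack awareness. It only shows that a split can be handed out as a local task when one of its replica hosts is the node asking for work, and that input living off the cluster has no such host.

```python
from typing import Dict, List, Optional


def pick_split(node: str, pending: Dict[str, List[str]]) -> Optional[str]:
    """Return the id of a pending split for `node`, preferring a local one.

    `pending` maps a (hypothetical) split id to the hosts that hold a
    replica of its block. An empty host list models input that lives off
    the cluster, such as S3 or an HTTP source.
    """
    for split_id, hosts in pending.items():
        if node in hosts:
            # Node-local: a replica of this block is stored on `node`.
            return split_id
    # No local data: fall back to any remaining split, read over the network.
    return next(iter(pending), None)


# Hypothetical cluster state: 3x replication gives each HDFS block three hosts.
pending = {
    "hdfs-block-0": ["node1", "node2", "node3"],
    "hdfs-block-1": ["node2", "node4", "node5"],
    "s3-object-0": [],  # no replica on the cluster, so it can never be local
}

print(pick_split("node4", pending))  # a split whose replica is on node4
print(pick_split("node9", pending))  # node9 holds nothing, so a remote read
```

In this sketch the S3 split is only ever picked as a remote read, which is the point above: with no data on the cluster, there is nothing to be local to.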
Assuming 3x replication, that you're not doing a custom split, and that your
input file is splittable...
You will split along the block boundaries. So if your input file has 5
b
Even after reading O'Reilly's book on Hadoop, I don't feel like I have a clear
picture of how the map tasks get assigned.
They depend on splits, right?
But I have 3 jobs running. And splits will come from various sources: HDFS,
S3, and slow HTTP sources.
So I've got some concern as to how t