s a connection.
Cheers,
John
From: Bertrand Dechoux [mailto:decho...@gmail.com]
Sent: Tuesday, June 18, 2013 3:54 PM
To: user@hadoop.apache.org
Subject: Re: Assignment of data splits to mappers
1) The tradeoff is between reducing the overhead of distributed computing and
reducing the cost of failu
mple
> to prevent them from spanning blocks, would that be of benefit?
>
> john
>
> From: Bertrand Dechoux [mailto:decho...@gmail.com]
> Sent: Thursday, June 13, 2013 3:37 PM
> To: user@hadoop.apache.org
> Subject: Re: Assignment of data spli
under most file
formats, records *will* span blocks. But if it were simple to prevent them
from spanning blocks, would that be of benefit?
john
From: Bertrand Dechoux [mailto:decho...@gmail.com]
Sent: Thursday, June 13, 2013 3:37 PM
To: user@hadoop.apache.org
Subject: Re: Assignment of data splits
Hey John,
I don't see the similarity. If you take the case of a normal record
file, such as a text file, a reader only reads data from the next block
to finish its last record. That is,
n-1 blocks are "opened" twice, but not read entirely in both attempts.
In the link you refer to, a specific block will always be read by all
readers
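
To make the text-file case above concrete, here is a minimal sketch of one common convention for records that span split boundaries: a reader that does not start at offset 0 throws away its first (possibly partial) line, and every reader keeps going past the end of its split to finish the last line it started. This is an illustration in plain Python, not Hadoop's own code; read_split and the sample data are hypothetical names chosen for the example.

```python
def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the newline-delimited records owned by the split [start, end)."""
    pos = start
    if start != 0:
        # Not the first split: discard everything up to and including the
        # first newline. The reader of the previous split reads that line.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1

    records = []
    # A line belongs to this split if it starts at or before `end`, even if
    # it finishes in the next split (and therefore in the next block).
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records


# Four records, cut into fixed 8-byte splits that ignore line boundaries.
data = b"alpha\nbravo\ncharlie\ndelta\n"
bounds = [(0, 8), (8, 16), (16, 24), (24, 26)]
per_split = [read_split(data, s, e) for s, e in bounds]
```

Each record comes out of exactly one split, and a reader touches the following split only for as long as it takes to finish a line, which is the "opened twice, but not read entirely" behaviour described above.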
The first question can be split (no pun intended) into two topics because
there are actually two distinct steps. First, the InputFormat partitions the
data source into InputSplits. Its implementation will determine the exact
logic. Then the scheduler is responsible for ordering where/when the
InputS
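
The two steps described above can be sketched roughly as follows. This is a simplified illustration, not Hadoop's actual FileInputFormat or scheduler code: Block, Split, compute_splits and pick_split are hypothetical names, the split size is passed in directly, and locality is reduced to "hosts that store the block containing the first byte of the split".

```python
from dataclasses import dataclass


@dataclass
class Block:
    offset: int        # byte offset of the block within the file
    length: int        # block length in bytes
    hosts: list[str]   # datanodes holding a replica (assumed known)


@dataclass
class Split:
    start: int
    length: int
    hosts: list[str]   # locality hint only, not a hard constraint


def compute_splits(blocks: list[Block], split_size: int) -> list[Split]:
    """Step 1 (InputFormat): cut the file into contiguous byte ranges."""
    file_len = sum(b.length for b in blocks)
    splits = []
    start = 0
    while start < file_len:
        length = min(split_size, file_len - start)
        # Use the hosts of the block that contains the split's first byte.
        block = next(b for b in blocks
                     if b.offset <= start < b.offset + b.length)
        splits.append(Split(start, length, list(block.hosts)))
        start += length
    return splits


def pick_split(pending: list[Split], free_host: str):
    """Step 2 (scheduler): prefer a split whose data is local to free_host."""
    for split in pending:
        if free_host in split.hosts:
            return split
    # No local data available: fall back to any pending split.
    return pending[0] if pending else None


blocks = [
    Block(0, 128, ["dn1", "dn2"]),
    Block(128, 128, ["dn2", "dn3"]),
    Block(256, 64, ["dn1", "dn3"]),
]
splits = compute_splits(blocks, split_size=128)
```

In this sketch each split is a single contiguous byte range, and the scheduler only uses the host list as a preference when a node asks for work.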
When MR assigns data splits to map tasks, does it assign a set of
non-contiguous blocks to one map? The reason I ask is, thinking through the
problem, if I were the MR scheduler I would attempt to hand a map task a bunch
of blocks that all exist on the same datanode, and then schedule the map t
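
The idea in the question above, handing one map task a bunch of blocks that all live on the same datanode, could look something like the sketch below. It only illustrates the proposal and makes no claim about what MR does by default; group_blocks_by_node and the block-to-host map are hypothetical, and the greedy balancing rule is just one possible choice.

```python
def group_blocks_by_node(block_hosts: dict[int, list[str]]) -> dict[str, list[int]]:
    """Assign each block to one of its replica hosts, balancing group sizes.

    block_hosts maps a block index to the datanodes holding a replica.
    The result maps a datanode to the (possibly non-contiguous) blocks
    that a single map task on that node would read locally.
    """
    groups: dict[str, list[int]] = {}
    for block_id in sorted(block_hosts):
        # Greedy choice: the replica host with the fewest blocks so far,
        # with the host name as a deterministic tie-breaker.
        host = min(block_hosts[block_id],
                   key=lambda h: (len(groups.get(h, [])), h))
        groups.setdefault(host, []).append(block_id)
    return groups


block_hosts = {
    0: ["dn1", "dn2"],
    1: ["dn2", "dn3"],
    2: ["dn1", "dn3"],
    3: ["dn1", "dn2"],
}
groups = group_blocks_by_node(block_hosts)
```

With this input, blocks 0 and 3 end up in the same group on dn1, so a single task would read two non-contiguous blocks from its local node.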