subject:"Assignment of data splits to mappers"

RE: Assignment of data splits to mappers

2013-07-01 Thread John Lilley

s a connection. Cheers, John From: Bertrand Dechoux [mailto:decho...@gmail.com] Sent: Tuesday, June 18, 2013 3:54 PM To: user@hadoop.apache.org Subject: Re: Assignment of data splits to mappers 1) The tradeoff is between reducing the overhead of distributed computing and reducing the cost of failu

Re: Assignment of data splits to mappers

2013-06-18 Thread Bertrand Dechoux

mple to prevent them from spanning blocks, would that be of benefit? john From: Bertrand Dechoux [mailto:decho...@gmail.com] Sent: Thursday, June 13, 2013 3:37 PM To: user@hadoop.apache.org Subject: Re: Assignment of data spli

RE: Assignment of data splits to mappers

2013-06-14 Thread John Lilley

under most file formats, records *will* span blocks. But if it were simple to prevent them from spanning blocks, would that be of benefit? john From: Bertrand Dechoux [mailto:decho...@gmail.com] Sent: Thursday, June 13, 2013 3:37 PM To: user@hadoop.apache.org Subject: Re: Assignment of data splits

Re: Assignment of data splits to mappers

2013-06-13 Thread Harsh J

Hey John, I don't see the similarity. If you take the case of a normal record file, such as a text file, you read data from the next block. That is, n-1 blocks are "opened" twice, but not read entirely in both attempts. In the link you refer to, a specific block will always be read by all readers
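The text-file case above can be illustrated with a small sketch. It is not Hadoop's actual reader code: `read_split` and the byte offsets are hypothetical, and it assumes one common convention, namely that a reader whose split does not start at offset 0 discards everything up to the first newline, and that every reader keeps going past the end of its split until it finishes the last line it started. Under that assumption each reader touches only a small part of the following block, and every record is returned exactly once.

```python
from typing import List


def read_split(data: bytes, start: int, end: int) -> List[bytes]:
    """Return the newline-delimited records owned by the split [start, end).

    Assumed convention (hypothetical, for illustration only):
    - a split that does not begin at offset 0 skips its first line, because
      the previous reader is responsible for it;
    - a reader keeps reading while its position is <= end, so it finishes
      the line that crosses (or starts exactly at) the split boundary.
    """
    pos = start
    if start != 0:
        newline = data.find(b"\n", start)
        if newline == -1:
            return []  # no line begins inside this split
        pos = newline + 1

    records = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)  # last line has no trailing newline
        records.append(data[pos:newline])
        pos = newline + 1  # may run past `end`, into the next block
    return records


def read_all(data: bytes, split_size: int) -> List[bytes]:
    """Read the whole buffer split by split and concatenate the records."""
    out = []
    for start in range(0, len(data), split_size):
        end = min(start + split_size, len(data))
        out.extend(read_split(data, start, end))
    return out
```

With `data = b"alpha\nbravo\ncharlie\ndelta\n"` and a split size of 10, the second reader starts in the middle of `bravo`, skips it, and reads `charlie` and `delta`, even though `delta` lies in the third split.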

Re: Assignment of data splits to mappers

2013-06-13 Thread Bertrand Dechoux

The first question can be split (no pun intended) into two topics because there are actually two distinct steps. First, the InputFormat partitions the data source into InputSplits. Its implementation will determine the exact logic. Then the scheduler is responsible for ordering where/when the InputS
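The two steps described above can be sketched as follows. This is a toy model under stated assumptions, not Hadoop's implementation: `Split`, `compute_splits`, and `pick_split` are hypothetical names, the partitioning assumes one split per block, and the scheduling step assumes a simple preference for a split that has a replica on the requesting node. A real InputFormat may partition differently, and a real scheduler weighs more than locality.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Split:
    start: int        # byte offset of the split within the file
    length: int       # number of bytes in the split
    hosts: List[str]  # nodes assumed to hold a replica of this range


def compute_splits(file_length: int, block_size: int,
                   block_hosts: List[List[str]]) -> List[Split]:
    """Step 1 (hypothetical): partition a file into one split per block."""
    splits = []
    offset = 0
    while offset < file_length:
        length = min(block_size, file_length - offset)
        hosts = block_hosts[offset // block_size]
        splits.append(Split(offset, length, hosts))
        offset += length
    return splits


def pick_split(node: str, pending: List[Split]) -> Optional[Split]:
    """Step 2 (hypothetical): hand a free node a pending split.

    Prefers a split with a replica on `node`; otherwise falls back to the
    first pending split, which would then be read over the network.
    """
    for i, split in enumerate(pending):
        if node in split.hosts:
            return pending.pop(i)
    return pending.pop(0) if pending else None
```

For example, a 250-byte file with 100-byte blocks yields three splits, and a free node `"c"` would be handed the first pending split whose host list contains `"c"`.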

Assignment of data splits to mappers

2013-06-13 Thread John Lilley

When MR assigns data splits to map tasks, does it assign a set of non-contiguous blocks to one map? The reason I ask is, thinking through the problem, if I were the MR scheduler I would attempt to hand a map task a bunch of blocks that all exist on the same datanode, and then schedule the map t

6 matches
