Re: Shuffle phase replication factor

2013-05-21 Thread Kai Voigt
The map output doesn't get written to HDFS. The map task writes its output to its local disk, the reduce tasks will pull the data through HTTP for further processing. On 21.05.2013 at 19:57, John Lilley wrote: > When MapReduce enters “shuffle” to partition the tuples, I am assuming that > it
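
The behaviour described in this reply can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hadoop code: the function names (`hash_partition`, `run_map_task`, `run_reduce_task`, `simulate`) are hypothetical, a temporary directory stands in for each map task's local disk, and a plain file read stands in for the HTTP fetch. The point it shows is that intermediate data lives in ordinary local files, so no HDFS replication factor applies to it.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def hash_partition(key, num_reducers):
    # Stable hash so the same key always maps to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(task_id, records, num_reducers, local_dir):
    # Write one plain local file per reducer partition (no HDFS involved).
    partitions = defaultdict(list)
    for key, value in records:
        partitions[hash_partition(key, num_reducers)].append((key, value))
    paths = {}
    for p in range(num_reducers):
        path = os.path.join(local_dir, f"map{task_id}_part{p}.txt")
        with open(path, "w") as f:
            for key, value in sorted(partitions[p]):
                f.write(f"{key}\t{value}\n")
        paths[p] = path
    return paths


def run_reduce_task(partition, map_outputs):
    # "Fetch" this reducer's partition from every map task, then sum per key.
    # Reading the file here is a stand-in for the HTTP pull.
    totals = defaultdict(int)
    for paths in map_outputs:
        with open(paths[partition]) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t")
                totals[key] += int(value)
    return dict(totals)


def simulate(map_inputs, num_reducers):
    # Run all map tasks, then all reduce tasks, using one temp dir as "local disk".
    with tempfile.TemporaryDirectory() as local_dir:
        map_outputs = [
            run_map_task(i, records, num_reducers, local_dir)
            for i, records in enumerate(map_inputs)
        ]
        return [run_reduce_task(p, map_outputs) for p in range(num_reducers)]
```

Calling `simulate([[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]], 2)` returns one dictionary per reducer; each key ends up in exactly one of them because the partition function is deterministic.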

Re: Shuffle phase replication factor

2013-05-21 Thread Ian Wrigley
Intermediate data is written to local disk, not to HDFS. Ian. On May 21, 2013, at 1:57 PM, John Lilley wrote: > When MapReduce enters “shuffle” to partition the tuples, I am assuming that > it writes intermediate data to HDFS. What replication factor is used for > those temporary files? > jo

RE: Shuffle phase replication factor

2013-05-22 Thread John Lilley
[mailto:k...@123.org] Sent: Tuesday, May 21, 2013 12:59 PM To: user@hadoop.apache.org Subject: Re: Shuffle phase replication factor The map output doesn't get written to HDFS. The map task writes its output to its local disk, the reduce tasks will pull the data through HTTP for further processing

Re: Shuffle phase replication factor

2013-05-22 Thread Shahab Yunus
code. > > john > > From: Kai Voigt [mailto:k...@123.org] > Sent: Tuesday, May 21, 2013 12:59 PM > To: user@hadoop.apache.org > Subject: Re: Shuffle phase replication factor > > The map output doesn't get written to HDFS. The map tas

RE: Shuffle phase replication factor

2013-05-22 Thread John Lilley
l simply robust enough to allow a server-side to disconnect at any time to free up slots and the client-side will retry the request? Thanks john From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: Wednesday, May 22, 2013 8:38 AM To: user@hadoop.apache.org Subject: Re: Shuffle phase replica

Re: Shuffle phase replication factor

2013-05-22 Thread Rahul Bhattacharjee
ide will retry the request? > > Thanks > > john > > From: Shahab Yunus [mailto:shahab.yu...@gmail.com] > Sent: Wednesday, May 22, 2013 8:38 AM > To: user@hadoop.apache.org > Subject: Re: Shuffle phase replication factor > >

RE: Shuffle phase replication factor

2013-05-22 Thread John Lilley
pending/failing connection attempts that exceed the limit? Thanks! john From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] Sent: Wednesday, May 22, 2013 8:52 AM To: user@hadoop.apache.org Subject: Re: Shuffle phase replication factor There are properties/configuration to control the no. of

Re: Shuffle phase replication factor

2013-05-22 Thread Kun Ling
the pending/failing connection attempts that exceed the > limit? > > Thanks! > > john > > From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] > Sent: Wednesday, May 22, 2013 8:52 AM > To: user@hadoop.apache.org > Subje

RE: Shuffle phase replication factor

2013-05-23 Thread John Lilley
? Thanks, John From: erlv5...@gmail.com [mailto:erlv5...@gmail.com] On Behalf Of Kun Ling Sent: Wednesday, May 22, 2013 7:50 PM To: user Subject: Re: Shuffle phase replication factor Hi John, 1. for the number of simultaneous connection limitations. You can configure this using the

Re: Shuffle phase replication factor

2013-05-23 Thread Sandy Ryza
ask? Or something more persistent in MapReduce? > > Thanks, > > John > > From: erlv5...@gmail.com [mailto:erlv5...@gmail.com] On Behalf Of Kun Ling > Sent: Wednesday, May 22, 2013 7:50 PM > To: user > Subject: Re: Shuffle phase