RE: Shuffle phase replication factor

John Lilley Wed, 22 May 2013 07:57:37 -0700

Ummmm, is that also the limit for the number of simultaneous connections?  In 
general, one does not need a 1:1 mapping between threads and connections.
If this is the connection limit, does it imply that the client or server side 
aggressively disconnects after a transfer?
What happens to pending or failing connection attempts that exceed the limit?
Thanks!
john

From: Rahul Bhattacharjee [mailto:rahul.rec....@gmail.com]
Sent: Wednesday, May 22, 2013 8:52 AM
To: user@hadoop.apache.org
Subject: Re: Shuffle phase replication factor

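There are configuration properties to control the number of threads used for 
the copy. For example, this one sets the number of HTTP threads the TaskTracker 
uses to serve map output: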
tasktracker.http.threads=40
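
To illustrate what a cap on serving threads does, here is a minimal sketch in 
Python. It is a generic thread-pool model, not Hadoop's HTTP server code: 
HTTP_THREADS, FETCH_REQUESTS and serve_fetch are hypothetical names, and the 
pool size is shrunk from 40 to 4 to keep the run short. Requests beyond the 
cap simply wait in the pool's queue here; whether the real TaskTracker queues 
or rejects them is not something this sketch can tell you.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

HTTP_THREADS = 4      # stand-in for tasktracker.http.threads (assumed small here)
FETCH_REQUESTS = 20   # hypothetical number of reducer fetches arriving at once

active = 0            # fetches being served right now
peak = 0              # highest concurrency observed
lock = threading.Lock()

def serve_fetch(request_id):
    """Pretend to stream one map-output segment to a reducer."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulated transfer time
    with lock:
        active -= 1
    return request_id

# The pool never runs more than HTTP_THREADS fetches at a time;
# the remaining requests wait in its internal queue.
with ThreadPoolExecutor(max_workers=HTTP_THREADS) as pool:
    served = list(pool.map(serve_fetch, range(FETCH_REQUESTS)))

print(f"served {len(served)} requests, peak concurrency {peak}")
```

All 20 requests complete even though at most 4 are in flight at any moment, 
which is the sense in which a thread limit is not the same as a limit on the 
number of clients.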
Thanks,
Rahul

On Wed, May 22, 2013 at 8:16 PM, John Lilley 
<john.lil...@redpoint.net> wrote:
This brings up another nagging question I’ve had for some time.  Between HDFS 
and shuffle, there seems to be the potential for “every node connecting to 
every other node” via TCP.  Are there explicit mechanisms in place to manage or 
limit simultaneous connections?  Or is the protocol simply robust enough that 
the server side can disconnect at any time to free up slots, and the client 
side will retry the request?
Thanks
john

From: Shahab Yunus 
[mailto:shahab.yu...@gmail.com]
Sent: Wednesday, May 22, 2013 8:38 AM

To: user@hadoop.apache.org
Subject: Re: Shuffle phase replication factor

As mentioned by Bertrand, Hadoop: The Definitive Guide is, well... a really 
definitive :) place to start. It is pretty thorough for a start, and once you 
have gone through it, the code will start making more sense too.

Regards,
Shahab

On Wed, May 22, 2013 at 10:33 AM, John Lilley 
<john.lil...@redpoint.net> wrote:
Oh, I see.  Does this mean there is another service and TCP listen port for 
this purpose?
Thanks for your indulgence… I would really like to read more about this without 
bothering the group, but I am not sure where to start learning these internals 
other than the code.
john

From: Kai Voigt [mailto:k...@123.org]
Sent: Tuesday, May 21, 2013 12:59 PM
To: user@hadoop.apache.org
Subject: Re: Shuffle phase replication factor

The map output doesn't get written to HDFS. The map task writes its output to 
its local disk, and the reduce tasks then pull the data over HTTP for further 
processing.
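
As a rough picture of that pull model, here is a toy sketch in Python. It is 
not Hadoop code: run_map, fetch, run_reduce, NUM_REDUCERS and PARALLEL_COPIES 
are all hypothetical names, a plain dict stands in for each map task's local 
disk, and a function call stands in for the HTTP request. Nothing in it 
touches a shared filesystem, which is why no replication factor applies.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCERS = 2
PARALLEL_COPIES = 2  # hypothetical cap on concurrent fetches per reducer

def run_map(records):
    """Partition one map task's output by key; the dict stands in for local disk."""
    partitions = {r: [] for r in range(NUM_REDUCERS)}
    for key, value in records:
        # Deterministic toy partitioner: byte sum of the key modulo reducer count.
        partitions[sum(key.encode()) % NUM_REDUCERS].append((key, value))
    return partitions

def fetch(map_output, reducer_id):
    """Stand-in for the HTTP request a reducer sends to a map task's node."""
    return map_output[reducer_id]

def run_reduce(reducer_id, map_outputs):
    """Pull this reducer's partition from every map output, then sum per key."""
    with ThreadPoolExecutor(max_workers=PARALLEL_COPIES) as copiers:
        segments = list(copiers.map(lambda m: fetch(m, reducer_id), map_outputs))
    totals = {}
    for segment in segments:
        for key, value in segment:
            totals[key] = totals.get(key, 0) + value
    return totals

# Three map tasks, each with its own input split of (word, 1) pairs.
splits = [[("a", 1), ("b", 1)], [("a", 1), ("c", 1)], [("b", 1), ("a", 1)]]
map_outputs = [run_map(split) for split in splits]
results = [run_reduce(r, map_outputs) for r in range(NUM_REDUCERS)]
print(results)
```

Each reducer contacts every map output but asks only for its own partition, 
so the intermediate data is copied once, from the node that produced it to 
the node that consumes it.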

On 21.05.2013 at 19:57, John Lilley 
<john.lil...@redpoint.net> wrote:

When MapReduce enters “shuffle” to partition the tuples, I am assuming that it 
writes intermediate data to HDFS.  What replication factor is used for those 
temporary files?
john

--
Kai Voigt
k...@123.org
