Hello Sandeep,

As long as you have enabled short-circuit read as per the documentation [1], I 
expect any Hadoop process will take advantage of it while reading a local 
replica.  However, short-circuit read will not completely eliminate TCP 
connection activity to the DataNode.  There will still be a TCP connection from 
the client to the DataNode to perform a handshake and establish the Unix domain 
socket.  This is a very small payload though compared to the transfer of block 
data over the Unix domain socket.
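
As a rough way to check the "enabled as per the documentation" part, the sketch below parses a Hadoop-style configuration and reports whether the two properties named in [1], dfs.client.read.shortcircuit and dfs.domain.socket.path, are set. The documentation says short-circuit reads must be configured on both the DataNode and the client, so I would run a check like this against the hdfs-site.xml that the task JVMs actually load. The helper names and the sample socket path are only illustrative assumptions, not part of any Hadoop API.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def short_circuit_problems(props):
    """List the reasons this config would not enable short-circuit reads."""
    problems = []
    # Treated as disabled here when the property is absent (an assumption).
    if props.get("dfs.client.read.shortcircuit", "false").lower() != "true":
        problems.append("dfs.client.read.shortcircuit is not true")
    # The Unix domain socket path must be set for the client to find the socket.
    if not props.get("dfs.domain.socket.path"):
        problems.append("dfs.domain.socket.path is not set")
    return problems


# Sample config for illustration; the socket path is a placeholder value.
SAMPLE = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    issues = short_circuit_problems(read_properties(SAMPLE))
    print(issues if issues else "short-circuit properties look enabled")
```

To check a real file, pass its contents to read_properties() in place of SAMPLE. An empty list only means the properties are present; it does not prove that a given read was actually served over the domain socket.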

[1] http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html

--Chris Nauroth

From: sandeep das <yarnhad...@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Wednesday, November 18, 2015 at 10:44 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Yarn application reading from Data node using short-circuit.

Hi,

I was doing some benchmarking and noticed that a lot of TCP connections are 
initiated while running my Pig jobs over YARN (MR2). These TCP connections go 
to the DataNode. Although short-circuit read is enabled on my DataNodes, a lot 
of TCP connections are still being created.

I wanted to check how we can enable the YARN ApplicationMaster to read data 
from the DataNode using short-circuit reads, i.e. Unix domain sockets. I 
believe that will improve the performance of our jobs.


Can someone please help me understand how I can make sure that the MR2 jobs 
created by Pig scripts read data from the DataNode using short-circuit reads 
instead of TCP connections?


Regards,
Sandeep
