Hi Gopal,
After applying the fd, somaxconn and DNS UDP packet-loss configuration you
provided, the results have not changed. Could you help me check whether I set
anything incorrectly? Thanks very much.
> For FDs, we set the related values to 65536 in /etc/security/limits.conf:
* - nofile 65536
* - nproc 65536
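(As far as I understand, limits.conf only applies to new login sessions, so we
can check with `ulimit -n` in a fresh shell as the service user; it should
print 65536 if the nofile setting took effect.)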
> For somaxconn, we set it to 16384 in /etc/sysctl.conf. However, the backlog
> of the shuffle port (15551) is still 50, not 16384, as shown below, and the
> shuffle stage is still slow.
State   Recv-Q  Send-Q  Local Address:Port   Peer Address:Port
LISTEN  0       16384   *:15003              *:*
LISTEN  0       16384   *:50075              *:*
LISTEN  0       128     *:15004              *:*
LISTEN  0       50      *:15551              *:*
> For DNS UDP packet loss, we installed nscd on all nodes.
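(If useful, `nscd -g` on each node prints the cache statistics, so we can
check whether the hosts cache is actually enabled and being hit.)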
For fds, is 65536 a suitable value? And for somaxconn, why is the backlog of
the shuffle port (15551) still 50 rather than 16384? Thanks for your help.
Regards,
Ke
-----Original Message-----
From: Gopal Vijayaraghavan [mailto:[email protected]]
Sent: Thursday, November 23, 2017 4:03 AM
To: [email protected]
Subject: Re: Hive +Tez+LLAP does not have obvious performance improvement than
HIVE + Tez
Hi,
> With these configurations, the cpu utilization of llap is very low.
Low CPU usage has been observed with LLAP due to RPC starvation.
I'm going to assume that the build you're testing is a raw Hadoop 2.7.3 with no
additional patches?
Hadoop-RPC is single-threaded & has a single mutex lock in the 2.7.x branch,
which is fixed in 2.8.
Can you confirm if you have backported either
https://issues.apache.org/jira/browse/HADOOP-11772
or
https://issues.apache.org/jira/browse/HADOOP-12475
to your Hadoop implementation?
The secondary IO starvation comes from a series of HDFS performance problems
which are easily worked around. Here's the root cause:
https://issues.apache.org/jira/browse/HDFS-9146
The current workaround is to make sure that the HDFS & Hive users have a
limits.d entry allowing them to open a large number of sockets (which are fds).
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/templates/hdfs.conf.j2
+
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HIVE/2.1.0.3.0/package/templates/hive.conf.j2
This increases the FD limit for the Hive & HDFS users (YARN also needs it in
the case of Tez, because shuffle is served out of the NodeManager).
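For reference, a minimal limits.d entry along the lines of those templates
looks like the following sketch (the file name, user name and values here are
illustrative; substitute your actual service users and limits):

# /etc/security/limits.d/hdfs.conf
hdfs   -   nofile   32768
hdfs   -   nproc    65536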
After increasing the FDs, LLAP is fast enough to run through 128 socket
openings within the Linux TCP MSL (60 seconds).
The RHEL default for somaxconn is 128, which causes 120s timeouts when HDFS
silently loses packets & the packet timeout has to expire before a retry.
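As a sketch, raising it looks like this (16384 is just an example value):

# echo 'net.core.somaxconn=16384' >> /etc/sysctl.conf
# sysctl -p
# ss -ltn    (for LISTEN sockets, the Send-Q column shows the effective backlog)

Note that somaxconn is only an upper bound on the backlog an application asks
for in listen(), and the kernel applies it at listen() time, so the listening
daemons have to be restarted after raising it.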
To know whether the problem has already happened, check the SNMP counters:
# netstat -s | grep "overflow"
<n> times the listen queue of a socket overflowed
Or, to see whether the kernel has already worked around a SYN flood with
cookies:
# dmesg | grep cookies
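(If it has, there will be a kernel line along the lines of "possible SYN
flooding on port 15551. Sending cookies."; the exact wording varies by kernel
version, and the port number here is just an example.)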
After this, you get hit by the DNS starvation within LLAP, where the DNS
server traffic (port 53 UDP) gets lost (or the DNS server bans an IP due to
the massive number of packets).
This is a JDK-internal detail: the JDK ignores the actual DNS TTL values. It
can be worked around by running nscd or sssd on the host to cache DNS lookups
instead of constantly generating UDP network packets.
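For reference, the host-level cache amounts to roughly this (the directives
are standard nscd.conf ones; the TTL value is illustrative):

# grep hosts /etc/nscd.conf
enable-cache            hosts   yes
positive-time-to-live   hosts   3600
# systemctl enable --now nscd

On the JVM side the same caching is governed by the networkaddress.cache.ttl
security property (or the legacy -Dsun.net.inetaddr.ttl flag), but a
host-level cache avoids the UDP traffic for every process on the box.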
If you need more detail on any of these, ask away. I've had to report and get
backports for several of these issues into HDP (mostly because perf issues are
not generally community backports & whatever has a good workaround stays off
the priority lists).
Cheers,
Gopal