Josh Elser created HDFS-8069:
--------------------------------

             Summary: Tracing implementation on DFSInputStream seriously degrades performance
                 Key: HDFS-8069
                 URL: https://issues.apache.org/jira/browse/HDFS-8069
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.7.0
            Reporter: Josh Elser
            Priority: Critical
I've been doing some testing of Accumulo with HDFS 2.7.0 and have noticed a serious performance impact when Accumulo registers itself as a SpanReceiver.

The context of the test in which I noticed the impact is an Accumulo process reading a series of updates from a write-ahead log. This is just reading a series of Writable objects from a file in HDFS. With tracing enabled, I waited for at least 10 minutes and the server still hadn't read a ~300MB file.

Doing a poor-man's inspection via repeated thread dumps, I always see something like the following:

{noformat}
"replication task 2" daemon prio=10 tid=0x0000000002842800 nid=0x794d runnable [0x00007f6c7b1ec000]
   java.lang.Thread.State: RUNNABLE
	at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
	at org.apache.htrace.Tracer.deliver(Tracer.java:80)
	at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
	- locked <0x000000077a770730> (a org.apache.htrace.impl.MilliSpan)
	at org.apache.htrace.TraceScope.close(TraceScope.java:78)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
	- locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
	- locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
	at java.io.DataInputStream.readByte(DataInputStream.java:265)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
	at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:951)
	... more accumulo code omitted...
{noformat}

What I'm seeing here is that reading a single byte (in WritableUtils.readVLong) causes a new Span to be created and closed (which includes a flush to the SpanReceiver). This results in an extreme number of spans for {{DFSInputStream.byteArrayRead}} just for reading a file from HDFS -- over 700k spans for reading a file of just a few hundred MB.
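To make the blow-up concrete, here is a minimal, self-contained sketch (plain Java, no HTrace dependency; the class and method names are illustrative, not HDFS code). It models a stream that "creates a span" on every read() call, the way the stack trace above shows DFSInputStream opening and closing a TraceScope per read, and compares a byte-at-a-time caller like WritableUtils.readVLong against a block-sized reader:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

public class SpanFloodSketch {

    // Counts one "span" per read() call, mirroring the per-call
    // TraceScope create/close visible in the thread dump.
    public static class TracedStream extends FilterInputStream {
        public final AtomicLong spans = new AtomicLong();
        public TracedStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            spans.incrementAndGet();   // span created, closed, flushed to receiver
            return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            spans.incrementAndGet();
            return super.read(b, off, len);
        }
    }

    // Drains a fileSize-byte stream using reads of the given chunk size
    // (chunk == 1 means single-byte reads) and returns the span count.
    public static long spansFor(int fileSize, int chunk) throws IOException {
        TracedStream in = new TracedStream(new ByteArrayInputStream(new byte[fileSize]));
        if (chunk == 1) {
            while (in.read() != -1) { }        // byte-at-a-time, like readVLong
        } else {
            byte[] buf = new byte[chunk];
            while (in.read(buf, 0, chunk) != -1) { }
        }
        return in.spans.get();
    }

    public static void main(String[] args) throws IOException {
        int size = 1 << 20;                    // 1 MB stand-in for the ~300MB file
        System.out.println("per-byte spans: " + spansFor(size, 1));
        System.out.println("8KB-chunk spans: " + spansFor(size, 8192));
    }
}
```

Even at 1 MB, the per-byte path produces a span per byte (plus one for the EOF read), versus roughly one span per 8 KB block otherwise; scaled to a ~300MB write-ahead log, that is the hundreds of thousands of spans described above. The sketch only illustrates the arithmetic; it says nothing about what the right fix is (span sampling, tracing only at a coarser granularity, or something on the Accumulo SpanReceiver side).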
Perhaps there's something different we need to do for the SpanReceiver in Accumulo? I'm not entirely sure, but this was rather unexpected.

cc/ [~cmccabe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)