Vanco Buca created HDFS-14055:
---------------------------------

             Summary: Over-eager allocation in ByteBufferUtil.fallbackRead
                 Key: HDFS-14055
                 URL: https://issues.apache.org/jira/browse/HDFS-14055
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: fs
            Reporter: Vanco Buca
The heap-memory path of ByteBufferUtil.fallbackRead ([see master branch code here|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java#L95]) massively over-allocates memory when the underlying input stream returns data in smaller chunks. This happens on a regular basis when using the S3 input stream as input, and the resulting behavior is roughly O(N^2). In a recent debug session, we were trying to read 6MB but getting 16K at a time. The code would:

* allocate 16M, use the first 16K
* allocate 16M - 16K, use the first 16K of that
* allocate 16M - 32K, use the first 16K of that
* (etc)

The patch is simple; a sketch of the same idea follows after the patch text. Here's the text version of the patch:

{code}
@@ -88,10 +88,17 @@ public final class ByteBufferUtil {
         buffer.flip();
       } else {
         buffer.clear();
-        int nRead = stream.read(buffer.array(),
-          buffer.arrayOffset(), maxLength);
-        if (nRead >= 0) {
-          buffer.limit(nRead);
+        int totalRead = 0;
+        while (totalRead < maxLength) {
+          final int nRead = stream.read(buffer.array(),
+            buffer.arrayOffset() + totalRead, maxLength - totalRead);
+          if (nRead <= 0) {
+            break;
+          }
+          totalRead += nRead;
+        }
+        if (totalRead >= 0) {
+          buffer.limit(totalRead);
           success = true;
         }
       }
{code}

So, essentially, do the same thing that the code in the direct-memory path is already doing.
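For reference, here is a minimal, self-contained sketch of the looped-read idea the patch describes, outside the Hadoop source tree (the class and method names are hypothetical, chosen only for illustration):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration of the looped-read approach from the patch above:
// allocate the heap buffer once and keep reading into it until it is full or
// the stream is exhausted, instead of re-allocating on every short read.
public final class LoopedReadSketch {
  public static ByteBuffer readUpTo(InputStream stream, int maxLength)
      throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(maxLength); // single allocation
    int totalRead = 0;
    while (totalRead < maxLength) {
      int nRead = stream.read(buffer.array(),
          buffer.arrayOffset() + totalRead, maxLength - totalRead);
      if (nRead <= 0) {
        break; // EOF or no progress: keep whatever has been read so far
      }
      totalRead += nRead;
    }
    buffer.limit(totalRead); // expose only the bytes actually read
    return buffer;
  }
}
{code}

With a stream that delivers 16K per read and a 16M request, this performs on the order of a thousand reads into a single buffer, rather than a thousand fresh multi-megabyte allocations.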