Vanco Buca created HDFS-14055:
---------------------------------

             Summary: Over-eager allocation in ByteBufferUtil.fallbackRead
                 Key: HDFS-14055
                 URL: https://issues.apache.org/jira/browse/HDFS-14055
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: fs
            Reporter: Vanco Buca


The heap-memory path of ByteBufferUtil.fallbackRead ([see master branch code 
here|[https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java#L95])]
 massively overallocates memory when the underlying input stream returns data 
in smaller chunks. This happens on a regular basis when using the S3 input 
stream as input.

The behavior is an O(N^2)-ish. In a recent debug session, we were trying to 
read 6MB, but getting 16K at a time. The code would:
 * allocate 16M, use the first 16K
 * allocate 16M - 16K, use the first 16K of that
 * allocate 16M - 32K, use the first 16K of that
 * (etc)

The patch is simple. Here's the text version of the patch:
{code}
@@ -88,10 +88,17 @@ public final class ByteBufferUtil {
         buffer.flip();
       } else {
         buffer.clear();
-        int nRead = stream.read(buffer.array(),
-          buffer.arrayOffset(), maxLength);
-        if (nRead >= 0) {
-          buffer.limit(nRead);
+        int totalRead = 0;
+        while (totalRead < maxLength) {
+          final int nRead = stream.read(buffer.array(),
+            buffer.arrayOffset() + totalRead, maxLength - totalRead);
+          if (nRead <= 0) {
+            break;
+          }
+          totalRead += nRead;
+        }
+        if (totalRead >= 0) {
+          buffer.limit(totalRead);
           success = true;
         }
       }
{code}

so, essentially, do the same thing that the code in the direct memory path is 
doing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Reply via email to