BELUGA BEHR created HBASE-20197: ----------------------------------- Summary: Review of ByteBufferWriterOutputStream.java Key: HBASE-20197 URL: https://issues.apache.org/jira/browse/HBASE-20197 Project: HBase Issue Type: Improvement Components: hbase Affects Versions: 1.4.2, 2.0.0 Reporter: BELUGA BEHR Attachments: HBASE-20197.1.patch
In looking at this class, two things caught my eye. # Default buffer size of 4K # Re-sizing of buffer on demand Java's {{BufferedOutputStream}} uses an internal buffer size of 8K on modern JVMs. This is due to various bench-marking that showed optimal performance at this level. The Re-sizing buffer looks a bit "unsafe": {code:java} public void write(ByteBuffer b, int off, int len) throws IOException { byte[] buf = null; if (len > TEMP_BUF_LENGTH) { buf = new byte[len]; } else { if (this.tempBuf == null) { this.tempBuf = new byte[TEMP_BUF_LENGTH]; } buf = this.tempBuf; } ... } {code} If this method gets one call with a 'len' of 4000, then 4001, then 4002, then 4003, etc. then the 'tempBuf' will be re-created many times. Also, it seems unsafe to create a buffer as large as the 'len' input. This could theoretically lead to an internal buffer of 2GB for each instance of this class. I propose: # Increase the default buffer size to 8K # Create the buffer once and chunk the output instead of loading data into a single array and writing it to the output stream. -- This message was sent by Atlassian JIRA (v7.6.3#76005)