[ https://issues.apache.org/jira/browse/HBASE-20197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
BELUGA BEHR reassigned HBASE-20197:
-----------------------------------

    Assignee: BELUGA BEHR

> Review of ByteBufferWriterOutputStream.java
> -------------------------------------------
>
>                 Key: HBASE-20197
>                 URL: https://issues.apache.org/jira/browse/HBASE-20197
>             Project: HBase
>          Issue Type: Improvement
>          Components: hbase
>    Affects Versions: 2.0.0, 1.4.2
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HBASE-20197.1.patch
>
>
> In looking at this class, two things caught my eye.
> # Default buffer size of 4K
> # Re-sizing of buffer on demand
>
> Java's {{BufferedOutputStream}} uses an internal buffer size of 8K on modern
> JVMs. This is due to various benchmarking that showed optimal performance
> at this level.
> The re-sizing buffer looks a bit "unsafe":
>
> {code:java}
> public void write(ByteBuffer b, int off, int len) throws IOException {
>   byte[] buf = null;
>   if (len > TEMP_BUF_LENGTH) {
>     buf = new byte[len];
>   } else {
>     if (this.tempBuf == null) {
>       this.tempBuf = new byte[TEMP_BUF_LENGTH];
>     }
>     buf = this.tempBuf;
>   }
>   ...
> }
> {code}
> If this method gets one call with a 'len' of 4000, then 4001, then 4002, then
> 4003, etc. then the 'tempBuf' will be re-created many times. Also, it seems
> unsafe to create a buffer as large as the 'len' input. This could
> theoretically lead to an internal buffer of 2GB for each instance of this
> class.
> I propose:
> # Increase the default buffer size to 8K
> # Create the buffer once and chunk the output instead of loading data into a
> single array and writing it to the output stream.
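
For illustration, the second proposal (allocate the buffer once and chunk the output) might look roughly like the sketch below. This is not the attached patch or the actual HBase class: the class name ChunkedByteBufferWriter, its constructors, and the DEFAULT_BUFFER_SIZE constant are hypothetical, and the sketch assumes 'off' is an absolute index into the ByteBuffer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical stand-in for the class under review; not the real HBase code.
final class ChunkedByteBufferWriter {
    // Assumed default, matching the 8K proposed in the issue.
    static final int DEFAULT_BUFFER_SIZE = 8 * 1024;

    private final OutputStream out;
    private final byte[] buf; // allocated once, reused for every write

    ChunkedByteBufferWriter(OutputStream out) {
        this(out, DEFAULT_BUFFER_SIZE);
    }

    ChunkedByteBufferWriter(OutputStream out, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.out = out;
        this.buf = new byte[bufferSize];
    }

    // Copies len bytes starting at absolute index off, in pieces no larger
    // than the fixed buffer, so memory use does not grow with len.
    void write(ByteBuffer b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.capacity() - off) {
            throw new IndexOutOfBoundsException("off=" + off + ", len=" + len);
        }
        // Work on a duplicate so the caller's position and limit are untouched.
        ByteBuffer src = b.duplicate();
        src.clear();          // position = 0, limit = capacity
        src.limit(off + len);
        src.position(off);
        while (src.hasRemaining()) {
            int n = Math.min(src.remaining(), buf.length);
            src.get(buf, 0, n);   // fill at most one buffer's worth
            out.write(buf, 0, n); // hand that chunk to the underlying stream
        }
    }
}
```

With this shape, a call with a 'len' of 2GB still touches only the one fixed-size array; the cost is more calls to the underlying stream's write for large inputs.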