Daryn Sharp created HDFS-10662:
----------------------------------

             Summary: Optimize UTF8 string/byte conversions
                 Key: HDFS-10662
                 URL: https://issues.apache.org/jira/browse/HDFS-10662
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: hdfs
            Reporter: Daryn Sharp
            Assignee: Daryn Sharp


String/byte conversions may take either a Charset instance or its canonical
name.  One might expect the Charset instance to be faster because it avoids
looking up and instantiating a Charset, but it is not.  The variants that take
the canonical string name cache the string encoder/decoder (obtained from a
Charset), which results in better performance.
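
For reference, the two conversion variants being compared can be sketched as
follows.  This is only an illustration, not the patch itself: the class and
method names (Utf8Conversions, decodeWithName, etc.) are hypothetical.  Both
variants produce identical results; the difference described above is in
encoder/decoder caching inside the JDK, which this snippet does not measure.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class Utf8Conversions {
    // Canonical charset name, as opposed to a Charset instance.
    private static final String UTF8_NAME = "UTF-8";

    // Variant 1: pass a Charset instance.
    public static String decodeWithCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static byte[] encodeWithCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Variant 2: pass the canonical name. These overloads declare a checked
    // exception, but UTF-8 is required to be present on every Java platform.
    public static String decodeWithName(byte[] bytes) {
        try {
            return new String(bytes, UTF8_NAME);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 must be supported", e);
        }
    }

    public static byte[] encodeWithName(String s) {
        try {
            return s.getBytes(UTF8_NAME);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 must be supported", e);
        }
    }

    public static void main(String[] args) {
        String path = "/user/example/file";  // made-up sample path
        byte[] byName = encodeWithName(path);
        byte[] byCharset = encodeWithCharset(path);
        // Both variants yield the same bytes and round-trip to the same string.
        System.out.println(Arrays.equals(byName, byCharset));
        System.out.println(decodeWithName(byName).equals(decodeWithCharset(byCharset)));
    }
}
```

The name-based overloads throw the checked UnsupportedEncodingException, so
wrapping them once in a helper keeps call sites clean.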

LOG4J2-935 describes a real-world performance boost.  In my micro-benchmarks
the runtime improvement on JDK 7/8 was marginal.  However, for a 16 byte path,
using the canonical name generated 50% less garbage.  For a 64 byte path, 25%
of the garbage.  Given the sheer number of times that paths are (re)parsed, the
cost adds up quickly.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
