[
https://issues.apache.org/jira/browse/THRIFT-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862456#action_12862456
]
Doug Cutting commented on THRIFT-765:
-------------------------------------
Does using a switch instead of an 'if else' in decode help any?
Another thing to try might be to use a char[] buffer instead of StringBuffer
and directly implement UTF-16 coding. Codepoints below 0x10000 are encoded
as a single Java char. Codepoints at or above that are encoded as two chars:
the first is 0xD800 plus the high-order ten bits of (codepoint - 0x10000),
and the second is 0xDC00 plus the low-order ten bits of (codepoint - 0x10000).
The one-, two-, and three-byte cases of UTF-8 all produce codepoints below
0x10000 and so will always generate a single char, while the four-byte case
will only and always generate a codepoint of 0x10000 or above. So you don't
need that test if you combine the UTF-16 encoding with the UTF-8 decoding.
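A minimal sketch of what that combined decode might look like (class and
method names are illustrative, not from the Thrift patch; validation of
malformed input is omitted for brevity):

```java
// Decode UTF-8 bytes straight into a char[] with inline UTF-16 surrogate
// emission, switching on the high bits of the lead byte.
// Assumes well-formed UTF-8 input (no validation).
public class Utf8Decoder {
    public static String decode(byte[] buf) {
        // One char per byte is a safe upper bound (a 4-byte sequence
        // yields only 2 chars).
        char[] out = new char[buf.length];
        int outLen = 0;
        int i = 0;
        while (i < buf.length) {
            int b = buf[i] & 0xFF;
            int cp;
            switch (b >> 4) {
                case 0: case 1: case 2: case 3:
                case 4: case 5: case 6: case 7:
                    // 1-byte sequence: 0xxxxxxx
                    cp = b;
                    i += 1;
                    break;
                case 12: case 13:
                    // 2-byte sequence: 110xxxxx 10xxxxxx
                    cp = ((b & 0x1F) << 6) | (buf[i + 1] & 0x3F);
                    i += 2;
                    break;
                case 14:
                    // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
                    cp = ((b & 0x0F) << 12) | ((buf[i + 1] & 0x3F) << 6)
                       | (buf[i + 2] & 0x3F);
                    i += 3;
                    break;
                default:
                    // 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                    cp = ((b & 0x07) << 18) | ((buf[i + 1] & 0x3F) << 12)
                       | ((buf[i + 2] & 0x3F) << 6) | (buf[i + 3] & 0x3F);
                    i += 4;
                    break;
            }
            if (cp < 0x10000) {
                // 1-, 2-, and 3-byte UTF-8 always yields a single char.
                out[outLen++] = (char) cp;
            } else {
                // 4-byte UTF-8 always yields a surrogate pair.
                int v = cp - 0x10000;
                out[outLen++] = (char) (0xD800 | (v >> 10));
                out[outLen++] = (char) (0xDC00 | (v & 0x3FF));
            }
        }
        return new String(out, 0, outLen);
    }
}
```

Note that only the four-byte branch can reach the surrogate path, so the
cp < 0x10000 test could even be folded into the switch arms.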
> Improved string encoding and decoding performance
> -------------------------------------------------
>
> Key: THRIFT-765
> URL: https://issues.apache.org/jira/browse/THRIFT-765
> Project: Thrift
> Issue Type: Improvement
> Components: Library (Java)
> Affects Versions: 0.2
> Reporter: Bryan Duxbury
> Assignee: Bryan Duxbury
> Fix For: 0.3
>
> Attachments: thrift-765-redux.patch, thrift-765.patch
>
>
> One of the most consistent time-consuming spots of Thrift serialization and
> deserialization is string encoding. For some inscrutable reason,
> String.getBytes("UTF-8") is slow.
> However, it's recently been brought to my attention that DataOutputStream's
> writeUTF method has a faster implementation of UTF-8 encoding and decoding.
> We should use this style of encoding.
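For reference, the "writeUTF style" of encoding amounts to walking the
string's chars and emitting UTF-8 bytes directly. A hedged sketch follows
(names are illustrative; unlike DataOutputStream.writeUTF, which emits
modified UTF-8 with a 2-byte length prefix, NUL as two bytes, and surrogate
pairs as two 3-byte sequences, this emits standard UTF-8):

```java
// Encode a String to UTF-8 by iterating chars directly, avoiding the
// overhead of String.getBytes("UTF-8"). Assumes well-formed input
// (every high surrogate is followed by a low surrogate).
public class Utf8Encoder {
    public static byte[] encode(String s) {
        // 3 bytes per char is a safe upper bound: a surrogate pair
        // (2 chars) becomes at most 4 bytes.
        byte[] out = new byte[s.length() * 3];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                // 1-byte sequence
                out[n++] = (byte) c;
            } else if (c < 0x800) {
                // 2-byte sequence
                out[n++] = (byte) (0xC0 | (c >> 6));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.length()) {
                // High surrogate: combine with the low surrogate into one
                // codepoint, then emit a 4-byte sequence.
                int cp = 0x10000 + ((c - 0xD800) << 10)
                       + (s.charAt(++i) - 0xDC00);
                out[n++] = (byte) (0xF0 | (cp >> 18));
                out[n++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[n++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                // 3-byte sequence
                out[n++] = (byte) (0xE0 | (c >> 12));
                out[n++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return java.util.Arrays.copyOf(out, n);
    }
}
```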