[ 
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623114#action_12623114
 ] 

Noble Paul commented on THRIFT-110:
-----------------------------------

bq. Take a look at the DenseProtocol

Our own protocol is extremely compact. I am looking for a true cross-language
library so that all our users can take advantage of that. Thrift has the
potential, but it is very inefficient compared to Protocol Buffers in terms of
payload size.

bq. I'd also like to see the IDL extended so that hints can be provided for 
compactness - such as "small" or "small unsigned" or "hash"

The first step is to add 'unsigned'. An unsigned field can only contain
non-negative numbers, so we can use a variable-length format for it.
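A minimal Java sketch of what a variable-length encoding for an 'unsigned' field could look like, assuming protobuf-style base-128 varints (7 data bits per byte, high bit set while more bytes follow). The class and method names are hypothetical; nothing here is an existing Thrift API.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper (not a Thrift API): protobuf-style base-128 varint,
// the kind of encoding an 'unsigned' IDL field could use.
final class UnsignedVarint {
    // Emits 7 bits per byte, least-significant group first.
    // The high bit (0x80) of each byte is set when more bytes follow.
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("unsigned field must be non-negative: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reverses encode(): accumulate 7-bit groups until a byte without 0x80.
    static long decode(byte[] buf) {
        long result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        // A fixed-width i32 always costs 4 bytes; small values need fewer here.
        for (long v : new long[] {1, 127, 128, 300, 1_000_000}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

With this scheme values below 128 take one byte and 300 encodes as 0xAC 0x02, whereas a fixed i32 always takes four bytes.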

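Protocol Buffers use something called ZigZag encoding for signed integers,
which maps small negative numbers to small unsigned ones so that they can also
be written in a variable-length format.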
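For reference, a sketch of the ZigZag mapping as Protocol Buffers defines it for sint32/sint64 fields. The ZigZag class name is hypothetical; the mapped value would then be written as an unsigned varint.

```java
// ZigZag mapping as used by Protocol Buffers for sint32/sint64:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...  so values of small magnitude
// (positive or negative) become small unsigned numbers.
final class ZigZag {
    // (n << 1) shifts the value left one bit; (n >> 63) is all ones for
    // negative n and all zeros otherwise, so the XOR folds the sign into bit 0.
    static long encode(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Inverse: the low bit carries the sign, the remaining bits the magnitude.
    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, -1, 1, -2, 2, -64, 63}) {
            System.out.println(v + " -> " + encode(v));
        }
    }
}
```

Running main prints 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, -64 -> 127 and 63 -> 126, so every value in [-64, 63] fits in a single varint byte.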

Another problem is with the list/map types. They are expected to be
homogeneous, which I guess is not acceptable for us. So type information needs
to be encoded with each element.
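A rough sketch of a heterogeneous list where each element is prefixed with its own type byte. The tag values, the i32 element count, and the TaggedList class are assumptions for illustration only, not Thrift's TType constants or wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of a heterogeneous list: every element carries its own type tag.
// The tag values below are hypothetical, not Thrift's TType constants.
final class TaggedList {
    static final byte TAG_NULL = 0, TAG_INT = 1, TAG_LONG = 2, TAG_STRING = 3, TAG_BOOL = 4;

    static byte[] write(List<?> items) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(items.size());
            for (Object o : items) {
                if (o == null) {
                    out.writeByte(TAG_NULL);
                } else if (o instanceof Integer) {
                    out.writeByte(TAG_INT);
                    out.writeInt((Integer) o);
                } else if (o instanceof Long) {
                    out.writeByte(TAG_LONG);
                    out.writeLong((Long) o);
                } else if (o instanceof String) {
                    byte[] utf8 = ((String) o).getBytes(StandardCharsets.UTF_8);
                    out.writeByte(TAG_STRING);
                    out.writeInt(utf8.length);
                    out.write(utf8);
                } else if (o instanceof Boolean) {
                    out.writeByte(TAG_BOOL);
                    out.writeBoolean((Boolean) o);
                } else {
                    throw new IllegalArgumentException("unsupported element type: " + o.getClass());
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader cannot know what follows without the per-element tag.
    static List<Object> read(byte[] buf) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf))) {
            int n = in.readInt();
            List<Object> items = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                byte tag = in.readByte();
                switch (tag) {
                    case TAG_NULL: items.add(null); break;
                    case TAG_INT: items.add(in.readInt()); break;
                    case TAG_LONG: items.add(in.readLong()); break;
                    case TAG_STRING: {
                        byte[] utf8 = new byte[in.readInt()];
                        in.readFully(utf8);
                        items.add(new String(utf8, StandardCharsets.UTF_8));
                        break;
                    }
                    case TAG_BOOL: items.add(in.readBoolean()); break;
                    default: throw new IllegalArgumentException("unknown tag: " + tag);
                }
            }
            return items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cost is one extra byte per element; a writer could pack the tag together with small values in that same byte to win some of it back.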

Writing a string with its length first as an i32 consumes four bytes for the
length alone, which is inefficient for small strings.
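A sketch of the alternative: prefix the UTF-8 bytes with a varint length instead of a fixed i32, assuming the same base-128 varint scheme that protobuf uses. CompactString and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: length-prefix a string with a varint instead of an i32.
final class CompactString {
    // Base-128 varint; lengths are never negative, so no sign handling is needed.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Varint length followed by the UTF-8 bytes.
    static byte[] writeCompact(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Size of the same string with a fixed 4-byte (i32) length prefix.
    static int fixedI32Size(String s) {
        return 4 + s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"id", "name", "a much longer field value"}) {
            System.out.println(s + ": i32 prefix " + fixedI32Size(s)
                    + " bytes, varint prefix " + writeCompact(s).length + " bytes");
        }
    }
}
```

A length under 128 costs one byte instead of four, so "id" shrinks from 6 bytes to 3; strings of 128 to 16,383 bytes pay two.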

Writing a hash is not very efficient for strings either; a long may take 8
bytes. We have a special type called extern string. The writer remembers the
strings it has already written and assigns each one an index. Subsequent writes
of a string that has already been written just write that index.
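A minimal sketch of the extern string idea as described above: the writer keeps a map from string to index and writes a back-reference on repeats. The wire layout (varint 0 marks a new string, varint k >= 1 refers to index k - 1) and the class name are assumptions, not the NamedListCodec format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch of the "extern string" idea: a string is written in full the first
// time and assigned the next index; repeats are written as that index only.
// Wire layout is hypothetical (not the NamedListCodec format):
//   varint 0          -> new string follows: varint length + UTF-8 bytes
//   varint k (k >= 1) -> reference to the string with index k - 1
final class ExternStringWriter {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    private void writeVarint(int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    void writeString(String s) {
        Integer index = indexOf.get(s);
        if (index != null) {
            writeVarint(index + 1);       // back-reference to an earlier string
            return;
        }
        indexOf.put(s, indexOf.size());   // first distinct string gets index 0
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(0);                   // marker: a new string follows
        writeVarint(utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }

    // Convenience: encode a sequence of strings with one shared index table.
    static byte[] encodeAll(String... strings) {
        ExternStringWriter w = new ExternStringWriter();
        for (String s : strings) {
            w.writeString(s);
        }
        return w.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeAll("category", "author", "category", "category");
        System.out.println(bytes.length + " bytes");
    }
}
```

In main, "category" costs 10 bytes the first time and 1 byte on each repeat, so the four writes take 20 bytes. A matching reader would have to rebuild the same index table in the same order.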

> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>
> Thrift is not very compact in writing out data compared to (say) protobuf. It
> does not have the concept of variable-length integers and various other
> possible optimizations. In Solr we use a lot of such optimizations to make a
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type and value in the same byte, very
> fast writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it
> be as compact as the current Java version.

-- 
This message is automatically generated by JIRA.