[ 
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623450#action_12623450
 ] 

David Reiss commented on THRIFT-110:
------------------------------------

bq.This first step is to add 'unsigned' . unsigned means it can only contain 
+ve numbers so we can use a variable length format.

I'm a little hesitant to do this.  For one thing, it would mean that clients in 
languages that didn't support this encoding would be unable to communicate with 
servers that used it.  This would require keeping track of which features were 
used by which servers, eliminating the current nice situation that any Thrift 
client can talk to any Thrift server.  Also, variable-length encoding and 
decoding add significant extra overhead in dynamic languages, so using the 
protocol accelerator modules would become pretty much mandatory.
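
To make the cost concrete, here is a minimal sketch of one common 
variable-length scheme for unsigned integers (base-128, 7 payload bits per 
byte, high bit set on every byte except the last).  This is an assumption 
about what the proposal would look like, not an existing Thrift encoding, and 
the function names are hypothetical.  Note the per-byte loop, which is the 
part that gets expensive in an interpreted language:

```python
def encode_uvarint(n):
    """Encode a non-negative int as base-128 bytes, low 7 bits first."""
    if n < 0:
        raise ValueError("only unsigned values can be encoded")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def decode_uvarint(data, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
```

Values below 128 take one byte instead of the four a fixed-width i32 uses, 
but a reader that does not know this encoding cannot parse the stream at all.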

bq.The idea is that the protocol must not dictate how users must use collection 
APIs.

You can use the APIs however you want within your program, but the data model 
supported by Thrift is necessarily going to be less expressive than whatever 
your language of choice uses.  For example, we don't support inheritance of 
data structures or shared structure.

bq.We must handle all usecases gracefully

I disagree.  Designing an IDL requires striking a balance between the richness 
of data structures that your IDL can support and the richness of languages that 
can support your IDL.

bq.Other problem is with the list/map types . They are expected to be 
homogenous which is not acceptable (I guess so). So type information needs to 
be encoded with each element

There are a few problems with this.  First, it won't work with C++.  Second, 
it would result in a huge increase in serialized size if we allowed all 
collections to have variant contents.  Third, the amount of extra metadata 
required to have a list that could contain two different types of structures 
would be huge.  You would have to include the full name of the structure 
class.  In Java, you would have to do a get-class-by-name to instantiate each 
element, which is super slow.
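
As a rough illustration of the size point, the sketch below compares a list 
that writes its element type once in the header with one that tags every 
element.  The layout is only loosely modeled on a fixed-width binary 
encoding, and the type id and function names are assumptions made for the 
example:

```python
import struct

I32 = 8  # assumed one-byte type id for a 32-bit integer


def write_homogeneous(values):
    """Header carries the element type once: type (1) + count (4)."""
    out = struct.pack("!bi", I32, len(values))
    for v in values:
        out += struct.pack("!i", v)       # 4 bytes per element
    return out


def write_tagged(values):
    """Header carries only the count; every element repeats its type."""
    out = struct.pack("!i", len(values))
    for v in values:
        out += struct.pack("!bi", I32, v)  # 1 + 4 bytes per element
    return out
```

For 100 integers this is 405 bytes versus 504, and that is the cheapest case: 
a one-byte tag on a primitive.  Elements that are structures would need a 
class name per element rather than a single byte.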


The dense protocol is complete for (de)serializing standalone structures.  It 
can't be used as the protocol for calls/replies, but that is not a big change.

In order to implement extern strings, I would create a separate list of 
strings, store indexes into the list in your main data structure, and send both 
across.
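
A minimal sketch of that approach, done at the application level rather than 
in the protocol (the helper names are hypothetical; in a Thrift struct the 
two results would simply be a list of strings and a list of integers):

```python
def externalize(values):
    """Replace repeated strings with indexes into a table of unique strings."""
    table = []      # unique strings, in first-seen order
    index_of = {}   # string -> position in table
    indexes = []    # one entry per input string
    for s in values:
        if s not in index_of:
            index_of[s] = len(table)
            table.append(s)
        indexes.append(index_of[s])
    return table, indexes


def internalize(table, indexes):
    """Rebuild the original sequence of strings on the receiving side."""
    return [table[i] for i in indexes]
```

For example, `externalize(["id", "name", "id"])` gives the table 
`["id", "name"]` and the indexes `[0, 1, 0]`; the receiver calls 
`internalize` on the pair to get the original list back.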

> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>
> Thrift is not very compact in writing out data as (say protobuf) . It does 
> not have the concept of variable length integers and various other 
> optimizations possible . In Solr we use a lot of such optimizations to make a 
> very compact payload. Thrift has a lot common with that format.
> It is all done in a single class
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value  in same byte, very fast 
> writes of Strings, externalizable strings etc 
> We could use a thrift format for non-java clients and I would like to see it 
> as compact as the current java version

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
