[
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623450#action_12623450
]
David Reiss commented on THRIFT-110:
------------------------------------
bq.This first step is to add 'unsigned' . unsigned means it can only contain
+ve numbers so we can use a variable length format.
I'm a little hesitant to do this. For one thing, it would mean that clients in
languages that didn't support this encoding would be unable to communicate with
servers that used it. This would require keeping track of which features were
used by which servers, eliminating the current nice situation that any Thrift
client can talk to any Thrift server. Also, variable-length encoding and
decoding add significant extra overhead in dynamic languages, so using the
protocol accelerator modules would become pretty much mandatory.
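To make the overhead concern concrete, here is a minimal sketch of one common variable-length scheme, a base-128 varint that stores 7 payload bits per byte. The issue does not specify a wire format, so the scheme and the function names `encode_uvarint` and `decode_uvarint` are assumptions for illustration only. Note the per-byte loop, which is the part that gets expensive in an interpreted language.

```python
def encode_uvarint(value):
    """Encode a non-negative integer as a base-128 varint (assumed format)."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_uvarint(data, offset=0):
    """Decode a varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Small values take one byte instead of a fixed four or eight, at the cost of a branch and a shift per byte on both ends.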
bq.The idea is that the protocol must not dictate how users must use collection
APIs.
You can use the APIs however you want within your program, but the data model
supported by Thrift is necessarily going to be less expressive than whatever
your language of choice uses. For example, we don't support inheritance of
data structures or shared structure.
bq.We must handle all usecases gracefully
I disagree. Designing an IDL requires striking a balance between the richness
of data structures that your IDL can support and the richness of languages that
can support your IDL.
bq.Other problem is with the list/map types . They are expected to be
homogenous which is not acceptable (I guess so). So type information needs to
be encoded with each element
There are a few problems with this. First, it won't work with C++. Second, it
would result in a huge increase in serialized size if we allowed all
collections to have variant contents. Third, the amount of extra metadata
required to have a list that could contain two different types of structures
would be huge. You would have to include the full name of the structure class.
In Java, you would have to do a get-class-by-name to instantiate each element,
which is super slow.
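As a rough illustration of the size argument, the sketch below compares a list of 32-bit integers written with one element-type tag in the list header against the same list written with a tag on every element. The layout (one tag byte, 4-byte big-endian count and values) and the names `I32_TAG`, `encode_homogeneous`, and `encode_tagged` are assumptions for this example, not a description of any actual Thrift protocol implementation. This is also the cheapest case: a one-byte tag on a primitive. Tagging structures by class name would cost far more per element.

```python
import struct

I32_TAG = 8  # hypothetical one-byte type tag for a 32-bit integer


def encode_homogeneous(values):
    """One type tag and a count in the header, then raw 4-byte elements."""
    out = struct.pack(">bi", I32_TAG, len(values))
    for v in values:
        out += struct.pack(">i", v)
    return out


def encode_tagged(values):
    """A count in the header, then a type tag in front of every element."""
    out = struct.pack(">i", len(values))
    for v in values:
        out += struct.pack(">bi", I32_TAG, v)
    return out


values = list(range(100))
homogeneous_size = len(encode_homogeneous(values))  # 5 + 4 * 100 bytes
tagged_size = len(encode_tagged(values))            # 4 + 5 * 100 bytes
```

For 100 integers that is 405 bytes versus 504, roughly a 25% increase before any structure names are involved.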
The dense protocol is complete for (de)serializing standalone structures. It
can't be used as the protocol for calls/replies, but that is not a big change.
In order to implement extern strings, I would create a separate list of
strings, store indexes into the list in your main data structure, and send both
across.
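The extern-strings approach described above could look something like the following sketch: each distinct string is stored once in a table, and the main data carries integer indexes into it. The helper names `externalize_strings` and `restore_strings` are hypothetical. In Thrift terms, the two return values would correspond to something like a list of strings and a list of integers sent together in one structure.

```python
def externalize_strings(values):
    """Return (table, indexes): unique strings plus one index per input value."""
    table = []      # each distinct string, in first-seen order
    index_of = {}   # string -> position in table
    indexes = []
    for s in values:
        if s not in index_of:
            index_of[s] = len(table)
            table.append(s)
        indexes.append(index_of[s])
    return table, indexes


def restore_strings(table, indexes):
    """Rebuild the original sequence of strings on the receiving side."""
    return [table[i] for i in indexes]
```

Repeated strings then cost one small integer each instead of a full copy, which matters most for payloads with many repeated field names or keys.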
> A more compact format
> ----------------------
>
> Key: THRIFT-110
> URL: https://issues.apache.org/jira/browse/THRIFT-110
> Project: Thrift
> Issue Type: Improvement
> Reporter: Noble Paul
>
> Thrift is not very compact in writing out data as (say protobuf) . It does
> not have the concept of variable length integers and various other
> optimizations possible . In Solr we use a lot of such optimizations to make a
> very compact payload. Thrift has a lot common with that format.
> It is all done in a single class
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in same byte, very fast
> writes of Strings, externalizable strings etc
> We could use a thrift format for non-java clients and I would like to see it
> as compact as the current java version