[jira] Commented: (THRIFT-110) A more compact format

Bryan Duxbury (JIRA) Tue, 07 Oct 2008 22:46:40 -0700

    [ 
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637781#action_12637781
 ]


Bryan Duxbury commented on THRIFT-110:
--------------------------------------

Regarding negative field IDs - I agree, I don't think we need to be terribly 
worried about it. I'd be fine with just making it clear in the documentation. 
Note that we can't necessarily disallow them, since it would mean structs 
without specified field IDs couldn't be serialized by this protocol.

Doubles - yeah, let's just not worry about compressing those at all at this 
point.

Containers - I'm not sure that requiring all of the elements to be of a single 
type is that big of a deal. If the options are storing an additional byte per 
element to indicate the type encoding vs just choosing the best compromise 
encoding for all of them, I think the compromise is probably best. The only 
time it really becomes a problem is with ints, and that's easily resolvable. If 
the numbers are all positive and shorter than 64 bits, then we can use positive 
varints. If they're all negative, we can use negative varints. If there's a mix 
and the values are no bigger than the threshold, we can use zigzag encoding. If 
none of those compromises yields the best size, then we can always fall back to 
fixed size. 
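
To make the selection above concrete, here is a minimal sketch in Python. It is 
only an illustration, not the encoding from the attached spec: the function 
names, the 8-byte fixed width, and the idea of writing an all-negative list as 
the varint of each magnitude are assumptions made for the example.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 data bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("varint requires a non-negative value")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag(n: int, bits: int = 64) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def choose_int_encoding(values, fixed_width: int = 8):
    """Return (name, total_bytes) of the smallest single encoding for the whole list.

    Assumes every value fits in a signed 64-bit integer.
    """
    values = list(values)

    def varint_total(nums):
        return sum(len(encode_varint(x)) for x in nums)

    candidates = {}
    if all(v >= 0 for v in values):
        candidates["positive varint"] = varint_total(values)
    if all(v < 0 for v in values):
        # Assumption: a "negative varint" stores the magnitude of each value.
        candidates["negative varint"] = varint_total(-v for v in values)
    candidates["zigzag"] = varint_total(zigzag(v) for v in values)
    candidates["fixed"] = fixed_width * len(values)

    best = min(candidates, key=candidates.get)
    return best, candidates[best]
```

With these assumptions, `choose_int_encoding([1, 2, 100])` picks positive 
varints (3 bytes), `choose_int_encoding([-3, 5, -70])` picks zigzag (4 bytes), 
and a list of values near the 64-bit limit falls back to fixed size.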

String externalization - the problem is how you design the scoping. When you're 
doing an RPC call, there's only one message, so it's obvious where to put the 
string table. When you are serializing a struct directly, it's impossible to 
discern that from when you're serializing a struct within an RPC call, so it's 
difficult to figure out where to put the string table. The last thing you want 
to do is have a string table per struct, unless you really have that many 
duplicated strings per struct. Additionally, this spec was designed with the 
intention that implementation would not require any additions to the Thrift IDL 
(such as the extern keyword). The string externalization described in this spec 
would essentially allow any string to be externalized if it was repeated. (As I 
think about it some more, there might be some ways to make the string table 
inline everywhere, alleviating this problem... more on this later.)
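
One possible reading of an "inline" string table, sketched in Python purely for 
illustration: the first occurrence of a string is written as a literal and 
implicitly assigned the next index, and later occurrences are written as a 
reference to that index. The class names and the ("lit", ...) / ("ref", ...) 
token shapes are hypothetical and are not taken from the spec; the scope of the 
table is simply the lifetime of the writer object.

```python
class InlineStringWriter:
    """Emits a literal the first time a string is seen, a back-reference afterwards."""

    def __init__(self):
        self._index = {}  # string -> position in the implicit table

    def write(self, s: str):
        if s in self._index:
            return ("ref", self._index[s])
        self._index[s] = len(self._index)
        return ("lit", s)


class InlineStringReader:
    """Rebuilds the same table on the reading side from the token stream."""

    def __init__(self):
        self._strings = []

    def read(self, token):
        kind, payload = token
        if kind == "lit":
            self._strings.append(payload)
            return payload
        return self._strings[payload]  # "ref": look up an earlier literal


# Usage: repeated strings cost one small reference instead of the full text.
writer = InlineStringWriter()
tokens = [writer.write(s) for s in ["name", "id", "name", "name", "id"]]
reader = InlineStringReader()
decoded = [reader.read(t) for t in tokens]
```

Because the table is built as a side effect of writing, nothing has to be 
placed up front, which sidesteps the question of whether the outermost object 
is an RPC message or a bare struct.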

Bit field - as we've previously discussed in this issue, the bit field only 
gives you savings if you have dense structs and the fields are stored in order. 
I for one do not have dense structs, so I would definitely be paying a premium.
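
A back-of-the-envelope comparison, sketched in Python under simplifying 
assumptions that are not from the spec: each field that is present costs one 
header byte in the per-field scheme, while the bit field costs one bit per 
declared field regardless of how many are set and needs no per-field headers. 
The function names are hypothetical.

```python
import math


def header_cost(fields_set: int, bytes_per_header: int = 1) -> int:
    """Overhead when every present field carries its own header."""
    return fields_set * bytes_per_header


def bitfield_cost(fields_declared: int) -> int:
    """Overhead of a presence bit field: one bit per declared field, rounded up to bytes."""
    return math.ceil(fields_declared / 8)


# A struct that declares 100 fields:
sparse = (header_cost(2), bitfield_cost(100))   # only 2 fields set
dense = (header_cost(90), bitfield_cost(100))   # 90 fields set
```

Under these assumptions the sparse struct pays 13 bytes for the bit field 
against 2 bytes of headers, while the dense struct pays 13 bytes against 90.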

> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>         Attachments: compact_proto_spec.txt
>
>
> Thrift is not very compact in writing out data compared to (say) protobuf. It 
> does not have the concept of variable-length integers or various other 
> possible optimizations. In Solr we use a lot of such optimizations to make a 
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in the same byte, very 
> fast writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it 
> be as compact as the current Java version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
