[jira] Commented: (THRIFT-110) A more compact format

Larry Hastings (JIRA) Wed, 04 Feb 2009 22:16:25 -0800

    [ 
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670639#action_12670639
 ]


Larry Hastings commented on THRIFT-110:
---------------------------------------

I'm not a Thrift guy, but I just talked with David Reiss about this thread.  I 
have a couple more crazy suggestions--but he tells me they'd require compiler 
changes.

First: allow casting from bool to int, so that you can send the integer values 
0 and 1 as boolean-false and boolean-true respectively.


Second: make up your mind--are you using zigzag ints or not?  If you are, you 
only need *one* integer type.  I think it'd be better to go the other way: have 
int types for int-8, int-16, int-24, int-32, int-40, int-64.  That would give 
back one additional bit per byte of the ints, as you'd no longer need the 
per-byte continuation bit that the variable-length encoding uses.
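
For readers unfamiliar with the terms, here is a minimal sketch of zigzag and
varint encoding as Protocol Buffers defines them. The function names are
hypothetical and are not Thrift API. Zigzag only remaps signed values onto
unsigned ones; the bit spent in every byte is the varint continuation flag.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map a signed int to an unsigned one so small magnitudes stay small."""
    return (n << 1) ^ (n >> (bits - 1))


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(u: int) -> bytes:
    """Emit an unsigned int 7 bits at a time, low bits first.

    The high bit of each byte is set when another byte follows.
    """
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


print(zigzag_encode(-1), zigzag_encode(1))  # 1 2
print(varint_encode(300).hex())             # ac02
```

A fixed-width int-16 spends all 16 bits on the value, whereas a two-byte varint
carries only 14 value bits.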


Finally: this still leaves one type-header value for future expansion, which I 
suggest should be explicitly defined as "followed by a variable-length 
type-header value".  Even if you dropped zigzag ints for integer values, they 
might be worth keeping here.

I'm gonna mark up type-and-id below; for clarity I'm going to put square 
brackets around individual bytes.

If we use zigzag ints here, type-and-id becomes:

type-and-id
=> [ field-id-delta type-header ]
   | [ 0 type-header ] zigzag-varint
   | [ field-id-delta 0xF ] zigzag-varint
   | [ 0 0xF ] zigzag-varint zigzag-varint
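
A rough sketch of an encoder for the grammar above. Several things here are
assumptions rather than anything from the spec: field-id-delta sits in the high
four bits and type-header in the low four, type-header values below 0xF fit
inline, and when both are extended the delta varint comes first. The names
encode_type_and_id and zigzag_varint are hypothetical.

```python
EXTENDED_TYPE = 0xF  # escape value in the type-header nibble


def zigzag_varint(n: int) -> bytes:
    """Zigzag-map a signed 64-bit int, then emit it as a varint."""
    u = (n << 1) ^ (n >> 63)
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


def encode_type_and_id(delta: int, type_header: int) -> bytes:
    """Encode one type-and-id, using the escapes only when needed."""
    inline_delta = 1 <= delta <= 15
    inline_type = 0 <= type_header < EXTENDED_TYPE
    high = delta if inline_delta else 0
    low = type_header if inline_type else EXTENDED_TYPE
    out = bytes([(high << 4) | low])
    if not inline_delta:
        out += zigzag_varint(delta)        # assumed order: delta first
    if not inline_type:
        out += zigzag_varint(type_header)
    return out


print(encode_type_and_id(3, 5).hex())   # 35
print(encode_type_and_id(20, 5).hex())  # 0528
```

The common case stays at one byte; an out-of-range field-id-delta costs one
byte plus its varint.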

If we remove zigzags entirely, then for extended field deltas or types we must 
follow the type-and-id with a byte containing two four-bit int-type-headers: 
the high one for the extended field-id-delta, and the low one for the extended 
type-header.  If either isn't used, set those four bits to 0.

type-and-id
=> [ field-id-delta type-header ]
   | [ 0 type-header ] [ field-id-delta-int-type-header 0 ] n-bit-int
   | [ field-id-delta 0xF ] [ 0 type-header-int-type-header ] n-bit-int
   | [ 0 0xF ] [ field-id-delta-int-type-header type-header-int-type-header ]
       n-bit-int n-bit-int

In terms of compression, I suspect leaving the zigzag ints here is a win; given 
that real-world use would likely never see extended type-headers, the only 
variant we'd see would be a field-id-delta that didn't fit in the range 1-15.  
In that case, zigzag ints would always be either smaller than or the same size 
as the second approach.
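
The size claim can be checked with a short sketch. It assumes field ids are
16-bit, so the largest positive delta is 32767, and that the fixed-width ints
are signed and come in the widths suggested earlier (8, 16, 24, 32, 40, 64).
All function names are hypothetical.

```python
def zigzag_varint_len(n: int) -> int:
    """Bytes needed for n as a zigzag varint."""
    u = (n << 1) ^ (n >> 63)
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


def fixed_int_len(n: int, widths=(8, 16, 24, 32, 40, 64)) -> int:
    """Bytes needed for n in the smallest signed fixed-width int that fits."""
    for bits in widths:
        if -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
            return bits // 8
    raise ValueError("value does not fit in 64 bits")


def zigzag_size(delta: int) -> int:
    # [ 0 type-header ] zigzag-varint
    return 1 + zigzag_varint_len(delta)


def fixed_size(delta: int) -> int:
    # [ 0 type-header ] [ int-type-header 0 ] n-bit-int
    return 2 + fixed_int_len(delta)


def zigzag_never_larger(max_delta: int = 32767) -> bool:
    """True if zigzag is no larger for every delta from 16 to max_delta."""
    return all(zigzag_size(d) <= fixed_size(d)
               for d in range(16, max_delta + 1))


print(zigzag_size(16), fixed_size(16))    # 2 3
print(zigzag_size(100), fixed_size(100))  # 3 3
print(zigzag_never_larger())              # True
```

Under those assumptions zigzag is never larger across the 16-bit range. The
check says nothing about deltas beyond that range.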

I'm probably crazy,


/larry/

> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>            Assignee: Bryan Duxbury
>         Attachments: compact-proto-spec-2.txt, compact_proto_spec.txt, 
> compact_proto_spec.txt, thrift-110-v2.patch, thrift-110-v3.patch, 
> thrift-110-v4.patch, thrift-110-v5.patch, thrift-110-v6.patch, 
> thrift-110-v7.patch, thrift-110-v8.patch, thrift-110-v9.patch, 
> thrift-110.patch
>
>
> Thrift is not as compact in writing out data as (say) protobuf. It does 
> not have the concept of variable-length integers and various other possible 
> optimizations. In Solr we use a lot of such optimizations to make a 
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in the same byte, very fast 
> writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it 
> as compact as the current Java version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
