[ https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625978#action_12625978 ]
noble.paul edited comment on THRIFT-110 at 8/26/08 10:44 PM:
-------------------------------------------------------------
bq. Exactly, you can use a single byte for lots of information:
We use this in our implementation; I did not spell it out because it can get
very complex.
Here is how it works:
* The 5 lsb are used to encode types that carry no extra info, or that cannot
benefit from it: VOID, STOP, BOOLEAN_TRUE, BOOLEAN_FALSE, DOUBLE, SIGNED INT,
SIGNED LONG, BYTE, etc.
** For those types the 3 msb are set to 0. A reader always checks the 3 msb
first; if they are 0, it reads the type from the 5 lsb.
* The 3 msb represent types that can benefit from some extra info. Because 0 is
reserved for the simple types above, there can be at most 7 such types. Those
types can use the 5 lsb (values 0-31) for the extra info. Examples:
** String/extern string. Many string lengths are small (say < 31), so the length
can be put in the extra info.
** list/map/set. Their lengths are also usually small, so the length can be put
in the extra info.
** struct with name (THRIFT-122). These tend to be few.
** positive int. These are usually small, so the value itself can be put in the
extra info.
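A minimal Java sketch of packing and unpacking such a tag byte, for illustration only. The class name `TypeTag`, the numeric code assigned to each type, and the rule that a low-bits value of 31 acts as an escape meaning "the real value does not fit and is written separately" are all assumptions of this sketch, not part of the scheme described above.

```java
// Hypothetical sketch of the single-byte type tag: 3 msb = "rich" type, 5 lsb = extra info
// (or, when the 3 msb are 0, the 5 lsb hold a "simple" type code).
final class TypeTag {
    // Assumed codes for simple types, stored in the 5 lsb with the 3 msb left at 0.
    static final int VOID = 0, STOP = 1, BOOLEAN_TRUE = 2, BOOLEAN_FALSE = 3,
                     DOUBLE = 4, SIGNED_INT = 5, SIGNED_LONG = 6, BYTE = 7;

    // Assumed codes for rich types, stored in the 3 msb (1..7; 0 means "simple type").
    static final int STRING = 1, EXTERN_STRING = 2, LIST = 3, MAP = 4,
                     SET = 5, NAMED_STRUCT = 6, POSITIVE_INT = 7;

    // Assumed escape: 31 in the 5 lsb means the value did not fit and is written separately.
    static final int EXTRA_ESCAPE = 0x1f;

    static byte simple(int type) {
        if (type < 0 || type > 0x1f) throw new IllegalArgumentException("simple type must fit in 5 bits");
        return (byte) type; // 3 msb stay 0
    }

    static byte rich(int type, int extra) {
        if (type < 1 || type > 7) throw new IllegalArgumentException("rich type must be 1..7");
        if (extra < 0) throw new IllegalArgumentException("extra info must be non-negative");
        // Values 0..30 fit directly; anything larger is flagged with the assumed escape value.
        int low = extra < EXTRA_ESCAPE ? extra : EXTRA_ESCAPE;
        return (byte) ((type << 5) | low);
    }

    // Decoding always looks at the 3 msb first.
    static int richType(byte tag) { return (tag & 0xff) >>> 5; }

    static boolean isSimple(byte tag) { return richType(tag) == 0; }

    // Either the simple type code or the rich type's extra info, depending on isSimple().
    static int low5(byte tag) { return tag & 0x1f; }
}
```

For example, `TypeTag.rich(TypeTag.STRING, 12)` yields the byte `0x2C` (type 1 in the 3 msb, length 12 in the 5 lsb). A reader sees nonzero high bits, so it treats the low bits as the length rather than as a simple type code.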
was (Author: noble.paul):
bq. Exactly, you can use a single byte for lots of information:
We use this in our implementation; I did not spell it out because it can get
very complex.
Here is how it works:
* The 5 lsb are used to encode types that carry no extra info, or that cannot
benefit from it: VOID, STOP, BOOLEAN_TRUE, BOOLEAN_FALSE, DOUBLE, signed INT,
LONG, BYTE, etc.
** For those types the 3 msb are set to 0.
* The 3 msb represent types that can benefit from some extra info. That means
there can be at most 7 such types. Those types can use the 5 lsb (values 0-31)
for the extra info. Examples:
** String/extern string. Many string lengths are small, say < 31.
** list/map/set. Their lengths are also usually small.
** struct with name (THRIFT-122). These tend to be few.
** positive int. These are usually small.
> A more compact format
> ----------------------
>
> Key: THRIFT-110
> URL: https://issues.apache.org/jira/browse/THRIFT-110
> Project: Thrift
> Issue Type: Improvement
> Reporter: Noble Paul
>
> Thrift is not as compact in writing out data as, say, protobuf. It does
> not have the concept of variable-length integers or various other possible
> optimizations. In Solr we use many such optimizations to produce a
> very compact payload, and Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> Other optimizations include writing type and value in the same byte, very fast
> writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it
> be as compact as the current Java version.
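The variable-length integers mentioned above are commonly encoded the way protobuf does it: 7 payload bits per byte, with the high bit flagging that more bytes follow, plus a ZigZag mapping so that small negative numbers also stay short. The sketch below is a hypothetical illustration of that technique (the class and method names are made up); it is not taken from Thrift or from Solr's NamedListCodec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: base-128 varints with ZigZag for signed values.
final class VarInt {
    // Writes v as an unsigned varint: low 7 bits per byte, high bit set on every byte but the last.
    static void writeVLong(long v, ByteArrayOutputStream out) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7; // unsigned shift, so negative inputs still terminate (after 10 bytes)
        }
        out.write((int) v);
    }

    // Reads one unsigned varint written by writeVLong.
    static long readVLong(ByteArrayInputStream in) {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new IllegalStateException("truncated varint");
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
            if (shift > 63) throw new IllegalStateException("varint too long");
        }
    }

    // ZigZag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... so small negatives encode in few bytes.
    static long zigZagEncode(long n) { return (n << 1) ^ (n >> 63); }

    static long zigZagDecode(long z) { return (z >>> 1) ^ -(z & 1); }
}
```

With this encoding, 300 takes two bytes (`0xAC 0x02`) and -1 (after ZigZag) takes one, whereas a fixed-width long always takes eight.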
--
This message is automatically generated by JIRA.