[ 
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625978#action_12625978
 ] 

noble.paul edited comment on THRIFT-110 at 8/26/08 10:44 PM:
-------------------------------------------------------------

bq. Exactly, you can use a single byte for lots of information: 

We use it in our implementation and I did not specify it because it can get 
very complex.

Here is how it works:

* The 5 lsb encode types that carry no extra info, or that cannot benefit 
from it: VOID, STOP, BOOLEAN_TRUE, BOOLEAN_FALSE, DOUBLE, SIGNED INT, 
SIGNED LONG, BYTE, etc.
** That means the 3 msb are set to 0 for those types. Always read the 3 msb 
first; if they are 0, read the 5 lsb as the type.
* The 3 msb represent types that can probably benefit from some extra info. 
Since 0 is reserved for the case above, we can have at most 7 such types. 
Those types can use the 5 lsb (values 0-31) for the extra info. Examples:
** String/extern string: a lot of string lengths tend to be small, say < 31, 
so the length can be put in the extra info.
** list/map/set: their lengths are also usually low, so the length can be put 
in the extra info.
** struct with name (THRIFT-122): these tend to be few.
** positive int: values usually tend to be small, so the value can be put in 
the extra info.
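
The layout above can be sketched roughly as follows. This is only an illustration in Python, not the actual Solr or Thrift code: the type names come from the list above, but the numeric codes and the function names (pack_simple, pack_extra, unpack) are assumptions made up for the example. What to do when a length or value does not fit in 0-31 is not described here, so the sketch simply rejects it.

```python
# Simple types: the 3 msb are 0 and the 5 lsb hold the type code.
# The numeric values are hypothetical; only the names come from the comment.
VOID, STOP, BOOLEAN_TRUE, BOOLEAN_FALSE, DOUBLE = 0, 1, 2, 3, 4

# Types with extra info: a non-zero value (1-7) in the 3 msb.
# Again, the numeric values are hypothetical.
STRING, LIST, POSITIVE_INT = 1, 2, 3


def pack_simple(type_code):
    """Tag byte for a type without extra info: 3 msb = 0, 5 lsb = type code."""
    if not 0 <= type_code <= 31:
        raise ValueError("simple type code must fit in 5 bits")
    return type_code


def pack_extra(type_class, extra):
    """Tag byte for a type with extra info: 3 msb = type, 5 lsb = extra info."""
    if not 1 <= type_class <= 7:
        raise ValueError("type class must be 1-7 (0 is reserved for simple types)")
    if not 0 <= extra <= 31:
        raise ValueError("extra info must fit in 5 bits")
    return (type_class << 5) | extra


def unpack(tag):
    """Read the 3 msb first; if they are 0, the 5 lsb are a simple type code."""
    msb = tag >> 5
    lsb = tag & 0x1F
    if msb == 0:
        return ("simple", lsb, None)
    return ("extra", msb, lsb)
```

For example, a string of length 5 would fit entirely in one tag byte, pack_extra(STRING, 5), and the reader would get the type and the length back from unpack without reading a separate length field.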





  
> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>
> Thrift is not as compact in writing out data as, say, protobuf. It does 
> not have the concept of variable-length integers or various other possible 
> optimizations. In Solr we use a lot of such optimizations to make a 
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing the type and value in the same byte, 
> very fast writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it 
> as compact as the current Java version.
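
For readers unfamiliar with variable-length integers, here is a minimal sketch of the common base-128 scheme (7 payload bits per byte, with the high bit set when more bytes follow), as used by protobuf. It is an assumption that a compact Thrift format would use exactly this layout; the function names encode_varint and decode_varint are made up for the example.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode one varint from the start of data; return (value, bytes consumed)."""
    result = 0
    shift = 0
    for i, b in enumerate(data):
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")
```

With this scheme any value below 128 takes a single byte instead of a fixed 4 or 8, which is where most of the saving comes from when values tend to be small.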

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
