[ https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637831#action_12637831 ]

Noble Paul commented on THRIFT-110:
-----------------------------------

bq. String externalization - the problem is how you design the scoping

String externalization may not need a special string table. When we write a
string that is to be externalized, we store it in an in-memory map
(string -> index) and assign it the next index. When we write a subsequent
string, we look it up in the map. If it is present, we write the type as
EXTERN_STRING_IDX followed by the index instead of the actual string.


While reading, each externalized string that is read in full is appended to a
vector, and its position in the vector is its index. If the type is
EXTERN_STRING_IDX, read the next int value and look up the string at that
index in the vector.
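A matching reader-side sketch, under the same assumed layout (one-byte tag, then a 4-byte length plus UTF-8 bytes, or a 4-byte index). The class name and tag values are hypothetical. Because strings are appended in the order they are first read, each one lands at the same index the writer assigned, so no table has to be sent up front.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader-side sketch of string externalization (not a Thrift API).
class ExternStringReader {
    // Hypothetical type tags; must agree with whatever the writer uses.
    static final byte EXTERN_STRING = 1;
    static final byte EXTERN_STRING_IDX = 2;

    // Position in this list is the string's index.
    private final List<String> table = new ArrayList<>();
    private final DataInputStream in;

    ExternStringReader(byte[] data) {
        this.in = new DataInputStream(new ByteArrayInputStream(data));
    }

    String readExternString() {
        try {
            byte tag = in.readByte();
            if (tag == EXTERN_STRING_IDX) {
                // Repeat: read the index and look the string up in the vector.
                return table.get(in.readInt());
            }
            if (tag != EXTERN_STRING) {
                throw new IllegalStateException("unexpected tag " + tag);
            }
            // First occurrence: read the full string and remember it.
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            String s = new String(utf8, StandardCharsets.UTF_8);
            table.add(s);
            return s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```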

bq. Bit field - as we've previously discussed in this issue, the bit field only
gives you savings if you have dense structs and fields are stored ordered.

This is the power we give to the user. If he knows how the fields are written,
he can make the judgement and benefit from the performance improvement. I
assume that every struct will have less than 2-3 bytes of overhead. If the user
chooses ids like 1000, 200, 5000, then we end up using 2 bytes/field.
So this can be documented, asking users to choose small field ids.
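A rough cost model of the trade-off above, assuming a bit field with one bit per id from 1 up to the largest id present, versus writing each field's id as a base-128 varint (7 payload bits per byte, so ids up to 127 take 1 byte and ids up to 16383 take 2). The class and method names are hypothetical; this only counts bytes and does not implement either encoding.

```java
// Hypothetical cost model comparing two ways of recording which field ids are present.
class FieldIdCost {
    // Bytes a base-128 varint needs for a non-negative value.
    static int varintSize(int v) {
        int size = 1;
        while ((v >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Bytes for a bit field with one bit per id from 1 to the largest id present.
    static int bitFieldBytes(int[] ids) {
        int max = 0;
        for (int id : ids) {
            max = Math.max(max, id);
        }
        return (max + 7) / 8;
    }

    // Bytes if each field instead carries its own varint-encoded id.
    static int perFieldIdBytes(int[] ids) {
        int total = 0;
        for (int id : ids) {
            total += varintSize(id);
        }
        return total;
    }
}
```

For a dense struct with ids 1..10 the bit field costs 2 bytes versus 10 bytes of per-field ids. For sparse ids {1000, 200, 5000} the bit field would need 625 bytes, while per-field varint ids need 2 bytes each (6 total), which is why small, dense ids are worth documenting.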



> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>         Attachments: compact_proto_spec.txt
>
>
> Thrift is not very compact in writing out data (compared to, say, protobuf).
> It does not have the concept of variable-length integers and various other
> possible optimizations. In Solr we use a lot of such optimizations to make a
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in the same byte, very fast
> writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it
> be as compact as the current Java version.
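One way to write type and value in the same byte, combined with variable-length integers, is sketched below under assumed constants: the high 3 bits of a tag byte hold the type and the low 5 bits hold a small size; a size of 31 or more writes 31 as an escape and the remainder follows as a varint. This is loosely in the spirit of the Solr codec linked above, but the class name, constants, and layout here are hypothetical and not taken from that code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical tag layout: [type:3 bits][size:5 bits], with a varint escape for large sizes.
class TagCodec {
    static final int TYPE_SHIFT = 5;
    static final int SIZE_MASK = 0x1F; // low 5 bits; 31 means "more size follows"
    static final int TYPE_STRING = 1;  // hypothetical type code (0..7)

    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static void writeTag(ByteArrayOutputStream out, int type, int size) {
        int tag = type << TYPE_SHIFT;
        if (size < SIZE_MASK) {
            out.write(tag | size);            // type and size share one byte
        } else {
            out.write(tag | SIZE_MASK);       // escape: the rest of the size follows
            writeVarint(out, size - SIZE_MASK);
        }
    }

    // Returns {type, size, bytesConsumed} decoded from the start of buf.
    static int[] readTag(byte[] buf) {
        int b = buf[0] & 0xFF;
        int type = b >>> TYPE_SHIFT;
        int size = b & SIZE_MASK;
        int pos = 1;
        if (size == SIZE_MASK) {
            int extra = 0;
            int shift = 0;
            int x;
            do {
                x = buf[pos++] & 0xFF;
                extra |= (x & 0x7F) << shift;
                shift += 7;
            } while ((x & 0x80) != 0);
            size += extra;
        }
        return new int[] {type, size, pos};
    }
}
```

With this layout a short string's type and length fit in a single byte, and a 300-byte string needs only 3 bytes of header (tag plus a 2-byte varint).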

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
