[ https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637943#action_12637943 ]

Bryan Duxbury commented on THRIFT-110:
--------------------------------------

@Chad - zigzag would of course work. The problem is that it would reduce the 
available number of compact field ids by 50%, which in my mind unduly punishes 
structs that are doing things right and specifying the field ids.
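Here is a concrete version of the 50% figure. Assume the compact form packs a small field id (or id delta) into a 4-bit nibble. Zigzag encoding then spends half of those codes on negative values. The Java sketch below is illustrative only: `FieldIdNibble` and its methods are hypothetical names, not part of Thrift.

```java
// Hypothetical sketch: how many positive field-id deltas fit in a nibble
// with plain unsigned encoding vs. zigzag encoding.
final class FieldIdNibble {
    // Zigzag maps signed ints to unsigned: 0->0, -1->1, 1->2, -2->3, 2->4, ...
    static int zigzag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse of zigzag().
    static int unzigzag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Largest positive delta whose encoding still fits in 'bits' bits.
    static int maxPositiveDelta(int bits, boolean useZigzag) {
        int limit = (1 << bits) - 1; // 15 for a 4-bit nibble
        int best = 0;
        for (int d = 1; d <= limit; d++) {
            int encoded = useZigzag ? zigzag(d) : d;
            if (encoded <= limit) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("unsigned: " + maxPositiveDelta(4, false));
        System.out.println("zigzag:   " + maxPositiveDelta(4, true));
    }
}
```

With a 4-bit nibble, `maxPositiveDelta(4, false)` returns 15 while `maxPositiveDelta(4, true)` returns 7. That is the roughly 50% reduction in compact field ids: zigzag trades the upper half of the positive range for negative deltas.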

@Noble:

I understand your suggestion for string externalization. If we had an extern
keyword in the IDL, this would be trivial. However, at the moment we do
not, and I'm not willing to make that a prerequisite. The alternative is to
consider ALL strings for externalization, but this is probably too coarse. The
risk is that, since every string gets an ID, by the time a duplicate of a
given string comes up, its ID may be moderately large (i.e., more than 4
bits), and we'll have to use more bytes than we'd like to represent the
back-reference. Out of curiosity, do you have a sense of the length and
frequency of duplicated strings in your dataset?
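Here is a rough sketch of the "externalize ALL strings" idea and the cost concern. Each new string is written literally and assigned the next ID, and each repeat is written as a back-reference. The byte layout is an assumption made for illustration: a kind nibble plus a 4-bit ID, with an escape to a varint once the ID reaches 15. `StringTableWriter` is a hypothetical class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical writer that externalizes every string it sees.
final class StringTableWriter {
    private final Map<String, Integer> ids = new HashMap<>();
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Unsigned varint: 7 bits per byte, high bit set while more bytes follow.
    private void writeVarint(int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    /** Writes s and returns how many bytes this call appended. */
    int writeString(String s) {
        int before = out.size();
        Integer id = ids.get(s);
        if (id != null) {
            if (id < 15) {
                out.write(0x10 | id);  // kind=1 (reference), ID in low nibble
            } else {
                out.write(0x1F);       // low nibble 15 = "ID follows as varint"
                writeVarint(id);
            }
        } else {
            ids.put(s, ids.size());    // assign the next ID to a new string
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            out.write(0x00);           // kind=0 (literal)
            writeVarint(utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.size() - before;
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

Suppose 20 distinct strings have been written. A repeat of the first one then costs one byte, but a repeat of the twentieth costs two, because its ID no longer fits in the nibble.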

I wonder if it would make sense to implement the core of this functionality in
an abstract protocol and then fork it into DenseCompact and SparseCompact
concrete protocols. That would enable us to make optimizations like using a
bit field instead of field ids. However, I think the bit field approach
would require changing the way we generate our struct code, since that code
expects there to be a field id per field, rather than a single global bit field.
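One way to picture the Dense/Sparse split is by how each marks which fields are present. A dense protocol could use one bit per declared field id. A sparse one would write a header for each field that is actually set. This is a hypothetical Java sketch, not how Thrift's generated struct code works today. The class `FieldPresence` is invented for illustration, and the sketch assumes declared ids run contiguously from 1.

```java
import java.util.BitSet;

// Hypothetical comparison of two ways to mark present fields.
final class FieldPresence {
    // Dense: one bit per declared field id 1..declared (assumed contiguous).
    static byte[] denseBitmap(int declared, int[] presentIds) {
        BitSet bits = new BitSet(declared);
        for (int id : presentIds) {
            if (id < 1 || id > declared) {
                throw new IllegalArgumentException("field id out of range: " + id);
            }
            bits.set(id - 1);
        }
        byte[] out = new byte[(declared + 7) / 8];
        byte[] raw = bits.toByteArray(); // little-endian, trailing zero bytes trimmed
        System.arraycopy(raw, 0, out, 0, raw.length);
        return out;
    }

    // Reads back whether field 'id' is marked present in a dense bitmap.
    static boolean isPresent(byte[] bitmap, int id) {
        int bit = id - 1;
        return ((bitmap[bit / 8] >> (bit % 8)) & 1) != 0;
    }

    // Sparse: assume one header byte per present field (id delta in a nibble),
    // so the cost grows with the number of fields actually set.
    static int sparseHeaderBytes(int[] presentIds) {
        return presentIds.length;
    }
}
```

Take a struct with 12 declared fields that are all set: the bitmap costs 2 bytes against 12 one-byte headers. With 64 declared fields and only one set, it costs 8 bytes against 1. Which layout wins depends on how densely the fields are populated.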

> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>         Attachments: compact_proto_spec.txt
>
>
> Thrift is not as compact as, say, protobuf when writing out data. It does
> not have the concept of variable-length integers or various other possible
> optimizations. In Solr we use a lot of such optimizations to produce a
> very compact payload, and Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in the same byte, very
> fast writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it
> be as compact as the current Java version.
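The issue description names two techniques that are easy to show in isolation. One is base-128 variable-length integers, the scheme protobuf uses for unsigned varints. The other is packing a type tag together with a small value in a single byte. The Java sketch below is illustrative: `CompactTricks` and its nibble layout are hypothetical and are not taken from Solr's NamedListCodec or from Thrift.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helpers illustrating varints and type/value packing.
final class CompactTricks {
    // Unsigned base-128 varint: 7 payload bits per byte, high bit set on
    // every byte except the last.
    static byte[] writeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a varint starting at buf[0].
    static long readVarint(byte[] buf) {
        long result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    // Type tag (0..15) in the high nibble, small value (0..14) in the low
    // nibble; low nibble 15 means "value follows as a varint".
    static byte[] writeTagged(int type, long value) {
        if (type < 0 || type > 15) {
            throw new IllegalArgumentException("type must fit in 4 bits");
        }
        if (value >= 0 && value < 15) {
            return new byte[] {(byte) ((type << 4) | (int) value)};
        }
        byte[] tail = writeVarint(value);
        byte[] out = new byte[1 + tail.length];
        out[0] = (byte) ((type << 4) | 15);
        System.arraycopy(tail, 0, out, 1, tail.length);
        return out;
    }
}
```

`writeVarint(300)` produces the two bytes `0xAC 0x02`, and `writeTagged(3, 5)` fits in the single byte `0x35`. Signed values would normally be zigzag-encoded before the varint step so that small negative numbers stay short.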

-- 
This message is automatically generated by JIRA.
