[jira] Commented: (THRIFT-110) A more compact format

Chad Walters (JIRA) Tue, 19 Aug 2008 17:31:08 -0700

    [ 
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623853#action_12623853
 ]


Chad Walters commented on THRIFT-110:
-------------------------------------

I for one am very pleased that the Solr team is interested in getting involved 
here -- very happy to see another Apache project adopt Thrift. Here are a few 
thoughts:

1. WRT David's concerns about adding 'unsigned' to the IDL, I see this and 
other modifiers as things that are hints to the protocol that can be ignored by 
the protocol if it chooses. The BinaryProtocol would treat them the same as the 
regular type, so it would not need to handle variable encodings and would not 
introduce any backward compatibility issues. The newer compact protocols could 
choose to use these hints.

One way to implement this would be to use some of the upper bits of the type 
bytes to represent the modifiers and extend the protocol interface to accept 
these bits. Alternatively, we could add new types and expand the protocol 
interface to support these types -- for the BinaryProtocol, these would just 
call the same routines as the unmodified types.
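
As a rough illustration of the first option, the sketch below packs a 
modifier hint into the upper bits of a type byte. The 5/3 bit split, the 
modifier names, and the helper functions are all hypothetical; only the i32 
and i64 type ids are taken from Thrift's existing TType values.

```python
# Hypothetical layout (an assumption, not part of Thrift):
# low 5 bits = base type id, high 3 bits = modifier hint.
TYPE_MASK = 0x1F
MOD_SHIFT = 5

T_I32 = 8    # Thrift TType id for i32
T_I64 = 10   # Thrift TType id for i64

# Hypothetical modifier hints, named after the ideas in this thread.
MOD_NONE = 0
MOD_FIXED = 1
MOD_ZIGZAG = 2


def pack_type(base_type, modifier=MOD_NONE):
    """Combine a base type id and a modifier hint into one byte."""
    if not 0 <= base_type <= TYPE_MASK:
        raise ValueError("base type does not fit in the low 5 bits")
    if not 0 <= modifier <= 0x07:
        raise ValueError("modifier does not fit in the high 3 bits")
    return (modifier << MOD_SHIFT) | base_type


def unpack_type(type_byte):
    """Split a type byte into (base_type, modifier)."""
    return type_byte & TYPE_MASK, type_byte >> MOD_SHIFT


def base_type_only(type_byte):
    """What a protocol that ignores hints would see: the plain type."""
    return type_byte & TYPE_MASK


if __name__ == "__main__":
    b = pack_type(T_I32, MOD_ZIGZAG)
    print(hex(b), unpack_type(b), base_type_only(b))
```

A protocol that does not care about the hints would only ever call something 
like base_type_only(), which is why this option need not change what the 
BinaryProtocol writes.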

2. Perhaps 'unsigned' is the wrong word to use here since it doesn't make the 
type truly unsigned and could cause confusion. What modifier names would be 
good to represent the various possibilities here? Perhaps we could assume 
non-negative values as the default and thus use the current variable length 
encoding scheme from the DenseProtocol by default -- this would make this 
particular set of changes backward compatible for the DenseProtocol. Adding a 
modifier like 'fixed' would hint that variable-length encoding is not advised. 
Adding a different modifier like "zipper" (better name please!) would hint that 
the values are likely to be small integers but include negatives and thus the 
"zigzag" encoding from protocol buffers would work well for the data.
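
To make the trade-off concrete, here is a minimal sketch of a base-128 
variable-length integer and of the zigzag mapping that protocol buffers use 
for signed values. The byte order shown is the protocol buffers one and is not 
claimed to match the DenseProtocol wire format; the function names are 
illustrative only.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data):
    """Decode one varint from the start of a bytes-like object."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def zigzag_encode(n, bits=32):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


if __name__ == "__main__":
    # -1 as a raw 32-bit two's-complement value needs 5 varint bytes,
    # but only 1 byte after the zigzag mapping.
    print(len(encode_varint(-1 & 0xFFFFFFFF)))
    print(len(encode_varint(zigzag_encode(-1))))
```

This is why a plain varint suits fields assumed to be non-negative, while a 
zigzag-style hint suits small values of either sign, and a 'fixed' hint suits 
values that are large or uniformly distributed.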

3. 
>> We must handle all use cases gracefully

> I disagree. Designing an IDL requires striking a balance between the 
> richness of data structures that your IDL can support and the richness of 
> languages that can support your IDL.

I agree with David here -- we want to support data types that can be expressed 
naturally in many different languages, which sometimes leads to a 
lowest-common-denominator approach. True unsigned types were left out of Thrift 
intentionally, for example, because many languages (Java included) don't 
support them naturally. We could look into extending the type system to allow 
for non-homogeneous collection types, but I'd like to understand first how this 
plays out in detail for C++, where there is no base Object type.

4. I am not sure of the best approach for the extern string issue, but I will 
mention that cross-language string handling is complicated by Java's strings 
being made of 2-byte (UTF-16) code units (per the string vs binary handling 
issue we had to address), so extra attention probably needs to be paid here. 
Are you sure that Bryan's enumeration idea wouldn't handle a large percentage 
of your use cases?

> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>
> Thrift is not as compact in writing out data as, say, protobuf. It does 
> not have the concept of variable length integers and various other 
> possible optimizations. In Solr we use a lot of such optimizations to make a 
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in the same byte, very 
> fast writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients and I would like to see it 
> as compact as the current Java version

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
