[
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623853#action_12623853
]
Chad Walters commented on THRIFT-110:
-------------------------------------
I for one am very pleased that the Solr team is interested in getting involved
here -- very happy to see another Apache project adopt Thrift. Here are a few
thoughts:
1. WRT David's concerns about adding 'unsigned' to the IDL, I see this and
other modifiers as hints that a protocol is free to ignore. The BinaryProtocol
would treat a modified type the same as the regular type, so it would not need
to handle variable encodings or introduce any backward-compatibility issues.
The newer compact protocols could choose to use these hints.
One way to implement this would be to use some of the upper bits of the type
bytes to represent the modifiers and extend the protocol interface to accept
these bits. Alternatively, we could add new types and expand the protocol
interface to support these types -- for the BinaryProtocol, these would just
call the same routines as the unmodified types.
2. Perhaps 'unsigned' is the wrong word to use here since it doesn't make the
type truly unsigned and could cause confusion. What modifier names would be
good to represent the various possibilities here? Perhaps we could assume
non-negative values as the default and thus use the current variable-length
encoding scheme from the DenseProtocol by default -- this would make this
particular set of changes backward compatible for the DenseProtocol. Adding a
modifier like 'fixed' would hint that variable-length encoding is not advised.
Adding a different modifier like 'zipper' (better name please!) would hint that
the values are likely to be small integers but include negatives and thus the
ZigZag encoding from Protocol Buffers would work well for the data.
3.
bq. bq. We must handle all usecases gracefully
bq. I disagree. Designing an IDL requires striking a balance between the
richness of data structures that your IDL can support and the richness of
languages that can support your IDL.
I agree with David here -- we want to support data types that can be expressed
naturally in a lot of different languages, which sometimes leads to a
lowest-common-denominator approach. True unsigned types were left out of Thrift
intentionally, for example, because many languages (Java included) don't
support them naturally. We could look into extending the type system to allow
for non-homogeneous collection types, but I'd like to understand first how this
plays itself out in detail for C++, where there is no base Object type.
4. I am not sure of the best approach for the extern string issue, but I will
mention that cross-language string handling is complicated by Java's UTF-16
strings with 2-byte code units (per the string vs binary handling issue we had
to address), so extra attention probably needs to be paid here. Are you sure
that Bryan's enumeration idea wouldn't handle a large percentage of your use
cases?
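To make points 1 and 2 concrete, here is a minimal Python sketch. It is not
Thrift's actual API. It assumes a hypothetical layout in which the low 5 bits
of the type byte hold the type id and the upper bits hold modifier hints. The
names HINT_FIXED, HINT_ZIGZAG, write_i32_binary and write_i32_compact are
invented for illustration, and the varint byte order (low 7 bits first, as in
Protocol Buffers) is an assumption rather than the DenseProtocol's wire format.
The binary-style writer ignores the hints; the compact-style writer uses them
to pick an encoding.

```python
import struct

# Hypothetical layout: low 5 bits = type id, upper bits = modifier hints.
TYPE_MASK = 0x1F
HINT_FIXED = 0x20   # hint: variable-length encoding is not advised
HINT_ZIGZAG = 0x40  # hint: small integers, negatives included

T_I32 = 8  # type id for i32 (matches Thrift's TType.I32)


def pack_type(type_id, hints=0):
    """Combine a type id and hint bits into a single type byte."""
    return (type_id & TYPE_MASK) | hints


def unpack_type(type_byte):
    """Split a type byte into (type id, hint bits)."""
    return type_byte & TYPE_MASK, type_byte & ~TYPE_MASK & 0xFF


def varint(n):
    """Base-128 varint of a non-negative integer, low 7 bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag32(n):
    """Map signed 32-bit ints to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def write_i32_binary(type_byte, value):
    # Binary-style writer: hints are ignored, always 4 big-endian bytes.
    return struct.pack(">i", value)


def write_i32_compact(type_byte, value):
    # Compact-style writer: choose the encoding from the hint bits.
    _, hints = unpack_type(type_byte)
    if hints & HINT_FIXED:
        return struct.pack(">i", value)
    if hints & HINT_ZIGZAG:
        return varint(zigzag32(value))
    # Default assumes non-negative values; a negative value is written as
    # its 32-bit two's complement and costs five bytes.
    return varint(value & 0xFFFFFFFF)
```

With these assumptions, -1 written under the ZigZag hint takes one byte, the
same value under the default takes five, and the binary-style writer always
emits four bytes regardless of the hint.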
> A more compact format
> ----------------------
>
> Key: THRIFT-110
> URL: https://issues.apache.org/jira/browse/THRIFT-110
> Project: Thrift
> Issue Type: Improvement
> Reporter: Noble Paul
>
> Thrift is not as compact in writing out data as, say, protobuf. It does
> not have the concept of variable-length integers or the various other
> optimizations possible. In Solr we use a lot of such optimizations to make a
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in the same byte, very fast
> writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it
> as compact as the current Java version.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.