[
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623899#action_12623899
]
Noble Paul commented on THRIFT-110:
-----------------------------------
bq.The BinaryProtocol would treat them the same as the regular type, so it
would not need to handle variable encodings or introduce any backward
compatibility issue. The newer compact protocols could choose to use these
hints.
I would say, let us not meddle with the existing binary protocol, so that no
existing users run into problems. Let us develop a new "DenseProtocol" with a
totally different format; that gives us the freedom to create a very efficient
encoding.
bq.Perhaps 'unsigned' is the wrong word to use here since..
You are right; 'unsigned' is misleading. In Lucene/Solr we call these vint
(variable-length int) and vlong (variable-length long), and I think that naming
works well. If the generated code targets the old binary protocol, it can ignore
the 'v' and treat the field as an ordinary int/long. The new format can use a
different encoding for these.
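To make the vint/vlong idea concrete, here is a minimal sketch of a Lucene-style variable-length encoding: each byte carries 7 payload bits, and the high bit is set when more bytes follow. The class `VarInts` and its methods are hypothetical illustrations, not part of Thrift, and not a copy of Lucene's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper (not Thrift or Lucene API): variable-length ints where
// each byte holds 7 payload bits and the high bit means "more bytes follow".
final class VarInts {
  private VarInts() {}

  static void writeVInt(DataOutput out, int i) throws IOException {
    while ((i & ~0x7F) != 0) {          // more than 7 significant bits left
      out.writeByte((i & 0x7F) | 0x80); // low 7 bits + continuation flag
      i >>>= 7;                         // unsigned shift: negatives end after 5 bytes
    }
    out.writeByte(i);                   // final byte, continuation flag clear
  }

  static int readVInt(DataInput in) throws IOException {
    int b = in.readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.readByte();
      i |= (b & 0x7F) << shift;         // place the next 7-bit group
    }
    return i;
  }

  static void writeVLong(DataOutput out, long l) throws IOException {
    while ((l & ~0x7FL) != 0) {
      out.writeByte((int) ((l & 0x7F) | 0x80));
      l >>>= 7;
    }
    out.writeByte((int) l);
  }

  static long readVLong(DataInput in) throws IOException {
    int b = in.readByte();
    long l = b & 0x7FL;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.readByte();
      l |= (b & 0x7FL) << shift;
    }
    return l;
  }

  // Convenience wrappers over in-memory streams (these cannot really fail).
  static byte[] encodeVInt(int v) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      writeVInt(new DataOutputStream(bos), v);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static int decodeVInt(byte[] bytes) {
    try {
      return readVInt(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A value below 128 then costs one byte instead of the four a fixed-width i32 takes. The trade-off is that negative ints always cost five bytes, which is why protobuf adds a zigzag mapping for its signed varint types.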
bq.We could look into extending the type system to allow for non-homogeneous
collections types but I'd like to understand first how this plays itself out in
detail for C++, where there is no base Object type
The fact that somebody is using a non-homogeneous collection means they are
using a language that supports it. Say the IDL had wildcard support such as
{{list<?>}} or {{map<?,?>}}: if a user tries to generate code for a language
that lacks that support, the generator can simply throw an error and fail. We
can document this explicitly in the IDL documentation. Most users do not
generate code for all languages; they use two or three at most. The fact that
some language the user does not need lacks support shouldn't prevent them from
using a powerful feature of their own language.
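On the wire, a wildcard collection only needs each element to carry its own type tag instead of one tag for the whole list. The sketch below assumes a hypothetical `list<?>` encoding and invented tag values (`T_BOOL`, `T_INT`, ...); nothing here is existing Thrift API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical encoding of an IDL list<?> (not part of Thrift): because the
// element type is not fixed, every element is preceded by its own type tag.
final class WildcardList {
  // Hypothetical tag values; a real protocol would reuse its existing type ids.
  static final byte T_BOOL = 1, T_INT = 2, T_LONG = 3, T_STRING = 4;

  static void write(DataOutputStream out, List<?> list) throws IOException {
    out.writeInt(list.size());
    for (Object o : list) {
      if (o instanceof Boolean) {
        out.writeByte(T_BOOL);
        out.writeBoolean((Boolean) o);
      } else if (o instanceof Integer) {
        out.writeByte(T_INT);
        out.writeInt((Integer) o);
      } else if (o instanceof Long) {
        out.writeByte(T_LONG);
        out.writeLong((Long) o);
      } else if (o instanceof String) {
        byte[] utf8 = ((String) o).getBytes(StandardCharsets.UTF_8);
        out.writeByte(T_STRING);
        out.writeInt(utf8.length);
        out.write(utf8);
      } else {
        // No tag exists for this type, so fail loudly rather than guess.
        throw new IllegalArgumentException("unsupported element: " + o);
      }
    }
  }

  static List<Object> read(DataInputStream in) throws IOException {
    int n = in.readInt();
    List<Object> result = new ArrayList<>(n);
    for (int k = 0; k < n; k++) {
      byte tag = in.readByte();           // per-element type tag
      switch (tag) {
        case T_BOOL: result.add(in.readBoolean()); break;
        case T_INT: result.add(in.readInt()); break;
        case T_LONG: result.add(in.readLong()); break;
        case T_STRING: {
          byte[] utf8 = new byte[in.readInt()];
          in.readFully(utf8);
          result.add(new String(utf8, StandardCharsets.UTF_8));
          break;
        }
        default: throw new IOException("unknown type tag " + tag);
      }
    }
    return result;
  }

  static byte[] encode(List<?> list) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      write(new DataOutputStream(bos), list);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static List<Object> decode(byte[] bytes) {
    try {
      return read(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The cost is one extra byte per element. A language without a common base type could not represent the decoded `List<Object>` directly, which is where the proposed code-generation error would apply.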
bq.I am not sure of the best approach for the extern string issue but I will
mention that cross-language string handling is complicated
The binary form does not have to match the language's in-memory
representation. Although Java uses 2-byte characters in memory, we *always* use
UTF-8 to serialize/deserialize strings (which is not Java's native format), and
it has worked well.
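A minimal sketch of that separation: the wire format fixes strings as a byte-length prefix followed by UTF-8, and Java transcodes from and to its UTF-16 `String` at the edges. The class name `Utf8Strings` and the 4-byte length prefix are assumptions for illustration (a compact protocol would more likely use a vint).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: strings travel as length-prefixed UTF-8, regardless of how a
// language stores them in memory (Java uses UTF-16 chars).
final class Utf8Strings {
  private Utf8Strings() {}

  static void writeString(DataOutputStream out, String s) throws IOException {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // transcode UTF-16 -> UTF-8
    out.writeInt(utf8.length);                        // byte count, not char count
    out.write(utf8);
  }

  static String readString(DataInputStream in) throws IOException {
    byte[] utf8 = new byte[in.readInt()];
    in.readFully(utf8);
    return new String(utf8, StandardCharsets.UTF_8);  // back to Java's native form
  }

  static byte[] encode(String s) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      writeString(new DataOutputStream(bos), s);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String decode(byte[] bytes) {
    try {
      return readString(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

`String.getBytes(StandardCharsets.UTF_8)` is used rather than `DataOutputStream.writeUTF`, because `writeUTF` emits Java's modified UTF-8, which differs from standard UTF-8 for U+0000 and for characters outside the Basic Multilingual Plane.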
bq.Are you sure that Bryan's enumeration idea wouldn't handle a large
percentage of your use cases?
One thing I can say is that that solution does not work for us. We tried it
initially, but it did not quite work.
Extern strings can be lowered in priority, but they would be a very nice
addition. And the implementation is easier than you might imagine (easier than
the other items): just look at the writeExternString() method in [this class|http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup].
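The general technique behind extern strings can be sketched as a string table built on the fly: the first occurrence of a string is written in full and assigned an index, and later occurrences are written as a back-reference to that index. This is a hypothetical illustration of the idea (class and method names are assumptions), not a transcription of NamedListCodec's `writeExternString()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "extern" strings: first occurrence in full, later
// occurrences as a 1-based back-reference into a table both sides build.
final class ExternStrings {
  private ExternStrings() {}

  static final class ExternWriter {
    private final DataOutputStream out;
    private final Map<String, Integer> seen = new HashMap<>();

    ExternWriter(DataOutputStream out) { this.out = out; }

    void writeExternString(String s) throws IOException {
      Integer idx = seen.get(s);
      if (idx != null) {
        out.writeInt(idx + 1);             // >0: back-reference to entry idx
        return;
      }
      seen.put(s, seen.size());            // assign the next free index
      byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
      out.writeInt(0);                     // 0: a new string follows
      out.writeInt(utf8.length);
      out.write(utf8);
    }
  }

  static final class ExternReader {
    private final DataInputStream in;
    private final List<String> table = new ArrayList<>();

    ExternReader(DataInputStream in) { this.in = in; }

    String readExternString() throws IOException {
      int ref = in.readInt();
      if (ref > 0) return table.get(ref - 1); // seen before: look it up
      byte[] utf8 = new byte[in.readInt()];
      in.readFully(utf8);
      String s = new String(utf8, StandardCharsets.UTF_8);
      table.add(s);                           // same index the writer assigned
      return s;
    }
  }

  static byte[] encodeAll(List<String> strings) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      ExternWriter w = new ExternWriter(new DataOutputStream(bos));
      for (String s : strings) w.writeExternString(s);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static List<String> decodeAll(byte[] bytes, int count) {
    try {
      ExternReader r = new ExternReader(new DataInputStream(new ByteArrayInputStream(bytes)));
      List<String> result = new ArrayList<>();
      for (int k = 0; k < count; k++) result.add(r.readExternString());
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Writer and reader must see strings in the same order so their tables stay in step. In a compact protocol the marker and index would be vints, so a repeated field name or key would typically cost a single byte.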
> A more compact format
> ----------------------
>
> Key: THRIFT-110
> URL: https://issues.apache.org/jira/browse/THRIFT-110
> Project: Thrift
> Issue Type: Improvement
> Reporter: Noble Paul
>
> Thrift is not as compact in writing out data as (say) protobuf. It does
> not have the concept of variable-length integers and various other
> possible optimizations. In Solr we use a lot of such optimizations to make a
> very compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type and value in the same byte, very fast
> writes of Strings, externalizable strings, etc.
> We could use a Thrift format for non-Java clients, and I would like to see it
> be as compact as the current Java version.
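Writing type and value in the same byte can be sketched as a tag byte whose high 3 bits hold a type id and whose low 5 bits hold a small size, with the all-ones size value meaning "the real size follows". NamedListCodec appears to use a similar split, but the layout, type id, and names below are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical tag-byte layout: high 3 bits = type id, low 5 bits = size;
// a size of 31 (all five bits set) means "explicit size follows".
final class TagByte {
  private TagByte() {}

  static final int TYPE_STR = 1;       // hypothetical type id (must fit in 3 bits)
  static final int SIZE_ESCAPE = 0x1F; // all five size bits set

  static int pack(int type, int size) {
    if (type < 0 || type > 7) throw new IllegalArgumentException("type needs 3 bits");
    if (size < 0) throw new IllegalArgumentException("negative size");
    return (type << 5) | Math.min(size, SIZE_ESCAPE);
  }

  static int type(int tag) { return (tag >>> 5) & 0x07; }
  static int smallSize(int tag) { return tag & SIZE_ESCAPE; }

  static void writeStr(DataOutputStream out, String s) throws IOException {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    out.writeByte(pack(TYPE_STR, utf8.length));               // type + size in one byte
    if (utf8.length >= SIZE_ESCAPE) out.writeInt(utf8.length); // large: explicit size
    out.write(utf8);
  }

  static String readStr(DataInputStream in) throws IOException {
    int tag = in.readUnsignedByte();
    if (type(tag) != TYPE_STR) throw new IOException("expected a string tag");
    int size = smallSize(tag);
    if (size == SIZE_ESCAPE) size = in.readInt();
    byte[] utf8 = new byte[size];
    in.readFully(utf8);
    return new String(utf8, StandardCharsets.UTF_8);
  }

  static byte[] encodeStr(String s) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      writeStr(new DataOutputStream(bos), s);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String decodeStr(byte[] bytes) {
    try {
      return readStr(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A string of up to 30 bytes then costs a single header byte, instead of a separate type byte plus a fixed-width length.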
--
This message is automatically generated by JIRA.