Randy,

[There -is- a way that this could be done without modifying the interface
of TProtocol, but it's pretty involved/abstruse; I'll write about this in a
second email.]

I think that adding such self-describing type-information would vitiate the
value of the format.  Let me try to convince you.

TL;DR -- config-files need to be human-readable/editable, and maintaining
what is arguably superfluous type-information in the config-file is never
going to be appealing to humans.

(1) in a former life, I worked for a company that used protobufs
extensively for comms.  It was really really nice (for all the reasons that
people who use Thrift a lot are aware of) -- you could count on data being
strongly-typed, and yet the marshalling tech was efficient and had a
modicum of version-to-version compatibility.

(2) One thing that I found really lovely, was that they also expressed
config-files in protobufs.  So almost everywhere, you found code dealing
with configuration -objects-, and not YAML files, CSV, INI, etc.  There was
-one- representation in-memory, and it was as protobuf objects.  This was
great, for the same reason that storing data in protobufs was great.

(3) Now, config-files are different from -data- -- Humans read and edit
config-files.  So having a nice human-readable/editable format for
protobufs, was critical to this use-case.  Turns out, protobufs 2.0 has
one: "CompactText".  And protobufs 3.0 has added a new one: "JSON".  Both
of these have the property that there's no (to the human) superfluous type
information to keep accurately up-to-date.  In the "CompactText"
representation, the example from my first email would look like:

a: 1 b: "ugh" c: < a: 2 b: "argh" >

So: a lot like JSON, but even more bare-bones.

The key thing here, and in the JSON example, is that the config-file needs
to be somewhat reasonably near to what a human would have written in the
first place.  Adding type-information for the marshaller breaks that
property.

(4) Another use-case: where you could have an RPC server endpoint that
presented a human-readable wireline, and users could invoke it (for
learning, debugging, etc) using something as simple as "nc" and a
shellscript) is also pretty much vitiated by requring the invoker to pass
along complex type-information.

--chet--



On Mon, Oct 2, 2017 at 10:20 AM, Randy Abernethy <[email protected]>
wrote:

> I see. Are you opposed to serializing the mapping? The proto could buffer
> writes and collect the mappings, then on writeStructEnd() you could emit
> the map (maybe as an __map__ attribute or something) followed by the data.
> The read side could read the map in response to readStructBegin(). Not only
> would this require no mods to Thrift but it would have the added advantage
> of making your wire format self describing. Kind of like Avro.
>
> Thoughts?
>
> On Mon, Oct 2, 2017 at 10:00 AM, Chet Murthy <[email protected]>
> wrote:
>
> > Randy,
> >
> > Thank you for your questions!  I'm hoping that I'm mistaken, and maybe
> via
> > this conversation, you can help me figure out that indeed I am.
> >
> > (1) you're right, that the writeFieldBegin method is passed the
> field-name,
> > so it can write it on the JSON wire.
> >
> > (2) the problem is, readFieldBegin can read that back, but it cannot
> > *infer* the fieldid from that name, and the *fieldid* is what's used in
> > generated code to drive the switch for demarshalling.  Concretely, in
> your
> > example, even if "fname" were set to either "f1" or "f2", the switch
> logic
> > is driven by fid being set to either 1 or 2.  And there's no way for that
> > to happen in a TProtocol, and specifically TSimpleJSONProtocol doesn't do
> > it.  But generally, there's no way for it to happen, b/c inferring
> fieldid
> > from fieldname depends in which message is being demarshalled, and the
> > *protocol* object doesn't have access to type-(IDL)-information at all.
> >
> > I haven't yet implemented the change I contemplate, only b/c I wanted to
> > find out how open Thrift was, to such a change.  But I can do so, if
> it'll
> > help to explain what I mean -- it isn't difficult.
> >
> > --chet--
> >
> > On Mon, Oct 2, 2017 at 9:41 AM, Randy Abernethy <[email protected]> wrote:
> >
> > > Hi Chet,
> > >
> > > You say there is no mapping between the field names and type/ids, yet
> > every
> > > struct (including param structs) hands just such data to the proto on
> > > write. Why are the field string names supplied to the
> > > TProtocol::writeFieldBegin method by the generated struct code
> > > insufficient? The write code passes the proto the field name, type and
> > id;
> > > and the read is offered the opportunity to return them. Sounds like
> > > everything your new protocol would need is supplied. As per Jens you
> just
> > > need to serialize the data provided the way you want it (swapping field
> > > names for ids).
> > >
> > > What am I missing (I'm guessing something :-)?
> > >
> > > For example, thrift IDL (notice bold items):
> > >
> > > >>>>>>>>>>>>
> > >
> > > struct data {
> > >     1: i16 *f1*
> > >     2: i16 *f2*
> > > }
> > >
> > > <<<<<<<<<<<<<
> > >
> > > generates c++ write:
> > >
> > > >>>>>>>>>>
> > >
> > > uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot)
> > const {
> > >   uint32_t xfer = 0;
> > >   ::apache::thrift::protocol::TOutputRecursionTracker tracker(*oprot);
> > >   xfer += oprot->writeStructBegin("data");
> > >
> > >   xfer += oprot->writeFieldBegin("*f1*", ::apache::thrift::protocol::T_
> > > I16,
> > > 1);
> > >   xfer += oprot->writeI16(this->f1);
> > >   xfer += oprot->writeFieldEnd();
> > >
> > >   xfer += oprot->writeFieldBegin("*f2*", ::apache::thrift::protocol::T_
> > > I16,
> > > 2);
> > >   xfer += oprot->writeI16(this->f2);
> > >   xfer += oprot->writeFieldEnd();
> > >
> > >   xfer += oprot->writeFieldStop();
> > >   xfer += oprot->writeStructEnd();
> > >   return xfer;
> > > }
> > >
> > > <<<<<<<<<<<<<
> > >
> > > and read:
> > >
> > > >>>>>>>>>>>>>
> > >
> > > uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) {
> > >
> > >   ::apache::thrift::protocol::TInputRecursionTracker tracker(*iprot);
> > >   uint32_t xfer = 0;
> > >   std::string fname;
> > >   ::apache::thrift::protocol::TType ftype;
> > >   int16_t fid;
> > >
> > >   xfer += iprot->readStructBegin(fname);
> > >
> > >   using ::apache::thrift::protocol::TProtocolException;
> > >
> > >   while (true)
> > >   {
> > >     xfer += iprot->readFieldBegin(*fname*, ftype, fid);
> > >     if (ftype == ::apache::thrift::protocol::T_STOP) {
> > >       break;
> > >     }
> > >     switch (fid)
> > >     {
> > >       case 1:
> > >         if (ftype == ::apache::thrift::protocol::T_I16) {
> > >           xfer += iprot->readI16(this->f1);
> > >           this->__isset.f1 = true;
> > >         } else {
> > >           xfer += iprot->skip(ftype);
> > >         }
> > >         break;
> > >       case 2:
> > >         if (ftype == ::apache::thrift::protocol::T_I16) {
> > >           xfer += iprot->readI16(this->f2);
> > >           this->__isset.f2 = true;
> > >         } else {
> > >           xfer += iprot->skip(ftype);
> > >         }
> > >         break;
> > >       default:
> > >         xfer += iprot->skip(ftype);
> > >         break;
> > >     }
> > >     xfer += iprot->readFieldEnd();
> > >   }
> > >
> > >   xfer += iprot->readStructEnd();
> > >
> > >   return xfer;
> > > }
> > >
> > > <<<<<<<<<<<<<
> > >
> > >
> > > --Randy
> > >
> > >
> > > On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo <
> [email protected]>
> > > wrote:
> > >
> > > > Also i wonder if what is meant by human readable is simple a clever
> way
> > > to
> > > > generate pcap modules so tools like wireshark/tcp dump can read the
> > data.
> > > >
> > > >
> > > >
> > > > On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <[email protected]>
> > > wrote:
> > > >
> > > > > Hi Chet,
> > > > >
> > > > > well, Thrift is primarily about efficiency, not human readability.
> If
> > > > > machines and programs talk to each other, nobody really needs human
> > > > > readable
> > > > > messages, because there are no humans involved, except maybe for
> > > > debugging
> > > > > (but that's not a real production use case).  If one asked you to
> > pick
> > > > just
> > > > > one single feature about any Serialization and RPC library,
> > potentially
> > > > > sacrificing any other requirement if needed, you probably would
> > answer
> > > > that
> > > > > it should be as fast and efficient as possible.
> > > > >
> > > > > I only wonder if the human readability has sth to do with the fact
> > that
> > > > > gRPC
> > > > > is often found being slower than Thrift ...  ;-)
> > > > >
> > > > > You still want a human readable fomat? Ok, here's how to do it.
> > Thrift
> > > > > indeed offers the ability to achieve that, because it is a
> framework.
> > > For
> > > > > example, look at the implementation of the TSimpleJSONProtocol
> (link
> > > > below)
> > > > > and use this as a starting point to write your own JSON-like
> > TProtocol
> > > > > implementation that suits your needs. That's what makes Thrift so
> > > > flexible
> > > > > -
> > > > > even if you have special needs, you need to replace only those
> parts
> > > and
> > > > it
> > > > > still simply works. If you prefer XML or some other format, even
> that
> > > > > should
> > > > > be feasible, but you have to invest some work either way.
> > > > >
> > > > > https://github.com/apache/thrift/blob/master/lib/java/
> > > > > src/org/apache/thrift/protocol/TSimpleJSONProtocol.java
> > > > >
> > > > > Does that help you?
> > > > >
> > > > > Have fun,
> > > > > JensG
> > > > >
> > > > >
> > > > > -----Ursprüngliche Nachricht-----
> > > > > From: Chet Murthy
> > > > > Sent: Thursday, September 28, 2017 3:04 AM
> > > > > To: [email protected]
> > > > > Subject: Human-readable wire-format for Thrift?
> > > > >
> > > > > [I hope I'm sending this mail to the right list -- it wasn't clear
> to
> > > me
> > > > > that it should go to thrift-dev, so I figured I'd send it here
> > first.]
> > > > >
> > > > > The -one- thing that protobufs has going for it, over Thrift, is
> that
> > > > > protobufs has "CompactTextFormat" (and JSON too) as full
> > wire-formats.
> > > > > This is .... incredibly useful for the following use-case:
> > > > >
> > > > > You want to write a config-file format, and you want to get the
> > > benefits
> > > > of
> > > > > version-to-version compatibility.  In your program, you'd like to
> > > access
> > > > a
> > > > > strongly-typed "config object" with typed fields, and you'd -like-
> > for
> > > > > marshalling to/from flat-text to be automatically generated.
> > > > >
> > > > > I have personal experience with using protobufs in exactly this
> way,
> > > and
> > > > > it's really, really, really nice.
> > > > >
> > > > > The current Thrift JSON protocol isn't designed for this, and given
> > the
> > > > > interface of the (C++) TProtocol class, I think it isn't possible.
> > But
> > > > > with a small change, it -would- be possible, so I thought I'd
> > describe
> > > > the
> > > > > change, and see what you all thought (b/c it would require a change
> > to
> > > > > generated code, and to the TProtocol base class interfaces
> > > (specifically
> > > > to
> > > > > the readFieldBegin method):
> > > > >
> > > > > [I'll describe this for the C++ generated code; I haven't looked
> > > > carefully
> > > > > into the rest of the languages, but I'd guess that something could
> be
> > > > > done.]
> > > > >
> > > > > (0) Let me first note that these datastructures are constant, and
> > we're
> > > > > talking about passing an extra parameter to the read method listed
> > > above.
> > > > > That's it.
> > > > >
> > > > > (1) For concreteness, imagine a couple of message types
> > > > >
> > > > > struct Bar {
> > > > >   4: required i32 a ,
> > > > >   5: required string b,
> > > > > }
> > > > >
> > > > > struct Foo {
> > > > >   1: required i32 a ,
> > > > >   2: required string b,
> > > > >   3: required Bar c,
> > > > > }
> > > > >
> > > > > Again for concreteness, here's an example of the JSON protoocol
> for a
> > > > value
> > > > > of type Foo:
> > > > >
> > > > > {
> > > > >     "1": {
> > > > >         "i32": 1
> > > > >     },
> > > > >     "2": {
> > > > >         "str": "ugh"
> > > > >     },
> > > > >     "3": {
> > > > >         "rec": {
> > > > >             "4": {
> > > > >                 "i32": 2
> > > > >             },
> > > > >             "5": {
> > > > >                 "str": "argh"
> > > > >             }
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > (2) I'd prefer that that look like:
> > > > > {
> > > > >     "a": 1,
> > > > >     "b": "ugh",
> > > > >     "c": {
> > > > >          "a": 2,
> > > > >           "b": "argh"
> > > > >     }
> > > > > }
> > > > >
> > > > > (3) For each message-type, we need a mapping field-name ->
> > > > > pair<Thrift-type, field-id>.  So, generate a constant
> data-structure
> > of
> > > > > type
> > > > >
> > > > > map<string, pair<Type, int16_t> >
> > > > >
> > > > > for each message-type.
> > > > >
> > > > > (3) Marshalling is easy -- all the field-names are known, and we
> > could
> > > > just
> > > > > emit those instead of field-ids; similarly, we could skip putting
> > > > > type-information in the wire-format too.
> > > > >
> > > > > (4) At demarshalling time, we always know the type of the message
> > we're
> > > > > demarshalling.  So as we read field-names, we can use the map in #3
> > to
> > > > look
> > > > > up TType and field-id, and then just demarshal in the normal way.
> We
> > > > just
> > > > > need to pass that map as a constref to readFieldBegin.
> > > > >
> > > > > I -think- that that works, and can't find any problems with what
> I've
> > > > > described.
> > > > >
> > > > > I can make this change to the C++ library and code-generator, but
> > > before
> > > > I
> > > > > start down that path, I figured I should get some input on whether
> > this
> > > > is
> > > > > something that the Thrift community (and maintainers) would accept?
> > > > >
> > > > > I think that a human-readable/writable wire would be immensely
> > > valuable,
> > > > > and not just for the example of config-files.
> > > > >
> > > > > Your feedback appreciated,
> > > > > --chet--
> > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to