Randy, [There -is- a way that this could be done without modifying the interface of TProtocol, but it's pretty involved/abstruse; I'll write about this in a second email.]
I think that adding such self-describing type-information would vitiate the value of the format. Let me try to convince you. TL;DR -- config-files need to be human-readable/editable, and maintaining what is arguably superfluous type-information in the config-file is never going to be appealing to humans. (1) in a former life, I worked for a company that used protobufs extensively for comms. It was really really nice (for all the reasons that people who use Thrift a lot are aware of) -- you could count on data being strongly-typed, and yet the marshalling tech was efficient and had a modicum of version-to-version compatibility. (2) One thing that I found really lovely, was that they also expressed config-files in protobufs. So almost everywhere, you found code dealing with configuration -objects-, and not YAML files, CSV, INI, etc. There was -one- representation in-memory, and it was as protobuf objects. This was great, for the same reason that storing data in protobufs was great. (3) Now, config-files are different from -data- -- Humans read and edit config-files. So having a nice human-readable/editable format for protobufs, was critical to this use-case. Turns out, protobufs 2.0 has one: "CompactText". And protobufs 3.0 has added a new one: "JSON". Both of these have the property that there's no (to the human) superfluous type information to keep accurately up-to-date. In the "CompactText" representation, the example from my first email would look like: a: 1 b: "ugh" c: < a: 2 b: "argh" > So: a lot like JSON, but even more bare-bones. The key thing here, and in the JSON example, is that the config-file needs to be somewhat reasonably near to what a human would have written in the first place. Adding type-information for the marshaller breaks that property. (4) Another use-case: where you could have an RPC server endpoint that presented a human-readable wireline, and users could invoke it (for learning, debugging, etc) using something as simple as "nc" and a shellscript) is also pretty much vitiated by requring the invoker to pass along complex type-information. --chet-- On Mon, Oct 2, 2017 at 10:20 AM, Randy Abernethy <[email protected]> wrote: > I see. Are you opposed to serializing the mapping? The proto could buffer > writes and collect the mappings, then on writeStructEnd() you could emit > the map (maybe as an __map__ attribute or something) followed by the data. > The read side could read the map in response to readStructBegin(). Not only > would this require no mods to Thrift but it would have the added advantage > of making your wire format self describing. Kind of like Avro. > > Thoughts? > > On Mon, Oct 2, 2017 at 10:00 AM, Chet Murthy <[email protected]> > wrote: > > > Randy, > > > > Thank you for your questions! I'm hoping that I'm mistaken, and maybe > via > > this conversation, you can help me figure out that indeed I am. > > > > (1) you're right, that the writeFieldBegin method is passed the > field-name, > > so it can write it on the JSON wire. > > > > (2) the problem is, readFieldBegin can read that back, but it cannot > > *infer* the fieldid from that name, and the *fieldid* is what's used in > > generated code to drive the switch for demarshalling. Concretely, in > your > > example, even if "fname" were set to either "f1" or "f2", the switch > logic > > is driven by fid being set to either 1 or 2. And there's no way for that > > to happen in a TProtocol, and specifically TSimpleJSONProtocol doesn't do > > it. But generally, there's no way for it to happen, b/c inferring > fieldid > > from fieldname depends in which message is being demarshalled, and the > > *protocol* object doesn't have access to type-(IDL)-information at all. > > > > I haven't yet implemented the change I contemplate, only b/c I wanted to > > find out how open Thrift was, to such a change. But I can do so, if > it'll > > help to explain what I mean -- it isn't difficult. > > > > --chet-- > > > > On Mon, Oct 2, 2017 at 9:41 AM, Randy Abernethy <[email protected]> wrote: > > > > > Hi Chet, > > > > > > You say there is no mapping between the field names and type/ids, yet > > every > > > struct (including param structs) hands just such data to the proto on > > > write. Why are the field string names supplied to the > > > TProtocol::writeFieldBegin method by the generated struct code > > > insufficient? The write code passes the proto the field name, type and > > id; > > > and the read is offered the opportunity to return them. Sounds like > > > everything your new protocol would need is supplied. As per Jens you > just > > > need to serialize the data provided the way you want it (swapping field > > > names for ids). > > > > > > What am I missing (I'm guessing something :-)? > > > > > > For example, thrift IDL (notice bold items): > > > > > > >>>>>>>>>>>> > > > > > > struct data { > > > 1: i16 *f1* > > > 2: i16 *f2* > > > } > > > > > > <<<<<<<<<<<<< > > > > > > generates c++ write: > > > > > > >>>>>>>>>> > > > > > > uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot) > > const { > > > uint32_t xfer = 0; > > > ::apache::thrift::protocol::TOutputRecursionTracker tracker(*oprot); > > > xfer += oprot->writeStructBegin("data"); > > > > > > xfer += oprot->writeFieldBegin("*f1*", ::apache::thrift::protocol::T_ > > > I16, > > > 1); > > > xfer += oprot->writeI16(this->f1); > > > xfer += oprot->writeFieldEnd(); > > > > > > xfer += oprot->writeFieldBegin("*f2*", ::apache::thrift::protocol::T_ > > > I16, > > > 2); > > > xfer += oprot->writeI16(this->f2); > > > xfer += oprot->writeFieldEnd(); > > > > > > xfer += oprot->writeFieldStop(); > > > xfer += oprot->writeStructEnd(); > > > return xfer; > > > } > > > > > > <<<<<<<<<<<<< > > > > > > and read: > > > > > > >>>>>>>>>>>>> > > > > > > uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) { > > > > > > ::apache::thrift::protocol::TInputRecursionTracker tracker(*iprot); > > > uint32_t xfer = 0; > > > std::string fname; > > > ::apache::thrift::protocol::TType ftype; > > > int16_t fid; > > > > > > xfer += iprot->readStructBegin(fname); > > > > > > using ::apache::thrift::protocol::TProtocolException; > > > > > > while (true) > > > { > > > xfer += iprot->readFieldBegin(*fname*, ftype, fid); > > > if (ftype == ::apache::thrift::protocol::T_STOP) { > > > break; > > > } > > > switch (fid) > > > { > > > case 1: > > > if (ftype == ::apache::thrift::protocol::T_I16) { > > > xfer += iprot->readI16(this->f1); > > > this->__isset.f1 = true; > > > } else { > > > xfer += iprot->skip(ftype); > > > } > > > break; > > > case 2: > > > if (ftype == ::apache::thrift::protocol::T_I16) { > > > xfer += iprot->readI16(this->f2); > > > this->__isset.f2 = true; > > > } else { > > > xfer += iprot->skip(ftype); > > > } > > > break; > > > default: > > > xfer += iprot->skip(ftype); > > > break; > > > } > > > xfer += iprot->readFieldEnd(); > > > } > > > > > > xfer += iprot->readStructEnd(); > > > > > > return xfer; > > > } > > > > > > <<<<<<<<<<<<< > > > > > > > > > --Randy > > > > > > > > > On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo < > [email protected]> > > > wrote: > > > > > > > Also i wonder if what is meant by human readable is simple a clever > way > > > to > > > > generate pcap modules so tools like wireshark/tcp dump can read the > > data. > > > > > > > > > > > > > > > > On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <[email protected]> > > > wrote: > > > > > > > > > Hi Chet, > > > > > > > > > > well, Thrift is primarily about efficiency, not human readability. > If > > > > > machines and programs talk to each other, nobody really needs human > > > > > readable > > > > > messages, because there are no humans involved, except maybe for > > > > debugging > > > > > (but that's not a real production use case). If one asked you to > > pick > > > > just > > > > > one single feature about any Serialization and RPC library, > > potentially > > > > > sacrificing any other requirement if needed, you probably would > > answer > > > > that > > > > > it should be as fast and efficient as possible. > > > > > > > > > > I only wonder if the human readability has sth to do with the fact > > that > > > > > gRPC > > > > > is often found being slower than Thrift ... ;-) > > > > > > > > > > You still want a human readable fomat? Ok, here's how to do it. > > Thrift > > > > > indeed offers the ability to achieve that, because it is a > framework. > > > For > > > > > example, look at the implementation of the TSimpleJSONProtocol > (link > > > > below) > > > > > and use this as a starting point to write your own JSON-like > > TProtocol > > > > > implementation that suits your needs. That's what makes Thrift so > > > > flexible > > > > > - > > > > > even if you have special needs, you need to replace only those > parts > > > and > > > > it > > > > > still simply works. If you prefer XML or some other format, even > that > > > > > should > > > > > be feasible, but you have to invest some work either way. > > > > > > > > > > https://github.com/apache/thrift/blob/master/lib/java/ > > > > > src/org/apache/thrift/protocol/TSimpleJSONProtocol.java > > > > > > > > > > Does that help you? > > > > > > > > > > Have fun, > > > > > JensG > > > > > > > > > > > > > > > -----Ursprüngliche Nachricht----- > > > > > From: Chet Murthy > > > > > Sent: Thursday, September 28, 2017 3:04 AM > > > > > To: [email protected] > > > > > Subject: Human-readable wire-format for Thrift? > > > > > > > > > > [I hope I'm sending this mail to the right list -- it wasn't clear > to > > > me > > > > > that it should go to thrift-dev, so I figured I'd send it here > > first.] > > > > > > > > > > The -one- thing that protobufs has going for it, over Thrift, is > that > > > > > protobufs has "CompactTextFormat" (and JSON too) as full > > wire-formats. > > > > > This is .... incredibly useful for the following use-case: > > > > > > > > > > You want to write a config-file format, and you want to get the > > > benefits > > > > of > > > > > version-to-version compatibility. In your program, you'd like to > > > access > > > > a > > > > > strongly-typed "config object" with typed fields, and you'd -like- > > for > > > > > marshalling to/from flat-text to be automatically generated. > > > > > > > > > > I have personal experience with using protobufs in exactly this > way, > > > and > > > > > it's really, really, really nice. > > > > > > > > > > The current Thrift JSON protocol isn't designed for this, and given > > the > > > > > interface of the (C++) TProtocol class, I think it isn't possible. > > But > > > > > with a small change, it -would- be possible, so I thought I'd > > describe > > > > the > > > > > change, and see what you all thought (b/c it would require a change > > to > > > > > generated code, and to the TProtocol base class interfaces > > > (specifically > > > > to > > > > > the readFieldBegin method): > > > > > > > > > > [I'll describe this for the C++ generated code; I haven't looked > > > > carefully > > > > > into the rest of the languages, but I'd guess that something could > be > > > > > done.] > > > > > > > > > > (0) Let me first note that these datastructures are constant, and > > we're > > > > > talking about passing an extra parameter to the read method listed > > > above. > > > > > That's it. > > > > > > > > > > (1) For concreteness, imagine a couple of message types > > > > > > > > > > struct Bar { > > > > > 4: required i32 a , > > > > > 5: required string b, > > > > > } > > > > > > > > > > struct Foo { > > > > > 1: required i32 a , > > > > > 2: required string b, > > > > > 3: required Bar c, > > > > > } > > > > > > > > > > Again for concreteness, here's an example of the JSON protoocol > for a > > > > value > > > > > of type Foo: > > > > > > > > > > { > > > > > "1": { > > > > > "i32": 1 > > > > > }, > > > > > "2": { > > > > > "str": "ugh" > > > > > }, > > > > > "3": { > > > > > "rec": { > > > > > "4": { > > > > > "i32": 2 > > > > > }, > > > > > "5": { > > > > > "str": "argh" > > > > > } > > > > > } > > > > > } > > > > > } > > > > > > > > > > (2) I'd prefer that that look like: > > > > > { > > > > > "a": 1, > > > > > "b": "ugh", > > > > > "c": { > > > > > "a": 2, > > > > > "b": "argh" > > > > > } > > > > > } > > > > > > > > > > (3) For each message-type, we need a mapping field-name -> > > > > > pair<Thrift-type, field-id>. So, generate a constant > data-structure > > of > > > > > type > > > > > > > > > > map<string, pair<Type, int16_t> > > > > > > > > > > > for each message-type. > > > > > > > > > > (3) Marshalling is easy -- all the field-names are known, and we > > could > > > > just > > > > > emit those instead of field-ids; similarly, we could skip putting > > > > > type-information in the wire-format too. > > > > > > > > > > (4) At demarshalling time, we always know the type of the message > > we're > > > > > demarshalling. So as we read field-names, we can use the map in #3 > > to > > > > look > > > > > up TType and field-id, and then just demarshal in the normal way. > We > > > > just > > > > > need to pass that map as a constref to readFieldBegin. > > > > > > > > > > I -think- that that works, and can't find any problems with what > I've > > > > > described. > > > > > > > > > > I can make this change to the C++ library and code-generator, but > > > before > > > > I > > > > > start down that path, I figured I should get some input on whether > > this > > > > is > > > > > something that the Thrift community (and maintainers) would accept? > > > > > > > > > > I think that a human-readable/writable wire would be immensely > > > valuable, > > > > > and not just for the example of config-files. > > > > > > > > > > Your feedback appreciated, > > > > > --chet-- > > > > > > > > > > > > > > > > > > > >
