Protocol umbrella proposal

I'd push back on this implementation, simply because I think requiring a round trip before any data transfer is too much. If the problem we're aiming to solve is backwards compatibility, I'd recommend we just split the version header into two halves: a MAJOR version identifier and a MINOR version identifier. We could require that all changes to the MINOR version identifier be made in a backwards-compatible way, and that any breaking change introduce a MAJOR version change. Basically, pack more information into the version identifier without requiring any handshake or protocol negotiation.
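A minimal sketch of that suggestion, assuming a hypothetical 16-bit layout of one flag bit, a 7-bit MAJOR and an 8-bit MINOR (the field widths and the helper names `pack_version`, `unpack_version` and `can_read` are assumptions for illustration, not part of Thrift). The reader's check stays a single comparison, with no round trip:

```python
from typing import Tuple

# Hypothetical split of a 16-bit version field (an assumption, not a Thrift spec):
#   bit 15     : always 1, marking a versioned header
#   bits 14..8 : MAJOR (7 bits), bumped on breaking changes
#   bits 7..0  : MINOR (8 bits), bumped on backwards-compatible changes
VERSION_FLAG = 0x8000


def pack_version(major: int, minor: int) -> int:
    """Build the version field from a MAJOR and a MINOR number."""
    if not (0 <= major < 0x80 and 0 <= minor < 0x100):
        raise ValueError("major must fit in 7 bits and minor in 8 bits")
    return VERSION_FLAG | (major << 8) | minor


def unpack_version(word: int) -> Tuple[int, int]:
    """Split a version field back into (MAJOR, MINOR)."""
    if not word & VERSION_FLAG:
        raise ValueError("missing version flag")
    return (word >> 8) & 0x7F, word & 0xFF


def can_read(reader_major: int, word: int) -> bool:
    """Only a MAJOR mismatch is fatal; MINOR changes are assumed compatible."""
    major, _minor = unpack_version(word)
    return major == reader_major
```

Under this assumed layout the existing 0x8001 header decodes as MAJOR 0, MINOR 1, and a reader rejects a message only when its MAJOR differs from the reader's own.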
Also, the version numbers are more defensive than anything else. One of Thrift's design values is to be as "dumb" as possible in situations like this and to avoid smart dynamic checks/negotiations in favor of simple, straightforward procedures that are optimized for performance. Breaking protocol changes should generally be few and far between.

What is the main issue you're running into that's prompting this? Do you have examples other than adding true/false types? Doesn't TBinaryProtocol already support reading/writing the bool type? Do you mean offering a different implementation?

Cheers,
mcslee

----- Original Message -----
From: Alexander Shigin
To: [email protected]
Sent: Thursday, October 30, 2008 10:08 AM
Subject: Protocol umbrella proposal

The current versions of the binary and dense protocols throw an exception if they read a bad version id. This makes it really difficult to extend a protocol, for example to add true and false types to TBinaryProtocol.

I see two ways to add protocol negotiation to Thrift:

1. Handshake.
2. Negotiation in-place.

Handshake.
---------

Before sending the actual message, the client sends a special message with a zero-length (null-size) method name and a new TMessageType, TNegotiate. The server reads the version id and sends back min(server max supported version, client max supported version), i.e. the highest version both sides support. The client should use this version to talk to the server.

I see some disadvantages of this approach:

1. I don't see any way to make a backward-compatible implementation.
2. Before sending any packet we have to send two extra packets of data. In many applications the data channel is opened for every single call, and the overhead would be unacceptable.
3. We need a way to avoid using TDenseProtocol (it has version 2). This can be solved by using compatible chains: 1->3->...
4. There is a problem with async messages. It can be solved by negotiating when the transport is opened, but that breaks compatibility.

And the advantages are:

1. We can add some extra information to the negotiation message.
2. It doesn't require changing any protocols; we just add a new TUmbrellaProtocol or something like that.

Disadvantage #2 can be partially solved if the server sends a list of supported protocols on connect.

Negotiation in-place.
--------------------

The client sends the message using the lowest protocol version. The version info has the following structure:

<1 bit> <7-bit major-version> <4-bit max-version> <4-bit my-version>

major-version identifies a protocol family:

1 --- TBinaryProtocol
2 --- TDenseProtocol
3 --- THRIFT-110 protocol

max-version is the maximum minor version for this protocol; my-version is the version of the current message.

Compatibility between old and new applications is provided by one bit after the version: as far as I can tell, the byte between the version and the message type is not used anywhere.

The server sends its reply with max-version set to min(supported server version, client max-version). The client then sets my-version to this value from the reply.

Here is an example, where the client can speak binary protocol v3 and the server only binary protocol v1:

1. client calls 0x8001 0101
   version 0x8001 - binary protocol (old version-set)
   option  0x01   - umbrella protocol
   type    0x01   - call
2. server replies 0x8110 0102
   version 0x8110
     0x01 - binary protocol (new version-set)
     0x1  - max-version
     0x0  - message-version
   option  0x01   - umbrella protocol
   type    0x02   - reply
3. client calls 0x8131 0101
   version 0x8131
     0x01 - binary protocol
     0x3  - max-version
     0x1  - message-version
   option  0x01   - umbrella protocol
   type    0x01   - call
4. server replies 0x8111 0102
   version 0x8111 (as in #2, except message-version)
     0x01 - binary protocol
     0x1  - max-version
     0x1  - message-version
   option  0x01   - umbrella protocol
   type    0x02   - reply

I see some disadvantages of negotiation in-place:

1. When an application sends a lot of data to the server and gets only a couple of bytes back, it gains an advantage only from the second call onward.
2. If an application uses Thrift only for async calls (logging or something like that) or uses a connect-call-disconnect strategy, it doesn't get any advantage.
3. The protocol becomes stateful.
4. You can't send the first message with a new type; you have to send an echo message or something like that first.

Advantages:

1. No additional messages.
2. Fully compatible.

Summary: it can support 127 protocol families with 16 versions each. If someone thinks 16 versions are not enough, we can create a workaround: if max-version reaches 0xF (the largest value 4 bits can hold), switch up to the next protocol family. For example, if someone creates binary protocol v16, it will get a new major number (0x2a, for example):

2. server replies 0x81f00102
3. client calls 0xaa000101 - switch to the next protocol.
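A minimal Python sketch of the in-place version word as described above: it packs and unpacks the 16-bit version word, builds the 32-bit header with the option byte, and replays steps 2 and 3 of the example. The names `VersionWord`, `encode`, `decode`, `server_reply_max` and `message_header` are hypothetical helpers, not Thrift APIs, and the sketch assumes the server's reply carries min(server version, client max-version) in its max-version field, which is what the example's hex values show.

```python
from typing import NamedTuple

# Hypothetical layout of the proposed 16-bit version word:
#   bit 15     : always 1
#   bits 14..8 : 7-bit major-version (protocol family, e.g. 1 = TBinaryProtocol)
#   bits 7..4  : 4-bit max-version (highest minor version the sender supports)
#   bits 3..0  : 4-bit my-version (minor version this message is encoded with)


class VersionWord(NamedTuple):
    major: int
    max_version: int
    my_version: int


def encode(v: VersionWord) -> int:
    # major 0 is left for the old version-set (e.g. 0x8001), so new words use 1..127.
    if not (0 < v.major < 0x80 and 0 <= v.max_version < 0x10 and 0 <= v.my_version < 0x10):
        raise ValueError("field out of range for the 7/4/4-bit layout")
    return 0x8000 | (v.major << 8) | (v.max_version << 4) | v.my_version


def decode(word: int) -> VersionWord:
    if not word & 0x8000:
        raise ValueError("not a versioned header")
    return VersionWord((word >> 8) & 0x7F, (word >> 4) & 0xF, word & 0xF)


def server_reply_max(server_supported: int, client_max: int) -> int:
    # The server advertises the highest minor version both sides support.
    return min(server_supported, client_max)


def message_header(version_word: int, option: int, msg_type: int) -> int:
    # 32-bit header: version word, then the option byte (0x01 = umbrella), then the type.
    return (version_word << 16) | (option << 8) | msg_type


# Replaying steps 2 and 3: the client supports v3, the server supports v1.
reply = encode(VersionWord(1, server_reply_max(1, 3), 0))  # step 2 version word
agreed = decode(reply).max_version                         # client adopts this as my-version
call = encode(VersionWord(1, 3, agreed))                   # step 3 version word
```

Here `message_header(call, 0x01, 0x01)` reproduces the full `0x8131 0101` call from step 3, and decoding `0xaa00` yields major-version 0x2a, matching the switch in the summary.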
