Protocol umbrella proposal

I'd push back on this implementation simply because I 
think requiring a round-trip prior to any data transfer is too much. If the 
problem we're aiming to solve is adding backwards compatibility, I'd recommend 
we just split the version header into two halves, one being a MAJOR version 
identifier and one being a MINOR version identifier. We could require that all 
changes to the MINOR version identifier be made in a backwards-compatible way, 
and that any breaking change introduce a MAJOR version change. Basically, pack 
more information into the version identifier without requiring any handshake or 
protocol negotiation.
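
To make the MAJOR/MINOR idea concrete, here is a minimal sketch in Python. The 
layout is an assumption made only for illustration: a 16-bit version identifier 
with MAJOR in the high byte and MINOR in the low byte. The helper names 
(pack_version, unpack_version, can_read) are hypothetical and not part of 
Thrift, and the rule that a reader accepts any message with the same MAJOR is 
one possible reading of the proposal.

```python
from typing import Tuple


def pack_version(major: int, minor: int) -> int:
    """Pack MAJOR and MINOR into one 16-bit identifier (assumed 8 bits each)."""
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


def unpack_version(version: int) -> Tuple[int, int]:
    """Split a 16-bit identifier back into (major, minor)."""
    return (version >> 8) & 0xFF, version & 0xFF


def can_read(reader_version: int, message_version: int) -> bool:
    """Accept a message when its MAJOR matches the reader's MAJOR.

    Assumption: MINOR changes never break readers of the same MAJOR,
    so MINOR is ignored here and no handshake is needed.
    """
    reader_major, _ = unpack_version(reader_version)
    message_major, _ = unpack_version(message_version)
    return reader_major == message_major
```

With this check a reader at 1.0 accepts a 1.3 message and rejects a 2.0 message, 
and it does so from the header alone, with no extra round-trip.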

Also, the version numbers are more defensive than anything else. One of 
Thrift's design values is to be as "dumb" as possible in situations like this 
and to avoid smart dynamic checks/negotiations in favor of simple, 
straightforward procedures that are optimized for performance. Breaking 
protocol changes should generally be few and far between.

What is the major issue that you're running into that's bringing up this 
problem? Do you have examples other than adding true/false types? Doesn't 
TBinaryProtocol already support reading/writing the bool type? Do you mean 
offering a different implementation?

Cheers,
mcslee
  ----- Original Message ----- 
  From: Alexander Shigin 
  To: [email protected] 
  Sent: Thursday, October 30, 2008 10:08 AM
  Subject: Protocol umbrella proposal




  Protocol umbrella proposal

  The current versions of the binary and dense protocols throw an exception
  if they read a bad version id. This makes it really difficult to extend a
  protocol, for example to add true and false types to TBinaryProtocol.

  I see two ways to add protocol negotiation to Thrift:
     1. Handshake.
     2. Negotiation in-place.

   Handshake.
   ---------

  Before sending the actual message, the client sends a special message
  with a zero-length method name and a new TMessageType, TNegotiate. The
  server reads the version id and sends back min(server max supported
  version, client max supported version), i.e. the highest version both
  sides support. The client should use this version to talk to the server.

  I see some disadvantages of this approach:
     1. I don't see any way to make a backward-compatible implementation.
     2. Before sending any packet we have to send two extra packets of data.
  In many applications the data channel is opened for every single call and
  the overhead will be unacceptable.
     3. We need a way to skip TDenseProtocol (it has version 2); this can
  be solved by using compatibility chains: 1->3->...
     4. There is a problem with async messages. It can be solved by
  negotiating when the transport is opened, but that breaks compatibility.

  And advantages are:
     1. We can add some extra information to the negotiation message.
     2. It doesn't require changing any existing protocols. We just add a
  new TUmbrellaProtocol or something like that.

  Disadvantage #2 can be partially solved if the server sends a list of
  supported protocols on connect.

   Negotiation in-place.
   --------------------

  Client sends the message using the lowest protocol version. The version
  info has the following structure:
     <1 bit> <7-bit major-version> <4-bit max-version> <4-bit my-version>

  major-version identifies the protocol family:
     1 --- TBinaryProtocol
     2 --- TDenseProtocol
     3 --- THRIFT-110 protocol

   max-version is the maximum minor version the sender supports for this
   protocol.
   my-version is the version of the current message.
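
As an illustration of the layout above, here is a minimal Python sketch that 
packs and unpacks the 16-bit version info. The names VersionWord, encode and 
decode are hypothetical. It also assumes that the leading <1 bit> is always 
set, which matches every version word in this message but is not stated 
explicitly.

```python
from typing import NamedTuple

MARKER_BIT = 0x8000  # assumption: the leading <1 bit> is always 1


class VersionWord(NamedTuple):
    major: int        # 7-bit protocol family
    max_version: int  # 4-bit maximum minor version the sender supports
    my_version: int   # 4-bit version of the current message


def encode(word: VersionWord) -> int:
    """Pack the three fields into one 16-bit version word."""
    if not 0 <= word.major <= 0x7F:
        raise ValueError("major-version must fit in 7 bits")
    if not 0 <= word.max_version <= 0xF:
        raise ValueError("max-version must fit in 4 bits")
    if not 0 <= word.my_version <= 0xF:
        raise ValueError("my-version must fit in 4 bits")
    return (MARKER_BIT | (word.major << 8)
            | (word.max_version << 4) | word.my_version)


def decode(raw: int) -> VersionWord:
    """Unpack a 16-bit version word into its three fields."""
    if not 0 <= raw <= 0xFFFF or not raw & MARKER_BIT:
        raise ValueError("expected a 16-bit word with the leading bit set")
    return VersionWord((raw >> 8) & 0x7F, (raw >> 4) & 0xF, raw & 0xF)
```

For instance, decode(0x8131) gives family 1 (TBinaryProtocol) with max-version 3 
and my-version 1.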

  Compatibility between old and new applications is signaled by one bit in
  the byte that follows the version: I can't find any place where the byte
  between the version and the message type is used.

  The server replies using min(supported server version, max-version),
  i.e. the highest version both sides support. The client then sets
  my-version to the new value from the reply.

  Here is an example:
     client supports binary protocol up to v3
     server supports binary protocol up to v1

   1. client calls   0x8001 0101
             version 0x8001 - binary protocol (old version-set)
             option  0x01   - umbrella protocol
             type    0x01   - call

   2. server replies 0x8110 0102
             version 0x8110
                     0x01   - binary protocol (new version-set)
                       0x1  - max-version
                        0x0 - message-version
             option  0x01   - umbrella protocol
             type    0x02   - reply

   3. client calls   0x8131 0101
             version 0x8131
                     0x01   - binary protocol
                       0x3  - max-version
                        0x1 - message-version
             option  0x01   - umbrella protocol
             type    0x01   - call

   4. server replies 0x8111 0102
             version 0x8111 # as in #2, but message-version is now 1
                     0x01   - binary protocol
                       0x1  - max-version
                        0x1 - message-version
             option  0x01   - umbrella protocol
             type    0x02   - reply
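
The four-step exchange above can be replayed with a small Python simulation. 
This is only a sketch under stated assumptions: the Peer class and simulate 
function are hypothetical, each side starts at message-version 0, and each side 
moves to the smaller of its own maximum and the peer's advertised max-version, 
which is what the values in the example imply. Only the 16-bit version word is 
modeled, not the option and type bytes.

```python
BINARY_FAMILY = 1     # major-version 1 is TBinaryProtocol
LEGACY_WORD = 0x8001  # old-style version word used for the first call


def encode(major, max_version, my_version):
    """Pack <1 bit> <7-bit major> <4-bit max> <4-bit my> into 16 bits."""
    return 0x8000 | (major << 8) | (max_version << 4) | my_version


def decode(raw):
    """Return (major, max_version, my_version) from a 16-bit word."""
    return (raw >> 8) & 0x7F, (raw >> 4) & 0xF, raw & 0xF


class Peer:
    """One side of the connection; remembers the agreed version (stateful)."""

    def __init__(self, max_version):
        self.max_version = max_version
        self.my_version = 0  # assumed starting point: lowest version

    def next_word(self):
        return encode(BINARY_FAMILY, self.max_version, self.my_version)

    def observe(self, raw):
        # Adopt the highest version that both sides support.
        _, peer_max, _ = decode(raw)
        self.my_version = min(self.max_version, peer_max)


def simulate(client_max, server_max):
    """Return the version words of steps 1 to 4 of the example."""
    client, server = Peer(client_max), Peer(server_max)
    words = [LEGACY_WORD]             # 1. client calls with the old word
    words.append(server.next_word())  # 2. server advertises its maximum
    client.observe(words[-1])
    words.append(client.next_word())  # 3. client switches its my-version
    server.observe(words[-1])
    words.append(server.next_word())  # 4. server answers at that version
    return words
```

Calling simulate(3, 1) yields 0x8001, 0x8110, 0x8131 and 0x8111, the same 
version words as steps 1 to 4.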

  I see some disadvantages of the in-place negotiation:
     1. When an application sends a lot of data to the server and gets
  only a couple of bytes back, the application gains an advantage only on
  the second call.
     2. If an application uses Thrift only for async calls (logging or
  something like it) or uses a connect-call-disconnect strategy, it doesn't
  get any advantage.
     3. The protocol becomes stateful.
     4. You can't send the first message using a new type. You have to send
  an echo message or something like it first.

  Advantages:
     1. No additional messages.
     2. Fully compatible.

  Summary: this can work with 127 protocol families with 16 versions each.
  If someone thinks that 16 versions are not enough, we can create a
  workaround: if max version == 16, switch up to the next protocol.

  For example, if someone creates binary protocol v16, it will have a new
  major number (0x2a for example):
   2. server replies 0x81f0 0102
   3. client calls   0xaa00 0101 - switch to next protocol.

