Hey all,

I wanted to propose an idea I've been toying with for a little while. At Rapleaf, we've developed a sort of pattern for creating Thrift objects that are all of the same type, but by convention only contain one of many fields. That is, every one of our objects has two required fields and ~80 optional fields, of which only one is ever set. This gives us union-like functionality, so our object could conceivably be any one of the many subtypes. One of the two required fields is an i32 that contains the Thrift field id of the field that should be set.

There are a lot of things that are good about this approach: it's simple, it's pretty sparse on the wire, and it's very flexible. However, there are some things about it that aren't so great: nothing in Thrift validates the relationship between the type specifier field and the data field, or guarantees that only one of the data fields is set. We've been able to work around these limitations for the most part, but it's been something we've had to deal with at the application level. Another limitation is CPU cost: attempting to serialize 79 unset fields seems to add as much as 100% overhead to serialization.
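To make the application-level checking concrete, here's a rough sketch in Python of the kind of validation I mean. It is not our actual code: the `Record` class, its field names, the `source` field, and the field ids are all made up for illustration, and only three data fields are shown instead of ~80.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field ids, standing in for the Thrift field ids in the IDL.
FIELD_IDS = {3: "email", 4: "age", 5: "zip_code"}


@dataclass
class Record:
    source: str          # required field (hypothetical)
    set_field_id: int    # required: id of the one data field that should be set
    email: Optional[str] = None
    age: Optional[int] = None
    zip_code: Optional[str] = None


def validate(record):
    """Raise ValueError unless exactly one data field is set and it matches set_field_id."""
    populated = [fid for fid, name in FIELD_IDS.items()
                 if getattr(record, name) is not None]
    if len(populated) != 1:
        raise ValueError("expected exactly one data field, found %d" % len(populated))
    if populated[0] != record.set_field_id:
        raise ValueError("set_field_id is %d but field %d is populated"
                         % (record.set_field_id, populated[0]))


validate(Record(source="web", set_field_id=3, email="a@example.com"))  # passes
```

Every reader and writer of these objects has to remember to call something like `validate`, which is exactly the burden I'd like to move out of application code.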

What I would really like is for Thrift to support this behavior natively. I was thinking that we could add a "union" construct to Thrift, which would exist parallel to "struct". In some languages, this could potentially map to an actual union; in languages like Java and Ruby, we'd probably have to make some sort of TUnion class to mimic the behavior. I don't think we'd need a new wire type or anything. All of the behavior changes would be in the generated code, making sure that it didn't read two values for the same union field, etc.
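As a strawman for the "TUnion class" idea, here's a minimal sketch of what such a base class might look like, written in Python just to keep it short. Nothing here is an existing Thrift API: the `TUnion` name, the methods, and the `(field_id, value)` pair representation of decoded fields are all assumptions for the sake of discussion, and generated code would subclass it the way `Shape` does below.

```python
class TUnion:
    """Holds at most one (field id, value) pair. Hypothetical, not a Thrift API."""

    FIELDS = {}  # field id -> field name; generated subclasses would fill this in

    def __init__(self):
        self._set_field = None
        self._value = None

    def set(self, field_id, value):
        """Set one field, replacing whatever was set before."""
        if field_id not in self.FIELDS:
            raise KeyError("unknown field id %r" % (field_id,))
        self._set_field = field_id
        self._value = value

    def get_set_field(self):
        """Return the id of the field that is set, or None."""
        return self._set_field

    def get(self, field_id):
        if field_id != self._set_field:
            raise ValueError("field %r is not the set field" % (field_id,))
        return self._value

    def read(self, pairs):
        """Populate from decoded (field_id, value) pairs; reject a second value."""
        self._set_field = None
        self._value = None
        for field_id, value in pairs:
            if field_id not in self.FIELDS:
                continue  # ignore unknown fields
            if self._set_field is not None:
                raise ValueError("more than one field present for a union")
            self.set(field_id, value)

    def write(self):
        """Emit only the set field; nothing iterates over the unset ones."""
        if self._set_field is None:
            return []
        return [(self._set_field, self._value)]


class Shape(TUnion):
    # Hypothetical generated subclass with two possible fields.
    FIELDS = {1: "circle_radius", 2: "square_side"}


shape = Shape()
shape.set(1, 2.5)
```

Note that no separate type specifier field is needed, since the set field's id already identifies the subtype, and `write` touches only that one field rather than checking 79 unset ones.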

What do people think about this idea? If we like it, we can start to flesh it out some more and then open a ticket to get implementations going.

-Bryan
