Hey all,
I wanted to propose an idea I've been toying around with for a little
while. At Rapleaf, we've developed a sort of pattern for creating
Thrift objects that are all of the same type, but by convention only
contain one of many fields. That is, every one of our objects has two
required fields and ~80 optional fields, of which only one is ever
set. This gives us union-like functionality, so our object could
conceivably be of any one of the many subtypes. One of the two required
fields is an i32 that contains the Thrift field id of the field that
should be set.
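To make the convention concrete, here is roughly what one of these objects looks like, written as a plain Java class rather than Thrift-generated code. This is a minimal sketch. The field names, ids, and the `Datum`/`value()` names are made up for illustration, only three of the ~80 optional fields are shown, and the second required field is left out because its role doesn't matter here.

```java
// Hypothetical illustration of the "one populated field" convention, roughly
// corresponding to Thrift IDL like:
//   struct Datum {
//     1: required i32 dataType;        // Thrift field id of the populated field
//     // 2: (second required field, not modeled here)
//     3: optional string emailAddress;
//     4: optional i64 userId;
//     5: optional double score;
//     // ... ~80 optional fields in practice
//   }
// All names and ids are assumptions for illustration only.
final class Datum {
    static final int EMAIL_ADDRESS = 3;
    static final int USER_ID = 4;
    static final int SCORE = 5;

    int dataType;         // required: field id of the one field that should be set
    String emailAddress;  // optional, null when unset
    Long userId;          // optional, null when unset
    Double score;         // optional, null when unset

    static Datum ofEmail(String email) {
        Datum d = new Datum();
        d.dataType = EMAIL_ADDRESS;
        d.emailAddress = email;
        return d;
    }

    static Datum ofUserId(long id) {
        Datum d = new Datum();
        d.dataType = USER_ID;
        d.userId = id;
        return d;
    }

    /** Application code dispatches on dataType to find the one populated field. */
    Object value() {
        switch (dataType) {
            case EMAIL_ADDRESS: return emailAddress;
            case USER_ID: return userId;
            case SCORE: return score;
            default: throw new IllegalStateException("unknown dataType " + dataType);
        }
    }
}
```

Note that nothing stops code from setting `dataType` to one id and populating a different field; keeping them in sync is purely up to the caller.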
There are a lot of things that are good about this approach: it's
simple, it's pretty sparse on the wire, and it's very flexible.
However, there are some things about it that aren't so great: nothing
in Thrift validates that the type-specifier field actually matches the
data field that is set, or guarantees that only one of the data fields
is set. We've been able to work around these limitations for the most
part, but it's been something we've had to deal with at the
application level. Another limitation is that the CPU cost of
attempting to serialize 79 unset fields seems to add as much as 100%
overhead to serialization.
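For concreteness, this is the kind of check that ends up living in application code today. A minimal sketch, assuming the application has already gathered every set optional field into a map keyed by Thrift field id. The `ConventionValidator` class and that map are hypothetical, not anything Thrift provides.

```java
import java.util.Map;

// Hypothetical application-level validation for the "one populated field" convention.
// Assumes the caller has collected every *set* optional field into a map keyed by
// Thrift field id; nothing here is part of the Thrift library.
final class ConventionValidator {
    private ConventionValidator() {}

    /** Throws unless exactly one data field is set and its id matches dataType. */
    static void validate(int dataType, Map<Integer, ?> setFields) {
        if (setFields.size() != 1) {
            throw new IllegalStateException(
                "expected exactly one data field to be set, found " + setFields.size());
        }
        if (!setFields.containsKey(dataType)) {
            Integer actual = setFields.keySet().iterator().next();
            throw new IllegalStateException(
                "dataType names field " + dataType + " but field " + actual + " is set");
        }
    }
}
```

Both failure modes it catches (a mismatched type id, or more than one field set) are exactly the ones Thrift itself doesn't check for us.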
What I would really like is for Thrift to support this behavior
natively. I was thinking that we could add a "union" construct to
Thrift, which would exist parallel to "struct". In some languages,
this could potentially map to an actual union; in languages like Java
and Ruby, we'd probably have to make some sort of TUnion class to
mimic the behavior. I don't think we'd need a new wire type or
anything. All of the behavior changes would be in the generated code,
making sure that it didn't read two values for the same union field,
etc.
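To make the TUnion idea a bit more concrete, here is a rough Java sketch of what such a base class and a generated subclass could look like. Everything in it is hypothetical (`UnionSketch`, `DatumUnion`, the method names, and the example fields); it only illustrates the intended behavior: at most one field is ever set, setting a new field replaces the old one, the generated subclass type-checks each field id, and reading rejects input that carries more than one field. `readFrom` takes already-decoded (field id, value) pairs instead of a real Thrift protocol, to keep the sketch self-contained.

```java
import java.util.List;
import java.util.Map;

// Hypothetical TUnion-style base class: at most one field is set at a time,
// tracked by its Thrift field id. Not the Thrift library's API.
abstract class UnionSketch {
    private Integer setField = null;  // null means no field is set
    private Object value = null;

    /** Generated subclasses reject unknown ids and values of the wrong type. */
    protected abstract void checkType(int fieldId, Object value);

    /** Setting any field replaces whatever was set before. */
    public final void setFieldValue(int fieldId, Object newValue) {
        checkType(fieldId, newValue);
        setField = fieldId;
        value = newValue;
    }

    public final Integer getSetField() {
        return setField;
    }

    public final Object getFieldValue(int fieldId) {
        if (!Integer.valueOf(fieldId).equals(setField)) {
            throw new IllegalStateException(
                "field " + fieldId + " is not set (set field: " + setField + ")");
        }
        return value;
    }

    /** Mimics deserialization: refuse to read more than one field into the union. */
    public final void readFrom(List<? extends Map.Entry<Integer, ?>> wireFields) {
        if (wireFields.size() != 1) {
            throw new IllegalStateException(
                "a union must carry exactly one field, got " + wireFields.size());
        }
        Map.Entry<Integer, ?> field = wireFields.get(0);
        setFieldValue(field.getKey(), field.getValue());
    }
}

// What generated code for a hypothetical
//   union Datum { 3: string emailAddress; 4: i64 userId; }
// might look like.
final class DatumUnion extends UnionSketch {
    static final int EMAIL_ADDRESS = 3;
    static final int USER_ID = 4;

    @Override
    protected void checkType(int fieldId, Object value) {
        switch (fieldId) {
            case EMAIL_ADDRESS:
                if (!(value instanceof String)) {
                    throw new ClassCastException("emailAddress must be a String");
                }
                break;
            case USER_ID:
                if (!(value instanceof Long)) {
                    throw new ClassCastException("userId must be a Long");
                }
                break;
            default:
                throw new IllegalArgumentException("unknown field id " + fieldId);
        }
    }
}
```

For example, calling `setFieldValue(DatumUnion.USER_ID, 42L)` on a union that held an email address leaves only `userId` set, and `getFieldValue(DatumUnion.EMAIL_ADDRESS)` then throws.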
What do people think about this idea? If we like it we can start to
flesh it out some more and then open a ticket to get implementations
going.
-Bryan
- Addition of union type Bryan Duxbury