I'm fine with implementing compareTo. With proper type annotations,
we could provide a way to specify that a given map be instantiated as
a TreeMap instead of a HashMap when deserializing, which would enforce
a deterministic iteration order.
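
To illustrate the idea, here is a rough sketch in plain Java (no Thrift
involved). The class name and the keysInWriteOrder helper are made up
for this example and only stand in for "the order a serializer would
walk the map in". It assumes the keys are Comparable, which is what a
generated compareTo would provide for structs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedMapOrder {
    // Hypothetical stand-in for a serializer: returns the keys in the
    // order the map's iterator yields them.
    static List<String> keysInWriteOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Two equal maps built with different insertion orders.
        Map<String, Integer> first = new TreeMap<>();
        first.put("b", 2);
        first.put("a", 1);
        first.put("c", 3);

        Map<String, Integer> second = new TreeMap<>();
        second.put("c", 3);
        second.put("b", 2);
        second.put("a", 1);

        // A TreeMap iterates in key order regardless of insertion order,
        // so both maps would be written out identically.
        System.out.println(keysInWriteOrder(first));
        System.out.println(keysInWriteOrder(second));

        // An existing HashMap can be sorted just before writing by
        // copying it into a TreeMap, at the cost of the extra sort.
        Map<String, Integer> unordered = new HashMap<>(first);
        Map<String, Integer> sorted = new TreeMap<>(unordered);
        System.out.println(keysInWriteOrder(sorted).equals(keysInWriteOrder(first)));
    }
}
```

The copy into a TreeMap at the end corresponds to the sort-before-write
approach from the original message; deserializing straight into a
TreeMap would avoid that copy.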

--David

Bryan Duxbury wrote:
> We use Thrift structures within Hadoop Map/Reduce. Occasionally, a 
> Thrift object will be our grouping or join key. Usually, this works 
> great, but occasionally, there are some issues. In particular, we 
> have trouble with maps and sets. The problem is that the ordering of 
> the map/set internally is arbitrary, and we serialize in that 
> arbitrary order. The result is that two 'equal' objects might not 
> serialize into the same byte array, and therefore fail equality 
> checks based only on the serialized data.
> 
> I was wondering if it would make sense to enforce some sort of 
> ordering scheme for collections where order might be arbitrary, at 
> least during serialization. This would necessitate implementing a 
> decent compareTo on generated Thrift structs so we could sort before 
> writing, and obviously, it would include sorting overhead.
> 
> Are other people interested in making this use case work acceptably?
> 
> -Bryan
> 
