Hello,

I'm using Avro as the serialization framework for my Hadoop map-reduce jobs, and I am emitting GenericRecord/null as the key/value pairs from my mapper classes. Having looked at the code, I see that my reducer only treats the key objects (i.e. my records) as distinct if calling .equals() on them reports a difference. However, if the schema is the same (which it is for most of my mappers), .equals() calls .compare(), which in turn depends on the order attributes set on the fields. This means that if no sorting is defined in my schema, all records are treated as equal to one another.

Have I understood this correctly, and if so, is that not a violation of the equals contract? For one thing, it would mean GenericRecord objects will often cause confusion when used with maps and other containers.
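To make the concern concrete, here is a minimal toy model in plain Java. It is not Avro's code: ToyRecord, Order and distinctKeys are hypothetical names, and the sketch simply assumes that equality is delegated to a field-by-field comparison which skips any field whose order is marked as ignored. Under that assumption, two records with different contents collapse into a single key in a hash-based container.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class IgnoredOrderEquality {
    // Hypothetical stand-in for a per-field sort order attribute.
    enum Order { ASCENDING, IGNORE }

    // Toy record: NOT Avro's GenericRecord, only a model of the suspected behaviour.
    static final class ToyRecord {
        private final Map<String, Order> schema;   // field name -> order
        private final Map<String, String> values;  // field name -> value

        ToyRecord(Map<String, Order> schema, Map<String, String> values) {
            this.schema = schema;
            this.values = values;
        }

        // Field-by-field comparison that skips fields marked IGNORE.
        int compareTo(ToyRecord other) {
            for (Map.Entry<String, Order> f : schema.entrySet()) {
                if (f.getValue() == Order.IGNORE) {
                    continue; // ignored fields never influence the result
                }
                int c = values.get(f.getKey()).compareTo(other.values.get(f.getKey()));
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        }

        // Assumption under test: equality is defined in terms of the comparison.
        @Override
        public boolean equals(Object o) {
            if (o == this) return true;
            if (!(o instanceof ToyRecord)) return false;
            ToyRecord that = (ToyRecord) o;
            return schema.equals(that.schema) && compareTo(that) == 0;
        }

        // Kept consistent with equals: only non-ignored fields are hashed.
        @Override
        public int hashCode() {
            int h = 1;
            for (Map.Entry<String, Order> f : schema.entrySet()) {
                if (f.getValue() != Order.IGNORE) {
                    h = 31 * h + values.get(f.getKey()).hashCode();
                }
            }
            return h;
        }
    }

    // Adds two records with different "id" values to a set and counts the keys.
    static int distinctKeys(Order order) {
        Map<String, Order> schema = new LinkedHashMap<>();
        schema.put("id", order);
        Set<ToyRecord> keys = new HashSet<>();
        keys.add(new ToyRecord(schema, Collections.singletonMap("id", "a")));
        keys.add(new ToyRecord(schema, Collections.singletonMap("id", "b")));
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys(Order.ASCENDING)); // prints 2
        System.out.println(distinctKeys(Order.IGNORE));    // prints 1
    }
}
```

If the real implementation behaves like this model, a reducer would see every such record as the same key. Whether that matches GenericRecord in practice is exactly what I am asking.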
Regards, Andrew