[ https://issues.apache.org/jira/browse/BEAM-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Moravek resolved BEAM-6812.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 2.12.0

> Convert keys to ByteArray in Combine.perKey for Spark
> -----------------------------------------------------
>
>                 Key: BEAM-6812
>                 URL: https://issues.apache.org/jira/browse/BEAM-6812
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Ankit Jhalaria
>            Assignee: Ankit Jhalaria
>            Priority: Critical
>             Fix For: 2.12.0
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> * During calls to Combine.perKey, we want the keys to have a consistent hashCode when invoked from different JVMs.
> * However, while testing this at our company, we found that when using protobuf messages as keys during a combine, the hashCodes can differ for the same key when computed on different JVMs. This results in duplicate output records for what should be a single key.
> * Spark's `ByteArray` class has a stable hashCode when dealing with arrays as well.
> * GroupByKey already converts keys to `ByteArray` correctly and uses coders for serialization.
> * The fix does something similar for combines.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
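The underlying hazard is that a raw Java `byte[]` hashes by object identity, so two arrays with identical contents (e.g. the same encoded key deserialized on two executors) almost never share a hashCode, and keys can land in different partitions. A minimal sketch of the wrapper idea, assuming a hypothetical `ByteArrayKey` class for illustration (not Beam's actual `ByteArray` implementation):

```java
import java.util.Arrays;

public class StableKeyHash {

    // Illustrative wrapper mirroring the approach described in the issue:
    // hash and compare by array *contents*, not object identity.
    static final class ByteArrayKey {
        private final byte[] value;

        ByteArrayKey(byte[] value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ByteArrayKey
                && Arrays.equals(value, ((ByteArrayKey) o).value);
        }

        @Override
        public int hashCode() {
            // Content-based hash: identical bytes always produce the same
            // value, regardless of which JVM or object instance computed it.
            return Arrays.hashCode(value);
        }
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Raw arrays use identity hashCode: equal contents, (almost always)
        // different hashes within one JVM, and no stability across JVMs.
        System.out.println("raw arrays equal hash: " + (a.hashCode() == b.hashCode()));

        // Wrapped keys hash on content, so shuffle partitioning is consistent.
        System.out.println("wrapped keys equal hash: "
            + (new ByteArrayKey(a).hashCode() == new ByteArrayKey(b).hashCode())); // true
    }
}
```

The same reasoning explains the fix: by first encoding each key to bytes with its coder and wrapping it in a content-hashed array type, the combine path gets the same stable partitioning behavior that GroupByKey already had.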