I just started learning Java today so that I can convert our Python pipelines to Java and take advantage of key features that Java has. I have no idea how I would create a new coder and hook it in so that Beam recognizes it.
If you can point me in the right direction of where it hooks together, I might be able to figure that out. I can duplicate MapCoder and try to make changes, but how will Beam know to pick up that coder for a GroupByKey?

Thanks!
Shannon

On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:

> It could be just straightforward to create a SortedMapCoder for TreeMap.
> Just add checks on map instances and then change verifyDeterministic.
>
> If this is a common need we could just submit it into Beam repo.
>
> [1]:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>
> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <m...@mikepedersen.dk>
> wrote:
>
>> There isn't a coder for deterministic maps in Beam, so even if your
>> data structure is deterministic, Beam will assume the serialized bytes
>> aren't deterministic.
>>
>> You could make one using the MapCoder as a guide:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>> Just change it such that the exception in verifyDeterministic is removed
>> and when decoding it instantiates a TreeMap or such instead of a HashMap.
>>
>> Alternatively, you could just represent your key as a sorted list of KV
>> pairs. Lookups could be done using binary search if necessary.
>>
>> Mike
>>
>> On Thu, Jul 11, 2019 at 22:41, Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> So I'm working on essentially doing a word-count on a complex data
>>> structure.
>>>
>>> I tried just using a HashMap as the structure, but that didn't work
>>> because it is non-deterministic.
>>>
>>> However, when given a LinkedHashMap or TreeMap, which is deterministic, the
>>> SDK complains that it's non-deterministic when trying to use it as a key
>>> for GroupByKey.
>>>
>>> What would be an appropriate Map style data structure that would be
>>> deterministic enough for Apache Beam to accept it as a key?
>>>
>>> Thanks,
>>> Shannon
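
For illustration, here is a minimal sketch of the encoding idea discussed in the thread above: write a map's entries in sorted key order, so that two equal maps always produce the same bytes regardless of insertion order, and decode into a TreeMap. It uses only the Java standard library and is not Beam code. The class name SortedMapEncodingSketch and the choice of String keys with Integer values are assumptions made for the example; a real SortedMapCoder would instead follow the structure of MapCoder and delegate to key and value coders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Beam: it only demonstrates the
// "sorted entries give deterministic bytes" idea from the thread.
class SortedMapEncodingSketch {

    // Writes the entry count, then each entry in ascending key order.
    // Assumes non-null keys and values.
    static byte[] encode(Map<String, Integer> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            // Copying into a TreeMap fixes the iteration order by key.
            TreeMap<String, Integer> sorted = new TreeMap<>(map);
            out.writeInt(sorted.size());
            for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the bytes back into a TreeMap rather than a HashMap.
    static TreeMap<String, Integer> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            TreeMap<String, Integer> result = new TreeMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                result.put(key, in.readInt());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same contents, different insertion order.
        Map<String, Integer> a = new LinkedHashMap<>();
        a.put("x", 1);
        a.put("y", 2);
        Map<String, Integer> b = new LinkedHashMap<>();
        b.put("y", 2);
        b.put("x", 1);

        System.out.println(Arrays.equals(encode(a), encode(b))); // prints "true"
        System.out.println(decode(encode(a)));                   // prints "{x=1, y=2}"
    }
}
```

This only covers the byte-level part. In a pipeline, a coder is typically attached to a PCollection with setCoder(...); whether a sorted-map coder may report itself as deterministic would also depend on the key and value coders it wraps.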