I just started learning Java today so that I can try converting our Python
pipelines to Java and take advantage of key features that Java has. I have
no idea how I would create a new coder and hook it in so that Beam
recognizes it.

If you can point me in the right direction on where it all hooks together, I
might be able to figure that out. I can duplicate MapCoder and try to make
changes, but how will Beam know to pick up that coder for a GroupByKey?

Thanks!
Shannon

On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ruw...@google.com> wrote:

> It should be straightforward to create a SortedMapCoder for TreeMap.
> Just add checks on the map instances and then change verifyDeterministic.
>
> If this is a common need, we could submit it to the Beam repo.
>
> [1]:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>
> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <m...@mikepedersen.dk>
> wrote:
>
>> There isn't a coder for deterministic maps in Beam, so even if your
>> data structure is deterministic, Beam will assume the serialized bytes
>> aren't deterministic.
>>
>> You could make one using the MapCoder as a guide:
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>> Just change it so that the exception in verifyDeterministic is removed
>> and, when decoding, it instantiates a TreeMap or similar instead of a HashMap.
>>
>> Alternatively, you could just represent your key as a sorted list of KV
>> pairs. Lookups could be done using binary search if necessary.
>>
>> Mike
>>
>> On Thu, Jul 11, 2019 at 22:41, Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> So I'm working on essentially doing a word-count on a complex data
>>> structure.
>>>
>>> I tried just using a HashMap as the structure, but that didn't work
>>> because it is non-deterministic.
>>>
>>> However, when given a LinkedHashMap or TreeMap, which is deterministic, the
>>> SDK complains that it's non-deterministic when trying to use it as a key
>>> for GroupByKey.
>>>
>>> What would be an appropriate Map style data structure that would be
>>> deterministic enough for Apache Beam to accept it as a key?
>>>
>>> Thanks,
>>> Shannon
>>>
>>
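
The suggestion quoted above (a map coder that writes its entries in sorted key order, so that equal maps always serialize to the same bytes) can be sketched in plain Java without any Beam dependency. This is only an illustration of the idea, not Beam's MapCoder: the class name SortedMapEncoding, the encode method, and the String/Long types are all hypothetical. In a real pipeline the same logic would presumably live in a Coder subclass whose verifyDeterministic does not throw, attached with something like PCollection.setCoder(...) or registered through the pipeline's CoderRegistry.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Beam: encodes a map so that the bytes
// depend only on the map's contents, never on its iteration order.
class SortedMapEncoding {

    // Layout: entry count, then for each entry in ascending key order:
    // key length, UTF-8 key bytes, 8-byte value.
    static byte[] encode(Map<String, Long> map) {
        // Copying into a TreeMap fixes the iteration order by key.
        TreeMap<String, Long> sorted = new TreeMap<>(map);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sorted.size());
            for (Map.Entry<String, Long> entry : sorted.entrySet()) {
                byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);
                out.write(key);
                out.writeLong(entry.getValue());
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, Long> a = new HashMap<>();
        a.put("x", 1L);
        a.put("y", 2L);

        // Same contents, inserted in the opposite order.
        Map<String, Long> b = new LinkedHashMap<>();
        b.put("y", 2L);
        b.put("x", 1L);

        // Equal maps produce identical bytes regardless of insertion order.
        System.out.println(Arrays.equals(encode(a), encode(b))); // prints "true"
    }
}
```

Sorting at encode time means the caller can keep using any Map implementation; decoding into a TreeMap, as suggested above, would keep the round trip consistent.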

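Mike's alternative above, representing the key as a sorted list of KV pairs with binary-search lookups, might look roughly like the following in plain Java. The names SortedPairs, fromMap, and lookup are made up for this sketch, and Map.Entry stands in for Beam's KV type. In Beam itself, such a key would presumably be encoded with a list coder over a KV coder, which can be deterministic as long as the component coders are.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Beam: a map flattened into a list of
// (key, value) pairs sorted by key, so equal contents give equal lists.
class SortedPairs {

    // Copies the entries into a list and sorts it by key.
    static List<Map.Entry<String, Long>> fromMap(Map<String, Long> map) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (Map.Entry<String, Long> e : map.entrySet()) {
            pairs.add(new SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    // Binary search on the sorted keys; returns null if the key is absent.
    static Long lookup(List<Map.Entry<String, Long>> pairs, String key) {
        int lo = 0;
        int hi = pairs.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = pairs.get(mid).getKey().compareTo(key);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return pairs.get(mid).getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Long> first = new HashMap<>();
        first.put("b", 2L);
        first.put("a", 1L);
        first.put("c", 3L);

        Map<String, Long> second = new HashMap<>();
        second.put("c", 3L);
        second.put("a", 1L);
        second.put("b", 2L);

        List<Map.Entry<String, Long>> p1 = fromMap(first);
        List<Map.Entry<String, Long>> p2 = fromMap(second);

        System.out.println(p1.equals(p2));         // prints "true"
        System.out.println(lookup(p1, "b"));       // prints "2"
        System.out.println(lookup(p1, "missing")); // prints "null"
    }
}
```

The trade-off is that lookups cost O(log n) instead of a hash lookup, and the list has to be rebuilt (or kept sorted) whenever the contents change.
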