Re: [JAVA] Handling repeated elements when merging two pcollections

2022-08-10 Thread Luke Cwik via user
Sorry, I should have said that you should Flatten and do a GroupByKey, not a CoGroupByKey, making the pipeline like:

PCollectionA -> Flatten -> GroupByKey -> ParDo(EmitOnlyFirstElementPerKey)
PCollectionB -/

The CoGroupByKey will have one iterable per PCollection containing zero or more elements
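The per-key step of the Flatten -> GroupByKey pipeline can be sketched in plain Java as below. This is not code from the thread and does not use the Beam API. `FirstPerKey`, `Tagged`, and `pick` are hypothetical names, and the source tag is an added assumption: Beam does not guarantee the order of values in a GroupByKey iterable, so "first element" alone may not reliably select the record from PCollectionA. The sketch assumes each element is wrapped with its origin before the Flatten, and that `pick` would be called from the ParDo on each grouped iterable.

```java
import java.util.NoSuchElementException;

// Plain-Java model of the per-key choice; all names here are hypothetical.
final class FirstPerKey {

    // Assumed wrapper: a value plus a flag recording whether it came from PCollectionA.
    static final class Tagged<V> {
        final V value;
        final boolean fromA;

        Tagged(V value, boolean fromA) {
            this.value = value;
            this.fromA = fromA;
        }
    }

    // Returns the first value that came from A; if there is none,
    // falls back to the first value encountered in the group.
    static <V> V pick(Iterable<Tagged<V>> grouped) {
        Tagged<V> fallback = null;
        for (Tagged<V> t : grouped) {
            if (t.fromA) {
                return t.value;
            }
            if (fallback == null) {
                fallback = t;
            }
        }
        if (fallback == null) {
            throw new NoSuchElementException("group contained no values");
        }
        return fallback.value;
    }
}
```

If it does not matter which duplicate survives, the tag can be dropped and the ParDo can emit whatever the iterator returns first.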

Re: [JAVA] Handling repeated elements when merging two pcollections

2022-08-10 Thread Shivam Singhal
Think this should solve my problem. Thanks Evan and Luke!

On Thu, 11 Aug 2022 at 1:49 AM, Luke Cwik via user wrote:
> Use CoGroupByKey to join the two PCollections and emit only the first
> value of each iterable with the key.
>
> Duplicates will appear as iterables with more than one value

Re: [JAVA] Handling repeated elements when merging two pcollections

2022-08-10 Thread Luke Cwik via user
Use CoGroupByKey to join the two PCollections and emit only the first value of each iterable with the key. Duplicates will appear as iterables with more than one value while keys without duplicates will have iterables containing exactly one value.

On Wed, Aug 10, 2022 at 12:25 PM Shivam Singhal
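If CoGroupByKey is used, each key arrives with one iterable per input PCollection, so the preference for CollectionA can be written as a small per-key function. The following is a minimal plain-Java sketch of that choice, not Beam code and not taken from the thread. `PreferA` and `choose` are hypothetical names, and values are assumed to be non-null. In a Beam pipeline the two iterables would typically come from `CoGbkResult.getAll(...)`, called with each input's `TupleTag`.

```java
import java.util.Iterator;
import java.util.Optional;

// Plain-Java model of the per-key choice after a CoGroupByKey; names are hypothetical.
final class PreferA {

    // Returns the first value from A if A has any value for this key,
    // otherwise the first value from B, otherwise empty.
    // Assumes values are non-null (Optional.of rejects null).
    static <V> Optional<V> choose(Iterable<V> fromA, Iterable<V> fromB) {
        Iterator<V> a = fromA.iterator();
        if (a.hasNext()) {
            return Optional.of(a.next());
        }
        Iterator<V> b = fromB.iterator();
        return b.hasNext() ? Optional.of(b.next()) : Optional.empty();
    }
}
```

Because the two inputs stay in separate iterables, this variant does not depend on the ordering of grouped values to decide which collection wins.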

Re: [JAVA] Handling repeated elements when merging two pcollections

2022-08-10 Thread Evan Galpin
Hi Shivam, When you say "merge the PCollections" do you mean Flatten, or somehow join? CoGroupByKey[1] would be a good choice if you need to join based on key. You would then be able to implement application logic to keep 1 of the 2 records if there is a way to decipher an element from

[JAVA] Handling repeated elements when merging two pcollections

2022-08-10 Thread Shivam Singhal
I have two PCollections, CollectionA & CollectionB of type KV. I would like to merge them into one PCollection but CollectionA & CollectionB might have some elements with the same key. In those repeated cases, I would like to keep the element from CollectionA & drop the repeated element from