I would go with an outer join as Stefano suggested.
Outer joins can be executed as hash joins, which will probably be more
efficient than using a sort-based groupBy/reduceGroup.
Also, outer joins are more intuitive and simpler, IMO.
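
To make the semantics concrete, here is a minimal plain-Java sketch (no Flink
dependency) of what the outer-join approach computes: keep every element of A
whose key (fields 0 and 1) has no match in B. The class AntiJoinSketch, the Row
type and the method notInB are hypothetical names used only for illustration;
Row stands in for Tuple3<Integer,Integer,Double>. In the DataSet API this
would roughly correspond to a leftOuterJoin of A with B on fields (0,1) whose
join function emits the left element only when the right side is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AntiJoinSketch {

    // Hypothetical stand-in for Tuple3<Integer, Integer, Double>.
    static final class Row {
        final int f0;
        final int f1;
        final double f2;

        Row(int f0, int f1, double f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        // Composite key on fields (0,1); List gives value-based equals/hashCode.
        List<Integer> key() {
            return Arrays.asList(f0, f1);
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + "," + f2 + ")";
        }
    }

    // Returns the elements of a whose (f0, f1) key does not occur in b.
    // Builds a hash set over the keys of b, then probes it with each element of a,
    // which mirrors the build/probe phases of a hash join.
    static List<Row> notInB(List<Row> a, List<Row> b) {
        Set<List<Integer>> keysOfB = new HashSet<>();
        for (Row r : b) {
            keysOfB.add(r.key());
        }
        List<Row> result = new ArrayList<>();
        for (Row r : a) {
            // "No match on the right side" is the outer-join case we keep.
            if (!keysOfB.contains(r.key())) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> a = Arrays.asList(
                new Row(1, 1, 0.5), new Row(1, 2, 0.7), new Row(2, 1, 0.9));
        List<Row> b = Arrays.asList(
                new Row(1, 2, 3.0), new Row(3, 3, 1.0));

        // Prints (1,1,0.5) and (2,1,0.9), each on its own line.
        for (Row r : notInB(a, b)) {
            System.out.println(r);
        }
    }
}
```

Note that the third field is ignored for matching, so (1,2,0.7) in A is
dropped even though B holds (1,2,3.0) with a different value.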

2016-04-07 12:35 GMT+02:00 Stefano Baghino <stefano.bagh...@radicalbit.io>:

> Perhaps an outer join can do the trick as well but I don't know which one
> would perform better.
>
> On Thu, Apr 7, 2016 at 12:05 PM, Lydia Ickler <ickle...@googlemail.com>
> wrote:
>
>>  Never mind! I figured it out with groupBy and
>> reduceGroup
>>
>> Sent from my iPhone
>>
>> > On 07.04.2016 at 11:51, Lydia Ickler <ickle...@googlemail.com> wrote:
>> >
>> > Hi,
>> >
>> > If I have two DataSets A and B of type Tuple3<Integer,Integer,Double>, how
>> would I get the subset of A (based on the fields (0,1)) that does not occur
>> in B?
>> > Is there maybe an already implemented method?
>> >
>> > Best regards,
>> > Lydia
>> >
>> > Sent from my iPhone
>>
>
>
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit
>
