Follow the first approach (union, then group by). Joins are expensive; union comes for free.

Best,
Fabian

2014-12-22 11:47 GMT+01:00 Flavio Pompermaier <[email protected]>:

> Hi guys,
>
> In my use case I have multiple Datasets with the same structure (e.g.
> Tuple3) and I want to produce an output Dataset containing all Tuple3
> grouped by the first field (0).
> I can obtain the same results performing a union of all datasets and then
> a group by (simplest implementation) or join all of them pairwise
> (((A->B)->C)->D)..) or I don't know if there is any other solution. When
> should I use the first or the second approach? Could you help me in
> figuring out the internals of the two approaches? I always have some fear
> when using multiple joins when I don't know exactly their size..
>
> Best,
> Flavio
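
To make the union-then-group approach concrete, here is a minimal sketch using plain Java collections. It is not Flink code: the `Triple` class and the `union` and `groupByFirstField` helpers are hypothetical stand-ins for a Tuple3 and for the DataSet operators `union` and `groupBy(0)`. It only shows the shape of the computation: the union step concatenates the inputs without comparing any keys, and a single grouping pass over the combined records then does all the key-based work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnionThenGroup {

    // Hypothetical stand-in for a Tuple3<String, Integer, Double>.
    static final class Triple {
        final String f0;
        final int f1;
        final double f2;

        Triple(String f0, int f1, double f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + "," + f2 + ")";
        }
    }

    // "Union": concatenate all inputs; no keys are compared at this step.
    static List<Triple> union(List<List<Triple>> inputs) {
        List<Triple> all = new ArrayList<>();
        for (List<Triple> input : inputs) {
            all.addAll(input);
        }
        return all;
    }

    // "Group by field 0": one pass over the unioned records.
    static Map<String, List<Triple>> groupByFirstField(List<Triple> records) {
        Map<String, List<Triple>> groups = new TreeMap<>();
        for (Triple t : records) {
            groups.computeIfAbsent(t.f0, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Three inputs with the same structure, as in the question.
        List<Triple> a = Arrays.asList(new Triple("x", 1, 1.0), new Triple("y", 2, 2.0));
        List<Triple> b = Arrays.asList(new Triple("x", 3, 3.0));
        List<Triple> c = Arrays.asList(new Triple("z", 4, 4.0), new Triple("y", 5, 5.0));

        Map<String, List<Triple>> grouped = groupByFirstField(union(Arrays.asList(a, b, c)));
        grouped.forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Adding a fourth or fifth input only lengthens the list passed to `union`; the grouping step stays a single pass, whereas the pairwise-join variant would add another join for each extra input.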
