Cogroup hints/performance

2017-02-07 Thread Newport, Billy
We have a cogroup where sometimes we cogroup like this: Dataset z = larger.coGroup(small).where... The strategy is printed as hash on key and a sort asc on the other key. Which is which? Naively, we'd want to hash larger and sort the small? Or is that wrong? What factors would impact the perfo

Re: Cogroup hints/performance

2017-02-07 Thread Fabian Hueske
Hi Billy, A CoGroup does not have any freedom in its execution strategy. It requires that both inputs are partitioned on the grouping keys and then performs a local sort-merge join, i.e., both inputs are sorted. Existing partitioning or sort orders can be reused. Since there is only one execut
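
To illustrate the local step described above, here is a minimal sketch in plain Java of a sort-merge co-group over two inputs that are assumed to already sit in the same partition. It is not Flink's implementation and uses no Flink classes; SortMergeCoGroup, Rec, and Group are hypothetical names made up for this example. It only shows why both sides are sorted: once they are, a single forward pass over the two lists can emit one group per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Flink.
class SortMergeCoGroup {

    /** A keyed record from one of the two inputs. */
    static final class Rec {
        final int key;
        final String value;
        Rec(int key, String value) { this.key = key; this.value = value; }
    }

    /** All values from both inputs that share one key. */
    static final class Group {
        final int key;
        final List<String> left = new ArrayList<>();
        final List<String> right = new ArrayList<>();
        Group(int key) { this.key = key; }
        @Override public String toString() { return key + ":" + left + "|" + right; }
    }

    /** Sorts both inputs on the key, then merges them in one forward pass. */
    static List<Group> coGroup(List<Rec> leftInput, List<Rec> rightInput) {
        // Copy so the caller's lists are not reordered.
        List<Rec> left = new ArrayList<>(leftInput);
        List<Rec> right = new ArrayList<>(rightInput);
        Comparator<Rec> byKey = Comparator.comparingInt(r -> r.key);
        left.sort(byKey);   // both sides are sorted, regardless of size
        right.sort(byKey);

        List<Group> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() || j < right.size()) {
            // The next key is the smaller of the two current heads.
            int key;
            if (j >= right.size()) key = left.get(i).key;
            else if (i >= left.size()) key = right.get(j).key;
            else key = Math.min(left.get(i).key, right.get(j).key);

            // Collect every record with that key from each side.
            Group g = new Group(key);
            while (i < left.size() && left.get(i).key == key) g.left.add(left.get(i++).value);
            while (j < right.size() && right.get(j).key == key) g.right.add(right.get(j++).value);
            out.add(g); // a group may have an empty side, as in a co-group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> larger = Arrays.asList(new Rec(2, "b"), new Rec(1, "a"), new Rec(2, "c"));
        List<Rec> small = Arrays.asList(new Rec(3, "z"), new Rec(2, "y"));
        System.out.println(coGroup(larger, small));
    }
}
```

In this sketch neither input is treated as a "build" or "probe" side: both are sorted and consumed symmetrically, which is consistent with the point that the operator has a single local strategy.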