
Re: Dataset API | Setting number of partitions during join/groupBy

2016-11-11 Thread Aniket Bhatnagar

Hi Shreya,

The Datasets initially had more than 1000 partitions, and after a group-by operation the resulting Dataset had only 200 partitions (because the number of partitions defaults to 200). Any further operations on the resulting Dataset will have a maximum parallelism of 200.
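For illustration, here is a minimal sketch of two ways the post-shuffle partition count could be raised. It assumes a local SparkSession and a made-up input built with spark.range; the object name, the "bucket" column and the target of 1000 are hypothetical and not taken from the thread. The 200 default corresponds to the spark.sql.shuffle.partitions setting. On newer Spark versions with adaptive query execution enabled, the observed partition counts may be lower than the configured value, so no expected output is shown.

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use the cluster's master.
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: one million ids spread over 1000 partitions.
    val ds = spark.range(0L, 1000000L, 1L, 1000)
    println(s"input partitions: ${ds.rdd.getNumPartitions}")

    // Option 1: raise the number of partitions used by shuffles
    // (joins, groupBy) before running the aggregation.
    spark.conf.set("spark.sql.shuffle.partitions", "1000")
    val counts = ds.groupBy(($"id" % 100).as("bucket")).count()
    println(s"partitions after groupBy: ${counts.rdd.getNumPartitions}")

    // Option 2: repartition the result explicitly.
    // Note that this adds one more shuffle on top of the groupBy.
    val widened = counts.repartition(1000)
    println(s"partitions after repartition: ${widened.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

The first option changes the setting for every later shuffle in the session, while the second affects only the one Dataset it is called on.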

RE: Dataset API | Setting number of partitions during join/groupBy

2016-11-11 Thread Shreya Agarwal

Curious: why do you want to repartition? Is there a subsequent step that fails because the number of partitions is too low? Or do you want to do it for a performance gain? Also, how many partitions did your initial Datasets have, and how many did the result of the join have?