Re: Dataset API | Setting number of partitions during join/groupBy

2016-11-11 Thread Aniket Bhatnagar
> Sent: Friday, November 11, 2016 9:22 AM
> To: user <user@spark.apache.org>
> Subject: Dataset API | Setting number of partitions during join/groupBy
>
> Hi
>
> I can't seem to find a way to pass the number of partitions while joining two
> Datasets or doing ...

RE: Dataset API | Setting number of partitions during join/groupBy

2016-11-11 Thread Shreya Agarwal
[mailto:aniket.bhatna...@gmail.com]
Sent: Friday, November 11, 2016 9:22 AM
To: user <user@spark.apache.org>
Subject: Dataset API | Setting number of partitions during join/groupBy

Hi

I can't seem to find a way to pass the number of partitions while joining two Datasets or doing a groupBy operation on the Dataset ...

Dataset API | Setting number of partitions during join/groupBy

2016-11-11 Thread Aniket Bhatnagar
Hi

I can't seem to find a way to pass the number of partitions while joining two Datasets or doing a groupBy operation on a Dataset. There is the option of repartitioning the resulting Dataset, but it's inefficient to repartition after the Dataset has already been joined/grouped into the default number of partitions.
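A minimal sketch of two common ways to control this, not taken from the replies in this thread. The number of partitions produced by Dataset joins and aggregations comes from the SQL setting spark.sql.shuffle.partitions (default 200). It can be set on the session builder, or at runtime with spark.conf.set(...) before the query is planned. Alternatively, both inputs can be hash-partitioned on the join key with an explicit count via repartition(n, col). When both sides already share that partitioning, the planner can usually skip a second shuffle. The sketch assumes a local Spark 2.x/3.x session. The object and method names (ShufflePartitionsSketch, partitionCounts) are illustrative. Broadcast joins and adaptive query execution (Spark 3.x) are switched off only so the configured count is observable, because either one can change the final partition count.

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  // Hypothetical helper: returns (join partitions, groupBy partitions,
  // pre-partitioned join partitions) for a tiny local example.
  def partitionCounts(): (Int, Int, Int) = {
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-sketch")
      .master("local[2]")
      // Post-shuffle partition count used by joins and aggregations (default 200).
      .config("spark.sql.shuffle.partitions", "8")
      // Demo-only assumptions: disable broadcast joins and adaptive execution,
      // either of which could change the partition counts observed below.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    try {
      val left = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "l")
      val right = Seq((1, "x"), (2, "y"), (4, "z")).toDF("id", "r")

      // Both operations shuffle into spark.sql.shuffle.partitions partitions.
      val joined = left.join(right, "id")
      val grouped = left.groupBy($"id").count()

      // Alternative: hash-partition both inputs on the join key up front.
      // With matching partitioning on both sides, the planner can usually
      // reuse it instead of shuffling again, so the join keeps this count.
      val leftByKey = left.repartition(4, $"id")
      val rightByKey = right.repartition(4, $"id")
      val joinedByKey = leftByKey.join(rightByKey, "id")

      (joined.rdd.getNumPartitions,
       grouped.rdd.getNumPartitions,
       joinedByKey.rdd.getNumPartitions)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (joinParts, groupParts, preParts) = partitionCounts()
    println(s"join=$joinParts groupBy=$groupParts prePartitionedJoin=$preParts")
  }
}
```

Note that spark.sql.shuffle.partitions is session-wide, so it applies to every shuffle in a query. Pre-partitioning the inputs is the narrower option when only one join needs a specific count.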