Re: map-side-combine in Spark SQL

2016-02-10 Thread Reynold Xin
I'm not 100% sure I understand your question, but yes, Spark (both the RDD API and SQL/DataFrame) does partial aggregation.

On Tue, Feb 9, 2016 at 8:37 PM, Rishitesh Mishra wrote:
> Can anybody confirm, whether ANY operator in Spark SQL uses
> map-side-combine ? If not, is it safe to assume Sor
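For readers less familiar with the term, "partial aggregation" (map-side combine) means each map task pre-aggregates its own records by key before the shuffle. Only one partial result per key per map task is shuffled, and the reduce side merges those partials. The sketch below is a toy, single-process model of that idea, not Spark code. The function names (`partial_aggregate`, `shuffle`, `final_aggregate`, `sum_by_key`) are hypothetical, and a plain Python `hash` stands in for Spark's partitioner.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Map side: pre-sum values per key within one input partition."""
    combined = defaultdict(int)
    for key, value in partition:
        combined[key] += value
    return list(combined.items())


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) record to a reducer bucket by hashing the key."""
    buckets = [[] for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            buckets[hash(key) % num_reducers].append((key, value))
    return buckets


def final_aggregate(bucket):
    """Reduce side: merge the (partial) sums that arrived for each key."""
    result = defaultdict(int)
    for key, value in bucket:
        result[key] += value
    return dict(result)


def sum_by_key(partitions, num_reducers=2, combine=True):
    """Toy sum-by-key job; returns (totals, number of records shuffled)."""
    # With combine=True each map task emits one record per distinct key;
    # with combine=False every input record is shuffled as-is.
    map_outputs = [partial_aggregate(p) if combine else list(p) for p in partitions]
    shuffled_records = sum(len(output) for output in map_outputs)
    totals = {}
    for bucket in shuffle(map_outputs, num_reducers):
        # Each key hashes to exactly one bucket, so updates never collide.
        totals.update(final_aggregate(bucket))
    return totals, shuffled_records


partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5), ("b", 6)]]
print(sum_by_key(partitions, combine=True))   # ({'a': 8, 'b': 13}, 4) up to dict order
print(sum_by_key(partitions, combine=False))  # same totals, but 6 records shuffled
```

Both runs produce the same totals. The combined run shuffles 4 records instead of 6, which is the whole point of combining before the shuffle.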

map-side-combine in Spark SQL

2016-02-09 Thread Rishitesh Mishra
Can anybody confirm whether ANY operator in Spark SQL uses map-side combine? If not, is it safe to assume that SortShuffleManager will always use serialized sorting for queries from Spark SQL?
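The question depends on how SortShuffleManager chooses a shuffle writer. The sketch below is a simplified model of that decision as I understand it for Spark 1.6-era code. It is not the real implementation, and the exact conditions differ between Spark versions, so check `SortShuffleManager` for the version you run. `ShuffleDep` and `choose_writer` are hypothetical stand-ins for `ShuffleDependency` and the handle selection in `registerShuffle`. The constants assume the default `spark.shuffle.sort.bypassMergeThreshold` of 200 and the 2^24 partition-id limit of the serialized path. The model shows that "no map-side combine" is necessary but not sufficient for serialized sorting: the serializer must also support relocation of serialized objects, and a small partition count may take the bypass-merge-sort path first.

```python
from dataclasses import dataclass

# Assumed values (Spark 1.6-era defaults as I understand them; verify for your version).
BYPASS_MERGE_THRESHOLD = 200         # spark.shuffle.sort.bypassMergeThreshold default
MAX_SERIALIZED_PARTITIONS = 1 << 24  # partition ids must fit in the packed record pointer


@dataclass
class ShuffleDep:
    """Hypothetical stand-in for the fields of a ShuffleDependency that matter here."""
    num_partitions: int
    map_side_combine: bool
    serializer_supports_relocation: bool


def choose_writer(dep: ShuffleDep) -> str:
    """Simplified model of which shuffle path SortShuffleManager would pick."""
    # 1. Few partitions and no map-side aggregation: write one file per
    #    partition and concatenate them, with no sorting at all.
    if not dep.map_side_combine and dep.num_partitions <= BYPASS_MERGE_THRESHOLD:
        return "bypass-merge-sort"
    # 2. Serialized sorting needs a relocatable serializer, no map-side
    #    aggregation, and a partition count that fits the pointer encoding.
    if (dep.serializer_supports_relocation
            and not dep.map_side_combine
            and dep.num_partitions <= MAX_SERIALIZED_PARTITIONS):
        return "serialized"
    # 3. Otherwise fall back to sorting deserialized objects.
    return "deserialized"


print(choose_writer(ShuffleDep(200, False, True)))   # bypass-merge-sort
print(choose_writer(ShuffleDep(1000, False, True)))  # serialized
print(choose_writer(ShuffleDep(1000, True, True)))   # deserialized
```

Under these assumptions, a Spark SQL exchange with no map-side combine and a relocatable row serializer would still use the bypass path, not serialized sorting, whenever it has 200 or fewer output partitions. The default `spark.sql.shuffle.partitions` is 200.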