[ https://issues.apache.org/jira/browse/SPARK-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-4367.
--------------------------------
    Resolution: Fixed
      Assignee: Yin Huai
 Fix Version/s: 1.5.0

> Partial aggregation support the DISTINCT aggregation
> ----------------------------------------------------
>
>                 Key: SPARK-4367
>                 URL: https://issues.apache.org/jira/browse/SPARK-4367
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Cheng Hao
>            Assignee: Yin Huai
>             Fix For: 1.5.0
>
>
> Most aggregate functions (e.g. average) applied to "distinct" values require
> all of the records in the same group to be shuffled to a single node.
> However, as an optimization, those records can be partially aggregated
> before shuffling, which can reduce the shuffle overhead significantly.
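The idea in the description can be illustrated with a small single-process simulation. This is a toy sketch, not Spark's actual implementation: the function names (`partial_distinct`, `shuffle`, `avg_distinct`) and the list-of-partitions data model are hypothetical. Each partition first drops duplicate (key, value) pairs locally (the "partial aggregation"), so fewer rows cross the simulated shuffle. The reducer side then merges the per-key value sets and computes the average over distinct values.

```python
from collections import defaultdict

# Hypothetical toy model: each partition is a list of (group_key, value) rows,
# roughly the input to: SELECT key, AVG(DISTINCT value) FROM t GROUP BY key


def partial_distinct(partition):
    """Map side: drop duplicate (key, value) pairs within one partition."""
    return set(partition)


def shuffle(partials):
    """Simulated shuffle: route every (key, value) pair to its key's reducer.

    Returns the per-key value sets and the number of rows "sent over the wire".
    """
    by_key = defaultdict(set)
    shipped = 0
    for part in partials:
        for key, value in part:
            by_key[key].add(value)  # set union removes cross-partition duplicates
            shipped += 1
    return by_key, shipped


def avg_distinct(partitions):
    """Partial (map-side) dedup, then shuffle, then final AVG(DISTINCT)."""
    partials = [partial_distinct(p) for p in partitions]
    by_key, shipped = shuffle(partials)
    # Sort keys so the result dict has a deterministic order.
    result = {k: sum(vs) / len(vs) for k, vs in sorted(by_key.items())}
    return result, shipped


partitions = [
    [("a", 1), ("a", 1), ("a", 2), ("b", 5)],
    [("a", 2), ("a", 3), ("b", 5), ("b", 5)],
]

if __name__ == "__main__":
    result, shipped = avg_distinct(partitions)
    print(result, shipped)  # → {'a': 2.0, 'b': 5.0} 6
```

Without the map-side step, all 8 input rows would be shuffled. With it, only 6 are shuffled. The final set union at the reducer is still required, because the same (key, value) pair can appear in more than one partition.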