Hi everyone,

Consider the following code:
    val result = df.groupBy("col1").agg(min("col2"))

I know that rdd.reduceByKey(func) produces the same RDD as

    rdd.groupByKey().mapValues(values => values.reduce(func))

However, reduceByKey is more efficient because it avoids shipping every individual value to the reducer doing the aggregation; it ships partial (map-side) aggregations instead. I wonder whether the DataFrame API optimizes this code in a way similar to what RDD.reduceByKey does. I am using Spark 1.6.2.

Regards,
Luis
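
P.S. In case it clarifies the equivalence I mean, here is a minimal sketch of the two RDD formulations, assuming the spark-shell (where sc is available) and a pair RDD named pairs whose name, contents, and (String, Int) type are just for illustration. One can also inspect the physical plan with explain() to see whether Spark plans a partial aggregation before the shuffle:

    import org.apache.spark.sql.functions.min

    // Illustrative pair RDD: RDD[(String, Int)]
    val pairs = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 2)))

    // Efficient: combines values per key on each partition
    // before the shuffle, shipping only partial aggregates.
    val viaReduceByKey = pairs.reduceByKey((a, b) => math.min(a, b))

    // Same result, but ships every individual value across the network
    // to the reducer that performs the aggregation.
    val viaGroupByKey = pairs.groupByKey().mapValues(_.min)

    // For the DataFrame version, the printed physical plan shows whether
    // a partial (map-side) aggregation happens before the exchange:
    df.groupBy("col1").agg(min("col2")).explain()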