Hi,

I would like to express the following custom aggregation query in Spark SQL:

1. Group the table by the value of Name.
2. For each group, pick the tuple with the maximum value of Age (the ages are distinct within each name).
I am wondering what the best way to do this in Spark SQL is. Should I use a UDAF? Previously I was doing something like the following with the RDD API:

    personRDD.map(t => (t.name, t))
             .reduceByKey((a, b) => if (a.age > b.age) a else b)

Thank you!

Best,
Wenlei
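For context, the reduceByKey call above keeps, for each name, the record with the larger age. A minimal plain-Python sketch of that per-key argmax logic (hypothetical sample data, no Spark involved) would be:

```python
from functools import reduce
from collections import defaultdict

# Hypothetical (name, age) records standing in for personRDD.
records = [("alice", 30), ("bob", 25), ("alice", 41), ("bob", 19)]

# Emulate map(t => (t.name, t)) followed by reduceByKey:
# bucket the tuples by name, then within each bucket keep
# the tuple with the larger age.
grouped = defaultdict(list)
for name, age in records:
    grouped[name].append((name, age))

argmax_by_name = {
    name: reduce(lambda a, b: a if a[1] > b[1] else b, tuples)
    for name, tuples in grouped.items()
}

print(argmax_by_name)  # → {'alice': ('alice', 41), 'bob': ('bob', 25)}
```

The reduce function is associative and commutative here (ages are distinct per name), which is exactly the property reduceByKey relies on to combine partial results across partitions.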