Hi,

I would like to express the following custom aggregation query in Spark SQL:
1. Group the table by the value of Name
2. For each group, choose the tuple with the max value of Age (the ages are
distinct for every name)
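
In plain SQL terms, I could presumably express it with a group-by plus a
self-join, roughly like the sketch below (sqlContext and personDF are just
placeholders for my own SQL context and DataFrame, which has columns name
and age):

    // Sketch only: register the DataFrame and join each row against the
    // per-name maximum age; ages are distinct, so one row survives per name.
    personDF.registerTempTable("person")
    val oldestPerName = sqlContext.sql("""
      SELECT p.name, p.age
      FROM person p
      JOIN (SELECT name, MAX(age) AS max_age
            FROM person
            GROUP BY name) m
        ON p.name = m.name AND p.age = m.max_age
    """)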

I am wondering what the best way to do this in Spark SQL is. Should I use a
UDAF? Previously I was doing something like the following with the RDD API:

// Key each person record by name, then keep the record with the larger
// age whenever two records share the same name.
personRDD.map(t => (t.name, t))
    .reduceByKey((a, b) => if (a.age > b.age) a else b)
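
Another option seems to be the DataFrame window API, roughly along these
lines (again the names are placeholders, and I have not checked which Spark
version this needs):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    // Rank rows within each name partition by descending age and keep only
    // the top-ranked (i.e. oldest) row for every name.
    val byAgeDesc = Window.partitionBy("name").orderBy(col("age").desc)
    val oldestPerName = personDF
      .withColumn("rn", row_number().over(byAgeDesc))
      .filter(col("rn") === 1)
      .drop("rn")

Is one of these the right direction, or is a UDAF the recommended approach
here?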

Thank you!

Best,
Wenlei
