Can we avoid multiple groupBy calls? I have a million records, and performance is a concern.
Below is my query. I suspect that window functions would also be a performance hit, so could you please advise whether there is a better alternative? I need to get the maximum number of equipment items for each house across a list of dates.

    ds.groupBy("house", "date")
      .agg(countDistinct("equiId") as "count")
      .drop("date")
      .groupBy("house")
      .agg(max("count") as "noOfEquipments")

Regards,
Kumar