Hi all,

I have data in a DataFrame (emp_df) as shown below:

EmpId  Sal  DeptNo
001    100  10
002    120  20
003    130  10
004    140  20
005    150  10

I ordered it by DeptNo, and by Sal descending within each DeptNo:

    ordrd_emp_df = emp_df.orderBy($"DeptNo",$"Sal".desc)

which gives the result below:

DeptNo  Sal  EmpId
10      150  005
10      130  003
10      100  001
20      140  004
20      120  002

Now I want to pick the highest-paid EmpId of each DeptNo, so I applied the first aggregate as below:

    ordrd_emp_df.groupBy("DeptNo").agg($"DeptNo",first("EmpId").as("TopSal")).select($"DeptNo",$"TopSal")

The expected output is:

DeptNo  TopSal
10      005
20      004

But my output varies from run to run.

First iteration:

Dept  TopSal
10    003
20    004

Second iteration:

Dept  TopSal
10    005
20    004

Third iteration:

Dept  TopSal
10    003
20    002

I am not sure why the output varies on each iteration, since neither the code nor the values in the DataFrame have changed.

Please let me know if you have any input on this.

Regards,
Satish Chandra J
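
A possible explanation, offered as a sketch rather than a definitive answer: groupBy shuffles the rows, so the ordering produced by orderBy is not guaranteed to survive into each group, and first() then returns whichever row happens to arrive first. One way to avoid depending on row order is to take the max of a (Sal, EmpId) struct per DeptNo, since structs compare field by field. The object name TopPaidPerDept, the helper topPaid, and the local SparkSession below are assumptions made for illustration, and Sal is assumed to be a numeric column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, struct}

// Hypothetical wrapper object, introduced only for this sketch.
object TopPaidPerDept {

  // For each DeptNo, keep the EmpId of the row with the largest Sal.
  // Structs are compared field by field, so max(struct(Sal, EmpId))
  // picks the highest Sal (ties broken by the larger EmpId) without
  // relying on any row ordering.
  def topPaid(emp_df: DataFrame): DataFrame =
    emp_df
      .groupBy("DeptNo")
      .agg(max(struct(col("Sal"), col("EmpId"))).as("top"))
      .select(col("DeptNo"), col("top.EmpId").as("TopSal"))

  def main(args: Array[String]): Unit = {
    // Local session, assumed here so the sketch can run on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("TopPaidPerDept")
      .getOrCreate()
    import spark.implicits._

    // Same rows as in the post; Sal and DeptNo are assumed to be integers.
    val emp_df = Seq(
      ("001", 100, 10),
      ("002", 120, 20),
      ("003", 130, 10),
      ("004", 140, 20),
      ("005", 150, 10)
    ).toDF("EmpId", "Sal", "DeptNo")

    topPaid(emp_df).orderBy("DeptNo").show()

    spark.stop()
  }
}
```

If Sal were stored as a string, the comparison would be lexicographic, so casting it to a numeric type first would be the safer choice.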