Hi All,
I have data in a DataFrame, emp_df, as shown below:

EmpId   Sal   DeptNo
001       100   10
002       120   20
003       130   10
004       140   20
005       150   10

val ordrd_emp_df = emp_df.orderBy($"DeptNo", $"Sal".desc)

which gives the result below:

DeptNo  Sal   EmpId
10         150   005
10         130   003
10         100   001
20         140   004
20         120   002

Now I want to pick the highest-paid EmpId in each DeptNo, so I applied the
first aggregate function as below:

ordrd_emp_df.groupBy("DeptNo").agg($"DeptNo",first("EmpId").as("TopSal")).select($"DeptNo",$"TopSal")

Expected output is:

DeptNo  TopSal
10      005
20      004
But my output varies from one iteration to the next:

First iteration:

DeptNo  TopSal
10      003
20      004

Second iteration:

DeptNo  TopSal
10      005
20      004

Third iteration:

DeptNo  TopSal
10      003
20      002

I am not sure why the output varies between iterations, since neither the
code nor the values in the DataFrame change.
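
In the meantime, here is a minimal sketch of an order-independent variant I am considering. My guess is that groupBy does not promise to keep the row order produced by orderBy, so first may see the rows of a department in any order. The sketch ranks the rows inside each DeptNo with a window instead. It assumes Spark 2.0 or later with a local SparkSession; the object name TopPaidPerDept, the helper topPaid and the column rn are made up for illustration and are not part of my actual job.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopPaidPerDept {

  // Hypothetical helper: keeps one row per DeptNo, the one with the highest Sal.
  def topPaid(emp: DataFrame): DataFrame = {
    // Order rows within each department by salary, highest first.
    // EmpId is a tie-breaker so equal salaries still give a stable result.
    val bySalDesc = Window
      .partitionBy("DeptNo")
      .orderBy(col("Sal").desc, col("EmpId"))

    emp
      .withColumn("rn", row_number().over(bySalDesc)) // rn = 1 is the top earner
      .where(col("rn") === 1)
      .select(col("DeptNo"), col("EmpId").as("TopSal"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the idea on the sample data.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("TopPaidPerDept")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the table above.
    val emp_df = Seq(
      ("001", 100, 10),
      ("002", 120, 20),
      ("003", 130, 10),
      ("004", 140, 20),
      ("005", 150, 10)
    ).toDF("EmpId", "Sal", "DeptNo")

    // The final orderBy is only for display and does not affect which row is picked.
    topPaid(emp_df).orderBy("DeptNo").show()

    spark.stop()
  }
}
```

On the sample data this should pick 005 for DeptNo 10 and 004 for DeptNo 20 on every run, because the choice depends only on the values inside each partition and not on the order in which rows arrive.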

Please let me know if you have any input on this.

Regards,
Satish Chandra J
