Hi Hemant,

My DataFrame "ordrd_emp_df" already holds the data in sorted order, since I applied orderBy in the first step. I also tried calling orderBy immediately before groupBy, but I still get different results on each iteration.
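
One approach that should not depend on row order at all is to compute the maximum Sal per DeptNo and join back to recover the EmpId. As far as I understand, groupBy shuffles the rows, so first is not guaranteed to see them in the order produced by orderBy, whereas max has no such dependency. The following is only a minimal sketch: it assumes Sal is a numeric column, and the names TopPaid and topPaidPerDept are hypothetical helpers made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max}

object TopPaid {
  // Hypothetical helper (not part of Spark): returns DeptNo and the EmpId
  // with the highest Sal in that department, independent of row order.
  def topPaidPerDept(empDf: DataFrame): DataFrame = {
    // Highest salary per department; assumes Sal is numeric.
    val maxSal = empDf.groupBy("DeptNo").agg(max("Sal").as("Sal"))

    // Join back on (DeptNo, Sal) to find the employee earning that salary.
    empDf.join(maxSal, Seq("DeptNo", "Sal"))
      .select(col("DeptNo"), col("EmpId").as("TopSal"))
  }
}
```

Calling TopPaid.topPaidPerDept(emp_df) on the sample data should give one row per department. If two employees in a department share the top salary, both rows are returned, so a tie-break rule would still be needed in that case.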

Regards,
Satish Chandra

On Wed, Feb 3, 2016 at 4:28 PM, Hemant Bhanawat <hemant9...@gmail.com> wrote:

> Missing order by?
>
> Hemant Bhanawat
> SnappyData (http://snappydata.io/)
>
> On Wed, Feb 3, 2016 at 3:45 PM, satish chandra j <jsatishchan...@gmail.com> wrote:
>
>> Hi All,
>> I have data in an emp_df (DataFrame) as shown below:
>>
>> EmpId  Sal  DeptNo
>> 001    100  10
>> 002    120  20
>> 003    130  10
>> 004    140  20
>> 005    150  10
>>
>> ordrd_emp_df = emp_df.orderBy($"DeptNo",$"Sal".desc), which results in:
>>
>> DeptNo  Sal  EmpId
>> 10      150  005
>> 10      130  003
>> 10      100  001
>> 20      140  004
>> 20      120  002
>>
>> Now I want to pick the highest-paid EmpId of each DeptNo, hence I applied
>> agg with the first method as below:
>>
>> ordrd_emp_df.groupBy("DeptNo").agg($"DeptNo",first("EmpId").as("TopSal")).select($"DeptNo",$"TopSal")
>>
>> Expected output:
>>
>> DeptNo  TopSal
>> 10      005
>> 20      004
>>
>> But my output varies for each iteration:
>>
>> First iteration:
>>
>> Dept  TopSal
>> 10    003
>> 20    004
>>
>> Second iteration:
>>
>> Dept  TopSal
>> 10    005
>> 20    004
>>
>> Third iteration:
>>
>> Dept  TopSal
>> 10    003
>> 20    002
>>
>> I am not sure why the output varies on each iteration, as there is no
>> change in the code or in the values in the DataFrame.
>>
>> Please let me know if you have any inputs on this.
>>
>> Regards,
>> Satish Chandra J