Hi,

I am trying something like this:

    val sesDS: Dataset[XXX] = hiveContext.sql(select).as[XXX]

The select statement is something like this:

    "select * from sometable .... DISTRIBUTE by col1, col2, col3"

Then comes groupByKey:

    val gpbyDS = sesDS
      .groupByKey(x => (x.col1, x.col2, x.col3))

Since my select already distributes the data by the same columns that I use in groupByKey, why does groupByKey still do a shuffle?

Is this an issue, or am I missing something?

Regards,
Dibyendu