Hello, I am new to Spark and just running some tests to get familiar with the APIs.
When calling the rollup function on my DataFrame, I get different results when I alias the columns I am grouping on (see the example data set below). I was expecting the alias function to affect only the column name. Why is it also affecting the rollup results?

(I know I can rename my columns after the rollup call using the withColumnRenamed function; my question is just to get a better understanding of the alias function.)

scala> df.show
+----+------+-----+
|Name|  Game|Score|
+----+------+-----+
| Bob|Game 1|   20|
| Bob|Game 2|   30|
| Lea|Game 1|   25|
| Lea|Game 2|   30|
| Ben|Game 1|    5|
| Ben|Game 3|   35|
| Bob|Game 3|   15|
+----+------+-----+

// rollup results as expected
scala> df.rollup(df("Name"), df("Game")).sum().orderBy("Name", "Game").show
+----+------+----------+
|Name|  Game|SUM(Score)|
+----+------+----------+
|null|  null|       160|
| Ben|  null|        40|
| Ben|Game 1|         5|
| Ben|Game 3|        35|
| Bob|  null|        65|
| Bob|Game 1|        20|
| Bob|Game 2|        30|
| Bob|Game 3|        15|
| Lea|  null|        55|
| Lea|Game 1|        25|
| Lea|Game 2|        30|
+----+------+----------+

// rollup with aliases returns strange results
scala> df.rollup(df("Name") as "Player", df("Game") as "Round").sum().orderBy("Player", "Round").show
+------+------+----------+
|Player| Round|SUM(Score)|
+------+------+----------+
|   Ben|Game 1|         5|
|   Ben|Game 1|         5|
|   Ben|Game 1|         5|
|   Ben|Game 3|        35|
|   Ben|Game 3|        35|
|   Ben|Game 3|        35|
|   Bob|Game 1|        20|
|   Bob|Game 1|        20|
|   Bob|Game 1|        20|
|   Bob|Game 2|        30|
|   Bob|Game 2|        30|
|   Bob|Game 2|        30|
|   Bob|Game 3|        15|
|   Bob|Game 3|        15|
|   Bob|Game 3|        15|
|   Lea|Game 1|        25|
|   Lea|Game 1|        25|
|   Lea|Game 1|        25|
|   Lea|Game 2|        30|
|   Lea|Game 2|        30|
+------+------+----------+

Thanks in advance for your help,
Isabelle