Eric Liang created SPARK-18393:
----------------------------------

             Summary: DataFrame pivot output column names should respect aliases
                 Key: SPARK-18393
                 URL: https://issues.apache.org/jira/browse/SPARK-18393
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Eric Liang
            Priority: Minor

For example:
{code}
val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
df
  .groupBy('x)
  .pivot("a", Seq(0, 1))
  .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
  .show()
+---+--------------------+---------------------+--------------------+---------------------+
|  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
+---+--------------------+---------------------+--------------------+---------------------+
|  0|                 450|                   10|                 500|                   10|
|  1|                 510|                   10|                 460|                   10|
|  3|                 530|                   10|                 480|                   10|
|  2|                 470|                   10|                 520|                   10|
|  4|                 490|                   10|                 540|                   10|
+---+--------------------+---------------------+--------------------+---------------------+
{code}

The column names here are quite hard to read. Ideally we would respect the aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.
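
Until pivot respects aliases, one possible workaround is to rename the output columns after the fact. The sketch below is not part of Spark: the names `PivotNames` and `simplify` are hypothetical. It assumes the generated names have the form shown above (pivot value, underscore, expression, then AS and a backquoted alias) and that the pivot values themselves contain no underscores.

```scala
object PivotNames {
  // Matches generated names such as: 0_sum(`b`) AS `blah`
  // Group 1 is the pivot value (text before the first underscore),
  // group 2 is the alias inside the trailing backquotes.
  private val Aliased = """^(.*?)_.* AS `([^`]+)`$""".r

  // Hypothetical helper: rewrites "<pivotValue>_<expr> AS `<alias>`"
  // to "<pivotValue>_<alias>"; any other name is returned unchanged.
  def simplify(name: String): String = name match {
    case Aliased(pivotValue, alias) => s"${pivotValue}_$alias"
    case other                      => other
  }
}

// Possible usage, assuming `pivoted` holds the DataFrame produced by the
// groupBy/pivot/agg chain above:
//   val renamed = pivoted.toDF(pivoted.columns.map(PivotNames.simplify): _*)
```

Applied to the header above, this would turn 0_sum(`b`) AS `blah` into 0_blah and leave the grouping column x untouched. Because the pattern splits on the first underscore, it would need adjusting for pivot values that contain underscores.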