[jira] [Created] (SPARK-18393) DataFrame pivot output column names should respect aliases

Eric Liang (JIRA) Wed, 09 Nov 2016 16:11:39 -0800

Eric Liang created SPARK-18393:
----------------------------------

             Summary: DataFrame pivot output column names should respect aliases
                 Key: SPARK-18393
                 URL: https://issues.apache.org/jira/browse/SPARK-18393
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Eric Liang
            Priority: Minor



For example, the aliases given to the aggregates below (blah, foo) are not
used as the pivoted column names:

{code}
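// spark-shell brings these into scope automatically; a standalone program needs them.
import org.apache.spark.sql.functions.expr
import spark.implicits._  // enables the 'x symbol-to-Column syntax

val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
df
  .groupBy('x)
  .pivot("a", Seq(0, 1))
  .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
  .show()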
+---+--------------------+---------------------+--------------------+---------------------+
|  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
+---+--------------------+---------------------+--------------------+---------------------+
|  0|                 450|                   10|                 500|                   10|
|  1|                 510|                   10|                 460|                   10|
|  3|                 530|                   10|                 480|                   10|
|  2|                 470|                   10|                 520|                   10|
|  4|                 490|                   10|                 540|                   10|
+---+--------------------+---------------------+--------------------+---------------------+
{code}

The generated column names embed the full aggregate expression, including the
AS clause, which makes them hard to read. Ideally pivot would respect the
aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.
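
Until pivot respects aliases, one possible workaround is to rename the
generated columns after the fact. The following is only a sketch, assuming the
generated names always have the shape shown in the output above (pivot value,
underscore, expression, then AS and the alias in backticks) and that the pivot
value itself contains no underscore. aliasedName is a hypothetical helper, not
part of Spark.

```scala
// Hypothetical helper (not part of Spark): turns a generated pivot column
// name such as "0_sum(`b`) AS `blah`" into "0_blah".
// Assumes the pivot value contains no underscore; names that do not match
// the assumed pattern (for example the grouping column "x") are returned
// unchanged.
val GeneratedName = """^(.+?)_.* AS `(.+)`$""".r

def aliasedName(name: String): String = name match {
  case GeneratedName(pivotValue, alias) => s"${pivotValue}_$alias"
  case other                            => other
}
```

Given the pivoted DataFrame bound to a value, say pivoted, the rename could
then be applied with pivoted.toDF(pivoted.columns.map(aliasedName): _*).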



