Enrico Minack created SPARK-39292:
-------------------------------------

             Summary: Make Dataset.melt work with struct fields
                 Key: SPARK-39292
                 URL: https://issues.apache.org/jira/browse/SPARK-39292
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Enrico Minack


In SPARK-38864, the {{melt}} function was added to {{Dataset}}.
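
For reference, a minimal sketch of that API on flat columns (assuming the {{Dataset.melt(ids, values, variableColumnName, valueColumnName)}} signature; the column names and data below are illustrative only):
{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// wide data: one id column and two value columns (illustrative names)
val wide = Seq(
  (1, "one", "One"),
  (2, "two", null),
  (3, null, "three"),
  (4, null, null)
).toDF("id", "str1", "str2")

// melt keeps the id columns and stacks the value columns into
// (variable, value) pairs
val long = wide.melt(
  Array($"id"),
  Array($"str1", $"str2"),
  "variable",
  "value")

long.show()
// the variable column contains "str1" / "str2", the value column the cell values
{code}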

It would be nice if fields of struct columns could be used as id and value 
columns. This would allow for the following:

Given a Dataset with the following schema:
{code:java}
root
 |-- an: struct (nullable = false)
 |    |-- id: integer (nullable = false)
 |-- str: struct (nullable = false)
 |    |-- one: string (nullable = true)
 |    |-- two: string (nullable = true)
{code}

containing, for example, this data:
{code:java}
+---+-------------+
| an|          str|
+---+-------------+
|{1}|   {one, One}|
|{2}|  {two, null}|
|{3}|{null, three}|
|{4}| {null, null}|
+---+-------------+
{code}
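
Such a Dataset could be built, for example, like this (a sketch that mirrors the test at the bottom, continuing the session from the sketch above):
{code:java}
import org.apache.spark.sql.functions.struct

// wrap flat columns into the two struct columns from the schema above
// (reuses the SparkSession / implicits from the previous sketch)
val df = Seq(
  (1, "one", "One"),
  (2, "two", null),
  (3, null, "three"),
  (4, null, null)
).toDF("id", "str1", "str2")
  .select(
    struct($"id").as("an"),
    struct($"str1".as("one"), $"str2".as("two")).as("str"))

df.printSchema()
df.show()
{code}
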
Melting with id columns {{Seq("an.id")}} and value columns 
{{Seq("str.one", "str.two")}} would result in:
{code:java}
+--+--------+-----+
|an|variable|value|
+--+--------+-----+
| 1| str.one|  one|
| 1| str.two|  One|
| 2| str.one|  two|
| 2| str.two| null|
| 3| str.one| null|
| 3| str.two|three|
| 4| str.one| null|
| 4| str.two| null|
+--+--------+-----+
{code}
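
In terms of the helper used in the test below, the proposed call would look roughly like this ({{Melt.of}} is the test-suite helper, not a public API, and {{df}} is the struct-columned Dataset from above; whether the final API takes field paths as strings or as {{Column}} expressions is open):
{code:java}
// desired behaviour (sketch): resolve "an.id", "str.one" and "str.two" through
// the struct columns and use the full dotted paths as variable names
val melted = Melt.of(df, Seq("an.id"), Seq("str.one", "str.two"))
melted.show()
{code}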

See the test in {{org.apache.spark.sql.MeltSuite}}:
{code:java}
  test("melt with struct fields") {
    val df = meltWideDataDs.select(
      struct($"id").as("an"),
      struct(
        $"str1".as("one"),
        $"str2".as("two")
      ).as("str")
    )

    checkAnswer(
      Melt.of(df, Seq("an.id"), Seq("str.one", "str.two")),
      meltedWideDataRows.map(row => Row(
        row.getInt(0),
        row.getString(1) match {
          case "str1" => "str.one"
          case "str2" => "str.two"
        },
        row.getString(2)
      ))
    )
  }
{code}
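
For comparison, a workaround that works without this change (a minimal sketch, using {{df}} from above): extract the struct fields into flat columns first and melt those. The variable column then only carries the flat aliases instead of the dotted paths, which is part of what this ticket would improve:
{code:java}
import org.apache.spark.sql.functions.col

// flatten the struct fields into top-level columns, then melt as today
val flat = df.select(
  col("an.id").as("id"),
  col("str.one").as("one"),
  col("str.two").as("two"))

val melted = flat.melt(
  Array(col("id")),
  Array(col("one"), col("two")),
  "variable",
  "value")

melted.show()
// the variable column reads "one" / "two" instead of "str.one" / "str.two"
{code}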


