Enrico Minack created SPARK-39292:
-------------------------------------

             Summary: Make Dataset.melt work with struct fields
                 Key: SPARK-39292
                 URL: https://issues.apache.org/jira/browse/SPARK-39292
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Enrico Minack
In SPARK-38864, the melt function was added to Dataset. It would be nice if fields of struct columns could be used as id and value columns. This would allow for the following.

Given a Dataset with the following schema:
{code:java}
root
 |-- an: struct (nullable = false)
 |    |-- id: integer (nullable = false)
 |-- str: struct (nullable = false)
 |    |-- one: string (nullable = true)
 |    |-- two: string (nullable = true)
{code}
For example, with this data:
{code:java}
+---+-------------+
| an|          str|
+---+-------------+
|{1}|   {one, One}|
|{2}|  {two, null}|
|{3}|{null, three}|
|{4}| {null, null}|
+---+-------------+
{code}
Melting with value columns {{Seq("str.one", "str.two")}} on id columns {{Seq("an.id")}} would result in:
{code:java}
+--+--------+-----+
|an|variable|value|
+--+--------+-----+
| 1| str.one|  one|
| 1| str.two|  One|
| 2| str.one|  two|
| 2| str.two| null|
| 3| str.one| null|
| 3| str.two|three|
| 4| str.one| null|
| 4| str.two| null|
+--+--------+-----+
{code}
See this test in {{org.apache.spark.sql.MeltSuite}}:
{code:java}
test("melt with struct fields") {
  val df = meltWideDataDs.select(
    struct($"id").as("an"),
    struct(
      $"str1".as("one"),
      $"str2".as("two")
    ).as("str")
  )
  checkAnswer(
    Melt.of(df, Seq("an.id"), Seq("str.one", "str.two")),
    meltedWideDataRows.map(row => Row(
      row.getInt(0),
      row.getString(1) match {
        case "str1" => "str.one"
        case "str2" => "str.two"
      },
      row.getString(2)
    ))
  )
}
{code}
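Until such support exists, the requested result can be approximated by flattening the struct fields before melting. The sketch below is not the proposed change; it assumes the Array[Column]-based {{Dataset.melt}} variant added in SPARK-38864, and the flattened column names ({{str_one}}, {{str_two}}) are illustrative, whereas built-in struct-field support would keep the dotted names {{str.one}}/{{str.two}} in the {{variable}} column:
{code:java}
// Workaround sketch, not the proposed change: pull the struct fields up to
// top-level columns and melt those. Assumes the Dataset.melt API from
// SPARK-38864 (Spark 3.4); the flattened names str_one/str_two are illustrative.
import org.apache.spark.sql.functions.col

val flattened = df.select(
  col("an.id").as("an"),
  col("str.one").as("str_one"),
  col("str.two").as("str_two")
)

val melted = flattened.melt(
  Array(col("an")),                      // id columns
  Array(col("str_one"), col("str_two")), // value columns
  "variable",
  "value"
)
melted.show()
{code}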