sql: select inline(arrays_zip(col1, col2, col3)) as (c1, c2, c3) from t1
---- Replied Message ----
| From | Enrico Minack<i...@enrico.minack.dev> |
| Date | 02/16/2023 16:06 |
| To | <user@spark.apache.org>, sam smith<qustacksm2123...@gmail.com> |
| Subject | Re: How to explode array columns of a dataframe having the same length |
You have to take each row and zip its lists; each element of the zipped result
becomes one new row.
So write a method that turns
Row(List("A","B","null"), List("C","D","null"), List("E","null","null"))
into
List(List("A","C","E"), List("B","D","null"), List("null","null","null"))
and use flatMap with that method.
In Scala, this would read:
df.flatMap { row =>
  (row.getSeq[String](0), row.getSeq[String](1), row.getSeq[String](2)).zipped.toIterable
}.show()
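Since the question asks for Java, the zipping step described above can be sketched with plain Java collections. This is only an illustration under stated assumptions: the class name ZipColumns and the method zip are hypothetical, the code works on ordinary lists rather than Spark rows, and it assumes all lists have the same length, as in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Spark.
class ZipColumns {

    // Turns a list of equal-length columns into a list of rows:
    // the i-th row holds the i-th element of every column.
    static List<List<String>> zip(List<List<String>> columns) {
        List<List<String>> rows = new ArrayList<>();
        if (columns.isEmpty()) {
            return rows;
        }
        int length = columns.get(0).size();
        // Assumption taken from the question: all arrays have the same length.
        for (List<String> column : columns) {
            if (column.size() != length) {
                throw new IllegalArgumentException("columns differ in length");
            }
        }
        for (int i = 0; i < length; i++) {
            List<String> row = new ArrayList<>();
            for (List<String> column : columns) {
                row.add(column.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> columns = Arrays.asList(
                Arrays.asList("A", "B", "null"),
                Arrays.asList("C", "D", "null"),
                Arrays.asList("E", "null", "null"));
        // Prints one zipped row per line:
        // [A, C, E]
        // [B, D, null]
        // [null, null, null]
        for (List<String> row : zip(columns)) {
            System.out.println(row);
        }
    }
}
```

A function of this shape is what a flatMap over the rows would call: each input row yields as many output rows as its arrays have elements.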
Enrico
On 14.02.23 at 22:54, sam smith wrote:
Hello guys,
I have the following dataframe:
| col1             | col2             | col3                |
| ["A","B","null"] | ["C","D","null"] | ["E","null","null"] |
I want to explode it to the following dataframe:
| col1   | col2   | col3   |
| "A"    | "C"    | "E"    |
| "B"    | "D"    | "null" |
| "null" | "null" | "null" |
How can I do that (preferably in Java) using the explode() method? Note that
something like the following won't yield the correct output:
for (String colName : dataset.columns())
    dataset = dataset.withColumn(colName, explode(dataset.col(colName)));