Hi,

Imagine you have a structure like this:

val events = sqlContext.createDataFrame(
   Seq(
     ("a", Map("a"->1,"b"->1)),
     ("b", Map("b"->1,"c"->1)),
     ("c", Map("a"->1,"c"->1))
   )
 ).toDF("id","map")

What I want is to turn each map key into its own column holding the
corresponding value. Basically, I want to get this:

+---+----+----+----+
| id|   a|   b|   c|
+---+----+----+----+
|  a|   1|   1|null|
|  b|null|   1|   1|
|  c|   1|null|   1|
+---+----+----+----+

I managed to produce it with an explode-pivot combo, but for a large dataset
with around 1000 distinct map keys I imagine this will be prohibitively
expensive. I reckon there must be a much easier way to achieve this than:

import org.apache.spark.sql.functions.{col, explode}

val exploded = events.select(col("id"), explode(col("map")))
  .groupBy("id").pivot("key").sum("value")
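
For comparison, here is a rough sketch of the kind of alternative I have in mind: collect the distinct keys once, then project each key with getItem instead of pivoting. The helper name mapToColumns and its parameters are my own invention, not part of Spark, and it assumes the keys are strings and that the key set is small enough to collect to the driver. I have not measured whether it is actually cheaper.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper (not part of Spark): turns each key of a map column
// into its own column, leaving null where a row lacks that key.
def mapToColumns(df: DataFrame, idCol: String, mapCol: String): DataFrame = {
  // One pass over the data to collect the distinct keys on the driver.
  // Assumes string keys and a small key set (about 1000 in my case).
  val keys: Seq[String] = df
    .select(explode(col(mapCol))) // exploding a map yields "key" and "value"
    .select("key")
    .distinct()
    .collect()
    .map(_.getString(0))
    .sorted
    .toSeq

  // A plain projection: one getItem lookup per key, no aggregation step.
  val keyCols = keys.map(k => col(mapCol).getItem(k).as(k))
  df.select((col(idCol) +: keyCols): _*)
}
```

Calling mapToColumns(events, "id", "map") should give one row per id with columns a, b and c. If the keys are known up front, the collect step could presumably be skipped altogether.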

Any help would be appreciated. :)
