Thanks Sujit, Michael. The list of columns is data-driven (on the order of 100s), but your second example looks exactly like what I want.
Appreciate the help!

On 24 March 2016 at 20:20, Michael Armbrust <mich...@databricks.com> wrote:

> If you know the map keys ahead of time then you can just extract them
> directly.
>
> Here are a few examples
> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1826874816150656/2840265927289860/latest.html>.
>
> On Thu, Mar 24, 2016 at 12:01 PM, Michał Zieliński <zielinski.mich...@gmail.com> wrote:
>
>> Hi,
>>
>> Imagine you have a structure like this:
>>
>> val events = sqlContext.createDataFrame(
>>   Seq(
>>     ("a", Map("a" -> 1, "b" -> 1)),
>>     ("b", Map("b" -> 1, "c" -> 1)),
>>     ("c", Map("a" -> 1, "c" -> 1))
>>   )
>> ).toDF("id", "map")
>>
>> What I want to achieve is to have the map values as separate columns.
>> Basically I want this:
>>
>> +---+----+----+----+
>> | id|   a|   b|   c|
>> +---+----+----+----+
>> |  a|   1|   1|null|
>> |  b|null|   1|   1|
>> |  c|   1|null|   1|
>> +---+----+----+----+
>>
>> I managed to create it with an explode-pivot combo, but for a large
>> dataset with a list of around 1000 map keys I imagine this will be
>> prohibitively expensive. I reckon there must be a much easier way to
>> achieve that than:
>>
>> val exploded = events
>>   .select(col("id"), explode(col("map")))
>>   .groupBy("id")
>>   .pivot("key")
>>   .sum("value")
>>
>> Any help would be appreciated. :)
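For the archive, the "extract the keys directly" suggestion can be sketched as below. This is a minimal sketch, not code from the thread: the `SparkSession` setup and the hard-coded `keys` list are assumptions (the thread used a `SQLContext`, and the real key list is data-driven, so it would be collected from the data or supplied externally). The core idea is that `map("k")` yields null when the key is absent, which matches the pivot-style output without an explode/pivot.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumed local session for illustration; the thread used a SQLContext.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("map-to-cols")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the thread.
val events = Seq(
  ("a", Map("a" -> 1, "b" -> 1)),
  ("b", Map("b" -> 1, "c" -> 1)),
  ("c", Map("a" -> 1, "c" -> 1))
).toDF("id", "map")

// Hypothetical key list; in practice this is data-driven (100s of keys)
// and would be built programmatically rather than written out.
val keys = Seq("a", "b", "c")

// One projection per key: col("map")(k) is a map lookup that returns
// null for missing keys, giving the desired wide layout directly.
val wide = events.select(col("id") +: keys.map(k => col("map")(k).as(k)): _*)
wide.show()
```

Because `keys` is just a `Seq`, this scales to hundreds of columns without the shuffle that `groupBy(...).pivot(...)` incurs; each column is a simple per-row map lookup.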