Thanks Sujit, Michael,

The list of columns is data driven (and on the order of 100s), but your 2nd
example looks exactly like the thing I want.

Appreciate the help!
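For anyone finding this thread later: a minimal sketch of the direct-extraction approach being discussed, assuming the modern SparkSession entry point rather than the sqlContext used above. The keys list is hardcoded here for brevity, but since the column list is data driven it could equally be collected from the data first (e.g. via an explode/distinct over the map column).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("map-keys-to-columns")
  .getOrCreate()
import spark.implicits._

val events = Seq(
  ("a", Map("a" -> 1, "b" -> 1)),
  ("b", Map("b" -> 1, "c" -> 1)),
  ("c", Map("a" -> 1, "c" -> 1))
).toDF("id", "map")

// Keys assumed known (or pre-collected from the data); one getItem
// projection per key avoids the explode + pivot shuffle entirely.
val keys = Seq("a", "b", "c")

val result = events.select(
  col("id") +: keys.map(k => col("map").getItem(k).as(k)): _*
)
result.show()
```

Missing keys come back as null, which matches the desired output table above.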

On 24 March 2016 at 20:20, Michael Armbrust <mich...@databricks.com> wrote:

> If you know the map keys ahead of time then you can just extract them
> directly.
>
> Here are a few examples
> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1826874816150656/2840265927289860/latest.html>
> .
>
> On Thu, Mar 24, 2016 at 12:01 PM, Michał Zieliński <
> zielinski.mich...@gmail.com> wrote:
>
>> Hi,
>>
>> Imagine you have a structure like this:
>>
>> val events = sqlContext.createDataFrame(
>>    Seq(
>>      ("a", Map("a" -> 1, "b" -> 1)),
>>      ("b", Map("b" -> 1, "c" -> 1)),
>>      ("c", Map("a" -> 1, "c" -> 1))
>>    )
>>  ).toDF("id", "map")
>>
>> What I want to achieve is have the map values as a separate columns.
>> Basically I want to achieve this:
>>
>> +---+----+----+----+
>> | id|   a|   b|   c|
>> +---+----+----+----+
>> |  a|   1|   1|null|
>> |  b|null|   1|   1|
>> |  c|   1|null|   1|
>> +---+----+----+----+
>>
>> I managed to create it with an explode-pivot combo, but for a large
>> dataset with around 1000 map keys I imagine this will be
>> prohibitively expensive. I reckon there must be a much easier way to
>> achieve that than:
>>
>> val exploded = events
>>   .select(col("id"), explode(col("map")))
>>   .groupBy("id").pivot("key").sum("value")
>>
>> Any help would be appreciated. :)
>>
>
>
