There is already an explode function on DataFrame btw

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L712

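I think something like this would work. The String overload takes both the
input column and the name of the new output column ("element" here is just a
placeholder). You might need to play with the type.

df.explode("arrayBufferColumn", "element") { x: Seq[String] => x }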
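To make the semantics concrete without a cluster, here's a rough plain-Scala model of what explode does per row, as I read the scaladoc: the function turns the column value into zero or more outputs, and the input row's columns are joined with each output. This is not Spark's implementation. Tuples stand in for Rows, and `ExplodeSketch`, `InputRow` and `explodeRows` are made-up names for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object ExplodeSketch {
  // Hypothetical stand-in for a DataFrame row: (month, array column).
  // scala.collection.Seq so an ArrayBuffer fits on both Scala 2.12 and 2.13.
  type InputRow = (String, scala.collection.Seq[String])

  // Local model of explode(inputColumn, outputColumn)(f): f maps the array
  // value to zero or more outputs, and each output is appended to a copy of
  // the input row's columns. Rows with no outputs disappear.
  def explodeRows[B](rows: Seq[InputRow])(
      f: scala.collection.Seq[String] => Iterable[B]): Seq[(String, scala.collection.Seq[String], B)] =
    rows.flatMap { case (month, items) => f(items).map(b => (month, items, b)) }

  def main(args: Array[String]): Unit = {
    val rows: Seq[InputRow] = Seq(("2015-04", ArrayBuffer("A", "B", "C", "D")))
    // Identity function, as in the explode call above: one output per element.
    val exploded = explodeRows(rows)(items => items)
    // Ignore the original array column and print (month, element).
    exploded.foreach { case (month, _, item) => println(s"$month, $item") }
  }
}
```

Since the original columns come along with each new value, you'd probably still select just the date and the new column afterwards to get the shape Denny asked for.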



On Fri, Apr 3, 2015 at 6:43 AM, Denny Lee <denny.g....@gmail.com> wrote:

> Thanks Dean - fun hack :)
>
> On Fri, Apr 3, 2015 at 6:11 AM Dean Wampler <deanwamp...@gmail.com> wrote:
>
>> A hack workaround is to use flatMap:
>>
>> rdd.flatMap{ case (date, array) => for (x <- array) yield (date, x) }
>>
>> For those of you who don't know Scala, the for comprehension iterates
>> through the ArrayBuffer, named "array", and yields new tuples with the date
>> and each element. The case expression to the left of the => pattern matches
>> on the input tuples.
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition
>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>> Typesafe <http://typesafe.com>
>> @deanwampler <http://twitter.com/deanwampler>
>> http://polyglotprogramming.com
>>
>> On Thu, Apr 2, 2015 at 10:45 PM, Denny Lee <denny.g....@gmail.com> wrote:
>>
>>> Thanks Michael - that was it!  I was drawing a blank on this one for
>>> some reason - much appreciated!
>>>
>>>
>>> On Thu, Apr 2, 2015 at 8:27 PM Michael Armbrust <mich...@databricks.com>
>>> wrote:
>>>
>>>> A lateral view explode using HiveQL. I'm hoping to add explode
>>>> shorthand directly to the df API in 1.4.
>>>>
>>>> On Thu, Apr 2, 2015 at 7:10 PM, Denny Lee <denny.g....@gmail.com>
>>>> wrote:
>>>>
>>>>> Quick question - the output of a dataframe is in the format of:
>>>>>
>>>>> [2015-04, ArrayBuffer(A, B, C, D)]
>>>>>
>>>>> and I'd like to return it as:
>>>>>
>>>>> 2015-04, A
>>>>> 2015-04, B
>>>>> 2015-04, C
>>>>> 2015-04, D
>>>>>
>>>>> What's the best way to do this?
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>>
>>>>>
>>>>
>>
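Dean's flatMap hack quoted above can also be checked locally. A small sketch, assuming a plain Scala Seq stands in for the RDD (`FlatMapExplode`, `viaFor` and `viaMap` are made-up names): the for comprehension inside the flatMap is rewritten by the compiler into a map over the ArrayBuffer, and both versions yield one (date, element) pair per element.

```scala
import scala.collection.mutable.ArrayBuffer

object FlatMapExplode {
  // Hypothetical local data in the shape from the question: (date, ArrayBuffer of items).
  val pairs: Seq[(String, ArrayBuffer[String])] =
    Seq(("2015-04", ArrayBuffer("A", "B", "C", "D")))

  // Same pattern as the hack: the case pattern destructures each tuple,
  // and the for comprehension yields one (date, x) tuple per element.
  def viaFor(in: Seq[(String, ArrayBuffer[String])]): Seq[(String, String)] =
    in.flatMap { case (date, array) => for (x <- array) yield (date, x) }

  // What `for (x <- array) yield (date, x)` desugars to: array.map(x => (date, x)).
  def viaMap(in: Seq[(String, ArrayBuffer[String])]): Seq[(String, String)] =
    in.flatMap { case (date, array) => array.map(x => (date, x)) }

  def main(args: Array[String]): Unit =
    // Prints "2015-04, A" through "2015-04, D", one line per element.
    viaFor(pairs).foreach { case (date, x) => println(s"$date, $x") }
}
```

flatMap is what flattens the per-row collections into a single sequence of pairs; a plain map would give one collection per input row instead.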
