There is already an explode function on DataFrame, btw:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L712

I think something like this would work. You might need to play with the type.

    df.explode("arrayBufferColumn") { x => x }

On Fri, Apr 3, 2015 at 6:43 AM, Denny Lee <denny.g....@gmail.com> wrote:

> Thanks Dean - fun hack :)
>
> On Fri, Apr 3, 2015 at 6:11 AM Dean Wampler <deanwamp...@gmail.com> wrote:
>
>> A hack workaround is to use flatMap:
>>
>> rdd.flatMap{ case (date, array) => for (x <- array) yield (date, x) }
>>
>> For those of you who don't know Scala, the for comprehension iterates
>> through the ArrayBuffer, named "array", and yields new tuples with the
>> date and each element. The case expression to the left of the => pattern
>> matches on the input tuples.
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition
>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>> Typesafe <http://typesafe.com>
>> @deanwampler <http://twitter.com/deanwampler>
>> http://polyglotprogramming.com
>>
>> On Thu, Apr 2, 2015 at 10:45 PM, Denny Lee <denny.g....@gmail.com> wrote:
>>
>>> Thanks Michael - that was it! I was drawing a blank on this one for
>>> some reason - much appreciated!
>>>
>>> On Thu, Apr 2, 2015 at 8:27 PM Michael Armbrust <mich...@databricks.com>
>>> wrote:
>>>
>>>> A lateral view explode using HiveQL. I'm hoping to add explode
>>>> shorthand directly to the df API in 1.4.
>>>>
>>>> On Thu, Apr 2, 2015 at 7:10 PM, Denny Lee <denny.g....@gmail.com>
>>>> wrote:
>>>>
>>>>> Quick question - the output of a dataframe is in the format of:
>>>>>
>>>>> [2015-04, ArrayBuffer(A, B, C, D)]
>>>>>
>>>>> and I'd like to return it as:
>>>>>
>>>>> 2015-04, A
>>>>> 2015-04, B
>>>>> 2015-04, C
>>>>> 2015-04, D
>>>>>
>>>>> What's the best way to do this?
>>>>>
>>>>> Thanks in advance!
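
For anyone who wants to try the flatMap workaround quoted above without a cluster, here is a minimal sketch on plain Scala collections. The names `ExplodeSketch` and `explodePairs` are made up for illustration, and the sample row is the one from the original question. It assumes each row is a (String, ArrayBuffer[String]) tuple; on an RDD of the same tuple type the flatMap body should carry over unchanged, but I have not run it against Spark here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper object, not part of Spark.
object ExplodeSketch {
  // Turn each (date, values) pair into one (date, value) pair per element.
  def explodePairs(rows: Seq[(String, ArrayBuffer[String])]): Seq[(String, String)] =
    rows.flatMap { case (date, array) => for (x <- array) yield (date, x) }

  def main(args: Array[String]): Unit = {
    // Sample row taken from the question in this thread.
    val rows = Seq(("2015-04", ArrayBuffer("A", "B", "C", "D")))
    // Print one "date, element" line per exploded pair.
    explodePairs(rows).foreach { case (date, x) => println(s"$date, $x") }
  }
}
```

The pattern match pulls each tuple apart, and flatMap concatenates the per-row results into a single flat sequence, so a row whose array is empty simply contributes no output rows.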