Re: Nested DataFrame(SchemaRDD)

2015-06-24 Thread Richard Catlin
Michael, I have two DataFrames: a "users" DF and an "investments" DF. The "investments" DF has a column that matches the "users" id. I would like to nest the collection of investments for each user and save the result to a Parquet file. Is there a straightforward way to do this? Thanks. Richard Catlin
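One way the nesting described above might be approached is sketched below. It is not taken from the thread: it assumes a Spark 2.x or later API (SparkSession, and collect_list over a struct column), and the column names (id, name, userId, symbol, amount), the sample values, and the output path are all hypothetical, since the original post does not give the real schemas.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Hypothetical record shapes; the thread does not show the real columns.
case class User(id: Long, name: String)
case class Investment(userId: Long, symbol: String, amount: Double)

object NestInvestments {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nest-investments")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val users = Seq(User(1L, "Richard"), User(2L, "Michael")).toDF()
    val investments = Seq(
      Investment(1L, "AAA", 100.0),
      Investment(1L, "BBB", 250.0),
      Investment(2L, "CCC", 75.0)
    ).toDF()

    // Collapse each user's investments into a single array-of-struct column.
    val nestedInvestments = investments
      .groupBy(col("userId"))
      .agg(collect_list(struct(col("symbol"), col("amount"))).as("investments"))

    // A left outer join keeps users without investments (their array is null).
    val nested = users
      .join(nestedInvestments, users("id") === nestedInvestments("userId"), "left_outer")
      .drop("userId")

    nested.printSchema()

    // Hypothetical output path; Parquet stores the array of structs as a nested group.
    nested.write.mode("overwrite").parquet("/tmp/users_with_investments.parquet")

    spark.stop()
  }
}
```

On Spark releases contemporary with this thread, collect_list over structs may not be available, in which case grouping the underlying RDDs by user id and rebuilding the DataFrame would be an alternative.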

Re: Nested DataFrame(SchemaRDD)

2015-06-23 Thread Michael Armbrust
You can also do this using a sequence of case classes (in the example stored in a tuple, though the outer container could also be a case class):
case class MyRecord(name: String, location: String)
val df = Seq((1, Seq(MyRecord("Michael", "Berkeley"), MyRecord("Andy", "Oakland")))).toDF("id", "peop

Re: Nested DataFrame(SchemaRDD)

2015-06-23 Thread Roberto Congiu
I wrote a brief howto on building nested records in Spark and storing them in Parquet here: http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
2015-06-23 16:12 GMT-07:00 Richard Catlin:
> How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a
> column? Is there an

RE: Nested DataFrame(SchemaRDD)

2015-06-23 Thread Richard Catlin
How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a column? Is there an example? Will this store as a nested parquet file? Thanks. Richard Catlin
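For the question as literally asked (an array of Rows in a column), a minimal sketch with an explicit schema might look like the following. It assumes a Spark 2.x or later SparkSession; the field names (id, contacts, name, location) and the output path are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructField, StructType}

object NestedRows {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-rows")
      .master("local[*]")
      .getOrCreate()

    // Schema of each nested element (hypothetical field names).
    val contactType = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("location", StringType, nullable = true)
    ))

    // Outer schema: an id plus an array of nested structs.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("contacts", ArrayType(contactType, containsNull = false), nullable = true)
    ))

    // Each outer Row carries a Seq of inner Rows matching contactType.
    val rows = Seq(
      Row(1, Seq(Row("Michael", "Berkeley"), Row("Andy", "Oakland")))
    )

    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.printSchema()

    // Hypothetical output path; the array of structs is written as a nested Parquet group.
    df.write.mode("overwrite").parquet("/tmp/nested_contacts.parquet")

    spark.stop()
  }
}
```

Building the schema by hand is more verbose than using case classes, but it works when the nested structure is only known at runtime.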