Hi Kevin, This should help: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-spark.html
On 29 February 2016 at 16:54, Kevin Mellott <kevin.r.mell...@gmail.com> wrote: > Fellow Sparkers, > > I'm trying to "flatten" my view of data within a DataFrame, and am having > difficulties doing so. The DataFrame contains product information, which > includes multiple levels of categories (primary, secondary, etc). > > *Example Data (Raw):* > *Name Level Category* > Baked Code Food 1 > Baked Code Seafood 2 > Baked Code Fish 3 > Hockey Stick Sports 1 > Hockey Stick Hockey 2 > Hockey Stick Equipment 3 > > *Desired Data:* > *Name Category1 Category2 Category3* > Baked Code Food Seafood Fish > Hockey Stick Sports Hockey Equipment > > *Approach:* > After parsing the "raw" information into two separate DataFrames (called > *products > *and *categories*) and registering them as a Spark SQL tables, I was > attempting to perform the following query to flatten this all into the > "desired data" (depicted above). > > products.registerTempTable("products") > categories.registerTempTable("categories") > > val productList = sqlContext.sql( > " SELECT p.Name, " + > " c1.Description AS Category1, " + > " c2.Description AS Category2, " + > " c3.Description AS Category3 " + > " FROM products AS p " + > " JOIN categories AS c1 " + > " ON c1.Name = p.Name AND c1.Level = '1' " > " JOIN categories AS c2 " + > " ON c2.Name = p.Name AND c2.Level = '2' " > " JOIN categories AS c3 " + > " ON c3.Name = p.Name AND c3.Level = '3' " > > *Issue:* > I get an error when running my query above, because I am not able to JOIN > the *categories* table more than once. Has anybody dealt with this type > of use case before, and if so how did you achieve the desired behavior? > > Thank you in advance for your thoughts. > > Kevin >