Re: PySpark DataFrame: Preserving nesting when selecting a nested field

2015-05-11 Thread Reynold Xin
In 1.4, you can use the "struct" function to create a struct; e.g., you can explicitly select out the "version" column and then create a new struct named "settings". The current semantics of select closely follow those of SQL in relational databases, which are well understood and well defined. I wouldn't a

PySpark DataFrame: Preserving nesting when selecting a nested field

2015-05-09 Thread Nicholas Chammas
Take a look:

>>> df = sqlContext.jsonRDD(sc.parallelize(['{"settings": {"os": "OS X", "version": "10.10"}}']))
>>> df.printSchema()
root
 |-- settings: struct (nullable = true)
 |    |-- os: string (nullable = true)
 |    |-- version: string (nullable = true)

>>> # Now I want to "drop" the ver