Thanks Yin!

best,
-Brad

On Thu, Aug 7, 2014 at 1:39 PM, Yin Huai <yh...@databricks.com> wrote:

> Hi Brad,
>
> It is a bug. I have filed https://issues.apache.org/jira/browse/SPARK-2908
> to track it. It will be fixed soon.
>
> Thanks,
>
> Yin
>
>
> On Thu, Aug 7, 2014 at 10:55 AM, Brad Miller <bmill...@eecs.berkeley.edu>
> wrote:
>
>> Hi All,
>>
>> I'm having a bit of trouble with nested data structures in pyspark with
>> saveAsParquetFile. I'm running master (as of yesterday) with this pull
>> request added: https://github.com/apache/spark/pull/1802.
>>
>> *# these all work*
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": null}'])).saveAsParquetFile('/tmp/test0')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": []}'])).saveAsParquetFile('/tmp/test1')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": {"children": null}}'])).saveAsParquetFile('/tmp/test2')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": {"children": []}}'])).saveAsParquetFile('/tmp/test3')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": "foobar"}]}'])).saveAsParquetFile('/tmp/test4')
>>
>> *# this FAILS*
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": null}]}'])).saveAsParquetFile('/tmp/test5')
>> Py4JJavaError: An error occurred while calling o706.saveAsParquetFile.
>> : java.lang.RuntimeException: *Unsupported datatype NullType*
>>
>> *# this FAILS*
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": []}]}'])).saveAsParquetFile('/tmp/test6')
>> Py4JJavaError: An error occurred while calling o719.saveAsParquetFile.
>> : java.lang.RuntimeException: *Unsupported datatype NullType*
>>
>> Based on the documentation and the examples that work, it seems like the
>> failing examples are probably meant to be supported features. I was unable
>> to find an open issue for this. Does anybody know if there is an open
>> issue, or whether an issue should be created?
>>
>> best,
>> -Brad
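
For anyone who wants to check their own JSON for this pattern before calling saveAsParquetFile, here is a rough sketch in plain Python (no Spark needed). It is based only on the seven examples quoted above: both failures have a field inside an array element whose only value is null or []. It is not Spark's schema inference, and the names `_is_untyped` and `untyped_paths_in_arrays` are made up for illustration.

```python
import json


def _is_untyped(value):
    """True for values that carry no concrete type: null or an empty array."""
    return value is None or value == []


def untyped_paths_in_arrays(json_lines):
    """Return sorted paths of fields inside array elements that never hold a typed value.

    Assumption: this mirrors the failing pattern reported in the thread only;
    it does not reproduce how Spark infers a schema.
    """
    seen = {}  # path -> True while only null / [] has been observed for it

    def walk(value, path, in_array):
        if isinstance(value, dict):
            for key, child in value.items():
                child_path = f"{path}.{key}" if path else key
                if in_array:
                    seen[child_path] = seen.get(child_path, True) and _is_untyped(child)
                walk(child, child_path, in_array)
        elif isinstance(value, list):
            # "[]" in the path marks "any element of this array"
            for item in value:
                walk(item, path + "[]", True)

    for line in json_lines:
        walk(json.loads(line), "", False)
    return sorted(p for p, only_untyped in seen.items() if only_untyped)


# The test5 record from the thread is flagged; the test4 record is not.
print(untyped_paths_in_arrays(['{"record": [{"children": null}]}']))      # ['record[].children']
print(untyped_paths_in_arrays(['{"record": [{"children": "foobar"}]}']))  # []
```

A field is flagged only if every record leaves it null or empty, so a dataset that mixes the test4 and test5 records reports nothing.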