Thanks Yin!

best,
-Brad


On Thu, Aug 7, 2014 at 1:39 PM, Yin Huai <yh...@databricks.com> wrote:

> Hi Brad,
>
> It is a bug. I have filed https://issues.apache.org/jira/browse/SPARK-2908
> to track it. It will be fixed soon.
>
> Thanks,
>
> Yin
>
>
> On Thu, Aug 7, 2014 at 10:55 AM, Brad Miller <bmill...@eecs.berkeley.edu> wrote:
>
>> Hi All,
>>
>> I'm having a bit of trouble saving nested data structures with
>> saveAsParquetFile in PySpark.  I'm running master (as of yesterday) with
>> this pull request applied: https://github.com/apache/spark/pull/1802.
>>
>> # these all work
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": null}'])).saveAsParquetFile('/tmp/test0')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": []}'])).saveAsParquetFile('/tmp/test1')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": {"children": null}}'])).saveAsParquetFile('/tmp/test2')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": {"children": []}}'])).saveAsParquetFile('/tmp/test3')
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": "foobar"}]}'])).saveAsParquetFile('/tmp/test4')
>>
>> # this FAILS
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": null}]}'])).saveAsParquetFile('/tmp/test5')
>> Py4JJavaError: An error occurred while calling o706.saveAsParquetFile.
>> : java.lang.RuntimeException: Unsupported datatype NullType
>>
>> # this FAILS
>> > sqlCtx.jsonRDD(sc.parallelize(['{"record": [{"children": []}]}'])).saveAsParquetFile('/tmp/test6')
>> Py4JJavaError: An error occurred while calling o719.saveAsParquetFile.
>> : java.lang.RuntimeException: Unsupported datatype NullType
>>
>> Based on the documentation and the examples that work, it seems likely that
>> the failing cases are meant to be supported.  I was unable to find an open
>> issue for this.  Does anybody know whether there is an open issue, or
>> whether one should be created?
>>
>> best,
>> -Brad
>>
>
>
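
A toy, pure-Python model may help show why only the two array-of-struct cases (test5 and test6) fail while the other five succeed. It rests on an assumption not confirmed in this thread: that JSON schema inference records a null value (or the element type of an empty list) as a NullType placeholder, and that a later pass replaces that placeholder with a string type in struct fields and in direct array elements, but does not descend into structs nested inside arrays. The type encoding and the functions `infer`, `null_to_string_shallow`, `null_to_string_deep`, and `contains_null` are hypothetical illustrations, not Spark APIs or Spark's actual code.

```python
import json

# Hypothetical toy type encoding (NOT Spark's): "null" stands in for NullType,
# "string" for StringType, ("array", elem) and ("struct", {name: type}).

def infer(value):
    """Infer a toy type for a parsed JSON value (first list element only)."""
    if value is None:
        return "null"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # An empty list gives no evidence about its element type.
        return ("array", infer(value[0]) if value else "null")
    if isinstance(value, dict):
        return ("struct", {k: infer(v) for k, v in value.items()})
    raise TypeError(f"unsupported value in this toy model: {value!r}")

def null_to_string_shallow(t):
    """Replace 'null' in struct fields and direct array elements only;
    a struct nested inside an array is left untouched (the assumed gap)."""
    if t == "null":
        return "string"
    if isinstance(t, tuple) and t[0] == "struct":
        return ("struct", {k: null_to_string_shallow(v) for k, v in t[1].items()})
    if isinstance(t, tuple) and t[0] == "array":
        return ("array", "string" if t[1] == "null" else t[1])
    return t

def null_to_string_deep(t):
    """Replace 'null' at every depth, including inside arrays of structs."""
    if t == "null":
        return "string"
    if isinstance(t, tuple) and t[0] == "struct":
        return ("struct", {k: null_to_string_deep(v) for k, v in t[1].items()})
    if isinstance(t, tuple) and t[0] == "array":
        return ("array", null_to_string_deep(t[1]))
    return t

def contains_null(t):
    """True if a 'null' placeholder survives anywhere in the type."""
    if t == "null":
        return True
    if isinstance(t, tuple) and t[0] == "struct":
        return any(contains_null(v) for v in t[1].values())
    if isinstance(t, tuple) and t[0] == "array":
        return contains_null(t[1])
    return False

# The seven records from the report above.
RECORDS = {
    "test0": '{"record": null}',
    "test1": '{"record": []}',
    "test2": '{"record": {"children": null}}',
    "test3": '{"record": {"children": []}}',
    "test4": '{"record": [{"children": "foobar"}]}',
    "test5": '{"record": [{"children": null}]}',
    "test6": '{"record": [{"children": []}]}',
}

if __name__ == "__main__":
    for name, text in RECORDS.items():
        t = infer(json.loads(text))
        print(name,
              "shallow leaves null:", contains_null(null_to_string_shallow(t)),
              "deep leaves null:", contains_null(null_to_string_deep(t)))
```

In this model the shallow pass leaves a placeholder only for test5 and test6, matching the two failures reported, while the recursive pass removes it everywhere. This is a sketch of one explanation consistent with the observed pattern; the actual cause and fix are tracked in SPARK-2908.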
