Re: trouble with saveAsParquetFile

2014-08-07 Thread Yin Huai
The PR is https://github.com/apache/spark/pull/1840.
On Thu, Aug 7, 2014 at 1:48 PM, Yin Huai wrote:
> Actually, the issue is if values of a field are always null (or this field
> is missing), we cannot figure out the data type. So, we use NullType (it is
> an internal data type). Right now, we

Re: trouble with saveAsParquetFile

2014-08-07 Thread Yin Huai
Actually, the issue is if values of a field are always null (or this field is missing), we cannot figure out the data type. So, we use NullType (it is an internal data type). Right now, we have a step to convert the data type from NullType to StringType. This logic in the master has a bug. We will
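To illustrate the behaviour described above, here is a minimal toy sketch in plain Python. It is not Spark's implementation: the function names `infer_field_types` and `replace_null_types` and the string type tags are hypothetical, chosen only to mirror the names used in the message. It shows a field whose values are always null ending up with a placeholder NullType, followed by a separate step that rewrites NullType to StringType.

```python
import json

# Hypothetical type tags for this sketch only; they echo the names used in
# the thread but are not Spark's own classes.
NULL_TYPE = "NullType"
STRING_TYPE = "StringType"


def infer_field_types(json_lines):
    """Infer one type tag per top-level field across all JSON records."""
    types = {}
    for line in json_lines:
        record = json.loads(line)
        for name, value in record.items():
            if value is None:
                # A null carries no type information, so only record the
                # placeholder if nothing is known about this field yet.
                types.setdefault(name, NULL_TYPE)
            else:
                # Simplification: the most recent non-null value wins;
                # conflicting types are not reconciled in this sketch.
                types[name] = type(value).__name__
    return types


def replace_null_types(types):
    """Rewrite fields that stayed NullType to StringType before writing."""
    return {
        name: (STRING_TYPE if tag == NULL_TYPE else tag)
        for name, tag in types.items()
    }


if __name__ == "__main__":
    rows = ['{"record": null, "id": 1}', '{"record": null, "id": 2}']
    inferred = infer_field_types(rows)
    print(inferred)
    print(replace_null_types(inferred))
```

In this sketch a field absent from every record never appears in the result at all, so only the always-null case is modelled.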

Re: trouble with saveAsParquetFile

2014-08-07 Thread Brad Miller
Thanks Yin!
best,
-Brad
On Thu, Aug 7, 2014 at 1:39 PM, Yin Huai wrote:
> Hi Brad,
>
> It is a bug. I have filed https://issues.apache.org/jira/browse/SPARK-2908
> to track it. It will be fixed soon.
>
> Thanks,
>
> Yin
>
> On Thu, Aug 7, 2014 at 10:55 AM, Brad Miller wrote:
>
>> Hi All,

Re: trouble with saveAsParquetFile

2014-08-07 Thread Yin Huai
Hi Brad,
It is a bug. I have filed https://issues.apache.org/jira/browse/SPARK-2908 to track it. It will be fixed soon.
Thanks,
Yin
On Thu, Aug 7, 2014 at 10:55 AM, Brad Miller wrote:
> Hi All,
>
> I'm having a bit of trouble with nested data structures in pyspark with
> saveAsParquetFile.

trouble with saveAsParquetFile

2014-08-07 Thread Brad Miller
Hi All,
I'm having a bit of trouble with nested data structures in pyspark with saveAsParquetFile. I'm running master (as of yesterday) with this pull request added: https://github.com/apache/spark/pull/1802.
*# these all work*
> sqlCtx.jsonRDD(sc.parallelize(['{"record": null}'])).saveAsParquet