The PR is https://github.com/apache/spark/pull/1840.
On Thu, Aug 7, 2014 at 1:48 PM, Yin Huai wrote:
Actually, the issue is that if the values of a field are always null (or the
field is missing), we cannot figure out the data type. So, we use NullType (it
is an internal data type). Right now, we have a step that converts the data
type from NullType to StringType. This logic in master has a bug. We will fix
it soon.
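A minimal sketch of the inference behavior described above, assuming a pyspark
1.x shell with an existing SparkContext `sc` (this snippet is illustrative and
not part of the original messages):

from pyspark.sql import SQLContext

sqlCtx = SQLContext(sc)

# "record" is null in every row, so jsonRDD cannot infer a concrete type;
# the field is internally given NullType, and a later step converts
# NullType to StringType (the step whose logic in master had the bug).
srdd = sqlCtx.jsonRDD(sc.parallelize(['{"record": null}']))

# Once that conversion works, the always-null field should surface as a
# nullable string in the inferred schema.
srdd.printSchema()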
Thanks Yin!
best,
-Brad
On Thu, Aug 7, 2014 at 1:39 PM, Yin Huai wrote:
Hi Brad,
It is a bug. I have filed https://issues.apache.org/jira/browse/SPARK-2908
to track it. It will be fixed soon.
Thanks,
Yin
On Thu, Aug 7, 2014 at 10:55 AM, Brad Miller wrote:
Hi All,
I'm having a bit of trouble saving nested data structures from pyspark with
saveAsParquetFile. I'm running master (as of yesterday) with this pull
request added: https://github.com/apache/spark/pull/1802.
*# these all work*
> sqlCtx.jsonRDD(sc.parallelize(['{"record": null}'])).saveAsParquetFile(...)
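Brad's remaining examples are cut off in the archive. As a purely hypothetical
illustration of the failing case Yin's explanation covers (a nested field that
is null in every row, so no type can be inferred), something like the line
below would exercise it; the JSON payload and output path are made up, not
Brad's originals:

*# hypothetical: "value" is null in every row of the nested record,*
*# so its type falls back to NullType during schema inference*
> sqlCtx.jsonRDD(sc.parallelize(['{"record": {"value": null}}'])).saveAsParquetFile('/tmp/nested_null.parquet')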