Re: Spark 1.4 DataFrame Parquet file writing - missing random rows/partitions

2015-06-17 Thread Cheng Lian
Hi Nathan, Thanks a lot for the detailed report, especially the information about the nonconsecutive part numbers. It's confirmed to be a race condition bug, and I've just filed https://issues.apache.org/jira/browse/SPARK-8406 to track it. Will deliver a fix ASAP; it will be included in 1.4.1.
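
[Editor's note: a minimal sketch, not from the thread, of one way to spot the nonconsecutive part numbers Cheng mentions. It assumes a Spark 1.4 shell with `sc` in scope and uses the output path from Nathan's report below.]

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Output directory from the report below; adjust as needed.
    val dir = new Path("/data/map_parquet_file")
    val fs  = FileSystem.get(sc.hadoopConfiguration)

    // Part files are typically named like part-r-00042.gz.parquet;
    // pull out the numeric index of each one.
    val partNumbers = fs.listStatus(dir)
      .map(_.getPath.getName)
      .filter(_.startsWith("part-"))
      .flatMap(name => """\d+""".r.findFirstIn(name))
      .map(_.toInt)
      .sorted

    // Gaps in the 0..max sequence suggest output from some tasks never landed.
    if (partNumbers.nonEmpty) {
      val missing = (0 to partNumbers.max).filterNot(partNumbers.toSet)
      if (missing.nonEmpty)
        println(s"Missing part numbers: ${missing.mkString(", ")}")
    }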

Spark 1.4 DataFrame Parquet file writing - missing random rows/partitions

2015-06-16 Thread Nathan McCarthy
Hi all, Looks like data frame Parquet writing is very broken in Spark 1.4.0. We had no problems with Spark 1.3. When trying to save a data frame with 569,610,608 rows:

    dfc.write.format("parquet").save("/data/map_parquet_file")

we get random results between runs. Caching the data frame in memory ...
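
[Editor's note: a minimal sketch, not from the thread, of the kind of check the reported symptom suggests: write, read back, and compare row counts. `dfc` and `sqlContext` are assumed to be in scope, as in the report above, and the path is the one from the report.]

    // Write the data frame, then re-read it and compare row counts.
    val outputPath = "/data/map_parquet_file"
    dfc.write.format("parquet").save(outputPath)

    val expected = dfc.count()                             // 569610608 in the report
    val written  = sqlContext.read.parquet(outputPath).count()

    if (written != expected)
      println(s"Missing rows: expected $expected but found $written")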