Hi Nathan,
Thanks a lot for the detailed report, especially the information about
nonconsecutive part numbers. It's confirmed to be a race condition bug,
and I just filed https://issues.apache.org/jira/browse/SPARK-8406 to track
this. I will deliver a fix ASAP, and it will be included in 1.4.1.
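For readers hitting the same symptom, the nonconsecutive part numbers mentioned above can be spotted with a quick check of the output directory. Below is a minimal sketch in plain Python; the `missing_part_numbers` helper and the `part-r-00000`-style file naming are illustrative assumptions based on Spark's default output layout, so adjust the pattern to match your actual files:

```python
import os
import re

def missing_part_numbers(output_dir):
    """Return the part numbers missing from a Spark output directory.

    Spark numbers the output files of a successful job consecutively
    from zero (e.g. part-r-00000..., part-r-00001...), so a gap in the
    sequence suggests a task's output file was silently dropped.
    """
    numbers = []
    for name in os.listdir(output_dir):
        # Matches names like part-00002 or part-r-00002.gz.parquet;
        # ignores _SUCCESS and metadata files.
        m = re.match(r"part-r?-?(\d+)", name)
        if m:
            numbers.append(int(m.group(1)))
    if not numbers:
        return []
    expected = set(range(max(numbers) + 1))
    return sorted(expected - set(numbers))
```

For example, a directory containing only `part-r-00000` and `part-r-00002` would report `[1]` as missing.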
Subject: Re: Spark 1.4 DataFrame Parquet file writing - missing random
rows/partitions
Hi all,
Looks like DataFrame Parquet writing is very broken in Spark 1.4.0. We had no
problems with Spark 1.3.
When trying to save a data frame with 569610608 rows:
dfc.write.format("parquet").save("/data/map_parquet_file")
We get random results between runs. Caching the data frame in memory