OK, I've merged this PR to master and branch-2.0.
On 8/11/16 8:27 AM, Cheng Lian wrote:
Haven't figured out exactly how it failed yet, but the leading
underscore in the partition directory name looks suspicious. Could you
please try this PR to see whether it fixes the issue:
https://github.com/apache/spark/pull/14585/files
Cheng
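(Editor's note: for readers hitting the same error who cannot apply the PR, a hypothetical workaround, not proposed in the thread itself, is to rename the partition directories so they no longer start with an underscore, e.g. "_locality_code=AD" becomes "locality_code=AD". Note this also changes the inferred partition column name. A minimal sketch:)

```python
# Sketch of a workaround: strip leading underscores from partition
# directory names so Spark's file listing no longer treats them as hidden.
# The function name and approach are illustrative, not from the thread.
import os
import tempfile

def strip_leading_underscore(data_dir):
    """Rename partition subdirectories of data_dir that start with '_'.

    Only touches names that look like partition directories (contain '=').
    Returns the sorted list of new directory names.
    """
    renamed = []
    for name in os.listdir(data_dir):
        if name.startswith("_") and "=" in name:
            new_name = name.lstrip("_")
            os.rename(os.path.join(data_dir, name),
                      os.path.join(data_dir, new_name))
            renamed.append(new_name)
    return sorted(renamed)

# Demo on a throwaway directory:
root = tempfile.mkdtemp()
for code in ("AD", "AE"):
    os.makedirs(os.path.join(root, "_locality_code=%s" % code))
print(strip_leading_underscore(root))
```

After the rename, `spark.read.parquet("/path/to/data")` would discover the partitions under the new `locality_code` column name.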
On 8/9/16 5:38 PM, immerrr again wrote:
Another follow-up: I have narrowed it down to the first 32 partitions,
but from that point it gets strange.
Here's the error:
In [68]: spark.read.parquet(*subdirs[:32])
...
AnalysisException: u'Unable to infer schema for ParquetFormat at
/path/to/data/_locality_code=AQ,/path/to/data/_locality_cod
Some follow-up information:
- dataset size is ~150G
- the data is partitioned by one of the columns, _locality_code:
$ ls -1
_locality_code=AD
_locality_code=AE
_locality_code=AF
_locality_code=AG
_locality_code=AI
_locality_code=AL
_locality_code=AM
_locality_code=AN
_locality_code=YE
_loca
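(Editor's note: the underscore is suspicious because, when listing data files, Spark skips paths it considers hidden or metadata, conventionally names starting with "_" or ".". A simplified sketch of that filtering rule, written for illustration and not taken from Spark's source, shows why a partition directory named `_locality_code=AD` could be dropped entirely, leaving no files from which to infer a schema:)

```python
# Simplified, illustrative version of a "hidden path" filter: names starting
# with "_" or "." are skipped, except the Parquet summary files that Spark
# explicitly keeps. This is a sketch of the suspected behavior, not Spark code.
def is_filtered_out(name):
    """Return True if a Spark-style listing would skip this name."""
    if name in ("_metadata", "_common_metadata"):
        return False  # Parquet summary files are kept
    return name.startswith("_") or name.startswith(".")

# Under this rule, the partition directory "_locality_code=AD" is skipped,
# while an ordinary data file like "part-00000.parquet" is kept.
```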
Hi everyone,
I tried upgrading Spark-1.6.2 to Spark-2.0.0 but ran into an issue
reading the existing data. Here's how the traceback looks in
spark-shell:
scala> spark.read.parquet("/path/to/data")
org.apache.spark.sql.AnalysisException: Unable to infer schema for
ParquetFormat at /path/to/data. It