Hi, devs,

I ran into a problem when using Spark to read two Parquet files with two
different versions of the same schema. For example, the first file has one
field with "int" type, while the same field in the second file is a "long".
I thought Spark would automatically generate a merged schema (widening the
field to "long") and use that schema to process both files. However, the
following code does not work:

// Spark 1.3 Java API: load every Parquet file under inputPath
DataFrame df = sqlContext.parquetFile(inputPath);
df.registerTempTable("data");
sqlContext.sql("select count(msg.actual_eta) from data").collect();


Exception:
parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file f1.parquet
  at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
  at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
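
In case it helps to reproduce, this is roughly how the two versions of the
files get written (a sketch, not the real job: I flattened msg.actual_eta
to a top-level field for brevity, the paths and values are made up, and
jsc is a JavaSparkContext):

import java.util.Arrays;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// schema v1: the field is an int
StructType v1 = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("actual_eta", DataTypes.IntegerType, true)));
// schema v2: the same field widened to long
StructType v2 = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("actual_eta", DataTypes.LongType, true)));

// first file written with the int schema
sqlContext.createDataFrame(
    jsc.parallelize(Arrays.asList(RowFactory.create(42))), v1)
    .saveAsParquetFile(inputPath + "/f1.parquet");
// second file written with the long schema
sqlContext.createDataFrame(
    jsc.parallelize(Arrays.asList(RowFactory.create(42L))), v2)
    .saveAsParquetFile(inputPath + "/f2.parquet");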


BTW, I am using Spark 1.3.1 and have already set
"spark.sql.parquet.useDataSourceApi" to false.

Any help would be appreciated.

-Wei
