Hi devs,

I ran into a problem when using Spark to read two parquet files written with two different versions of the same schema. For example, the first file stores one field as an int, while the same field in the second file is a long. I expected Spark to automatically merge the schemas, widening that field to long, and to use the merged schema to process both files.
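To make the setup concrete, here is a minimal sketch of how two such files could be produced (the nested msg.actual_eta field matches my query below; jsc, sqlContext, and inputPath are assumed to be in scope, and the paths are placeholders):

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Version 1 of the schema: msg.actual_eta is an int.
StructType v1 = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("msg", DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("actual_eta", DataTypes.IntegerType, true))), true)));
// Version 2: the same field widened to a long.
StructType v2 = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("msg", DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("actual_eta", DataTypes.LongType, true))), true)));

JavaRDD<Row> rows1 = jsc.parallelize(Arrays.asList(RowFactory.create(RowFactory.create(1234))));
JavaRDD<Row> rows2 = jsc.parallelize(Arrays.asList(RowFactory.create(RowFactory.create(1234L))));

// Write one file per schema version under the same input directory.
sqlContext.createDataFrame(rows1, v1).saveAsParquetFile(inputPath + "/f1.parquet");
sqlContext.createDataFrame(rows2, v2).saveAsParquetFile(inputPath + "/f2.parquet");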
However, the following code does not work:

DataFrame df = sqlContext.parquetFile(inputPath);
df.registerTempTable("data");
sqlContext.sql("select count(msg.actual_eta) from data").collect();

It fails with:

parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file f1.parquet
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
    at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)

BTW, I am using Spark 1.3.1 and have already set "spark.sql.parquet.useDataSourceApi" to false.
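A possible workaround might be to load each file separately, cast the conflicting field to long explicitly, and union the results. A rough sketch under the same assumptions as above (file paths are placeholders, and I have only tried this on toy data):

// Read each schema version on its own, so each file is decoded with its own footer schema.
DataFrame df1 = sqlContext.parquetFile(inputPath + "/f1.parquet")
    .selectExpr("cast(msg.actual_eta as bigint) as actual_eta");
DataFrame df2 = sqlContext.parquetFile(inputPath + "/f2.parquet")
    .selectExpr("cast(msg.actual_eta as bigint) as actual_eta");
// Union the widened columns and query as before.
DataFrame merged = df1.unionAll(df2);
merged.registerTempTable("data");
sqlContext.sql("select count(actual_eta) from data").collect();

This gets clumsy with many files, though, so I would still like the automatic schema merging to handle it. Any help would be appreciated.

-Wei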