Chao Sun created SPARK-35461:
--------------------------------

             Summary: Error when reading dictionary-encoded Parquet int column when read schema is bigint
                 Key: SPARK-35461
                 URL: https://issues.apache.org/jira/browse/SPARK-35461
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1, 3.0.2
            Reporter: Chao Sun
When reading a dictionary-encoded integer column from a Parquet file with a user-specified read schema of bigint, Spark currently fails with the following exception:

{code}
java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
	at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
{code}

To reproduce:

{code}
val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, i.toString))
withParquetFile(data) { path =>
  val readSchema = StructType(Seq(StructField("_1", LongType)))
  spark.read.schema(readSchema).parquet(path).first()
}
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
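Until this is fixed, one possible workaround (a sketch, not part of this report; it assumes the file's physical type for the `_1` column is int, as in the repro above, and that `path` points at that Parquet file) is to let Spark read the column with its native int type and widen it to bigint with an explicit cast afterwards:

{code}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// Read without a schema override so the dictionary is decoded as
// integers, then widen the column to bigint with an explicit cast.
// The cast runs after decoding, so decodeToLong is never called on
// the integer dictionary.
val df = spark.read.parquet(path)
  .select(col("_1").cast(LongType), col("_2"))
{code}

This trades the schema-on-read convenience for an extra projection, but avoids the `decodeToLong` call on the `PlainIntegerDictionary` that triggers the exception.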