Chao Sun created SPARK-35461:
--------------------------------

             Summary: Error when reading dictionary-encoded Parquet int column when read schema is bigint
                 Key: SPARK-35461
                 URL: https://issues.apache.org/jira/browse/SPARK-35461
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1, 3.0.2
            Reporter: Chao Sun
When reading a dictionary-encoded integer column from a Parquet file with a user-specified read schema of bigint, Spark currently fails with the following exception:

{code}
java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
	at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
{code}

To reproduce:

{code}
val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, i.toString))
withParquetFile(data) { path =>
  val readSchema = StructType(Seq(StructField("_1", LongType)))
  spark.read.schema(readSchema).parquet(path).first()
}
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
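Until this is fixed, one possible workaround (a sketch, not part of this report; it assumes the file's physical type for the `_1` column is int, as in the repro above, and that `path` points at that Parquet file) is to let Spark read the column with its native int type and widen it to bigint with an explicit cast afterwards:

{code}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// Read without a schema override so the dictionary is decoded as
// integers, then widen the column to bigint with an explicit cast.
// The cast runs after decoding, so decodeToLong is never called on
// the integer dictionary.
val df = spark.read.parquet(path)
  .select(col("_1").cast(LongType), col("_2"))
{code}

This trades the schema-on-read convenience for an extra projection, but avoids the `decodeToLong` call on the `PlainIntegerDictionary` that triggers the exception.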