[ 
https://issues.apache.org/jira/browse/SPARK-48234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48234:
-----------------------------------
    Labels: pull-request-available  (was: )

> Invalid previous reader checks in Vectorized DELTA_BYTE_ARRAY parquet decoder
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-48234
>                 URL: https://issues.apache.org/jira/browse/SPARK-48234
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.5.1, 3.3.4, 3.4.3
>            Reporter: Yulia Tsareva
>            Priority: Major
>              Labels: pull-request-available
>
> The vectorized DELTA_BYTE_ARRAY Parquet decoder can cause read failures on 
> columns whose pages use different encodings, when some of those pages are 
> encoded using DELTA_BYTE_ARRAY.
> The same bug existed in the parquet-mr reader but was fixed 3 months ago. 
> There is no separate bug-fix commit; it was silently fixed along with other 
> changes:
> https://github.com/apache/parquet-mr/blob/c241170d9bc2cd8415b04e06ecea40ed3d80f64d/parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnReaderBase.java#L732
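
The ticket does not spell out the failing check, so the following is only a minimal sketch of the general shape such a bug can take. It assumes the decoder for a DELTA_BYTE_ARRAY page inherits its "last decoded value" from the previous page's reader, and that the failure arises when that previous reader belongs to a different encoding. All class and method names here (PreviousReaderSketch, PageReader, PlainPageReader, DeltaPageReader, seedFrom, startPage) are hypothetical stand-ins, not Spark's or parquet-mr's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-page readers; NOT the real Spark/parquet-mr classes.
final class PreviousReaderSketch {

    interface PageReader {
        List<String> readAll();
    }

    /** A page whose values are stored whole (stand-in for a non-delta encoding). */
    static final class PlainPageReader implements PageReader {
        private final List<String> values;

        PlainPageReader(List<String> values) {
            this.values = values;
        }

        @Override
        public List<String> readAll() {
            return values;
        }
    }

    /** A page storing (shared-prefix length, suffix) pairs, as DELTA_BYTE_ARRAY does. */
    static final class DeltaPageReader implements PageReader {
        private final int[] prefixLengths;
        private final String[] suffixes;
        private String previous = ""; // last decoded value; source of the next prefix

        DeltaPageReader(int[] prefixLengths, String[] suffixes) {
            this.prefixLengths = prefixLengths;
            this.suffixes = suffixes;
        }

        /** Assumed behaviour: inherit the last decoded value from an earlier delta page. */
        void seedFrom(DeltaPageReader earlier) {
            this.previous = earlier.previous;
        }

        @Override
        public List<String> readAll() {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < suffixes.length; i++) {
                String value = previous.substring(0, prefixLengths[i]) + suffixes[i];
                out.add(value);
                previous = value;
            }
            return out;
        }
    }

    /**
     * Starts reading a new page. State is carried over only when BOTH pages are
     * delta-encoded; an unchecked cast of previousPage would throw a
     * ClassCastException whenever the encodings differ.
     */
    static PageReader startPage(PageReader previousPage, PageReader newPage) {
        if (newPage instanceof DeltaPageReader && previousPage instanceof DeltaPageReader) {
            ((DeltaPageReader) newPage).seedFrom((DeltaPageReader) previousPage);
        }
        return newPage;
    }
}
```

In this sketch, calling startPage with a PlainPageReader followed by a DeltaPageReader simply skips the hand-over, which is the kind of type check the issue title suggests was missing or incorrect.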



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
