Chao Sun created SPARK-36527:
--------------------------------

             Summary: Implement lazy materialization for the vectorized Parquet reader
                 Key: SPARK-36527
                 URL: https://issues.apache.org/jira/browse/SPARK-36527
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Chao Sun

At the moment, the vectorized Parquet reader eagerly decodes every column in the read schema before any filter has been applied. This is costly, because values in rows that a filter later discards are decoded for nothing. It would be better to materialize these column vectors only when their data are actually read.
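
To make the idea concrete, here is a minimal sketch of lazy materialization in plain Python. It is not Spark's implementation and does not use any Spark or Parquet API: `LazyColumnVector`, `read_batch`, and `decode_count` are hypothetical names introduced only for illustration, and the "decoding" is simulated by a callback that returns a list. The sketch assumes the filter reads a single column, which is decoded first; the remaining columns are decoded only if at least one row survives.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class LazyColumnVector:
    """Hypothetical column vector that decodes its values on first access."""

    def __init__(self, decode: Callable[[], List[int]]) -> None:
        self._decode = decode                      # deferred decoding work
        self._values: Optional[List[int]] = None   # cache, filled on demand
        self.decode_count = 0                      # times decoding actually ran

    def values(self) -> List[int]:
        if self._values is None:                   # materialize once, on demand
            self.decode_count += 1
            self._values = self._decode()
        return self._values


def read_batch(
    filter_col: LazyColumnVector,
    other_cols: Sequence[LazyColumnVector],
    predicate: Callable[[int], bool],
) -> List[Tuple[int, ...]]:
    """Apply the predicate to the filter column, then read other columns
    only for the rows that passed."""
    selected = [i for i, v in enumerate(filter_col.values()) if predicate(v)]
    if not selected:
        return []                                  # other columns stay undecoded
    return [tuple(col.values()[i] for col in other_cols) for i in selected]


# Simulated encoded columns: each lambda stands in for real decoding work.
ids = LazyColumnVector(lambda: [1, 2, 3, 4])
payload = LazyColumnVector(lambda: [10, 20, 30, 40])

# No row matches, so the payload column is never decoded.
no_rows = read_batch(ids, [payload], lambda v: v > 100)
decodes_after_empty_filter = payload.decode_count

# Some rows match, so the payload column is decoded exactly once.
even_rows = read_batch(ids, [payload], lambda v: v % 2 == 0)
```

In this sketch the saving shows up only when a whole batch is filtered out; a real reader could go further and skip decoding individual pages or values, which is beyond what this example attempts.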