Chao Sun created SPARK-35003:
--------------------------------

             Summary: Improve performance for reading smallint in vectorized Parquet reader
                 Key: SPARK-35003
                 URL: https://issues.apache.org/jira/browse/SPARK-35003
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Chao Sun

Currently {{VectorizedRleValuesReader}} reads shorts in the following way:
{code:java}
for (int i = 0; i < n; i++) {
  c.putShort(rowId + i, (short)data.readInteger());
}
{code}

For PLAIN encoding, {{readInteger}} is implemented as:
{code:java}
public final int readInteger() {
  return getBuffer(4).getInt();
}
{code}

This means the reader has to {{slice}} the buffer once per value, which is more expensive than slicing it once as one large chunk and then reading the ints out of that chunk.
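
To illustrate the difference described above, here is a minimal standalone sketch using plain {{java.nio.ByteBuffer}}. It is not Spark's code: the class {{BatchedShortRead}} and the methods {{readShortsPerValue}} and {{readShortsBatched}} are hypothetical names, and the input is assumed to be a buffer of little-endian 4-byte ints, as PLAIN-encoded INT32 values are laid out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Spark.
public class BatchedShortRead {

  // Per-value path: create a fresh 4-byte view for every value,
  // mirroring the getBuffer(4).getInt() pattern.
  public static void readShortsPerValue(ByteBuffer in, short[] out, int n) {
    for (int i = 0; i < n; i++) {
      ByteBuffer four = in.slice().order(ByteOrder.LITTLE_ENDIAN);
      four.limit(4);
      out[i] = (short) four.getInt();
      in.position(in.position() + 4);
    }
  }

  // Batched path: create one view covering all n ints, then read from it.
  public static void readShortsBatched(ByteBuffer in, short[] out, int n) {
    ByteBuffer chunk = in.slice().order(ByteOrder.LITTLE_ENDIAN);
    chunk.limit(4 * n);
    for (int i = 0; i < n; i++) {
      out[i] = (short) chunk.getInt();
    }
    in.position(in.position() + 4 * n);
  }

  public static void main(String[] args) {
    int[] values = {1, -2, 300, 32767, -32768};
    ByteBuffer buf = ByteBuffer.allocate(4 * values.length).order(ByteOrder.LITTLE_ENDIAN);
    for (int v : values) {
      buf.putInt(v);
    }
    buf.flip();

    short[] perValue = new short[values.length];
    short[] batched = new short[values.length];
    readShortsPerValue(buf.duplicate(), perValue, values.length);
    readShortsBatched(buf.duplicate(), batched, values.length);

    // Both paths should decode the same shorts.
    System.out.println(Arrays.equals(perValue, batched));
  }
}
```

Both methods decode the same values. The batched variant allocates one view for the whole run instead of one per value, which is the saving this issue is after. Any actual speedup would need to be measured with a benchmark.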