Zamil Majdy created SPARK-43264:
-----------------------------------

             Summary: Avoid allocation of unwritten ColumnVector in VectorizedReader
                 Key: SPARK-43264
                 URL: https://issues.apache.org/jira/browse/SPARK-43264
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
    Affects Versions: 3.4.1, 3.5.0
            Reporter: Zamil Majdy

The Spark vectorized reader allocates an array for every field, for each value count, even if the array ends up empty. This causes high memory consumption when reading a table with large struct and array columns, or with many columns that hold sparse values. One way to fix this is to allocate the column vector lazily, so that the array is allocated only when it is needed (that is, when it is written).
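
To illustrate the proposed fix, here is a minimal sketch of lazy allocation in Java. It is not Spark's implementation: the class LazyIntVector and its methods (putInt, getInt, isNullAt, isAllocated) are hypothetical names chosen for this example, and a real column vector would also need to handle resizing, nested types, and off-heap storage.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of a lazily allocated column vector.
 * This is NOT Spark's WritableColumnVector; all names are assumptions.
 */
final class LazyIntVector {
    private final int capacity;
    private int[] data;        // stays null until the first write
    private boolean[] isNull;  // stays null until the first write

    LazyIntVector(int capacity) {
        this.capacity = capacity;
    }

    /** Allocates the backing arrays on first use only. */
    private void ensureAllocated() {
        if (data == null) {
            data = new int[capacity];
            isNull = new boolean[capacity];
            Arrays.fill(isNull, true); // every slot is null until written
        }
    }

    void putInt(int rowId, int value) {
        ensureAllocated();
        data[rowId] = value;
        isNull[rowId] = false;
    }

    /** A vector that was never written reports every row as null. */
    boolean isNullAt(int rowId) {
        return data == null || isNull[rowId];
    }

    int getInt(int rowId) {
        if (isNullAt(rowId)) {
            throw new IllegalStateException("row " + rowId + " is null");
        }
        return data[rowId];
    }

    /** True once the backing arrays exist. */
    boolean isAllocated() {
        return data != null;
    }
}
```

In this sketch a vector for a column that is never written costs only the object header and two null references, while the first putInt pays the full allocation. The trade-off is one extra null check on each write and read.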