[jira] [Created] (SPARK-43264) Avoid allocation of unwritten ColumnVector in VectorizedReader

Zamil Majdy (Jira) Mon, 24 Apr 2023 07:41:06 -0700

Zamil Majdy created SPARK-43264:
-----------------------------------

             Summary: Avoid allocation of unwritten ColumnVector in 
VectorizedReader
                 Key: SPARK-43264
                 URL: https://issues.apache.org/jira/browse/SPARK-43264
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
    Affects Versions: 3.4.1, 3.5.0
            Reporter: Zamil Majdy



The Spark vectorized reader allocates an array for every field, sized to the 
value count, even when the array ends up empty. This causes high memory 
consumption when reading a table with large struct+array columns or many 
columns with sparse values. One way to fix this is to allocate the column 
vector lazily, so that the array is allocated only when it is needed (that 
is, when it is first written).
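
The lazy-allocation idea can be sketched roughly as follows. This is a minimal 
illustration, not Spark's actual implementation: the class LazyIntVector and 
its methods are hypothetical names, and it assumes that reading a 
never-written vector may return a default value of 0.

```java
// Hypothetical sketch of lazy allocation; not Spark's ColumnVector API.
final class LazyIntVector {
    private final int capacity;
    private int[] data; // stays null until the first write

    LazyIntVector(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        this.capacity = capacity;
    }

    // Allocates the backing array on the first write only.
    void putInt(int rowId, int value) {
        checkIndex(rowId);
        if (data == null) {
            data = new int[capacity];
        }
        data[rowId] = value;
    }

    // Assumption: an unwritten vector reads as the default value 0.
    int getInt(int rowId) {
        checkIndex(rowId);
        return data == null ? 0 : data[rowId];
    }

    // True once the backing array has been allocated.
    boolean isAllocated() {
        return data != null;
    }

    private void checkIndex(int rowId) {
        if (rowId < 0 || rowId >= capacity) {
            throw new IndexOutOfBoundsException("rowId: " + rowId);
        }
    }
}
```

In this sketch, a vector for a column that is never written keeps only the 
capacity field and a null reference, so a batch with many sparse or empty 
columns avoids allocating one full-size array per column.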



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
