[ 
https://issues.apache.org/jira/browse/IMPALA-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Luksa resolved IMPALA-9226.
-----------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Improve string allocations of the ORC scanner
> ---------------------------------------------
>
>                 Key: IMPALA-9226
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9226
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Norbert Luksa
>            Priority: Major
>              Labels: orc
>             Fix For: Impala 4.0
>
>
> Currently the ORC scanner allocates new memory for each string values (except 
> for fixed size strings):
> [https://github.com/apache/impala/blob/85425b81f04c856d7d5ec375242303f78ec7964e/be/src/exec/orc-column-readers.cc#L172]
> Besides the too many allocations and copying it's also bad for memory 
> locality.
> Since ORC-501 StringVectorBatch has a member named 'blob' that contains the 
> strings in the batch: 
> [https://github.com/apache/orc/blob/branch-1.6/c%2B%2B/include/orc/Vector.hh#L126]
> 'blob' has type DataBuffer which is movable, so Impala might be able to get 
> ownership of it. Or, at least we could copy the whole blob array instead of 
> copying the strings one-by-one.
> ORC-501 is included in ORC version 1.6, but Impala currently only uses ORC 
> 1.5.5.
> ORC 1.6 also introduces a new string vector type, EncodedStringVectorBatch:
> [https://github.com/apache/orc/blob/e40b9a7205d51995f11fe023c90769c0b7c4bb93/c%2B%2B/include/orc/Vector.hh#L153]
> It uses dictionary encoding for storing the values. Impala could copy/move 
> the dictionary as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to