Daniel Becker created IMPALA-12783:
--------------------------------------

             Summary: Nested struct with varlen data crashes
                 Key: IMPALA-12783
                 URL: https://issues.apache.org/jira/browse/IMPALA-12783
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Daniel Becker
            Assignee: Daniel Becker


If a struct ("main") is inside an array and contains two child structs ("s1"
and "s2") that both contain strings (or other varlen data), Impala crashes when
the struct is re-materialised (for example, in a sort with a limit) if codegen
is enabled.

To reproduce:

In Hive:
{code:sql}
create table nested (arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2: STRUCT<str2: STRING>>>) stored as parquet;
insert into nested values (array(named_struct("s1", named_struct("str1", "A string that is long"), "s2", named_struct("str2", "Another string that is long"))));{code}
In Impala:
{code:sql}
select 1, arr from nested order by 1 limit 1;{code}
This seems to be because the codegen'd code, when checking whether the strings
("str1" and "str2" in the example) are NULL, incorrectly calculates the offset
of the null indicator byte from the memory address of their containing struct
instead of from the beginning of the "master tuple", which in this case is the
item tuple of the array.

Note that the null indicators of the struct members are at the end of the tuple 
containing the struct (recursively), i.e. the master tuple.
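A minimal Python sketch of the addressing difference described above. It is not Impala code: the SlotDesc type, the 16-byte slot size, the single null indicator byte at offset 32 and the bit assignments are all hypothetical, chosen only to model a master tuple whose null indicators sit after the inlined struct members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotDesc:
    """Hypothetical slot descriptor; all offsets are relative to the master tuple."""
    name: str
    tuple_offset: int      # where the slot's data starts inside the master tuple
    null_byte_offset: int  # byte holding this slot's null bit, from the master tuple start
    null_bit_mask: int


# Hypothetical layout of the array item tuple (the "master tuple"):
#   bytes  0..15  s1.str1 slot   (s1 is inlined, so it also starts at 0)
#   bytes 16..31  s2.str2 slot   (s2 is inlined, so it also starts at 16)
#   byte  32      null indicator byte for every member, nested or not
S1 = SlotDesc("s1", 0, 32, 0b0000_0010)
STR1 = SlotDesc("s1.str1", 0, 32, 0b0000_0100)
S2 = SlotDesc("s2", 16, 32, 0b0000_1000)
STR2 = SlotDesc("s2.str2", 16, 32, 0b0001_0000)
TUPLE_SIZE = 33


def is_null_correct(mem: bytearray, master_tuple_addr: int, slot: SlotDesc) -> bool:
    # Null byte offsets are measured from the start of the master tuple.
    return bool(mem[master_tuple_addr + slot.null_byte_offset] & slot.null_bit_mask)


def is_null_buggy(mem: bytearray, struct_addr: int, slot: SlotDesc) -> bool:
    # Models the reported mistake: the same offset applied to the containing
    # struct's address instead of the master tuple's address.
    return bool(mem[struct_addr + slot.null_byte_offset] & slot.null_bit_mask)


mem = bytearray(TUPLE_SIZE)  # all null bits clear: nothing is NULL
mem[STR1.null_byte_offset] |= STR1.null_bit_mask  # mark s1.str1 as NULL
print(is_null_correct(mem, 0, STR1), is_null_correct(mem, 0, STR2))
```

In this toy layout "s1" starts at offset 0, so using its address happens to give the right answer for "str1", while "s2" starts at offset 16 and the struct-relative lookup reads past the end of the 33-byte tuple (an IndexError here, an out-of-bounds read in native code). Whether this is exactly why the real crash needs two child structs is an assumption.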



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
