Daniel Becker created IMPALA-12783:
--------------------------------------

             Summary: Nested struct with varlen data crashes
                 Key: IMPALA-12783
                 URL: https://issues.apache.org/jira/browse/IMPALA-12783
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Daniel Becker
            Assignee: Daniel Becker

If a struct ("main") is within an array and contains two child structs ("s1" and "s2") that both contain strings (or other varlen data), Impala crashes when the struct is re-materialised (for example in a sort with limit) if codegen is enabled.

To reproduce:

In Hive:
{code:java}
create table nested (arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2: STRUCT<str2: STRING>>>) stored as parquet;
insert into nested values (array(
  named_struct("s1", named_struct("str1", "A string that is long"),
               "s2", named_struct("str2", "Another string that is long"))
));
{code}

In Impala:
{code:java}
select 1, arr from nested order by 1 limit 1;
{code}

This seems to be because the codegen'd code, when checking whether the strings ("str1" and "str2" in the example) are NULL, incorrectly calculates the offset of the null indicator byte from the memory address of their containing struct, not from the beginning of the "master tuple", which in this case is the item tuple of the array.

Note that the null indicators of the struct members are at the end of the tuple containing the struct (recursively), i.e. the master tuple.
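
The offset problem described in this issue can be illustrated with a small Python sketch. It is not Impala code: the slot size, the offsets, the null bit masks and the helper `is_null` are all hypothetical, chosen only to show why the null indicator byte has to be read relative to the master tuple and not relative to the containing struct.

```python
# Hypothetical, simplified layout of one master tuple (the array's item tuple).
# None of these numbers are taken from Impala; they are assumptions for the sketch.
STR_SLOT_SIZE = 12                             # assumed size of one string slot
S1_OFFSET = 0                                  # struct s1 (holding str1) starts here
S2_OFFSET = S1_OFFSET + STR_SLOT_SIZE          # struct s2 (holding str2) starts here
NULL_BYTE_OFFSET = S2_OFFSET + STR_SLOT_SIZE   # null indicators sit at the end of the master tuple
TUPLE_SIZE = NULL_BYTE_OFFSET + 1
STR1_NULL_MASK = 0x01                          # assumed null bit for str1
STR2_NULL_MASK = 0x02                          # assumed null bit for str2


def is_null(memory, base, mask):
    """Test a null bit, reading the indicator byte at base + NULL_BYTE_OFFSET."""
    return (memory[base + NULL_BYTE_OFFSET] & mask) != 0


# One master tuple with no NULL members (all bytes zero), followed by
# unrelated bytes (0xFF) standing in for whatever lies after the tuple.
memory = bytearray(TUPLE_SIZE) + bytearray([0xFF] * 32)
master_base = 0

# Correct: the offset is applied to the start of the master tuple.
correct_str2 = is_null(memory, master_base, STR2_NULL_MASK)

# Mistaken: the same offset is applied to the start of the containing struct s2,
# so the read lands past the end of the master tuple.
buggy_str2 = is_null(memory, master_base + S2_OFFSET, STR2_NULL_MASK)

print("str2 NULL (offset from master tuple):", correct_str2)
print("str2 NULL (offset from struct s2):   ", buggy_str2)
```

In this sketch the mistaken calculation happens to give the right answer for "str1" only because s1 starts at offset 0 of the master tuple. For "str2" it reads a byte that does not belong to the tuple at all, which is the kind of out-of-bounds access that could plausibly lead to a crash.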