Joris Van den Bossche created ARROW-6157:
--------------------------------------------

Summary: [Python][C++] UnionArray with invalid data passes validation / leads to segfaults
Key: ARROW-6157
URL: https://issues.apache.org/jira/browse/ARROW-6157
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Reporter: Joris Van den Bossche

From the Python side, you can create an "invalid" UnionArray:

{code}
import pyarrow as pa

binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
int64 = pa.array([1, 2, 3], type='int64')
types = pa.array([0, 1, 0, 0, 2, 1, 0], type='int8')  # <- value of 2 is out of bounds for the number of children
value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')

a = pa.UnionArray.from_dense(types, value_offsets, [binary, int64])
{code}

For example, converting it to a Python list leads to a segfault:

{code}
In [7]: a.to_pylist()
Segmentation fault (core dumped)
{code}

On the other hand, explicit validation does not raise an error:

{code}
In [8]: a.validate()
{code}

Should validation raise an error in this case? (The C++ {{ValidateVisitor}} for UnionArray does nothing.)

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
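
To make the question concrete, here is a rough pure-Python sketch of the kind of check validation could perform on a dense union. It is not pyarrow or Arrow C++ code: the function name {{validate_dense_union}} and its parameters are hypothetical, and it assumes that type ids are plain child indices (0 to number of children minus 1), as in the example above.

```python
from typing import Sequence


def validate_dense_union(
    type_ids: Sequence[int],
    value_offsets: Sequence[int],
    child_lengths: Sequence[int],
) -> None:
    """Hypothetical check (not a pyarrow API): raise ValueError on invalid data.

    Assumes each type id is an index into the list of children.
    """
    if len(type_ids) != len(value_offsets):
        raise ValueError("type_ids and value_offsets differ in length")

    num_children = len(child_lengths)
    for slot, (type_id, offset) in enumerate(zip(type_ids, value_offsets)):
        # The type id must select an existing child.
        if not 0 <= type_id < num_children:
            raise ValueError(
                f"slot {slot}: type id {type_id} is out of bounds "
                f"for {num_children} children"
            )
        # The offset must point inside the selected child.
        if not 0 <= offset < child_lengths[type_id]:
            raise ValueError(
                f"slot {slot}: offset {offset} is out of bounds for child "
                f"{type_id} of length {child_lengths[type_id]}"
            )


if __name__ == "__main__":
    # Same data as in the report: children of length 4 (binary) and 3 (int64).
    try:
        validate_dense_union(
            [0, 1, 0, 0, 2, 1, 0], [0, 0, 2, 1, 1, 2, 3], [4, 3]
        )
    except ValueError as exc:
        print(exc)
```

With the data from the report, this sketch rejects the fifth slot (index 4) because its type id of 2 does not refer to either of the two children. Whether the real validation should also check offsets, as done here, is a separate design question.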