&res created ARROW-18439:
----------------------------

             Summary: Misleading message when loading parquet data with invalid null data
                 Key: ARROW-18439
                 URL: https://issues.apache.org/jira/browse/ARROW-18439
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 10.0.1
            Reporter: &res

I'm saving an Arrow table to Parquet. One column is a list of structs whose elements are marked as non-nullable. The data is invalid, however, because I put a null in one of the nested fields. When I save this data to Parquet and try to load it back, I get a very misleading message:

{code:java}
Length spanned by list offsets (2) larger than values array (length 1){code}

I would rather Arrow complained when creating the table or when saving it to Parquet.

Here's how to reproduce the issue:

{code:java}
import io

import pyarrow as pa
import pyarrow.parquet as pq

# Struct type whose only field is declared non-nullable.
struct = pa.struct(
    [
        pa.field("nested_string", pa.string(), nullable=False),
    ]
)
# List column whose items are also declared non-nullable.
schema = pa.schema(
    [pa.field("list_column", pa.list_(pa.field("item", struct, nullable=False)))]
)
# The second struct carries a null in the non-nullable nested field.
table = pa.table(
    {"list_column": [[{"nested_string": ""}, {"nested_string": None}]]}, schema=schema
)
with io.BytesIO() as file:
    pq.write_table(table, file)
    file.seek(0)
    pq.read_table(file)  # Raises pa.ArrowInvalid
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)