Kai Fricke created ARROW-17641:
----------------------------------

             Summary: [python] Deserializing ParseOptions does not set up invalid row handler correctly
                 Key: ARROW-17641
                 URL: https://issues.apache.org/jira/browse/ARROW-17641
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 9.0.0
            Reporter: Kai Fricke

Serializing and deserializing a {{csv.ParseOptions}} object with an {{invalid_row_handler}} will render the handler unusable. This is likely because the setter is not called correctly in the {{__setstate__}} method.

Reproduction script:

{code:python}
import cloudpickle
from pyarrow import csv

invalid_csv = """f1,f2
3,4
5,6
\x00\x00
7,8"""
source = "test.csv"

with open(source, "w") as f:
    f.write(invalid_csv)


def read_file(path, parse_options):
    # Uncomment this for a fix!
    # parse_options.invalid_row_handler = parse_options.invalid_row_handler
    with open(path, "rb") as f:
        return csv.read_csv(f, parse_options=parse_options)


parse_options = csv.ParseOptions(delimiter=",", invalid_row_handler=lambda i: "skip")

# Will succeed
print(read_file(source, parse_options=parse_options))

parse_options = cloudpickle.loads(cloudpickle.dumps(parse_options))

# Will fail
print(read_file(source, parse_options=parse_options))
{code}
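
To illustrate the suspected cause, here is a minimal pure-Python sketch of how a {{__setstate__}} that bypasses a property setter can leave an object half-initialized. The classes {{BrokenOptions}} and {{FixedOptions}}, the {{_active}} field, and the {{roundtrip}} helper are all hypothetical names made up for this example; this is not pyarrow's actual implementation, only a model of the failure mode guessed at above.

```python
class BrokenOptions:
    """Hypothetical stand-in for an options object (NOT pyarrow's code)."""

    def __init__(self, handler=None):
        self._handler = None
        self._active = None      # derived state; stands in for a native callback
        self.handler = handler   # goes through the property setter

    @property
    def handler(self):
        return self._handler

    @handler.setter
    def handler(self, value):
        self._handler = value
        # Side effect the setter is responsible for: install the callback.
        self._active = value

    def __getstate__(self):
        return (self._handler,)

    def __setstate__(self, state):
        # Assumed bug: the backing field is written directly, so the setter
        # never runs and the derived callback is never installed.
        self._handler = state[0]
        self._active = None

    def handle(self, row):
        if self._active is None:
            raise ValueError(f"invalid row: {row!r}")
        return self._active(row)


class FixedOptions(BrokenOptions):
    def __setstate__(self, state):
        # Fix: assign through the property so the setter's side effect runs.
        self.handler = state[0]


def roundtrip(obj):
    # Mimics what unpickling does: create the object without calling
    # __init__, then restore its state via __setstate__.
    clone = type(obj).__new__(type(obj))
    clone.__setstate__(obj.__getstate__())
    return clone


def skip(row):
    return "skip"


broken = roundtrip(BrokenOptions(handler=skip))
fixed = roundtrip(FixedOptions(handler=skip))
```

In this model {{broken.handler}} still returns the original function, yet {{broken.handle(...)}} raises, while {{fixed.handle(...)}} works. Reassigning the attribute to itself ({{broken.handler = broken.handler}}) reruns the setter and repairs the object, which would be consistent with the workaround in the reproduction script if the assumption about {{__setstate__}} holds.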