Vadym Dytyniak created ARROW-18269:
--------------------------------------

             Summary: Slash character in partition value handling
                 Key: ARROW-18269
                 URL: https://issues.apache.org/jira/browse/ARROW-18269
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 10.0.0
            Reporter: Vadym Dytyniak

The example below shows that pyarrow does not correctly handle a partition value that contains '/':
{code:java}
import pandas as pd
import pyarrow as pa
from pyarrow import dataset as ds

df = pd.DataFrame({
    'value': [1, 2],
    'instrument_id': ['A/Z', 'B'],
})

ds.write_dataset(
    data=pa.Table.from_pandas(df),
    base_dir='data',
    format='parquet',
    partitioning=['instrument_id'],
    partitioning_flavor='hive',
)

table = ds.dataset(
    source='data',
    format='parquet',
    partitioning='hive',
).to_table()

tables = [table]
df = pa.concat_tables(tables).to_pandas()
print(df.head())
{code}
Actual output:
{code:java}
   value instrument_id
0      1             A
1      2             B
{code}
Expected behaviour:

Option 1: The result should be:
{code:java}
   value instrument_id
0      1           A/Z
1      2             B
{code}
Option 2: An error should be raised to disallow '/' in partition values.
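
Until one of the two options is implemented, a caller-side workaround is possible. The sketch below is not part of pyarrow: the helper names (`encode_partition_value`, `decode_partition_value`, `check_partition_values`) are hypothetical and use only the Python standard library. It shows percent-encoding a value so that it contains no '/' before it is used as a partition key (in the spirit of Option 1), and a validation step that rejects such values outright (Option 2).

```python
from urllib.parse import quote, unquote


def encode_partition_value(value: str) -> str:
    """Percent-encode a value so it is safe to use as a directory name."""
    # safe="" is required: by default quote() leaves '/' unescaped.
    return quote(value, safe="")


def decode_partition_value(value: str) -> str:
    """Reverse encode_partition_value()."""
    return unquote(value)


def check_partition_values(values) -> None:
    """Raise ValueError if any partition value contains a '/'."""
    bad = [v for v in values if "/" in str(v)]
    if bad:
        raise ValueError(f"partition values must not contain '/': {bad}")


instrument_ids = ["A/Z", "B"]

# Workaround in the spirit of Option 1: encode before writing the dataset.
encoded = [encode_partition_value(v) for v in instrument_ids]
print(encoded)  # ['A%2FZ', 'B']

# Option 2: fail early instead of silently writing nested directories.
try:
    check_partition_values(instrument_ids)
except ValueError as exc:
    print(exc)
```

In the reproduction above, the encoded values would replace the `instrument_id` column before `ds.write_dataset` is called. Whether the dataset reader already decodes percent-encoded path segments depends on the pyarrow version and partitioning options, so inspect the values that come back before applying `decode_partition_value`.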