SHIMA Tatsuya created ARROW-18123:
-------------------------------------

             Summary: [Python] Cannot use multi-byte characters in file names
                 Key: ARROW-18123
                 URL: https://issues.apache.org/jira/browse/ARROW-18123
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 9.0.0
            Reporter: SHIMA Tatsuya

{{pyarrow.parquet.write_table}} raises an error when it is given a file path containing multi-byte characters.

For example, using {{例.parquet}} as the file path:

{code:python}
Python 3.10.7 (main, Oct  5 2022, 14:33:54) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow as pa
>>> df = pd.DataFrame({'one': [-1, np.nan, 2.5],
...                    'two': ['foo', 'bar', 'baz'],
...                    'three': [True, False, True]},
...                    index=list('abc'))
>>> table = pa.Table.from_pandas(df)
>>> import pyarrow.parquet as pq
>>> pq.write_table(table, '例.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 2920, in write_table
    with ParquetWriter(
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 911, in __init__
    filesystem, path = _resolve_filesystem_and_path(
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/fs.py", line 184, in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
  File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot parse URI: '例.parquet'
{code}