Mackenzie created ARROW-3916:
--------------------------------

             Summary: [Python] Support caller-provided filesystem in `ParquetWriter` constructor
                 Key: ARROW-3916
                 URL: https://issues.apache.org/jira/browse/ARROW-3916
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.11.1
            Reporter: Mackenzie

Currently, writing files incrementally to S3 appears to require the following pattern:

{{import pyarrow as pa}}
{{import pyarrow.parquet as pq}}
{{import s3fs}}

{{def write_dfs_to_s3(dfs, fname):}}
{{    first_df = dfs[0]}}
{{    table = pa.Table.from_pandas(first_df, preserve_index=False)}}
{{    fs = s3fs.S3FileSystem()}}
{{    fh = fs.open(fname, 'wb')}}
{{    with pq.ParquetWriter(fh, table.schema) as writer:}}
{{        # set file handle on writer so writer manages closing it when it is itself closed}}
{{        writer.file_handle = fh}}
{{        writer.write_table(table=table)}}
{{        for df in dfs[1:]:}}
{{            table = pa.Table.from_pandas(df, preserve_index=False)}}
{{            writer.write_table(table=table)}}

This works as expected, but is quite roundabout. It would be much easier if `ParquetWriter` accepted `filesystem` as a keyword argument in its constructor. In that case, the filesystem inferred by `_get_fs_from_path` would be overridden by the caller-provided one, following the usual pattern of using the kwarg after checking that it is a proper filesystem with `_ensure_filesystem`.
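
Until such a keyword exists, the handle management described above could be wrapped in a small helper. The following is only a minimal sketch and not part of pyarrow: `parquet_writer_on` and its `writer_factory` parameter are hypothetical names introduced here for illustration, and `filesystem` is assumed to be any object exposing `open(path, mode)`, such as an `s3fs.S3FileSystem`.

```python
from contextlib import contextmanager


@contextmanager
def parquet_writer_on(filesystem, path, schema, writer_factory=None, **writer_kwargs):
    """Yield a Parquet writer for `path` on a caller-provided filesystem.

    Hypothetical helper: it opens the file through `filesystem`, hands the
    handle to the writer, and closes the writer and then the handle on exit.
    `writer_factory` defaults to pyarrow.parquet.ParquetWriter; it is a
    parameter only so the helper can be exercised without pyarrow installed.
    """
    if writer_factory is None:
        # Imported lazily; pyarrow is assumed to be installed in real use.
        import pyarrow.parquet as pq
        writer_factory = pq.ParquetWriter

    fh = filesystem.open(path, "wb")
    try:
        writer = writer_factory(fh, schema, **writer_kwargs)
        try:
            yield writer
        finally:
            # Close the writer first so the Parquet footer is written
            # while the underlying handle is still open.
            writer.close()
    finally:
        fh.close()
```

With this helper, the function above would reduce to opening `parquet_writer_on(s3fs.S3FileSystem(), fname, table.schema)` in a `with` statement and calling `writer.write_table(table)` for each DataFrame, without assigning `writer.file_handle` by hand.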