Mackenzie created ARROW-3916:
--------------------------------

             Summary: [Python] Support caller-provided filesystem in 
`ParquetWriter` constructor
                 Key: ARROW-3916
                 URL: https://issues.apache.org/jira/browse/ARROW-3916
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.11.1
            Reporter: Mackenzie


Currently, writing files incrementally to S3 appears to require the following 
pattern:

{{import pyarrow as pa}}
{{import pyarrow.parquet as pq}}
{{import s3fs}}

{{def write_dfs_to_s3(dfs, fname):}}
{{    first_df = dfs[0]}}
{{    table = pa.Table.from_pandas(first_df, preserve_index=False)}}
{{    fs = s3fs.S3FileSystem()}}
{{    fh = fs.open(fname, 'wb')}}
{{    with pq.ParquetWriter(fh, table.schema) as writer:}}
{{        # set the file handle on the writer so that the writer closes it}}
{{        # when the writer itself is closed}}
{{        writer.file_handle = fh}}
{{        writer.write_table(table=table)}}
{{        for df in dfs[1:]:}}
{{            table = pa.Table.from_pandas(df, preserve_index=False)}}
{{            writer.write_table(table=table)}}

This works as expected but is quite roundabout. It would be much easier if 
`ParquetWriter` accepted `filesystem` as a keyword argument in its 
constructor. The argument would override the result of `_get_fs_from_path`, 
following the usual pattern of using the kwarg after passing it through 
`_ensure_filesystem` to ensure it is a proper file system.
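
To illustrate the intent, here is a minimal sketch of the behaviour as a 
standalone helper rather than a change to `ParquetWriter` itself. The name 
`open_parquet_writer` and the `writer_cls` parameter are hypothetical and not 
part of pyarrow. The sketch assumes the filesystem object exposes 
`open(path, 'wb')` returning a file object usable as a context manager, as 
`s3fs.S3FileSystem` does.

```python
from contextlib import contextmanager


@contextmanager
def open_parquet_writer(path, schema, filesystem, writer_cls=None, **kwargs):
    """Yield a Parquet writer for `path` on a caller-provided filesystem.

    Hypothetical helper, not pyarrow API. `filesystem` is any object with
    an `open(path, mode)` method; `writer_cls` defaults to
    `pyarrow.parquet.ParquetWriter` and exists only so that the writer
    class can be swapped out.
    """
    if writer_cls is None:
        # Imported lazily so that pyarrow is only needed for the default.
        import pyarrow.parquet as pq
        writer_cls = pq.ParquetWriter

    # The writer is closed first (flushing the Parquet footer), then the
    # underlying file handle, so the caller manages neither explicitly.
    with filesystem.open(path, "wb") as fh:
        with writer_cls(fh, schema, **kwargs) as writer:
            yield writer
```

With such a helper, the function above would reduce to opening the writer with 
`open_parquet_writer(fname, table.schema, s3fs.S3FileSystem())` and calling 
`writer.write_table(...)` in a loop, without setting `writer.file_handle` by 
hand.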



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
