[ https://issues.apache.org/jira/browse/ARROW-16746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated ARROW-16746:
-----------------------------------
    Component/s: C++
                 Python

> S3 tag support on write
> -----------------------
>
>                 Key: ARROW-16746
>                 URL: https://issues.apache.org/jira/browse/ARROW-16746
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: André Kelpe
>            Priority: Major
>
> S3 allows tagging objects to better organize one's data
> (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html).
> We use this for efficient downstream processes/inventory management.
> Currently arrow/pyarrow does not allow tags to be added on write. This
> forces us to scan the bucket and re-apply the tags after a pyarrow-based
> process has run.
> I looked through the code and think that it could potentially be done via the
> metadata mechanism.
> The tags need to be added to the CreateMultipartUploadRequest here:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L1156
> See also
> http://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_s3_1_1_model_1_1_create_multipart_upload_request.html#af791f34a65dc69bd681d6995313be2da

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
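
For illustration only: the tagging field on the CreateMultipartUploadRequest referenced above takes the tag set encoded as URL query parameters. The sketch below shows how a plain key/value mapping could be turned into that string. It is not part of arrow or pyarrow; the function name `encode_s3_tagging` and the idea of passing tags as a dict are assumptions made for this example. The limits it checks (at most 10 tags per object, keys up to 128 characters, values up to 256 characters) are the ones S3 documents for object tags.

```python
from urllib.parse import quote, urlencode

# Limits documented by S3 for object tags.
MAX_TAGS = 10
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def encode_s3_tagging(tags: dict[str, str]) -> str:
    """Encode tags as a URL query string, e.g. 'k1=v1&k2=v2'.

    Hypothetical helper, not an arrow/pyarrow API.
    """
    if len(tags) > MAX_TAGS:
        raise ValueError(f"S3 allows at most {MAX_TAGS} tags per object")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"invalid tag key: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value too long for key {key!r}")
    # quote (rather than the default quote_plus) percent-encodes spaces as %20.
    return urlencode(tags, quote_via=quote)


if __name__ == "__main__":
    print(encode_s3_tagging({"project": "arrow", "owner": "data team"}))
    # prints: project=arrow&owner=data%20team
```

If tags were carried through the metadata mechanism as suggested in the issue, a string built this way is the kind of value the upload request would need to receive.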