[ 
https://issues.apache.org/jira/browse/FLINK-25200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479437#comment-17479437
 ] 

Piotr Nowojski edited comment on FLINK-25200 at 1/20/22, 3:11 PM:
------------------------------------------------------------------

[~yunta], I'm not sure how much more information a more realistic test would 
give us. Yes, one thing not covered by [~akalashnikov]'s test is local IO. But 
when re-uploading the file instead of duplicating it, it's quite likely that the 
state file will already be in the file cache, for example. 

Regardless, after looking at those results, I'm beginning to doubt whether it 
makes sense to provide native duplicate support for S3. It looks like the 
performance cost of both of those operations on the AWS side is the same. I was 
hoping for/expecting an orders-of-magnitude performance difference in favour of 
the CopyObject API. At the very least, I think I would deprioritize this 
improvement and focus on other file systems first.



> Implement duplicating for s3 filesystem
> ---------------------------------------
>
>                 Key: FLINK-25200
>                 URL: https://issues.apache.org/jira/browse/FLINK-25200
>             Project: Flink
>          Issue Type: Sub-task
>          Components: FileSystems
>            Reporter: Dawid Wysakowicz
>            Priority: Major
>             Fix For: 1.15.0
>
>
> We can use https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
