[ https://issues.apache.org/jira/browse/HADOOP-13654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-13654.
-------------------------------------
    Resolution: Won't Fix

> S3A create() to support asynchronous check of dest & parent paths
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13654
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>
> One source of delays in S3A is the need to check whether a destination path
> exists in create(); this makes sure the operation isn't trying to overwrite a
> directory.
> # This is slow: 1-4 HTTPS requests.
> # The code doesn't seem to check the entire parent path to make sure that no
> ancestor is a file (which raises the question: shouldn't we have a contract
> test for this?)
> # Even with overwrite=false, the check has race conditions, because the new
> object isn't created until the output stream is close()d.
> Instead of doing a synchronous check in create(), we could do an asynchronous
> check of the parent directory tree. If any error surfaced, it could be cached
> and rethrown on the next call to write(), flush() or close(). That is, a
> create() that fails because of path problems would not fail immediately on
> the create() call, *but before any writes were committed*.
> The full directory tree can (and should) be checked, and its results
> remembered. This would allow the post-commit cleanup to issue delete()
> requests only for those paths (if any) which referred to directories.
> As well as the need to use the AWS thread pool, there's a bit of complexity
> with cancelling multipart uploads: the output stream needs to know that the
> check failed, and that the multipart upload should be aborted.
> If the complexity of the asynchronous calls can be coped with, *and client
> code is happy to accept errors on any IO call to the output stream*, then
> the initial overhead at file creation could be skipped.
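
The deferred-failure idea in the description can be illustrated with a minimal Java sketch. It is not S3A code: `AsyncCreateCheckSketch`, `DeferredCheckOutputStream`, `checkAsync()`, `runScenario()` and the `committed`/`aborted` flags are hypothetical names, a single-thread `ExecutorService` stands in for the AWS thread pool, and the simulated check stands in for the real existence probes. A failed check is rethrown from the next `write()` or `flush()` that observes it, and `close()` always waits for the check, so nothing is committed unless the check passed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not S3A code: every name here is illustrative.
class AsyncCreateCheckSketch {
  /** Runs a blocking check on the pool; the future records any failure. */
  static CompletableFuture<Void> checkAsync(Callable<Void> check, ExecutorService pool) {
    CompletableFuture<Void> result = new CompletableFuture<>();
    pool.execute(() -> {
      try {
        check.call();
        result.complete(null);
      } catch (Exception e) {
        result.completeExceptionally(e);
      }
    });
    return result;
  }

  /** Surfaces a failed check on the next write(), flush() or close(). */
  static class DeferredCheckOutputStream extends OutputStream {
    private final CompletableFuture<Void> pathCheck;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;
    boolean committed; // stands in for a PUT or completing a multipart upload
    boolean aborted;   // stands in for aborting a multipart upload

    DeferredCheckOutputStream(CompletableFuture<Void> pathCheck) {
      this.pathCheck = pathCheck;
    }
    /** Blocks until the check ends; wraps a failure in a new IOException. */
    private void awaitCheck() throws IOException {
      try {
        pathCheck.get();
      } catch (ExecutionException e) {
        Throwable cause = e.getCause();
        throw new IOException("destination check failed: " + cause.getMessage(), cause);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("interrupted waiting for destination check", e);
      }
    }
    /** Non-blocking: fails if closed or if the check has already failed. */
    private void checkState() throws IOException {
      if (closed) throw new IOException("stream is closed");
      if (pathCheck.isCompletedExceptionally()) {
        awaitCheck();
      }
    }
    @Override public void write(int b) throws IOException {
      checkState();
      buffer.write(b);
    }
    @Override public void flush() throws IOException {
      checkState();
    }
    @Override public void close() throws IOException {
      if (closed) return;
      closed = true;
      try {
        awaitCheck(); // nothing is committed until the check has passed
      } catch (IOException e) {
        aborted = true;
        throw e;
      }
      committed = true; // placeholder for uploading buffer.toByteArray()
    }
  }

  /** One create/write/close cycle against a simulated destination check. */
  static String runScenario(boolean parentIsFile) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    DeferredCheckOutputStream out = new DeferredCheckOutputStream(checkAsync(() -> {
      if (parentIsFile) { // simulated result of probing the parent paths
        throw new IOException("parent path is a file");
      }
      return null;
    }, pool));
    try (OutputStream s = out) {
      s.write("hello".getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      return "aborted=" + out.aborted + " committed=" + out.committed + ": " + e.getMessage();
    } finally {
      pool.shutdown();
    }
    return "aborted=" + out.aborted + " committed=" + out.committed;
  }

  public static void main(String[] args) {
    System.out.println(runScenario(false)); // aborted=false committed=true
    System.out.println(runScenario(true));  // aborted=true committed=false: destination check failed: parent path is a file
  }
}
```

Because `close()` blocks on the check, the probe latency is overlapped with writing rather than eliminated. Each rethrow wraps the cause in a fresh `IOException`, so a failure seen by both `write()` and the implicit `close()` of a try-with-resources block does not trigger self-suppression.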

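The full-tree check with remembered results could look like the following sketch. It assumes a simplified store in which a `Map<String, Kind>` records what probing each key would find; it ignores the trailing-slash naming of marker objects and only recognizes directories through explicit markers, whereas a real check would also need a LIST to detect directories implied by their children. `AncestorCheckSketch`, `Kind`, `ancestors()` and `checkTree()` are hypothetical; `checkTree()` is the kind of work that could be submitted to the thread pool as the asynchronous check.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map stands in for what probing each key would find.
class AncestorCheckSketch {
  enum Kind { NONE, FILE, DIRECTORY_MARKER }

  /** Parent keys, nearest first: "a/b/c/data.csv" gives [a/b/c, a/b, a]. */
  static List<String> ancestors(String key) {
    List<String> parents = new ArrayList<>();
    int slash = key.lastIndexOf('/');
    while (slash > 0) {
      key = key.substring(0, slash);
      parents.add(key);
      slash = key.lastIndexOf('/');
    }
    return parents;
  }

  /**
   * Fails if the destination is a directory or any ancestor is a file;
   * otherwise returns the ancestors that are directory markers, i.e. the
   * paths a post-commit cleanup would need to delete.
   */
  static List<String> checkTree(String dest, Map<String, Kind> store) throws IOException {
    if (store.getOrDefault(dest, Kind.NONE) == Kind.DIRECTORY_MARKER) {
      throw new IOException("destination is a directory: " + dest);
    }
    List<String> markers = new ArrayList<>();
    for (String parent : ancestors(dest)) {
      Kind kind = store.getOrDefault(parent, Kind.NONE);
      if (kind == Kind.FILE) {
        throw new IOException("parent is a file: " + parent);
      }
      if (kind == Kind.DIRECTORY_MARKER) {
        markers.add(parent); // remembered for cleanup after close()
      }
    }
    return markers;
  }

  public static void main(String[] args) throws IOException {
    Map<String, Kind> store = Map.of(
        "logs", Kind.DIRECTORY_MARKER,
        "logs/2016", Kind.DIRECTORY_MARKER,
        "tmp", Kind.FILE);
    System.out.println(checkTree("logs/2016/09/part-0000", store)); // [logs/2016, logs]
    try {
      checkTree("tmp/out/part-0000", store);
    } catch (IOException e) {
      System.out.println(e.getMessage()); // parent is a file: tmp
    }
  }
}
```

Returning the marker list, rather than just a pass/fail result, is what lets the cleanup after a successful close() delete only paths known to be directory markers instead of probing every ancestor again.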


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
