[ 
https://issues.apache.org/jira/browse/HDFS-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410458#comment-16410458
 ] 

Hudson commented on HDFS-11028:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13869 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13869/])
HDFS-11028: libhdfs++: FileSystem needs to be able to cancel pending 
(james.clampffer: rev 8783461e2ec3aff2630ea3574a69beb1d5c61e84)
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cpp/connect_cancel/CMakeLists.txt
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/continuation.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filesystem.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/namenode_operations.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/namenode_info.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/rpc/rpc_engine.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/rpc/rpc_engine.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/namenode_operations.cc
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/c/connect_cancel/connect_cancel.c
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filesystem.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/hdfspp/hdfspp.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cpp/CMakeLists.txt
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cpp/connect_cancel/connect_cancel.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/asio.h
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/c/connect_cancel/CMakeLists.txt
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/c/CMakeLists.txt
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/hdfspp/hdfs_ext.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/bindings/c/hdfs.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/hdfs_configuration.cc


> libhdfs++: FileSystem needs to be able to cancel pending connections
> --------------------------------------------------------------------
>
>                 Key: HDFS-11028
>                 URL: https://issues.apache.org/jira/browse/HDFS-11028
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>            Priority: Major
>         Attachments: HDFS-11028.HDFS-8707.000.patch, 
> HDFS-11028.HDFS-8707.001.patch, HDFS-11028.HDFS-8707.002.patch, 
> HDFS-11028.HDFS-8707.003.patch, HDFS-11028.HDFS-8707.004.patch
>
>
> Cancel support is now reasonably robust except for the case where a
> FileHandle operation causes the RpcEngine to try to create a new
> RpcConnection. In HA configs it's common to have something like 10-20
> failovers and a 20-second failover delay (no exponential backoff just yet).
> This means that all of the functions with synchronous interfaces can still
> block for many minutes after an operation has been canceled (20 failovers at
> 20 seconds each is nearly 7 minutes), and often the cause of this is
> something trivial like a bad config file.
> The current design makes this sort of thing tricky to do because the 
> FileHandles need to be individually cancelable via CancelOperations, but they 
> share the RpcEngine that does the async magic.
> Updated design:
> The original design would end up forcing lots of reconnects. That is not a
> huge issue on an unauthenticated cluster, but on a kerberized cluster it is a
> recipe for Kerberos thinking we're attempting a replay attack.
> User-visible cancellation and internal resource cleanup are separable
> issues. The former can be implemented by atomically swapping the callback of
> the operation to be canceled with a no-op callback. The original callback is
> then posted to the IoService with an OperationCanceled status and the user is
> no longer blocked. For RPC cancels this is sufficient: it's not expensive to
> keep a request around a little bit longer, and when it's eventually invoked
> or timed out it invokes the no-op callback and is ignored (other than a
> trace-level log notification). Connect cancels push a flag down into the RPC
> engine to kill the connection and make sure it doesn't attempt to reconnect.
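
The callback swap described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `ToyIoService`, `PendingOperation`, `Status`, and `DemoCancel` are hypothetical stand-ins invented for this example and are not libhdfs++ types. The toy queue is single-threaded; only the swap itself is guarded by a mutex.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-ins for illustration only; not the libhdfs++ types.
enum class Status { kOk, kOperationCanceled };
using Callback = std::function<void(Status)>;

// Toy task queue standing in for the IoService (not thread-safe).
class ToyIoService {
 public:
  void Post(std::function<void()> task) { tasks_.push(std::move(task)); }

  // Runs every queued task on the calling thread.
  void RunAll() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      task();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// One in-flight request whose user callback can be swapped out.
class PendingOperation {
 public:
  PendingOperation(ToyIoService* io, Callback cb)
      : io_(io), callback_(std::move(cb)) {}

  // Called when the request finally completes or times out.
  // After a cancel this reaches the no-op and is ignored.
  void Complete(Status status) { TakeCallback()(status); }

  // User-visible cancel: swap in a no-op, then post the original
  // callback with a canceled status so the caller is unblocked.
  void Cancel() {
    Callback original = TakeCallback();
    io_->Post([original]() { original(Status::kOperationCanceled); });
  }

 private:
  // Atomically (under the mutex) replace the stored callback with a no-op
  // and hand back whatever was stored before.
  Callback TakeCallback() {
    std::lock_guard<std::mutex> lock(mutex_);
    return std::exchange(callback_, Callback([](Status) {}));
  }

  ToyIoService* io_;
  std::mutex mutex_;
  Callback callback_;
};

// Returns how many times the user's callback ran in total.
int DemoCancel() {
  ToyIoService io;
  int canceled = 0;
  int completed = 0;
  PendingOperation op(&io, [&](Status s) {
    if (s == Status::kOperationCanceled) ++canceled; else ++completed;
  });

  op.Cancel();
  assert(canceled == 0);     // only posted so far, not yet run
  io.RunAll();
  assert(canceled == 1);     // user notified of the cancel
  op.Complete(Status::kOk);  // late reply lands on the no-op
  assert(completed == 0);
  return canceled + completed;
}
```

In this sketch the late `Complete` call costs nothing beyond invoking an empty lambda, which matches the point that keeping a canceled RPC request around a little longer is cheap. Connect cancels would need the additional flag mentioned in the description, which is not modeled here.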



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
