[ https://issues.apache.org/jira/browse/HDFS-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410458#comment-16410458 ]

Hudson commented on HDFS-11028:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13869 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13869/])
HDFS-11028: libhdfs++: FileSystem needs to be able to cancel pending (james.clampffer: rev 8783461e2ec3aff2630ea3574a69beb1d5c61e84)
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cpp/connect_cancel/CMakeLists.txt
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/continuation.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filesystem.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/namenode_operations.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/namenode_info.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/rpc/rpc_engine.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/rpc/rpc_engine.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/namenode_operations.cc
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/c/connect_cancel/connect_cancel.c
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/fs/filesystem.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/hdfspp/hdfspp.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cpp/CMakeLists.txt
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cpp/connect_cancel/connect_cancel.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/asio.h
* (add) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/c/connect_cancel/CMakeLists.txt
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/c/CMakeLists.txt
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/hdfspp/hdfs_ext.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/bindings/c/hdfs.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/hdfs_configuration.cc


> libhdfs++: FileSystem needs to be able to cancel pending connections
> --------------------------------------------------------------------
>
>                 Key: HDFS-11028
>                 URL: https://issues.apache.org/jira/browse/HDFS-11028
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>            Priority: Major
>         Attachments: HDFS-11028.HDFS-8707.000.patch,
> HDFS-11028.HDFS-8707.001.patch, HDFS-11028.HDFS-8707.002.patch,
> HDFS-11028.HDFS-8707.003.patch, HDFS-11028.HDFS-8707.004.patch
>
>
> Cancel support is now reasonably robust except the case where a FileHandle
> operation ends up causing the RpcEngine to try to create a new RpcConnection.
> In HA configs it's common to have something like 10-20 failovers and a 20
> second failover delay (no exponential backoff just yet). This means that all
> of the functions with synchronous interfaces can still block for many minutes
> after an operation has been canceled, and often the cause of this is
> something trivial like a bad config file.
> The current design makes this sort of thing tricky to do because the
> FileHandles need to be individually cancelable via CancelOperations, but they
> share the RpcEngine that does the async magic.
> Updated design:
> Original design would end up forcing lots of reconnects.
> Not a huge issue on an unauthenticated cluster but on a kerberized cluster
> this is a recipe for Kerberos thinking we're attempting a replay attack.
> User visible cancellation and internal resources cleanup are separable
> issues. The former can be implemented by atomically swapping the callback of
> the operation to be canceled with a no-op callback. The original callback is
> then posted to the IoService with an OperationCanceled status and the user is
> no longer blocked. For RPC cancels this is sufficient, it's not expensive to
> keep a request around a little bit longer and when it's eventually invoked or
> timed out it invokes the no-op callback and is ignored (other than a trace
> level log notification). Connect cancels push a flag down into the RPC
> engine to kill the connection and make sure it doesn't attempt to reconnect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
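
The callback swap described in the updated design above can be sketched roughly as follows. This is only an illustration, not the code from the patch: Status, ToyIoService and CancelableOp are hypothetical names invented for this example, the queue is a toy stand-in for the real IoService, and a mutex is assumed as the way to make the swap atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical status type for this sketch; not the libhdfs++ Status class.
enum class Status { kOk, kOperationCanceled };

using Callback = std::function<void(Status)>;

// Toy stand-in for an IoService: queues handlers and runs them on demand.
class ToyIoService {
 public:
  void Post(std::function<void()> fn) {
    std::lock_guard<std::mutex> lock(mu_);
    queue_.push_back(std::move(fn));
  }

  // Runs every queued handler and returns how many were run.
  std::size_t RunAll() {
    std::size_t ran = 0;
    for (;;) {
      std::function<void()> fn;
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (queue_.empty()) return ran;
        fn = std::move(queue_.front());
        queue_.pop_front();
      }
      fn();  // Invoke outside the lock so handlers may post more work.
      ++ran;
    }
  }

 private:
  std::mutex mu_;
  std::deque<std::function<void()>> queue_;
};

// One pending operation whose user callback can be swapped out on cancel.
class CancelableOp {
 public:
  CancelableOp(ToyIoService* io, Callback cb) : io_(io), cb_(std::move(cb)) {}

  // Called by the (simulated) RPC layer when the request finishes or times
  // out. After a cancel this only reaches the no-op callback.
  void Complete(Status s) { Take()(s); }

  // Swap in a no-op, then post the original callback with a canceled status
  // so the waiting user is unblocked without tearing down the request.
  void Cancel() {
    Callback original = Take();
    io_->Post([original]() { original(Status::kOperationCanceled); });
  }

 private:
  // Exchanges the stored callback with a no-op while holding the lock and
  // returns whatever was stored before.
  Callback Take() {
    Callback taken = [](Status) {};
    std::lock_guard<std::mutex> lock(mu_);
    std::swap(taken, cb_);
    return taken;
  }

  ToyIoService* io_;
  std::mutex mu_;
  Callback cb_;
};
```

In this sketch a caller would construct a CancelableOp with its callback, call Cancel() from another thread, and see the callback run with the canceled status the next time the service runs its queue. A later Complete() from the RPC side then reaches only the no-op. The connect-cancel flag mentioned in the description is not modeled here.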