Marton Elek created HDDS-3902:
---------------------------------

             Summary: OM HA client failover switches to a wrong OM server
                 Key: HDDS-3902
                 URL: https://issues.apache.org/jira/browse/HDDS-3902
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
          Components: OM HA
            Reporter: Marton Elek


Found this problem with the PR/branch HDDS-3878, but it seems to be independent.

1. ozone sh volume create /vol1 works well with HA
2. ozone freon omkg (rpc client) doesn't work

{code}
ozone freon omkg | grep "Failing over"
2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
to index: 1, nodeId: om2
2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
to index: 2, nodeId: om3
2020-06-30 14:15:34 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
to index: 0, nodeId: omNodeIdDummy
{code}

om2 seems to be the leader, but for some reason the failover logic switches 
back to an unknown node (?)
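
To make the order of the failovers easier to read, the three "Failing over" lines above can be reduced to their (index, nodeId) pairs. The snippet below is only a debugging sketch, not part of Ozone: the helper names `failover_sequence` and `is_round_robin` are made up for illustration, and the cluster size of 3 is an assumption taken from the indexes 0, 1 and 2 seen in the log.

```python
import re

# Matches the OMFailoverProxyProvider DEBUG line quoted above. The \s+ tolerates
# the line wrap that appears before "to index" in the mailed log.
FAILOVER = re.compile(r"Failing over OM proxy\s+to index:\s*(\d+),\s*nodeId:\s*(\w+)")


def failover_sequence(log_text):
    """Return the (index, nodeId) pairs in the order the client tried them."""
    return [(int(index), node) for index, node in FAILOVER.findall(log_text)]


def is_round_robin(indices, size):
    """True if every index is the previous one plus one, modulo size."""
    return all(b == (a + 1) % size for a, b in zip(indices, indices[1:]))


# Log excerpt copied from the report, including the wrapped lines.
LOG = """\
2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
to index: 1, nodeId: om2
2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
to index: 2, nodeId: om3
2020-06-30 14:15:34 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
to index: 0, nodeId: omNodeIdDummy
"""

sequence = failover_sequence(LOG)
print(sequence)
# Assumption: three OM proxies (indexes 0..2), as seen in the log.
print(is_round_robin([index for index, _ in sequence], 3))
```

On this excerpt the indexes go 1, 2, 0, which is consistent with a plain increment-and-wrap over three proxies; the last stop is the node the client only knows as omNodeIdDummy.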


{code}
2020-06-30 14:16:35 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
to index: 2, nodeId: om3
2020-06-30 14:16:35 DEBUG Client:63 - getting client out of cache: 
org.apache.hadoop.ipc.Client@f5acb9d
2020-06-30 14:16:35 DEBUG Client:497 - The ping interval is 60000 ms.
2020-06-30 14:16:35 DEBUG Client:795 - Connecting to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862
2020-06-30 14:16:35 DEBUG Client:1074 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root: 
starting, having connections 3
2020-06-30 14:16:35 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
sending #0 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
value #0
2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 439ms
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
sending #1 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
value #1
2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 2ms
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
sending #2 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
value #2
2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
sending #3 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
value #3
2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
2020-06-30 14:16:36 DEBUG Client:63 - getting client out of cache: 
org.apache.hadoop.ipc.Client@f5acb9d
2020-06-30 14:16:36 DEBUG Groups:312 - GroupCacheLoader - load.
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #5 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #11 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #8 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #12 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #10 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #6 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #9 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #7 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #4 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
sending #13 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got 
value #5
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got 
value #8
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got 
value #11
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got 
value #10
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got 
value #12
2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got 
value #7
2020-06-30 14:16:36 DEBUG Hadoop3OmTransport:140 - RetryProxy: OM:om1 is not 
the leader. Suggested leader is OM:om3.
        at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createNotLeaderException(OzoneManagerProtocolServerSideTranslatorPB.java:198)
        at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:141)
        at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
        at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:113)
        at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)

2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got 
value #9
2020-06-30 14:16:36 DEBUG OMFailoverProxyProvider:299 - Incrementing OM proxy 
index to 0, nodeId: omNodeIdDummy
{code}

As you can see, after a few failovers om2 was finally found and a few 
requests were handled. But after that the client switched back to om0 
(???)
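
The log above contains a hint from the server ("OM:om1 is not the leader. Suggested leader is OM:om3."), yet the next line shows the proxy index being incremented to 0. A minimal sketch of the selection one might expect instead is given below. It is an assumption about the desired behaviour, not the actual OMFailoverProxyProvider code: the function names are hypothetical, and the node ID list simply mirrors the index-to-nodeId mapping seen in the DEBUG lines.

```python
import re
from typing import Optional, Sequence

# Pattern of the not-leader message as it appears in the log above. The \s+
# tolerates the line wrap between "not" and "the leader".
NOT_LEADER = re.compile(
    r"OM:(?P<current>\w+) is not\s+the leader\.\s+Suggested leader is OM:(?P<suggested>\w+)\."
)


def suggested_leader(message: str) -> Optional[str]:
    """Extract the suggested leader node ID, or None if the message has none."""
    match = NOT_LEADER.search(message)
    return match.group("suggested") if match else None


def next_proxy_index(node_ids: Sequence[str], current: int, message: str) -> int:
    """Prefer the suggested leader if the client knows it, else increment and wrap."""
    leader = suggested_leader(message)
    if leader in node_ids:
        return node_ids.index(leader)
    return (current + 1) % len(node_ids)


# Client-side view taken from the log: index 0 -> omNodeIdDummy, 1 -> om2, 2 -> om3.
NODE_IDS = ["omNodeIdDummy", "om2", "om3"]
MESSAGE = "RetryProxy: OM:om1 is not \nthe leader. Suggested leader is OM:om3."

print(next_proxy_index(NODE_IDS, 0, MESSAGE))
```

With the message from the log, this sketch selects index 2 (om3) rather than wrapping around to index 0. Note that the server calls the first node om1 while the client lists it as omNodeIdDummy, so a suggestion naming om1 would not be resolvable on the client side and would fall back to the increment.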



