Andrew Purtell created HBASE-10121:
--------------------------------------

             Summary: Abort wedged Calls after a timeout
                 Key: HBASE-10121
                 URL: https://issues.apache.org/jira/browse/HBASE-10121
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.94.11
            Reporter: Andrew Purtell
         Attachments: screenshot.jpg

Saw this in a mail to user@:

"REPL IPC Server handler $N on $PORT WAITING Waiting for a call (since 22 hrs, 57mins, 38sec ago)"

I don't think this is a TCP-level issue: we enable keepalives on connections by default. Either we failed to remove the call upon exception, or the remote is alive but not sending. Looking at the IPC server code, I don't see where we abort and clean up wedged Calls after some timeout. Regardless of the other issues here, should we do that?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)