I ran into the same problem on Tomcat 9.0.74 recently, and I think I have found
the cause.

Our case is:
1. Open several Chrome tabs. Each tab establishes a websocket connection and
a websocket session with Tomcat. To keep the connection and session alive,
a JS timer sends a STOMP heartbeat message to the Tomcat server every
10 seconds, and Tomcat sends a STOMP heartbeat to Chrome every 10 seconds
too. The timeout is 30 seconds on both sides. JS establishes a new websocket
connection if the old connection is closed.
Open dev tools in each tab to observe and record the websocket connections.

2. Wait a few minutes without doing anything. We may find that:
   1) AbstractProtocol.waitingProcessors probably leaks.
   2) a hidden Chrome tab has established several websocket connections; only
   one is alive, the others were closed by the Tomcat server.
   3) looking at the closed websocket connections carefully, the
   heartbeats from the server are normal, but there is no heartbeat to the
   server in the last 30 seconds before the connection is closed.
   4) many TCP connections are in TIME_WAIT state.
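
Observation 3) is what a plain idle-timeout check would produce if the client
heartbeat stops arriving on schedule. Here is a rough sketch of that
arithmetic. It is a toy simulation, not Tomcat code: the class and method
names are made up, the check is assumed to run once per simulated second, and
the 60-second interval simply models a timer that only fires once per minute.

```java
// Toy simulation of a server-side idle-timeout check; not Tomcat code.
class IdleTimeoutSketch {

    /**
     * Returns the first second at which the server would expire the session,
     * or -1 if the session stays alive for the whole simulated window.
     *
     * clientIntervalSec: how often the client heartbeat actually fires
     * timeoutSec:        server-side idle timeout
     * windowSec:         how long to simulate
     */
    static long firstExpirySecond(long clientIntervalSec, long timeoutSec, long windowSec) {
        long lastHeartbeat = 0; // connection established at t = 0
        for (long now = 1; now <= windowSec; now++) {
            if (now % clientIntervalSec == 0) {
                lastHeartbeat = now; // a heartbeat arrives from the client
            }
            // Assumed: the expiry check runs once per simulated second.
            if (now - lastHeartbeat > timeoutSec) {
                return now;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstExpirySecond(10, 30, 300)); // prints -1
        System.out.println(firstExpirySecond(60, 30, 300)); // prints 31
    }
}
```

With a 10-second heartbeat the 30-second timeout is never reached. With the
timer delayed to once per 60 seconds, the session expires at second 31, before
the first delayed heartbeat is sent.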


The leak may happen when a WsSession expires on the server side.

I think the process is:

1. Chrome's Intensive Throttling prevents the JS timer in a hidden tab from
sending heartbeat messages for up to 1 minute.

2. Tomcat's WsBackgroundThread checks WsSession expiration every second. The
WsSession expires, Tomcat sends a close message to the client (Chrome), and
the client sends a close message in response.

3. To fix the BZ 66508 deadlocks
(https://bz.apache.org/bugzilla/show_bug.cgi?id=66508),
WsRemoteEndpointImplServer releases control of the processor
(UpgradeProcessorInternal for websocket) and the socket lock, then retakes
control. The fix may set socketWrapper.currentProcessor to null when there is
contention on the semaphore (messagePartInProgress).
Now the WsSession is OUTPUT_CLOSED while the socket is not closed.

4. The client sends a close message or a normal message to Tomcat, but
socketWrapper.currentProcessor is now null instead of an
UpgradeProcessorInternal, so AbstractProtocol/Http11NioProtocol takes an
Http11Processor to process the websocket message. This causes a protocol
error, which leads Tomcat to close the socket immediately.
Now the WsSession is OUTPUT_CLOSED and the socket is closed.
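
To show why an HTTP processor would reject the data in step 4, here is a very
simplified sketch. It is not Tomcat's parser: the class name and the check
are made up, and the check only looks at whether the first byte could start
an HTTP method such as GET. The websocket bytes follow the RFC 6455 framing
for a masked client close frame with an empty payload; the masking key is
arbitrary.

```java
import java.nio.charset.StandardCharsets;

// Simplified illustration; Tomcat's real HTTP parser is far more thorough.
class FrameVersusHttpSketch {

    // Simplification: standard HTTP methods start with an uppercase ASCII
    // letter, so anything else cannot begin a request line like "GET / ...".
    static boolean startsLikeHttpRequest(byte[] data) {
        if (data.length == 0) {
            return false;
        }
        int first = data[0] & 0xFF;
        return first >= 'A' && first <= 'Z';
    }

    public static void main(String[] args) {
        byte[] httpRequest = "GET /chat HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);

        // Masked client close frame, empty payload:
        // 0x88 = FIN + opcode 0x8 (close), 0x80 = MASK bit + length 0,
        // followed by a 4-byte masking key (arbitrary values here).
        byte[] wsClose = {(byte) 0x88, (byte) 0x80, 0x12, 0x34, 0x56, 0x78};

        System.out.println(startsLikeHttpRequest(httpRequest)); // prints true
        System.out.println(startsLikeHttpRequest(wsClose));     // prints false
    }
}
```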

Normally, the processor is released by SocketWrapperBase.close():
SocketWrapperBase removes its currentProcessor from
AbstractProtocol.waitingProcessors. But the currentProcessor is null now and
thus cannot be removed.

After that, there is no further chance to remove the UpgradeProcessorInternal
of the expired WsSession.
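
The bookkeeping problem can be reproduced in a small toy model. This is only
a sketch of the mechanism as I understand it, not Tomcat code: the class
names are hypothetical, a plain Set stands in for
AbstractProtocol.waitingProcessors, and an AtomicReference stands in for
socketWrapper.currentProcessor.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the leak; names are hypothetical and only mirror the roles
// of the Tomcat classes discussed in this mail.
class WaitingProcessorLeakSketch {

    // Stands in for AbstractProtocol.waitingProcessors.
    static final Set<Object> waitingProcessors = ConcurrentHashMap.newKeySet();

    // Stands in for a socket wrapper that tracks its current processor.
    static class ToySocketWrapper {
        final AtomicReference<Object> currentProcessor = new AtomicReference<>();

        // Mirrors the cleanup on close: remove the current processor, if any.
        void close() {
            Object processor = currentProcessor.getAndSet(null);
            if (processor != null) {
                waitingProcessors.remove(processor);
            }
        }
    }

    /** Returns how many processors remain registered after the socket is closed. */
    static int simulate(boolean referenceClearedBeforeClose) {
        waitingProcessors.clear();
        Object upgradeProcessor = new Object();
        ToySocketWrapper wrapper = new ToySocketWrapper();

        waitingProcessors.add(upgradeProcessor);
        wrapper.currentProcessor.set(upgradeProcessor);

        if (referenceClearedBeforeClose) {
            // Step 3: the reference is dropped and never restored.
            wrapper.currentProcessor.set(null);
        }
        wrapper.close(); // step 4: the socket is closed after the protocol error
        return waitingProcessors.size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints 0: processor removed on close
        System.out.println(simulate(true));  // prints 1: processor leaked
    }
}
```

The first call models a close where the reference is still in place, the
second models the state after step 3, where close() has nothing to remove.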


Here is my solution:
I think the key point is that socketWrapper.currentProcessor should not be set
to null when the WsSession expires. socketWrapper.currentProcessor is changed
by setCurrentProcessor() and takeCurrentProcessor(), both of which are invoked
during client message processing and protected by socketWrapper.lock.

I've created a PR. Please review and check it, thanks.

https://github.com/apache/tomcat/pull/683

Liang
