Keith Turner created THRIFT-4847:
------------------------------------

             Summary: CancelledKeyException causes TThreadedSelectorServer to fail.
                 Key: THRIFT-4847
                 URL: https://issues.apache.org/jira/browse/THRIFT-4847
             Project: Thrift
          Issue Type: Bug
          Components: Java - Library
    Affects Versions: 0.12.0
            Reporter: Keith Turner

When attempting to use TThreadedSelectorServer I see the following exception, after which the server becomes inoperable.

{noformat}
2019-04-03 11:50:37,638 [server.TThreadedSelectorServer] ERROR: run() on SelectorThread exiting due to uncaught error
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:82)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.changeSelectInterests(AbstractNonblockingServer.java:440)
        at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.processInterestChanges(AbstractNonblockingServer.java:191)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:548)
{noformat}

I tracked this down and I think it is caused by the following sequence of events:

# A frame buffer is created and given a selection key [TThreadedSelectorServer.java line 691|https://github.com/apache/thrift/blob/v0.12.0/lib/java/src/org/apache/thrift/server/TThreadedSelectorServer.java#L691]
# The rebuild selector code introduced in THRIFT-4251 is triggered, and all of the selector's keys are cancelled when the selector is closed [TThreadedSelectorServer.java line 668|https://github.com/apache/thrift/blob/v0.12.0/lib/java/src/org/apache/thrift/server/TThreadedSelectorServer.java#L668]
# A frame buffer attempts to modify its now-invalid selection key, causing an exception [AbstractNonblockingServer.java line 440|https://github.com/apache/thrift/blob/v0.12.0/lib/java/src/org/apache/thrift/server/AbstractNonblockingServer.java#L440]

I added some logging and found that {{selector.select()}} would return 0 hundreds of times in a row, but not indefinitely. I changed SELECTOR_AUTO_REBUILD_THRESHOLD from 512 to 1,000,000 and the bug did not happen. I don't think this change is the fix; it is just what I did as part of debugging. I am not sure what the best fix is.

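The JDK behavior behind steps 2 and 3 can be reproduced without Thrift. The sketch below is only an illustration under stated assumptions: the class {{CancelledKeyDemo}} and its methods are hypothetical names, not Thrift code, and a {{Pipe}} stands in for a client connection. It registers a channel, closes the selector (which invalidates the key), and then shows that {{interestOps}} throws {{CancelledKeyException}}. The {{tryChangeInterest}} guard is one possible defensive pattern, not a proposed or confirmed fix for this issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; none of these names exist in Thrift.
final class CancelledKeyDemo {

    /** Registers a channel with a selector, then closes the selector, invalidating the key. */
    static SelectionKey keyFromClosedSelector() {
        try {
            Selector selector = Selector.open();
            Pipe pipe = Pipe.open(); // stand-in for a client connection
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
                selector.close(); // analogous to the old selector being closed during a rebuild
                return key;
            } finally {
                selector.close(); // no effect if already closed
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if an unguarded interestOps call throws CancelledKeyException. */
    static boolean rawChangeThrows(SelectionKey key) {
        try {
            key.interestOps(SelectionKey.OP_READ);
            return false;
        } catch (CancelledKeyException e) {
            return true;
        }
    }

    /**
     * Guarded variant: returns false instead of throwing when the key is no longer valid.
     * The catch covers a key that is cancelled between the isValid() check and the call.
     */
    static boolean tryChangeInterest(SelectionKey key, int ops) {
        if (!key.isValid()) {
            return false;
        }
        try {
            key.interestOps(ops);
            return true;
        } catch (CancelledKeyException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SelectionKey stale = keyFromClosedSelector();
        System.out.println("key valid after selector close: " + stale.isValid());
        System.out.println("unguarded interestOps threw: " + rawChangeThrows(stale));
        System.out.println("guarded change applied: " + tryChangeInterest(stale, SelectionKey.OP_READ));
    }
}
```

A guard like this would only stop the selector thread from dying. The affected connection would still hold a key from the closed selector, so a real fix would presumably also need to re-register the frame buffer with the rebuilt selector or close the connection.
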
The trigger seems to be a large number of connections arriving in a very short time period.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)