On 4/15/13 6:16 PM, Joshua Warner wrote:
> Hi all,
>
> I'm seeing a strange issue with a (practically, for these purposes) stock
> Mina 2.0.7. Basically, every once in a while (in production only, of
> course), a couple of threads will go off into the weeds, consuming 100%
> CPU, and never come back. They're sitting in an infinite loop that looks
> something like this:
>
> * We're chugging merrily along in the main processing loop
> (AbstractPollingIoProcessor.java:1070)
This is where we start processing a SelectionKey which is active (either
for READ or WRITE).
> * Decide to flush the single session in flushingSessions
> (AbstractPollingIoProcessor.java:773, :1129)
This is where we process the write, if any. At this point, we have no
idea if the SelectionKey is ready for write. If - and only if - a
SelectionKey is ready for write, then the session has been added to the
flushingSessions queue.
> * According to "SessionState state = getState(session)", the session is
> OPENED, which I think is a lie, and perhaps the root of the problem.
The only reason for a session state to be different from OPENED is when
the associated SelectionKey is no longer valid - or if we don't have a
SelectionKey yet. So, no, it's not necessarily a lie ;-)
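To make that rule concrete, here is a minimal sketch in plain java.nio. It is not a copy of the MINA source: the class SessionStateSketch, the State enum and the stateOf/demo methods are names I made up for the illustration, and the Pipe is only there to obtain a real SelectionKey.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical illustration, not MINA code.
class SessionStateSketch {
    enum State { OPENING, OPENED, CLOSING }

    // The state is derived from the key alone: no key yet, a valid key, or an invalid key.
    static State stateOf(SelectionKey key) {
        if (key == null) {
            return State.OPENING;
        }
        return key.isValid() ? State.OPENED : State.CLOSING;
    }

    // Returns the states seen with no key, with a fresh key, and after cancelling the key.
    static State[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.sink().configureBlocking(false);
                SelectionKey key = pipe.sink().register(selector, SelectionKey.OP_WRITE);
                State withValidKey = stateOf(key);
                key.cancel(); // a cancelled key is invalid immediately
                State withCancelledKey = stateOf(key);
                return new State[] { stateOf(null), withValidKey, withCancelledKey };
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch: as long as nothing cancels the key or closes the channel, a check of this kind keeps answering OPENED, even if a write on that channel has just failed.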
> * Enter "flushNow" (AbstractPollingIoProcessor.java:821, :789, :1129)
The place where we will try to flush as many messages as possible - if
the socket can absorb all of them.
> * Begin processing a queued message (AbstractPollingIoProcessor.java:861,
> :789, :1129)
> * Try to write out the message in writeBuffer
> (AbstractPollingIoProcessor.java:931, :861, :789, :1129)
Here, we try to write the data into the channel.
> * Catch an IOException ("Broken pipe") in writeBuffer
> (AbstractPollingIoProcessor.java:935, :861, :789, :1129), call
> "session.close(true)"
So the channel is no longer available... We are trying to close the
session now.
> * Next time around the loop at (AbstractPollingIoProcessor.java:1070), the
> session is put back in flushingSessions, because apparently the session is
> still writable (liar!) (AbstractPollingIoProcessor.java:671, :653, :1124)
No, not a liar. But the problem is that the session should have been
removed from the sessions to process, and its SelectionKey is still
registered for OP_WRITE events. As the SelectionKey is not removed, it
will be ready for a write, no matter what, hence the infinite loop.
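The "ready no matter what" part can be reproduced outside MINA. The following is a rough sketch with plain java.nio only; the class WriteReadyLoopSketch and the method countWriteReadyRounds are hypothetical names, and an empty Pipe stands in for the socket. It counts how many select rounds report the channel as writable, with and without cancelling the key after the first round.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical illustration, not MINA code.
class WriteReadyLoopSketch {
    // Runs maxRounds select rounds on a channel registered for OP_WRITE and
    // returns how many of those rounds reported the channel as write-ready.
    static int countWriteReadyRounds(int maxRounds, boolean cancelAfterFirstRound) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.sink().configureBlocking(false);
                SelectionKey key = pipe.sink().register(selector, SelectionKey.OP_WRITE);
                int readyRounds = 0;
                for (int round = 0; round < maxRounds; round++) {
                    selector.selectNow();
                    if (selector.selectedKeys().contains(key) && key.isValid() && key.isWritable()) {
                        readyRounds++;
                    }
                    // Clear the selected set so the next round starts fresh.
                    selector.selectedKeys().clear();
                    if (cancelAfterFirstRound && round == 0) {
                        key.cancel(); // deregistered on the next select
                    }
                }
                return readyRounds;
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the cancel, every round sees the key as writable, which is the shape of the busy loop described here. Once the key is cancelled, the following rounds no longer report it.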
> * Repeat, Ad Infinitum!
>
> Unfortunately, I don't know how it first gets into that loop - which I
> think has a lot to do with how the first call to AbstractIoSession.close()
> is processed.
Looking at the code, I *think* there is something extremely fishy. The
flushingSessions queue is always fed with the same session, something
that should never happen if the SelectionKey is invalid. The problem is
that we don't come back with a clear status when we have a problem.
>
> Any idea on how to further debug this? Is there a simple fix - perhaps
> something that's already scheduled for the next release?
Question: are you closing the session at some point?
Otherwise, I suggest you add this line in the AbstractPollingIoProcessor
class, line 927:
    try {
        localWrittenBytes = write(session, buf, length);
    } catch (IOException ioe) {
        // We have had an issue while trying to send data to the
        // peer : let's close the session.
        buf.free();
        session.close(true);
        destroy(session); // <<<<<<<<<<<<<<<<<---- This line
        return 0;
    }
Can you give that a try?
> I'd love to try
> 2.0.8, but sadly, since this is only thus far reproducible in production,
> I'll probably need significant evidence that this is fixed in that
> version...
2.0.8 is not out anyway...
>
> Thanks,
> Joshua
>
--
Regards,
Emmanuel Lécharny
www.iktek.com