Thanks for the quick response, Emmanuel!
> * According to "SessionState state = getState(session)", the session is
> > OPENED, which I think is a lie, and perhaps the root of the problem.
> The only reason for a session state to be different from OPENED is when
> the associated SelectionKey is no longer valid - or if we don't have a
> SelectionKey yet. So, no, it's not necessarily a lie ;-)
>
Yeah, going back and re-reading the code, you're right that that's how it
works - I initially thought it was strange that the session was apparently
in a good, writable state, even though it clearly wasn't ever going to
recover and be able to write data.
>
> No, not a liar. But the problem is that the session should have been removed
> from the session to provide, and the SelectionKey has been set to be
> ready for OP_WRITE events. As the SelectionKey is not removed, it will
> be ready for a write, no matter what, thus the infinite loop.
>
Perhaps I misspoke - I meant that, while the method does (rightfully) say
that the channel is good to write to, there's evidence elsewhere that the
session is not writable (i.e. we can't write to it; it throws an exception).
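
To make the "ready for a write, no matter what" behaviour concrete, here is a
minimal plain-NIO sketch (not MINA code; the class and method names are made
up for illustration). It assumes a loopback connection whose send buffer has
room, registers the client channel for OP_WRITE, and counts how often
selectNow() reports the key as ready before and after the key is cancelled.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of MINA.
class WriteReadySpin {

    // Runs `rounds` non-blocking selects and counts how many reported a ready key.
    static int countReadyRounds(Selector selector, int rounds) throws IOException {
        int ready = 0;
        for (int i = 0; i < rounds; i++) {
            if (selector.selectNow() > 0) {
                ready++;
            }
            // Clear the selected set so the next round reports the key again.
            selector.selectedKeys().clear();
        }
        return ready;
    }

    // Returns { ready rounds before cancel, ready rounds after cancel }.
    static int[] demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                client.configureBlocking(false);
                // Interest in OP_WRITE on a socket with free send-buffer space.
                SelectionKey key = client.register(selector, SelectionKey.OP_WRITE);
                int before = countReadyRounds(selector, 5);
                // Deregister the channel from the selector.
                key.cancel();
                int after = countReadyRounds(selector, 5);
                return new int[] { before, after };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = demo();
        System.out.println("ready before cancel: " + r[0] + ", after cancel: " + r[1]);
    }
}
```

I would expect the key to come back ready in every round until it is
cancelled, which is the same shape as the loop described above.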
> Question : are you closing the session at some point ?
>
It's hard to say. We have yet to reproduce this in our testing/dev
environments, so it's hard to tell how it first dropped into this loop. It
could have been closed gracefully (from inside mina), externally, or not at
all.
> Otherwise, I suggest you add this line in the AbstractPollingIoProcessor
> class, line 927 :
>
> try {
>     localWrittenBytes = write(session, buf, length);
> } catch (IOException ioe) {
>     // We have had an issue while trying to send data to the
>     // peer : let's close the session.
>     buf.free();
>     session.close(true);
>
>     destroy(session); // <<<<<<<<<<<<<<<<<---- This line
>
>     return 0;
> }
>
> Can you give that a try ?
>
Thanks, we'll give it a try. What version of mina are you basing that
change on? Would I be right in assuming that you copied that from the
top-of-tree 2.0 branch? I don't see the "buf.free()" or "return 0" lines
in the 2.0.7 code. How important are those changes?
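
For reference, this is how I understand the intent of the change, written as
a plain-NIO analogue rather than MINA's actual code. The class name and
writeOrDestroy are hypothetical; cancelling the key stands in for
destroy(session), and shutdownOutput() is only there to force a write
failure in the demo.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical sketch, not part of MINA.
class WriteFailureCleanup {

    // Tries to write; on failure, deregisters the key and closes the channel
    // so the selector loop never sees this channel again. Returns bytes written.
    static int writeOrDestroy(SelectionKey key, ByteBuffer buf) {
        SocketChannel channel = (SocketChannel) key.channel();
        try {
            return channel.write(buf);
        } catch (IOException ioe) {
            key.cancel();
            try {
                channel.close();
            } catch (IOException ignored) {
                // Nothing more we can do for this channel.
            }
            return 0;
        }
    }

    // Returns { write reported 0 bytes, key still valid, channel still open }.
    static boolean[] demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                client.configureBlocking(false);
                SelectionKey key = client.register(selector, SelectionKey.OP_WRITE);
                // Force the next write to fail while the key is still registered.
                client.shutdownOutput();
                int written = writeOrDestroy(key, ByteBuffer.wrap(new byte[] { 1, 2, 3 }));
                return new boolean[] { written == 0, key.isValid(), client.isOpen() };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = demo();
        System.out.println("zero bytes: " + r[0] + ", key valid: " + r[1] + ", open: " + r[2]);
    }
}
```

If that reading is right, the important part is that a failed write leaves
nothing registered for OP_WRITE.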
> 2.0.8 is not out anyway...
We're not above using unreleased versions if they fix critical bugs... :)
Thanks,
Joshua