Update: I've found the cause of the problem (it wasn't what I originally
thought, but it does require patching MINA a bit) and a strategy to fix it. The
rest of this message is basically FYI. Details of my intended patch are in the
last paragraph.
Earlier, I wrote:
> The scheduleFlush() call returns false because AbstractPollingIoSession#
> setScheduledForFlush(true) does. Apparently, the AbstractIoSession#
> scheduledForFlush boolean is still set to true
Since I wrote that, I've narrowed the problem down a bit. Part of the problem
was in the way I was testing my ARQ code: I simply started the client but no
server, then sent a packet and waited for the time-outs to flow by. What I
neglected to mention in my original post was the PortUnreachableException that
gets thrown when there's nothing at the endpoint address.
As these things go, it was *of course* this exception and not that boolean
flag that was ultimately causing the writes to stall. I wrote a dummy
program that opens a DatagramSocket on the server port number but never
sends any ACK packets. With that running, the client immediately writes its
retries to the network the way I expect it to.
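A dummy endpoint of that kind can be sketched roughly as follows. This is an
illustration rather than the actual test program: the class name
SilentUdpSink, the drain() helper and the default port 9999 are all
placeholders. The point is only that something is bound to the port, so no
ICMP Port Unreachable comes back, while no ACK is ever sent.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;

/** Hypothetical stand-in for the dummy server: binds the port, reads, never replies. */
public class SilentUdpSink {

    /**
     * Receives and discards datagrams until maxPackets have arrived or the
     * socket's receive time-out (if one is set) expires.
     * Returns the number of datagrams that were discarded.
     */
    static int drain(DatagramSocket socket, int maxPackets) throws IOException {
        byte[] buf = new byte[2048];
        int received = 0;
        while (received < maxPackets) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            try {
                socket.receive(packet); // blocks until a datagram arrives
            } catch (SocketTimeoutException e) {
                break; // only happens if the caller set SO_TIMEOUT
            }
            received++; // deliberately no reply: the client never sees an ACK
        }
        return received;
    }

    public static void main(String[] args) throws IOException {
        // 9999 is a placeholder; pass the real server port as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9999;
        try (DatagramSocket socket = new DatagramSocket(port)) {
            drain(socket, Integer.MAX_VALUE);
        }
    }
}
```

Run it with the server port as its only argument, then start the client
against that port. The client's retries should go out without any
PortUnreachableException being raised.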
Here's the relevant bit of code from AbstractPollingIoProcessor#flush():
> try {
>     boolean flushedAll = flushNow(session);
>     // (schedule next flush if it didn't write all of it at once)
> }
> catch (Exception e) {
>     scheduleRemove(session);
>     session.getFilterChain().fireExceptionCaught(e);
> }
I assume the scheduleRemove(session) call in that catch block is ultimately
responsible for my earlier problem.
With that cleared up, this brings me to a second problem, and this is where
it gets interesting.
When it goes live, this application is going to be, by design, a
second-class citizen on the network on which it operates. Whenever it gets
muted for higher-priority data, the network driver simply drops its data on
the client side and generates ICMP Unreachable errors to notify the calling
application.
This means that these PortUnreachableExceptions are going to be a very
common thing. Clearly, I can deal with this by supplying my own IoProcessor
instance when I construct the endpoints. However, with the current MINA code,
that means copying not only the NioProcessor concrete class, but also the
AbstractPollingIoProcessor, as there's no suitable hook for me to use.
I'll submit a patch that changes the catch block quoted above to forward
to an overridable method that implements the current behaviour by default
(expect a JIRA issue tonight or tomorrow), as I suspect that this "intercept
the exception before anything gets removed" hook may be useful for other
people as well. The patch will also make the NioProcessor class non-final.
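To make the idea concrete, here is a rough sketch of the pattern using
simplified stand-in classes rather than MINA's real ones. The names
SketchProcessor, TolerantProcessor and handleFlushException are hypothetical,
and the actual patch may well look different. A session is represented by a
plain String, and the two lists stand in for scheduleRemove() and
fireExceptionCaught().

```java
import java.net.PortUnreachableException;
import java.util.ArrayList;
import java.util.List;

public class FlushHookSketch {

    /** Simplified stand-in for a polling processor; NOT MINA's real class. */
    static class SketchProcessor {
        final List<String> removed = new ArrayList<>();     // stands in for scheduleRemove()
        final List<Exception> reported = new ArrayList<>(); // stands in for fireExceptionCaught()

        /** Same shape as the quoted flush(): the catch block only delegates. */
        void flush(String session, Exception simulatedFailure) {
            try {
                if (simulatedFailure != null) {
                    throw simulatedFailure; // simulates flushNow(session) failing
                }
            } catch (Exception e) {
                handleFlushException(session, e);
            }
        }

        /** Hypothetical hook; the default reproduces the current behaviour. */
        protected void handleFlushException(String session, Exception e) {
            removed.add(session);
            reported.add(e);
        }
    }

    /** A subclass that keeps the session alive when the port is unreachable. */
    static class TolerantProcessor extends SketchProcessor {
        @Override
        protected void handleFlushException(String session, Exception e) {
            if (e instanceof PortUnreachableException) {
                reported.add(e); // still report it, but do not remove the session
                return;
            }
            super.handleFlushException(session, e);
        }
    }
}
```

With a hook like this, an application that expects frequent
PortUnreachableExceptions would only need to override one method instead of
copying both processor classes. Whether the exception should still be
reported to the filter chain in that case is a design choice; the sketch
reports it.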
Regards,
Barend