Async client issues / improvements
----------------------------------

                 Key: THRIFT-862
                 URL: https://issues.apache.org/jira/browse/THRIFT-862
             Project: Thrift
          Issue Type: Task
          Components: Java - Library
    Affects Versions: 0.4
            Reporter: Ning Liang


>From Thrift-845:

Hey Eric,

You bring up some good points. We have indirect communication between the 
client manager and client going through the method call object, which we can 
refactor into direct communication. I'd like to keep the state machine separate 
from the async client manager, though - the client manager only coordinates 
transitioning the method call on ready. Responses inline for the other comments:

1. Timeouts per-transition aren't really what I want as a consumer of this api. 
It looks like there are about five states, which as a consumer I need to know 
since the effective timeout on the whole call is then 5x the one I set. I'd 
rather be guaranteed I will be called back one way or another within the 
timeout i set, counting time i spend in the selector's queue, etc. Seems like 
that would be easy to implement.

    * The naming is misleading - we can have idle socket timeout or an overall 
method call timeout (more like a deadline). I coded for the former case, so we 
can modify to the latter or rename. Modifying is trivial - we just need to 
track the method call start time, as opposed to the last action time. The 
effective deadline is arbitrarily long, since we're updating the last action 
timestamp on each socket read/write, which doesn't necessarily correspond to a 
transition between method states.

2. You should not be allowed to set a timeout lower than 
TAsyncClientManager.SELECT_TIME, and the value there should be much lower, 
maybe 5ms? Because its per-transition as above, the true minimum timeout for an 
entire call is 1s if each of its states took 200ms. We've actually run clients 
with lower than 200ms timeouts.

    * Agreed, though we won't be able to get to 5ms resolution. The reason is 
because the JVM makes no guarantees on actual sleep time. Maybe a better 
solution would be to use Selector.selectNow(), which returns immediately with 
all ready keys, making our timeout granularity as small as possible.

3. TAsyncMethodCall.onError does not do anything to remove itself from 
TAsyncClientManager.timeoutWatchSet, so unless it shows up in selectedKeys() it 
will stay there until its timeout. Not a big deal, but could make that set 
prohibitively large if there are many errors within a timeout interval.

    * TAsyncClientManager removes the method after a transition if the method's 
client.hasError(). Again, somewhat confusing since it's relying on the method 
calling the client's onError.

4. We call TAsyncMethodCall.transition before adding it to timeoutWatchSet, so 
lastTransitionTime is initialized before it is inspected. But, this is a 
somewhat dangerous pattern to allow an uninitialized primitive to ever be 
accessible. You should probably initialize it to the current time or a sentinel 
on instantiation.

    * Current time is good - if we end up using the deadline definition of 
timeout, we don't even have to update the value.

5. it might be safer for lastTransitionTime = System.currentTimeMillis() to be 
at the top of transition(SelectionKey key) so you can guarantee the timestamp 
is updated even for errant transitions...don't think it matter's functionally 
in the current code, but it could in refactorings. you don't want to 
inadvertently obscure an error with a timeout.

    * Good point. I wanted the timestamp to reflect the last socket activity - 
putting at the beginning of transition could make it inaccurate on long socket 
reads/writes. Maybe we can use the finally block?

6. TNonblockingServerSocket is the only non-test, non-tutorial thrift source 
file that uses System.err.println instead of LOGGER.warn

    * We can easily fix.

7. You might want to catch Exception instead of Throwable? I don't think you 
intend or want to catch things like OutOfMemoryError, etc. The docs say "An 
Error is a subclass of Throwable that indicates serious problems that a 
reasonable application should not try to catch"

    * Also easily fixed, though I want to make sure we didn't do this for a 
reason earlier. Bryan, why Throwable vs Exception?

8. TAsyncClientManager.call should really throw an exception if its 
SelectThread is not running (because its run method did not catch an unexpected 
Throwable) instead of inserting it into a queue that will grow unbounded and 
cause an OOM. You probably want a catch and throw block in run that cleans up 
by calling onError on all pending calls and sets its done flag. You probably 
also want to do this on ClosedSelectorException instead of just catching and 
logging it in an infinite loop. It's odd that you catch it on selectedKeys() 
but not select().

    * That catch got folded into the catch(Throwable)

9. timeoutIdleMethods would be a bit faster if it only called 
currentTimeMillis() once instead of per-check

    * Good point. I was going for accuracy but even for large timeout watch 
sets the iteration time should be negligible.

10. Probably an unnecessary optimization, but instead of relying on 
Object.hashCode for removing timed out methods, you could identify them by 
their start timestamp from currentTimeMillis() + a few bits of a static atomic 
sequence counter, put them in a TreeSet, and shortcut the iteration for timed 
out methods when you hit the first one that's not too old.

    * Definitely, this was the original plan. We'll get it in on next iteration.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to