Async client issues / improvements
----------------------------------
Key: THRIFT-862
URL: https://issues.apache.org/jira/browse/THRIFT-862
Project: Thrift
Issue Type: Task
Components: Java - Library
Affects Versions: 0.4
Reporter: Ning Liang
>From Thrift-845:
Hey Eric,
You bring up some good points. We have indirect communication between the
client manager and client going through the method call object, which we can
refactor into direct communication. I'd like to keep the state machine separate
from the async client manager, though - the client manager only coordinates
transitioning the method call on ready. Responses inline for the other comments:
1. Timeouts per-transition aren't really what I want as a consumer of this api.
It looks like there are about five states, which as a consumer I need to know
since the effective timeout on the whole call is then 5x the one I set. I'd
rather be guaranteed I will be called back one way or another within the
timeout i set, counting time i spend in the selector's queue, etc. Seems like
that would be easy to implement.
* The naming is misleading - we can have idle socket timeout or an overall
method call timeout (more like a deadline). I coded for the former case, so we
can modify to the latter or rename. Modifying is trivial - we just need to
track the method call start time, as opposed to the last action time. The
effective deadline is arbitrarily long, since we're updating the last action
timestamp on each socket read/write, which doesn't necessarily correspond to a
transition between method states.
2. You should not be allowed to set a timeout lower than
TAsyncClientManager.SELECT_TIME, and the value there should be much lower,
maybe 5ms? Because its per-transition as above, the true minimum timeout for an
entire call is 1s if each of its states took 200ms. We've actually run clients
with lower than 200ms timeouts.
* Agreed, though we won't be able to get to 5ms resolution. The reason is
because the JVM makes no guarantees on actual sleep time. Maybe a better
solution would be to use Selector.selectNow(), which returns immediately with
all ready keys, making our timeout granularity as small as possible.
3. TAsyncMethodCall.onError does not do anything to remove itself from
TAsyncClientManager.timeoutWatchSet, so unless it shows up in selectedKeys() it
will stay there until its timeout. Not a big deal, but could make that set
prohibitively large if there are many errors within a timeout interval.
* TAsyncClientManager removes the method after a transition if the method's
client.hasError(). Again, somewhat confusing since it's relying on the method
calling the client's onError.
4. We call TAsyncMethodCall.transition before adding it to timeoutWatchSet, so
lastTransitionTime is initialized before it is inspected. But, this is a
somewhat dangerous pattern to allow an uninitialized primitive to ever be
accessible. You should probably initialize it to the current time or a sentinel
on instantiation.
* Current time is good - if we end up using the deadline definition of
timeout, we don't even have to update the value.
5. it might be safer for lastTransitionTime = System.currentTimeMillis() to be
at the top of transition(SelectionKey key) so you can guarantee the timestamp
is updated even for errant transitions...don't think it matter's functionally
in the current code, but it could in refactorings. you don't want to
inadvertently obscure an error with a timeout.
* Good point. I wanted the timestamp to reflect the last socket activity -
putting at the beginning of transition could make it inaccurate on long socket
reads/writes. Maybe we can use the finally block?
6. TNonblockingServerSocket is the only non-test, non-tutorial thrift source
file that uses System.err.println instead of LOGGER.warn
* We can easily fix.
7. You might want to catch Exception instead of Throwable? I don't think you
intend or want to catch things like OutOfMemoryError, etc. The docs say "An
Error is a subclass of Throwable that indicates serious problems that a
reasonable application should not try to catch"
* Also easily fixed, though I want to make sure we didn't do this for a
reason earlier. Bryan, why Throwable vs Exception?
8. TAsyncClientManager.call should really throw an exception if its
SelectThread is not running (because its run method did not catch an unexpected
Throwable) instead of inserting it into a queue that will grow unbounded and
cause an OOM. You probably want a catch and throw block in run that cleans up
by calling onError on all pending calls and sets its done flag. You probably
also want to do this on ClosedSelectorException instead of just catching and
logging it in an infinite loop. It's odd that you catch it on selectedKeys()
but not select().
* That catch got folded into the catch(Throwable)
9. timeoutIdleMethods would be a bit faster if it only called
currentTimeMillis() once instead of per-check
* Good point. I was going for accuracy but even for large timeout watch
sets the iteration time should be negligible.
10. Probably an unnecessary optimization, but instead of relying on
Object.hashCode for removing timed out methods, you could identify them by
their start timestamp from currentTimeMillis() + a few bits of a static atomic
sequence counter, put them in a TreeSet, and shortcut the iteration for timed
out methods when you hit the first one that's not too old.
* Definitely, this was the original plan. We'll get it in on next iteration.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.