[
https://issues.apache.org/jira/browse/THRIFT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ning Liang updated THRIFT-862:
------------------------------
Attachment: thrift-862-v3.diff
Give this one a shot?
> Async client issues / improvements
> ----------------------------------
>
> Key: THRIFT-862
> URL: https://issues.apache.org/jira/browse/THRIFT-862
> Project: Thrift
> Issue Type: Task
> Components: Java - Library
> Affects Versions: 0.4
> Reporter: Ning Liang
> Assignee: Bryan Duxbury
> Attachments: thrift-862-v2.diff, thrift-862-v3.diff, thrift-862.diff
>
>
> From Thrift-845:
> Hey Eric,
> You bring up some good points. We have indirect communication between the
> client manager and client going through the method call object, which we can
> refactor into direct communication. I'd like to keep the state machine
> separate from the async client manager, though - the client manager only
> coordinates transitioning the method call on ready. Responses inline for the
> other comments:
> 1. Timeouts per-transition aren't really what I want as a consumer of this
> api. It looks like there are about five states, which as a consumer I need to
> know since the effective timeout on the whole call is then 5x the one I set.
> I'd rather be guaranteed I will be called back one way or another within the
> timeout i set, counting time i spend in the selector's queue, etc. Seems like
> that would be easy to implement.
> * The naming is misleading - we can have idle socket timeout or an
> overall method call timeout (more like a deadline). I coded for the former
> case, so we can modify to the latter or rename. Modifying is trivial - we
> just need to track the method call start time, as opposed to the last action
> time. The effective deadline is arbitrarily long, since we're updating the
> last action timestamp on each socket read/write, which doesn't necessarily
> correspond to a transition between method states.
> 2. You should not be allowed to set a timeout lower than
> TAsyncClientManager.SELECT_TIME, and the value there should be much lower,
> maybe 5ms? Because its per-transition as above, the true minimum timeout for
> an entire call is 1s if each of its states took 200ms. We've actually run
> clients with lower than 200ms timeouts.
> * Agreed, though we won't be able to get to 5ms resolution. The reason is
> because the JVM makes no guarantees on actual sleep time. Maybe a better
> solution would be to use Selector.selectNow(), which returns immediately with
> all ready keys, making our timeout granularity as small as possible.
> 3. TAsyncMethodCall.onError does not do anything to remove itself from
> TAsyncClientManager.timeoutWatchSet, so unless it shows up in selectedKeys()
> it will stay there until its timeout. Not a big deal, but could make that set
> prohibitively large if there are many errors within a timeout interval.
> * TAsyncClientManager removes the method after a transition if the
> method's client.hasError(). Again, somewhat confusing since it's relying on
> the method calling the client's onError.
> 4. We call TAsyncMethodCall.transition before adding it to timeoutWatchSet,
> so lastTransitionTime is initialized before it is inspected. But, this is a
> somewhat dangerous pattern to allow an uninitialized primitive to ever be
> accessible. You should probably initialize it to the current time or a
> sentinel on instantiation.
> * Current time is good - if we end up using the deadline definition of
> timeout, we don't even have to update the value.
> 5. it might be safer for lastTransitionTime = System.currentTimeMillis() to
> be at the top of transition(SelectionKey key) so you can guarantee the
> timestamp is updated even for errant transitions...don't think it matter's
> functionally in the current code, but it could in refactorings. you don't
> want to inadvertently obscure an error with a timeout.
> * Good point. I wanted the timestamp to reflect the last socket activity
> - putting at the beginning of transition could make it inaccurate on long
> socket reads/writes. Maybe we can use the finally block?
> 6. TNonblockingServerSocket is the only non-test, non-tutorial thrift source
> file that uses System.err.println instead of LOGGER.warn
> * We can easily fix.
> 7. You might want to catch Exception instead of Throwable? I don't think you
> intend or want to catch things like OutOfMemoryError, etc. The docs say "An
> Error is a subclass of Throwable that indicates serious problems that a
> reasonable application should not try to catch"
> * Also easily fixed, though I want to make sure we didn't do this for a
> reason earlier. Bryan, why Throwable vs Exception?
> 8. TAsyncClientManager.call should really throw an exception if its
> SelectThread is not running (because its run method did not catch an
> unexpected Throwable) instead of inserting it into a queue that will grow
> unbounded and cause an OOM. You probably want a catch and throw block in run
> that cleans up by calling onError on all pending calls and sets its done
> flag. You probably also want to do this on ClosedSelectorException instead of
> just catching and logging it in an infinite loop. It's odd that you catch it
> on selectedKeys() but not select().
> * That catch got folded into the catch(Throwable)
> 9. timeoutIdleMethods would be a bit faster if it only called
> currentTimeMillis() once instead of per-check
> * Good point. I was going for accuracy but even for large timeout watch
> sets the iteration time should be negligible.
> 10. Probably an unnecessary optimization, but instead of relying on
> Object.hashCode for removing timed out methods, you could identify them by
> their start timestamp from currentTimeMillis() + a few bits of a static
> atomic sequence counter, put them in a TreeSet, and shortcut the iteration
> for timed out methods when you hit the first one that's not too old.
> * Definitely, this was the original plan. We'll get it in on next
> iteration.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.