[
https://issues.apache.org/jira/browse/THRIFT-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906166#action_12906166
]
Eric Jensen commented on THRIFT-888:
------------------------------------
Final patch attached for integration. I reviewed this heavily with Ning and he
approves. The core change provides lazy non-blocking connect behavior from
TNonblockingSocket when it interacts with TAsyncClientManager, by adding a new
connect state to the method call (see the sketch after the list below). Also included:
1. bumped the SELECT_TIME down so we can have finer-grained timeouts
2. catch runtime exceptions within the entire manager select thread so that
exceptions thrown from a callback don't kill it
3. refactored the async test a bit: initial attempts at this had transient
failures in the JankyRunnable portion of TestTAsyncClientManager, where the
caller would either never be called back for the connect or we would get a
SocketException: Connection reset by peer on finishConnect. When I profiled
the test I found that the single TNonblockingServer$SelectThread was busy doing
reads and writes all the way up until the timeout. Even though we make these
calls in parallel, we dispatch them serially in the server, and that appeared
to be the bottleneck, so it was possible that we started a whole bunch of calls
but didn't finish any of them until roughly the same time. If each took 0.5ms
in the worst case, that could be 500 * 100 * 0.5ms = 25s, so the one-second
wait was simply too aggressive for 500 threads with 100 calls per thread. I
increased that wait and reduced numThreads to 50 (to fix the connection-reset
problems that I can't explain), and I no longer see any failures after running
the test 100 times on Snow Leopard or Linux.
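To make the connect change concrete, here is a rough sketch of the lazy non-blocking connect pattern against a plain java.nio Selector. It is illustrative only: the class name, State enum, and port are made up, and the real patch wires this into the method call's existing state machine rather than a standalone loop.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Sketch only: shows the general shape of a lazy non-blocking connect,
// where the select loop finishes the connect before write/read begin.
public class NonblockingConnectSketch {
    enum State { CONNECTING, WRITING, READING } // patch adds a connect state ahead of write/read

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);

        // Initiate the connect without blocking; this usually returns false,
        // meaning the handshake is still in flight and must be completed later.
        boolean finished = channel.connect(new InetSocketAddress("localhost", 9090));
        int interest = finished ? SelectionKey.OP_WRITE : SelectionKey.OP_CONNECT;
        channel.register(selector, interest, finished ? State.WRITING : State.CONNECTING);

        while (selector.isOpen()) {
            try {
                selector.select(5); // a short SELECT_TIME gives finer-grained timeout checks
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isConnectable() && channel.finishConnect()) {
                        // Connect completed: advance past the connect state and
                        // start watching for writability.
                        key.interestOps(SelectionKey.OP_WRITE);
                        key.attach(State.WRITING);
                    }
                    // OP_WRITE / OP_READ handling for the actual RPC would follow here.
                }
            } catch (RuntimeException e) {
                // Catch-all so an exception thrown from a user callback cannot
                // kill the select thread (item 2 above).
                e.printStackTrace();
            }
        }
    }
}
{code}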
This test is really dirty, mostly because of copied and pasted code but also
because it asserts within callbacks, which throws assertion failures that the
selector thread subsequently catches. We should store off the throwables and
assert outside of the callbacks. It also appears that part of why the test is
so slow, and why I had to fiddle so much with thread counts and timeouts, is
that we do synchronization within the callbacks to wake up the calling thread.
This means the selector itself is constantly having to notify a bunch of
threads, and I caught that as something taking time in the stack traces (since
it blocks all other selector progress). A cleaner, faster test would probably
be to have a truly async client as the test driver, enqueuing the next
iteration into a ConcurrentLinkedQueue on the callback rather than doing
something synchronized. That said, I think the test is sufficient for now and
definitely covers the new non-blocking connect functionality. I fixed a bunch
of how it logged errors and switched to using latches instead of plain objects,
since there was a race condition where you could notify before the caller waits.
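For reference, the kind of structure I mean for the latch and the stored-off throwables is roughly the following; the class and method names are hypothetical, not from the patch.

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of a callback result holder: a CountDownLatch so the caller cannot
// miss a notification that fires before it starts waiting, and a queue of
// throwables so assertions happen on the test thread instead of inside the
// callback (where the selector thread would swallow the AssertionError).
class CallbackResult {
    private final CountDownLatch done = new CountDownLatch(1);
    private final Queue<Throwable> errors = new ConcurrentLinkedQueue<>();

    // Invoked from the selector/callback thread.
    void onComplete() { done.countDown(); }
    void onError(Throwable t) { errors.add(t); done.countDown(); }

    // Invoked from the test thread; assertions happen here, outside the callback.
    void awaitAndRethrow(long timeoutMs) throws Exception {
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new AssertionError("callback never fired within " + timeoutMs + "ms");
        }
        Throwable t = errors.poll();
        if (t != null) {
            throw new AssertionError("callback reported a failure", t);
        }
    }
}
{code}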
> async client should also have nonblocking connect
> -------------------------------------------------
>
> Key: THRIFT-888
> URL: https://issues.apache.org/jira/browse/THRIFT-888
> Project: Thrift
> Issue Type: Improvement
> Components: Java - Library
> Affects Versions: 0.4, 0.5
> Reporter: Eric Jensen
> Fix For: 0.5
>
> Attachments: async_connect_final.diff
>
>
> or its purpose of avoiding thread pooling around blocking I/O is defeated.
> this was discussed briefly in the comments of
> https://issues.apache.org/jira/browse/THRIFT-862 and will most likely obviate
> https://issues.apache.org/jira/browse/THRIFT-843
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.