[ 
https://issues.apache.org/jira/browse/THRIFT-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906166#action_12906166
 ] 

Eric Jensen commented on THRIFT-888:
------------------------------------

Final patch attached for integration.  Reviewed this heavily with Ning and he 
approves.  The core change is that it provides lazy non-blocking connect 
behavior from TNonblockinSocket when it interacts with TAsyncClientManager by 
adding a new state to the method call.  Also included:

1.  bumped the SELECT_TIME down so we can have finer grained timeouts

2.  catch runtime exceptions within the entire manager select thread so that 
ones generated from a callback don't kill it

3.  refactored the async test a bit:  Initial attempts at this had transient 
failures in the JankyRunnable portion of TestTAsyncClientManager where the 
caller would either never be called back for the connect or we would get a 
SocketException: Connection reset by peer on finishConnect.  When I profiled 
the test I found that the single TNonblockingServer$SelectThread was busy doing 
reads and writes all the way up until the timeout.  Even though we make these 
calls in parallel, we dispatch them in serial in the server and that appeared 
to be the bottleneck, so it was possible that we started a whole bunch of them 
but didn't finish any of them until roughly the same time.  if each took 0.5ms 
in the worst case though, that could be 500*100*0.5ms = 2.5s.  so basically the 
one second wait was just too agressive for running 500 threads with 100 calls 
per thread.  I increased that and reduced the numThreads to 50 (to fix the 
connection reset problems that i can't explain) and now no longer see any 
failures after running the test 100 times on snow leopard or linux.

This test is really dirty, mostly because of copied and pasted code but also 
because it asserts within callbacks, which throws assertion failures that the 
selector thread subsequently catches.  We should store off the throwables and 
assert outside of the callbacks.  Also it appears that part of why the test is 
so slow and I had to mess so much with num threads and timeouts is because we 
do synchronization within the callbacks to wake up the calling thread.  This 
means the selector itself is constantly having to notify a bunch of threads, 
and I caught that as something taking time in the stack traces (since it blocks 
all other selector progress).  A cleaner, faster test would probably be to have 
a truly async client as the test driver, enqueing the next iteration into a 
ConcurrentLinkedQueue on the callback rather than doing something synchronized. 
 

That said, I think it is sufficient for now and definitely covers the new 
non-blocking connect functionality.  I fixed a bunch of stuff about how it 
logged errors and switched to using latches instead of objects since there was 
a race condition where you could notify before the caller waits.  

> async client should also have nonblocking connect
> -------------------------------------------------
>
>                 Key: THRIFT-888
>                 URL: https://issues.apache.org/jira/browse/THRIFT-888
>             Project: Thrift
>          Issue Type: Improvement
>          Components: Java - Library
>    Affects Versions: 0.4, 0.5
>            Reporter: Eric Jensen
>             Fix For: 0.5
>
>         Attachments: async_connect_final.diff
>
>
> or its purpose of avoiding thread pooling around blocking I/O is defeated.  
> this was discussed briefly in the comments of 
> https://issues.apache.org/jira/browse/THRIFT-862 and will most likely obviate 
> https://issues.apache.org/jira/browse/THRIFT-843

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to