Hi Nicolas,
Thank you for your help.
I agree that this behavior is disturbing. I also believed that getaddrinfo 
should return at some point, but in that case it never did.
I should add that it happened on a single VM and never reproduced on any 
other machine. I haven't encountered the bug since.

As far as I understand the design of gRPC's thread pool: I debugged the 
process and found out that the RPC function called the "ClientReader" 
constructor, which issued an async name-resolution operation and waited for 
its completion. An arbitrary thread from the pool picked up the resolve 
task and (eventually) called getaddrinfo. Because this call *never 
returned*, the thread in the thread pool never completed the task, so the 
ClientReader constructor never finished waiting for the async 
name-resolution operation.
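
To illustrate the mechanism, here is a minimal sketch in plain C++17 (it is 
not gRPC code). The helper name run_with_timeout is hypothetical and assumed 
only for illustration: a waiter hands a blocking call to a worker thread and 
waits for the result, but only up to a deadline, so the waiter gets control 
back even if the worker never returns.

```cpp
#include <chrono>
#include <exception>
#include <future>
#include <memory>
#include <optional>
#include <string>
#include <thread>
#include <type_traits>
#include <utility>

// Hypothetical helper (not part of gRPC): runs a blocking call on a detached
// worker thread and waits for its result for at most `timeout`.
// Returns std::nullopt if the worker has not finished in time.
template <typename F>
auto run_with_timeout(F blocking_call, std::chrono::milliseconds timeout)
    -> std::optional<std::invoke_result_t<F>> {
  using T = std::invoke_result_t<F>;

  // The promise is shared so the worker can still fulfil it safely after
  // the waiter has given up and returned.
  auto promise = std::make_shared<std::promise<T>>();
  std::future<T> result = promise->get_future();

  std::thread([promise, call = std::move(blocking_call)]() mutable {
    try {
      promise->set_value(call());
    } catch (...) {
      promise->set_exception(std::current_exception());
    }
  }).detach();

  if (result.wait_for(timeout) != std::future_status::ready) {
    return std::nullopt;  // The worker is still blocked; stop waiting for it.
  }
  return result.get();
}
```

Note that if the blocking call never returns, the worker thread stays stuck 
forever; only the waiter is released. That is why I consider a timeout a 
bypass rather than a fix for the underlying hang.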

I hope this clears things up a bit.

On Wednesday, August 9, 2017 at 12:57:56 AM UTC+3, Nicolas Noble wrote:
>
> getaddrinfo() not returning is disturbing. While it's true that all 
> of the OS's DNS resolution functions are synchronous, and will block until 
> the OS comes back with a response, it's usually expected that the OS 
> returns *eventually*, either with an error (such as a timeout) or with 
> some results. Not returning at all is neither sane nor expected behavior.
>
> Now, your phrasing is a bit confusing. Are you saying that the DNS 
> resolution thread is stuck resolving the address? Or that you think it 
> somehow did, and got the rest of the library confused and stuck?
>
> On Monday, April 24, 2017 at 3:53:56 AM UTC-7, Amit Waisel wrote:
>>
>> I have a C++ client that connects to a C# server. The connection is 
>> made by an RPC function (called *InitializeStream()*) that sends a single 
>> request and receives a stream of responses from the server. This RPC 
>> function is executed with the 'max' timeout (if the server is unavailable, 
>> a later call to stream->Read() will return an error, which is good enough 
>> for me).
>> I encountered a weird bug that happens on one VM. (I couldn't reproduce 
>> it on any other machine, but it reproduces easily on that VM.) On that 
>> single VM, the *InitializeStream()* RPC function never returns.
>>
>> Further debugging of this issue revealed the following:
>>
>>    1. The main thread (thread #1) is blocked inside InitializeStream(), 
>>    in *grpc_iocp_work()*. The exact line is *iocp_windows.c@83*, in 
>>    Windows's *GetQueuedCompletionStatus()* function.
>>    As far as I understand, here we wait for a task to complete with an 
>>    unlimited timeout (I used the 'max' timeout).
>>      [External Code]
>>    > Test.exe!grpc_iocp_work(grpc_exec_ctx * exec_ctx, gpr_timespec deadline) Line 83 C
>>      Test.exe!grpc_pollset_work(grpc_exec_ctx * exec_ctx, grpc_pollset * pollset, grpc_pollset_worker * * worker_hdl, gpr_timespec now, gpr_timespec deadline) Line 140 C
>>      Test.exe!grpc_completion_queue_pluck(grpc_completion_queue * cc, void * tag, gpr_timespec deadline, void * reserved) Line 614 C
>>      Test.exe!grpc::CoreCodegen::grpc_completion_queue_pluck(grpc_completion_queue * cq, void * tag, gpr_timespec deadline, void * reserved) Line 70 C++
>>      Test.exe!grpc::CompletionQueue::Pluck(grpc::CompletionQueueTag * tag) Line 230 C++
>>      Test.exe!grpc::ClientReader<test::TestRequest>::ClientReader<test::TestRequest><test::InitMessage>(grpc::ChannelInterface * channel, const grpc::RpcMethod & method, grpc::ClientContext * context, const test::InitMessage & request) Line 151 C++
>>      Test.exe!test::testInterface::Stub::InitializeStreamRaw(grpc::ClientContext * context, const test::InitMessage & request) Line 46 C++
>>      Test.exe!test::testInterface::Stub::InitializeStream(grpc::ClientContext * context, const test::InitMessage & request) Line 86 C++
>>      Test.exe!WinMain(HINSTANCE__ * __formal, HINSTANCE__ * __formal, char * __formal, int __formal) Line 17 C++
>>      [External Code]
>>    
>>    2. One of gRPC's threads [from the thread pool] (thread #2) called 
>>    the function *do_request_thread()* in *resolve_address_windows.c@153*, 
>>    which called *grpc_blocking_resolve_address()* (a blocking function, 
>>    judging by its name), which called *getaddrinfo()*, which never returns!
>>
>> My guess is that thread #1 waits (in *GetQueuedCompletionStatus()*) for 
>> thread #2's task to complete. *getaddrinfo()* never returns, so 
>> *GetQueuedCompletionStatus()* keeps blocking, and the main thread is stuck.
>>
>> Have you encountered this error before? Do you have any idea what I can 
>> do (besides adding a timeout to the function, which I consider a bypass 
>> and not a solution)?
>> I use gRPC v1.2.0 for both C++ and C#.
>>
>> Thanks
>>
>
