Re: How to troubleshoot stuck distributed function calls

2015-12-15 Thread Hovhannes Antonyan
Hi Barry, Yes, I am using the onMembers API, but as I already said, there is no Function Execution Processor thread running that function. From: Barry Oglesby Sent: Wednesday, December 16, 2015 12:25 AM To: user@geode.incubator.apache.org Subject: Re: How to tr

Re: How to troubleshoot stuck distributed function calls

2015-12-15 Thread Barry Oglesby
I think it depends on how the function is being invoked. Below is an example with two peers using the onMembers API. If you're invoking your function differently (e.g. onRegion), let me know. Also, if you want to send your thread dumps, I can take a look at them. I have a test where I have one pee

Re: How to troubleshoot stuck distributed function calls

2015-12-15 Thread Hovhannes Antonyan
I have dumps of both nodes. Can you please point me to which threads I should look at? I do not see any function execution thread on the target node running that function, but the caller node still waits for a response from that node. Should I look at P2P threads next? Something else? _

Re: How to troubleshoot stuck distributed function calls

2015-12-15 Thread Barry Oglesby
You'll want to take thread dumps (not heap dumps) on the members, especially the one that initiated the function call and the one that didn't send a response. Those will tell you whether the thread processing the function or the thread processing the reply is stuck and, if so, where. Barry Oglesby G
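
For readers who want to inspect threads from inside a member's JVM rather than with an external tool such as `jstack <pid>`, here is a minimal Java sketch using only the JDK's `ThreadMXBean`. It prints the stack of every live thread whose name contains a given fragment. The class and method names (`ThreadDumpFilter`, `dumpThreadsMatching`) are hypothetical, and the default fragment "Function Execution Processor" is taken from the thread name mentioned in this discussion; the exact thread names in your version may differ. It only sees threads of the JVM it runs in, so it would have to be executed on each member of interest.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Geode: filters an in-process thread dump by thread name.
final class ThreadDumpFilter {

    // Returns one formatted stack trace per live thread whose name contains nameFragment.
    static List<String> dumpThreadsMatching(String nameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();
        // false, false: skip lock/monitor details, keep full stack traces.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" state=").append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed default; pass another fragment (e.g. "P2P") as the first argument.
        String fragment = args.length > 0 ? args[0] : "Function Execution Processor";
        List<String> matches = dumpThreadsMatching(fragment);
        if (matches.isEmpty()) {
            System.out.println("No live thread name contains: " + fragment);
        }
        for (String trace : matches) {
            System.out.println(trace);
        }
    }
}
```

An empty result on the target member is itself informative: it matches the situation described in this thread, where no function execution thread is running the function, and suggests looking at the threads that receive and dispatch messages instead.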

Re: How to troubleshoot stuck distributed function calls

2015-12-15 Thread Hovhannes Antonyan
I was looking at the heap dump and identified the node which didn't send the response. But the question now is why it didn't send it: did it run the function or not yet? From: Darrel Schneider Sent: Tuesday, December 15, 2015 9:58 PM To: user@geode.incubator.

Re: How to troubleshoot stuck distributed function calls

2015-12-15 Thread Darrel Schneider
Usually the member waiting for a response logs a warning that it has been waiting for longer than 15 seconds from a particular member. Use that member id to identify the member that is not responding. Get a stack dump on that member and look for a thread that is processing the unresponsive message.

How to troubleshoot stuck distributed function calls

2015-12-15 Thread Hovhannes Antonyan
Hello experts, I have a multi-node environment where one of the nodes has made a broadcast call to all other nodes and got stuck. It is still waiting for responses from all nodes, and from the heap dump I see that the ResultCollector has N-1 elements, where N is the total number of nodes, so it looks lik
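
When a collector holds N-1 results, one way to narrow down the silent member is to compare the members that were called against the members that replied. The following is a minimal sketch in plain Java, assuming each result can be associated with the ID of the member that sent it; the thread does not confirm that this is the case, and the class name `MissingResponders` and the string member IDs are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: member IDs are plain strings here, not Geode types.
final class MissingResponders {

    // Returns the expected member IDs that have no reply yet, in call order.
    static Set<String> missing(List<String> expectedMemberIds, List<String> repliedMemberIds) {
        Set<String> remaining = new LinkedHashSet<>(expectedMemberIds);
        remaining.removeAll(repliedMemberIds);
        return remaining;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("node-1", "node-2", "node-3");
        List<String> replied = Arrays.asList("node-1", "node-3");
        // The member left over is the one whose thread dump is worth taking first.
        System.out.println("No reply from: " + missing(expected, replied));
    }
}
```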