Below is our configuration:
4 data nodes, 3 locator nodes
8 vCPUs per node
128 GB of RAM per data node, with 64 GB allocated to each data node. I don't
think GC is the cause, as our heap utilization is low. Anyway, we will
check the GC logs to see whether it's related.
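
As a quick cross-check alongside the GC logs, here is a minimal sketch that reads the JVM's own counters through the standard java.lang.management beans. It is not Geode API. The class name GcPauseReport and its method names are made up for illustration, and it only reports on the JVM it runs in, so it would have to be run inside the data node process (or adapted to read the same beans over JMX). The accumulated collection time is only a rough proxy for "stop the world" pauses, since concurrent collectors also count time during which the application keeps running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: summarizes GC activity and heap usage of the current JVM.
public class GcPauseReport {

    /** Accumulated collection time across all collectors, in milliseconds. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Used heap as a fraction of max heap (or of committed heap if max is undefined). */
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when no maximum is defined
        long denominator = max > 0 ? max : heap.getCommitted();
        return (double) heap.getUsed() / denominator;
    }

    public static void main(String[] args) {
        // Per-collector breakdown: number of collections and accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("total GC time: %d ms%n", totalGcTimeMillis());
        System.out.printf("heap used: %.1f%%%n", heapUsedFraction() * 100.0);
    }
}
```

If the total GC time jumps by several seconds between two samples taken around the time of a client timeout, that would point towards GC; if it barely moves, steal/ready time becomes the more likely suspect.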

Thanks,
Ashish

On Fri, Oct 19, 2018, 11:32 PM Anthony Baker <[email protected]> wrote:

> Look for GC pauses. You can add flags to the startup options to capture GC
> behavior and understand if you’re hitting a “stop the world” pause. How
> much free heap space do you have?
>
>
> Here are a few links:
>
> https://cwiki.apache.org/confluence/display/GEODE/Resource+Management+in+Geode
>
> https://cwiki.apache.org/confluence/display/GEODE/Troubleshooting+Garbage+Collection+Pauses
>
> Anthony
>
>
> On Oct 19, 2018, at 10:17 AM, aashish choudhary <
> [email protected]> wrote:
>
> Thanks Charlie. I have watched the video and it was helpful. Apart from
> overcommitted hardware, could there be any other issue from the Geode
> perspective, e.g. a slow server? We have encountered this issue
> for the first time. We will certainly look into steal/ready time.
>
> Thanks,
> Ashish
>
> On Fri, Oct 19, 2018, 9:43 PM Charlie Black <[email protected]> wrote:
>
>> It's not normal for Geode to not be servicing requests. I *do not*
>> recommend changing the fault tolerances until you find out why things
>> aren't responding in 10 seconds to 1 minute. Imagine your users waiting
>> for a minute or more for an in-memory system to return a value.
>>
>> One thing to look out for is overcommitted hardware. You can review
>> steal time on the guest OS. However, most enterprises disable host
>> reporting, so you might have to review Ready Time In MS on the Geode VM.
>> This shows how long a VM was waiting to run - if it's anything larger than
>> zero - make sure this is what you want.
>>
>> Here is a video where I talk about overcommitted hardware, which is
>> applicable to everything running on containers / VMs.
>>
>> https://www.youtube.com/watch?v=0I2oPBKctgU
>>
>> Regards,
>>
>> Charlie
>>
>> On Fri, Oct 19, 2018 at 5:32 AM aashish choudhary <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> Recently, in one of our Geode client applications, we have been getting
>>> the exception below.
>>>
>>> Pool unexpected socket timed out on client
>>>
>>>
>>> Server unreachable: could not connect after 1 attempts
>>>
>>> After looking at various threads, we came to know that we need to set
>>> read-timeout in the client configuration to a higher value. The default is 10
>>> seconds, I believe. Just curious to know why the server would take more than 10
>>> seconds to respond, as 10 seconds already seems to be on the higher side. For
>>> now we will probably increase it to at least 30 seconds and observe whether
>>> it makes any difference.
>>>
>>> Also, on the server side we could see the warnings below.
>>>
>>> ClientHealthMonitor Unregistering client with member id identity xxxx
>>> due to: Socket closed.
>>> Monitoring client with member id identity xxxx It had been 60534 ms
>>> since the latest heartbeat. Max interval is 60000. Terminated client.
>>>
>>> Could this be because of high load on a particular server? We have,
>>> however, seen these warnings on all of our data nodes.
>>>
>>> Are there any parameters we need to tune on the server side for this?
>>>
>>> Thanks,
>>> Ashish
>>>
>> --
>> [email protected] | +1.858.480.9722
>> Principal Realtime Data Engineer
>>
>
>
