Can you share the exact error message you are seeing? Do you see any
exception stack trace in the server log?

The most probable cause is the network or memory. Can you verify that the
specified memory is actually being allocated to the JVM, and that the host
(virtual machine) has sufficient memory to run all the servers and clients?
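
As a rough way to do the JVM side of that check, here is a minimal sketch
in plain Java. It only reports the maximum heap of the JVM it runs in, so
it would have to be executed inside the server process (or you can read the
same value with your usual JVM tooling). The class name JvmMemoryCheck, the
24 GiB default and the 10% tolerance are assumptions for illustration, not
anything Geode provides.

```java
public class JvmMemoryCheck {

    private static final long GIB = 1024L * 1024L * 1024L;

    /** Maximum heap this JVM will attempt to use, in bytes. */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * True if the actual heap is no more than `tolerance` (a fraction,
     * e.g. 0.10) below the expected size. The JVM usually reports a little
     * less than the configured -Xmx, hence the tolerance.
     */
    public static boolean meetsExpectation(long actualBytes, long expectedBytes,
                                           double tolerance) {
        return actualBytes >= (long) (expectedBytes * (1.0 - tolerance));
    }

    public static void main(String[] args) {
        // Expected heap in GiB; 24 matches the server sizes in this thread.
        long expectedGiB = args.length > 0 ? Long.parseLong(args[0]) : 24L;
        long actual = maxHeapBytes();

        System.out.printf("max heap: %.2f GiB (expected about %d GiB)%n",
                (double) actual / GIB, expectedGiB);
        if (meetsExpectation(actual, expectedGiB * GIB, 0.10)) {
            System.out.println("Heap size looks as configured.");
        } else {
            System.out.println("Heap is smaller than expected; check -Xmx.");
        }
    }
}
```

Run standalone, it reflects only its own -Xmx setting; the host-level check
(total memory versus the sum of all server and client heaps) still has to
be done on the VM itself.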

-Anil.





On Thu, Sep 27, 2018 at 9:48 AM Dharam Thacker <[email protected]>
wrote:

> Hello Anthony,
>
> Yes, I am running on virtualized infrastructure. But when I checked %id and
> %st and graphed them, I saw %st always at 0.0 and %id in the range of
> 95-98 most of the time.
>
> Could tuning the number of connections for each client app, or
> member-timeout or ack-wait-threshold, help here?
>
> Thanks,
> - Dharam Thacker
>
>
> On Thu, Sep 27, 2018 at 8:37 PM Anthony Baker <[email protected]> wrote:
>
>> Are you running on cloud or virtualized infrastructure?  If so, check
>> your steal time stats; you may have “noisy neighbors” causing members to
>> become unresponsive.  Geode detects this and fences off the unhealthy
>> members to maintain consistency and availability.
>>
>> Anthony
>>
>>
>> On Sep 27, 2018, at 10:31 AM, Dharam Thacker <[email protected]>
>> wrote:
>>
>> Hi Team,
>>
>> I currently have the following topology for Geode, and all regions are
>> replicated.
>>
>> Note: Unfortunately, I am still on version 1.1.1.
>>
>> *Host1*:
>> Locator1
>> Server1.1 (Group1) -- 24G
>> Server2.1 (Group2) -- 24G
>> Client1 (CQ listener only -- 20 CQs registered via locator pool)
>> Client2 (Fires OQL queries and functions only via locator pool)
>>
>> *Host2*:
>> Locator2
>> Server1.2 (Group1) -- 24G
>> Server2.2 (Group2) -- 24G
>>
>> As shown above, I have Spring Boot web app Geode clients (Client1 and
>> Client2) only on Host1.
>>
>> If I scale them by putting them on Host2 as well, it works.
>>
>> I now see 40 CQs registered for the CQ listener client.
>>
>> But now I frequently see a "GMS Membership error" complaining about "No
>> heartbeat request and force disconnection of member" for all server nodes.
>>
>> It is transient, but really painful!
>>
>> With 1.1.1 the members can't auto-reconnect, which I know is fixed in a
>> later version, but that's still fine.
>>
>> I did a thorough GC, CPU load, and memory analysis, and at least these
>> three look quite healthy, as expected.
>>
>> What other reasons could there be for scaling the client apps to
>> result in this?
>>
>> Or if you can suggest anything else to look at?
>>
>> Thanks,
>> Dharam
>>
>>
>>
