Can you share the exact error message you are seeing? Do you see any exception stack trace in the server log?
The most probable cause is the network or memory. Can you verify that the specified memory is actually being allocated to the JVM, and that the hosts (virtual machines) have sufficient memory to run all the servers and clients?

-Anil.

On Thu, Sep 27, 2018 at 9:48 AM Dharam Thacker <[email protected]> wrote:

> Hello Anthony,
>
> Yes, I am running on virtualized infrastructure. But when I checked %id and
> %st and graphed them, I see %st always at 0.0 and %id in the range
> 95-98 most of the time.
>
> Could the number of connections for every client app, or member-timeout or
> ack-wait-threshold, help here?
>
> Thanks,
> - Dharam Thacker
>
>
> On Thu, Sep 27, 2018 at 8:37 PM Anthony Baker <[email protected]> wrote:
>
>> Are you running on cloud or virtualized infrastructure? If so, check
>> your steal time stats; you may have "noisy neighbors" causing members to
>> become unresponsive. Geode detects this and fences off the unhealthy
>> members to maintain consistency and availability.
>>
>> Anthony
>>
>>
>> On Sep 27, 2018, at 10:31 AM, Dharam Thacker <[email protected]>
>> wrote:
>>
>> Hi Team,
>>
>> I currently have the following topology for Geode, and all regions are
>> replicated.
>>
>> Note: Unfortunately I am still on version 1.1.1
>>
>> *Host1*:
>> Locator1
>> Server1.1 (Group1) -- 24G
>> Server2.1 (Group2) -- 24G
>> Client1 (CQ listener only -- 20 CQs registered via locator pool)
>> Client2 (Fires OQL queries and functions only via locator pool)
>>
>> *Host2*:
>> Locator2
>> Server1.2 (Group1) -- 24G
>> Server2.2 (Group2) -- 24G
>>
>> As shown above, I have Spring Boot web app Geode clients (Client1 and
>> Client2) only on HOST1.
>>
>> If I scale them by putting them on HOST2 as well, it works.
>>
>> Now I see 40 CQs registered for the CQ listener client.
>>
>> But I now frequently see a "GMS Membership error" complaining about "No
>> heartbeat request and force disconnection of member" for all server nodes.
>>
>> It is transient, but really painful!
>>
>> Somehow with 1.1.1 it can't auto-reconnect, which I know is fixed in a later
>> version, but that's still fine.
>>
>> I did GC, CPU load, and memory analysis thoroughly, and at least these three
>> look quite healthy, as expected.
>>
>> What other possible reasons could there be for scaling the client apps to
>> result in this?
>>
>> Or can you suggest anything else to look at?
>>
>> Thanks,
>> Dharam
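
For the memory check suggested at the top of this thread, one rough way to see what heap a JVM was actually granted is to ask the JVM itself. The sketch below is only an illustration, not something from the Geode distribution: the class name `HeapCheck` and its helper methods are hypothetical, and it uses only the standard `java.lang.management` API. It reports the heap of the JVM it runs in, so it is only meaningful if launched with the same `-Xms`/`-Xmx` options as the server (or if the same calls are made from inside the server process).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, named here for illustration only.
public class HeapCheck {

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Format one heap figure; the JVM reports -1 when a value is undefined.
    static String format(String label, long bytes) {
        return label + "=" + (bytes < 0 ? "undefined" : toMiB(bytes) + "MiB");
    }

    // Summarize initial, committed, used, and maximum heap of this JVM.
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.join(", ",
                format("init", heap.getInit()),
                format("committed", heap.getCommitted()),
                format("used", heap.getUsed()),
                format("max", heap.getMax()));
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

If `max` comes out well below the 24G configured for each server, the heap setting is not reaching the JVM. Note that this only covers the JVM side; whether the virtual machine itself has enough physical memory for all servers and clients still has to be checked at the OS level.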
