Are you running on cloud or virtualized infrastructure? If so, check your steal time stats: you may have “noisy neighbors” causing members to become unresponsive. Geode detects this and fences off the unhealthy members to maintain consistency and availability.
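As a rough way to check steal time on a Linux host, the sketch below samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of CPU time the hypervisor took away in between. The function names (`parse_cpu_line`, `steal_percent`, `sample_steal`) and the 5-second interval are illustrative choices, not part of Geode. The `st` column in `top` or `vmstat` shows the same figure.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Pad so index 7 (steal) always exists, even on very old kernels.
            return values + [0] * (8 - len(values))
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time stolen between two /proc/stat snapshots."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so sum only 8.
    total = sum(a[:8]) - sum(b[:8])
    steal = a[7] - b[7]
    return 100.0 * steal / total if total > 0 else 0.0


def sample_steal(interval=5.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart and return the steal percentage."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return steal_percent(before, after)
```

Calling `sample_steal()` on each host while the membership errors are occurring gives a number to compare across Host1 and Host2. What counts as too high depends on the workload, so treat it as a relative signal rather than a fixed threshold.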
Anthony

> On Sep 27, 2018, at 10:31 AM, Dharam Thacker <[email protected]> wrote:
>
> Hi Team,
>
> I have following topology for geode currently and all regions are replicated.
>
> Note : Unfortunately I am still on version 1.1.1
>
> Host1:
> Locator1
> Server1.1 (Group1) -- 24G
> Server2.1 (Group2) -- 24G
> Client1 (CQ listener only -- 20 CQs registered via locator pool)
> Client2 (Fires OQL queries and functions only via locator pool)
>
> Host2:
> Locator2
> Server1.2 (Group1) -- 24G
> Server2.2 (Group2) -- 24G
>
> As shown above I have spring boot web app geode clients (client1 and client2)
> only on HOST1.
>
> If I scale them by putting them on HOST2 as well it works.
>
> Now I see 40 CQs registered for CQ listener client.
>
> But I frequently see now "GMS Membership error" complaining about "No
> heartbeat request and force disconnection of member" for all server nodes.
>
> Transient though but really painful!
>
> Somehow with 1.1.1 it can't auto reconnect which I know is fixed in later
> version but that's still fine.
>
> I did GC,CPU load and Memory analysis very well and at least these 3 looks
> quite healthy as expected.
>
> What could be the possible other reasons where scalling client apps might
> result into this?
>
> Or if you can suggest anything else to look at?
>
> Thanks,
> Dharam
>
