Hello Vas,

It looks like the members are waiting for another member to come online so that the latest data can be recovered. According to the logs, the missing member has its disk store at */10.1.2.22:/scripts/data-4* (maybe you have more than just 2 servers?). I'd suggest having a look at *Start Up and Shut Down with Disk Stores* [1] and making sure you're following the recommended steps described there.

Best regards
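
For reference, a rough sketch of how the waiting disk stores could be inspected from the command line. It assumes gfsh is on the PATH and can reach the locator named in the quoted JVM arguments (locator-1[10334]); the disk-store ID is the one reported in the quoted logs. Revoking is left commented out because it tells the cluster to stop waiting for that disk store, so it is only appropriate if that member is gone for good.

```shell
# Assumption: locator-1[10334] is reachable from where gfsh runs.
gfsh -e "connect --locator=locator-1[10334]" \
     -e "list members" \
     -e "show missing-disk-stores"

# Only if the member that owned /scripts/data-4 will never come back:
# gfsh -e "connect --locator=locator-1[10334]" \
#      -e "revoke missing-disk-store --id=f4d5a2f6-7254-4749-ba9f-1831d8215634"
```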
[1]: https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html

On Mon, 24 Aug 2020 at 19:59, vas aj <[email protected]> wrote:

> Hi all,
>
> After I restart the geode cluster having a region of type
> *PARTITION_REDUNDANT_PERSISTENT*, the following are seen in the logs
>
> *server-1 logs*
> ..........................................................
> Region /ukCustomers (and any colocated sub-regions) has potentially stale
> data. Buckets [1, 6, 8] are waiting for another offline member to recover
> the latest data. My persistent id is:
>   DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
>   Name: stay-wrong-zeta
>   Location: /10.1.2.28:/scripts/data-1
> Offline members with potentially new data:[
>   DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
>   Location: /10.1.2.22:/scripts/data-4
>   Buckets: [1, 6, 8]
> ]Use the gfsh show missing-disk-stores command to see all disk stores that
> are being waited on by other members.
> ..........
> Region /ukCustomers has successfully completed waiting for other members
> to recover the latest data. My persistent member information:
>   DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
>   Name: stay-wrong-zeta
>   Location: /10.1.2.28:/scripts/data-1
>
> ................
> Server in /stay-wrong-zeta on server-1-7bfcbd6c7b-b54wb[40404] as
> stay-wrong-zeta is currently online.
> Process ID: 23
> Uptime: 1 minute 48 seconds
> Geode Version: 1.11.0
> Java Version: 1.8.0_212
> Log File: /stay-wrong-zeta/stay-wrong-zeta.log
> JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
> -Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
> -Dgemfire.cache-xml-file=/scripts/cache-1.xml -Dgemfire.log-level=error
> -Xms512m -Xmx512m -XX:+UseG1GC
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar
>
> *server-2 logs*
> ...................................
> Region /ukCustomers (and any colocated sub-regions) has potentially stale
> data. Buckets [0, 1, 3] are waiting for another offline member to recover
> the latest data.My persistent id is:
>   DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
>   Name: kick-drab-bat
>   Location: /10.1.2.30:/scripts/data-2
> Offline members with potentially new data:[
>   DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
>   Location: /10.1.2.22:/scripts/data-4
>   Buckets: [0, 1, 3]
> ]Use the gfsh show missing-disk-stores command to see all disk stores that
> are being waited on by other members.
> ..........
> Region /ukCustomers has successfully completed waiting for other members
> to recover the latest data.My persistent member information:
>   DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
>   Name: kick-drab-bat
>   Location: /10.1.2.30:/scripts/data-2
>
> ..............
> Server in /kick-drab-bat on server-2-9cbbd877c-gl6c4[40405] as
> kick-drab-bat is currently online.
> Process ID: 23
> Uptime: 1 minute 15 seconds
> Geode Version: 1.11.0
> Java Version: 1.8.0_212
> Log File: /kick-drab-bat/kick-drab-bat.log
> JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
> -Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
> -Dgemfire.cache-xml-file=/scripts/cache-2.xml -Dgemfire.log-level=error
> -Xms512m -Xmx512m -XX:+UseG1GC
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar
>
> When I try to connect to the geode server using *client-cache*, it throws
> an error
>
> org.apache.geode.cache.client.NoAvailableServersException: null
> at
> org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:277)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:125)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
> at
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
> at
> org.apache.geode.cache.client.internal.PutAllOp.execute(PutAllOp.java:100)
> at
> org.apache.geode.cache.client.internal.ServerRegionProxy.putAll(ServerRegionProxy.java:592)
> at
> org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:8913)
> at
> org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8846)
> at
> org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8858)
>
> . . .
> . . .
> . . .
>
> However, telnet <<remote hostname>> 40404 works fine.
>
> *What has gone wrong?*
>
> *client-cache.xml* is as follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <client-cache>
>   <pool name="writeCachePool">
>     <server host="${server1.url}" port="${server1.port}"/>
>     <server host="${server2.url}" port="${server2.port}"/>
>   </pool>
>   <region name="ukCustomers" refid="PROXY"/>
> </client-cache>
>
> server 1 is re-started using the command
> args: ["gfsh", "start", "server",
> "--locators=locator-1[10334],locator-2[10334]",
> "--rebalance=true","--server-port=40404", "--log-level=error",
> "--J=-Xms512m", "--J=-Xmx512m", "--J=-XX:+UseG1GC",
> "--classpath=/scripts/classpath/domain.jar",
> "--cache-xml-file=/scripts/cache-1.xml"]
>
> where cache-1.xml is as follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <cache version="1.0" is-server="true">
>   <disk-store name="disk-store-1" compaction-threshold="40"
>       max-oplog-size="1024" queue-size="10000"
>       time-interval="2000" write-buffer-size="65536"
>       disk-usage-warning-percentage="80"
>       disk-usage-critical-percentage="98">
>     <disk-dirs>
>       <disk-dir>/scripts/data-1</disk-dir>
>     </disk-dirs>
>   </disk-store>
>   <region name="ukCustomers" refid="PARTITION_REDUNDANT_PERSISTENT">
>     <region-attributes data-policy="persistent-partition"
>         disk-store-name="disk-store-1"
>         statistics-enabled="true"
>         disk-synchronous="true">
>       <partition-attributes redundant-copies="1"
>           recovery-delay="5000" startup-recovery-delay="5000"/>
>     </region-attributes>
>   </region>
> </cache>
>
> server 2 is also restarted in a similar manner with cache-2.xml. However,
> for cache-2.xml, disk-dir would be /scripts/data-2
> & disk-store-name="disk-store-2"
>

-- Ju@N
