Hello Vas,

It looks like the members are waiting for another member to come online and
recover the latest data. According to the logs, the missing member has its
disk store at */10.1.2.22:/scripts/data-4* (maybe you have more than just 2
servers?). I'd suggest having a look at *Start Up and Shut Down with Disk
Stores [1]* and making sure you're following the recommended steps
highlighted there.
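
As a rough sketch of how to check this from gfsh (assuming gfsh can reach
one of the locators from your start command; the disk-store ID is the one
your logs report as offline):

```
gfsh> connect --locator=locator-1[10334]
gfsh> show missing-disk-stores
gfsh> revoke missing-disk-store --id=f4d5a2f6-7254-4749-ba9f-1831d8215634
```

Only revoke if the member that owned that disk store is gone for good,
since any data held only in that disk store is then treated as lost.
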
Best regards

[1]: https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html


On Mon, 24 Aug 2020 at 19:59, vas aj <[email protected]> wrote:

> Hi all,
>
> After I restart the geode cluster having a region of type
> *PARTITION_REDUNDANT_PERSISTENT*, the following are seen in the logs
>
> *server-1 logs*
> ..........................................................
> Region /ukCustomers (and any colocated sub-regions) has potentially stale
> data.  Buckets [1, 6, 8] are waiting for another offline member to recover
> the latest data. My persistent id is:
>   DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
>   Name: stay-wrong-zeta
>   Location: /10.1.2.28:/scripts/data-1
> Offline members with potentially new data:[
>   DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
>   Location: /10.1.2.22:/scripts/data-4
>   Buckets: [1, 6, 8]
> ]Use the gfsh show missing-disk-stores command to see all disk stores that
> are being waited on by other members.
> ..........
> Region /ukCustomers has successfully completed waiting for other members
> to recover the latest data. My persistent member information:
>   DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
>   Name: stay-wrong-zeta
>   Location: /10.1.2.28:/scripts/data-1
>
> ................
> Server in /stay-wrong-zeta on server-1-7bfcbd6c7b-b54wb[40404] as
> stay-wrong-zeta is currently online.
> Process ID: 23
> Uptime: 1 minute 48 seconds
> Geode Version: 1.11.0
> Java Version: 1.8.0_212
> Log File: /stay-wrong-zeta/stay-wrong-zeta.log
> JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
> -Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
> -Dgemfire.cache-xml-file=/scripts/cache-1.xml -Dgemfire.log-level=error
> -Xms512m -Xmx512m -XX:+UseG1GC
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar
>
> *server-2 logs*
> ...................................
> Region /ukCustomers (and any colocated sub-regions) has potentially stale
> data.  Buckets [0, 1, 3] are waiting for another offline member to recover
> the latest data.My persistent id is:
>   DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
>   Name: kick-drab-bat
>   Location: /10.1.2.30:/scripts/data-2
> Offline members with potentially new data:[
>   DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
>   Location: /10.1.2.22:/scripts/data-4
>   Buckets: [0, 1, 3]
> ]Use the gfsh show missing-disk-stores command to see all disk stores that
> are being waited on by other members.
> ..........
> Region /ukCustomers has successfully completed waiting for other members
> to recover the latest data.My persistent member information:
>   DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
>   Name: kick-drab-bat
>   Location: /10.1.2.30:/scripts/data-2
>
> ..............
> Server in /kick-drab-bat on server-2-9cbbd877c-gl6c4[40405] as
> kick-drab-bat is currently online.
> Process ID: 23
> Uptime: 1 minute 15 seconds
> Geode Version: 1.11.0
> Java Version: 1.8.0_212
> Log File: /kick-drab-bat/kick-drab-bat.log
> JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
> -Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
> -Dgemfire.cache-xml-file=/scripts/cache-2.xml -Dgemfire.log-level=error
> -Xms512m -Xmx512m -XX:+UseG1GC
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar
>
> When I try to connect to the geode server using *client-cache*, it throws
> an error
>
> org.apache.geode.cache.client.NoAvailableServersException: null
> at
> org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:277)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:125)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
> at
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
> at
> org.apache.geode.cache.client.internal.PutAllOp.execute(PutAllOp.java:100)
> at
> org.apache.geode.cache.client.internal.ServerRegionProxy.putAll(ServerRegionProxy.java:592)
> at
> org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:8913)
> at
> org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8846)
> at
> org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8858)
>
> . . .
> . . .
> . . .
>
> However, telnet <<remote hostname>> 40404 works fine.
>
> *What has gone wrong?*
>
> *client-cache.xml* is as follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <client-cache>
>     <pool name="writeCachePool">
>         <server host="${server1.url}" port="${server1.port}"/>
>         <server host="${server2.url}" port="${server2.port}"/>
>     </pool>
>     <region name="ukCustomers" refid="PROXY"/>
> </client-cache>
>
> server 1 is re-started using the command
> args: ["gfsh", "start", "server",
> "--locators=locator-1[10334],locator-2[10334]",
> "--rebalance=true","--server-port=40404", "--log-level=error",
> "--J=-Xms512m", "--J=-Xmx512m", "--J=-XX:+UseG1GC",
> "--classpath=/scripts/classpath/domain.jar",
> "--cache-xml-file=/scripts/cache-1.xml"]
>
> where cache-1.xml is as follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <cache version="1.0" is-server="true">
>     <disk-store name="disk-store-1" compaction-threshold="40"
>                 max-oplog-size="1024" queue-size="10000"
>                 time-interval="2000" write-buffer-size="65536"
>                 disk-usage-warning-percentage="80"
>                 disk-usage-critical-percentage="98">
>         <disk-dirs>
>             <disk-dir>/scripts/data-1</disk-dir>
>         </disk-dirs>
>     </disk-store>
>     <region name="ukCustomers" refid="PARTITION_REDUNDANT_PERSISTENT">
>         <region-attributes data-policy="persistent-partition"
>                            disk-store-name="disk-store-1"
>                            statistics-enabled="true"
>                            disk-synchronous="true">
>             <partition-attributes redundant-copies="1"
>                                   recovery-delay="5000"
>                                   startup-recovery-delay="5000"/>
>         </region-attributes>
>     </region>
> </cache>
>
> server 2 is also restarted in a similar manner with cache-2.xml. However,
> for cache-2.xml, disk-dir would be /scripts/data-2
> & disk-store-name="disk-store-2"
>
>

-- 
Ju@N
