Re: NoAvailableServers after geode server restart

Ju@N Tue, 25 Aug 2020 01:23:07 -0700

Hello Vas,

It looks like the members are waiting for another member to come online and
recover the latest data. According to the logs, the missing member has its
disk store at */10.1.2.22:/scripts/data-4* (maybe you have more than just 2
servers?). I'd suggest having a look at *Start Up and Shut Down with Disk
Stores [1]* and making sure you're following the recommended steps
highlighted there.
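
As a rough sketch of how you could check this from gfsh: the locator address
below is taken from your startup arguments, and the disk-store ID is the one
your logs report as offline. Lines starting with # are notes for the reader,
not gfsh input. Revoking is only an option I'd consider if that member is
gone for good, since any newer data it holds would be lost.

```
# Connect to one of the locators (address assumed from the quoted JVM arguments)
gfsh> connect --locator=locator-1[10334]

# List the disk stores that online members are still waiting on
gfsh> show missing-disk-stores

# Only if the member at /10.1.2.22:/scripts/data-4 will never come back:
# tell the cluster to stop waiting for its disk store
gfsh> revoke missing-disk-store --id=f4d5a2f6-7254-4749-ba9f-1831d8215634
```

If the missing member can still be started, bringing it back online is the
safer route than revoking.
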
Best regards


[1]:
https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html


On Mon, 24 Aug 2020 at 19:59, vas aj <[email protected]> wrote:

> Hi all,
>
> After I restart the geode cluster having a region of type
> *PARTITION_REDUNDANT_PERSISTENT*, the following are seen in the logs
>
> *server-1 logs*
> ..........................................................
> Region /ukCustomers (and any colocated sub-regions) has potentially stale
> data.  Buckets [1, 6, 8] are waiting for another offline member to recover
> the latest data. My persistent id is:
>   DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
>   Name: stay-wrong-zeta
>   Location: /10.1.2.28:/scripts/data-1
> Offline members with potentially new data:[
>   DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
>   Location: /10.1.2.22:/scripts/data-4
>   Buckets: [1, 6, 8]
> ]Use the gfsh show missing-disk-stores command to see all disk stores that
> are being waited on by other members.
> ..........
> Region /ukCustomers has successfully completed waiting for other members
> to recover the latest data. My persistent member information:
>   DiskStore ID: 30596906-c97c-4279-89ea-46d088ed27f6
>   Name: stay-wrong-zeta
>   Location: /10.1.2.28:/scripts/data-1
>
> ................
> Server in /stay-wrong-zeta on server-1-7bfcbd6c7b-b54wb[40404] as
> stay-wrong-zeta is currently online.
> Process ID: 23
> Uptime: 1 minute 48 seconds
> Geode Version: 1.11.0
> Java Version: 1.8.0_212
> Log File: /stay-wrong-zeta/stay-wrong-zeta.log
> JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
> -Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
> -Dgemfire.cache-xml-file=/scripts/cache-1.xml -Dgemfire.log-level=error
> -Xms512m -Xmx512m -XX:+UseG1GC
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar
>
> *server-2 logs*
> ...................................
> Region /ukCustomers (and any colocated sub-regions) has potentially stale
> data.  Buckets [0, 1, 3] are waiting for another offline member to recover
> the latest data. My persistent id is:
>   DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
>   Name: kick-drab-bat
>   Location: /10.1.2.30:/scripts/data-2
> Offline members with potentially new data:[
>   DiskStore ID: f4d5a2f6-7254-4749-ba9f-1831d8215634
>   Location: /10.1.2.22:/scripts/data-4
>   Buckets: [0, 1, 3]
> ]Use the gfsh show missing-disk-stores command to see all disk stores that
> are being waited on by other members.
> ..........
> Region /ukCustomers has successfully completed waiting for other members
> to recover the latest data. My persistent member information:
>   DiskStore ID: 2455d3c8-d852-4dac-a743-25ae62f5892c
>   Name: kick-drab-bat
>   Location: /10.1.2.30:/scripts/data-2
>
> ..............
> Server in /kick-drab-bat on server-2-9cbbd877c-gl6c4[40405] as
> kick-drab-bat is currently online.
> Process ID: 23
> Uptime: 1 minute 15 seconds
> Geode Version: 1.11.0
> Java Version: 1.8.0_212
> Log File: /kick-drab-bat/kick-drab-bat.log
> JVM Arguments: -Dgemfire.locators=locator-1[10334],locator-2[10334]
> -Dgemfire.start-dev-rest-api=false -Dgemfire.use-cluster-configuration=true
> -Dgemfire.cache-xml-file=/scripts/cache-2.xml -Dgemfire.log-level=error
> -Xms512m -Xmx512m -XX:+UseG1GC
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.11.0.jar:/scripts/classpath/domain.jar:/scripts/classpath/spatial4j-0.7.jar:/scripts/classpath/geode-configs.jar:/scripts/classpath/lucene-sandbox-6.6.2.jar:/geode/lib/geode-dependencies.jar
>
> When I try to connect to the geode server using *client-cache*, it throws
> an error
>
> org.apache.geode.cache.client.NoAvailableServersException: null
> at
> org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:277)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:125)
> at
> org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
> at
> org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
> at
> org.apache.geode.cache.client.internal.PutAllOp.execute(PutAllOp.java:100)
> at
> org.apache.geode.cache.client.internal.ServerRegionProxy.putAll(ServerRegionProxy.java:592)
> at
> org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:8913)
> at
> org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8846)
> at
> org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8858)
>
> . . .
> . . .
> . . .
>
> However, telnet <<remote hostname>> 40404 works fine.
>
> *What has gone wrong?*
>
> *client-cache.xml* is as follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <client-cache>
>     <pool name="writeCachePool">
>         <server host="${server1.url}" port="${server1.port}"/>
>         <server host="${server2.url}" port="${server2.port}"/>
>     </pool>
>     <region name="ukCustomers" refid="PROXY"/>
> </client-cache>
>
> server 1 is re-started using the command
> args: ["gfsh", "start", "server",
> "--locators=locator-1[10334],locator-2[10334]",
> "--rebalance=true","--server-port=40404", "--log-level=error",
> "--J=-Xms512m", "--J=-Xmx512m", "--J=-XX:+UseG1GC",
> "--classpath=/scripts/classpath/domain.jar",
> "--cache-xml-file=/scripts/cache-1.xml"]
>
> where cache-1.xml is as follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <cache version="1.0" is-server="true">
>     <disk-store name="disk-store-1" compaction-threshold="40"
>                 max-oplog-size="1024" queue-size="10000"
>                 time-interval="2000" write-buffer-size="65536"
>                 disk-usage-warning-percentage="80"
>                 disk-usage-critical-percentage="98">
>         <disk-dirs>
>             <disk-dir>/scripts/data-1</disk-dir>
>         </disk-dirs>
>     </disk-store>
>     <region name="ukCustomers" refid="PARTITION_REDUNDANT_PERSISTENT">
>         <region-attributes data-policy="persistent-partition"
>                            disk-store-name="disk-store-1"
>                            statistics-enabled="true"
>                            disk-synchronous="true">
>             <partition-attributes redundant-copies="1"
>                                   recovery-delay="5000"
>                                   startup-recovery-delay="5000"/>
>         </region-attributes>
>     </region>
> </cache>
>
> Server 2 is restarted in a similar manner with cache-2.xml. However, for
> cache-2.xml, disk-dir would be /scripts/data-2 and
> disk-store-name="disk-store-2".
>
>

-- 
Ju@N
