I also wonder if this could happen when the two locators are started without knowledge of each other (via the "locators" property).
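A minimal sketch of starting the locators with knowledge of each other via the locators list (hostnames and ports here are hypothetical; run the first command on member001 and the second on member002, so neither locator believes it is alone):

```
gfsh> start locator --name=Locator1 --port=10334 --locators=member001[10334],member002[10334]
gfsh> start locator --name=Locator2 --port=10334 --locators=member001[10334],member002[10334]
```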
On Mon, Jun 5, 2017 at 10:45 AM, Darrel Schneider <[email protected]> wrote:

> A ConflictingPersistentDataException indicates that two copies of a disk-store were written independently of each other. When using cluster configuration, the locator uses a disk-store to write the cluster configuration to disk. It looks like that is the disk-store that is throwing ConflictingPersistentDataException.
>
> One way this could happen is if you have just one locator running and it writes the cluster config to its disk-store. You then shut that locator down and start up a different one. It would have no knowledge of the other locator that you shut down, so it would create a brand new cluster config in its disk-store. If at some point these two locators finally see each other, the second one to start will throw a ConflictingPersistentDataException.
>
> In this case you need to pick which one of these disk-stores you want to be the winner and remove the other disk-store. To pick the best winner, I think each locator also writes some cache.xml files that will show you, in plain text, what is in the binary disk-store files. This could also help you determine which configuration you will lose when you remove one of these disk-stores. You can get that missing config back by re-running the same gfsh commands (for example, create region). Another option would be to use the gfsh import/export commands: before deleting either disk-store, start the locators up one at a time and export the cluster config. Then you can start fresh by importing the config.
>
> You might hit a problem in which one of these disk-stores now knows about the other, so when you try to start it by itself it fails, saying it is waiting for the other to start up. Then, when you do that, you get the ConflictingPersistentDataException. In that case you would not be able to start them up one at a time to do the export, so you would need to find the cache.xml files.
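The export/import recovery described above might be scripted roughly as follows. This is a sketch, not a definitive procedure: the paths, hostnames, and ports are hypothetical, and the relevant gfsh commands are `export cluster-configuration` and `import cluster-configuration`:

```
# Start each locator by itself, connect, and export its copy of the cluster config:
gfsh> connect --locator=member001[10334]
gfsh> export cluster-configuration --zip-file-name=/tmp/config-from-locator1.zip
# ...repeat with the other locator, then compare the two exports and pick a winner...
# After removing both ConfigDiskDir_* directories, restart the locators and:
gfsh> import cluster-configuration --zip-file-name=/tmp/config-from-locator1.zip
```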
> Someone who knows more about cluster config might be able to help you more.
>
> You should be able to avoid this in the future by making sure you start both locators before doing your first gfsh create command. That way both disk-stores will know about each other and will be kept in sync.
>
> On Mon, Jun 5, 2017 at 8:07 AM, Jinmei Liao <[email protected]> wrote:
>
>> Is this related to https://issues.apache.org/jira/browse/GEODE-3003?
>>
>> On Sun, Jun 4, 2017 at 11:39 PM, Thacker, Dharam <[email protected]> wrote:
>>
>>> Hi Team,
>>>
>>> Could someone help us understand how to deal with the scenario below, where the cluster configuration service fails to start on another locator? What corrective action should we take to rectify this?
>>>
>>> Note:
>>> member001.IP.MASKED – IP address of member001
>>> member002.IP.MASKED – IP address of member002
>>>
>>> Locator logs on member002:
>>>
>>> [info 2017/06/05 02:07:11.941 EDT RavenLocator2 <Pooled Message Processor 1> tid=0x3d] Initializing region _ConfigurationRegion
>>>
>>> [warning 2017/06/05 02:07:11.951 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Initialization failed for Region /_ConfigurationRegion
>>> org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /169.87.179.46:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_Locator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name Locator1 which was offline when the local data from /member002.IP.MASKED:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
>>>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>>     at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>>>     at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
>>>     at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
>>>     at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
>>>     at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
>>>     at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
>>>     at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
>>>     at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> [error 2017/06/05 02:07:11.959 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Error occurred while initializing cluster configuration
>>> java.lang.RuntimeException: Error occurred while initializing cluster configuration
>>>     at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:722)
>>>     at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
>>>     at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
>>>     at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /member001.IP.MASKED:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_RavenLocator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name RavenLocator1 which was offline when the local data from /member002.IP.MASKED:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
>>>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>>     at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>>>     at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
>>>     at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
>>>     at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
>>>     at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
>>>     ... 7 more
>>>
>>> Thanks & Regards,
>>> Dharam
>>
>> --
>> Cheers
>> Jinmei

--
Mark Secrist | Sr Manager, Global Education Delivery
[email protected]
pivotal.io
