Hello!

It is not clear from this log alone why this node became segmented. Do you
have logs from the other server node in the topology? It was the coordinator,
so maybe it was the one experiencing problems.
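
If it helps to narrow down which node was experiencing problems, one option is
to log discovery events locally on both server nodes and line up the
timestamps. Below is a minimal sketch, assuming the relevant event types are
enabled in the IgniteConfiguration (the class name and the chosen event set are
only an illustration):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class DiscoveryEventLogger {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Events are not recorded unless their types are enabled explicitly.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED,
            EventType.EVT_NODE_FAILED, EventType.EVT_NODE_LEFT);

        Ignite ignite = Ignition.start(cfg);

        // Print every discovery event locally so the logs of both nodes
        // can be lined up around the time of the segmentation.
        IgnitePredicate<Event> lsnr = evt -> {
            DiscoveryEvent discoEvt = (DiscoveryEvent) evt;
            System.out.println("Discovery event: " + discoEvt.name()
                + ", node=" + discoEvt.eventNode().id());
            return true; // keep listening
        };

        ignite.events().localListen(lsnr, EventType.EVT_NODE_SEGMENTED,
            EventType.EVT_NODE_FAILED, EventType.EVT_NODE_LEFT);
    }
}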

Regards,
-- 
Ilya Kasnacheev


Wed, 28 Nov 2018 at 13:56, Hemasundara Rao <
hemasundara....@travelcentrictechnology.com>:

> Hi  Ilya Kasnacheev,
>
>  Did you get a chance to go through the log attached?
> This is one of the critical issues we are facing in our dev environment.
> Your input would be of great help to us in identifying what is causing this
> issue and a probable solution to it.
>
> Thanks and Regards,
> Hemasundar.
>
> On Mon, 26 Nov 2018 at 16:54, Hemasundara Rao <
> hemasundara....@travelcentrictechnology.com> wrote:
>
>> Hi  Ilya Kasnacheev,
>>   I have attached the log file.
>>
>> Regards,
>> Hemasundar.
>>
>> On Mon, 26 Nov 2018 at 16:50, Ilya Kasnacheev <ilya.kasnach...@gmail.com>
>> wrote:
>>
>>> Hello!
>>>
>>> Maybe you have some data in your caches which causes runaway heap usage
>>> in your own code. Previously you did not have such data, or code that would
>>> react in such a fashion.
>>>
>>> It's hard to say. Can you provide more logs from the node before it
>>> segments?
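>>>
>>> One way to check for runaway heap usage during the load is to sample the
>>> heap periodically and watch for spikes. A minimal sketch using only standard
>>> JDK APIs (the class name and the 5-second interval are arbitrary):
>>>
>>> import java.lang.management.ManagementFactory;
>>> import java.lang.management.MemoryMXBean;
>>> import java.lang.management.MemoryUsage;
>>>
>>> public class HeapWatcher {
>>>     public static void start() {
>>>         MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
>>>
>>>         Thread t = new Thread(() -> {
>>>             while (!Thread.currentThread().isInterrupted()) {
>>>                 MemoryUsage heap = memory.getHeapMemoryUsage();
>>>                 System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
>>>                     heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
>>>
>>>                 try {
>>>                     Thread.sleep(5_000); // sample every 5 seconds
>>>                 }
>>>                 catch (InterruptedException e) {
>>>                     Thread.currentThread().interrupt();
>>>                 }
>>>             }
>>>         }, "heap-watcher");
>>>
>>>         t.setDaemon(true); // do not keep the JVM alive because of this thread
>>>         t.start();
>>>     }
>>> }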
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> Mon, 26 Nov 2018 at 14:17, Hemasundara Rao <
>>> hemasundara....@travelcentrictechnology.com>:
>>>
>>>> Thank you very much, Ilya Kasnacheev, for your response.
>>>>
>>>> We load data initially; after that, only small delta changes are applied.
>>>> The grid-down issue happens after the cluster has been running successfully
>>>> for 2 to 3 days.
>>>> Once the issue starts, it repeats frequently, and we have not found any
>>>> clue as to the cause.
>>>>
>>>> Thanks and Regards,
>>>> Hemasundar.
>>>>
>>>>
>>>> On Mon, 26 Nov 2018 at 13:43, Ilya Kasnacheev <
>>>> ilya.kasnach...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> A node will get segmented if other nodes fail to wait for a Discovery
>>>>> response from that node. This usually means either network problems or long
>>>>> GC pauses caused by insufficient heap on one of the nodes.
>>>>>
>>>>> Make sure your data load process does not cause heap usage spikes.
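>>>>>
>>>>> For example, here is a minimal sketch of two settings that often help in
>>>>> this situation; the cache name "staticDataCache", the 30-second timeout and
>>>>> the entry count are placeholders, and the cache is assumed to exist already:
>>>>>
>>>>> import org.apache.ignite.Ignite;
>>>>> import org.apache.ignite.IgniteDataStreamer;
>>>>> import org.apache.ignite.Ignition;
>>>>> import org.apache.ignite.configuration.IgniteConfiguration;
>>>>>
>>>>> public class StaticDataLoad {
>>>>>     public static void main(String[] args) {
>>>>>         IgniteConfiguration cfg = new IgniteConfiguration();
>>>>>
>>>>>         // Give discovery more slack for occasional GC pauses (default is 10 seconds).
>>>>>         cfg.setFailureDetectionTimeout(30_000);
>>>>>
>>>>>         Ignite ignite = Ignition.start(cfg);
>>>>>
>>>>>         // Stream the initial load instead of building large batches on the heap;
>>>>>         // the streamer keeps a bounded buffer of entries per node.
>>>>>         try (IgniteDataStreamer<Integer, String> streamer =
>>>>>                  ignite.dataStreamer("staticDataCache")) {
>>>>>             streamer.perNodeBufferSize(1024);
>>>>>
>>>>>             for (int i = 0; i < 1_000_000; i++)
>>>>>                 streamer.addData(i, "value-" + i);
>>>>>         }
>>>>>     }
>>>>> }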
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>>
>>>>> Fri, 23 Nov 2018 at 07:54, Hemasundara Rao <
>>>>> hemasundara....@travelcentrictechnology.com>:
>>>>>
>>>>>> Hi All,
>>>>>> We are running a two-node Ignite server cluster.
>>>>>> It was running without any issue for almost 5 days. We are using this
>>>>>> grid for static data. The Ignite process runs with around 8 GB of memory
>>>>>> after we load our data.
>>>>>> Suddenly the grid server nodes started going down. We tried 3 times to
>>>>>> restart the server nodes and reload the static data, and the server nodes
>>>>>> went down again and again.
>>>>>>
>>>>>> Please let us know how to overcome this kind of issue.
>>>>>>
>>>>>> Attached are the log file and the configuration file.
>>>>>>
>>>>>> *Following is part of the log from one server:*
>>>>>>
>>>>>> [04:45:58,335][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Node is out of topology (probably, due to short-time network problems).
>>>>>> [04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Local node SEGMENTED: TcpDiscoveryNode
>>>>>> [id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], 
>>>>>> sockAddrs=[/
>>>>>> 10.201.30.63:47600], discPort=47600, order=42, intOrder=23,
>>>>>> lastExchangeTime=1542861958327, loc=true, 
>>>>>> ver=2.4.0#20180305-sha1:aa342270,
>>>>>> isClient=false]
>>>>>> [04:45:58,335][INFO][tcp-disco-sock-reader-#78%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Finished serving remote node connection [rmtAddr=/10.201.30.64:36695,
>>>>>> rmtPort=36695
>>>>>> [04:45:58,337][INFO][tcp-disco-sock-reader-#70%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Finished serving remote node connection [rmtAddr=/10.201.30.172:58418,
>>>>>> rmtPort=58418
>>>>>> [04:45:58,337][INFO][tcp-disco-sock-reader-#74%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Finished serving remote node connection [rmtAddr=/10.201.10.125:63403,
>>>>>> rmtPort=63403
>>>>>> [04:46:01,516][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Pinging node: 6a603d8b-f8bf-40bf-af50-6c04a56b572e
>>>>>> [04:46:01,546][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Finished node ping [nodeId=6a603d8b-f8bf-40bf-af50-6c04a56b572e, 
>>>>>> res=true,
>>>>>> time=49ms]
>>>>>> [04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Pinging node: 5ec6ee69-075e-4829-84ca-ae40411c7bc3
>>>>>> [04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Finished node ping [nodeId=5ec6ee69-075e-4829-84ca-ae40411c7bc3, 
>>>>>> res=false,
>>>>>> time=7ms]
>>>>>> [04:46:08,283][INFO][tcp-disco-sock-reader-#4%StaticGrid_NG_Dev%][TcpDiscoverySpi]
>>>>>> Finished serving remote node connection [rmtAddr=/10.201.30.64:48038,
>>>>>> rmtPort=48038
>>>>>> [04:46:08,367][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Restarting JVM according to configured segmentation policy.
>>>>>> [04:46:08,388][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Node FAILED: TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064,
>>>>>> addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601],
>>>>>> discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642,
>>>>>> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
>>>>>> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Topology snapshot [ver=680, servers=1, clients=17, CPUs=36, 
>>>>>> offheap=8.0GB,
>>>>>> heap=84.0GB]
>>>>>> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Data Regions Configured:
>>>>>> [04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>>  ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB,
>>>>>> persistenceEnabled=false]
>>>>>> [04:46:08,396][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][time]
>>>>>> Started exchange init [topVer=AffinityTopologyVersion [topVer=680,
>>>>>> minorTopVer=0], crd=true, evt=NODE_FAILED,
>>>>>> evtNode=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, customEvt=null,
>>>>>> allowMerge=true]
>>>>>> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture]
>>>>>> Finished waiting for partition release future
>>>>>> [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], 
>>>>>> waitTime=0ms,
>>>>>> futInfo=NA]
>>>>>> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture]
>>>>>> Coordinator received all messages, try merge [ver=AffinityTopologyVersion
>>>>>> [topVer=680, minorTopVer=0]]
>>>>>> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridCachePartitionExchangeManager]
>>>>>> Stop merge, custom task found: WalStateNodeLeaveExchangeTask
>>>>>> [node=TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064,
>>>>>> addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601],
>>>>>> discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642,
>>>>>> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]]
>>>>>> [04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture]
>>>>>> finishExchangeOnCoordinator [topVer=AffinityTopologyVersion [topVer=680,
>>>>>> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=680, 
>>>>>> minorTopVer=0]]
>>>>>> [04:46:08,512][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Node FAILED: TcpDiscoveryNode [id=6a603d8b-f8bf-40bf-af50-6c04a56b572e,
>>>>>> addrs=[10.201.30.172], sockAddrs=[BLRVM-HHNG01.devdom/10.201.30.172:0],
>>>>>> discPort=0, order=98, intOrder=53, lastExchangeTime=1542348596592,
>>>>>> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
>>>>>> [04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Topology snapshot [ver=683, servers=1, clients=16, CPUs=36, 
>>>>>> offheap=8.0GB,
>>>>>> heap=78.0GB]
>>>>>> [04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Data Regions Configured:
>>>>>> [04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>>  ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB,
>>>>>> persistenceEnabled=false]
>>>>>> [04:46:08,513][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Node FAILED: TcpDiscoveryNode [id=5ec6ee69-075e-4829-84ca-ae40411c7bc3,
>>>>>> addrs=[10.201.30.172], sockAddrs=[BLRVM-HHNG01.devdom/10.201.30.172:0],
>>>>>> discPort=0, order=129, intOrder=71, lastExchangeTime=1542360580600,
>>>>>> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
>>>>>> [04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Topology snapshot [ver=684, servers=1, clients=15, CPUs=36, 
>>>>>> offheap=8.0GB,
>>>>>> heap=72.0GB]
>>>>>> [04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Data Regions Configured:
>>>>>> [04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>>  ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB,
>>>>>> persistenceEnabled=false]
>>>>>> [04:46:08,514][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Node FAILED: TcpDiscoveryNode [id=224648a6-e515-479e-88e4-44f7bceaeb14,
>>>>>> addrs=[10.201.50.96], sockAddrs=[BLRWSVERMA3420.devdom/10.201.50.96:0],
>>>>>> discPort=0, order=175, intOrder=96, lastExchangeTime=1542365246419,
>>>>>> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
>>>>>> [04:46:08,514][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]
>>>>>> Topology snapshot [ver=685, servers=1, clients=14, CPUs=32, 
>>>>>> offheap=8.0GB,
>>>>>> heap=71.0GB]
>>>>>>
>>>>>> --
>>>>>> Hemasundara Rao Pottangi  | Senior Project Leader
>>>>>>
>>>>>> HotelHub LLP
>>>>>> Phone: +91 80 6741 8700
>>>>>> Cell: +91 99 4807 7054
>>>>>> Email: hemasundara....@hotelhub.com
>>>>>> Website: www.hotelhub.com <http://hotelhub.com/>
>>>>>>
>>>>
>>>>
>>
>>
