Hello!

[04:45:53,179][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=/10.201.30.64:47603, rmtPort=47603]
[04:45:53,180][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage [node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0, intOrder=0, lastExchangeTime=1542861943131, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2, super=TcpDiscoveryAbstractMessage [sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1, id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]], next=TcpDiscoveryNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47603], discPort=47603, order=53, intOrder=32, lastExchangeTime=1542272829304, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryJoinRequestMessage [node=TcpDiscoveryNode [id=47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, addrs=[10.201.30.173], sockAddrs=[/10.201.30.173:0], discPort=0, order=0, intOrder=0, lastExchangeTime=1542861943131, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@6ce6ae2, super=TcpDiscoveryAbstractMessage [sndNodeId=8a825790-a987-42c3-acb0-b3ea270143e1, id=5e14ec53761-47aa2976-0a02-4ffe-9c8d-3f0fbfcc532b, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]], next=ClusterNode [id=d7782a2e-4cfc-4427-8ba7-a9af3954ae3f, order=53, addr=[10.201.30.64], daemon=true]]]
[04:45:53,190][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'

and then, on another node:

[04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/10.201.30.63:47600], discPort=47600, order=42, intOrder=23, lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]

I think that you either have long GC pauses or a flaky network (or the system goes into swapping and such). Consider increasing 'ackTimeout' and/or 'failureDetectionTimeout'. Also consider collecting GC logs for your nodes and looking into them for a root cause.
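For illustration, a minimal sketch of how these two properties can be raised through the Java API; the 30-second values are placeholders for the example, not tuned recommendations:

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

    public class DiscoveryTimeoutTuning {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Raise the overall failure-detection budget (default: 10_000 ms).
            cfg.setFailureDetectionTimeout(30_000);

            // Or tune the discovery SPI directly.
            TcpDiscoverySpi disco = new TcpDiscoverySpi();
            disco.setAckTimeout(30_000);
            cfg.setDiscoverySpi(disco);

            // To collect GC logs on Java 8, start each node's JVM with e.g.:
            //   -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps

            Ignition.start(cfg);
        }
    }

Note that once 'ackTimeout' is set explicitly on the SPI, it takes precedence over the value derived from 'failureDetectionTimeout', so it is usually enough to tune one or the other. The same properties can also be set in the Spring XML configuration.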
Regards,
--
Ilya Kasnacheev

On Fri, 30 Nov 2018 at 14:01, Hemasundara Rao <hemasundara....@travelcentrictechnology.com> wrote:

Hi Ilya Kasnacheev,

I am attaching all logs from the second server (10.201.30.64).
Please let me know if you need any other details.

Thanks and Regards,
Hemasundar.

On Fri, 30 Nov 2018 at 09:40, Hemasundara Rao <hemasundara....@travelcentrictechnology.com> wrote:

Hi Ilya Kasnacheev,

We are running one cluster node (10.201.30.63). I am attaching all logs from this server.
Please let me know if you need any other details.

Thanks and Regards,
Hemasundar.

On Thu, 29 Nov 2018 at 20:07, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:

Hello!

It is not clear from this log alone why this node became segmented. Do you have the log from the other server node in the topology? It was the coordinator, so maybe it was the one experiencing problems.

Regards,
--
Ilya Kasnacheev

On Wed, 28 Nov 2018 at 13:56, Hemasundara Rao <hemasundara....@travelcentrictechnology.com> wrote:

Hi Ilya Kasnacheev,

Did you get a chance to go through the attached log?
This is one of the critical issues we are facing in our dev environment. Your input would be of great help to us in finding what is causing this issue and a probable solution to it.

Thanks and Regards,
Hemasundar.

On Mon, 26 Nov 2018 at 16:54, Hemasundara Rao <hemasundara....@travelcentrictechnology.com> wrote:

Hi Ilya Kasnacheev,
I have attached the log file.

Regards,
Hemasundar.

On Mon, 26 Nov 2018 at 16:50, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:

Hello!

Maybe you have some data in your caches which causes runaway heap usage in your own code, and previously you did not have such data, or code that would react in such a fashion.

It's hard to say; can you provide more logs from the node before it segments?

Regards,
--
Ilya Kasnacheev

On Mon, 26 Nov 2018 at 14:17, Hemasundara Rao <hemasundara....@travelcentrictechnology.com> wrote:

Thank you very much, Ilya Kasnacheev, for your response.

We are loading data initially; after that, only small delta changes are applied.
The grid-down issue happens after the grid has been running successfully for 2 to 3 days. Once the issue starts, it repeats frequently, and we are not getting any clue.

Thanks and Regards,
Hemasundar.

On Mon, 26 Nov 2018 at 13:43, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:

Hello!

A node will get segmented if other nodes fail to wait for a Discovery response from that node. This usually means either network problems or long GC pauses caused by insufficient heap on one of the nodes.

Make sure your data load process does not cause heap usage spikes.
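As a rough illustration of that last point, a data streamer bounds heap pressure during bulk loading by batching updates per node instead of issuing many individual puts; a minimal sketch, where the cache name 'staticData' and the entry types are hypothetical:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class StaticDataLoad {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // The cache must exist before a streamer can be opened for it.
            ignite.getOrCreateCache("staticData"); // hypothetical cache name

            // try-with-resources: close() flushes any remaining buffered entries.
            try (IgniteDataStreamer<Integer, String> streamer =
                     ignite.dataStreamer("staticData")) {
                streamer.perNodeBufferSize(1024); // entries buffered per node before send
                for (int i = 0; i < 1_000_000; i++)
                    streamer.addData(i, "value-" + i);
            }
        }
    }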
Regards,
--
Ilya Kasnacheev

On Fri, 23 Nov 2018 at 07:54, Hemasundara Rao <hemasundara....@travelcentrictechnology.com> wrote:

Hi All,

We are running a two-node Ignite server cluster. It ran without any issue for almost 5 days. We are using this grid for static data. The Ignite process runs with around 8 GB of memory after we load our data.
Suddenly the grid server nodes started going down. We tried running the server nodes and loading the static data 3 times; the server nodes go down again and again.

Please let us know how to overcome this kind of issue.

Attached are the log file and the configuration file.

Following is part of the log from one server:

[04:45:58,335][WARNING][tcp-disco-msg-worker-#2%StaticGrid_NG_Dev%][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
[04:45:58,335][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=8a825790-a987-42c3-acb0-b3ea270143e1, addrs=[10.201.30.63], sockAddrs=[/10.201.30.63:47600], discPort=47600, order=42, intOrder=23, lastExchangeTime=1542861958327, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[04:45:58,335][INFO][tcp-disco-sock-reader-#78%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.64:36695, rmtPort=36695]
[04:45:58,337][INFO][tcp-disco-sock-reader-#70%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.172:58418, rmtPort=58418]
[04:45:58,337][INFO][tcp-disco-sock-reader-#74%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.10.125:63403, rmtPort=63403]
[04:46:01,516][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Pinging node: 6a603d8b-f8bf-40bf-af50-6c04a56b572e
[04:46:01,546][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished node ping [nodeId=6a603d8b-f8bf-40bf-af50-6c04a56b572e, res=true, time=49ms]
[04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Pinging node: 5ec6ee69-075e-4829-84ca-ae40411c7bc3
[04:46:02,482][INFO][tcp-comm-worker-#1%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished node ping [nodeId=5ec6ee69-075e-4829-84ca-ae40411c7bc3, res=false, time=7ms]
[04:46:08,283][INFO][tcp-disco-sock-reader-#4%StaticGrid_NG_Dev%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.201.30.64:48038, rmtPort=48038]
[04:46:08,367][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Restarting JVM according to configured segmentation policy.
[04:46:08,388][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Topology snapshot [ver=680, servers=1, clients=17, CPUs=36, offheap=8.0GB, heap=84.0GB]
[04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Data Regions Configured:
[04:46:08,389][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]   ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB, persistenceEnabled=false]
[04:46:08,396][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, customEvt=null, allowMerge=true]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], waitTime=0ms, futInfo=NA]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] Coordinator received all messages, try merge [ver=AffinityTopologyVersion [topVer=680, minorTopVer=0]]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridCachePartitionExchangeManager] Stop merge, custom task found: WalStateNodeLeaveExchangeTask [node=TcpDiscoveryNode [id=20687a72-b5c7-48bf-a5ab-37bd3f7fa064, addrs=[10.201.30.64], sockAddrs=[/10.201.30.64:47601], discPort=47601, order=41, intOrder=22, lastExchangeTime=1542262724642, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]]
[04:46:08,398][INFO][exchange-worker-#42%StaticGrid_NG_Dev%][GridDhtPartitionsExchangeFuture] finishExchangeOnCoordinator [topVer=AffinityTopologyVersion [topVer=680, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=680, minorTopVer=0]]
[04:46:08,512][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=6a603d8b-f8bf-40bf-af50-6c04a56b572e, addrs=[10.201.30.172], sockAddrs=[BLRVM-HHNG01.devdom/10.201.30.172:0], discPort=0, order=98, intOrder=53, lastExchangeTime=1542348596592, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
[04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Topology snapshot [ver=683, servers=1, clients=16, CPUs=36, offheap=8.0GB, heap=78.0GB]
[04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Data Regions Configured:
[04:46:08,512][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]   ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB, persistenceEnabled=false]
[04:46:08,513][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=5ec6ee69-075e-4829-84ca-ae40411c7bc3, addrs=[10.201.30.172], sockAddrs=[BLRVM-HHNG01.devdom/10.201.30.172:0], discPort=0, order=129, intOrder=71, lastExchangeTime=1542360580600, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
[04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Topology snapshot [ver=684, servers=1, clients=15, CPUs=36, offheap=8.0GB, heap=72.0GB]
[04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Data Regions Configured:
[04:46:08,513][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager]   ^-- Default_Region [initSize=256.0 MiB, maxSize=8.0 GiB, persistenceEnabled=false]
[04:46:08,514][WARNING][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=224648a6-e515-479e-88e4-44f7bceaeb14, addrs=[10.201.50.96], sockAddrs=[BLRWSVERMA3420.devdom/10.201.50.96:0], discPort=0, order=175, intOrder=96, lastExchangeTime=1542365246419, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true]
[04:46:08,514][INFO][disco-event-worker-#41%StaticGrid_NG_Dev%][GridDiscoveryManager] Topology snapshot [ver=685, servers=1, clients=14, CPUs=32, offheap=8.0GB, heap=71.0GB]
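For context, the 'Default_Region' reported in the snapshots above corresponds to a data region configured roughly as follows. This is a sketch reconstructed only from the logged values (region name, initial size, max size, no persistence); the builder-style Java form and class name are assumptions, since the actual configuration is in the attached file:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class RegionFromLogs {
        public static IgniteConfiguration configuration() {
            // Mirrors the logged region: initSize=256.0 MiB, maxSize=8.0 GiB,
            // persistenceEnabled=false (a purely in-memory region).
            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("Default_Region")
                .setInitialSize(256L * 1024 * 1024)
                .setMaxSize(8L * 1024 * 1024 * 1024)
                .setPersistenceEnabled(false);

            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region);

            return new IgniteConfiguration()
                .setDataStorageConfiguration(storage);
        }
    }

Since persistenceEnabled=false, data in this region does not survive the JVM restart triggered by the segmentation policy, so the static data has to be reloaded after every segmentation incident.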