Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread userx
Hi Pavel, The exchange finished, taking its time, but during that time a new client was not able to write to the cache. So what happened was that there were 4 Ignite servers out of a bunch of 19 (as you can see in the consistentids in my message above) that their acknowledgement to Coordinator

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread Pavel Kovalenko
Hello, I don't clearly understand from your message: has the exchange finally finished? Or were you getting this WARN message all the time? On Fri, 1 May 2020 at 12:32, Ilya Kasnacheev wrote: > Hello! > > This description sounds like a typical hanging Partition Map Exchange, but > you should be

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread Ilya Kasnacheev
Hello! This description sounds like a typical hanging Partition Map Exchange, but you should be able to see that in logs. If you don't, you can collect thread dumps from all nodes with jstack and check them for any stalling operations (or share them with us). Regards, -- Ilya Kasnacheev Fri, 1 May
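A minimal sketch of the thread-dump suggestion above, for cases where running jstack against each node's process is inconvenient. It uses only the standard java.lang.management API to capture roughly the same per-thread information from inside a JVM. The class name ThreadDumpSketch and the output layout are illustrative assumptions, not part of Ignite or of the original message; jstack itself remains the tool recommended in the thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not an Ignite class): dumps all live threads of the
// current JVM, similar in spirit to the output of `jstack <pid>`.
public class ThreadDumpSketch {

    /** Returns a textual dump of every live thread in this JVM. */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request monitor/synchronizer details only where the JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
            mx.isObjectMonitorUsageSupported(),
            mx.isSynchronizerUsageSupported());

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" state=").append(info.getThreadState()).append('\n');
            // Print the full stack so a stalled operation is visible.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

When reading such a dump, threads that stay in the same BLOCKED or WAITING frame across several consecutive dumps are the usual candidates for a stalling operation.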

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2020-05-01 Thread userx
Hi Pavel, I am using 2.8 and still getting the same issue. Here is the ecosystem: 19 Ignite servers (S1 to S19) running at 16GB of max JVM and in persistent mode, and 96 clients (C1 to C96). There are 19 machines, with 1 Ignite server started on each machine. The clients are evenly distributed across

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-11 Thread ihalilaltun
Hi Pavel, Thank you for the detailed explanation. We are discussing the hotfix with management, but I think the decision will be negative :( I think we'll have to wait for the 2.8 release, which seems to be scheduled for January 17, 2020. I hope we'll have this issue fixed by then. Regards. - İbrahim Halil Altun

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-11 Thread Pavel Kovalenko
Ibrahim, I've checked logs and found the following issue: [2019-09-27T15:00:06,164][ERROR][sys-stripe-32-#33][atomic] Received message without registered handler (will ignore) [msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[6389728]]],

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-11 Thread ihalilaltun
Hi Pavel, Here are the logs from the node with localId:3561ac09-6752-4e2e-8279-d975c268d045: ignite-2019-10-06.gz. Cache creation is done with Java code on our side; we use the getOrCreateCache method. Here is the piece of

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-10 Thread Pavel Kovalenko
Ibrahim, Could you please also share the cache configuration that is used for dynamic creation? On Thu, 10 Oct 2019 at 19:09, Pavel Kovalenko wrote: > Hi Ibrahim, > > I see that one node didn't send acknowledgment during cache creation: > [2019-09-27T15:00:17,727][WARN >

Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-10 Thread Pavel Kovalenko
Hi Ibrahim, I see that one node didn't send acknowledgment during cache creation: [2019-09-27T15:00:17,727][WARN ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch [permits=1,

Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-09 Thread ihalilaltun
Hi there Igniters, We saw very strange cluster behaviour while creating new caches on the fly. Just after the caches are created, we start getting the following warnings from all cluster nodes, including the coordinator node: [2019-09-27T15:00:17,727][WARN