Hi Abhishek,

What's your Ignite version? Anything else to note about the cluster? E.g.
frequent topology changes (clients or servers joining and leaving, caches
starting and stopping)? What was the topology version when this happened?

Regarding GC: try adding -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime to your JVM logging options, and share
the GC logs. Sometimes the logs show long pauses that are not GC pauses.
Check the "Total time for which application threads were stopped" and
"Stopping threads took" lines.
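For example, after restarting with those flags, the safepoint lines can be
pulled out of the log like this (the file name and the sample log line here
are purely illustrative, not from your cluster):

```shell
# Illustrative only: write one fabricated log line in the format those
# flags produce, so the grep below has something to match. On a real
# node, point the grep at your actual GC log instead.
cat > gc-sample.log <<'EOF'
2019-08-21T19:05:01.123+0000: 42.001: Total time for which application threads were stopped: 2.1234567 seconds, Stopping threads took: 1.9876543 seconds
EOF

# A long "stopped" time with little GC activity around it points to a
# non-GC safepoint pause; a long "Stopping threads took" means some
# thread was slow to reach the safepoint.
grep -E "Total time for which application threads were stopped|Stopping threads took" gc-sample.log
```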

Stan

On Wed, Aug 21, 2019 at 7:17 PM Abhishek Gupta (BLOOMBERG/ 731 LEX) <
agupta...@bloomberg.net> wrote:

> Hello,
> I'm using ZK based discovery for my 6 node grid. It had been working
> smoothly for a while until my ZK node suddenly went OOM. It turned out
> there were thousands of znodes, many holding ~1 MB of data, and there was
> suddenly a flood of ZK requests (the transaction log was huge).
>
> One symptom on the grid worth noting: when this happened, my nodes were
> stalling heavily (a separate issue to discuss - they stall with long JVM
> pauses, but the GC logs look fine) and were also receiving heavy writes
> from DataStreamers.
>
> I see the joinData znode having many thousands of persistent children. I'd
> like to understand why so many znodes were created under 'jd', what's the
> best way to prevent this, and how to clean up these child nodes under jd.
>
>
> Thanks,
> Abhishek
>
