Thank you for your response. Regarding the cluster reconnection, I used the 
setting shown in the scaledown examples:

    <!-- since the backup servers scale down we need a sensible setting
         here so the bridge will stop -->
    <reconnect-attempts>5</reconnect-attempts>
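
For context, that setting sits inside the cluster-connection element of 
broker.xml. A trimmed sketch of roughly how mine looks (the connector and 
discovery-group names here are placeholders, not the exact ones from my 
config):

    <cluster-connections>
       <cluster-connection name="my-cluster">
          <!-- placeholder connector name -->
          <connector-ref>netty-connector</connector-ref>
          <retry-interval>500</retry-interval>
          <reconnect-attempts>5</reconnect-attempts>
          <message-load-balancing>ON_DEMAND</message-load-balancing>
          <max-hops>1</max-hops>
          <!-- placeholder discovery group name -->
          <discovery-group-ref discovery-group-name="dg-group1"/>
       </cluster-connection>
    </cluster-connections>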

I'm not sure how much effect it has on this situation though (at least in case 
2), since nothing was logged about the bridge being stopped and JMX reported 
the cluster topology as intact. The only anomaly was that one node's internal 
store-and-forward queue, the one with messages piling up, showed consumers: 0, 
while the other node's internal queue had the one consumer it was supposed to.

I am using a version built from master approx. two weeks ago.

BR,
- Ilkka

-----Original Message-----
From: Clebert Suconic <clebert.suco...@gmail.com> 
Sent: 3 May 2018 18:26
To: users@activemq.apache.org
Subject: Re: Artemis 2.5.0 - Colocated scaledown cluster issues

On Wed, May 2, 2018 at 3:01 AM, Ilkka Virolainen <ilkka.virolai...@bitwise.fi> 
wrote:
> Hello,
>
> In addition to some previous issues [1], I have some problems with my Artemis 
> cluster. My setup [2] is a symmetric two-node cluster of colocated instances 
> with scaledown. Besides the node restart causing a problematic state in 
> replication [1], there are other issues, namely:
>
> 1) After running for approximately two weeks, one of the nodes crashed due to 
> heap space exhaustion. Heap dump analysis indicates this was caused by the 
> cluster connection failing: millions of messages ended up in the internal 
> store-and-forward queue, eventually causing an OOM exception - I guess the 
> internal messages are not paged?

You can configure it to use paging...
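
Something along these lines in broker.xml (just a sketch... the match and size 
values are only examples, and you should verify that the wildcard match really 
applies to the internal store-and-forward address on your version):

    <address-settings>
       <!-- catch-all match; narrow it if you only want this to apply to
            the internal store-and-forward addresses -->
       <address-setting match="#">
          <max-size-bytes>104857600</max-size-bytes>
          <page-size-bytes>10485760</page-size-bytes>
          <address-full-policy>PAGE</address-full-policy>
       </address-setting>
    </address-settings>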

Also, on the cluster connection you can configure the max retries of the 
cluster-connection...

I'm not talking about replication here... this is probably about another node 
that is still connected.

>
> 2) I have now run the cluster for ~2 weeks and it has ended up in a state 
> where messages are being redistributed from node 1 to node 2 BUT not the 
> other way around. This may be the same issue as 1), but I cannot tell for 
> sure. I tried setting the core server logging level to DEBUG on node 2 and 
> sending messages to a test topic, but I got no references to the address name 
> in the Artemis logs.

Check what I said about reconnects on the cluster connection.



If you were using master... there's a way you can consume messages from the 
internal queue and send them manually using a producer/consumer. You will need 
to get a snapshot from master.


>
> I realize that it's difficult to address these problems given the information 
> at hand and the problematic nature of the circumstances in which they occur: 
> they (excluding the issue described in [1]) only start to appear after running 
> the cluster for a long time, and there's no apparent cause or easy way to 
> reproduce them. I would however appreciate any tips for debugging this further 
> or advice on where to look for a probable cause.
>
> - Ilkka
>
> [1] Backup voting issue: 
> http://activemq.2283324.n4.nabble.com/Artemis-2-5-0-Problems-with-colocated-scaledown-td4737583.html#a4737808
> [2] Sample brokers: 
> https://github.com/ilkkavi/activemq-artemis/tree/scaledown-issue/issues/IssueExample/src/main/resources/activemq



--
Clebert Suconic
