Hello, in my current project we are trying to set up an ActiveMQ cluster with LevelDB replication. Before configuring it in production, we decided to run a short spike where we could try out simple failure scenarios. Our test setup has a ZooKeeper ensemble of three nodes and an ActiveMQ cluster of three nodes.
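(For context on the sizing: a three-node ensemble should survive the loss of its leader, because ZooKeeper only needs a majority of nodes up. A quick sketch of the arithmetic, nothing ZooKeeper-specific:)

```shell
# Majority quorum: an ensemble of N nodes needs floor(N/2)+1 up,
# so it tolerates N - quorum simultaneous failures.
zk_quorum() {
  ensemble=$1
  quorum=$(( ensemble / 2 + 1 ))
  echo "quorum=$quorum tolerates=$(( ensemble - quorum ))"
}

zk_quorum 3   # -> quorum=2 tolerates=1
```

So killing exactly one ZooKeeper node, as we do below, should leave the ensemble functional.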
The following is the configuration used for ActiveMQ (the hostname is of course different for each node in the cluster):

    <persistenceAdapter>
      <replicatedLevelDB
        replicas="3"
        bind="tcp://0.0.0.0:0"
        hostname="activemq1"
        zkAddress="zk1:2181,zk2:2181,zk3:2181"
        zkPath="/activemq/leveldb-stores"/>
    </persistenceAdapter>

We have tried different scenarios and they seem to work. The one scenario we are not able to get working is when the ZooKeeper leader goes down.

We start three instances of ZooKeeper and three instances of ActiveMQ. We observe that a ZooKeeper leader gets correctly elected, and that ZooKeeper then correctly elects an ActiveMQ master, which accepts messages from producers and serves consumers. The web admin works correctly as well.

We then deliberately kill the ZooKeeper leader instance to see what happens. The other two ZooKeeper instances successfully elect a new leader, but the ActiveMQ master does not seem able to recover from the election of the new leader. We get the following logs:

    2016-10-03 15:15:53,185 | ERROR | Could not accept connection : java.lang.InterruptedException | org.apache.activemq.broker.TransportConnector | ActiveMQ Transport Server Thread Handler: mqtt://0.0.0.0:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600
    2016-10-03 15:15:53,196 | INFO | Master stopped | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-4
    2016-10-03 15:15:53,205 | INFO | Connector ws stopped | org.apache.activemq.broker.TransportConnector | ActiveMQ BrokerService[localhost] Task-5
    2016-10-03 15:15:53,217 | INFO | Connector vm://localhost stopped | org.apache.activemq.broker.TransportConnector | ActiveMQ BrokerService[localhost] Task-5
    2016-10-03 15:15:53,227 | WARN | SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '../../conf.tmp/login.config'.
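(A side note on the zkAddress attribute above: it is a comma-separated connect string, and the ZooKeeper client fails over between the listed hosts when its current server dies, which is why the broker reconnects to zk2 in the logs. A trivial sketch of the host list it expands to:)

```shell
# Split a ZooKeeper connect string into the individual host:port
# entries the client can fail over between.
zk_hosts() {
  printf '%s\n' "$1" | tr ',' '\n'
}

zk_hosts "zk1:2181,zk2:2181,zk3:2181"
# -> zk1:2181
#    zk2:2181
#    zk3:2181
```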
    Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
    2016-10-03 15:15:53,228 | INFO | Opening socket connection to server zk2.docker_default/172.18.0.4:2181 | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
    2016-10-03 15:15:53,228 | WARN | unprocessed event state: AuthFailed | org.apache.activemq.leveldb.replicated.groups.ZKClient | WrapperSimpleAppMain-EventThread
    2016-10-03 15:15:53,230 | INFO | Socket connection established to zk2.docker_default/172.18.0.4:2181, initiating session | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
    2016-10-03 15:15:53,241 | INFO | Unable to read additional data from server sessionid 0x3578b1ac0d80000, likely server has closed socket, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
    2016-10-03 15:15:53,363 | INFO | JobSchedulerStore: /data/activemq/localhost/scheduler stopped. | org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl | ActiveMQ BrokerService[localhost] Task-5
    2016-10-03 15:15:53,367 | INFO | StateChangeDispatcher terminated. | org.apache.activemq.leveldb.replicated.groups.ZKClient | ZooKeeper state change dispatcher thread
    2016-10-03 15:15:53,994 | INFO | Session: 0x3578b1ac0d80000 closed | org.apache.zookeeper.ZooKeeper | ActiveMQ BrokerService[localhost] Task-5

The new ZooKeeper leader is actually zk2, so the ActiveMQ master apparently knows who the leader of the ensemble is. But as you can see, it cannot read additional data from the server. Has anyone successfully managed to make this configuration work?
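(For reference, this is roughly how we find out which ZooKeeper container is the leader before killing it; a sketch that assumes nc is available inside the containers and uses the container names from the compose stack below:)

```shell
# Hypothetical helper: pull the "Mode:" value (leader/follower) out of the
# output of ZooKeeper's four-letter-word command "srvr".
zk_mode() {
  awk -F': ' '/^Mode:/ {print $2}'
}

# Against the running compose stack, something like:
#   for c in zk1 zk2 zk3; do
#     printf '%s: ' "$c"
#     docker exec "$c" sh -c 'echo srvr | nc localhost 2181' | zk_mode
#   done
# and then stop whichever container reported "leader", e.g.:
#   docker stop zk1
```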
We are trying the entire configuration with a docker-compose stack:

    version: '2'
    services:
      zk1:
        container_name: zk1
        image: zookeeper:latest
        ports:
          - "2181:2181"
          - "2888:2888"
          - "3888:3888"
        environment:
          ZOO_MY_ID: 1
          ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888
      zk2:
        container_name: zk2
        image: zookeeper:latest
        ports:
          - "22181:2181"
          - "22888:2888"
          - "33888:3888"
        environment:
          ZOO_MY_ID: 2
          ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888
      zk3:
        container_name: zk3
        image: zookeeper:latest
        ports:
          - "23181:2181"
          - "32888:2888"
          - "43888:3888"
        environment:
          ZOO_MY_ID: 3
          ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888
      activemq1:
        container_name: activemq1
        image: webcenter/activemq:5.13.2
        ports:
          - "61617:61616"
          - "18161:8161"
        volumes:
          - /Users/Video/Projects/ActiveMqSpike/docker/activemq-conf:/opt/activemq/conf
        depends_on:
          - zk1
          - zk2
          - zk3
      activemq2:
        container_name: activemq2
        image: webcenter/activemq:5.13.2
        ports:
          - "61618:61616"
          - "28161:8161"
        volumes:
          - /Users/Video/Projects/ActiveMqSpike/docker/activemq-conf2:/opt/activemq/conf
        depends_on:
          - zk1
          - zk2
          - zk3
      activemq3:
        container_name: activemq3
        image: webcenter/activemq:5.13.2
        ports:
          - "61619:61616"
          - "38161:8161"
        volumes:
          - /Users/Video/Projects/ActiveMqSpike/docker/activemq-conf3:/opt/activemq/conf
        depends_on:
          - zk1
          - zk2
          - zk3

The mounted volumes contain the configuration I pasted at the beginning of this post.

--
View this message in context: http://activemq.2283324.n4.nabble.com/Zookeeper-and-LevelDB-replication-non-reliable-tp4717449.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.