Hello,

We have activemq-5.16.4 and java-1.8.0-openjdk.x86_64 1:1.8.0.332.b09-1.el7_9 running on RHEL 7.
The following was done on our acceptance cluster. I checked activemq.log for messages indicating that ActiveMQ has corrupt data files:

[kxn2@amq-a02 scheduler]$ sudo grep "Failed to start job scheduler store" /opt/local/activemq/data/activemq.log | head -1
2022-06-03 16:00:46,670 | ERROR | Failed to start job scheduler store: JobSchedulerStore: /opt/local/apache-activemq-5.16.4/data/amq-acceptance-cluster/scheduler | org.apache.activemq.broker.BrokerService | main

Then, after stopping activemq.service on both brokers, I moved the scheduleDB files aside:

cd /opt/local/activemq/data/kahadb/scheduler
sudo mv scheduleDB.data scheduleDB.data.`date +%Y%m%d`; sudo mv scheduleDB.redo scheduleDB.redo.`date +%Y%m%d`

After starting ActiveMQ, 7,500,000 entries were recovered, but startup then failed with "ERROR | Failed to start job scheduler store". There was a corrupt journal file:

[kxn2@amq-a02 data]$ grep Corrupt activemq.log*
2022-06-02 07:55:40,066 | WARN | Corrupt journal records found in '/opt/local/apache-activemq-5.16.4/data/amq-acceptance-cluster/scheduler/db-1179.log' between offsets: 11558626..11559784 | org.apache.activemq.store.kahadb.disk.journal.Journal | main

We tried starting ActiveMQ both without the db-1179.log file and with an empty db-1179.log file. ActiveMQ complained in both cases. We eventually stopped ActiveMQ, renamed the scheduler/ directory and started ActiveMQ again. After that restart, we have one db-*.log file with 50K messages:

[kxn2@amq-a02 scheduler]$ wc -l db-1.log
50,067 db-1.log

Before, we had 125 log files and 8,697,209 messages!

[kxn2@amq-a02 scheduler.bkup]$ wc -l db-*.log
...
8,697,209 total

So we have millions of messages that we probably do not need. It took 2.5 hours to recover 7.5M entries before the recovery failed, likely because of the corrupt record.

How can I get ActiveMQ to clean up these logs, so that recovery doesn't take so long?
How can I correct the data corruption?

For a test, I removed the byte range between offsets 11558626..11559784 from the file.
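A rough sketch of that kind of byte-range removal, using only head and tail, is below. The function name cut_byte_range and the output file name are hypothetical, and it assumes the offsets are zero-based byte offsets with the end offset exclusive; the log message does not say whether the end is inclusive, so that would need checking first. It writes a trimmed copy rather than editing the journal in place, and the records inside the range are lost either way.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of SRC to DST with bytes [START, END) removed.
# START and END are zero-based byte offsets. Treating END as exclusive is an
# assumption; verify it against the journal before relying on this.
cut_byte_range() {
    src=$1; dst=$2; start=$3; end=$4
    # Everything before the range: the first START bytes.
    head -c "$start" "$src" > "$dst"
    # Everything from zero-based offset END onward (tail -c +N is 1-based).
    tail -c +"$((end + 1))" "$src" >> "$dst"
}

# Example call on a copy of the journal file (broker stopped):
#   cut_byte_range db-1179.log db-1179.log.trimmed 11558626 11559784
```

The trimmed file is shorter than the original by END minus START bytes, which is a quick sanity check before swapping it in.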
I used the "head -c" command, grep and vi to do that. ActiveMQ did start. I am hoping that this doesn't happen in production, because it won't be acceptable to lose messages to get ActiveMQ to start up.

---
Karl Nordström
Systems Administrator
Penn State IT | Application Platforms