OK, so I want to recap and make sure I've got everything correct. Please confirm that there's nothing I'm mis-stating.
Your KahaDB data directory contained many files older than the age-off interval that all of your producers use when they produce messages, which is unexpected: every message in those files should either have been successfully consumed or deleted due to expiration. Although your first posted configuration doesn't list it, you have the discarding DLQ plugin in use, so you're sure that none of those early messages are in the DLQ.

Your usage pattern involves consumers with dedicated queues, where a consumer might disappear and never return and where no other consumer will consume from the queue. So once the consumer for a given queue disconnects for the very last time, any messages still in the queue or published later will expire, and there will never be another consumer connecting to the queue.

Yesterday, you ran an experiment where you browsed all the queues twice: the first time using a selector you knew wouldn't match any messages, and the second time using a selector that you knew would match some but not all messages. After the first browse, there was no change in which KahaDB files were in the data directory; after the second browse, a substantial number of files were deleted.

Two things (maybe three) changed between your first and your second browse:

1. You used a selector that actually matched messages.
2. Time passed, which means it's possible that some messages were not eligible for expiration when you examined the data directory after the first browse but had expired by the time of the second browse.
3. If this was a production broker instance with consumers connected to it, it's possible that consumers consumed old messages on their own.
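For what it's worth, difference #1 (browsing with a matching selector) is easy to reproduce in isolation. Here's a minimal sketch against an embedded, non-persistent broker (the queue name, property name, and selector are just placeholders for this example):

```java
import java.util.Enumeration;
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SelectorBrowse {
    public static void main(String[] args) throws Exception {
        // vm:// transport spins up an embedded broker just for this demo
        ConnectionFactory cf =
            new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
        Connection conn = cf.createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("TEST.QUEUE");

        // publish three messages, one "low" and two "high"
        MessageProducer producer = session.createProducer(queue);
        for (int i = 0; i < 3; i++) {
            TextMessage m = session.createTextMessage("msg-" + i);
            m.setStringProperty("importance", i == 0 ? "low" : "high");
            producer.send(m);
        }

        // browsing with a selector returns only the matching messages,
        // without consuming them
        QueueBrowser browser = session.createBrowser(queue, "importance = 'high'");
        int matched = 0;
        for (Enumeration<?> e = browser.getEnumeration(); e.hasMoreElements(); e.nextElement()) {
            matched++;
        }
        System.out.println("matched=" + matched); // prints matched=2
        conn.close();
    }
}
```

Note that a browse doesn't acknowledge or remove anything itself; what you observed was presumably a side effect of the broker touching the queue, which is exactly what the experiment below is meant to tease apart.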
Any of these three differences could have been responsible for data files being deleted, but your message on December 23 said that the oldest file had a modification date of December 8, which means that any messages in it with a 3-week expiration period would have been expiring yesterday, possibly while you were running your experiment.

In my message on December 24, I recommended that you make a backup copy of all of the data files and then use a copy of that backup to stand up a dev/test broker for experiments. To see if #2 is the full explanation, take the backup copy (which should still have the December 8 file in it), place a copy of it into your dev/test broker's data directory (while the broker is down), and restart the broker without connecting any consumers. If the broker immediately deletes the data file, then you know the deletion didn't require your browsing activity, and the broker is working as designed. If not, repeat your browsing with the selector that matches important messages and see if that causes the data files to be deleted; that would indicate that the broker isn't expiring messages when there's no activity (consumption or browsing) on the queue, which would be a bug. But Tim Bish has previously said (http://stackoverflow.com/a/19143643) that the broker does periodic sweeps of all queues (even ones with no consumers) to discard any expired messages, so I don't believe this will turn out to be the case.

You pointed out that when there are a few lingering unconsumed messages but nearly all messages have been consumed, KahaDB's live-to-dead ratio is very bad. Yes, this is a known limitation of the design. If you want KahaDB to support compaction of journal files, feel free to vote for https://issues.apache.org/jira/browse/AMQ-3978. You also asked whether changing the data file sizes would improve things for you.
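As an aside, the backup-and-seed steps of that experiment are just directory copies done while the brokers are stopped. A rough sketch (the mktemp directories below are stand-ins so this runs anywhere; on a real system, substitute your broker's actual KahaDB directory, e.g. something like $ACTIVEMQ_HOME/data/kahadb):

```shell
# Stand-in for the production KahaDB data directory (broker stopped)
PROD_DATA=$(mktemp -d)
touch "$PROD_DATA/db-1.log" "$PROD_DATA/db.data" "$PROD_DATA/db.redo"

# 1. Take a backup copy of all the data files
BACKUP=$(mktemp -d)/kahadb-backup
cp -a "$PROD_DATA" "$BACKUP"

# 2. Seed the dev/test broker's data directory from the backup
#    (again, while the dev broker is down)
DEV_DATA=$(mktemp -d)/kahadb
cp -a "$BACKUP" "$DEV_DATA"

# 3. Now start the dev broker with no consumers attached and watch
#    whether the old db-*.log files get deleted on their own
ls "$DEV_DATA"
```

Working from a copy of the backup each time means you can rerun the experiment from the same starting state as often as you need.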
In theory, smaller files would increase the odds that any given file doesn't contain acks for messages in an earlier file that's being kept alive by a small number of unconsumed messages, but in practice it's entirely a dice roll: it depends on exactly what order your messages are received by the broker relative to where the file breaks occur. And since your usage pattern seems to be one where messages often sit for a while before being consumed, the odds that all acks for messages in a given file are contained within that file are much lower than average, probably close to zero. So although in theory having more small files increases the odds that at least some of them can be eliminated, for your use case it might not. But you can always give it a try and see. (Unfortunately, your 3-week message expiration period means it'll take you almost a month to run that experiment.)

But that's really just window dressing on the real problem: your usage pattern is one that allows messages to sit unconsumed and undiscarded for three whole weeks at a time. This is the point where Art Naseef would break out the refrain "ActiveMQ is not a database" and tell you that you need to change your usage pattern so messages are dealt with (consumed or discarded) much sooner, maybe after one day instead of three weeks. If you can live with clients losing messages that aren't consumed after a day (or whatever shorter interval you choose), then just change the JMSExpiration each producer uses and you're done. If not, you can use your own processes to consume messages after the 1-day interval and store their data into a database, from which consumers that reconnect later would read before they start processing the live stream of backed-up messages. (You could also write those messages into queues backed by a different KahaDB instance; since these will only be the unconsumed messages, the disk space used will be much smaller.)
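In case you do want to run the smaller-files experiment: the journal file size is controlled by the journalMaxFileLength attribute on the kahaDB persistence adapter in activemq.xml. A sketch (the 8mb value is just an example; directory paths are the usual defaults):

```xml
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="localhost" dataDirectory="${activemq.data}">
  <persistenceAdapter>
    <!-- default is 32mb; smaller journal files *may* let individual
         files become reclaimable sooner, per the caveats above -->
    <kahaDB directory="${activemq.data}/kahadb"
            journalMaxFileLength="8mb"/>
  </persistenceAdapter>
</broker>
```

And for the JMSExpiration change, that's set on the sending side: a producer calls MessageProducer.setTimeToLive() with the desired interval in milliseconds before sending, and the broker stamps each message's JMSExpiration accordingly.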
Another option would be to move to LevelDB instead of KahaDB; recent email threads seem to indicate that replicated LevelDB still has some bugs that are being (or maybe have been) worked out, but they all seem to be in the replication aspect, so I'd expect that non-replicated LevelDB would probably work without issue. Or you could just leave things the way they are, allocate a lot of disk space to KahaDB, and live with it.

Tim

On Wed, Dec 30, 2015 at 1:49 AM, Shine <activ...@club.webhop.org> wrote:
> Hi Tim,
>
> I subscribed to every queue with a tool, but I used a selector which never
> matches a message.
> => no changes in the data folder
>
> Then I used a selector which filters unimportant messages.
> => the server removed about 150 files from the data folder.
>
> I think the ActiveMQ server works fine, and if you consume all messages then
> all files will be removed.
>
> Now 750 unconsumed messages are left on the broker. The size of each
> message is 3 KB.
> => 750 * 3 KB = 2.2 MB
> ==> the data folder uses 15.5 GB
>
> Is it a good idea to set the maximum size of the message log file to 8 MB,
> 4 MB, or 2 MB instead of 32 MB?
>
> regards
> Shine
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/Need-help-with-a-message-broker-configuration-tp4705074p4705484.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.