On Fri, 2009-07-24 at 10:50 -0700, Sandy Pratt wrote:
> Hi,
> 
> I'm experimenting with the qpidd broker supplied with Fedora 10 (qpidd
> (qpidc) version 0.5 - the install should be up to date as of yesterday).
> Basically, I'm having a problem where I've been able to enqueue enough
> messages that my journal is full and I can't start the broker because of
> this. This is contrary to my understanding of how persistent queues were
> supposed to work, in a number of ways:
> 
> 1) I thought producers were not allowed to commit messages past the 80% full
> point.

Correct. You should see an Enqueue capacity threshold error message such as:

2009-07-24 14:20:48 notice Journal "perftest0": Created
2009-07-24 14:23:31 warning Journal "perftest0": Enqueue capacity threshold exceeded on queue "perftest0".
2009-07-24 14:23:31 error Unexpected exception: Enqueue capacity threshold exceeded on queue "perftest0". (JournalImpl.cpp:577)
2009-07-24 14:23:31 error Connection 127.0.0.1:55390 closed by error: Enqueue capacity threshold exceeded on queue "perftest0". (JournalImpl.cpp:577)(501)
which closes the connection to the producer.

The journal is a circular disk buffer. Both enqueues and dequeues are performed
by writing records at the head of the journal, so a full journal is a fatal
condition: you can neither enqueue nor dequeue (to make space). The 80% policy
limits enqueues only; dequeues are not subject to this limit, as these
operations inherently free up journal space. In practice, it may be possible to
fill a journal by using tiny messages (~10 bytes) to get just below the enqueue
threshold, then dequeuing all of these messages except the first. In this case
the sheer number of dequeues will fill the remaining approx. 20% of the journal
while the first message stays in place as a blocker. One of the functional
tests uses this technique to verify that a full journal behaves as expected.

> 
> 2) I thought the flow-to-disk policy would result in excess messages being
> accepted into a slower interim storage, not being rejected or causing the
> broker to crash.

This is incorrect. However, staged messages (i.e. messages that exceed a broker
per-message size threshold) are stored outside the journal in BDB.

> 
> 3) It appears as though I was able to crash the broker and cause the apparent
> loss of acked messages simply by sending the broker too many?

I have not been able to reproduce your scenario. I set up the broker using your
parameters, but with a C++ client rather than a Java one, and I always hit the
enqueue threshold first (as expected). I have even turned on a transactional
block size of 1 (which emulates JMS, IIRC), but in all cases I have been unable
to reproduce this. I would be interested in the following if you are able to
supply it:

1. A reproducer (i.e. a program that will cause this error) that I can run -
   something self-contained along the lines of the sketch at the end of this
   mail, and/or
2. A copy of the journal files so I can take a look at the state of the
   journals.

> 
> ...
> 
> Here are some details about what I was doing:
> 
> I created a queue with the following command:
> 
> qpid-config --durable add queue test3 --durable --limit-policy flow-to-disk
> 
> This created the following journal files:
> 
> [r...@prattrs-fedora test3]# ll
> total 12356
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:26 JournalData.0000.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:27 JournalData.0001.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0002.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0003.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0004.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:29 JournalData.0005.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:22 JournalData.0006.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:22 JournalData.0007.jdat
> -rw-r----- 1 qpidd qpidd     866 2009-07-22 14:23 JournalData.jinf
> [r...@prattrs-fedora test3]# du -h
> 13M     .
> [r...@prattrs-fedora test3]#

This is a standard 12.5MiB journal.
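To put rough numbers on that (back-of-the-envelope only, and ignoring
per-record headers and AMQP framing, so the real figures will be somewhat
lower): 80% of 12.5MiB is about 10MiB of enqueue capacity, and at roughly 4KiB
per message that works out to something in the region of 2,500 messages before
the enqueue capacity threshold should trip when no consumer is running.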
> 
> I then did a test where I ran some producers in Java JMS for a while without
> any consumers running:
> 
>     TextMessage msg = sendSession.createTextMessage(testContext.getPrefix()
>             + RandomUtils.randomAlphaNumericString(4085));
>     producer.send(msg, DeliveryMode.PERSISTENT, Message.DEFAULT_PRIORITY,
>             Message.DEFAULT_TIME_TO_LIVE);
> 
> Unsurprisingly, my journal quickly filled up and the broker showed this in
> logs:
> 
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Failed to prepare: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 critical Journal "test3": Journal full on queue "test3".
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Error completing xid yu\x00\x00\x00\x00\x00\x00\x97\xD6\xA9aP\x18IM\xB6\x91\xFA\xF1\xE6\x18}W: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Commit failed with exception: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Execution exception: internal-error: Commit failed (qpid/broker/SemanticState.cpp:123)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Channel exception: not-attached: Channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:40)
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 critical Journal "test3": Journal full on queue "test3".
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 error Unexpected exception: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 error Connection 153.32.227.108:57837 closed by error: Journal full on queue "test3". (JournalImpl.cpp:586)(501)
> 
> I can sort of understand this, although it means flow-to-disk is not
> functioning for my queue, I suppose. However, it turns out that I can't
> restart the broker either:
> 
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Store module initialized; dir=/var/lib/qpidd
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test1": Created
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test2": Created
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 warning Journal "test2": Recovery found 12 files (different from --num-jfiles value of 8).
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 warning Journal "test2": Recovery found file size = 12 (different from --jfile-size-pgs value of 24).
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test3": Created
> Jul 23 17:40:01 localhost qpidd[6303]: 2009-07-23 17:40:01 critical Daemon startup failed: Queue test3: recoverQueues() failed: jexception 0x0205 jcntl::recover() threw JERR_JCNTL_RECOVERJFULL: Journal data files full, cannot write. (MessageStoreImpl.cpp:785)
> Jul 23 17:40:01 localhost qpidd[6302]: 2009-07-23 17:40:01 critical Broker start-up failed: Daemon startup failed: Queue test3: recoverQueues() failed: jexception 0x0205 jcntl::recover() threw JERR_JCNTL_RECOVERJFULL: Journal data files full, cannot write. (MessageStoreImpl.cpp:785)

This is expected. A broker with a full journal cannot be restarted: the journal
recovers into a full state, which is just as broken as it was prior to the
shutdown, so start-up fails. At some future point I will add auto-expansion to
the journal; when that happens, it may become possible to expand a recovered
journal so that dequeues (and enqueues, if there is enough space) are possible
after recovery.

> 
> My client jars are qpid-client-M5.1.jar and qpid-common-M5.1.jar.
> 
> Thanks in advance to anyone who can help.

Questions are listed above.

> 
> Sandy
> 

Thanks for bringing this up - I am interested to see how the enqueue capacity
check is being evaded. It may be some odd sequence that occurs through JMS, but
I fail to see how.
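To make item 1 above concrete, the sort of standalone reproducer I have in mind
is sketched below. This is only a skeleton, not tested code: the connection
URL, queue name and payload size are my assumptions based on your description
(localhost broker, the durable test3 queue, ~4KiB messages), so you would need
to drop in your real connection settings and whatever transaction or
acknowledgement mode your test actually used.

import javax.jms.*;

import org.apache.qpid.client.AMQConnectionFactory;

public class JournalFillReproducer {
    public static void main(String[] args) throws Exception {
        // Assumed broker location, credentials and virtual host - replace with your own.
        ConnectionFactory factory = new AMQConnectionFactory(
                "amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'");
        Connection connection = factory.createConnection();
        connection.start();

        // Non-transacted, auto-ack session; switch to a transacted session if that
        // is what your test was using.
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("test3"); // durable queue created via qpid-config
        MessageProducer producer = session.createProducer(queue);

        // Roughly 4KiB of payload per message, as in your test.
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 4096; i++) {
            payload.append('x');
        }

        // Keep sending persistent messages with no consumer attached until the broker
        // refuses them. With the default 8-file journal I would expect the enqueue
        // capacity threshold to close the connection somewhere short of ~2,500 sends.
        int sent = 0;
        try {
            while (true) {
                TextMessage msg = session.createTextMessage(payload.toString());
                producer.send(msg, DeliveryMode.PERSISTENT,
                        Message.DEFAULT_PRIORITY, Message.DEFAULT_TIME_TO_LIVE);
                sent++;
            }
        } catch (JMSException e) {
            System.out.println("Send failed after " + sent + " messages: " + e.getMessage());
        } finally {
            try {
                connection.close();
            } catch (JMSException ignored) {
                // The broker may already have closed the connection.
            }
        }
    }
}

If the capacity check really is being bypassed through JMS, knowing how many
sends succeed before the failure - and whether the failure reported is the
enqueue capacity threshold or the Journal full error - would be very useful.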
Kim


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
