On Fri, 2009-07-24 at 10:50 -0700, Sandy Pratt wrote:
> Hi,
> 
> I'm experimenting with the qpidd broker supplied with Fedora 10 (qpidd
> (qpidc) version 0.5 - the install should be up to date as of yesterday).
> Basically, I'm having a problem where I've been able to enqueue enough
> messages that my journal is full and I can't start the broker because of
> this. This is contrary to my understanding of how persistent queues were
> supposed to work, in a number of ways:
> 
> 1) I thought producers were not allowed to commit messages past the 80% full
> point.

Correct. You should see an Enqueue capacity threshold error message such as:

2009-07-24 14:20:48 notice Journal "perftest0": Created
2009-07-24 14:23:31 warning Journal "perftest0": Enqueue capacity threshold exceeded on queue "perftest0".
2009-07-24 14:23:31 error Unexpected exception: Enqueue capacity threshold exceeded on queue "perftest0". (JournalImpl.cpp:577)
2009-07-24 14:23:31 error Connection 127.0.0.1:55390 closed by error: Enqueue capacity threshold exceeded on queue "perftest0". (JournalImpl.cpp:577)(501)
which closes the connection to the producer.

The journal is a circular disk buffer. Both enqueues and dequeues are performed
by writing records at the head of the journal, so a full journal is a fatal
condition: you can neither enqueue nor dequeue (to make space). The 80% policy
limits enqueues only; dequeues are not subject to this limit, as these
operations inherently free up journal space. In practice, it may be possible to
fill a journal by using tiny messages (~10 bytes) to get just below the enqueue
threshold, then dequeuing all of these messages except the first. In this case
the sheer number of dequeues will fill the remaining approx. 20% of the journal
while the first message stays in place as a blocker. One of the functional
tests uses this technique to verify that a full journal behaves as expected.

> 
> 2) I thought the flow-to-disk policy would result in excess messages being
> accepted into a slower interim storage, not being rejected or causing the
> broker to crash.

This is incorrect. However, staged messages (i.e. messages that exceed a broker
per-message size threshold) are stored outside the journal in BDB.

> 
> 3) It appears as though I was able to crash the broker and cause the apparent
> loss of acked messages simply by sending the broker too many?

I have not been able to reproduce your scenario. I set up the broker using your
parameters, but with a C++ client rather than a Java one, and I always hit the
enqueue threshold first (as expected). I have even turned on a transactional
block size of 1 (which emulates JMS, IIRC), but in all cases I have been unable
to reproduce this. I would be interested in the following if you are able to
supply it:

1. A reproducer (i.e. a program that will cause this error) that I can run -
   something self-contained along the lines of the sketch at the end of this
   mail, and/or
2. A copy of the journal files so I can take a look at the state of the
   journals.

> 
> ...
> 
> Here are some details about what I was doing:
> 
> I created a queue with the following command:
> 
> qpid-config --durable add queue test3 --durable --limit-policy flow-to-disk
> 
> This created the following journal files:
> 
> [r...@prattrs-fedora test3]# ll
> total 12356
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:26 JournalData.0000.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:27 JournalData.0001.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0002.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0003.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:28 JournalData.0004.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:29 JournalData.0005.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:22 JournalData.0006.jdat
> -rw-r----- 1 qpidd qpidd 1573376 2009-07-23 17:22 JournalData.0007.jdat
> -rw-r----- 1 qpidd qpidd     866 2009-07-22 14:23 JournalData.jinf
> [r...@prattrs-fedora test3]# du -h
> 13M     .
> [r...@prattrs-fedora test3]#

This is a standard 12.5MiB journal.
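To put rough numbers on that (back-of-the-envelope only, and ignoring
per-record headers and AMQP framing, so the real figures will be somewhat
lower): 80% of 12.5MiB is about 10MiB of enqueue capacity, and at roughly 4KiB
per message that works out to something in the region of 2,500 messages before
the enqueue capacity threshold should trip when no consumer is running.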
> 
> I then did a test where I ran some producers in Java JMS for a while without
> any consumers running:
> 
>     TextMessage msg = sendSession.createTextMessage(testContext.getPrefix()
>             + RandomUtils.randomAlphaNumericString(4085));
>     producer.send(msg, DeliveryMode.PERSISTENT, Message.DEFAULT_PRIORITY,
>             Message.DEFAULT_TIME_TO_LIVE);
> 
> Unsurprisingly, my journal quickly filled up and the broker showed this in
> logs:
> 
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Failed to prepare: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 critical Journal "test3": Journal full on queue "test3".
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Error completing xid yu\x00\x00\x00\x00\x00\x00\x97\xD6\xA9aP\x18IM\xB6\x91\xFA\xF1\xE6\x18}W: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Commit failed with exception: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Execution exception: internal-error: Commit failed (qpid/broker/SemanticState.cpp:123)
> Jul 23 17:29:32 localhost qpidd[2550]: 2009-07-23 17:29:32 error Channel exception: not-attached: Channel 0 is not attached (qpid/amqp_0_10/SessionHandler.cpp:40)
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 critical Journal "test3": Journal full on queue "test3".
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 error Unexpected exception: Journal full on queue "test3". (JournalImpl.cpp:586)
> Jul 23 17:29:33 localhost qpidd[2550]: 2009-07-23 17:29:33 error Connection 153.32.227.108:57837 closed by error: Journal full on queue "test3". (JournalImpl.cpp:586)(501)
> 
> I can sort of understand this, although it means flow-to-disk is not
> functioning for my queue, I suppose. However, it turns out that I can't
> restart the broker either:
> 
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Store module initialized; dir=/var/lib/qpidd
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test1": Created
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test2": Created
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 warning Journal "test2": Recovery found 12 files (different from --num-jfiles value of 8).
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 warning Journal "test2": Recovery found file size = 12 (different from --jfile-size-pgs value of 24).
> Jul 23 17:40:00 localhost qpidd[6303]: 2009-07-23 17:40:00 notice Journal "test3": Created
> Jul 23 17:40:01 localhost qpidd[6303]: 2009-07-23 17:40:01 critical Daemon startup failed: Queue test3: recoverQueues() failed: jexception 0x0205 jcntl::recover() threw JERR_JCNTL_RECOVERJFULL: Journal data files full, cannot write. (MessageStoreImpl.cpp:785)
> Jul 23 17:40:01 localhost qpidd[6302]: 2009-07-23 17:40:01 critical Broker start-up failed: Daemon startup failed: Queue test3: recoverQueues() failed: jexception 0x0205 jcntl::recover() threw JERR_JCNTL_RECOVERJFULL: Journal data files full, cannot write. (MessageStoreImpl.cpp:785)

This is expected. A broker with a full journal cannot be restarted: the journal
recovers into a full state, which is just as broken as it was prior to the
shutdown, so start-up fails. At some future point I will add auto-expansion to
the journal; when that happens, it may become possible to expand a recovered
journal so that dequeues (and enqueues, if there is enough space) are possible
after recovery.

> 
> My client jars are qpid-client-M5.1.jar and qpid-common-M5.1.jar.
> 
> Thanks in advance to anyone who can help.

Questions are listed above.

> 
> Sandy
> 

Thanks for bringing this up - I am interested to see how the enqueue capacity
check is being evaded. It may be some odd sequence that occurs through JMS, but
I fail to see how.
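To make item 1 above concrete, the sort of standalone reproducer I have in mind
is sketched below. This is only a skeleton, not tested code: the connection
URL, queue name and payload size are my assumptions based on your description
(localhost broker, the durable test3 queue, ~4KiB messages), so you would need
to drop in your real connection settings and whatever transaction or
acknowledgement mode your test actually used.

import javax.jms.*;

import org.apache.qpid.client.AMQConnectionFactory;

public class JournalFillReproducer {
    public static void main(String[] args) throws Exception {
        // Assumed broker location, credentials and virtual host - replace with your own.
        ConnectionFactory factory = new AMQConnectionFactory(
                "amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'");
        Connection connection = factory.createConnection();
        connection.start();

        // Non-transacted, auto-ack session; switch to a transacted session if that
        // is what your test was using.
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("test3"); // durable queue created via qpid-config
        MessageProducer producer = session.createProducer(queue);

        // Roughly 4KiB of payload per message, as in your test.
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 4096; i++) {
            payload.append('x');
        }

        // Keep sending persistent messages with no consumer attached until the broker
        // refuses them. With the default 8-file journal I would expect the enqueue
        // capacity threshold to close the connection somewhere short of ~2,500 sends.
        int sent = 0;
        try {
            while (true) {
                TextMessage msg = session.createTextMessage(payload.toString());
                producer.send(msg, DeliveryMode.PERSISTENT,
                        Message.DEFAULT_PRIORITY, Message.DEFAULT_TIME_TO_LIVE);
                sent++;
            }
        } catch (JMSException e) {
            System.out.println("Send failed after " + sent + " messages: " + e.getMessage());
        } finally {
            try {
                connection.close();
            } catch (JMSException ignored) {
                // The broker may already have closed the connection.
            }
        }
    }
}

If the capacity check really is being bypassed through JMS, knowing how many
sends succeed before the failure - and whether the failure reported is the
enqueue capacity threshold or the Journal full error - would be very useful.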
Kim


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
