Hi

We use Apache Apollo 1.6 and STOMP for messaging in our production system. So 
far only one application uses this messaging system, but we plan to migrate 
more critical applications to this setup.

Setup / Configuration:
Approximately 2500 clients are permanently connected to the broker. Every 30 
minutes these clients send a STOMP message to a persistent queue on the broker, 
whose default virtual host is configured to use a leveldb store. There are two 
consumers subscribed to the queue, processing these messages. They have 
subscribed with the following headers:
credit:1,0
ack:client-individual
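
For illustration, a minimal sketch of what such a SUBSCRIBE frame looks like on
the wire. The credit and ack values are the ones listed above; the function
name, the queue name and the subscription id are placeholders made up for this
example and are not our real values.

```python
def build_subscribe_frame(destination: str, subscription_id: str) -> bytes:
    """Build a STOMP SUBSCRIBE frame carrying the two headers listed above."""
    headers = [
        ("id", subscription_id),       # placeholder subscription id
        ("destination", destination),  # placeholder queue name
        ("credit", "1,0"),             # value from our setup
        ("ack", "client-individual"),  # value from our setup
    ]
    head = "SUBSCRIBE\n" + "".join(f"{name}:{value}\n" for name, value in headers)
    # A blank line ends the header block; a NUL octet terminates the frame.
    return head.encode("utf-8") + b"\n" + b"\x00"


# Hypothetical destination and id, for illustration only.
frame = build_subscribe_frame("/queue/example", "sub-0")
```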

Observed behaviour:
Every once in a while, for no obvious reason, a message's body gets truncated
somewhere within the broker, and the truncated message is then stored in the
leveldb store. When this corrupted message reaches the head of the queue, the
broker sends it to one of the attached consumers (as one would expect). Since
the body has been truncated, the content-length header no longer matches the
body's length and, as a consequence, the consumer tries to read more octets
than the broker sends. Meanwhile, the web interface of the Apollo broker shows
that there is one message in "Transfer" and that it is "Waiting On" the
"consumer". This is effectively a deadlock: the broker waits for the consumer
to acknowledge the message while the consumer waits for the missing octets to
arrive, so no further messages are consumed or processed.
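
To make the mismatch concrete, here is a small sketch that takes the raw bytes
of one frame and reports how many body octets a consumer honouring
content-length would still be waiting for. The function name and the sample
frames are made up for this example; this is not the code of our consumer.

```python
def inspect_frame(raw: bytes):
    """Split one raw STOMP frame and count the body octets still missing."""
    head, sep, rest = raw.partition(b"\n\n")
    if not sep:
        raise ValueError("no blank line between headers and body")
    lines = head.decode("utf-8").split("\n")
    command = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.setdefault(name, value)  # keep the first occurrence of a header
    expected = int(headers["content-length"])
    # A reader that trusts content-length takes the next `expected` octets as
    # body, even if one of them is the NUL that was meant to end the frame.
    body = rest[:expected]
    missing = expected - len(body)
    return command, headers, body, missing


# Made-up frames: the first is intact, the second claims 11 body octets but
# only carries "hello" plus the terminating NUL.
intact = inspect_frame(b"MESSAGE\ncontent-length:5\n\nhello\x00")
truncated = inspect_frame(b"MESSAGE\ncontent-length:11\n\nhello\x00")
```

For the truncated frame the reader swallows the NUL as part of the body and is
still short of octets, which is the point at which our consumer blocks.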

Debugging:
- As you may have gathered, I was not able to figure out where exactly the 
corruption of the message happens.
- I was not able to trick the broker into accepting a corrupted STOMP frame and 
thus suspect that the corruption happens within the broker itself.
- Unfortunately, I was not able to reproduce the aforementioned behaviour in 
our testing environment either. (Note that I tried to reproduce it with only 
250 clients connected simultaneously, each sending 10 messages.)
- Neither warnings nor errors were found in the log files.

Any insights or pointers on how to further debug/analyse this problem are 
greatly appreciated.


Thanks a lot for your help!
-Raphi


-- 
raphael seebacher
security engineer

open systems ag
raeffelstrasse 29
ch-8045 zurich
t: +41 58 100 10 10
f: +41 58 100 10 11

[email protected]

http://www.open.ch
