[ 
https://issues.apache.org/jira/browse/AMQ-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957177#comment-14957177
 ] 

Dhananjay Patkar commented on AMQ-4512:
---------------------------------------

Is there any valid workaround for applications that are stuck on 5.7.0?

I am using the 5.7.0 client libraries, as I am tied to other dependencies such as 
Karaf, Camel, etc. 



> MemoryUsage waitForSpace in inconsistent state
> ----------------------------------------------
>
>                 Key: AMQ-4512
>                 URL: https://issues.apache.org/jira/browse/AMQ-4512
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.6.0, 5.7.0
>            Reporter: Sam hendley
>            Assignee: Timothy Bish
>             Fix For: 5.9.0
>
>         Attachments: AMQ4512Patch.txt, AMQ4512Patch.txt, MemUsage1.PNG, 
> MemUsage2.PNG, MemoryUsageTest.java, QueueStats.PNG
>
>
> There is a race condition in MemoryUsage which makes it possible for it to be 
> left in an inconsistent state and thereby hang any clients in waitForSpace().
> The core issue is in the following block of code:
> {code:java}
>     public void decreaseUsage(long value) {
>         if (value == 0) {
>             return;
>         }
>         int percentUsage;
>         synchronized (usageMutex) {
>             usage -= value;
>             percentUsage = caclPercentUsage();
>         }
>         setPercentUsage(percentUsage);
>         if (parent != null) {
>             parent.decreaseUsage(value);
>         }
>     }
> {code}
> The bug occurs when multiple threads call increase/decrease at the same time. 
> Since the field "usage" is protected by the usageMutex, each writer/reader sees 
> the correct, current value of usage and calculates the right value for 
> percentUsage at that instant. "setPercentUsage" is also protected by the same 
> usageMutex, so we re-synchronize on usageMutex to set the percentUsage field 
> as well. The issue is that threads may enter the setPercentUsage synchronized 
> block in a different order than they entered the percentUsage "calculating" 
> block. Since percentUsage is carried between the two blocks, a reordering of 
> threads can allow the wrong final percentUsage value to be set. 
> Possible threading (imagine usage starts at 0 and limit is 100).
> Thread #1 - usage += 150; percentUsage = 150;
> Thread #1 - suspended before calling setPercentUsage
> Thread #2 - usage -= 150; percentUsage = 0;
> Thread #2 - setPercentUsage(0);
> Thread #1 - resumed, can now call setPercentUsage
> Thread #1 - setPercentUsage(150);
> Final value = 150
> This same pattern of synchronizing to calculate percentUsage and then setting 
> the value later is repeated in all of the Usage objects I looked at. My guess 
> is that it was written this way to avoid holding locks while making calls out 
> to "untrusted code", but it is a very dangerous way to do the calculations. 
> The most surprising thing is that the locks are still explicitly held while 
> calling fireEvent anyway.
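> One possible fix is a sketch only (not the attached patch or necessarily the 
> change that went into 5.9.0): compute and store percentUsage inside the same 
> synchronized block, so the stored value always corresponds to the latest usage:
> {code:java}
>     // Hypothetical rework of decreaseUsage(), assuming the surrounding fields
>     // (usage, usageMutex, parent) and caclPercentUsage() from the original
>     // class. The percentage is calculated and stored while still holding
>     // usageMutex, so a thread rescheduled between the two steps can no longer
>     // overwrite a newer value with a stale one.
>     public void decreaseUsage(long value) {
>         if (value == 0) {
>             return;
>         }
>         synchronized (usageMutex) {
>             usage -= value;
>             // store the percentage under the same lock that guards usage
>             setPercentUsage(caclPercentUsage());
>         }
>         if (parent != null) {
>             parent.decreaseUsage(value);
>         }
>     }
> {code}
> Listener notification (fireEvent) could still be moved outside the lock if 
> calling into untrusted code is a concern; the essential point is only that the 
> calculation and the store of percentUsage happen atomically.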
> I have attached two debugger screenshots of two threads that have both been 
> stalled for multiple minutes in "waitForSpace" trying to publish to the same 
> queue. Notice they both show a "usage" of 0 but a "percentUsage" > 100, which 
> should be impossible. To get the system into this state I used JmsTemplate and 
> a CachingConnectionFactory to publish on 8 threads, with a single 
> DefaultMessageListenerContainer pulling the messages off as fast as possible. 
> The test publishes 100000 measurements, and around 75% of the time at least a 
> few producers end up stalled in waitForSpace() even though the queue is ready 
> for more messages. I can also reproduce these results using JmsTemplate and 
> PooledConnectionFactory, so I don't believe it's an issue in the pooling 
> implementations.
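> The attached MemoryUsageTest.java exercises this directly; a minimal sketch of 
> the same idea (assuming only the org.apache.activemq.usage.MemoryUsage API: 
> setLimit, increaseUsage, decreaseUsage, getUsage, getPercentUsage) would be:
> {code:java}
> import java.util.concurrent.CountDownLatch;
> import org.apache.activemq.usage.MemoryUsage;
> 
> public class MemoryUsageRaceSketch {
>     public static void main(String[] args) throws Exception {
>         final MemoryUsage usage = new MemoryUsage();
>         usage.setLimit(100);
>         usage.start();
> 
>         final CountDownLatch done = new CountDownLatch(8);
>         for (int i = 0; i < 8; i++) {
>             new Thread(new Runnable() {
>                 public void run() {
>                     // hammer increase/decrease in matched pairs so net usage is 0
>                     for (int j = 0; j < 100000; j++) {
>                         usage.increaseUsage(150);
>                         usage.decreaseUsage(150);
>                     }
>                     done.countDown();
>                 }
>             }).start();
>         }
>         done.await();
> 
>         // With the race present, percentUsage can be left > 100 even though
>         // usage is back to 0, and any producer in waitForSpace() would hang.
>         System.out.println("usage=" + usage.getUsage()
>                 + " percentUsage=" + usage.getPercentUsage());
>     }
> }
> {code}
> This is not the attached test, only an illustration of the interleaving 
> described above.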



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
