[ https://issues.apache.org/jira/browse/AMQ-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957177#comment-14957177 ]
Dhananjay Patkar commented on AMQ-4512:
---------------------------------------

Is there any valid workaround for applications that are stuck on 5.7.0? I am using the 5.7.0 client libraries, as I am tied to other dependencies such as Karaf, Camel, etc.

> MemoryUsage waitForSpace in inconsistent state
> ----------------------------------------------
>
>                 Key: AMQ-4512
>                 URL: https://issues.apache.org/jira/browse/AMQ-4512
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.6.0, 5.7.0
>            Reporter: Sam hendley
>            Assignee: Timothy Bish
>             Fix For: 5.9.0
>
>         Attachments: AMQ4512Patch.txt, AMQ4512Patch.txt, MemUsage1.PNG,
> MemUsage2.PNG, MemoryUsageTest.java, QueueStats.PNG
>
>
> There is a race condition in MemoryUsage that makes it possible for it to be
> left in an inconsistent state and thereby hang any clients in waitForSpace().
> The core issue is in the following block of code:
> {code:java}
>     public void decreaseUsage(long value) {
>         if (value == 0) {
>             return;
>         }
>         int percentUsage;
>         synchronized (usageMutex) {
>             usage -= value;
>             percentUsage = caclPercentUsage();
>         }
>         setPercentUsage(percentUsage);
>         if (parent != null) {
>             parent.decreaseUsage(value);
>         }
>     }
> {code}
> The bug occurs when multiple threads call increment/decrement at the same
> time. Since the field "usage" is protected by usageMutex, each writer/reader
> sees the correct, current value of usage and calculates the right value of
> percentUsage at that instant. "setPercentUsage" is also protected by the same
> usageMutex, so the thread resynchronizes on usageMutex to set the percentUsage
> field as well. The issue is that threads may enter the setPercentUsage
> synchronized block in a different order than they entered the percentUsage
> "calculating" block. Since percentUsage is carried between the two blocks, a
> reordering of threads can allow the wrong final percentUsage value to be set.
> Possible threading (imagine usage starts at 0 and the limit is 100):
> Thread #1 - usage += 150; percentUsage = 150;
> Thread #1 - suspended before calling setPercentUsage
> Thread #2 - usage -= 150; percentUsage = 0;
> Thread #2 - setPercentUsage(0);
> Thread #1 - resumed, can now call setPercentUsage
> Thread #1 - setPercentUsage(150);
> Final value = 150 (while usage is back to 0)
> This same pattern of synchronizing to calculate percentUsage and then
> setting the value later is repeated in all of the Usage objects I looked at.
> My guess is that it was written this way to avoid holding locks while making
> calls out to "untrusted code", but it is a very dangerous way to do the
> calculations. The most surprising thing is that the locks are still
> explicitly held while calling fireEvent anyway.
> I have attached two screenshots, taken with a debugger, of two threads that
> have both been stalled for multiple minutes in "waitForSpace" trying to
> publish to the same queue. Notice that both have a "usage" of 0 but a
> "percentUsage" > 100, which should be impossible. To get the system into this
> state, I was using JmsTemplate and a CachingConnectionFactory to publish on 8
> threads, and a single DefaultMessageListenerContainer that pulls the
> messages off as fast as possible. The test publishes 100000 measurements, and
> roughly 75% of the time at least a few producers end up stalled in
> waitForSpace() even though the queue is ready for more messages. I can also
> reproduce these results using JmsTemplate and PooledConnectionFactory, so I
> don't believe it's an issue in the pooling implementations.
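To make the reported interleaving concrete, here is a minimal, self-contained Java sketch. `SketchUsage`, `racyAdjustAndCompute`, `racyPublish`, `adjustAtomically`, `replayReportedInterleaving` and `stressFixed` are hypothetical names invented for illustration; this is not ActiveMQ's `MemoryUsage`/`Usage` code, and the single-lock variant is an assumed fix for the pattern described, not necessarily what shipped in 5.9.0. The racy path splits the update into the same two lock acquisitions as the quoted `decreaseUsage`, and replaying the trace on one thread reproduces the impossible state (usage 0, percentUsage 150). The sketch does not model `waitForSpace()` or listener events.

```java
public class UsageRaceSketch {

    // Hypothetical, simplified stand-in for the Usage pattern in the report (not ActiveMQ code).
    static final class SketchUsage {
        private final Object usageMutex = new Object();
        private final long limit;
        private long usage;
        private int percentUsage;

        SketchUsage(long limit) {
            this.limit = limit;
        }

        // Caller must hold usageMutex.
        private int calcPercentUsage() {
            return (int) (usage * 100 / limit);
        }

        // Racy pattern, phase 1: adjust usage and compute the percentage under the lock,
        // then release the lock and hand the (possibly soon stale) value back to the caller.
        int racyAdjustAndCompute(long delta) {
            synchronized (usageMutex) {
                usage += delta;
                return calcPercentUsage();
            }
        }

        // Racy pattern, phase 2: re-acquire the lock and store whatever the caller computed
        // earlier, even if another thread has already stored a newer value.
        void racyPublish(int computedPercent) {
            synchronized (usageMutex) {
                percentUsage = computedPercent;
            }
        }

        // Assumed single-lock alternative: adjust, compute and store in one critical section,
        // so percentUsage always matches the usage it was derived from.
        void adjustAtomically(long delta) {
            synchronized (usageMutex) {
                usage += delta;
                percentUsage = calcPercentUsage();
            }
        }

        long usage() { synchronized (usageMutex) { return usage; } }

        int percentUsage() { synchronized (usageMutex) { return percentUsage; } }
    }

    // Replays the reported trace step by step on one thread (limit 100, usage starts at 0).
    static SketchUsage replayReportedInterleaving() {
        SketchUsage racy = new SketchUsage(100);
        int t1 = racy.racyAdjustAndCompute(150);  // Thread #1 computes 150, then is "suspended"
        int t2 = racy.racyAdjustAndCompute(-150); // Thread #2 computes 0
        racy.racyPublish(t2);                     // Thread #2 stores 0
        racy.racyPublish(t1);                     // Thread #1 resumes and stores the stale 150
        return racy;
    }

    // Runs the single-lock version from several threads; each thread does +150 then -150,
    // so once all threads finish the state must be usage=0 and percentUsage=0.
    static SketchUsage stressFixed(int threads, int iterations) {
        SketchUsage fixed = new SketchUsage(100);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    fixed.adjustAtomically(150);
                    fixed.adjustAtomically(-150);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        SketchUsage racy = replayReportedInterleaving();
        System.out.println("racy:  usage=" + racy.usage() + " percentUsage=" + racy.percentUsage());
        SketchUsage fixed = stressFixed(8, 10_000);
        System.out.println("fixed: usage=" + fixed.usage() + " percentUsage=" + fixed.percentUsage());
    }
}
```

Running `main` prints `racy:  usage=0 percentUsage=150` followed by `fixed: usage=0 percentUsage=0`. Storing a field while the mutex is already held costs next to nothing. The concern the reporter guesses at, calling out to untrusted code under a lock, only applies to listener callbacks. If those must run outside the lock, one option is to capture the old and new values inside the critical section and notify afterwards. Notifications could then arrive out of order, but the stored state stays consistent.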