Hi Ramayan,

Thanks for testing the patch and providing feedback.

Regarding direct memory utilization: the Qpid Broker caches up to 256MB of
direct memory internally in QpidByteBuffers. Thus, when testing the Broker
with only 256MB of direct memory, the entire direct memory could be cached
and it would look as if direct memory is never released. You can potentially
reduce the number of buffers cached by the broker by changing the context
variable 'broker.directByteBufferPoolSize'. By default, it is set to 1000;
with a buffer size of 256KB, that gives ~256MB of cache.
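
As a rough back-of-the-envelope (the class below is illustrative arithmetic
only, assuming the default 256KB buffer size):

    // Rough arithmetic only: the ceiling of the QpidByteBuffer cache,
    // assuming the default 256KB buffer size.
    public class BufferPoolCacheSize
    {
        public static void main(String[] args)
        {
            int bufferSizeKb = 256;   // default qpid.broker.networkBufferSize
            int poolSize = 1000;      // default broker.directByteBufferPoolSize
            // ~256,000KB, i.e. the ~256MB cache mentioned above
            System.out.printf("cache ceiling ~= %,d KB%n", (long) poolSize * bufferSizeKb);
            // e.g. setting broker.directByteBufferPoolSize to 500 would halve that ceiling
        }
    }

Reducing the pool size lowers how much direct memory the Broker can retain in
its cache, at the potential cost of more allocation churn.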

Regarding introducing lower and upper thresholds for 'flow to disk': it seems
like a good idea, and we will try to implement it early this week, on trunk
first.
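
To illustrate the intent, a minimal sketch of the high/low watermark behaviour
(the class and names are hypothetical, not the Broker's actual code):

    // Hypothetical hysteresis for flow to disk: switch on above the high
    // watermark, switch off only after dropping below the low watermark.
    final class FlowToDiskWatermark
    {
        private final long highThresholdBytes;
        private final long lowThresholdBytes;
        private boolean active;

        FlowToDiskWatermark(long highThresholdBytes, long lowThresholdBytes)
        {
            this.highThresholdBytes = highThresholdBytes;
            this.lowThresholdBytes = lowThresholdBytes;
        }

        boolean evaluate(long directMemoryUsedBytes)
        {
            if (!active && directMemoryUsedBytes > highThresholdBytes)
            {
                active = true;
            }
            else if (active && directMemoryUsedBytes < lowThresholdBytes)
            {
                active = false;
            }
            return active;
        }
    }

Once the high threshold has been crossed, flow to disk would stay active until
usage drops below the low threshold, which should stop the rapid
activate/deactivate cycling you observed.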

Kind Regards,
Alex


On 5 May 2017 at 23:49, Ramayan Tiwari <ramayan.tiw...@gmail.com> wrote:

> Hi Alex,
>
> Thanks for providing the patch. I verified the fix with the same perf test, and
> it does prevent the broker from going OOM; however, DM utilization doesn't get
> any better after hitting the threshold (where flow to disk is activated
> based on the total used % across the broker - graph in the link below).
>
> After hitting the final threshold, flow to disk activates and deactivates
> pretty frequently across all the queues. The reason seems to be that
> there is only one threshold currently to trigger flow to disk. Would it
> make sense to break this down into high and low thresholds - so that once flow
> to disk is active after hitting the high threshold, it stays active until the
> queue utilization (or broker DM allocation) reaches the low threshold?
>
> Graph and flow to disk logs are here:
> https://docs.google.com/document/d/1Wc1e-id-WlpI7FGU1Lx8XcKaV8sauRp82T5XZVU-RiM/edit#heading=h.6400pltvjhy7
>
> Thanks
> Ramayan
>
> On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <oru...@gmail.com> wrote:
>
> > Hi Ramayan,
> >
> > We attached to QPID-7753 a patch with a workaround for the 6.0.x branch.
> > It triggers flow to disk based on direct memory consumption rather than an
> > estimation of the space occupied by the message content. The flow to disk
> > should evacuate message content, preventing the broker from running out of
> > direct memory. We have already committed the changes onto the 6.0.x and
> > 6.1.x branches. They will be included in the upcoming 6.0.7 and 6.1.3 releases.
> >
> > Please try and test the patch in your environment.
> >
> > We are still working on finishing the fix for trunk.
> >
> > Kind Regards,
> > Alex
> >
> > On 30 April 2017 at 15:45, Lorenz Quack <quack.lor...@gmail.com> wrote:
> >
> > > Hi Ramayan,
> > >
> > > The high-level plan is currently as follows:
> > >  1) Periodically try to compact sparse direct memory buffers.
> > >  2) Increase the accuracy of the messages' direct memory usage estimation
> > >     to more reliably trigger flow to disk.
> > >  3) Add an additional flow to disk trigger based on the amount of allocated
> > >     direct memory.
> > >
> > > In a little more detail:
> > >  1) We plan on periodically checking the amount of direct memory usage and,
> > >     if it is above a threshold (50%), we compare the sum of all queue sizes
> > >     with the amount of allocated direct memory. If the ratio falls below a
> > >     certain threshold we trigger a compaction task which goes through all
> > >     queues and copies a certain amount of old message buffers into new ones,
> > >     thereby freeing the old buffers so that they can be returned to the
> > >     buffer pool and be reused.
> > >
> > >  2) Currently we trigger flow to disk based on an estimate of how much
> > >     memory the messages on the queues consume. We had to use estimates
> > >     because we did not have accurate size numbers for message headers. By
> > >     having accurate size information for message headers we can more
> > >     reliably enforce queue memory limits.
> > >
> > >  3) The flow to disk trigger based on message size had another problem which
> > >     is more pertinent to the current issue. We only considered the size of
> > >     the messages and not how much memory we allocate to store those
> > >     messages. In the FIFO use case those numbers will be very close to each
> > >     other, but in use cases like yours we can end up with sparse buffers and
> > >     the numbers will diverge. Because of this divergence we do not trigger
> > >     flow to disk in time and the broker can go OOM.
> > >     To fix the issue we want to add an additional flow to disk trigger based
> > >     on the amount of allocated direct memory. This should prevent the broker
> > >     from going OOM even if the compaction strategy outlined above should
> > >     fail for some reason (e.g., the compaction task cannot keep up with the
> > >     arrival of new messages).
> > >
> > > Currently, there are patches for the above points but they suffer from some
> > > thread-safety issues that need to be addressed.
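> > >
> > > To make point 1 a bit more concrete, here is a minimal sketch of such a
> > > periodic check (names and thresholds are illustrative only, not the
> > > Broker's actual code):
> > >
> > >     // Illustrative only: decide whether queues occupy their allocated
> > >     // direct memory densely enough, or whether compaction is worthwhile.
> > >     final class CompactionCheck
> > >     {
> > >         private static final double DIRECT_MEMORY_USAGE_THRESHOLD = 0.5;
> > >         private static final double OCCUPANCY_RATIO_THRESHOLD = 0.6;
> > >
> > >         static boolean compactionNeeded(long allocatedDirectBytes,
> > >                                         long maxDirectBytes,
> > >                                         long sumOfQueueSizesBytes)
> > >         {
> > >             if (allocatedDirectBytes < DIRECT_MEMORY_USAGE_THRESHOLD * maxDirectBytes)
> > >             {
> > >                 return false;    // plenty of head room, nothing to do
> > >             }
> > >             double occupancy = (double) sumOfQueueSizesBytes / allocatedDirectBytes;
> > >             return occupancy < OCCUPANCY_RATIO_THRESHOLD;    // buffers are sparse
> > >         }
> > >     }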
> > >
> > > I hope this description helps. Any feedback is, as always, welcome.
> > >
> > > Kind regards,
> > > Lorenz
> > >
> > >
> > >
> > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > > wrote:
> > >
> > > > Hi Lorenz,
> > > >
> > > > Thanks so much for the patch. We have a perf test now to reproduce this
> > > > issue, so we did test with 256KB, 64KB and 4KB network byte buffers. None
> > > > of these configurations helps with the issue (or gives any more breathing
> > > > room) for our use case. We would like to share the perf analysis with the
> > > > community:
> > > >
> > > > https://docs.google.com/document/d/1Wc1e-id-WlpI7FGU1Lx8XcKaV8sauRp82T5XZVU-RiM/edit?usp=sharing
> > > >
> > > > Feel free to comment on the doc if certain details are incorrect or if
> > > > there are questions.
> > > >
> > > > Since the short term solution doesn't help us, we are very interested in
> > > > getting some details on how the community plans to address this; a high
> > > > level description of the approach will be very helpful for us in order to
> > > > brainstorm our use cases along with this solution.
> > > >
> > > > - Ramayan
> > > >
> > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <quack.lor...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Ramayan,
> > > > >
> > > > > We are still working on a fix for this issue.
> > > > > In the meantime we had an idea for a potential workaround until a
> > > > > proper fix is released.
> > > > >
> > > > > The idea is to decrease the qpid network buffer size the broker uses.
> > > > > While this still allows for sparsely populated buffers, it would improve
> > > > > the overall occupancy ratio.
> > > > >
> > > > > Here are the steps to follow:
> > > > >  * ensure you are not using TLS
> > > > >  * apply the attached patch
> > > > >  * figure out the size of the largest messages you are sending (including
> > > > >    header and some overhead)
> > > > >  * set the context variable "qpid.broker.networkBufferSize" to that value,
> > > > >    but not smaller than 4096
> > > > >  * test
> > > > >
> > > > > Decreasing the qpid network buffer size automatically limits the maximum
> > > > > AMQP frame size.
> > > > > Since you are using a very old client we are not sure how well it copes
> > > > > with small frame sizes where it has to split a message across multiple
> > > > > frames.
> > > > > Therefore, to play it safe, you should not set it smaller than the largest
> > > > > messages (+ header + overhead) you are sending.
> > > > > I do not know what message sizes you are sending, but AMQP imposes the
> > > > > restriction that the frame size cannot be smaller than 4096 bytes.
> > > > > In the qpid broker the default is currently 256 kB.
> > > > >
> > > > > In the current state the broker does not allow setting the network buffer
> > > > > to values smaller than 64 kB, to allow TLS frames to fit into one network
> > > > > buffer.
> > > > > I attached a patch to this mail that lowers that restriction to the limit
> > > > > imposed by AMQP (4096 bytes).
> > > > > Obviously, you should not use this when using TLS.
> > > > >
> > > > >
> > > > > I hope this reduces the problems you are currently facing until we can
> > > > > complete the proper fix.
> > > > >
> > > > > Kind regards,
> > > > > Lorenz
> > > > >
> > > > >
> > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
> > > > > > Thanks so much Keith and the team for finding the root cause. We are so
> > > > > > relieved that the root cause will be fixed shortly.
> > > > > >
> > > > > > A couple of things that I forgot to mention about the mitigation steps we
> > > > > > took in the last incident:
> > > > > > 1) We triggered GC from the JMX bean multiple times; it did not help in
> > > > > > reducing DM allocated.
> > > > > > 2) We also killed all the AMQP connections to the broker when DM was at
> > > > > > 80%. This did not help either. The way we killed connections: using JMX we
> > > > > > got a list of all the open AMQP connections and called close from the JMX
> > > > > > mbean.
> > > > > >
> > > > > > I am hoping the above two are not related to the root cause, but wanted to
> > > > > > bring it up in case this is relevant.
> > > > > >
> > > > > > Thanks
> > > > > > Ramayan
> > > > > >
> > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <keith.w...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > > Hello Ramayan
> > > > > > >
> > > > > > > I believe I understand the root cause of the problem.  We have
> > > > > > > identified a flaw in the direct memory buffer management employed by
> > > > > > > Qpid Broker J which for some messaging use-cases can lead to the
> > > > > > > direct memory OOM you describe.   For the issue to manifest, the producing
> > > > > > > application needs to use a single connection for the production of
> > > > > > > messages, some of which are short-lived (i.e. are consumed quickly)
> > > > > > > whilst others remain on the queue for some time.  Priority queues,
> > > > > > > sorted queues and consumers utilising selectors that result in some
> > > > > > > messages being left on the queue could all produce this pattern.  The
> > > > > > > pattern leads to sparsely occupied 256K net buffers which cannot be
> > > > > > > released or reused until every message that references a 'chunk' of them
> > > > > > > is either consumed or flown to disk.   The problem was introduced with
> > > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
> > > > > > >
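> > > > > > > To make the sparse-buffer behaviour concrete, a minimal hypothetical
> > > > > > > sketch (not the Broker's actual QpidByteBuffer code) of why one live
> > > > > > > message can pin a whole pooled 256K buffer:
> > > > > > >
> > > > > > >     // Hypothetical illustration: the 256K buffer can only go back to
> > > > > > >     // the pool once every message slice taken from it is released.
> > > > > > >     final class PooledBuffer
> > > > > > >     {
> > > > > > >         private final java.nio.ByteBuffer buffer = java.nio.ByteBuffer.allocateDirect(256 * 1024);
> > > > > > >         private int liveSlices;
> > > > > > >
> > > > > > >         synchronized java.nio.ByteBuffer takeSlice(int size)
> > > > > > >         {
> > > > > > >             liveSlices++;                // a message now references part of this buffer
> > > > > > >             java.nio.ByteBuffer slice = buffer.slice();
> > > > > > >             slice.limit(size);
> > > > > > >             buffer.position(buffer.position() + size);
> > > > > > >             return slice;
> > > > > > >         }
> > > > > > >
> > > > > > >         synchronized void releaseSlice()
> > > > > > >         {
> > > > > > >             if (--liveSlices == 0)
> > > > > > >             {
> > > > > > >                 returnToPool();          // impossible while any message remains
> > > > > > >             }
> > > > > > >         }
> > > > > > >
> > > > > > >         private void returnToPool()
> > > > > > >         {
> > > > > > >             // hand the whole 256K chunk back to the buffer pool
> > > > > > >         }
> > > > > > >     }
> > > > > > >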
> > > > > > > The flow to disk feature is not helping us here because its algorithm
> > > > > > > considers only the size of live messages on the queues. If the
> > > > > > > accumulative live size does not exceed the threshold, the messages
> > > > > > > aren't flown to disk. I speculate that when you observed that moving
> > > > > > > messages caused direct memory usage to drop earlier today, your
> > > > > > > message movement caused a queue to go over threshold, causing messages
> > > > > > > to be flown to disk and their direct memory references released.  The
> > > > > > > logs will confirm whether this is so.
> > > > > > >
> > > > > > > I have not identified an easy workaround at the moment.   Decreasing
> > > > > > > the flow to disk threshold and/or increasing available direct memory
> > > > > > > should alleviate the problem and may be an acceptable short term
> > > > > > > workaround.  If it were possible for the publishing application to
> > > > > > > publish short lived and long lived messages on two separate JMS
> > > > > > > connections, this would avoid this defect.
> > > > > > >
> > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related problem.
> > > > > > > We intend to be working on these early next week and will be aiming
> > > > > > > for a fix that is back-portable to 6.0.
> > > > > > >
> > > > > > > Apologies that you have run into this defect and thanks for reporting it.
> > > > > > >
> > > > > > > Thanks, Keith
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > We have been monitoring the brokers every day and today we found one
> > > > > > > > instance where a broker’s DM was constantly going up and it was about to
> > > > > > > > crash, so we experimented with some mitigations, one of which caused the
> > > > > > > > DM to come down. Following are the details, which might help us
> > > > > > > > understand the issue:
> > > > > > > >
> > > > > > > > Traffic scenario:
> > > > > > > >
> > > > > > > > DM allocation had been constantly going up and was at 90%. There were two
> > > > > > > > queues which seemed to align with the theories that we had. Q1’s size had
> > > > > > > > been large right after the broker start and had slow consumption of
> > > > > > > > messages; queue size only reduced from 76MB to 75MB over a period of 6hrs.
> > > > > > > >
> > > > > > > > Q2, on the other hand, started small and was gradually growing; queue size
> > > > > > > > went from 7MB to 10MB in 6hrs. There were other queues with traffic during
> > > > > > > > this time.
> > > > > > > >
> > > > > > > > Action taken:
> > > > > > > >
> > > > > > > > Moved all the messages from Q2 (since this was our original theory) to Q3
> > > > > > > > (already created but no messages in it). This did not help with the DM
> > > > > > > > growth.
> > > > > > > > Moved all the messages from Q1 to Q4 (already created but no messages in
> > > > > > > > it). This reduced DM allocation from 93% to 31%.
> > > > > > > >
> > > > > > > > We have the heap dump and thread dump from when the broker was at 90% DM
> > > > > > > > allocation. We are going to analyze them to see if we can get some clue.
> > > > > > > > We wanted to share this new information which might help in reasoning
> > > > > > > > about the memory issue.
> > > > > > > >
> > > > > > > > - Ramayan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Keith,
> > > > > > > > >
> > > > > > > > > Thanks so much for your response and digging into the issue. Below are
> > > > > > > > > the answers to your questions:
> > > > > > > > >
> > > > > > > > > 1) Yes, we are using QPID-7462 with 6.0.5. We couldn't use 6.1, where it
> > > > > > > > > was released, because we need JMX support. Here is the destination format:
> > > > > > > > >
> > > > > > > > > ""%s ; {node : { type : queue }, link : { x-subscribes : { arguments : {
> > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
> > > > > > > > >
> > > > > > > > > 2) Our machines have 40 cores, which will make the number of threads 80.
> > > > > > > > > This might not be an issue, because this will show up in the baseline DM
> > > > > > > > > allocated, which is only 6% (of 4GB) when we just bring up the broker.
> > > > > > > > >
> > > > > > > > > 3) The only setting that we tuned WRT DM is flowToDiskThreshold, which is
> > > > > > > > > set at 80% now.
> > > > > > > > >
> > > > > > > > > 4) Only one virtual host in the broker.
> > > > > > > > >
> > > > > > > > > 5) Most of our queues (99%) are priority queues; we also have 8-10 sorted
> > > > > > > > > queues.
> > > > > > > > >
> > > > > > > > > 6) Yes, we are using the standard 0.16 client and not the AMQP 1.0 clients.
> > > > > > > > > The connection log line looks like:
> > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) : Protocol Version : 0-10 :
> > > > > > > > > Client ID : test : Client Version : 0.16 : Client Product : qpid
> > > > > > > > >
> > > > > > > > > We had another broker crash about an hour back; we do see the same
> > > > > > > > > patterns:
> > > > > > > > > 1) There is a queue which is constantly growing; enqueue is faster than
> > > > > > > > > dequeue on that queue for a long period of time.
> > > > > > > > > 2) Flow to disk didn't kick in at all.
> > > > > > > > >
> > > > > > > > > This graph shows memory growth (red line - heap, blue - DM allocated,
> > > > > > > > > yellow - DM used):
> > > > > > > > > https://drive.google.com/file/d/0Bwi0MEV3srPRdVhXdTBncHJLY2c/view?usp=sharing
> > > > > > > > >
> > > > > > > > > The graph below shows growth on a single queue (there are 10-12 other
> > > > > > > > > queues with traffic as well, some larger in size than this queue):
> > > > > > > > > https://drive.google.com/file/d/0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/view?usp=sharing
> > > > > > > > >
> > > > > > > > > A couple of questions:
> > > > > > > > > 1) Is there any developer level doc/design spec on how Qpid uses DM?
> > > > > > > > > 2) We are not getting heap dumps automatically when the broker crashes due
> > > > > > > > > to DM (HeapDumpOnOutOfMemoryError not respected). Has anyone found a way
> > > > > > > > > to get around this problem?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Ramayan
> > > > > > > > >
> > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <keith.w...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Ramayan
> > > > > > > > > >
> > > > > > > > > > We have been discussing your problem here and have a couple of
> > > > > > > > > > questions.
> > > > > > > > > >
> > > > > > > > > > I have been experimenting with use-cases based on your descriptions
> > > > > > > > > > above, but so far have been unsuccessful in reproducing a
> > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory" condition.  The
> > > > > > > > > > direct memory usage reflects the expected model: it levels off when
> > > > > > > > > > the flow to disk threshold is reached, and direct memory is released as
> > > > > > > > > > messages are consumed until the minimum size for caching of direct
> > > > > > > > > > memory is reached.
> > > > > > > > > >
> > > > > > > > > > 1] For clarity let me check: we believe when you say "patch to use
> > > > > > > > > > MultiQueueConsumer" you are referring to the patch attached to
> > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the broker" and you
> > > > > > > > > > are using a combination of this "x-pull-only" with the standard
> > > > > > > > > > "x-multiqueue" feature.  Is this correct?
> > > > > > > > > >
> > > > > > > > > > 2] One idea we had here relates to the size of the virtualhost IO
> > > > > > > > > > pool.   As you know from the documentation, the Broker caches/reuses
> > > > > > > > > > direct memory internally, but the documentation fails to mention that
> > > > > > > > > > each pooled virtualhost IO thread also grabs a chunk (256K) of direct
> > > > > > > > > > memory from this cache.  By default the virtual host IO pool is sized
> > > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors() * 2, 64), so if
> > > > > > > > > > you have a machine with a very large number of cores, you may have a
> > > > > > > > > > surprisingly large amount of direct memory assigned to virtualhost IO
> > > > > > > > > > threads.   Check the value of connectionThreadPoolSize on the virtualhost
> > > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<virtualhostnodename>/<virtualhostname>)
> > > > > > > > > > to see what value is in force.  What is it?  It is possible to tune
> > > > > > > > > > the pool size using the context variable
> > > > > > > > > > virtualhost.connectionThreadPool.size.
> > > > > > > > > >
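> > > > > > > > > > As a rough worked example (illustrative arithmetic only, using the 256K
> > > > > > > > > > chunk per pooled IO thread and the default pool sizing above):
> > > > > > > > > >
> > > > > > > > > >     public class IoPoolFootprint
> > > > > > > > > >     {
> > > > > > > > > >         public static void main(String[] args)
> > > > > > > > > >         {
> > > > > > > > > >             int cores = 40;                          // e.g. a 40-core machine
> > > > > > > > > >             int poolSize = Math.max(cores * 2, 64);  // default pool sizing formula
> > > > > > > > > >             long chunkKb = 256;                      // per-thread chunk from the cache
> > > > > > > > > >             // 80 threads * 256KB = ~20MB of direct memory
> > > > > > > > > >             System.out.printf("IO pool pins ~%,d KB of direct memory%n", poolSize * chunkKb);
> > > > > > > > > >         }
> > > > > > > > > >     }
> > > > > > > > > >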
> > > > > > > > > > 3] Tell me if you are tuning the Broker in any way beyond the direct/heap
> > > > > > > > > > memory settings you have told us about already.  For instance, are you
> > > > > > > > > > changing any of the direct memory pooling settings
> > > > > > > > > > (broker.directByteBufferPoolSize), the default network buffer size
> > > > > > > > > > (qpid.broker.networkBufferSize) or applying any other non-standard
> > > > > > > > > > settings?
> > > > > > > > > >
> > > > > > > > > > 4] How many virtual hosts do you have on the Broker?
> > > > > > > > > >
> > > > > > > > > > 5] What is the consumption pattern of the messages?  Do you consume in a
> > > > > > > > > > strictly FIFO fashion or are you making use of message selectors
> > > > > > > > > > and/or any of the out-of-order queue types (LVQs, priority queues or
> > > > > > > > > > sorted queues)?
> > > > > > > > > >
> > > > > > > > > > 6] Is it just the 0.16 client involved in the application?  Can I
> > > > > > > > > > check that you are not using any of the AMQP 1.0 clients
> > > > > > > > > > (org.apache.qpid:qpid-jms-client or
> > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the software stack (as either
> > > > > > > > > > consumers or producers)?
> > > > > > > > > >
> > > > > > > > > > Hopefully the answers to these questions will get us closer to a
> > > > > > > > > > reproduction.   If you are able to reliably reproduce it, please share
> > > > > > > > > > the steps with us.
> > > > > > > > > >
> > > > > > > > > > Kind regards, Keith.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > After a lot of log mining, we might have a way to explain the sustained
> > > > > > > > > > > increase in DirectMemory allocation; the correlation seems to be with the
> > > > > > > > > > > growth in the size of a queue that is getting consumed, but at a much
> > > > > > > > > > > slower rate than producers putting messages on this queue.
> > > > > > > > > > >
> > > > > > > > > > > The pattern we see is that in each instance of broker crash, there is at
> > > > > > > > > > > least one queue (usually 1 queue) whose size kept growing steadily. It’d be
> > > > > > > > > > > of significant size but not the largest queue -- usually there are multiple
> > > > > > > > > > > larger queues -- but it was different from other queues in that its size
> > > > > > > > > > > was growing steadily. The queue would also be moving, but its processing
> > > > > > > > > > > rate was not keeping up with the enqueue rate.
> > > > > > > > > > >
> > > > > > > > > > > Our theory, which might be totally wrong: if a queue is moving the entire
> > > > > > > > > > > time, maybe then the broker would keep reusing the same buffer in direct
> > > > > > > > > > > memory for the queue, and keep on adding onto it at the end to accommodate
> > > > > > > > > > > new messages. But because it’s active all the time and we’re pointing to
> > > > > > > > > > > the same buffer, space allocated for messages at the head of the
> > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long after those messages have
> > > > > > > > > > > been processed. Just a theory.
> > > > > > > > > > >
> > > > > > > > > > > We are also trying to reproduce this using some perf tests to enqueue with
> > > > > > > > > > > the same pattern, will update with the findings.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Ramayan
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
> > > > > > > > > > > <ramayan.tiw...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Another issue that we noticed is that when the broker goes OOM due to
> > > > > > > > > > > > direct memory, it doesn't create a heap dump (specified by
> > > > > > > > > > > > "-XX:+HeapDumpOnOutOfMemoryError"), even when the OOM error is the same
> > > > > > > > > > > > as what is mentioned in the Oracle JVM docs ("java.lang.OutOfMemoryError").
> > > > > > > > > > > >
> > > > > > > > > > > > Has anyone been able to find a way to get a heap dump for a DM OOM?
> > > > > > > > > > > >
> > > > > > > > > > > > - Ramayan
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari
> > > > > > > > > > > > <ramayan.tiw...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Below are the flow to disk logs from the broker, which has 3 million+
> > > > > > > > > > > > > messages at this time. We only have one virtual host. Time is in GMT. It
> > > > > > > > > > > > > looks like flow to disk is active on the whole virtual host and not at
> > > > > > > > > > > > > queue level.
> > > > > > > > > > > > >
> > > > > > > > > > > > > When the same broker went OOM yesterday, I did not see any flow to disk
> > > > > > > > > > > > > logs from when it was started until it crashed (it crashed twice within
> > > > > > > > > > > > > 4hrs).
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active :  Message memory use 3356539KB exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3354866KB within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active :  Message memory use 3358509KB exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3353501KB within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active :  Message memory use 3357544KB exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3353236KB within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active :  Message memory use 3356704KB exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3353511KB within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active :  Message memory use 3357948KB exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3355310KB within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active :  Message memory use 3365624KB exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1015 : Message flow to disk inactive : Message memory use 3355136KB within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO  [Housekeeping[test]] - [Housekeeping[test]] BRK-1014 : Message flow to disk active :  Message memory use 3358683KB exceeds threshold 3355443KB
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > After the production release (2 days back), we have seen 4 crashes in 3
> > > > > > > > > > > > > different brokers; this is the most pressing concern for us in deciding
> > > > > > > > > > > > > if we should roll back to 0.32. Any help is greatly appreciated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy <oru...@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ramayan,
> > > > > > > > > > > > > > Thanks for the details. I would like to clarify whether flow to disk was
> > > > > > > > > > > > > > triggered today for 3 million messages?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The following logs are issued for flow to disk:
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message memory use
> > > > > > > > > > > > > > {0,number,#}KB exceeds threshold {1,number,#.##}KB
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message memory use
> > > > > > > > > > > > > > {0,number,#}KB within threshold {1,number,#.##}KB
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Alex,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for your response, here are the details:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We use a "direct" exchange, without persistence (we specify NON_PERSISTENT
> > > > > > > > > > > > > > > while sending from the client) and use the BDB store. We use the JSON
> > > > > > > > > > > > > > > virtual host type. We are not using SSL.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > When the broker went OOM, we had around 1.3 million messages with a 100
> > > > > > > > > > > > > > > bytes average message size. Direct memory allocation (value read from the
> > > > > > > > > > > > > > > MBean) kept going up, even though it wouldn't need more DM to store that
> > > > > > > > > > > > > > > many messages. DM allocated persisted at 99% for about 3 and a half hours
> > > > > > > > > > > > > > > before crashing.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Today, on the same broker we have 3 million messages (same message size)
> > > > > > > > > > > > > > > and DM allocated is only at 8%. This seems like there is some issue with
> > > > > > > > > > > > > > > de-allocation or a leak.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have uploaded the memory utilization graph here:
> > > > > > > > > > > > > > > https://drive.google.com/file/d/0Bwi0MEV3srPRVHFEbDlIYUpLaUE/view?usp=sharing
> > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM Used (sum of queue payload) and
> > > > > > > > > > > > > > > Red is heap usage.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr Rudyy <oru...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Ramayan,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Could you please share with us the details of the messaging use case(s)
> > > > > > > > > > > > > > > > which ended up in OOM on the broker side?
> > > > > > > > > > > > > > > > I would like to reproduce the issue on my local broker in order to fix it.
> > > > > > > > > > > > > > > > I would appreciate it if you could provide as much detail as possible,
> > > > > > > > > > > > > > > > including messaging topology, message persistence type, message sizes,
> > > > > > > > > > > > > > > > volumes, etc.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for keeping message content and
> > > > > > > > > > > > > > > > receiving/sending data. Each plain connection utilizes 512K of direct
> > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of direct memory. Your memory
> > > > > > > > > > > > > > > > settings look Ok to me.
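> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > As a rough sanity check (illustrative arithmetic only, using the
> > > > > > > > > > > > > > > > per-connection figure above and the ~400 connections described below):
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >     public class ConnectionFootprint
> > > > > > > > > > > > > > > >     {
> > > > > > > > > > > > > > > >         public static void main(String[] args)
> > > > > > > > > > > > > > > >         {
> > > > > > > > > > > > > > > >             int connections = 400;        // e.g. ~400 plain AMQP connections
> > > > > > > > > > > > > > > >             long perConnectionKb = 512;   // plain (non-SSL) connection
> > > > > > > > > > > > > > > >             // ~200MB of baseline direct memory for the connections alone
> > > > > > > > > > > > > > > >             System.out.printf("~%,d KB of direct memory%n", connections * perConnectionKb);
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >     }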
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan Tiwari
> > > > > > > > > > > > > > > > <ramayan.tiw...@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with the patch to use the
> > > > > > > > > > > > > > > > > MultiQueueConsumer feature. We just finished deploying to production and
> > > > > > > > > > > > > > > > > saw a couple of instances of broker OOM due to running out of
> > > > > > > > > > > > > > > > > DirectMemory buffer (exceptions at the end of this email).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Here is our setup:
> > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g (this is the opposite of what the
> > > > > > > > > > > > > > > > > recommendation is; however, for our use case the message payload is
> > > > > > > > > > > > > > > > > really small, ~400bytes, and is way less than the per message overhead
> > > > > > > > > > > > > > > > > of 1KB). In perf testing, we were able to put 2 million messages without
> > > > > > > > > > > > > > > > > any issues.
> > > > > > > > > > > > > > > > > 2. ~400 connections to the broker.
> > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions and there is one multi queue consumer
> > > > > > > > > > > > > > > > > attached to each session, listening to around 1000 queues.
> > > > > > > > > > > > > > > > > 4. We are still using the 0.16 client (I know).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > With the above setup, the baseline utilization (without any messages) for
> > > > > > > > > > > > > > > > > direct memory was around 230mb (with 410 connections each taking 500KB).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Based on our understanding of broker memory allocation, message payload
> > > > > > > > > > > > > > > > > should be the only thing adding to direct memory utilization (on top of
> > > > > > > > > > > > > > > > > the baseline); however, we are experiencing something completely
> > > > > > > > > > > > > > > > > different. In our last broker crash, we see that the broker is constantly
> > > > > > > > > > > > > > > > > running with 90%+ direct memory allocated, even when the message payload
> > > > > > > > > > > > > > > > > sum from all the queues is only 6-8% (these % are against the available
> > > > > > > > > > > > > > > > > DM of 4gb). During these high DM usage periods, heap usage was around 60%
> > > > > > > > > > > > > > > > > (of 12gb).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We would like some help in understanding what could be the reason for
> > > > > > > > > > > > > > > > > these high DM allocations. Are there things other than message payload
> > > > > > > > > > > > > > > > > and AMQP connections which use DM and could be contributing to this high
> > > > > > > > > > > > > > > > > usage?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Another thing where we are puzzled is the de-allocation of DM byte
> > > > > > > > > > > > > > > > > buffers. From log mining of heap and DM utilization, de-allocation of DM
> > > > > > > > > > > > > > > > > doesn't correlate with heap GC. If anyone has seen any documentation
> > > > > > > > > > > > > > > > > related to this, it would be very helpful if you could share that.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > *Exceptions*
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory
> > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.restoreApplicationBufferForWrite(NonBlockingConnectionPlainDelegate.java:93) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:60) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:506) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:285) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:124) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:504) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:87) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > *Second exception*
> > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory
> > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at org.apache.qpid.bytebuffer.QpidByteBuffer.allocateDirect(QpidByteBuffer.java:474) ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.<init>(NonBlockingConnectionPlainDelegate.java:45) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingConnection.setTransportEncryption(NonBlockingConnection.java:625) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingConnection.<init>(NonBlockingConnection.java:117) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.NonBlockingNetworkTransport.acceptSocketChannel(NonBlockingNetworkTransport.java:158) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.SelectorThread$SelectionTask$1.run(SelectorThread.java:191) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:462) ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > >
> > >
> >
>
