Thank you for the detailed information on these values and for the use case analysis.
As a newcomer to Geode, I'd suggest copying the information in these responses into the docs and changing the example timeout to something other than 0.

Thanks again,
Rob

On Fri, 2 Nov 2018 at 18:32, Udo Kohlmeyer <[email protected]> wrote:

> Hi there Rob,
>
> Great to see that you found one of the problems. So essentially you told
> the async-queue to try and send whatever it had in its queue every 0 ms,
> which means it will just spin, wanting to send.
>
> `batch-time-interval` and `batch-size` are there to determine WHEN message
> batches get sent. So a `batch-size=1` would also cause this queue to
> always fire when there is 1 message in the queue. I usually read the
> settings as: "send a batch of xxx messages when that limit is reached;
> otherwise wait yy millis and send whatever is available."
>
> WAN replication and async-event-queues (which share a common hierarchy)
> are asynchronous, fault-tolerant, batch-oriented mechanisms. Using them at
> the granularity of every 0 ms or 1 entry per batch carries a lot of
> overhead, in terms of the effort needed to keep the primary and backup
> queues in sync.
>
> Also, keep in mind that sending every message you receive might not be
> beneficial either (think financial rates). If the data arrives at a high
> frequency, it is always worth asking whether the downstream system can
> process, and potentially respond to, a change BEFORE the next rate is
> delivered. So in some cases it even makes sense to turn batch conflation
> on, to avoid sending messages that could essentially be ignored.
>
> But if the requirement is to send (and react to) every message, then these
> two parameters are the ones I would experiment with to find the optimal
> batch size and timeout. Also take into account that network send buffer
> sizes can affect performance, so factor them in when sizing batches and
> configuring buffers.
>
> --Udo
>
> On 11/2/18 09:56, Rob Shepherd wrote:
>
> Thank you Nabarun,
>
> Having started to pull out my config to send to you, I noticed the
> following in my cache.xml:
>
> <async-event-queue
>     id="expiry-event-queue"
>     parallel="false"
>     enable-batch-conflation="false"
>     disk-synchronous="true"
>     forward-expiration-destroy="true"
> --> batch-time-interval="0"
>     batch-size="1">
> ...
>
> ...which is a copy of the example here:
> https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh
>
> create async-event-queue --id=example-async-event-queue \
>     --parallel=false \
>     --enable-batch-conflation=false \
>     --batch-size=1 \
>     --batch-time-interval=0 \
>     --listener=org.apache.geode_examples.async.ExampleAsyncEventListener
>
> I pondered on "batch-time-interval", set it to 1000, and that has fixed
> the issue.
>
> I think I understand what this parameter is for, so a delay here would be
> tolerable.
>
> I have another question: if I set parallel="true", the gfsh start server
> command hangs and I have to kill both the new server process and the gfsh
> launcher.
>
> It is not important to me right now, but I would like to evaluate this at
> some point, so I'll happily try to debug the cause.
>
> Thanks
>
> Rob
>
> On 2 Nov 2018, at 16:22, Nabarun Nag <[email protected]> wrote:
>
> Hi Rob,
>
> We will look into this. Meanwhile, could you please elaborate on the
> configuration Apache Geode is running with (how many servers, how many
> AEQs, regions, etc.) and what workload it is running?
>
> Thank you
> Nabarun Nag
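A sketch of how the example's create command might look with the advice above applied, reusing the listener class from geode-examples. The 1000 ms interval matches the fix described above; the batch size of 100 is only an illustrative starting point to tune from, not a value taken from this thread:

create async-event-queue --id=example-async-event-queue \
    --parallel=false \
    --enable-batch-conflation=false \
    --batch-size=100 \
    --batch-time-interval=1000 \
    --listener=org.apache.geode_examples.async.ExampleAsyncEventListener

If the downstream consumer only needs the latest value per key, switching --enable-batch-conflation to true (per the note on conflation above) would further reduce the number of events dispatched.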
> On Nov 2, 2018, at 8:37 AM, Rob Shepherd <[email protected]> wrote:
>
> Hi,
>
> I'm using Geode (1.7.0) locally (OSX) and on a server (Linux ARM64).
>
> On both I'm seeing maxed-out CPUs.
>
> I've profiled it locally on a dormant server instance (no application
> activity) and the async-queue routines are the highest contributor to CPU
> activity by a long stretch.
>
> <PastedGraphic-1.png>
>
> Back traces (method / total time [%] / total time [µs] / total time (CPU) / samples):
>
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getHeadKey()
>     100.00%  2829377292  2829377292  8639
> . org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getCurrentKey()
>     0.00%  0  0  8639
> .. org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekAhead()
>     0.00%  0  0  8263
> ... org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek()
>     1.21%  9906300  9906300  8180
> .... org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue()
>     100.00%  1144915695  1144915695  5
> ..... org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run()
>     0.00%  0  0  5
>
> How can I determine if this is a problem with my setup or if it is a bug?
>
> A supposition: I notice that there are multiple instances of a thread
> named after my async event queue ID:
>
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.1
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.2
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.3
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.4
>
> Are there supposed to be 4? Are they interfering with each other
> (wait/notify) on an empty queue?
>
> Thanks for any insight
>
> Rob

--
Rob Shepherd BEng PhD
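On the multiple "Event Processor" threads: the count appears to be governed by the queue's dispatcher-threads setting (whose default is greater than one), so seeing several of them on a serial queue is expected rather than a sign of misconfiguration by itself. A minimal sketch of pinning the count explicitly, again reusing the example listener; the value 1 is purely illustrative, not a recommendation from this thread:

create async-event-queue --id=example-async-event-queue \
    --parallel=false \
    --dispatcher-threads=1 \
    --batch-size=100 \
    --batch-time-interval=1000 \
    --listener=org.apache.geode_examples.async.ExampleAsyncEventListener

The equivalent cache.xml attribute should be dispatcher-threads="1" on the <async-event-queue> element (worth verifying against the cache XSD for the Geode version in use).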
