Hi there Rob,

Great to see that you found one of the problems. Essentially, you told the async-queue to try to send whatever it had in its queue every 0 ms, which means it will just spin, constantly wanting to send.

`batch-time-interval` and `batch-size` are there to determine WHEN message batches get sent. So a `batch-size=1` would also cause this queue to fire whenever there is 1 message in it. I usually treat the settings as: "send a batch of xxx messages when the limit is reached, otherwise wait yy millis and send what is available".
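
To make that rule of thumb concrete, here is a minimal sketch of a size-or-time dispatch decision. It is only a simplified model of the idea, not Geode's actual dispatcher code; `BatchPolicy` and its methods are hypothetical names invented for illustration.

```java
// Illustrative sketch only: BatchPolicy is a hypothetical class, not part of Geode.
final class BatchPolicy {
    private final int batchSize;            // plays the role of batch-size
    private final long batchTimeIntervalMs; // plays the role of batch-time-interval

    BatchPolicy(int batchSize, long batchTimeIntervalMs) {
        this.batchSize = batchSize;
        this.batchTimeIntervalMs = batchTimeIntervalMs;
    }

    /** Send when the batch is full, or when the interval has elapsed and something is queued. */
    boolean shouldDispatch(int queuedEvents, long msSinceLastDispatch) {
        if (queuedEvents >= batchSize) {
            return true; // size limit reached
        }
        return queuedEvents > 0 && msSinceLastDispatch >= batchTimeIntervalMs;
    }

    /** How long a dispatcher thread could wait before it has to look at the queue again. */
    long msUntilNextCheck(long msSinceLastDispatch) {
        return Math.max(0L, batchTimeIntervalMs - msSinceLastDispatch);
    }
}
```

In this model an interval of 0 makes `msUntilNextCheck` return 0 every time, so the thread never gets to wait, even on an empty queue. With an interval of 1000 it can sit idle for up to a second between checks.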

Usually WAN Replication and Async-Queue (which share a common hierarchy) are async, fault-tolerant, batch-oriented mechanisms. Using them at the granularity of every 0 ms or 1 entry per batch adds a lot of overhead, in terms of the effort needed to keep the primary and backup queues in sync.

Also, keep in mind that sending every message you receive might not be beneficial either (think financial rates). If one receives data at a high frequency, it is always worth asking whether the process/downstream system can process, and potentially respond to, a change BEFORE the next rate is delivered. So in some cases it even makes sense to have batch-conflation turned on, to avoid sending messages that would essentially be ignored.
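
As a rough illustration of what conflation buys you, the sketch below collapses a batch of rate updates so that only the most recent update per key survives. It is a simplified model of the idea and makes no claim about how Geode implements or orders conflated events; `RateConflation` and `RateUpdate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these are hypothetical classes, not part of Geode.
final class RateConflation {

    /** One rate update; the key might be a currency pair such as "EURUSD". */
    static final class RateUpdate {
        final String key;
        final double rate;

        RateUpdate(String key, double rate) {
            this.key = key;
            this.rate = rate;
        }
    }

    /** Keeps only the latest update per key, ordered by when that latest update arrived. */
    static List<RateUpdate> conflate(List<RateUpdate> batch) {
        Map<String, RateUpdate> latestByKey = new LinkedHashMap<>();
        for (RateUpdate update : batch) {
            latestByKey.remove(update.key);      // drop the stale update, if any
            latestByKey.put(update.key, update); // re-insert so the key moves to the end
        }
        return new ArrayList<>(latestByKey.values());
    }
}
```

For a batch of EURUSD 1.10, GBPUSD 1.30, EURUSD 1.11, this returns two updates (GBPUSD 1.30 and EURUSD 1.11), so the downstream system never sees the EURUSD rate that was already superseded.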

But if the requirement is to send (and react to) every message, then these two parameters are what I would test with to find the optimal send size and timeout. Also bear in mind that network send buffer sizes can play a role in performance, so take them into account when sizing batches and configuring buffer sizes.

--Udo


On 11/2/18 09:56, Rob Shepherd wrote:
Thank you Nabarun,

Having started to pull out my config to send to you, I noticed the following in my cache.xml:

<async-event-queue
id="expiry-event-queue"
parallel="false"
enable-batch-conflation="false"
disk-synchronous="true"
forward-expiration-destroy="true"
  --> batch-time-interval="0"
batch-size="1"
>...

…Which is a copy from the example here: https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh
create async-event-queue --id=example-async-event-queue \
   --parallel=false \
   --enable-batch-conflation=false \
   --batch-size=1 \
   --batch-time-interval=0 \
   --listener=org.apache.geode_examples.async.ExampleAsyncEventListener


I pondered on “batch-time-interval” and set it to 1000 and it has fixed the issue.

I think I understand what this parameter is for and so a delay here would be tolerable.



I have another question: if I set parallel="true", the gfsh start server command hangs and I have to kill the new server process and the gfsh launcher.

It is not important to me now, but I would like to evaluate this at some point, so I’ll happily try to debug the cause.

Thanks

Rob




On 2 Nov 2018, at 16:22, Nabarun Nag wrote:

Hi Rob,

We will look into this. Meanwhile, could you please elaborate on the configuration Apache Geode is running with: how many servers, how many AEQs, regions, etc., and what workload it is running?

Thank you
Nabarun Nag

On Nov 2, 2018, at 8:37 AM, Rob Shepherd wrote:

Hi,

I’m using Geode (1.7.0) locally (OSX) and on a server (Linux Arm64).

On both I’m seeing maxed out CPUs.

I’ve profiled it locally on a dormant server instance (no application activity) and the Async Queue routines are the highest contributor to CPU activity by a long stretch.

<PastedGraphic-1.png>

Back Traces - Method Total Time [%] Total Time [µs] Total Time (CPU) Samples
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getHeadKey()
        100.00%         2829377292      2829377292      8639
.org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getCurrentKey()
        0.00%   0       0       8639
..org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekAhead()
        0.00%   0       0       8263
...org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek()
        1.21%   9906300         9906300         8180
....org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue()
        100.00%         1144915695      1144915695      5
.....org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run()
        0.00%   0       0       5



How can I determine if this is a problem with my setup or if it is a bug?

A supposition:  I notice that there are multiple instances of a thread named after my Async Event queue ID

Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.1
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.2
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.3
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.4

Are there supposed to be 4? Are they interfering with each other (wait/notify) on an empty queue?

Thanks for any insight

Rob


