Hi there Rob,
Great to see that you found one of the problems. Essentially, you told
the async event queue to try to send whatever it had in its queue every
0 ms, which means it will just spin, constantly wanting to send.
`batch-time-interval` and `batch-size` are there to determine WHEN
message batches get sent. Likewise, `batch-size=1` causes the queue to
fire whenever there is a single message in it. I usually read the
settings as: "send a batch of xxx messages when that limit is reached,
otherwise wait yy millis and send whatever is available."
WAN replication and the async event queue (which share a common
hierarchy) are asynchronous, fault-tolerant, batch-oriented mechanisms.
Using them at the granularity of every 0 ms, or 1 entry per batch, adds
a lot of overhead in terms of the effort needed to keep the primary and
backup queues in sync.
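A rough way to see that overhead is to count batches per second. The Java sketch below is a back-of-the-envelope steady-state model, not Geode code: it assumes events arrive at a constant rate, a batch goes out when batch-size events are queued or batch-time-interval has elapsed (whichever comes first), every batch carries at least one event, and each batch costs roughly the same amount of primary/backup coordination. BatchOverhead, batchesPerSecond and eventsPerBatch are hypothetical names.

```java
/**
 * Back-of-the-envelope, steady-state model (an assumption, not Geode code):
 * events arrive at a constant rate; a batch is sent when batchSize events are
 * queued or batchTimeIntervalMs has elapsed, whichever comes first; and a
 * batch always carries at least one event.
 */
final class BatchOverhead {
    static double batchesPerSecond(double eventsPerSecond, int batchSize, long batchTimeIntervalMs) {
        if (batchSize < 1 || batchTimeIntervalMs < 0) {
            throw new IllegalArgumentException("need batchSize >= 1 and batchTimeIntervalMs >= 0");
        }
        if (eventsPerSecond <= 0) {
            return 0.0; // nothing to send
        }
        double bySize = eventsPerSecond / batchSize; // batches/s if the size trigger fires first
        double byTime = batchTimeIntervalMs == 0
                ? Double.POSITIVE_INFINITY           // interval 0: the time trigger is always due
                : 1000.0 / batchTimeIntervalMs;      // batches/s if the interval fires first
        // the faster trigger wins, but there cannot be more batches than events
        return Math.min(eventsPerSecond, Math.max(bySize, byTime));
    }

    /** Average number of events per batch under the same model. */
    static double eventsPerBatch(double eventsPerSecond, int batchSize, long batchTimeIntervalMs) {
        double batches = batchesPerSecond(eventsPerSecond, batchSize, batchTimeIntervalMs);
        return batches == 0.0 ? 0.0 : eventsPerSecond / batches;
    }

    public static void main(String[] args) {
        // hypothetical load of 10,000 events per second
        System.out.println(batchesPerSecond(10_000, 1, 0));      // 10000.0
        System.out.println(batchesPerSecond(10_000, 100, 1000)); // 100.0
        System.out.println(eventsPerBatch(10, 100, 1000));       // 10.0 (a quiet queue)
    }
}
```

Under these assumptions, a hypothetical 10,000 events/s means 10,000 batches (and that much per-batch sync work) every second with batch-size=1, but only 100 with batch-size=100.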
Also, keep in mind that sending every message you receive might not be
beneficial either (think financial rates). If the data arrives at a high
frequency, it is always worth asking whether the downstream
process/system can process, and potentially respond to, a change BEFORE
the next rate is delivered. If it cannot, it may even make sense to turn
on batch conflation, to avoid sending messages that would essentially be
ignored.
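As an illustration of the idea only, the Java sketch below keeps just the latest update per key within one batch. Geode's own conflation rules (for example, which operation types qualify) are not modelled here; Conflation, Update and conflate are hypothetical names, and the currency-pair rates are made-up sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of batch conflation: within one batch, keep only
 * the latest value per key. Geode's own conflation rules are not modelled.
 */
final class Conflation {
    /** One queued update, e.g. a currency-pair rate (sample data only). */
    static final class Update {
        final String key;
        final double value;

        Update(String key, double value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    /** Returns the batch with older updates to the same key dropped. */
    static List<Update> conflate(List<Update> batch) {
        // LinkedHashMap keeps each key at the position of its first update,
        // while put() replaces the value with the newest update.
        Map<String, Update> latest = new LinkedHashMap<>();
        for (Update u : batch) {
            latest.put(u.key, u);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Update> batch = List.of(
                new Update("EURUSD", 1.1401),
                new Update("GBPUSD", 1.2950),
                new Update("EURUSD", 1.1403),
                new Update("EURUSD", 1.1402));
        // 4 queued updates shrink to 2 messages
        System.out.println(conflate(batch)); // [EURUSD=1.1402, GBPUSD=1.295]
    }
}
```

The downstream system then sees only the most recent EURUSD rate, which is all it could have acted on anyway if it cannot keep up with every tick.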
But if the requirement is to send (and react to) every message, then
these two parameters are the ones I would experiment with to find the
optimal batch size and timeout. ALSO, network send buffer sizes can play
a role in performance, so take them into account when sizing batches and
configuring buffers.
--Udo
On 11/2/18 09:56, Rob Shepherd wrote:
Thank you Nabarun,
While pulling out my config to send to you, I noticed the following in
my cache.xml (note batch-time-interval="0"):
<async-event-queue
    id="expiry-event-queue"
    parallel="false"
    enable-batch-conflation="false"
    disk-synchronous="true"
    forward-expiration-destroy="true"
    batch-time-interval="0"
    batch-size="1">
  ...
</async-event-queue>
...which I copied from the example here:
https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh
create async-event-queue --id=example-async-event-queue \
--parallel=false \
--enable-batch-conflation=false \
--batch-size=1 \
--batch-time-interval=0 \
--listener=org.apache.geode_examples.async.ExampleAsyncEventListener
I looked into "batch-time-interval", set it to 1000, and that fixed the
issue.
I think I understand what this parameter is for, and a delay there
would be tolerable.
I have another question: if I set parallel="true", the gfsh start
server command hangs, and I have to kill both the new server process and
the gfsh launcher.
It is not important to me right now, but I would like to evaluate this
at some point, so I'll happily try to debug the cause.
Thanks
Rob
On 2 Nov 2018, at 16:22, Nabarun Nag wrote:
Hi Rob,
We will look into this. Meanwhile, could you please elaborate on the
configuration Apache Geode is running: how many servers, how many AEQs,
regions, etc., and what workload it is running?
Thank you
Nabarun Nag
On Nov 2, 2018, at 8:37 AM, Rob Shepherd wrote:
Hi,
I'm using Geode 1.7.0 locally (OS X) and on a server (Linux, Arm64).
On both I'm seeing maxed-out CPUs.
I've profiled it locally on a dormant server instance (no application
activity), and the async queue routines are by far the highest
contributor to CPU activity.
Back Traces - Method
    Total Time [%] | Total Time [µs] | Total Time (CPU) | Samples
org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getHeadKey()
    100.00% | 2829377292 | 2829377292 | 8639
.org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getCurrentKey()
    0.00% | 0 | 0 | 8639
..org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekAhead()
    0.00% | 0 | 0 | 8263
...org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek()
    1.21% | 9906300 | 9906300 | 8180
....org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue()
    100.00% | 1144915695 | 1144915695 | 5
.....org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run()
    0.00% | 0 | 0 | 5
How can I determine if this is a problem with my setup or if it is a
bug?
A supposition: I notice that there are multiple instances of a
thread named after my async event queue ID:
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.1
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.2
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.3
Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.4
Are there supposed to be 4? Are they interfering with each other
(wait/notify) on an empty queue?
Thanks for any insight
Rob