Thank you for the detailed information on these values and for the use case analysis.
As a newcomer to Geode, I'd suggest copying the information in these responses into the docs and changing the example timeout to something other than 0.

Thanks again,
Rob

On Fri, 2 Nov 2018 at 18:32, Udo Kohlmeyer <[email protected]> wrote:

> Hi there Rob,
>
> Great to see that you found one of the problems. So essentially you told
> the async-queue to try and send whatever it had in its queue every 0 ms,
> which means it will just spin, wanting to send.
>
> `batch-time-interval` and `batch-size` are there to determine WHEN message
> batches get sent. So a `batch-size=1` would also cause this queue to
> always fire when there is 1 message in the queue. I usually read the
> settings as: "send a batch of xxx messages when that limit is reached;
> otherwise wait yy millis and send whatever is available."
>
> WAN replication and async-event-queues (which share a common hierarchy)
> are asynchronous, fault-tolerant, batch-oriented mechanisms. Using them at
> the granularity of every 0 ms or 1 entry per batch carries a lot of
> overhead, in terms of the effort needed to keep the primary and backup
> queues in sync.
>
> Also, keep in mind that sending every message you receive might not be
> beneficial either (think financial rates). If the data arrives at a high
> frequency, it is always worth asking whether the downstream system can
> process, and potentially respond to, a change BEFORE the next rate is
> delivered. So in some cases it even makes sense to turn batch conflation
> on, to avoid sending messages that could essentially be ignored.
>
> But if the requirement is to send (and react to) every message, then these
> two parameters are the ones I would experiment with to find the optimal
> batch size and timeout. Also take into account that network send buffer
> sizes can affect performance, so factor them in when sizing batches and
> configuring buffers.
>
> --Udo
>
> On 11/2/18 09:56, Rob Shepherd wrote:
>
> Thank you Nabarun,
>
> Having started to pull out my config to send to you, I noticed the
> following in my cache.xml:
>
> <async-event-queue
>     id="expiry-event-queue"
>     parallel="false"
>     enable-batch-conflation="false"
>     disk-synchronous="true"
>     forward-expiration-destroy="true"
> --> batch-time-interval="0"
>     batch-size="1">
> ...
>
> ...which is a copy of the example here:
> https://github.com/apache/geode-examples/blob/master/async/scripts/start.gfsh
>
> create async-event-queue --id=example-async-event-queue \
>     --parallel=false \
>     --enable-batch-conflation=false \
>     --batch-size=1 \
>     --batch-time-interval=0 \
>     --listener=org.apache.geode_examples.async.ExampleAsyncEventListener
>
> I pondered on "batch-time-interval", set it to 1000, and that has fixed
> the issue.
>
> I think I understand what this parameter is for, so a delay here would be
> tolerable.
>
> I have another question: if I set parallel="true", the gfsh start server
> command hangs and I have to kill both the new server process and the gfsh
> launcher.
>
> It is not important to me right now, but I would like to evaluate this at
> some point, so I'll happily try to debug the cause.
>
> Thanks
>
> Rob
>
> On 2 Nov 2018, at 16:22, Nabarun Nag <[email protected]> wrote:
>
> Hi Rob,
>
> We will look into this. Meanwhile, could you please elaborate on the
> configuration Apache Geode is running with (how many servers, how many
> AEQs, regions, etc.) and what workload it is running?
>
> Thank you
> Nabarun Nag
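A sketch of how the example's create command might look with the advice above applied, reusing the listener class from geode-examples. The 1000 ms interval matches the fix described above; the batch size of 100 is only an illustrative starting point to tune from, not a value taken from this thread:

create async-event-queue --id=example-async-event-queue \
    --parallel=false \
    --enable-batch-conflation=false \
    --batch-size=100 \
    --batch-time-interval=1000 \
    --listener=org.apache.geode_examples.async.ExampleAsyncEventListener

If the downstream consumer only needs the latest value per key, switching --enable-batch-conflation to true (per the note on conflation above) would further reduce the number of events dispatched.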
> On Nov 2, 2018, at 8:37 AM, Rob Shepherd <[email protected]> wrote:
>
> Hi,
>
> I'm using Geode (1.7.0) locally (OSX) and on a server (Linux ARM64).
>
> On both I'm seeing maxed-out CPUs.
>
> I've profiled it locally on a dormant server instance (no application
> activity) and the async-queue routines are the highest contributor to CPU
> activity by a long stretch.
>
> <PastedGraphic-1.png>
>
> Back traces (method / total time [%] / total time [µs] / total time (CPU) / samples):
>
> org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getHeadKey()
>     100.00%  2829377292  2829377292  8639
> . org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.getCurrentKey()
>     0.00%  0  0  8639
> .. org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peekAhead()
>     0.00%  0  0  8263
> ... org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderQueue.peek()
>     1.21%  9906300  9906300  8180
> .... org.apache.geode.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue()
>     100.00%  1144915695  1144915695  5
> ..... org.apache.geode.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run()
>     0.00%  0  0  5
>
> How can I determine if this is a problem with my setup or if it is a bug?
>
> A supposition: I notice that there are multiple instances of a thread
> named after my async event queue ID:
>
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.1
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.2
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.3
> Event Processor for GatewaySender_AsyncEventQueue_expiry-event-queue.4
>
> Are there supposed to be 4? Are they interfering with each other
> (wait/notify) on an empty queue?
>
> Thanks for any insight
>
> Rob

--
Rob Shepherd BEng PhD
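On the multiple "Event Processor" threads: the count appears to be governed by the queue's dispatcher-threads setting (whose default is greater than one), so seeing several of them on a serial queue is expected rather than a sign of misconfiguration by itself. A minimal sketch of pinning the count explicitly, again reusing the example listener; the value 1 is purely illustrative, not a recommendation from this thread:

create async-event-queue --id=example-async-event-queue \
    --parallel=false \
    --dispatcher-threads=1 \
    --batch-size=100 \
    --batch-time-interval=1000 \
    --listener=org.apache.geode_examples.async.ExampleAsyncEventListener

The equivalent cache.xml attribute should be dispatcher-threads="1" on the <async-event-queue> element (worth verifying against the cache XSD for the Geode version in use).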
