You can try increasing the subscription queue capacity. The steps for managing the server's subscription queue are described here: http://gemfire.docs.pivotal.io/geode/developing/events/limit_server_subscription_queue_size.html
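For example, here is a rough sketch of doing it programmatically when the cache server is started through the API. The port, capacity, disk store name, and directory below are placeholders, not values from your setup, so adjust them before use:

    import java.io.File;
    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.server.CacheServer;
    import org.apache.geode.cache.server.ClientSubscriptionConfig;

    public class SubscriptionQueueConfigExample {
      public static void main(String[] args) throws Exception {
        Cache cache = new CacheFactory().create();

        // Disk store used to overflow subscription queue entries (placeholder name/dir).
        cache.createDiskStoreFactory()
            .setDiskDirs(new File[] {new File("queue-overflow-dir")})
            .create("queueOverflow");

        CacheServer server = cache.addCacheServer();
        server.setPort(40404);

        // Cap the client subscription (HA) queue and overflow the excess to disk
        // instead of letting the in-memory queue grow without bound.
        ClientSubscriptionConfig subscription = server.getClientSubscriptionConfig();
        subscription.setEvictionPolicy("entry");        // evict by entry count
        subscription.setCapacity(80000);                // placeholder capacity
        subscription.setDiskStoreName("queueOverflow");

        server.start();
      }
    }

The same settings can also go in the server's cache.xml on the cache-server element (the client-subscription child with eviction-policy, capacity, and disk-store-name attributes), as shown on the page linked above.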
-Anil.

On Wed, Sep 27, 2017 at 2:58 PM, Mangesh Deshmukh <[email protected]> wrote:
> Hi,
>
> FYI: I have filed a JIRA ticket on this, but I thought maybe someone might
> be aware of a solution or workaround for this problem, so I am posting it
> here as well.
>
> In one of our projects we are using Geode. Here is a summary of how we use it:
>
> - Geode servers (release 1.1.1) have multiple regions.
> - Clients subscribe to the data from these regions.
> - Clients register interest in all the entries, so they receive updates
>   about all the entries, from creation to modification to deletion.
> - One of the regions usually has 5-10 million entries with a TTL of 24
>   hours. Most entries are added within an hour's span, one after the other,
>   so when the TTL kicks in they are often destroyed within an hour.
>
> Problem:
>
> Every now and then we observe the following message:
>
> Client queue for _gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue client is full.
>
> This seems to happen when the TTL kicks in. Entries start getting evicted
> (deleted), and the updates now must be sent to clients. We see that the
> updates do happen for a while, but suddenly they stop and the queue size
> starts growing. This is becoming a major issue for the smooth functioning
> of our production setup. Any help will be much appreciated.
>
> I did some groundwork by downloading and looking at the code. I see
> references to two issues, #37581 and #51400, but I am unable to view the
> actual JIRA tickets (they need login credentials). Hopefully this helps
> someone looking at the issue.
>
> Here is the pertinent code:
>
> @Override
> @edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
> void checkQueueSizeConstraint() throws InterruptedException {
>   if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix for bug 39413
>     if (Thread.interrupted())
>       throw new InterruptedException();
>     synchronized (this.putGuard) {
>       if (putPermits <= 0) {
>         synchronized (this.permitMon) {
>           if (reconcilePutPermits() <= 0) {
>             if (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
>               isClientSlowReciever = true;
>             } else {
>               try {
>                 long logFrequency = CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
>                 CacheClientNotifier ccn = CacheClientNotifier.getInstance();
>                 if (ccn != null) { // check needed for junit tests
>                   logFrequency = ccn.getLogFrequency();
>                 }
>                 if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
>                   logger.warn(LocalizedMessage.create(
>                       LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
>                       new Object[] {region.getName()}));
>                   this.maxQueueSizeHitCount = 0;
>                 }
>                 ++this.maxQueueSizeHitCount;
>                 this.region.checkReadiness(); // fix for bug 37581
>                 // TODO: wait called while holding two locks
>                 this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
>                 this.region.checkReadiness(); // fix for bug 37581
>                 // Fix for #51400. Allow the queue to grow beyond its
>                 // capacity/maxQueueSize, if it is taking a long time to
>                 // drain the queue, either due to a slower client or the
>                 // deadlock scenario mentioned in the ticket.
>                 reconcilePutPermits();
>                 if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
>                   logger.info(LocalizedMessage
>                       .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
>                 }
>               } catch (InterruptedException ex) {
>                 // TODO: The line below is meaningless. Comment it out later
>                 this.permitMon.notifyAll();
>                 throw ex;
>               }
>             }
>           }
>         } // synchronized (this.permitMon)
>       } // if (putPermits <= 0)
>       --putPermits;
>     } // synchronized (this.putGuard)
>   }
> }
>
> Thanks
> Mangesh
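To make the scenario above concrete, here is a minimal sketch of that kind of setup: a server region whose entries are destroyed 24 hours after creation, and a client that registers interest in all keys so it receives every create/update/destroy event through the subscription queue. The region name, locator address, and region shortcuts are illustrative placeholders, not Mangesh's actual configuration:

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.ExpirationAction;
    import org.apache.geode.cache.ExpirationAttributes;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.RegionFactory;
    import org.apache.geode.cache.RegionShortcut;
    import org.apache.geode.cache.client.ClientCache;
    import org.apache.geode.cache.client.ClientCacheFactory;
    import org.apache.geode.cache.client.ClientRegionShortcut;

    public class SubscriptionSetupSketch {

      // Server side: region whose entries expire (destroy) 24 hours after creation.
      static Region<String, Object> createServerRegion(Cache cache) {
        RegionFactory<String, Object> factory =
            cache.createRegionFactory(RegionShortcut.PARTITION);
        factory.setStatisticsEnabled(true); // required for entry expiration
        factory.setEntryTimeToLive(
            new ExpirationAttributes(24 * 60 * 60, ExpirationAction.DESTROY));
        return factory.create("exampleRegion");
      }

      // Client side: subscribe to all entries so create/update/destroy events
      // are delivered through the server's subscription queue.
      static Region<String, Object> createClientRegion() {
        ClientCache clientCache = new ClientCacheFactory()
            .addPoolLocator("localhost", 10334)
            .setPoolSubscriptionEnabled(true)
            .create();
        Region<String, Object> region = clientCache
            .<String, Object>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
            .create("exampleRegion");
        region.registerInterest("ALL_KEYS");
        return region;
      }
    }

With this pattern, every TTL-driven destroy on the server produces an event for each subscribed client, which is why the subscription queues grow sharply when a large batch of entries expires at roughly the same time.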
