That amount of RAM can easily be eaten up depending on your sorting,
faceting, and data.

Do you have GC logging enabled? That should show what is happening with
the heap.
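
For reference, GC logging is just a matter of startup flags. The exact
options depend on the JVM; on HotSpot something along these lines is
typical (the log path here is only a placeholder), while JRockit has its
own -Xverbose:gc style of options:

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc.log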

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu <rallav...@gmail.com> wrote:

> Mark - currently 5.3 is being evaluated for upgrade purposes and hopefully
> we will get there soon. Meanwhile, the following exception is noted in the
> logs during updates:
>
> ERROR org.apache.solr.update.CommitTracker  – auto commit error...:
> java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
>         at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
>         at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
>         at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>         at java.lang.Thread.run(Thread.java:682)
>
> Considering that the machine is configured with 48G (24G for the JVM,
> which will be reduced in the future), I am wondering how it could still go
> out of memory. For the memory-mapped index files, the remaining 24G, or
> whatever is free of it, should be available. Looking at the lsof output,
> the memory-mapped files were around 10G.
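>
> As a quick sanity check, here is a minimal snippet (a hypothetical
> standalone check, not part of Solr) that prints what the JVM itself
> reports for heap vs. non-heap usage; memory-mapped index files are
> counted in neither, they only consume OS page cache:
>
>     import java.lang.management.ManagementFactory;
>     import java.lang.management.MemoryUsage;
>
>     public class HeapCheck {
>         public static void main(String[] args) {
>             // Heap: the 24G pool that the "writer hit an OutOfMemoryError"
>             // message refers to.
>             MemoryUsage heap =
>                 ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
>             // Non-heap (permgen, code cache, etc.); mmap'd index files do
>             // not show up here either.
>             MemoryUsage nonHeap =
>                 ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
>             System.out.printf("heap used=%dM max=%dM%n",
>                 heap.getUsed() >> 20, heap.getMax() >> 20);
>             System.out.printf("non-heap used=%dM%n", nonHeap.getUsed() >> 20);
>         }
>     }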
>
> Thanks.
>
>
> On 10/5/15 5:41 PM, Mark Miller wrote:
> > I'd make two guesses:
> >
> > Looks like you are using JRockit? I don't think that is common or well
> > tested at this point.
> >
> > There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace
> > of SolrCloud development, you are dealing with something fairly ancient,
> > so it will most likely be harder to find help with older issues.
> >
> > - Mark
> >
> > On Mon, Oct 5, 2015 at 12:46 PM Rallavagu <rallav...@gmail.com> wrote:
> >
> >> Any takers on this? Any kind of clue would help. Thanks.
> >>
> >> On 10/4/15 10:14 AM, Rallavagu wrote:
> >>> As there have been no responses so far, I assume this is not a very
> >>> common issue that folks come across. So, I went into the source (4.6.1)
> >>> to see if I can figure out what could be the cause.
> >>>
> >>>
> >>> The thread that is holding the lock is in this block of code:
> >>>
> >>> synchronized (recoveryLock) {
> >>>         // to be air tight we must also check after lock
> >>>         if (cc.isShutDown()) {
> >>>           log.warn("Skipping recovery because Solr is shutdown");
> >>>           return;
> >>>         }
> >>>         log.info("Running recovery - first canceling any ongoing recovery");
> >>>         cancelRecovery();
> >>>
> >>>         while (recoveryRunning) {
> >>>           try {
> >>>             recoveryLock.wait(1000);
> >>>           } catch (InterruptedException e) {
> >>>
> >>>           }
> >>>           // check again for those that were waiting
> >>>           if (cc.isShutDown()) {
> >>>             log.warn("Skipping recovery because Solr is shutdown");
> >>>             return;
> >>>           }
> >>>           if (closed) return;
> >>>         }
> >>>
> >>> Subsequently, the thread gets into the cancelRecovery method, shown below:
> >>>
> >>> public void cancelRecovery() {
> >>>       synchronized (recoveryLock) {
> >>>         if (recoveryStrat != null && recoveryRunning) {
> >>>           recoveryStrat.close();
> >>>           while (true) {
> >>>             try {
> >>>               recoveryStrat.join();
> >>>             } catch (InterruptedException e) {
> >>>               // not interruptible - keep waiting
> >>>               continue;
> >>>             }
> >>>             break;
> >>>           }
> >>>
> >>>           recoveryRunning = false;
> >>>           recoveryLock.notifyAll();
> >>>         }
> >>>       }
> >>>     }
> >>>
> >>> As per the stack trace, "recoveryStrat.join()" is where things are
> >>> holding up.
> >>>
> >>> I wonder why/how cancelRecovery would take so long that around 870
> >>> threads end up waiting on it. Is it possible that ZK is not responding,
> >>> or that something else, like operating system resources, could cause
> >>> this? Thanks.
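> >>>
> >>> To make the pattern concrete, here is a stripped-down sketch (not the
> >>> actual Solr classes) of what the dump appears to show: if the recovery
> >>> thread does not exit promptly after close() (for example, it is stuck
> >>> talking to ZK or on I/O), join() does not return, recoveryLock stays
> >>> held, and every other doRecovery() call from CoreAdminHandler piles up
> >>> on the monitor.
> >>>
> >>> public class LockConvoySketch {
> >>>     private final Object recoveryLock = new Object();
> >>>
> >>>     // Called once per core-admin recovery request; in the dump, 870
> >>>     // threads are blocked on this monitor enter.
> >>>     void doRecovery(Thread recoveryStrat) {
> >>>         synchronized (recoveryLock) {
> >>>             cancelRecovery(recoveryStrat); // lock held until join() returns
> >>>         }
> >>>     }
> >>>
> >>>     void cancelRecovery(Thread recoveryStrat) {
> >>>         synchronized (recoveryLock) { // recursive; already held
> >>>             while (true) {
> >>>                 try {
> >>>                     recoveryStrat.join(); // blocks for as long as the
> >>>                     break;                // recovery thread keeps running
> >>>                 } catch (InterruptedException e) {
> >>>                     // not interruptible - keep waiting
> >>>                 }
> >>>             }
> >>>         }
> >>>     }
> >>> }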
> >>>
> >>>
> >>> On 10/2/15 4:17 PM, Rallavagu wrote:
> >>>> Here is the stack trace of the thread that is holding the lock.
> >>>>
> >>>>
> >>>> "Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
> >>>> native_blocked, daemon
> >>>>       -- Waiting for notification on:
> >>>> org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
> >>>>       at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
> >>>>       at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
> >>>>       at
> >>>> syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
> >>>>       at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
> >>>>       at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
> >>>>       at
> >>>>
> >>
> RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a
> >>>>
> >>>>
> >>>>       at
> >>>> jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native
> >>>> Method)
> >>>>       at java/lang/Object.wait(J)V(Native Method)
> >>>>       at java/lang/Thread.join(Thread.java:1206)
> >>>>       ^-- Lock released while waiting:
> >>>> org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
> >>>>       at java/lang/Thread.join(Thread.java:1259)
> >>>>       at
> >>>>
> >>
> org/apache/solr/update/DefaultSolrCoreState.cancelRecovery(DefaultSolrCoreState.java:331)
> >>>>
> >>>>
> >>>>       ^-- Holding lock: java/lang/Object@0x114d8dd00[recursive]
> >>>>       at
> >>>>
> >>
> org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:297)
> >>>>
> >>>>
> >>>>       ^-- Holding lock: java/lang/Object@0x114d8dd00[fat lock]
> >>>>       at
> >>>>
> >>
> org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)
> >>>>
> >>>>
> >>>>       at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
> >>>>
> >>>>
> >>>> Stack trace of one of the 870 threads that is waiting for the lock to
> >>>> be released:
> >>>>
> >>>> "Thread-55489" id=77520 idx=0xebc tid=1494 prio=5 alive, blocked,
> >>>> native_blocked, daemon
> >>>>       -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat
> >>>> lock]
> >>>>       at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
> >>>>       at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
> >>>>       at
> >>>> syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
> >>>>       at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
> >>>>       at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
> >>>>       at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
> >>>>       at
> jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
> >>>>       at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
> >>>>       at
> >>>>
> jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
> >>>>       at
> >>>> jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
> >>>>       at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
> >>>>       at
> >>>>
> >>
> org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)
> >>>>
> >>>>
> >>>>       at
> >>>>
> >>
> org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)
> >>>>
> >>>>
> >>>>       at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
> >>>>
> >>>> On 10/2/15 4:12 PM, Rallavagu wrote:
> >>>>> Solr 4.6.1 on Tomcat 7, single-shard 4-node cloud with a 3-node
> >>>>> ZooKeeper ensemble.
> >>>>>
> >>>>> During updates, some nodes go to very high CPU and become unavailable.
> >>>>> The thread dump shows the following thread blocked; there are 870
> >>>>> threads in this state, which explains the high CPU. Any clues on where
> >>>>> to look?
> >>>>>
> >>>>> "Thread-56848" id=79207 idx=0x38 tid=3169 prio=5 alive, blocked,
> >>>>> native_blocked, daemon
> >>>>>       -- Blocked trying to get lock: java/lang/Object@0x114d8dd00
> [fat
> >>>>> lock]
> >>>>>       at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
> >>>>>       at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
> >>>>>       at
> >>>>> syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
> >>>>>       at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
> >>>>>       at
> syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
> >>>>>       at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
> >>>>>       at
> jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
> >>>>>       at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
> >>>>>       at
> >>>>>
> >> jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
> >>>>>       at
> >>>>> jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
> >>>>>       at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
> >>>>>       at
> >>>>>
> >>
> org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)
> >>>>>
> >>>>>
> >>>>>
> >>>>>       at
> >>>>>
> >>
> org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)
> >>>>>
> >>>>>
> >>>>>
> >>>>>       at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
> >>
>
-- 
- Mark
about.me/markrmiller
