Hi,
We are using Solr 4.10.4 and are experiencing an out-of-memory exception.  It
seems the problem is caused by the following code and scenario.

This is the last part of the fetchLatestIndex method in SnapPuller.java:

        // we must reload the core after we open the IW back up
        if (reloadCore) {
          reloadCore();
        }

        if (successfulInstall) {
          if (isFullCopyNeeded) {
            // let the system know we are changing dir's and the old one
            // may be closed
            if (indexDir != null) {
              LOG.info("removing old index directory " + indexDir);
              core.getDirectoryFactory().doneWithDirectory(indexDir);
              core.getDirectoryFactory().remove(indexDir);
            }
          }
          if (isFullCopyNeeded) {
            solrCore.getUpdateHandler().newIndexWriter(isFullCopyNeeded);
          }

          openNewSearcherAndUpdateCommitPoint(isFullCopyNeeded);
        }

Inside reloadCore, Solr creates a new core, registers it, and tries to close
the current (old) core.  Even when the close of the old core proceeds
normally, the SnapPuller thread throws an exception:

SnapPull failed: org.apache.solr.common.SolrException: Index fetch failed
Caused by: java.lang.RuntimeException: Interrupted while waiting for core reload to finish
Caused by: java.lang.InterruptedException

Despite this exception, the overall process seems OK: it just terminates the
SnapPuller thread, while all the other threads involved in the close proceed
normally.
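
This matches a pattern where the reload runs on its own thread while
SnapPuller blocks on a latch: interrupting the waiting thread aborts only the
wait, not the reload itself.  Here is a minimal, self-contained sketch of that
behavior (all names below are hypothetical, not Solr's own):

    import java.util.concurrent.CountDownLatch;

    // Demonstrates why the reload can finish even though the waiting
    // (SnapPuller-like) thread dies with "Interrupted while waiting ...":
    // the interrupt hits latch.await(), not the thread doing the reload.
    public class ReloadWaitDemo {
        public static void main(String[] args) throws Exception {
            final CountDownLatch latch = new CountDownLatch(1);

            Thread reloader = new Thread(() -> {
                try {
                    Thread.sleep(1000); // stands in for the actual core reload
                    System.out.println("reload finished normally");
                } catch (InterruptedException ignored) {
                } finally {
                    latch.countDown();
                }
            });

            Thread waiter = new Thread(() -> {
                try {
                    latch.await(); // SnapPuller-style wait for the reload
                } catch (InterruptedException e) {
                    throw new RuntimeException(
                        "Interrupted while waiting for core reload to finish", e);
                }
            });

            reloader.start();
            waiter.start();
            Thread.sleep(200);
            waiter.interrupt(); // only the waiter is affected
            reloader.join();    // the reload itself still completes
        }
    }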

*Now, the problem is that the close() method called during reloadCore
doesn't really close the core.*
This is the beginning of the close() method:
    public void close() {
        int count = refCount.decrementAndGet();
        if (count > 0) return; // close is called often, and only actually closes if nothing is using it.
        if (count < 0) {
           log.error("Too many close [count:{}] on {}. Please report this exception to solr-user@lucene.apache.org", count, this);
           assert false : "Too many closes on SolrCore";
           return;
        }
        log.info(logid + " CLOSING SolrCore " + this);

When an HTTP request is executing, the refCount is greater than 1. So, when
the old core is asked to close during the core reload, the if (count > 0)
check simply returns from the method and the core is never actually closed.
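
To make the reference counting concrete, here is a minimal sketch of the
close semantics (RefCountedCore and everything in it except refCount are
hypothetical; the real SolrCore does much more):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of reference-counted close semantics, modeled on SolrCore.
    class RefCountedCore {
        // Starts at 1 for the owner; each in-flight request open()s and close()s.
        private final AtomicInteger refCount = new AtomicInteger(1);

        void open() {
            refCount.incrementAndGet();
        }

        void close() {
            int count = refCount.decrementAndGet();
            if (count > 0) return; // e.g. an HTTP request still holds a reference
            doClose();             // only the final close() releases resources
        }

        private void doClose() {
            System.out.println("really closing core");
        }

        public static void main(String[] args) {
            RefCountedCore core = new RefCountedCore();
            core.open();  // an HTTP request takes a reference
            core.close(); // the reload-time close: count drops 2 -> 1, a no-op
            core.close(); // the request finishes: count drops 1 -> 0, doClose() runs
        }
    }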

Then, the fetchLatestIndex method in SnapPuller moves on to the next code and
executes openNewSearcherAndUpdateCommitPoint.  If you look at this method, it
tries to open a new searcher on the solrCore that was captured in the
SnapPuller constructor, and I believe that reference still points to the old
core.  At a certain timing, this method also throws:

SnapPuller - java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at ....SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)
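
That stack trace is consistent with a thread being interrupted while it is
blocked on Future.get().  A minimal, self-contained sketch of this failure
mode (all names below are hypothetical):

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // A thread interrupted while blocked on Future.get() sees an
    // InterruptedException, like the SnapPuller thread waiting for the
    // new searcher to open.
    public class InterruptedGetDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            final Future<String> searcherFuture = pool.submit(() -> {
                Thread.sleep(5000); // stands in for opening a new searcher
                return "new searcher";
            });

            Thread waiter = new Thread(() -> {
                try {
                    searcherFuture.get(); // blocks, like openNewSearcherAndUpdateCommitPoint
                } catch (InterruptedException e) {
                    System.out.println("interrupted while waiting: " + e);
                } catch (ExecutionException e) {
                    System.out.println("task failed: " + e.getCause());
                }
            });
            waiter.start();
            Thread.sleep(500);
            waiter.interrupt(); // what happens to the SnapPuller thread
            waiter.join();
            pool.shutdownNow();
        }
    }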

After this exception, things start to go bad.

*In summary, I have two questions.*
1. Can you confirm this memory / thread issue?
2. When the core reload completes successfully (whether or not it throws the
exception above), does Solr still need to call the
openNewSearcherAndUpdateCommitPoint method?

Thanks.
