Hi,

We are using Solr 4.10.4 and experiencing an out-of-memory exception. The problem appears to be caused by the following code and scenario.
This is the last part of the fetchLastIndex method in SnapPuller.java:

    // we must reload the core after we open the IW back up
    if (reloadCore) {
      reloadCore();
    }

    if (successfulInstall) {
      if (isFullCopyNeeded) {
        // let the system know we are changing dir's and the old one
        // may be closed
        if (indexDir != null) {
          LOG.info("removing old index directory " + indexDir);
          core.getDirectoryFactory().doneWithDirectory(indexDir);
          core.getDirectoryFactory().remove(indexDir);
        }
      }
      if (isFullCopyNeeded) {
        solrCore.getUpdateHandler().newIndexWriter(isFullCopyNeeded);
      }
      openNewSearcherAndUpdateCommitPoint(isFullCopyNeeded);
    }

Inside reloadCore, Solr creates a new core, registers it, and tries to close the current (old) core. Even when the old core is closed normally, this path throws an exception:

    SnapPull failed :org.apache.solr.common.SolrException: Index fetch failed
    Caused by: java.lang.RuntimeException: Interrupted while waiting for core reload to finish
    Caused by: java.lang.InterruptedException

Despite this exception, the process seems OK, because it only terminates the SnapPuller thread, and the other threads that process the close complete normally.

*Now, the problem is when the close() method called during reloadCore doesn't actually close the core.* This is the beginning of the close() method:

    public void close() {
      int count = refCount.decrementAndGet();
      if (count > 0) return; // close is called often, and only actually closes if nothing is using it.
      if (count < 0) {
        log.error("Too many close [count:{}] on {}. Please report this exception to solr-user@lucene.apache.org", count, this);
        assert false : "Too many closes on SolrCore";
        return;
      }
      log.info(logid + " CLOSING SolrCore " + this);

While an HTTP request is executing, the refCount is greater than 1. So when the old core is being closed during the core reload, the if (count > 0) condition simply returns from this method without closing anything.
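To make the scenario concrete, here is a minimal sketch (my own toy class, not Solr code) of the reference-counted close pattern that SolrCore uses: close() only performs the real cleanup once every holder has released the core, so a close() issued by the reload while a request is still in flight is effectively a no-op.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of SolrCore's refCount behavior (names are illustrative).
class RefCountedCore {
    // 1 = the container/registry's own reference to the core
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean actuallyClosed = false;

    // e.g. a request handler acquires the core before using it
    void open() { refCount.incrementAndGet(); }

    void close() {
        int count = refCount.decrementAndGet();
        if (count > 0) return;        // someone is still using the core: do nothing
        actuallyClosed = true;        // real resource cleanup would happen here
    }

    boolean isClosed() { return actuallyClosed; }
}
```

In this model, the reload's close() only drops the count from 2 to 1 while a request holds the core; the real close happens when the request releases its reference.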
Then the fetchLastIndex method in SnapPuller continues to the next code and executes openNewSearcherAndUpdateCommitPoint. If you look at this method, it tries to open a new searcher on the solrCore that was captured in the SnapPuller constructor, and I believe this reference points to the old core. At certain timings, this method also throws:

    SnapPuller - java.lang.InterruptedException
      at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
      at java.util.concurrent.FutureTask.get(FutureTask.java:191)
      at ....SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)

After this exception, things start to go bad.

*In summary, I have two questions.*

1. Can you confirm this memory/thread issue?
2. When the core reload completes successfully (whether or not it throws the exception above), does Solr still need to call the openNewSearcherAndUpdateCommitPoint method?

Thanks.
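P.S. To illustrate the stale-reference concern above with a minimal sketch (again my own toy classes, not Solr code): a reference captured once at construction keeps pointing at the old object after a reload swaps in a replacement, which is exactly how SnapPuller can end up opening a searcher on the old core.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy stand-in for the core container: reload() replaces the registered core.
class CoreRegistrySketch {
    final AtomicReference<String> liveCore = new AtomicReference<>("core-v1");

    void reload() {
        liveCore.set("core-v2");   // a new core instance replaces the old one
    }
}

// Toy stand-in for SnapPuller: it captures the core once, in its constructor.
class PullerSketch {
    private final String core;     // never refreshed after a reload

    PullerSketch(String core) { this.core = core; }

    // Any later work (opening a searcher, updating the commit point)
    // still targets the object captured at construction time.
    String target() { return core; }
}
```

After reload(), the registry holds "core-v2" while the puller still targets "core-v1", mirroring the mismatch described above.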