Some more info:

- Replication almost never completes following the "this IndexWriter is closed" stacktraces.
- When replication begins after the "this IndexWriter is closed" error, the replica fills the disk to 100% with index files under data/ over a few hours. There are so many files in the data directory that it can't be listed, and it takes a very long time to delete. The frequent replications seem to be filling the disk with new files whose total size is roughly 3 times that of the real index. Is it leaking file handles, or forgetting that it has already downloaded something?
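In case it helps anyone measure the same symptom, here is a minimal sketch for sizing up a data directory that is too large to list. It assumes Python is available on the replica host and that the extra files sit in subdirectories of data/ (for example one directory per replication attempt, which I have not confirmed). The function name `summarize_data_dir` is made up for this example; it is not part of Solr.

```python
import os
from collections import defaultdict


def summarize_data_dir(data_dir):
    """Return {first-level name: (file_count, total_bytes)} for data_dir.

    os.scandir yields entries lazily, so this avoids building and sorting
    one huge listing the way a plain `ls` would. Files that sit directly
    in data_dir are grouped under ".".
    """
    totals = defaultdict(lambda: [0, 0])
    # Each stack item is (path to scan, first-level group it belongs to).
    stack = [(data_dir, None)]
    while stack:
        path, top = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Keep the first-level directory name as the group key.
                    stack.append((entry.path, top or entry.name))
                elif entry.is_file(follow_symlinks=False):
                    key = top or "."
                    totals[key][0] += 1
                    totals[key][1] += entry.stat(follow_symlinks=False).st_size
    return {name: tuple(counts) for name, counts in totals.items()}
```

Calling `summarize_data_dir` on the core's data directory and comparing the per-directory totals against the size of the index on the leader should show whether the space is going to one live index or to many leftover copies.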
Is this a better question for the Lucene list? It seems (see below) that this stacktrace is occurring in the Lucene layer rather than in Solr, but maybe someone could confirm?

"ERROR [2014-01-27 18:28:49.368] [org.apache.solr.common.SolrException] org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
        at org.apache.lucene.index.DocumentsWriter.ensureOpen(DocumentsWriter.java:199)
        at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:338)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:419)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1508)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:519)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:655)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
        at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
        ...
<chopped>"

Thanks!

Tim

On 5 February 2014 13:04, Tim Vaillancourt <t...@elementspace.com> wrote:

> Hey guys,
>
> I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2
> shards over 4 Solr instances, (which results in 1 core per Solr instance).
>
> After some time in Production without issues, we are seeing errors related
> to the IndexWriter all over our logs and an infinite loop of failing
> replication from Leader on our 2 replicas.
>
> We see a flood of: "org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed" stacktraces, then the Solr replica tries to
> replicate/recover, then fails replication and then the following 2 errors
> show up:
>
> 1) "SolrIndexWriter was not closed prior to finalize(), indicates a bug --
> POSSIBLE RESOURCE LEAK!!!"
> 2) "Error closing IndexWriter, trying rollback" (which results in a
> null-pointer exception).
>
> I'm guessing the best way forward would be to upgrade to latest, but that
> is an undertaking that will take significant time/testing. In the meantime,
> is there anything I can do to mitigate or understand the issue more?
>
> Does anyone know what the IndexWriter errors refer to?
>
> Below is a URL to a .txt file with summarized portions of my solr.log. Any
> help is really appreciated as always!!
>
> http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt
>
> Thanks all,
>
> Tim
>