Hi, in the last few days we had some trouble with one of our clusters (5 machines, each running Solr 4.7.2 inside a Jetty container, no replication, Java 1.7.21). Twice we had trouble restarting one server (the same machine both times) because of a FileNotFoundException.

1. First time: stopping Solr while indexing resulted in the following log output:

2014-09-04 10:09:45,633 INFO o.a.s.s.SolrIndexSearcher [recoveryExecutor-6-thread-1] Opening Searcher@2b94db[shard2_replica1] realtime
2014-09-04 10:09:45,634 INFO o.a.s.u.DirectUpdateHandler2 [recoveryExecutor-6-thread-1] Reordered DBQs detected. Update=add{...} DBQs=[...]
2014-09-04 10:09:45,646 ERROR o.a.s.c.SolrException [recoveryExecutor-6-thread-1] Error opening realtime searcher for deleteByQuery:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1521)
    at org.apache.solr.update.UpdateLog.add(UpdateLog.java:422)
    at org.apache.solr.update.DirectUpdateHandler2.addAndDelete(DirectUpdateHandler2.java:449)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:216)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1326)
    at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1215)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: _7omin_Lucene41_0.tip
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
    at java.util.TimSort.binarySort(TimSort.java:265)
    at java.util.TimSort.sort(TimSort.java:208)
    at java.util.TimSort.sort(TimSort.java:173)
    at java.util.Arrays.sort(Arrays.java:659)
    at java.util.Collections.sort(Collections.java:217)
    at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
    at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:1970)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1940)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:404)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:289)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:274)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:250)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1445)
    ... 21 more

2. Second time: we rolled out some updates to the init.d scripts and had to restart each server in the cluster. No indexing was running at this time.
The same server crashed, this time with the following output.

While shutting down:

2014-09-05 15:13:39,204 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
2014-09-05 15:13:39,585 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
2014-09-05 15:13:39,586 INFO o.a.s.c.c.ZkStateReader$2 [main-EventThread] A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
2014-09-05 15:13:39,942 INFO o.a.s.c.c.ZkStateReader$3 [main-EventThread] Updating live nodes... (4)
2014-09-05 15:13:39,942 INFO o.a.s.c.c.ZkStateReader$3 [main-EventThread] Updating live nodes... (4)
2014-09-05 15:13:40,004 ERROR o.a.s.u.SolrIndexWriter [Scanner-1] Error closing IndexWriter, trying rollback
java.io.FileNotFoundException: /var/lib/solr/cores/shard2_replica1/data/index/_8375e.si (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:505)
    at org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:307)
    at org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:219)
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4489)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2953)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3049)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1041)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:932)

When coming back up:

2014-09-05 15:13:44,912 INFO o.a.s.c.CachingDirectoryFactory [coreLoadExecutor-4-thread-1] Closing directory: /var/lib/solr/cores/shard2_replica1/data
2014-09-05 15:13:44,912 ERROR o.a.s.c.CoreContainer [coreLoadExecutor-4-thread-1] Unable to create core: shard2_replica1
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:844)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:630)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:245)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:595)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:258)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:250)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1521)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1633)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:827)
    ... 13 more
Caused by: java.io.FileNotFoundException: /var/lib/solr/cores/shard2_replica1/data/index/segments_42u (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
    at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:233)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:324)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:404)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:694)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:400)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:746)

The first instance I can kind of understand: shutting down a server while it is indexing is bound to create problems with the index. But the second time the server was only being queried, not indexed. Any ideas what went wrong? File system problems? Shutting down too fast?

Regards,
Oliver