Hi,

We're doing some tests with the latest trunk revision on a cluster of five high-end machines. There is one collection with five shards and one replica per shard, each replica on a different node.
We're filling the index from a MapReduce job with 18 processes running concurrently. This is plenty when indexing to a single high-end node, but with SolrCloud things go down pretty soon. First we get a Too Many Open Files error on all nodes at almost the same time. After shutting down the indexer, the nodes won't respond anymore except with an Internal Server Error.

First the Too Many Open Files stack trace:

2012-02-29 15:22:51,067 ERROR [solr.core.SolrCore] - [http-80-6] - : java.io.FileNotFoundException: /opt/solr/openindex_b/data/index/_h5_0.tim (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
        at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:149)
        at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
        at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
        at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:320)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:389)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1533)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1505)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:56)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:53)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:354)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:451)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:258)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:118)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)

A similar exception sometimes begins with:

2012-02-29 15:25:36,137 ERROR [solr.update.CommitTracker] - [pool-5-thread-1] - : auto commit error...:java.io.FileNotFoundException: /opt/solr/openindex_a/data/index/_j3_0.tim (Too many open files)

Here's the Internal Server Error stack trace:

2012-02-29 15:26:51,402 ERROR [solr.core.SolrCore] - [http-80-4] - : org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://cn005.openindex.io/solr/openindex_b/select
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

This one also comes in various flavours, such as:

2012-02-29 15:21:55,292 ERROR [solr.core.SolrCore] - [http-80-15] - : shard update error http://cn005.openindex.io:80/solr/openindex_b/:org.apache.solr.common.SolrException: Internal Server Error

The Linux machines have proper settings for ulimit and friends (32k open files allowed), so I suspect there's another limit I am unaware of. I also listed the number of open files while the errors were coming in, but it never exceeded 11k at any given time.

Any hints or advice? Did I miss something?

Thanks,
Markus
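P.S. For reference, this is roughly how I've been checking the limits and counting open files. A sketch only: it assumes Linux with /proc mounted, and uses /proc/self as a stand-in for the real Tomcat/Solr pid.

```shell
#!/bin/sh
# Per-process soft and hard limits for open files in the current shell.
# Note: a daemon started by init may run with different limits than an
# interactive shell; check /proc/<pid>/limits for the actual process.
ulimit -Sn
ulimit -Hn

# Kernel-wide ceiling on open files across all processes.
cat /proc/sys/fs/file-max

# Count file descriptors actually held by a process. /proc/self is a
# stand-in here -- substitute the Tomcat/Solr pid on a real node.
ls /proc/self/fd | wc -l
```

The /proc/self/fd count is what I compared against the 32k limit; it stayed under 11k while the errors were occurring.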