Hi,

We're doing some tests with the latest trunk revision on a cluster of five 
high-end machines. There is one collection with five shards, and each shard 
has one replica on another node.

We're filling the index from a MapReduce job with 18 processes running 
concurrently. This is plenty when indexing to a single high-end node, but with 
SolrCloud things go down pretty soon.

First we get a Too Many Open Files error on all nodes at almost the same time. 
After shutting down the indexer, the nodes no longer respond except with an 
Internal Server Error.

First, the Too Many Open Files stack trace:

2012-02-29 15:22:51,067 ERROR [solr.core.SolrCore] - [http-80-6] - : 
java.io.FileNotFoundException: /opt/solr/openindex_b/data/index/_h5_0.tim (Too 
many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
        at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
        at 
org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:149)
        at 
org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
        at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
        at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
        at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
        at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
        at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
        at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
        at 
org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:320)
        at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:389)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1533)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1505)
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:56)
        at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:53)
        at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:354)
        at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:451)
        at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:258)
        at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:118)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
        at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1539)
        at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
        at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)



A similar exception sometimes begins with:

2012-02-29 15:25:36,137 ERROR [solr.update.CommitTracker] - [pool-5-thread-1] 
- : auto commit error...:java.io.FileNotFoundException: 
/opt/solr/openindex_a/data/index/_j3_0.tim (Too many open files)




Here's the Internal Server Error stack trace:

2012-02-29 15:26:51,402 ERROR [solr.core.SolrCore] - [http-80-4] - : 
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://cn005.openindex.io/solr/openindex_b/select
        at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
        at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
        at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)
        at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


This one also comes in various flavours, such as:


2012-02-29 15:21:55,292 ERROR [solr.core.SolrCore] - [http-80-15] - : shard 
update error 
http://cn005.openindex.io:80/solr/openindex_b/:org.apache.solr.common.SolrException: 
Internal Server Error



The Linux machines have proper settings for ulimit and friends (32k open files 
allowed), so I suspect there's another limit that I am unaware of. I also 
listed the number of open files while the errors were coming in, but it did not 
exceed 11k at any given time.
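
One way to cross-check these numbers is to read the limit that applies to the 
running servlet container process itself, since on Linux limits are per 
process and inherited at start-up, so they can differ from what ulimit reports 
in a login shell. The following is only a sketch under assumptions: it relies 
on the Linux /proc filesystem, the function names are made up for 
illustration, and the PID of the Tomcat/Solr process has to be supplied by 
whoever runs it.

```python
import os


def _to_int(value):
    # /proc reports "unlimited" when no numeric limit is set.
    return int(value) if value.isdigit() else None


def parse_max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits.

    A value of None stands for "unlimited". Returns None if the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            fields = line.split()
            return _to_int(fields[3]), _to_int(fields[4])
    return None


def fd_usage(pid):
    """Return (open_fd_count, (soft, hard)) for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        limits = parse_max_open_files(handle.read())
    # Every entry in /proc/<pid>/fd is one open descriptor (files, sockets, pipes).
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, limits
```

Calling fd_usage(pid) with the container's PID (hypothetical here) returns the 
current descriptor count next to the soft and hard limits in force for that 
process. Reading another user's /proc/<pid>/fd normally requires running as 
that user or as root.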

Any hints or advice? Did I miss something?

Thanks,
Markus
