Was looking on our server and at one point there were over 13k open file descriptors for the same spell index: /home/dsteiger/local/solr/cores/qaa/data/spell/_1ji.cfs. At some point dropped back down to 3000 (when I checked again) with no intervention from us.

On my local machine after every query I end up with two extra open files for the spell index.

Solr start:
$ ls -l /proc/18832/fd|grep spell
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 28 -> /home/dsteiger/Desktop/solr/example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-spellchecker-2.3.0.jar

After first query:
$ ls -l /proc/18832/fd|grep spell
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 28 -> /home/dsteiger/Desktop/solr/example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-spellchecker-2.3.0.jar
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 49 -> /home/dsteiger/Desktop/solr/example/solr/core0/data/spell/_25y.cfs
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 50 -> /home/dsteiger/Desktop/solr/example/solr/core0/data/spell/_25y.cfs

After 10 queries:
$ ls -l /proc/18832/fd|grep _25y|wc -l
20

Up until this point I've done each query one at a time. After 15 more queries with a perl script (15 konsoles open all running my random query script at the same time):
$ ls -l /proc/18832/fd|grep _25y|wc -l
38

Another 15 leaves the count at 44.
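
For repeated checks, the ls/grep/wc pipeline used above can be wrapped in a small shell function. This is only a rough sketch, assuming a Linux /proc layout where each entry in /proc/<pid>/fd is a symlink to the open file; the name count_fds is made up for illustration and is not part of Solr.

```shell
# count_fds is a hypothetical helper, not part of Solr or Lucene.
# Usage: count_fds FD_DIR PATTERN
# Prints how many entries in FD_DIR are symlinks whose target contains PATTERN.
count_fds() {
    fd_dir=$1
    pattern=$2
    count=0
    for fd in "$fd_dir"/*; do
        # Descriptors can be closed while we iterate; skip any we cannot read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *"$pattern"*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# Example (PID and segment name taken from the transcript above):
#   count_fds /proc/18832/fd _25y
```

Running it before and after each batch of queries gives the same numbers as the pipeline, without the risk of grep matching something in the ls columns other than the link target.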

I'm guessing this has to do with the spellchecker being in a component and how I ripped the code out of the SpellCheckRequestHandler. If I hit the SpellCheckRequestHandler normally (http://localhost:8983/solr/core0/select?qt=spellchecker&q=pouted), two files are opened after the first query, and then no additional files opened.

If anyone wants to take a look at the spellcheck component I have, let me know and I'll pass it along. I may just have to stop using it and go back to a separate request for our spellchecking.

Thanks.
Doug

Doug Steigerwald wrote:
The user that runs our apps is configured to allow 65536 open files in limits.conf. Shouldn't even come close to that number. Solr is the only app we have running on these machines as our app user.
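
One thing that may be worth ruling out: limits.conf is applied by PAM when a session starts, so a JVM launched some other way (from an init script, for example) might not be running with the 65536 value. A minimal sketch for checking what a running process actually has, assuming a Linux kernel that exposes /proc/<pid>/limits; the function name max_open_files and the SOLR_PID variable are hypothetical.

```shell
# max_open_files is a hypothetical helper name.
# Usage: max_open_files LIMITS_FILE
# Prints the soft and hard "Max open files" values, separated by a space,
# from a file in the /proc/<pid>/limits format.
max_open_files() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Example (SOLR_PID is a placeholder for the PID of the Solr JVM):
#   max_open_files "/proc/$SOLR_PID/limits"
```

If the first number printed is well below 65536, the process is not getting the limit configured in limits.conf.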

We hit the same type of issue when we had our mergeFactor set to 40 for all of our indexes. We lowered it to 5 and have been fine since.

No errors in the snappuller for either core. The spellcheck index is rebuilt once a night around midnight and copied to the slave afterwards. I had even rebuilt the spell index manually for the two cores, pulled them, installed them, and tested to make sure it was working with a few queries before the load testing started (this was before we released the patch to lower the spell index mergeFactor).

We were even getting errors trying to run our postCommit script on the slave (it doesn't end up doing anything since it's the slave).

SEVERE: java.io.IOException: Cannot run program "./solr/bin/snapctl": java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(Unknown Source)
        at java.lang.Runtime.exec(Unknown Source)

And a correction from my previous email. The errors started 10 -seconds- after load testing started. This was about 40 minutes after Solr started, and less than 30 queries had been run on the server before load testing started.

Load testing has been fine since I restarted Solr and rebuilt the spellcheck indexes with the lowered mergeFactor.

Doug

Otis Gospodnetic wrote:
Hi Doug,

Sounds fishy, especially increasing/decreasing mergeFactor to "funny values" (try changing your OS setting instead).

My guess is this is happening only with the 2 indices that are being modified and I'll guess that the FNFE is due to a bad/incomplete rsync from the master. Do snappuller logs mention any errors?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
