I was looking at our server, and at one point there were over 13k open file descriptors for the same
spell index file: /home/dsteiger/local/solr/cores/qaa/data/spell/_1ji.cfs. When I checked again later,
the count had dropped back down to 3000 with no intervention from us.
On my local machine after every query I end up with two extra open files for
the spell index.
At Solr start:
$ ls -l /proc/18832/fd|grep spell
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 28 -> /home/dsteiger/Desktop/solr/example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-spellchecker-2.3.0.jar
After the first query:
$ ls -l /proc/18832/fd|grep spell
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 28 -> /home/dsteiger/Desktop/solr/example/work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/lucene-spellchecker-2.3.0.jar
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 49 -> /home/dsteiger/Desktop/solr/example/solr/core0/data/spell/_25y.cfs
lr-x------ 1 dsteiger dsteiger 64 2008-04-03 11:37 50 -> /home/dsteiger/Desktop/solr/example/solr/core0/data/spell/_25y.cfs
After 10 queries:
$ ls -l /proc/18832/fd|grep _25y|wc -l
20
Up to this point I ran each query one at a time. After 15 more queries run concurrently by a Perl script
(15 Konsole windows, all running my random-query script at the same time):
$ ls -l /proc/18832/fd|grep _25y|wc -l
38
Another 15 concurrent queries leave the count at 44.
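The manual ls/grep/wc checks above can be scripted so that each run also breaks the count down per target file, which separates the spellchecker jar from the _25y.cfs segment. This is a minimal sketch, Linux only, assuming it runs as the same user as Solr (or as root) so that /proc/PID/fd is readable; the script name count_fds.py and its arguments are made up for illustration.

```python
#!/usr/bin/env python3
"""Per-file breakdown of `ls -l /proc/PID/fd | grep PATTERN | wc -l` (Linux only)."""
import os
import sys


def count_fds(pid, pattern):
    """Return {target_path: count} for descriptors of `pid` whose target contains `pattern`.

    `pid` may be a number or "self". Reading another process's fd directory
    requires running as the same user or as root.
    """
    fd_dir = f"/proc/{pid}/fd"
    counts = {}
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed between listdir() and readlink()
        if pattern in target:
            counts[target] = counts.get(target, 0) + 1
    return counts


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} PID PATTERN", file=sys.stderr)
        return
    counts = count_fds(argv[1], argv[2])
    for target, n in sorted(counts.items()):
        print(f"{n:6d}  {target}")
    print(f"{sum(counts.values()):6d}  total")


if __name__ == "__main__":
    main(sys.argv)
```

For example, running `python3 count_fds.py 18832 _25y` after each batch of queries would show whether the count keeps climbing or levels off, as it seemed to between 38 and 44.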
I'm guessing this has to do with the spellchecker living in a component and how I ripped the code out
of the SpellCheckRequestHandler. If I hit the SpellCheckRequestHandler normally
(http://localhost:8983/solr/core0/select?qt=spellchecker&q=pouted), two files are opened after the
first query, and no additional files are opened after that.
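The difference described above (new descriptors on every query through the component, versus a one-time open through the handler) matches the general pattern of an open per request with no matching close. The sketch below only illustrates that pattern with plain files; LeakyComponent and SharedComponent are hypothetical stand-ins, not the actual component or SpellCheckRequestHandler code, and it is an assumption that the ripped-out code behaves like the first one.

```python
import os
import tempfile


def fds_on(path):
    """Count this process's descriptors whose target is `path` (Linux /proc only)."""
    target = os.path.realpath(path)
    count = 0
    for fd in os.listdir("/proc/self/fd"):
        try:
            if os.readlink(f"/proc/self/fd/{fd}") == target:
                count += 1
        except OSError:
            pass  # descriptor closed while we were scanning
    return count


class LeakyComponent:
    """Hypothetical: opens the spell index on every request and never closes it."""

    def __init__(self, path):
        self.path = path
        self._opened = []  # stands in for whatever keeps the old handles reachable

    def handle_request(self):
        self._opened.append(open(self.path, "rb"))

    def close(self):
        # Only here so the demo can clean up; the leaky request path never calls it.
        for handle in self._opened:
            handle.close()
        self._opened.clear()


class SharedComponent:
    """Hypothetical: opens the spell index once and reuses it for every request."""

    def __init__(self, path):
        self._handle = open(path, "rb")

    def handle_request(self):
        self._handle.seek(0)  # each request reuses the same descriptor

    def close(self):
        self._handle.close()


def fds_after(component_cls, path, requests):
    """Run `requests` requests through a fresh component; report descriptors on `path`."""
    component = component_cls(path)
    try:
        for _ in range(requests):
            component.handle_request()
        return fds_on(path)
    finally:
        component.close()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".cfs") as spell_file:
        base = fds_on(spell_file.name)  # the temp file's own descriptor
        print("leaky :", fds_after(LeakyComponent, spell_file.name, 10) - base)
        print("shared:", fds_after(SharedComponent, spell_file.name, 10) - base)
```

If the component does follow the first pattern, the usual remedy would be to open the spell index once, reuse it across requests, and close the old handle only when the index is rebuilt.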
If anyone wants to take a look at the spellcheck component I have, let me know and I'll pass it
along. I may just have to stop using it and go back to a separate request for our spellchecking.
Thanks.
Doug
Doug Steigerwald wrote:
The user that runs our apps is configured to allow 65536 open files in
limits.conf. We shouldn't even come close to that number. Solr is the
only app we run on these machines as our app user.
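Whether the 65536 figure actually applies to the running Solr process can be checked directly. limits.conf is applied by pam_limits when a PAM session is opened, so a JVM started from an init script (or any other path that skips PAM) may end up with a lower limit. A minimal sketch that reads the effective limit of a running process from /proc/PID/limits (present on newer Linux kernels), with a getrlimit cross-check for the current process; max_open_files and this_process_limit are hypothetical helper names.

```python
#!/usr/bin/env python3
"""Show the open-file limit a running process actually has (Linux)."""
import resource
import sys


def _parse(value):
    return None if value == "unlimited" else int(value)


def max_open_files(pid="self"):
    """Return (soft, hard) from the 'Max open files' row of /proc/PID/limits; None = unlimited."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft, hard = line.split()[3:5]  # the two fields after the three-word label
                return _parse(soft), _parse(hard)
    raise ValueError(f"no 'Max open files' row in /proc/{pid}/limits")


def this_process_limit():
    """Same numbers for the current process via getrlimit(2), for cross-checking."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    inf = resource.RLIM_INFINITY
    return (None if soft == inf else soft, None if hard == inf else hard)


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    soft, hard = max_open_files(pid)
    print(f"pid {pid}: soft={soft} hard={hard}")
```

Passing the Solr/Jetty PID (for example 18832 from the listings above) shows the limit that JVM is really running with.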
We hit the same type of issue when we had our mergeFactor set to 40 for
all of our indexes. We lowered it to 5 and have been fine since.
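A rough way to see why mergeFactor affects the number of open files: under an idealized log-merge policy (every flush produces an equal-sized segment, and as soon as mergeFactor segments of one size exist they merge into one segment of the next size), the segment count behaves like a counter in base mergeFactor. This is a simplified model, not Lucene's actual merge code, and the 1000-flush figure below is made up. An open reader holds roughly one file per segment with compound files (several without), so fewer segments means fewer descriptors.

```python
def simulate_segments(flushes, merge_factor):
    """Idealized log-merge policy: each flush adds one level-0 segment, and
    whenever merge_factor segments share a level they merge into one segment
    on the next level. Returns the number of segments left."""
    levels = [0]  # levels[k] = segments holding merge_factor**k flushes' worth of docs
    for _ in range(flushes):
        levels[0] += 1
        k = 0
        while levels[k] == merge_factor:  # cascade merges upward
            levels[k] = 0
            if k + 1 == len(levels):
                levels.append(0)
            levels[k + 1] += 1
            k += 1
    return sum(levels)


def digit_sum(n, base):
    """The same count in closed form: the sum of n's digits in the given base."""
    total = 0
    while n:
        n, digit = divmod(n, base)
        total += digit
    return total


def worst_case(n, base):
    """Upper bound for any flush count with as many base-`base` digits as n."""
    digits = 0
    while n:
        n //= base
        digits += 1
    return (base - 1) * digits


if __name__ == "__main__":
    for mf in (40, 5):
        print(f"mergeFactor={mf}: {simulate_segments(1000, mf)} segments, "
              f"worst case {worst_case(1000, mf)}")
```

With 1000 flushes the model leaves 25 segments at mergeFactor 40 but 4 at mergeFactor 5, and the worst case for a flush count with that many digits is 78 versus 20.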
No errors in the snappuller for either core. The spellcheck index is
rebuilt once a night around midnight and copied to the slave
afterwards. I had even rebuilt the spell index manually for the two
cores, pulled them, installed them, and tested to make sure it was
working with a few queries before the load testing started (this was
before we released the patch to lower the spell index mergeFactor).
We were even getting errors trying to run our postCommit script on the
slave (it doesn't end up doing anything since it's the slave).
SEVERE: java.io.IOException: Cannot run program "./solr/bin/snapctl":
java.io.IOException: error=24, Too many open files
at java.lang.ProcessBuilder.start(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
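On Linux, error=24 is EMFILE, the per-process open-file limit. Launching a child process normally needs new descriptors for its stdin/stdout/stderr pipes, which is presumably why running snapctl failed once leaked spell-index descriptors had used up the limit. Below is a small, self-contained reproduction of the errno only (not of the Solr setup): it temporarily lowers the soft limit to an arbitrary 64, opens /dev/null until the kernel refuses, then tries to create a pipe. exhaust_descriptors is a hypothetical helper name.

```python
import errno
import os
import resource


def exhaust_descriptors(limit=64):
    """Lower the soft RLIMIT_NOFILE to `limit`, open descriptors until the kernel
    refuses, then try to create a pipe the way a process launch would.
    Returns (open_errno, pipe_errno, extra_fds_opened). Everything is closed
    and the original limit restored before returning."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    held = []
    try:
        open_errno = None
        try:
            while True:
                held.append(os.open(os.devnull, os.O_RDONLY))
        except OSError as exc:
            open_errno = exc.errno  # EMFILE once the per-process limit is reached

        pipe_errno = None
        try:
            read_end, write_end = os.pipe()  # a child's stdio pipes need fresh descriptors
            os.close(read_end)
            os.close(write_end)
        except OSError as exc:
            pipe_errno = exc.errno
        return open_errno, pipe_errno, len(held)
    finally:
        for fd in held:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    open_err, pipe_err, opened = exhaust_descriptors()
    print(f"opened {opened} extra fds; open: {errno.errorcode.get(open_err, open_err)}, "
          f"pipe: {errno.errorcode.get(pipe_err, pipe_err)}")
```

The limit is restored before the function returns, so it can be run inside an otherwise normal process.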
And a correction to my previous email: the errors started 10
-seconds- after load testing started. This was about 40 minutes after
Solr started, and fewer than 30 queries had been run on the server before
load testing started.
Load testing has been fine since I restarted Solr and rebuilt the
spellcheck indexes with the lowered mergeFactor.
Doug
Otis Gospodnetic wrote:
Hi Doug,
Sounds fishy, especially having to raise or lower mergeFactor to "funny
values" (try changing your OS setting instead).
My guess is that this is happening only with the 2 indices that are being
modified, and I'd guess that the FNFE (FileNotFoundException) is due to a bad/incomplete rsync
from the master. Do the snappuller logs mention any errors?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch