Hi,
I've been running a 3-node SolrCloud setup on Solr 4.4 for some time. The
cloud hosts about 40 small collections that receive updates once a day. The
collections use different shard and replication configurations (varying from
2 shards without replication to 2 shards with 3 replicas).
After running Tomcat for a couple of weeks, I notice that the number of open
files has increased dramatically. Most of those files are deleted tlog files
that Solr keeps open:
eric@node1:/ # lsof -np 16810 | grep deleted | wc -l
36345
Those files are no longer on disk, but Solr still holds an open handle to each
of them. My disk use is going through the roof: 6GB is currently 'in use' by
deleted but still open files. When I restart Tomcat, the space is freed and it
starts all over again. All of my nodes experience this behavior.
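To see which collections hold the deleted files and how much space they pin, the lsof output can be grouped per directory. The following is only a rough sketch: the helper name summarize_deleted is made up for illustration, and it assumes lsof's default column layout (SIZE/OFF in column 7, NAME last, followed by "(deleted)") and paths without spaces.

```shell
# Hypothetical helper: summarize deleted-but-open files per directory.
# Reads lsof output on stdin; prints "count<TAB>bytes<TAB>directory",
# largest count first.
# Assumes the default lsof columns (SIZE/OFF is field 7) and no spaces in paths.
summarize_deleted() {
  awk '$NF == "(deleted)" {
         path = $(NF - 1)
         sub("/[^/]*$", "", path)   # drop the file name, keep the directory
         count[path]++
         bytes[path] += $7          # adds 0 if lsof reported an offset (0t...)
       }
       END {
         for (d in count) printf "%d\t%d\t%s\n", count[d], bytes[d], d
       }' | sort -rn
}

# Usage, with the same Tomcat PID as in the lsof call above:
# lsof -np 16810 | summarize_deleted
```

If the counts pile up under the tlog directories of every collection, that would match what I am seeing; if only a few directories stand out, the problem may be tied to specific collections.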
At first I thought it had something to do with a lack of commits. But it
happens on all my collections, even the ones with a fast autoCommit:
<autoCommit>
  <maxDocs>5000</maxDocs>
  <maxTime>120000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
My update process always finishes with a commit or rollback, and updates are
showing up correctly.
I read something about Solr leaving TCP connections in CLOSE_WAIT. The only
CLOSE_WAIT connections I see are between the nodes, and there are only about
10 of them. Those connections can't be causing 36k open files, right?
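One way to check whether sockets account for any meaningful share of the handles is to count the process's open descriptors by type. This is a sketch under assumptions: count_fd_types is a made-up helper name, and it relies on lsof's default layout where the first line is a header and TYPE is column 5 (REG for regular files, IPv4/IPv6 for sockets).

```shell
# Hypothetical helper: count open descriptors per lsof TYPE column.
# Reads lsof output on stdin; prints "count<TAB>type", largest count first.
# Assumes the default lsof columns, with a header line and TYPE in field 5.
count_fd_types() {
  awk 'NR > 1 { n[$5]++ }
       END { for (t in n) printf "%d\t%s\n", n[t], t }' | sort -rn
}

# Usage, with the same Tomcat PID as above:
# lsof -np 16810 | count_fd_types
```

If REG dominates and the IPv4/IPv6 rows stay small, the CLOSE_WAIT connections are unlikely to be the cause.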
Any suggestions/tips? At the moment, I have to restart my leader every couple
of weeks, and that's not really something I would like to keep doing :)
Best regards,
Eric Bus