What is netstat telling you about the connections on the servers?
Are any connections hanging in "CLOSE_WAIT" (passive close)? I saw this on my servers last week. I used a small program to spoof a local connection on those servers' ports and was able to trick the TCP stack into closing those connections. That also immediately released all the open fds marked DEL and cleaned everything up without a restart.

Regards
Bernd

On 01.03.2012 11:36, Markus Jelsma wrote:
Hi,

Yesterday we had an issue with too many open files, which turned out to be caused by a misspelled username and has been fixed. But there is still a problem with open files. We cannot successfully index a few million documents from MapReduce to a 5-node Solr cloud cluster. One of the problems is that after a while ClassNotFoundErrors and other similar weirdness begin to appear. This does not resolve itself when indexing is stopped.

With lsof I found that Solr still keeps roughly 9k files open 8 hours after indexing failed. Of those 9k, roughly 7.5k are deleted files that still have a file descriptor open for the tomcat6 user. These are all segment files:

/opt/solr/openindex_a/data/index.20120228101550/_34s.tvd
java 10049 tomcat6 DEL REG 9,0 515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
java 10049 tomcat6 DEL REG 9,0 515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
java 10049 tomcat6 DEL REG 9,0 515735 /opt/solr/openindex_a/data/index.20120228101550/_34s_nrm.cfs
java 10049 tomcat6 DEL REG 9,0 515595 /opt/solr/openindex_a/data/index.20120228101550/_34v_nrm.cfs
java 10049 tomcat6 DEL REG 9,0 515592 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.tim
java 10049 tomcat6 DEL REG 9,0 515591 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.prx
java 10049 tomcat6 DEL REG 9,0 515590 /opt/solr/openindex_a/data/index.20120228101550/_34v_0.frq
.... and many more

Did I misconfigure anything? This is a pretty standard configuration (no changes to the IndexDefaults section) on a recent Solr trunk revision. Is there a bug somewhere?

Thanks,
Markus
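
The two checks discussed in this thread (CLOSE_WAIT sockets in netstat, deleted files still held open in lsof) can be tallied with a small script. The following is a minimal Python sketch, not something either poster used. The function names and the netstat sample are hypothetical, and it assumes the usual Linux `netstat -tan` and default `lsof` column layouts with paths that contain no spaces. The lsof sample rows are copied from the message above.

```python
from collections import Counter
import posixpath

# Hypothetical netstat sample; addresses and ports are made up for illustration.
NETSTAT_SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:8080           10.0.0.9:51234          CLOSE_WAIT
tcp        1      0 10.0.0.5:8080           10.0.0.9:51240          CLOSE_WAIT
tcp        0      0 10.0.0.5:8080           10.0.0.7:40000          ESTABLISHED
"""

# Two rows taken from the lsof listing quoted in the thread.
LSOF_SAMPLE = """\
java 10049 tomcat6 DEL REG 9,0 515607 /opt/solr/openindex_a/data/index.20120228101550/_34s.tvx
java 10049 tomcat6 DEL REG 9,0 515504 /opt/solr/openindex_a/data/index.20120228101550/_34s.fdx
"""


def count_close_wait(netstat_text):
    """Count CLOSE_WAIT sockets per local port in `netstat -tan` style text."""
    per_port = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed row layout: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "CLOSE_WAIT":
            continue
        # Take the part after the last colon so IPv6 addresses also work.
        local_port = fields[3].rsplit(":", 1)[-1]
        per_port[local_port] += 1
    return per_port


def count_deleted_files(lsof_text):
    """Count lsof rows whose FD column is DEL, grouped by directory."""
    per_dir = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Assumed row layout: command pid user fd type device node name
        if len(fields) < 5 or fields[3] != "DEL":
            continue
        per_dir[posixpath.dirname(fields[-1])] += 1
    return per_dir


if __name__ == "__main__":
    print(dict(count_close_wait(NETSTAT_SAMPLE)))
    print(dict(count_deleted_files(LSOF_SAMPLE)))
```

In practice the text would come from saved output, for example `netstat -tan > netstat.txt` and `lsof -p 10049 > lsof.txt`. A CLOSE_WAIT count that keeps growing alongside the DEL count would support Bernd's suggestion that hung connections are holding the old index files open.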