Hi all, I've had a first look at porting the patch I wrote for SOLR-7191 (SolrCloud with thousands of collections) from Solr 4.10 to Solr trunk (1708905). I then created 6,000 collections (3 nodes; 2 x replicas) and restarted all 3 nodes. The cloud does come up, but slowly: all the threads running org.apache.solr.core.CoreContainer.create() are blocked waiting for the ZkStateReader lock. I was hoping that the change from a single global clusterstate.json to per-collection state would reduce this contention. Comments appreciated.
Example jstacks:

"coreLoadExecutor-6-thread-24-processing-n:ftet1:8003_solr" #70 prio=5 os_prio=64 tid=0x0000000000bcd800 nid=0x88 waiting for monitor entry [0x00007fefb29bc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.solr.common.cloud.ZkStateReader.addCollectionWatch(ZkStateReader.java:1048)
        - waiting to lock <0x00007ff0403ff020> (a org.apache.solr.common.cloud.ZkStateReader)
        at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1561)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:726)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:451)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:442)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"zkCallback-4-thread-80-processing-n:ftet1:8003_solr" #268 prio=5 os_prio=64 tid=0x0000000002ee0000 nid=0x134 in Object.wait() [0x00007fefaed2d000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
        - locked <0x00007ff0be17e600> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:350)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:350)
        at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1030)
        at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1015)
        at org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(ZkStateReader.java:550)
        at org.apache.solr.common.cloud.ClusterState.getCollections(ClusterState.java:207)
        at org.apache.solr.common.cloud.ZkStateReader.constructState(ZkStateReader.java:462)
        at org.apache.solr.common.cloud.ZkStateReader.access$600(ZkStateReader.java:57)
        at org.apache.solr.common.cloud.ZkStateReader$StateWatcher.process(ZkStateReader.java:864)
        - locked <0x00007ff0403ff020> (a org.apache.solr.common.cloud.ZkStateReader)
        at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:269)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
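
Reading the two traces together: the zkCallback thread holds the ZkStateReader monitor <0x00007ff0403ff020> inside StateWatcher.process while it waits on a ZooKeeper getData call, and the coreLoadExecutor thread is waiting for that same monitor in addCollectionWatch. A minimal, self-contained Java sketch of that pattern follows. It is not Solr code: the class ContentionSketch, its methods, and the latch standing in for the ZooKeeper round trip are all hypothetical, and the "fetch outside the lock" variant is only there for contrast, not as a proposed fix.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a state reader guarded by a single monitor.
class ContentionSketch {
    private final Object readerMonitor = new Object();
    private int watched = 0;

    // Plays the role of the core-loading call that needs the monitor briefly.
    void addCollectionWatch() {
        synchronized (readerMonitor) {
            watched++;
        }
    }

    // Plays the role of the watcher callback. The latch fetchDone simulates
    // a slow remote read; it is awaited either inside or outside the monitor.
    void refresh(boolean fetchUnderLock, CountDownLatch fetchStarted,
                 CountDownLatch fetchDone) throws InterruptedException {
        if (fetchUnderLock) {
            synchronized (readerMonitor) {
                fetchStarted.countDown();
                fetchDone.await(); // slow read while holding the monitor
            }
        } else {
            fetchStarted.countDown();
            fetchDone.await();     // slow read without the monitor
            synchronized (readerMonitor) {
                // only publish the fetched result under the lock
            }
        }
    }

    // Returns the state the loader thread ends up in while the fetch is pending.
    static Thread.State observeWaiter(boolean fetchUnderLock) throws InterruptedException {
        ContentionSketch reader = new ContentionSketch();
        CountDownLatch fetchStarted = new CountDownLatch(1);
        CountDownLatch fetchDone = new CountDownLatch(1);

        Thread callback = new Thread(() -> {
            try {
                reader.refresh(fetchUnderLock, fetchStarted, fetchDone);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "zkCallback-sketch");
        Thread loader = new Thread(reader::addCollectionWatch, "coreLoad-sketch");

        callback.start();
        fetchStarted.await();      // the simulated fetch is now in progress
        loader.start();

        // Poll until the loader is either stuck on the monitor or finished.
        Thread.State seen;
        do {
            Thread.sleep(1);
            seen = loader.getState();
        } while (seen != Thread.State.BLOCKED && seen != Thread.State.TERMINATED);

        fetchDone.countDown();     // let the simulated fetch complete
        callback.join();
        loader.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fetch under lock:   " + observeWaiter(true));
        System.out.println("fetch outside lock: " + observeWaiter(false));
    }
}
```

With the fetch under the lock, the loader thread should be reported as BLOCKED, the same state as the coreLoadExecutor threads above; with the fetch outside the lock it runs to completion (TERMINATED) while the fetch is still pending. In the real trace the callback appears to fetch state per collection via LazyCollectionRef.get while holding the lock, so with thousands of collections the hold time would grow accordingly; that reading is an inference from the stack frames, not something verified against the source.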