Hi All,

I've had a first look at porting the patch I did for SOLR-7191 (SolrCloud
with thousands of collections) from Solr 4.10 to the Solr trunk (1708905).
I then created 6,000 collections (3 nodes; 2 x replicas) and restarted all
3 nodes. The cloud does start, but slowly: all of the
org.apache.solr.core.CoreContainer.create() threads are blocked in
ZkStateReader. I was hoping that moving clusterstate.json from a single
global file to per-collection state would reduce the contention. Comments
appreciated.

Example jstack output:
"coreLoadExecutor-6-thread-24-processing-n:ftet1:8003_solr" #70 prio=5 os_prio=64 tid=0x0000000000bcd800 nid=0x88 waiting for monitor entry [0x00007fefb29bc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.solr.common.cloud.ZkStateReader.addCollectionWatch(ZkStateReader.java:1048)
        - waiting to lock <0x00007ff0403ff020> (a org.apache.solr.common.cloud.ZkStateReader)
        at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1561)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:726)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:451)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:442)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"zkCallback-4-thread-80-processing-n:ftet1:8003_solr" #268 prio=5 os_prio=64 tid=0x0000000002ee0000 nid=0x134 in Object.wait() [0x00007fefaed2d000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
        - locked <0x00007ff0be17e600> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:350)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:350)
        at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1030)
        at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1015)
        at org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(ZkStateReader.java:550)
        at org.apache.solr.common.cloud.ClusterState.getCollections(ClusterState.java:207)
        at org.apache.solr.common.cloud.ZkStateReader.constructState(ZkStateReader.java:462)
        at org.apache.solr.common.cloud.ZkStateReader.access$600(ZkStateReader.java:57)
        at org.apache.solr.common.cloud.ZkStateReader$StateWatcher.process(ZkStateReader.java:864)
        - locked <0x00007ff0403ff020> (a org.apache.solr.common.cloud.ZkStateReader)
        at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:269)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
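
Reading the two traces together: the zkCallback thread holds the
ZkStateReader monitor (<0x00007ff0403ff020>) inside StateWatcher.process
while it waits on a ZooKeeper getData round trip, and the coreLoadExecutor
thread is waiting for that same monitor in addCollectionWatch. The sketch
below is a minimal, Solr-independent illustration of that general pattern.
It is not the actual ZkStateReader code and not a proposed patch. All names
in it (LockScopeSketch, slowFetch, refreshHoldingLock, refreshOutsideLock)
are hypothetical, and the 50 ms sleep is an assumed stand-in for a remote
read. It counts how many fetches are in flight at once when the slow read
happens inside versus outside the synchronized block.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockScopeSketch {
    // Stands in for the shared monitor seen in the traces (hypothetical).
    private final Object stateLock = new Object();
    private final Map<String, String> state = new ConcurrentHashMap<>();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    // Assumed slow remote read; records the peak number of concurrent calls.
    private String slowFetch(String collection) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(50);
            return "state-of-" + collection;
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Pattern A: the slow read happens while the shared lock is held.
    void refreshHoldingLock(String collection) throws InterruptedException {
        synchronized (stateLock) {
            state.put(collection, slowFetch(collection));
        }
    }

    // Pattern B: read first, then take the lock only to publish the result.
    void refreshOutsideLock(String collection) throws InterruptedException {
        String fresh = slowFetch(collection);
        synchronized (stateLock) {
            state.put(collection, fresh);
        }
    }

    // Runs one refresh per thread and returns the peak number of overlapping fetches.
    static int maxConcurrentFetches(boolean holdLock, int threads) {
        LockScopeSketch sketch = new LockScopeSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            String name = "collection" + i;
            pool.submit(() -> {
                try {
                    if (holdLock) {
                        sketch.refreshHoldingLock(name);
                    } else {
                        sketch.refreshOutsideLock(name);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("fetch under lock:   peak concurrent fetches = "
                + maxConcurrentFetches(true, 8));
        System.out.println("fetch outside lock: peak concurrent fetches = "
                + maxConcurrentFetches(false, 8));
    }
}
```

With the fetch inside the synchronized block, only one fetch can ever be in
flight, so every other worker queues on the monitor. With the fetch moved
outside, the workers can overlap. Whether ZkStateReader can safely be
restructured along these lines is a separate question that this sketch does
not answer.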
