Thanks, Roger. This was reported earlier but escaped our attention. The issue is tracked at https://issues.apache.org/jira/browse/SOLR-11208
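
For illustration, the rejection in the trace quoted below can be reproduced outside Solr. This is a minimal sketch, assuming the pool behaves like a plain java.util.concurrent.ThreadPoolExecutor capped at 10 threads with a SynchronousQueue and the default abort policy. That assumption comes from "pool size = 10, active threads = 10, queued tasks = 0" and the AbortPolicy frame in the log, not from Solr's source. The class and method names (PoolRejectionDemo, countRejected) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolRejectionDemo {

    // Submits `tasks` long-running tasks to a pool of `poolSize` threads that has
    // no backing queue, and returns how many submissions were rejected.
    // Assumption: this mimics the pool in the log; it is not Solr code.
    static int countRejected(int poolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<>()); // hands tasks off directly, never queues
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task blocks until released, so every thread stays busy
                    // while the remaining tasks are submitted.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Default AbortPolicy: no idle thread and no queue slot.
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 19 replica deletions against 10 threads
        System.out.println("rejected: " + countRejected(10, 19));
    }
}
```

With 19 tasks and 10 threads, 9 submissions are rejected, which matches the 9 replicas left behind on the lost node. A pool of at least 19 threads, or a pool backed by a queue, would accept all of them in this sketch.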

On Tue, Apr 2, 2019 at 5:56 PM Roger Lehmann <roger.lehm...@offerista.com> wrote:

> To be more specific: I currently have 19 collections, where each node has
> exactly one replica per collection. A new node will automatically create
> new replicas on itself, one for each existing collection (see
> cluster-policy above).
> So when a node is removed, all 19 of its replicas need to be removed.
> This can't be done in one go because the thread pool (parallel
> synchronous execution) is limited to 10 threads and does not scale up
> when necessary.
>
> On Fri, 29 Mar 2019 at 14:20, Roger Lehmann <roger.lehm...@offerista.com>
> wrote:
>
> > Situation
> >
> > I'm currently trying to set up SolrCloud in an AWS Autoscaling Group, so
> > that it can scale dynamically.
> >
> > I've also added the following cluster policy and triggers to Solr, so
> > that each node will have one (and only one) replica of each collection:
> >
> > {
> >   "set-cluster-policy": [
> >     {"replica": "<2", "shard": "#EACH", "node": "#EACH"}
> >   ],
> >   "set-trigger": [{
> >     "name": "node_added_trigger",
> >     "event": "nodeAdded",
> >     "waitFor": "5s",
> >     "preferredOperation": "ADDREPLICA"
> >   },{
> >     "name": "node_lost_trigger",
> >     "event": "nodeLost",
> >     "waitFor": "120s",
> >     "preferredOperation": "DELETENODE"
> >   }]
> > }
> >
> > This works pretty well. But my problem is that when a node gets
> > removed, Solr doesn't remove all 19 replicas from this node, and I have
> > problems when accessing the "nodes" page:
> >
> > [image: enter image description here]
> > <https://i.stack.imgur.com/QyJrY.png>
> >
> > In the logs, this exception occurs:
> >
> > Operation deletenode failed:java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$45/1104948431@467049e2 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@773563df[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 1]
> >     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> >     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> >     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194)
> >     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteCore(DeleteReplicaCmd.java:276)
> >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:95)
> >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.cleanupReplicas(DeleteNodeCmd.java:109)
> >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.call(DeleteNodeCmd.java:62)
> >     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:292)
> >     at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:496)
> >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >     at java.lang.Thread.run(Thread.java:748)
> >
> > Problem description
> >
> > So, the problem is that the pool only has 10 threads, all 10 are busy,
> > and nothing gets queued (synchronous execution). In fact, it only
> > removed 10 replicas and the other 9 replicas stayed there. When I
> > manually send the API command to delete this node, it works fine, since
> > Solr only needs to remove the remaining 9 replicas, and everything is
> > good again.
> >
> > Question
> >
> > How can I increase this (small) thread pool size and/or enable
> > queueing of the remaining deletion tasks? Another solution might be to
> > retry the failed task until it succeeds.
> >
> > Using Solr 7.7.1 on Ubuntu Server installed with the installation script
> > from Solr (so I guess it's using Jetty?).
> >
> > Thanks for your help!
> >

--
Regards,
Shalin Shekhar Mangar.