Varun Thacker created SOLR-12470: ------------------------------------ Summary: Search Rate Trigger created more than 3 replicas Key: SOLR-12470 URL: https://issues.apache.org/jira/browse/SOLR-12470 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Reporter: Varun Thacker Attachments: 4_docs.json, bug_report.txt, graph_view.png, multiple_replicas.zip, system_docs.json
Here's the trigger that I created . At this point the collection was one shard and one replica ( on node3 ) {code:java} curl -X POST -H 'Content-type:application/json' --data-binary '{ "set-trigger": { "name" : "search_rate_trigger", "event" : "searchRate", "collection" : "test_rate_trigger", "rate" : 1.0, "waitFor" : "1m", "enabled" : true, "actions" : [ { "name" : "compute_plan", "class": "solr.ComputePlanAction" }, { "name" : "execute_plan", "class": "solr.ExecutePlanAction" } ] } }' http://localhost:8983/solr/admin/autoscaling{code} I also had a trigger listener setup as I was testing the listener feature {code:java} curl -X POST -H 'Content-type:application/json' --data-binary '{ "set-listener": { "name": "search_rate_listener", "trigger": "search_rate_trigger", "stage": ["STARTED", "ABORTED", "SUCCEEDED", "FAILED"], "class": "solr.SystemLogListener" } }' http://localhost:8983/solr/admin/autoscaling{code} I ran a script to fire queries at every 100ms . The index didn't have any docs so it's a simple match all query {code:java} while [ 1 ] do curl -s "http://localhost:8984/solr/test_rate_trigger/select/?q=*:*" > /dev/null sleep .1 done{code} After a few minutes I see 4 replicas being created. Attaching logs from all 4 nodes. It should be fairly easy to reproduce with the above mentioned steps Also attaching all the docs from the .system collection for reference Here's another interesting this I noticed. I re-created the setup but this time removed the execute_plan part Now every 1 min the compute plan action tries to create 3 replicas . Why I found this interesting is that it was trying to create two replicas on the same node Does this look like a separate bug? {code:java} INFO - 2018-06-08 03:41:32.586; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0 INFO - 2018-06-08 03:41:40.909; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=1 INFO - 2018-06-08 03:41:40.932; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT INFO - 2018-06-08 03:41:40.933; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:41:40.934; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:41:40.934; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9184331 sessionWrapper.createTime 9184324085271, this.sessionWrapper.createTime 9184324085271 INFO - 2018-06-08 03:42:32.604; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0 INFO - 2018-06-08 03:42:41.525; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=0 INFO - 2018-06-08 03:42:41.559; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT INFO - 2018-06-08 03:42:41.560; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:42:41.560; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:42:41.561; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9244959 sessionWrapper.createTime 9244956725861, this.sessionWrapper.createTime 9244956725861 INFO - 2018-06-08 03:43:32.622; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0 INFO - 2018-06-08 03:43:42.158; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=1 INFO - 2018-06-08 03:43:42.178; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT INFO - 2018-06-08 03:43:42.180; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:43:42.181; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:43:42.181; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9305581 sessionWrapper.createTime 9305577119413, this.sessionWrapper.createTime 9305577119413 INFO - 2018-06-08 03:44:32.642; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0 INFO - 2018-06-08 03:44:42.759; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=0 INFO - 2018-06-08 03:44:42.778; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9366182 sessionWrapper.createTime 9366178796748, this.sessionWrapper.createTime 9366178796748{code} Thirdly, for this above mentioned test I started observing the .system collection . Here is a query that I used to get the documents created from the first time the listener kicked in {code:java} http://localhost:8983/solr/.system/select?fq=source_s:SystemLogListener&q=*:*&rows=4&sort=timestamp%20asc{code} My first expectation was that I'd see 3 docs but I saw 4 docs. Curious why it's 4 ( the docs are attached as 4_docs.json ) My intention here is to remove the system log listener with an http listener here I wanted to understand should I be looking out for 4 events or 3 The first reaction here is it's a minor bug hence I'm putting it as part of this jira Happy to break it up into smaller Jiras once I hear back if these are valid issues. This test was run against master -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org