On 7/21/14, 4:50 PM, "Shawn Heisey" <s...@elyograg.org> wrote:
>On 7/21/2014 5:37 PM, Jeff Wartes wrote:
>> I'd like to ensure an extended warmup is done on each SolrCloud node
>> prior to that node serving traffic.
>> I can do certain things prior to starting Solr, such as pump the index
>> dir through /dev/null to pre-warm the filesystem cache, and post-start I
>> can use the ping handler with a health check file to prevent the node
>> from entering the client's load balancer until I'm ready.
>> What I seem to be missing is control over when a node starts
>> participating in queries sent to the other nodes.
>>
>> I can, of course, add solrconfig.xml firstSearcher queries, which I
>> assume (and fervently hope!) happens before a node registers itself in
>> ZK clusterstate.json as ready for work, but that doesn't scale so well
>> if I want that initial warmup to run thousands of queries, or run them
>> with some parallelism. I'm storing solrconfig.xml in ZK, so I'm sensitive
>> to the size.
>>
>> Any ideas, or corrections to my assumptions?
>
>I think that firstSearcher/newSearcher (and making sure useColdSearcher
>is set to false) is going to be the only way you can do this in a way
>that's compatible with SolrCloud. If you were doing manual distributed
>search without SolrCloud, you'd have more options available.
>
>If useColdSearcher is set to false, that should keep *everything* from
>using the searcher until the warmup has finished. I cannot be certain
>that this is the case, but I have some reasonable confidence that this
>is how it works. If you find that it doesn't behave this way, I'd call
>it a bug.
>
>Thanks,
>Shawn

Thanks for the quick reply. Since distributed search latency is the max of
the shard sub-requests, I'm trying my best to minimize any spikes in
cluster latency due to node restarts.
I double-checked useColdSearcher was false, but the doc says this means
requests "block until the first searcher is done warming", which
translates pretty clearly to "latency spike".

The more I think about it, the more worried I am that a node might indeed
register itself in live_nodes and get distributed requests before it's got
a searcher to work with. *Especially* if I have lots of serial
firstSearcher queries. I'll look through the code myself tomorrow, but if
anyone can help confirm/deny the order of operations here, I'd appreciate
it.
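
For the "thousands of queries, with some parallelism" part, here is a rough sketch of an external warmup driver that could run after Solr starts and before the health check file is enabled. It is only an illustration: the core URL, the worker count, and the helper names (build_url, http_fetch, warm) are made up for this example, and it assumes the query list lives in a local file rather than in solrconfig.xml. The one Solr-specific piece is the distrib=false parameter, which asks the node to answer from its local core instead of fanning the request out to other shards.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical value: adjust host, port, and core name for the node being warmed.
CORE_URL = "http://localhost:8983/solr/collection1"


def build_url(core_url, query):
    # distrib=false keeps the request on the local core, so the warmup
    # load is not distributed to the rest of the cluster.
    params = urlencode({"q": query, "distrib": "false", "rows": 10, "wt": "json"})
    return f"{core_url}/select?{params}"


def http_fetch(url):
    # urlopen raises on connection errors and on 4xx/5xx responses.
    with urlopen(url, timeout=30) as resp:
        resp.read()
        return resp.status


def warm(queries, fetch=http_fetch, core_url=CORE_URL, workers=8):
    """Run every warmup query with bounded parallelism; return the failure count."""

    def run_one(query):
        try:
            fetch(build_url(core_url, query))
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_one, queries))
    return results.count(False)
```

A wrapper script could read one query per line from a file, call warm(), and only enable the health check file once it returns 0. Note that this only addresses the size and parallelism limits of firstSearcher; it does nothing about the open question of whether the node is already visible in live_nodes, and receiving distributed sub-requests, while the warmup is still running.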