On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> There are some deletes by query. I have not had any issues with DBQ,
> currently have 5.3 running in production.

Here's the big problem with DBQ. Imagine this sequence of events with these timestamps:

13:00:00: A commit for change visibility happens.
13:00:00: A segment merge is triggered by the commit.
(It's a big merge that takes exactly 3 minutes.)
13:00:05: A deleteByQuery is sent.
13:00:15: An update to the index is sent.
13:00:25: An update to the index is sent.
13:00:35: An update to the index is sent.
13:00:45: An update to the index is sent.
13:00:55: An update to the index is sent.
13:01:05: An update to the index is sent.
13:01:15: An update to the index is sent.
13:01:25: An update to the index is sent.
{time passes, more updates might be sent}
13:03:00: The merge finishes.

Here's what would happen in this scenario: The DBQ and all of the update requests sent *after* the DBQ will block until the merge finishes. That means that it's going to take up to three minutes for Solr to respond to those requests. If the client that is sending the request is configured with a 60 second socket timeout, which inter-node requests made by Solr are by default, then it is going to experience a timeout error. The request will probably complete successfully once the merge finishes, but the connection is gone, and the client has already received an error.

Now imagine what happens if an optimize (forced merge of the entire index) is requested on an index that's 50GB. That optimize may take 2-3 hours, possibly longer. A deleteByQuery started on that index after the optimize begins (and any updates requested after the DBQ) will pause until the optimize is done. A pause of 2 hours or more is a BIG problem.

This is why deleteByQuery is not recommended.

If the deleteByQuery were changed into a two-step process involving a query to retrieve ID values and then one or more deleteById requests, then none of that blocking would occur. The deleteById operation can run at the same time as a segment merge, so neither it nor subsequent update requests will have the significant pause.
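
To make the two-step idea concrete, here is a rough client-side sketch in Python. It is only an illustration under several assumptions, not how Solr itself would implement it: the uniqueKey field is assumed to be named "id", the URLs are hypothetical, and the helper names (collect_ids, delete_by_ids) are made up for this example. It pages through the matching IDs with cursorMark and then sends them as delete-by-ID requests in batches. It sends no commits, and documents that start matching the query after the IDs are collected will not be deleted.

```python
import json
import urllib.parse
import urllib.request


def solr_get_json(url, timeout=60):
    """GET a Solr URL and decode the JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def solr_post_json(url, payload, timeout=60):
    """POST a JSON body to a Solr update handler."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def collect_ids(select_url, query, fetch=solr_get_json, id_field="id", rows=1000):
    """Step 1: retrieve the uniqueKey of every document matching the query.

    Uses cursorMark paging, which requires a sort on the uniqueKey field.
    """
    ids = []
    cursor = "*"
    while True:
        params = urllib.parse.urlencode({
            "q": query,
            "fl": id_field,
            "sort": f"{id_field} asc",
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        })
        data = fetch(f"{select_url}?{params}")
        ids.extend(doc[id_field] for doc in data["response"]["docs"])
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            return ids
        cursor = next_cursor


def delete_by_ids(update_url, ids, post=solr_post_json, batch_size=500):
    """Step 2: send the collected IDs as delete-by-ID requests, in batches."""
    for start in range(0, len(ids), batch_size):
        post(update_url, {"delete": ids[start:start + batch_size]})
```

With a hypothetical core named "demo", usage would look something like calling collect_ids("http://localhost:8983/solr/demo/select", "type:old") and passing the result to delete_by_ids("http://localhost:8983/solr/demo/update", ids).
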
From what I understand, you can even do commits in this scenario and have changes be visible before the merge completes. I haven't verified that this is the case.

Experienced devs: Can we fix this problem with DBQ? On indexes with a uniqueKey, can DBQ be changed to use the two-step process I mentioned?

Thanks,
Shawn