Hi all,

I am updating a Solr Collection (Solr 7.3.1 in Cloud mode using SolrJ Java API) 
with requests that both add new documents and delete 
existing ones (by query). The deletion part is meant to make sure any earlier 
revisions of the indexed source are deleted as part of the index update. This 
has worked well for a long time, but in some rare cases there have been issues 
where the update process returns success, but the added document(s) are nowhere 
to be found in the collection.

After some investigation, I suspect that there is an edge case where the 
delete query can actually overlap the documents added in the same update. 
Obviously the first suspect to look at here is the delete query, but I also had 
to start looking into what the documented semantics (if any) for the 
multi-command update API (JSON update command) actually are. I cannot find any 
documentation that seems to even touch on this subject.
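
To make the suspected edge case concrete, here is a toy model in plain Java. It 
does not use SolrJ or talk to Solr at all: the "index" is just a map, and the 
class name, method names, document ids and source name are all made up for 
illustration. It only shows why the order matters when the delete query also 
matches the newly added document; it says nothing about which order Solr 
actually uses.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class UpdateOrderSketch {

    /** One command against the toy index: an add or a delete-by-query. */
    public interface Command {
        void applyTo(Map<String, String> index);
    }

    /** Hypothetical "add": stores the document id with the source it came from. */
    public static Command add(String id, String source) {
        return index -> index.put(id, source);
    }

    /** Hypothetical "delete by query": removes every document whose source matches. */
    public static Command deleteByQuery(Predicate<String> sourceMatches) {
        return index -> index.values().removeIf(sourceMatches);
    }

    /** Applies the commands in list order to a copy of the initial index. */
    public static Map<String, String> run(Map<String, String> initial, List<Command> commands) {
        Map<String, String> index = new LinkedHashMap<>(initial);
        for (Command command : commands) {
            command.applyTo(index);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> initial = new LinkedHashMap<>();
        initial.put("doc-rev1", "report.pdf"); // earlier revision already indexed

        Command addNew = add("doc-rev2", "report.pdf");
        // The query is meant to remove earlier revisions, but it also matches doc-rev2.
        Command deleteOld = deleteByQuery(source -> source.equals("report.pdf"));

        // Delete first, then add: the new revision survives.
        System.out.println(run(initial, Arrays.asList(deleteOld, addNew)).keySet()); // [doc-rev2]
        // Add first, then delete: the new revision is removed as well.
        System.out.println(run(initial, Arrays.asList(addNew, deleteOld)).keySet()); // []
    }
}
```

If the real requests behave like the second case even occasionally, that would 
match what I am seeing: the update reports success but the added document is 
gone.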

I've looked through most of the online Solr documentation chapters 
(https://lucene.apache.org/solr/guide/7_3/), though only as an overview. The 
documentation detailing multi-operation JSON update requests 
(https://lucene.apache.org/solr/guide/7_3/uploading-data-with-index-handlers.html#solr-style-json
 - JSON Update Command) doesn't seem to have any details or even link to 
further reading. I've also read the javadoc for 
org.apache.solr.client.solrj.request.UpdateRequest (part of SolrJ).

Is there a specific order in which operations in an update request will be 
executed? Is the order guaranteed for any of the possible operations (add, 
delete by id / query, optimize, commit) in a single update command? Since I 
cannot find any details, I have to assume it's undefined and that I should 
never rely on any order.

I suspect that the developers who wrote this part of our code either assumed it 
would always be performed in the same order or that the delete query could 
never overlap. Or perhaps it was just an oversight and we've been lucky so far.

Related: in the case where I cannot rely on the operations order in a single 
update request, is there a recommended way to do these kinds of updates 
"atomically" in a single request? Ideally, I obviously don't want the 
collection to be left in a state where the deletion has happened but not the 
additions or the other way around.

Thanks in advance,
Andreas
