Does anyone know with certainty whether (and if so, how) document order is
respected when updates are sent to Solr in a batch?

Our application internally buffers Solr documents for speed of ingest before
sending them to the server in chunks. Each XML message sent to the Solr
server contains all buffered documents in the order they arrived, with no
settings changed from the defaults (so overwrite = true). We are careful to
avoid things like HashMaps on our side since they'd lose the order, but I
can't be certain what occurs inside Solr.

Sometimes an object is indexed twice for various reasons, so it can appear
twice in the buffer, but the most up-to-date version is always last. I have,
however, observed instances where the first copy of the document ends up
indexed and the differences in the second copy are missing. Does this sound
likely? And if so, are there any obvious settings I can play with to get the
behavior I desire?
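
In case ordering inside a batch turns out not to be guaranteed, one
workaround on our side would be to collapse duplicates before sending, so
that only the last copy of each document is in the chunk. Below is a minimal
sketch of that idea in plain Java. The BufferDedup class, the Doc type and
the keepLastPerId method are hypothetical names made up for illustration;
they are not our real code and not part of Solr or SolrJ. It also assumes
each document carries a unique key (here "id").

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BufferDedup {

    /** Hypothetical stand-in for a buffered document: unique key plus content. */
    public static final class Doc {
        public final String id;
        public final String body;

        public Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    /**
     * Returns one document per id, keeping the last copy seen in the buffer.
     * Output order follows the position of each id's final occurrence.
     */
    public static List<Doc> keepLastPerId(List<Doc> buffer) {
        // LinkedHashMap iterates in insertion order, unlike HashMap.
        Map<String, Doc> latest = new LinkedHashMap<String, Doc>();
        for (Doc doc : buffer) {
            // Remove first so the re-inserted entry moves to the end.
            latest.remove(doc.id);
            latest.put(doc.id, doc);
        }
        return new ArrayList<Doc>(latest.values());
    }

    public static void main(String[] args) {
        List<Doc> buffer = new ArrayList<Doc>();
        buffer.add(new Doc("a", "v1"));
        buffer.add(new Doc("b", "v1"));
        buffer.add(new Doc("a", "v2")); // newer copy of "a" arrives last
        for (Doc d : keepLastPerId(buffer)) {
            System.out.println(d.id + " " + d.body);
        }
    }
}
```

This would only hide the problem within a single chunk; it says nothing
about what Solr does with duplicates that land in different chunks, which is
still what I'd like to understand.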

I looked at:
http://wiki.apache.org/solr/UpdateXmlMessages

but there is no mention of order, just the overwrite flag (I'm unsure how it
is applied internally to an update message) and the deprecated duplicates
flag (which I have no idea about).

Would switching to SolrInputDocuments on a CommonsHttpSolrServer help, as
per http://wiki.apache.org/solr/Solrj? There is no mention of order there
either, however.

Thanks to anyone who took the time to read this.

Ta,
Greg
