On Fri, 23 Sep 2022 at 18:17, Shawn Heisey <apa...@elyograg.org.invalid> wrote:
> On 9/23/22 09:51, gnandre wrote:
> > Is there a way to make atomic indexing default?
> >
> > Say, even if some clients send non-atomic indexing requests, it should
> > get converted to atomic indexing requests on Solr end, is that possible?
> >
> > I am asking because we usually run into the following issue:
> > 1. Client A is the major contributor of almost all the fields of a Solr
> > document. This is non-atomic indexing.
> > 2. Client B contributes some additional fields to the same document and
> > does this with atomic indexing.
> > 3. If Client A indexes again, the fields populated by Client B are wiped
> > out.
> >
> > If we make all indexing atomic indexing on Solr end then we won't run
> > into this problem (except in a rare case where Client A deletes the
> > document then indexes it back, this is fine and we can deal with it
> > because it is rare)
>
> We would be surprising a LOT of users if we did that. Right now they
> can simply reindex a document to delete fields that were indexed before
> but shouldn't be there. If we made atomic indexing the default, we
> would definitely get complaints about the fact that these fields did not
> get removed.
>
> And what about users that have a schema that is not appropriate for
> atomic indexing? Quite a lot of users, me included, have fields that
> are indexed but not stored and have no docValues. I can guarantee you
> that if we made atomic indexing the default, that users would assume
> that all their existing fields will be preserved, and that might not be
> the case.
>
> It sounds like what you should do is have client A be aware that a
> document might have changes done after they indexed it, and they should
> do a check to see whether a doc already exists, and if it does, change
> their indexing to atomic.
>
> It is extremely problematic to have one index be built by two different
> entities in this way. Maybe instead you should have separate indexes
> for each client and use Solr's join capability to combine the info from
> both indexes into one result. Just be aware that Solr's join capability
> will NOT do everything a relational database expert might expect.
>
> Thanks,
> Shawn

Client A can use Optimistic Concurrency
<https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#optimistic-concurrency>
to check if a document has been updated by client B.

Use the /get handler from client A to get the _version_ after indexing and
store it locally. Use that _version_ for further updates from client A to
check if the document was changed by client B.

Thomas
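
In case it helps, here is a rough Python sketch of that flow. It is only an illustration under some assumptions: the Solr URL and collection name are made up, the helper names are hypothetical, and falling back to an atomic "set" update when Solr reports a version conflict (HTTP 409) is just one possible policy, combining the version check with Shawn's suggestion to switch client A to atomic indexing. As Shawn notes, atomic updates only preserve fields that are stored or have docValues.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical base URL and collection name; adjust for your setup.
SOLR_URL = "http://localhost:8983/solr/mycollection"


def with_version(doc, version):
    """Return a copy of doc carrying _version_ for optimistic concurrency.

    A value greater than 1 must match the indexed version exactly,
    otherwise Solr rejects the update with a conflict.
    """
    out = dict(doc)
    out["_version_"] = version
    return out


def to_atomic_update(doc, id_field="id"):
    """Turn a full document into an atomic update that only sets its own fields."""
    return {
        key: (value if key == id_field else {"set": value})
        for key, value in doc.items()
        if key != "_version_"
    }


def post_docs(docs):
    """POST a list of documents to the update handler (commit policy left to Solr)."""
    request = urllib.request.Request(
        f"{SOLR_URL}/update",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def current_version(doc_id):
    """Read _version_ through the realtime /get handler; None if the doc is missing."""
    url = f"{SOLR_URL}/get?id={urllib.parse.quote(doc_id)}&fl=_version_"
    with urllib.request.urlopen(url) as response:
        doc = json.load(response).get("doc")
    return doc["_version_"] if doc else None


def index_from_client_a(doc, known_version):
    """Full reindex if nobody touched the doc since known_version, else atomic update."""
    try:
        post_docs([with_version(doc, known_version)])
    except urllib.error.HTTPError as err:
        if err.code != 409:  # 409 is the version conflict response
            raise
        # Someone (client B) changed the document: only set client A's fields.
        post_docs([to_atomic_update(doc)])
    # Store the returned value locally for the next update from client A.
    return current_version(doc["id"])
```

Client A would call index_from_client_a(doc, stored_version) and keep the returned value as the new stored_version. Nothing here talks to Solr until one of the network helpers is called, so the two payload helpers can be tried out on their own.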