Re: Atomic indexing as default indexing

Thomas Corthals Fri, 23 Sep 2022 09:39:43 -0700

On Fri, 23 Sep 2022 at 18:17, Shawn Heisey
<apa...@elyograg.org.invalid> wrote:


> On 9/23/22 09:51, gnandre wrote:
> > Is there a way to make atomic indexing default?
> >
> > Say, even if some clients send non-atomic indexing requests, they should
> > get converted to atomic indexing requests on the Solr end. Is that possible?
> >
> > I am asking because we usually run into the following issue:
> > 1. Client A is the major contributor of almost all the fields of a Solr
> > document. This is non-atomic indexing.
> > 2. Client B contributes some additional fields to the same document and
> > does this with atomic indexing.
> > 3. If Client A indexes again, the fields populated by Client B are wiped
> > out.
> >
> > If we make all indexing atomic on the Solr end, then we won't run into
> > this problem (except in the rare case where Client A deletes the document
> > and then indexes it back; this is fine and we can deal with it because it
> > is rare).
>
> We would be surprising a LOT of users if we did that.  Right now they
> can simply reindex a document to delete fields that were indexed before
> but shouldn't be there.  If we made atomic indexing the default, we
> would definitely get complaints about the fact that these fields did not
> get removed.
>
> And what about users that have a schema that is not appropriate for
> atomic indexing?  Quite a lot of users, me included, have fields that
> are indexed but not stored and have no docValues.  I can guarantee you
> that if we made atomic indexing the default, users would assume
> that all their existing fields would be preserved, and that might not be
> the case.
>
> It sounds like what you should do is have client A be aware that a
> document might have changes done after they indexed it, and they should
> do a check to see whether a doc already exists, and if it does, change
> their indexing to atomic.
>
> It is extremely problematic to have one index be built by two different
> entities in this way.  Maybe instead you should have separate indexes
> for each client and use Solr's join capability to combine the info from
> both indexes into one result.  Just be aware that Solr's join capability
> will NOT do everything a relational database expert might expect.
>
> Thanks,
> Shawn
>
>
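Shawn's suggestion of having client A switch to atomic indexing when
the document already exists could look roughly like the sketch below.
It only shows the conversion of a full document into an atomic-update
document using the "set" modifier. The function name and the "id"
unique key are assumptions for illustration, and, as Shawn notes, this
only preserves the other fields if the schema is suitable for atomic
updates (fields stored or with docValues).

```python
import json


def to_atomic_update(doc, id_field="id"):
    """Turn a full document into an atomic-update document.

    Every field except the unique key is wrapped in a {"set": value}
    modifier, so fields that are absent from `doc` (for example the
    ones contributed by client B) are not part of the update.
    `id_field` is an assumption; use the schema's uniqueKey.
    """
    if id_field not in doc:
        raise ValueError(f"document has no {id_field!r} field")
    atomic = {id_field: doc[id_field]}
    for name, value in doc.items():
        if name != id_field:
            atomic[name] = {"set": value}
    return atomic


# Hypothetical document from client A; the printed JSON is the body
# that would be posted to the collection's /update handler.
full_doc = {"id": "doc-1", "title": "Hello", "tags": ["a", "b"]}
print(json.dumps([to_atomic_update(full_doc)]))
```
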
Client A can use Optimistic Concurrency
<https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#optimistic-concurrency>
to check if a document has been updated by client B.

Use the /get handler from client A to get the _version_ after indexing and
store it locally. Use that _version_ for further updates from client A to
check if the document was changed by client B.
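
A minimal sketch of that flow, using only the Python standard library.
The base URL, collection name and function names are assumptions, not
part of anyone's actual setup. It relies on Solr rejecting an update
with HTTP 409 (Conflict) when the _version_ sent with the document
does not match the one in the index.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical Solr collection URL; adjust to your installation.
SOLR = "http://localhost:8983/solr/mycollection"


def get_url(doc_id):
    """Build the real-time /get URL that returns only _version_."""
    query = urllib.parse.urlencode({"id": doc_id, "fl": "_version_"})
    return f"{SOLR}/get?{query}"


def fetch_version(doc_id):
    """Return the current _version_ of a document, or None if absent."""
    with urllib.request.urlopen(get_url(doc_id)) as resp:
        doc = json.load(resp).get("doc")
    return doc["_version_"] if doc else None


def versioned_payload(doc, version):
    """Serialize `doc` with the _version_ that client A stored locally."""
    return json.dumps([{**doc, "_version_": version}]).encode("utf-8")


def index_if_unchanged(doc, version):
    """Send the update; return False if the document changed meanwhile."""
    req = urllib.request.Request(
        f"{SOLR}/update",
        data=versioned_payload(doc, version),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:  # version conflict: someone else updated it
            return False
        raise
```

Client A would call fetch_version() right after indexing and keep the
result. When index_if_unchanged() later returns False, client B has
touched the document in between, and client A can decide how to
proceed, for example by sending an atomic update instead of
overwriting the whole document.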

Thomas
