René Cordier created JAMES-3202:
-----------------------------------

             Summary: ReIndexing "filtering" for only outdated indexed data
                 Key: JAMES-3202
                 URL: https://issues.apache.org/jira/browse/JAMES-3202
             Project: James Server
          Issue Type: Improvement
            Reporter: René Cordier


*Why?*

ReIndexing is slow: it requires reading all messages in the DB and then 
triggering a full reIndexing of each one, even when the indexed document is 
not outdated.

All these document changes create a lot of deleted documents. Lucene "marks 
them as deleted", polluting the entire index until segment merging happens (yet 
another costly operation). The fewer updates we do, the better. Note that 
partial updates still lead to a full new document in Lucene; they only 
optimise bandwidth and avoid reads.

*Need specification*

As an admin, I want to run a reIndex. The current sequential reactive reindexing 
reaches a speed of `21 messages/second` on UPN, below the stated 
objective of `1000 msg/s`.

Furthermore, we handle `RunningOptions`, which allow specifying the attempted 
message rate. See [https://github.com/linagora/james-project/pull/3394]

While it enables more parallelization, we doubt that UPN can keep up with the 
mentioned rate, given the current search index limitation.

We thus need, given a message, to get its search index representation (at least 
for its mutable data). From this we will be able to restrict the reindexing to 
outdated or non-existing data, significantly speeding up the reindexing process 
on mostly valid indexes. The admin could then request this option via a query 
parameter (carried over in the running options).

*MessageSearchIndex API changes*:
{code:java}
interface MessageSearchIndex {
   //...
   Mono<Flags> retrieveIndexedFlags(MailboxId mailboxId, MessageUid uid);
   //...
}
{code}
ElasticSearch will rely on the _GET_ verb (not search).

Unit tests will be written for this new method.

ReIndexing `RunningOptions` will then carry the option, which 
ReIndexerPerformer will need to take into account.
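
As a rough illustration of the decision ReIndexerPerformer would have to make, here is a minimal, self-contained sketch. It is not the James implementation: `OutdatedIndexFilter`, `Mode` and `shouldReindex` are hypothetical names, flags are simplified to a `Set<String>`, and `Optional` stands in for the `Mono<Flags>` returned by `retrieveIndexedFlags` (empty meaning the document is missing from the index).

```java
import java.util.Optional;
import java.util.Set;

public class OutdatedIndexFilter {
    /** Hypothetical reindexing modes that RunningOptions could carry. */
    enum Mode { REINDEX_ALL, FIX_OUTDATED }

    /**
     * Decides whether a message needs to be reindexed.
     *
     * @param storedFlags  flags read from the mailbox store (source of truth)
     * @param indexedFlags flags read from the search index, empty if no document exists
     */
    static boolean shouldReindex(Mode mode, Set<String> storedFlags, Optional<Set<String>> indexedFlags) {
        if (mode == Mode.REINDEX_ALL) {
            return true; // no filtering requested: always reindex
        }
        // Reindex when the document is missing or its flags differ from the store
        return indexedFlags.map(indexed -> !indexed.equals(storedFlags)).orElse(true);
    }

    public static void main(String[] args) {
        Set<String> stored = Set.of("\\Seen");
        // Up-to-date document: skipped
        System.out.println(shouldReindex(Mode.FIX_OUTDATED, stored, Optional.of(Set.of("\\Seen"))));
        // Outdated document: reindexed
        System.out.println(shouldReindex(Mode.FIX_OUTDATED, stored, Optional.of(Set.of("\\Flagged"))));
        // Missing document: reindexed
        System.out.println(shouldReindex(Mode.FIX_OUTDATED, stored, Optional.empty()));
    }
}
```

Only mutable data (here, flags) is compared, which matches the "at least for its mutable data" requirement above; the immutable parts of a message cannot become outdated once indexed.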

Sample webadmin API:
{code:bash}
curl -XPOST 'http://james:8000/mailboxes?action=reindex&filter=outdatedIndex'
{code}
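
One possible way to map the `filter` query parameter onto the running options is sketched below. The names `ReindexFilterParam`, `Mode` and `parse` are hypothetical, and rejecting unknown values is an assumption rather than something specified in this ticket.

```java
import java.util.Optional;

public class ReindexFilterParam {
    /** Hypothetical reindexing modes that RunningOptions could carry. */
    enum Mode { REINDEX_ALL, FIX_OUTDATED }

    /** Translates the optional "filter" query parameter into a reindexing mode. */
    static Mode parse(Optional<String> filter) {
        if (!filter.isPresent()) {
            return Mode.REINDEX_ALL; // no filter: keep the existing full reindex behaviour
        }
        if (filter.get().equalsIgnoreCase("outdatedIndex")) {
            return Mode.FIX_OUTDATED;
        }
        // Assumption: unknown values are rejected rather than silently ignored
        throw new IllegalArgumentException("Unsupported filter: " + filter.get());
    }

    public static void main(String[] args) {
        System.out.println(parse(Optional.empty()));
        System.out.println(parse(Optional.of("outdatedIndex")));
    }
}
```

Defaulting to a full reindex when the parameter is absent keeps existing calls to the endpoint behaving as before.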



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
