René Cordier created JAMES-3202:
-----------------------------------
Summary: ReIndexing "filtering" for only outdated indexed data
Key: JAMES-3202
URL: https://issues.apache.org/jira/browse/JAMES-3202
Project: James Server
Issue Type: Improvement
Reporter: René Cordier
*Why?*
ReIndexing is slow: it requires reading all messages in the DB, then triggers
a full reIndexing of each one, even when the indexed document is not outdated.
All these document changes create a lot of deleted documents. Lucene "marks
them as deleted", polluting the entire index until segment merging happens (yet
another costly operation). The fewer updates we do, the better. Note that
partial updates still lead to a full new document in Lucene; they only
save bandwidth and avoid reads.
*Need specification*
As an admin, I want to run a reIndex. The current sequential reactive reindexing
reaches a speed of `21 messages/second` on UPN, below the stated
objective of `1.000 msg/s`.
We furthermore handle `RunningOptions`, which allow specifying the targeted
message rate. See [https://github.com/linagora/james-project/pull/3394]
While this enables more parallelization, we doubt that UPN can keep
up with the mentioned rate, given the current search index
limitation.
We thus need, given a message, to get its search index representation (at least
for its mutable data). From this we will be able to restrict the reindexing to
outdated/non-existing data, significantly speeding up the reindexing process on
mostly valid indexes. The admin could then request this option via a query
parameter (carried over in the running options).
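The filtering decision described above can be sketched as follows. This is a minimal, simplified illustration and not the James implementation: the `Flag` enum, the in-memory `Map` standing in for the search index, and the `needsReindex` helper are all hypothetical names, and the sketch is synchronous where the real API would be reactive.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class OutdatedIndexFilter {
    // Hypothetical stand-in for the mutable message data (IMAP system flags).
    enum Flag { SEEN, ANSWERED, FLAGGED, DELETED, DRAFT }

    // Hypothetical view of the search index: message UID -> flags currently indexed.
    // An absent key models a message that has no indexed document.
    static Optional<Set<Flag>> indexedFlags(Map<Long, Set<Flag>> index, long uid) {
        return Optional.ofNullable(index.get(uid));
    }

    // A message needs reindexing when it is missing from the index,
    // or when its indexed flags differ from the flags stored in the DB.
    static boolean needsReindex(Map<Long, Set<Flag>> index, long uid, Set<Flag> storedFlags) {
        return indexedFlags(index, uid)
            .map(indexed -> !indexed.equals(storedFlags))
            .orElse(true);
    }

    public static void main(String[] args) {
        Map<Long, Set<Flag>> index = Map.of(
            1L, EnumSet.of(Flag.SEEN),
            2L, EnumSet.noneOf(Flag.class));

        // Up-to-date document: can be skipped.
        System.out.println(needsReindex(index, 1L, EnumSet.of(Flag.SEEN)));
        // Flags changed in the DB since indexing: outdated.
        System.out.println(needsReindex(index, 2L, EnumSet.of(Flag.FLAGGED)));
        // No indexed document at all: must be indexed.
        System.out.println(needsReindex(index, 3L, EnumSet.noneOf(Flag.class)));
    }
}
```

On a mostly valid index, most messages would take the first branch and be skipped, which is where the expected speed-up comes from.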
*MessageSearchIndex API changes*:
{code:java}
interface MessageSearchIndex {
    //...
    // Retrieves the flags currently stored in the search index for the given message
    Mono<Flags> retrieveIndexedFlags(MailboxId mailboxId, MessageUid uid);
    //...
}
{code}
The ElasticSearch implementation will rely on the _GET_ verb (not search).
Unit tests will be written for this new method.
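To illustrate the GET-based lookup: ElasticSearch exposes single documents at `GET /<index>/_doc/<id>`, which fetches a document by id without going through a search query. The sketch below only builds such a request path. The `mailboxId:uid` document id scheme and the `mailbox` index name used in `main` are assumptions made for this example, and `documentId` / `getPath` are hypothetical helpers.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IndexedDocumentLookup {
    // Assumed id scheme: one document per (mailbox, uid) pair.
    static String documentId(String mailboxId, long uid) {
        return mailboxId + ":" + uid;
    }

    // Builds the path of a GET-by-id request; the id is URL-encoded
    // because it contains a ':' separator.
    static String getPath(String indexName, String mailboxId, long uid) {
        String id = URLEncoder.encode(documentId(mailboxId, uid), StandardCharsets.UTF_8);
        return "/" + indexName + "/_doc/" + id;
    }

    public static void main(String[] args) {
        // Index name "mailbox" is a placeholder for this example.
        System.out.println(getPath("mailbox", "42", 7L)); // prints /mailbox/_doc/42%3A7
    }
}
```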
ReIndexing `RunningOptions` will then carry over the option, which
`ReIndexerPerformer` will need to take into account.
Sample webadmin API:
{code:bash}
curl -XPOST 'http://james:8000/mailboxes?action=reindex&filter=outdatedIndex'
{code}
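One possible way to map the `filter` query parameter onto a reindexing mode is sketched below. The `Mode` enum, its constant names, and the `parse` helper are hypothetical and only illustrate the idea; the actual `RunningOptions` structure may differ.

```java
import java.util.Optional;

public class ReindexFilterOption {
    // Hypothetical reindexing modes carried by the running options.
    enum Mode { REINDEX_ALL, FIX_OUTDATED }

    // Maps the optional `filter` query parameter to a mode.
    // No parameter keeps the existing behaviour (full reindex);
    // unknown values are rejected rather than silently ignored.
    static Mode parse(Optional<String> filter) {
        return filter
            .map(value -> {
                if (value.equals("outdatedIndex")) {
                    return Mode.FIX_OUTDATED;
                }
                throw new IllegalArgumentException("Unsupported filter: " + value);
            })
            .orElse(Mode.REINDEX_ALL);
    }

    public static void main(String[] args) {
        System.out.println(parse(Optional.empty()));             // prints REINDEX_ALL
        System.out.println(parse(Optional.of("outdatedIndex"))); // prints FIX_OUTDATED
    }
}
```

Defaulting to a full reindex when the parameter is absent keeps existing calls to the endpoint unchanged.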
--
This message was sent by Atlassian Jira
(v8.3.4#803005)