Benoit Tellier created JAMES-3266:
-------------------------------------

             Summary: Distributed James: make ElasticSearch indexing optional?
                 Key: JAMES-3266
                 URL: https://issues.apache.org/jira/browse/JAMES-3266
             Project: James Server
          Issue Type: New Feature
          Components: elasticsearch, guice
    Affects Versions: master
            Reporter: Benoit Tellier
             Fix For: 3.6.0


{code:java}
Raphaël Ouazana-Sustowski Thu, 11 Jun 2020 09:02:24 -0700

Hi,

Here is a proposal to make ElasticSearch optional in our distributed 
product/flavor/server.


Comments are welcome.


## Why?

Some people have expressed the need to use a distributed James without 
ElasticSearch:

- in a comment on https://issues.apache.org/jira/browse/JAMES-3086

- one of our customers plans to deploy a distributed James server for serving 
POP3 encrypted emails. This deployment does not rely on search features. 
However, with the current Distributed James server, it is forced to rely on 
ElasticSearch email indexing.


This results in wasted resources, as maintaining an ElasticSearch cluster that 
keeps up with the email volume is expensive. When search is not needed, 
maintaining that cluster is costly at several levels:

- cost of the infrastructure to deploy it
- cost of the people who have to maintain it
- performance cost on James to index data unnecessarily

## How?

Scanning search is a search implementation that runs on top of any mailbox 
implementation, including distributed ones, and does not require indexing 
data.

Scanning search is tested at the component level (unit tests) and also passes 
the IMAP (MPT) tests on top of the Cassandra implementation, as well as the JMAP 
memory tests, so it delivers correct results. Of course, it does not support 
full-text search.

We should allow Distributed James to optionally rely on scanning search instead 
of ElasticSearch.

 - Scanning search should be recommended for deployments that rarely search data

 - ElasticSearch should be recommended when search is frequent or requires high 
performance

We could use module choosing [1] to choose between scanning search and 
ElasticSearch.

Note that scanning search introduces no additional dependencies, as it is part 
of mailbox-store, and thus carries no risk of library clashes.

Note also that metrics and log collection using ElasticSearch are unaffected.

## Alternative

The alternative would be to build a product/flavor/server distinct from the 
distributed one, whose only difference from it would be that search relies on 
scanning instead of ElasticSearch.

The maintenance cost of such a product/flavor/server is higher than that of a 
configuration option (Docker images to release, time and energy to run 
integration tests on it).

Such a product/flavor is hard to brand: even though it answers a need, it is 
not far from the distributed one, yet it does not answer needs that are very 
different from it either.

The advantage is that it would allow this solution to be fine-tuned more 
closely to the exact needs.


## Work in Progress

See pull request: https://github.com/linagora/james-project/pull/3425

Regards,

Raphaël.

[1] 
https://github.com/apache/james-project/blob/master/src/adr/0036-against-use-of-conditional-statements-in-guice-modules.md

{code}
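
The "How?" section of the proposal describes scanning search as search without an index. The minimal sketch below shows what that means: nothing is indexed on write, and each query reads every message of the mailbox and keeps the matching ones. It is a plain Java illustration and does not come from James. `Message`, `ScanningSearch`, `subjectContains` and `ScanningSearchDemo` are hypothetical names. The real implementation in mailbox-store works against the mailbox API, not against an in-memory list.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical, simplified message model; not the James mailbox API.
final class Message {
    private final long uid;
    private final String from;
    private final String subject;

    Message(long uid, String from, String subject) {
        this.uid = uid;
        this.from = from;
        this.subject = subject;
    }

    long uid() { return uid; }
    String from() { return from; }
    String subject() { return subject; }
}

// Sketch of a scanning search: no index is built or kept up to date.
// Every query walks the whole mailbox and keeps the matching messages.
final class ScanningSearch {
    private final List<Message> mailbox;

    ScanningSearch(List<Message> mailbox) {
        this.mailbox = List.copyOf(mailbox);
    }

    // Returns the UIDs of matching messages, in mailbox order.
    // Each call costs O(n) in the number of messages in the mailbox.
    List<Long> search(Predicate<Message> criterion) {
        return mailbox.stream()
                .filter(criterion)
                .map(Message::uid)
                .collect(Collectors.toList());
    }

    // Hypothetical criterion: case-insensitive substring match on the subject.
    static Predicate<Message> subjectContains(String text) {
        String needle = text.toLowerCase(Locale.ROOT);
        return message -> message.subject().toLowerCase(Locale.ROOT).contains(needle);
    }
}

final class ScanningSearchDemo {
    public static void main(String[] args) {
        List<Message> inbox = List.of(
                new Message(1, "alice@example.com", "Quarterly report"),
                new Message(2, "bob@example.com", "Lunch?"),
                new Message(3, "carol@example.com", "Report draft"));
        ScanningSearch search = new ScanningSearch(inbox);
        System.out.println(search.search(ScanningSearch.subjectContains("report"))); // [1, 3]
    }
}
```

The sketch makes the trade-off from the proposal concrete. Writes carry no indexing cost, but every query costs time proportional to the mailbox size. This is why scanning search is only suggested for deployments that search rarely.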




Mailing list thread: 
https://www.mail-archive.com/[email protected]/msg66319.html

PR: https://github.com/linagora/james-project/pull/3425
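
The proposal suggests using module choosing (ADR 0036, cited as [1] in the quoted email) to pick between scanning search and ElasticSearch. The sketch below illustrates the idea in plain Java and rests on several assumptions. `SearchModule` stands in for a Guice module. `ScanningSearchModule`, `ElasticSearchModule`, `SearchModuleChooser`, `SearchModuleDemo` and the `search.implementation` configuration key are all hypothetical and are not taken from the pull request. What the sketch shows is where the decision happens: the configuration is read once, when the server's modules are assembled, and no module contains a configuration-dependent conditional.

```java
import java.util.Locale;
import java.util.Map;

// Plain-Java stand-in for a Guice module binding a search implementation.
// All names here are hypothetical, not taken from the James code base.
interface SearchModule {
    String describe();
}

final class ScanningSearchModule implements SearchModule {
    @Override
    public String describe() { return "scanning"; }
}

final class ElasticSearchModule implements SearchModule {
    @Override
    public String describe() { return "elasticsearch"; }
}

enum SearchImplementation { SCANNING, ELASTICSEARCH }

final class SearchModuleChooser {
    // Hypothetical configuration key.
    static final String KEY = "search.implementation";

    // Reads the choice from configuration. Defaulting to ElasticSearch is an
    // assumption; unknown values throw IllegalArgumentException.
    static SearchImplementation implementation(Map<String, String> config) {
        String value = config.getOrDefault(KEY, "elasticsearch");
        return SearchImplementation.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // The only conditional lives here, at assembly time; each module
    // stays free of configuration-dependent branches.
    static SearchModule choose(SearchImplementation implementation) {
        switch (implementation) {
            case SCANNING:
                return new ScanningSearchModule();
            case ELASTICSEARCH:
                return new ElasticSearchModule();
            default:
                throw new IllegalArgumentException("Unsupported search implementation: " + implementation);
        }
    }
}

final class SearchModuleDemo {
    public static void main(String[] args) {
        Map<String, String> config = Map.of(SearchModuleChooser.KEY, "scanning");
        SearchModule module = SearchModuleChooser.choose(SearchModuleChooser.implementation(config));
        System.out.println(module.describe()); // scanning
    }
}
```

The sketch defaults to ElasticSearch so that existing deployments would keep their current behaviour. This default is an assumption; the proposal does not specify one. In a Guice-based server, the chosen module would be combined with the other modules when the injector is created.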