Benoit Tellier created JAMES-3777:
-------------------------------------

             Summary: Event sourcing - O[n²] storage for filters
                 Key: JAMES-3777
                 URL: https://issues.apache.org/jira/browse/JAMES-3777
             Project: James Server
          Issue Type: Improvement
    Affects Versions: 3.7.0
            Reporter: Benoit Tellier


h2. Symptoms

```  
Largest Partitions:     
[FilteringRule/x...@linagora.com] 44952069 (45.0 MB)
```

Every time this user sends an email we load 45 MB of JSON, which can have a big 
performance impact.

h2. What?

We implemented event sourcing with reset. Given rules A and B, if we want to 
persist rule C then we store a "reset to A, B, C" event.

So, if we want to store N filters one at a time, the resulting structure will 
have a size in O[n²] (1 + 2 + ... + N = N(N+1)/2 stored rules), which proves to 
be barely sustainable.
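To make the growth concrete, here is a small sketch that counts how many rule copies end up in the history. It is an illustration only, not James code: the class and method names are hypothetical, and it assumes rules are added one at a time.

```java
// Hypothetical illustration, not James code: counts the rule copies
// stored in the event history under each strategy.
class FilterHistoryGrowth {
    // "Reset" strategy: the i-th event stores all i rules known so far.
    static long storedRulesWithReset(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i; // event i carries rules 1..i
        }
        return total; // equals n * (n + 1) / 2
    }

    // Incremental strategy: each event stores only the rule being added.
    static long storedRulesWithIncrements(int n) {
        return n;
    }
}
```

With 1000 rules, the reset strategy stores 500500 rule copies where the incremental one stores 1000.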

h2. How to fix

Coming back to O[n] would likely help.

Implement filter addition / removal at both the storage and JMAP layers.
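A rough sketch of what addition / removal events could look like at the aggregate level follows. The event and class names (`RuleAdded`, `RuleRemoved`, `IncrementalFilteringAggregate`) are hypothetical and rules are reduced to plain identifiers; the actual James event types would differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the James aggregate: each event carries a
// single rule, so the history grows linearly with the number of changes.
class IncrementalFilteringAggregate {
    interface Event {}

    static final class RuleAdded implements Event {
        final String ruleId;
        RuleAdded(String ruleId) { this.ruleId = ruleId; }
    }

    static final class RuleRemoved implements Event {
        final String ruleId;
        RuleRemoved(String ruleId) { this.ruleId = ruleId; }
    }

    // Rebuilds the current rule list by replaying the history in order.
    static List<String> replay(List<Event> history) {
        List<String> rules = new ArrayList<>();
        for (Event event : history) {
            if (event instanceof RuleAdded) {
                rules.add(((RuleAdded) event).ruleId);
            } else if (event instanceof RuleRemoved) {
                rules.remove(((RuleRemoved) event).ruleId);
            }
        }
        return rules;
    }
}
```

Replaying "add A, add B, add C, remove B" yields the rules A and C, while the history holds four small events instead of ever larger reset events.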

h2. Alternatives

h3. The read projection

Currently we are loading the full history, building the aggregate each time we 
process emails, and performing SERIAL lightweight transactions. This happens 
very often, and is costly.

It would be possible to introduce a read projection, maintained by a subscriber 
to the event source, that would allow efficiently reading current filters for a 
given user.

This means the history would be loaded only upon writes, which are rare.

Impact: yet another table. Also the solution is local to this usage and does 
not help other event sourcing usages.
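As a minimal sketch of the idea, assuming an in-memory map as a stand-in for the extra table and hypothetical names throughout (`FilterReadProjection`, `onRulesReset`, `listRules`), the projection could look like this:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not James code. The map stands in for the
// additional table holding the current rules of each user.
class FilterReadProjection {
    private final Map<String, List<String>> currentRules = new HashMap<>();

    // Called by the event subscriber whenever a reset event is appended.
    // This is the write path, which is rare.
    void onRulesReset(String user, List<String> rules) {
        currentRules.put(user, new ArrayList<>(rules));
    }

    // Called on the mail processing path, which is frequent.
    // No event history is loaded here.
    List<String> listRules(String user) {
        return Collections.unmodifiableList(
            currentRules.getOrDefault(user, Collections.emptyList()));
    }
}
```

The read path then costs one lookup of the current rules, whatever the length of the history.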

h3. Event sourcing snapshots

Augment the James event sourcing implementation with a snapshot mechanism.

Upon reading history, we would start by reading the latest available snapshot, 
then read the history from that snapshot onward.

The event store would be responsible for taking snapshots. Even one snapshot 
every 10 changes would do the job here.

This implies being able to serialize state. This implies an additional table 
for storing event sourcing snapshots.
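The loading logic could be sketched as below. This is a simplified illustration under stated assumptions: the names are hypothetical, the history contains additions only, and the in-memory `subList` call stands in for an event store query that reads only the events after the snapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not James code.
class SnapshotReplay {
    // Serialized state: the rules as they were after `eventNumber` events.
    static final class Snapshot {
        final int eventNumber;
        final List<String> rules;
        Snapshot(int eventNumber, List<String> rules) {
            this.eventNumber = eventNumber;
            this.rules = rules;
        }
    }

    // history.get(i) is the rule added by event number i + 1.
    // Starts from the snapshot state and replays only the later events.
    static List<String> load(Snapshot snapshot, List<String> history) {
        List<String> rules = new ArrayList<>(snapshot.rules);
        rules.addAll(history.subList(snapshot.eventNumber, history.size()));
        return rules;
    }

    // Policy from the text: take a snapshot every `interval` changes.
    static boolean shouldSnapshot(int eventNumber, int interval) {
        return eventNumber % interval == 0;
    }
}
```

With a snapshot every 10 changes, a load replays at most 9 events on top of the snapshot.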

My take on it: going `O[n²]` -> `O[n]` will likely be a good enough mitigation 
that we don't need to grow the complexity of the event sourcing code.

On the other hand, this would harden the event sourcing code and likely lift 
most of the limitations for adoption on the mailbox write path (to enforce the 
mailbox name uniqueness constraint).

Note that the two solutions are not mutually exclusive.

h3. The dirty fix

For filters, the history prior to a reset event can be dropped. This can be used 
to solve the immediate problem, even if it is not very clean.
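A sketch of which events would be retained, assuming (as described above) that a reset event carries the full rule set so that earlier events add no information. The `Event` type and `retained` helper are hypothetical, not James code.

```java
import java.util.List;

// Hypothetical sketch, not James code.
class ResetHistoryPruning {
    static final class Event {
        final boolean reset;
        final String description;
        Event(boolean reset, String description) {
            this.reset = reset;
            this.description = description;
        }
    }

    // Returns the suffix of the history starting at the latest reset event.
    // Everything before that event could be deleted from storage.
    static List<Event> retained(List<Event> history) {
        for (int i = history.size() - 1; i >= 0; i--) {
            if (history.get(i).reset) {
                return history.subList(i, history.size());
            }
        }
        return history; // no reset found: nothing can be dropped
    }
}
```

Since every filter event is currently a reset, only the latest event would remain after pruning.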

h1. Proposal

 - Implement a read projection
 - Implement addition / removal patches to filtering event sourcing aggregate
 - Don't implement event sourcing snapshots now

And also... Remove the obligation to configure the JMAP filtering mailet inside 
JMAP servers: after all, this extension is not standard...



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

