[
https://issues.apache.org/jira/browse/JAMES-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714014#comment-17714014
]
Benoit Tellier commented on JAMES-3777:
---------------------------------------
Update of 19/04:
=> Increments are in place, with massive gains, cf.
https://github.com/apache/james-project/pull/1530#issuecomment-1514138889
Before, I failed to create 1000 rules even after 14 minutes.
After, this completes in 1 minute.
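For intuition on why increments help, here is a minimal sketch (hypothetical
class and method names, not James code) counting how many rule copies end up
persisted when N rules are added one at a time, under the reset model versus
the increment model. It only models storage growth, not the timings above.

```java
// Hypothetical illustration only: counts persisted rule copies, not bytes.
final class FilterStorageGrowth {
    // Reset model: the k-th event stores all k rules defined so far.
    static long rulesStoredWithReset(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            total += k;
        }
        return total; // equals n * (n + 1) / 2, hence O[n²]
    }

    // Increment model: each event stores only the rule being added.
    static long rulesStoredWithIncrements(int n) {
        return n; // O[n]
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1000, 5000}) {
            System.out.println(n + " rules: reset=" + rulesStoredWithReset(n)
                + ", increments=" + rulesStoredWithIncrements(n));
        }
    }
}
```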
=> Snapshot
The snapshot is saved as an event (a reset of the rules), and a static
column tracks the latest snapshot so that the history before it is skipped.
This further enhances the performance of filtering: 5000 rules created in
6min18 instead of 31min42 (last event creation took 124ms instead of 954ms).
I will propose snapshots in another pull request.
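A minimal sketch of the snapshot idea, assuming an in-memory list in place of
the event table and an integer field standing in for the static column. All
names are hypothetical and this is not the James implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "snapshot as a reset event + pointer to it".
final class SnapshotReplay {
    enum Kind { RESET, ADD, REMOVE }

    static final class Event {
        final Kind kind;
        final List<String> rules;

        Event(Kind kind, List<String> rules) {
            this.kind = kind;
            this.rules = rules;
        }

        static Event reset(String... rules) { return new Event(Kind.RESET, Arrays.asList(rules)); }
        static Event add(String... rules) { return new Event(Kind.ADD, Arrays.asList(rules)); }
        static Event remove(String... rules) { return new Event(Kind.REMOVE, Arrays.asList(rules)); }
    }

    static final class History {
        private final List<Event> events = new ArrayList<>();
        // Stands in for the static column: index of the latest reset event, -1 if none.
        private int latestSnapshot = -1;

        void append(Event event) {
            events.add(event);
            if (event.kind == Kind.RESET) {
                latestSnapshot = events.size() - 1;
            }
        }

        // Number of events that must be read to rebuild the current state.
        int eventsToRead() {
            return events.size() - Math.max(latestSnapshot, 0);
        }

        // Rebuilds the rule list, skipping everything before the latest snapshot.
        List<String> currentRules() {
            List<String> state = new ArrayList<>();
            for (int i = Math.max(latestSnapshot, 0); i < events.size(); i++) {
                Event event = events.get(i);
                switch (event.kind) {
                    case RESET: state = new ArrayList<>(event.rules); break;
                    case ADD: state.addAll(event.rules); break;
                    case REMOVE: state.removeAll(event.rules); break;
                }
            }
            return state;
        }
    }
}
```

With a history of five events whose third one is a reset, only the last three
events need to be read to rebuild the rules.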
> Event sourcing - O[n²] storage for filters
> ------------------------------------------
>
> Key: JAMES-3777
> URL: https://issues.apache.org/jira/browse/JAMES-3777
> Project: James Server
> Issue Type: Improvement
> Affects Versions: 3.7.0
> Reporter: Benoit Tellier
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> h2. Symptoms
> ```
> Largest Partitions:
> [FilteringRule/[email protected]] 44952069 (45.0 MB)
> ```
> Every time this user sends an email we load 45 MB of JSON, which can have a
> big performance impact.
> h2. What?
> We implemented event sourcing with reset. Given rules A and B, if we want to
> persist rule C then we store a "reset to A, B, C" event.
> So, if we want to store N filters, the resulting structure will have a size
> in O[n²], which proves to be barely sustainable.
> h2. How to fix
> Coming back to O[n] likely would help.
> Implement filter addition / removal both at the storage and JMAP layer
> h2. Alternatives
> h3. The read projection
> Currently we are loading the full history, building the aggregate each time
> we process emails, and performing SERIAL lightweight transactions. Email
> processing is very common, so this is impactful.
> It would be possible to introduce a read projection, maintained by a
> subscriber to the event source, that would allow efficiently reading current
> filters for a given user.
> This means the history would be loaded only upon writes, which are rare.
> Impact: yet another table. Also the solution is local to this usage and does
> not help other event sourcing usages.
> h3. Event sourcing snapshots
> Augment James event sourcing implementation with a Snapshot mechanism.
> Upon reading history, we would start by reading the latest available
> snapshot, then read the history from that snapshot.
> The event store would be responsible for taking snapshots. Even a snapshot
> every 10 changes would do the job here.
> This implies being able to serialize state. This implies an additional table
> for storing event sourcing snapshots.
> My take on it: going `O[n²]` -> `O[n]` will likely be a good enough mitigation
> that we don't need to grow the complexity of the event sourcing code.
> On the other hand, this would harden event sourcing code and likely lift
> most of the limitations for adoption on the mailbox write path (to enforce
> the mailbox name uniqueness constraint).
> Note that the two solutions are not mutually exclusive.
> h3. The dirty fix
> For filters, the history prior to a reset event can be dropped. This can be
> used to solve the immediate problem, even if it is not very clean.
> h1. Proposal
> - Implement a read projection
> - Implement addition / removal patches to filtering event sourcing aggregate
> - Don't implement event sourcing snapshots now
> And also... Remove the obligation to configure the JMAP filtering mailet
> inside JMAP servers: after all this extension is not standard...