Benoit Tellier created JAMES-4198:
-------------------------------------
Summary: Optimize message write path and DB footprint
Key: JAMES-4198
URL: https://issues.apache.org/jira/browse/JAMES-4198
Project: James Server
Issue Type: Improvement
Components: cassandra, IMAPServer, jpa, mailbox
Reporter: Benoit Tellier
h3. Why ?
- In the Messagev3 properties, we needlessly store information that is either
already in the headers or easy to obtain. This takes up space on Cassandra...
Table sizes (not tiered, with per-message entries) for 66 million emails
(3 nodes, RF=3):
- `messagev3` table: 17 GB
- `imapuidtable` table: 10 GB
- `messageidtable` table: 7 GB
- `email_query_view_received_at` table: 2 GB
- `firstunseen` table: 287 MB
- `thread_lookup_3` table: 6 GB
We see a footprint of ~2-3 KB (replicated, tiered) per message.
We can expect a 33% reduction of the `messagev3` size by removing the content
description and the properties field. As `messagev3` accounts for roughly 40%
of the ~42 GB listed above, this translates to a 10-13% overall space saving.
At scale, for 10 billion messages, this means 20 TB -> 18 TB. That is a high
price for data that is useful only for IMAP FETCH BODYSTRUCTURE and could
easily be recomputed.
- We count lines with an unoptimized input stream for each message with content
type `text/*`, reading byte per byte (PERF KILLER!), while the line count is
useful only upon IMAP FETCH BODYSTRUCTURE: we would rather move it to read time.
- Finally, MessageStorer triggers parsing for each and every message. We could
easily carry over (after removing PropertyBuilder) the content type and trigger
this expensive parsing if and only if the content type is `multipart/*` or
`content-disposition` is `attachment` in the main headers, saving CPU on the
write path.
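The last point can be illustrated with a minimal sketch of the gating decision. It assumes the top-level `Content-Type` and `Content-Disposition` header values are already available as strings; `ParsingGate` and `needsFullParsing` are hypothetical names for illustration, not existing James classes or methods.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of the James code base.
final class ParsingGate {

    // Returns true only when the main headers suggest attachments may exist,
    // i.e. when the expensive MIME parsing is actually worth running.
    static boolean needsFullParsing(Optional<String> contentType,
                                    Optional<String> contentDisposition) {
        // Header values are case-insensitive, so normalize before comparing.
        boolean multipart = contentType
            .map(value -> value.trim().toLowerCase(Locale.US))
            .filter(value -> value.startsWith("multipart/"))
            .isPresent();
        boolean attachment = contentDisposition
            .map(value -> value.trim().toLowerCase(Locale.US))
            .filter(value -> value.startsWith("attachment"))
            .isPresent();
        return multipart || attachment;
    }

    public static void main(String[] args) {
        // A plain text message: parsing can be skipped on the write path.
        System.out.println(needsFullParsing(
            Optional.of("text/plain; charset=UTF-8"), Optional.empty()));
        // A multipart message: parsing is still required.
        System.out.println(needsFullParsing(
            Optional.of("multipart/mixed; boundary=abc"), Optional.empty()));
    }
}
```

With such a check, a simple `text/*` message would skip the parser entirely, and only messages that can actually hold attachments would pay the parsing cost.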
h3. How ?
Remove PropertyBuilder from Message POJOs.
IMAP FETCH BODYSTRUCTURE operates on full content: we can easily recompute this
in MessageResult POJO when (and only when) needed.
Take care to still carry over contentType and ContentDescription for the
separate but related (and worthwhile) MessageStorer optimization.
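As a rough sketch of what computing at read time could look like, the example below counts lines lazily, on first request only, and reads the content in chunks rather than byte per byte. `LazyTextLines` is a hypothetical class, not the actual MessageResult API, and counting LF bytes is an assumption about how the line count is defined.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical holder, not part of the James code base. Not thread-safe.
final class LazyTextLines {
    private static final int BUFFER_SIZE = 8192;

    private final Supplier<InputStream> content;
    private Long cachedLineCount; // null until the count is first requested

    LazyTextLines(Supplier<InputStream> content) {
        this.content = content;
    }

    // Computes the line count on first call only, then reuses the result.
    long lineCount() {
        if (cachedLineCount == null) {
            try (InputStream in = content.get()) {
                cachedLineCount = countLines(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return cachedLineCount;
    }

    // Counts LF bytes, reading the stream in 8 KB chunks.
    private static long countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long lines = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] body = "line one\r\nline two\r\n".getBytes(StandardCharsets.US_ASCII);
        LazyTextLines lines = new LazyTextLines(() -> new ByteArrayInputStream(body));
        System.out.println(lines.lineCount());
    }
}
```

Nothing is read at APPEND time in this sketch: the stream is opened only if a caller such as an IMAP FETCH BODYSTRUCTURE handler asks for the count.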
h3. Expected gains
Significant CPU gains for `text/*` message APPEND / reception
~ 10% data reduction on Cassandra