This is an automated email from the ASF dual-hosted git repository. btellier pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/james-project.git
commit d3022fd4ed0bed025b63fea7711aab7fa3485758
Author: Benoit Tellier <[email protected]>
AuthorDate: Mon Apr 13 10:47:42 2020 +0700

    JAMES-2997 [ADR] Separate attachment content and metadata
---
 ...030-separate-attachment-content-and-metadata.md | 99 ++++++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/src/adr/0030-separate-attachment-content-and-metadata.md b/src/adr/0030-separate-attachment-content-and-metadata.md
new file mode 100644
index 0000000..704d603
--- /dev/null
+++ b/src/adr/0030-separate-attachment-content-and-metadata.md
@@ -0,0 +1,99 @@
+# 30. Separate attachment content and metadata
+
+Date: 2020-04-13
+
+## Status
+
+Accepted (lazy consensus)
+
+## Context
+
+Some mailbox implementations of James store already parsed attachments for faster retrieval.
+
+These attachment storage capabilities are required for two features:
+
+ - JMAP attachment download
+ - JMAP message search "attachment content" criteria
+
+Only Memory and Cassandra backends can be relied upon as a JMAP backend.
+
+Other protocols rely on dynamic EML parsing to expose message subparts (IMAP).
+
+Here are the POJOs related to these attachments:
+
+ - **Attachment** : holds an attachmentId, the attachment content, as well as the content type
+ - **MessageAttachment** : composes an attachment with its disposition within a message (cid, inline and name)
+ - **Message** exposes its list of MessageAttachment when it is read with FetchType Full.
+ - **Blob** represents some downloadable content, and can be either an attachment or a message. Blob has a byte array
+   payload too.
+
+The following classes work with the aforementioned POJOs:
+
+ - **AttachmentMapper** and **AttachmentManager** are responsible for storing and retrieving attachment content.
+ - **BlobManager** is used by JMAP to allow blob downloads.
+ - Mailbox search exposes attachment content related criteria. These criteria are used by the JMAP protocol.
+
+This organisation causes attachment content to be loaded every time a message is fully read (which happens for instance
+when you open a message using JMAP) even though it is not needed: as attachments are downloadable through a
+separate JMAP endpoint, their content is not attached to the JMAP message JSON.
+
+Also, since the content is loaded "at once", we allocate memory space to store the whole attachment, which is sub-optimal. We
+want to keep the consumed memory low per-message because a given server should be able to handle a high number of messages
+at a given time.
+
+Note that JPA and maildir mailbox implementations do not support attachment storage. To retrieve attachments of a
+message, these implementations parse the messages to extract their attachments.
+
+Cassandra mailbox prior to schema version 4 stored the attachment and its metadata in the same table, but from version 5 relies
+on the blobStore to store the attachment content.
+
+## Decision
+
+Enforce Cassandra schema version to be 5 from James release 3.5.0. This allows dropping attachment management prior to version
+5.
+
+We will re-organize the attachment POJOs:
+
+ - **Attachment** should hold an attachmentId, a content type, and a size. It will no longer hold the content. The
+   content can be loaded from its **AttachmentId** via the **AttachmentLoader** API that the **AttachmentManager**
+   implements.
+ - **MessageAttachment** : composes an attachment with its disposition within a message (cid, inline and name)
+ - **Blob** would no longer hold the content as a byte array but rather a content retriever (`Supplier<InputStream>`)
+ - **ParsedAttachment** is the direct result of attachment parsing, and composes a **MessageAttachment** and the
+   corresponding content as byte array. This class is only relied upon when saving a message in mailbox. This is used as
+   an output of `MessageParser`.
+
+Some adjustments are needed on classes working with attachments:
+
+ - **AttachmentMapper** and **AttachmentManager** need to allow retrieving the attachment content from an attachmentId
+   as an `InputStream`. This is done through a separate `AttachmentLoader` interface.
+ - **AttachmentMapper** and **AttachmentManager** need the Attachment and its content to persist an attachment
+ - **MessageManager** then needs to return attachment metadata as a result of Append operation.
+ - **InMemoryAttachmentMapper** needs to store attachment content separately.
+ - **MessageStorer** will take care of storing a message on behalf of `MessageManager`. This makes it possible to determine,
+   in an implementation-aware fashion, whether attachments should be parsed, saving attachment parsing upon writes for JPA
+   and Maildir.
+
+Maildir and JPA no longer support attachment content loading. Only the JMAP protocol requires attachment content loading,
+which is not supported on top of these technologies.
+
+Mailbox search attachment content criteria will be supported only on implementations supporting attachment storage.
+
+## Consequences
+
+Users running a Cassandra schema version prior to version 5 will have to go through James release 3.5.0 to upgrade to a
+version after version 5 before proceeding with their update.
+
+We noticed performance improvements when using IMAP FETCH and JMAP GetMessages. Running a Gatling test suite exercising
+JMAP getMessages on a dataset containing attachments leads to the following observations:
+
+ - Overall better average performance for all JMAP queries (10% global p50 improvement)
+ - Sharp decrease in tail latency of getMessages (40 times faster)
+
+We also expect improvements in James memory allocation.
+
+## References
+
+ - [Contribution on this topic](https://github.com/linagora/james-project/pull/3061). Also contains benchmark for this
+   proposal.
+ - [JIRA](https://issues.apache.org/jira/browse/JAMES-2997)
\ No newline at end of file
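
For readers of this commit, the metadata/content split described in the Decision section can be illustrated with a minimal Java sketch. It is not the James API: `SketchAttachment`, `SketchAttachmentLoader`, `InMemorySketchStore` and `SketchBlob` are hypothetical names, and the plain `String` ids and method signatures are assumptions made for illustration. It only shows the idea of an attachment that holds an id, a content type and a size, with content loaded on demand as an `InputStream`, and a blob that holds a `Supplier<InputStream>` instead of a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical metadata-only attachment: id, content type and size, but no payload.
final class SketchAttachment {
    final String attachmentId;
    final String contentType;
    final long size;

    SketchAttachment(String attachmentId, String contentType, long size) {
        this.attachmentId = attachmentId;
        this.contentType = contentType;
        this.size = size;
    }
}

// Hypothetical loader (signature is an assumption): content is fetched on demand from the id.
interface SketchAttachmentLoader {
    InputStream loadAttachmentContent(String attachmentId);
}

// In-memory store keeping metadata and content in separate maps.
final class InMemorySketchStore implements SketchAttachmentLoader {
    private final Map<String, SketchAttachment> metadata = new HashMap<>();
    private final Map<String, byte[]> content = new HashMap<>();

    // Persisting needs both the metadata and the content.
    SketchAttachment store(String id, String contentType, byte[] bytes) {
        SketchAttachment attachment = new SketchAttachment(id, contentType, bytes.length);
        metadata.put(id, attachment);
        content.put(id, bytes.clone());
        return attachment;
    }

    // Reading metadata never touches the content map.
    SketchAttachment getMetadata(String id) {
        SketchAttachment attachment = metadata.get(id);
        if (attachment == null) {
            throw new NoSuchElementException("Unknown attachment: " + id);
        }
        return attachment;
    }

    @Override
    public InputStream loadAttachmentContent(String attachmentId) {
        byte[] bytes = content.get(attachmentId);
        if (bytes == null) {
            throw new NoSuchElementException("Unknown attachment: " + attachmentId);
        }
        return new ByteArrayInputStream(bytes);
    }
}

// Blob carrying a content retriever instead of a byte array.
final class SketchBlob {
    final String blobId;
    final long size;
    private final Supplier<InputStream> contentRetriever;

    SketchBlob(String blobId, long size, Supplier<InputStream> contentRetriever) {
        this.blobId = blobId;
        this.size = size;
        this.contentRetriever = contentRetriever;
    }

    // Opens a stream only when the caller asks for the content.
    InputStream getStream() {
        return contentRetriever.get();
    }

    // Convenience for small payloads in this demo only.
    byte[] readAllBytes() {
        try (InputStream in = getStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class AttachmentSketchDemo {
    public static void main(String[] args) {
        InMemorySketchStore store = new InMemorySketchStore();
        SketchAttachment meta =
            store.store("att-1", "text/plain", "hello".getBytes(StandardCharsets.UTF_8));
        // Building the blob does not read the content; only readAllBytes() does.
        SketchBlob blob = new SketchBlob(meta.attachmentId, meta.size,
            () -> store.loadAttachmentContent(meta.attachmentId));
        System.out.println(meta.contentType + " " + meta.size);
        System.out.println(new String(blob.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

In this sketch a caller that only needs to render message metadata works with `SketchAttachment` and never allocates the payload, which is the memory behaviour the ADR is aiming for.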
