Thanks for the reply! Regarding byte Array usage I cannot more than share your views.
Work would be needed in my opinion to have the blobStore used in place of `Message` byte fields. However, as much as I do care about it, memory usage is not my concern of the day. My concern of the day is the fact **To read fast JMAP property I need slow full message loading**, and this is what my proposition addresses. I encourage you to split your **James usage of byte arrays** discussion in a separated thread (let's discuss this! but let's honor this with it's standalone discussions :-)) On 28/11/2019 17:26, Matthieu Baechler wrote: > Hi, > > On Thu, 2019-11-28 at 15:51 +0700, Tellier Benoit wrote: >> Hello all, >> >> ## Context >> >> We are working on JMAP, and EMail::hasAttachments metadata is listed >> as >> a fast property. >> >> [...] > > I'm glad you trigger this discussion because it's a very important one > if we want to have a fast server in the future. > > I will share my analysis regarding James usage of byte arrays. > > [...] > > And it leads me to the actual topic of this email: of course we need to > compute as many things as possible to prevent loading the raw data. Let's not confuse things. I'm not speaking of computing anything here. I'm just speaking of better structuring already available data. > >> ## Proposal >> >> Introduce a new POJO: MessageAttachmentMetadata (mailbox-api) >> - id >> - name >> - cid >> - isInline >> - size >> - type >> >> - Message (mailbox-store) & MessageResult (mailbox-api) SHOULD >> return >> MessageAttachmentMetadata NOT MessageAttachment. Thus these metadata >> will be added at the FetchGroup.profile.MINIMAL. > > Do we always need attachments information? Of course not. We can think about a separate FetchGroup / FetchType read level if that is an issue. I want to bring your attention to our current usage: - Cassandra is always reading these metadata, which can be cheaply coputed. - Memory has it for free - Over implementations do lazy load this from the full content (and parse it) - maildir + jpa. Which mean that the impact of this decision is limited. > >> We need to port "scanning search" to do on the fly message parsing. > > Not sure to understand this sentence Scanning search iterates messages. Upon **attachment** criterion, read the message full content and parse it to retrieve attachment list (that might not be stored with some implementation). Did I explained it better? > >> This >> is OK as: >> - memory-guice is not intending for production usage, no need to be >> performant >> - Usage of scanning search is not exposed as a product >> - jpa and maildir do not store attachment so recomputation is the >> current behaviour. >> >> ## Consequences >> >> JMAP Email::hasAttachment property would rely on >> FetchGroup.Profile.MINIMAL, allowing the implementation of >> https://github.com/apache/james-project/blob/master/src/adr/0013-precompute-jmap-preview.md >> >> >> JMAP GetMessages with full profile will read 2 time less data >> allowing >> both a cost and performance improvment. >> >> Note that all caller reading full message will also benefit from >> these >> changes (IMAP fetch, mailbox backup, review recomputation) >> >> ## Alternative >> >> We could merge MessageAttachment & Attachment together however this >> would lead to significant datastructure re-arranging for no >> behavioural >> gains and just a slightly more lean API. >> >> Thus I propose not to takle this now. > > I'm not sure to understand this alternative. That's not important here. I'm saying there is a 1 to 1 replationship between Attachment and MessageAttachment POJOs (as I recall it this is a vestige of content deduplication at the attachment level). I see that we can get rid of Attachment POJO, but that would require (cassandra) storage re-structuration. For API gains. > > What about this one: remove bytes from Attachment and always call > BlobStore when you need to read the raw data (or put it in a temp file > for inbound emails) The issue with this is that BlobStore is an implementation details of mailbox-* and is not part of the API. Sure, that's achievable with some indirection: ``` public interface AttachmentMapper extends Mapper { //... void storeAttachmentForOwner(Attachment attachment, Username owner byte[] payload) throws MailboxException; byte[] loadContent(AttachmentId attachmentId) //.. } ``` Doable too. To preserve attachment search capabilities for back-end not supporting attachment storage, additional levels of indirection might be required (and could be tricky). > > Looking at the code, it should not be that hard to implement and we > could monitor BlobStore usage by implementing something in glowroot. > That would be a first step toward a no-byte-array strategy. > > [...] > > Cheers, > --------------------------------------------------------------------- To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org For additional commands, e-mail: server-dev-h...@james.apache.org