On Thu, 2019-11-28 at 18:04 +0700, Tellier Benoit wrote:
> Thanks for the reply!
> 
> Regarding byte Array usage I cannot more than share your views.
> 
> Work would be needed in my opinion to have the blobStore used in
> place
> of `Message` byte fields.
> 
> However, as much as I do care about it, memory usage is not my
> concern
> of the day.
> 
> My concern of the day is the fact **To read fast JMAP property I need
> slow full message loading**, and this is what my proposition
> addresses.

First, my explanation about byte array usage was in the _Context_
section, no more. I don't expect you to work on this problem right now.

Second, the reason why I talked about it is that it's another solution
for the same problem: you read more data than needed thus this code
path is slow.

> 
> I encourage you to split your **James usage of byte arrays**
> discussion
> in a separated thread (let's discuss this! but let's honor this with
> it's standalone discussions :-))

I don't know if there's anything to discuss right now. I don't expect
Linagora teams to work on this specific topic in the forseable future.

> On 28/11/2019 17:26, Matthieu Baechler wrote:
> > Hi,
> > 
> > On Thu, 2019-11-28 at 15:51 +0700, Tellier Benoit wrote:
> > > Hello all,
> > > 
> > > ## Context
> > > 
> > > We are working on JMAP, and EMail::hasAttachments metadata is
> > > listed
> > > as
> > > a fast property.
> > > 
> > > [...]
> > 
> > I'm glad you trigger this discussion because it's a very important
> > one
> > if we want to have a fast server in the future.
> > 
> > I will share my analysis regarding James usage of byte arrays.
> > 
> > [...]
> > 
> > And it leads me to the actual topic of this email: of course we
> > need to
> > compute as many things as possible to prevent loading the raw data.
> 
> Let's not confuse things.
> 
> I'm not speaking of computing anything here.

Well, technically, hasAttachment being a data derived from some other,
it means we compute it at some point. My argument was: if we need it
often (and based on your explanation, that's the case) then we should
compute it and store it. My sentence was maybe too general but there's
no confusion for me.

> I'm just speaking of better structuring already available data.
> 
> > > ## Proposal
> > > 
> > > Introduce a new POJO: MessageAttachmentMetadata (mailbox-api)
> > >  - id
> > >  - name
> > >  - cid
> > >  - isInline
> > >  - size
> > >  - type
> > > 
> > >  - Message (mailbox-store) & MessageResult (mailbox-api) SHOULD
> > > return
> > > MessageAttachmentMetadata NOT MessageAttachment. Thus these
> > > metadata
> > > will be added at the FetchGroup.profile.MINIMAL.
> > 
> > Do we always need attachments information?
> 
> Of course not.
> 
> We can think about a separate FetchGroup / FetchType read level if
> that
> is an issue.
> 
> I want to bring your attention to our current usage:
>  - Cassandra is always reading these metadata, which can be cheaply
> coputed.

Reading a list of MessageAttachmentMetadata means it's probably not
done with a single query (I think we don't have a projection for that).

I'm a bit concerned that the minimal profile always read attachments
metadata, we should consider adding some profiles.

>  - Memory has it for free
>  - Over implementations do lazy load this from the full content (and
> parse it) - maildir + jpa.
> 
> Which mean that the impact of this decision is limited.

But we would probably miss an opportunity to be faster, right?

> > > We need to port "scanning search" to do on the fly message
> > > parsing. 
> > 
> > Not sure to understand this sentence
> 
> Scanning search iterates messages. Upon **attachment** criterion,
> read
> the message full content and parse it to retrieve attachment list
> (that
> might not be stored with some implementation).
> 
> Did I explained it better?

I somehow understand what Scanning search does but I still don't
understand what we need to change.

> > > This
> > > is OK as:
> > >  - memory-guice is not intending for production usage, no need to
> > > be
> > > performant
> > >  - Usage of scanning search is not exposed as a product
> > >  - jpa and maildir do not store attachment so recomputation is
> > > the
> > > current behaviour.
> > > 
> > > ## Consequences
> > > 
> > > JMAP Email::hasAttachment property would rely on
> > > FetchGroup.Profile.MINIMAL, allowing the implementation of
> > > https://github.com/apache/james-project/blob/master/src/adr/0013-precompute-jmap-preview.md
> > > 
> > > 
> > > JMAP GetMessages with full profile will read 2 time less data
> > > allowing
> > > both a cost and performance improvment.
> > > 
> > > Note that all caller reading full message will also benefit from
> > > these
> > > changes (IMAP fetch, mailbox backup, review recomputation)
> > > 
> > > ## Alternative
> > > 
> > > We could merge MessageAttachment & Attachment together however
> > > this
> > > would lead to significant datastructure re-arranging for no
> > > behavioural
> > > gains and just a slightly more lean API.
> > > 
> > > Thus I propose not to takle this now.
> > 
> > I'm not sure to understand this alternative.
> 
> That's not important here. I'm saying there is a 1 to 1 replationship
> between Attachment and MessageAttachment POJOs (as I recall it this
> is a
> vestige of content deduplication at the attachment level).
> 
> I see that we can get rid of Attachment POJO, but that would require
> (cassandra) storage re-structuration. For API gains.

API is more important than implementation as far as I'm concerned. We
can postpone things because we don't want to invest right now but if
it's not an investment issue, we should adapt the storage to the better
API.

> > What about this one: remove bytes from Attachment and always call
> > BlobStore when you need to read the raw data (or put it in a temp
> > file
> > for inbound emails)
> 
> The issue with this is that BlobStore is an implementation details of
> mailbox-* and is not part of the API.
> 
> Sure, that's achievable with some indirection:
> 
> ```
> public interface AttachmentMapper extends Mapper {
>    //...
> 
>    void storeAttachmentForOwner(Attachment attachment, Username owner
> byte[] payload) throws MailboxException;
> 
>    byte[] loadContent(AttachmentId attachmentId)
> 
>    //..
> }
> ```
> 
> Doable too.
> 
> To preserve attachment search capabilities for back-end not
> supporting
> attachment storage, additional levels of indirection might be
> required
> (and could be tricky).


All backends support storing attachment as far as I know.
I guess you think "no using a BlobStore".
And don't see what we be harder in this case, could you elaborate.

Cheers,

-- 
Matthieu Baechler


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org

Reply via email to