Hi everyone,

In light of benoit's latest comments on the related PRs, I want to report
that I spent some time setting up an infra to do benchmarks against a full
server (mx only) I'm not an ops at core, I don't know james all that well
and spare time is limited and I got sidetracked quite a few times, so I
still haven't run the benchmarks but I'm getting there.

I am strongly in favor of integrating at least #282.

#285 is nice to have for me but I won't have time to investigate the config
file option rapidly.

Jean

Le jeu. 7 janv. 2021 à 18:04, Raphaël Ouazana <rouaz...@linagora.com> a
écrit :

> Hi,
>
> I completely agree with what Matthieu said.
>
> Correctness is needed, so we have to fix this issue.
>
> But before merging on master and/or deploying this version I would
> expect some Gatling runs to show which potential performance decrease we
> can expect on real use cases.
>
> Thanks,
>
> Raphaël.
>
> Le 04/01/2021 à 14:42, Matthieu Baechler a écrit :
> > Hi there,
> >
> > Thank you for bringing this topic to the mailing-list.
> >
> > To me safety and correctness is much more important than raw
> > performance so I would like the always-copy implementation to replace
> > the COW version.
> >
> > However, keep in mind that the JMH benchmark figures did not told the
> > full story about the consequences of this change and be ready to
> > experience slower real-world performances.
> >
> > Cheers,
> >
> > -- Matthieu Baechler
> >
> > On Tue, 2020-12-29 at 12:54 +0700, Tellier Benoit wrote:
> >> Hello there!
> >>
> >> We had been discussing on GitHub recently about an optimization in
> >> james
> >> core around the usage of MimeMessage.
> >>
> >> Javax.mail MimeMessage is currently used to represent a message of an
> >> email as part of the mail processing in James. It is part of the Mail
> >> interface (mailet-api).
> >>
> >> As Mail envelope is composed of several recipients, mail related
> >> operations are performed once for all these recipients (we enqueue
> >> the
> >> mail one time, we strip bcc one time etc...). Troubles arise when we
> >> need different behaviors as part of mail processing across recipients
> >> (think remote recipients, that needs there mail to be relayed, versus
> >> local recipients that needs to be locally delivered). The email get's
> >> duplicated (in MatcherSplitter) and the processing will then be
> >> distinct
> >> for both entities. The underlying MimeMessage may - or may not be
> >> modified.
> >>
> >> In order to prevent MimeMessage duplication in the event the
> >> underlying
> >> MimeMessage is not modified, a Copy On Write mechanism was introduced
> >> (I
> >> guess... Sorry, I was not there yet).
> >>
> >> Upon his CI effort, Jean Helou with the help of Matthieu Baechler
> >> made
> >> he unpleasant finding that this was not thread safe, that was leading
> >> to
> >> build instabilities. The mailet processing happens in Camel, which is
> >> multi-threaded. Concurrency issues arised between modifications, and
> >> message disposal, when a same MimeMessage instance was shared. [1]
> >>
> >> A first effort was to try to achieve thread-safety, which leaded to a
> >> brittle double reetrant read-write locks in order to govern data
> >> access.
> >> However, another performance enhancement bypassed these lock
> >> mechanism
> >> (MimeMessageWrapper allows accessing the data as an InputStream
> >> instead
> >> of requiring to copy it). The effort seemed overwhelming, not to
> >> metion
> >> possible risks of dead-locks. [2]
> >>
> >> We then came up with an always copy implementation [3]. Simpler,
> >> safer... The underlying logic is to avoid trying being smarter than
> >> mutability, and leverage immutability to achieve thread safety, which
> >> is
> >> a classic functional programming idiom.
> >>
> >> JMH benchmarks were conducted. We highlighter little performance
> >> difference for small messages, in the percent realm for both memory
> >> allocation and compute time. Differences are however higher for
> >> bigger
> >> messages (~10%) for both metrics.
> >>
> >> Please note that above 100KB the MimeMessage would be stored on disk,
> >> thus limiting memory impact (see MimeMessageInputStreamSource). Maybe
> >> we
> >> should make the threshold configurable, via a system property for
> >> instance?
> >>
> >> I just want to further mentioned I encountered that very issue on a
> >> production instance: the underlying email had been corrupted by the
> >> above mentioned COW bug and kept throwing NullPointerExceptions every
> >> time the content was accessed. This resulted (on top of
> >> distributed-james) in a RabbitMQ nack of the message, that ended up
> >> in a
> >> dead-letter queue. Replaying its processing required admin
> >> intervention
> >> and had been interpreted by the user as an email loss...
> >>
> >> To conclude this effort we (Jean an I) would like to merge the
> >> "Always
> >> copy" pull request.
> >>
> >> Also, would it be beneficial to write an ADR about this topic?
> >>
> >> Thoughts?
> >>
> >> Cheers,
> >>
> >> Benoit
> >>
> >> [1] The unfamous COW bug:
> >>
> https://github.com/apache/james-project/pull/280/commits/09b5554bbcbbb98757910d59bac54f97ee1f8b4f
> >>
> >> [2] The double nested reetrant read-write lock fix attempt:
> >> https://github.com/apache/james-project/pull/280
> >> [3] The always copy fix:
> >> https://github.com/apache/james-project/pull/282
> >> [4] Benchmarks:
> >> https://github.com/apache/james-project/pull/280#issuecomment-745211736
> >> &
> >> https://github.com/apache/james-project/pull/280#issuecomment-745701937
> >> [5] The JIRA ticket: https://issues.apache.org/jira/browse/JAMES-3477
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> >> For additional commands, e-mail: server-dev-h...@james.apache.org
> >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> > For additional commands, e-mail: server-dev-h...@james.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> For additional commands, e-mail: server-dev-h...@james.apache.org
>
>

Reply via email to