Hi everyone, In light of benoit's latest comments on the related PRs, I want to report that I spent some time setting up an infra to do benchmarks against a full server (mx only) I'm not an ops at core, I don't know james all that well and spare time is limited and I got sidetracked quite a few times, so I still haven't run the benchmarks but I'm getting there.
I am strongly in favor of integrating at least #282. #285 is nice to have for me but I won't have time to investigate the config file option rapidly. Jean Le jeu. 7 janv. 2021 à 18:04, Raphaël Ouazana <rouaz...@linagora.com> a écrit : > Hi, > > I completely agree with what Matthieu said. > > Correctness is needed, so we have to fix this issue. > > But before merging on master and/or deploying this version I would > expect some Gatling runs to show which potential performance decrease we > can expect on real use cases. > > Thanks, > > Raphaël. > > Le 04/01/2021 à 14:42, Matthieu Baechler a écrit : > > Hi there, > > > > Thank you for bringing this topic to the mailing-list. > > > > To me safety and correctness is much more important than raw > > performance so I would like the always-copy implementation to replace > > the COW version. > > > > However, keep in mind that the JMH benchmark figures did not told the > > full story about the consequences of this change and be ready to > > experience slower real-world performances. > > > > Cheers, > > > > -- Matthieu Baechler > > > > On Tue, 2020-12-29 at 12:54 +0700, Tellier Benoit wrote: > >> Hello there! > >> > >> We had been discussing on GitHub recently about an optimization in > >> james > >> core around the usage of MimeMessage. > >> > >> Javax.mail MimeMessage is currently used to represent a message of an > >> email as part of the mail processing in James. It is part of the Mail > >> interface (mailet-api). > >> > >> As Mail envelope is composed of several recipients, mail related > >> operations are performed once for all these recipients (we enqueue > >> the > >> mail one time, we strip bcc one time etc...). Troubles arise when we > >> need different behaviors as part of mail processing across recipients > >> (think remote recipients, that needs there mail to be relayed, versus > >> local recipients that needs to be locally delivered). The email get's > >> duplicated (in MatcherSplitter) and the processing will then be > >> distinct > >> for both entities. The underlying MimeMessage may - or may not be > >> modified. > >> > >> In order to prevent MimeMessage duplication in the event the > >> underlying > >> MimeMessage is not modified, a Copy On Write mechanism was introduced > >> (I > >> guess... Sorry, I was not there yet). > >> > >> Upon his CI effort, Jean Helou with the help of Matthieu Baechler > >> made > >> he unpleasant finding that this was not thread safe, that was leading > >> to > >> build instabilities. The mailet processing happens in Camel, which is > >> multi-threaded. Concurrency issues arised between modifications, and > >> message disposal, when a same MimeMessage instance was shared. [1] > >> > >> A first effort was to try to achieve thread-safety, which leaded to a > >> brittle double reetrant read-write locks in order to govern data > >> access. > >> However, another performance enhancement bypassed these lock > >> mechanism > >> (MimeMessageWrapper allows accessing the data as an InputStream > >> instead > >> of requiring to copy it). The effort seemed overwhelming, not to > >> metion > >> possible risks of dead-locks. [2] > >> > >> We then came up with an always copy implementation [3]. Simpler, > >> safer... The underlying logic is to avoid trying being smarter than > >> mutability, and leverage immutability to achieve thread safety, which > >> is > >> a classic functional programming idiom. > >> > >> JMH benchmarks were conducted. We highlighter little performance > >> difference for small messages, in the percent realm for both memory > >> allocation and compute time. Differences are however higher for > >> bigger > >> messages (~10%) for both metrics. > >> > >> Please note that above 100KB the MimeMessage would be stored on disk, > >> thus limiting memory impact (see MimeMessageInputStreamSource). Maybe > >> we > >> should make the threshold configurable, via a system property for > >> instance? > >> > >> I just want to further mentioned I encountered that very issue on a > >> production instance: the underlying email had been corrupted by the > >> above mentioned COW bug and kept throwing NullPointerExceptions every > >> time the content was accessed. This resulted (on top of > >> distributed-james) in a RabbitMQ nack of the message, that ended up > >> in a > >> dead-letter queue. Replaying its processing required admin > >> intervention > >> and had been interpreted by the user as an email loss... > >> > >> To conclude this effort we (Jean an I) would like to merge the > >> "Always > >> copy" pull request. > >> > >> Also, would it be beneficial to write an ADR about this topic? > >> > >> Thoughts? > >> > >> Cheers, > >> > >> Benoit > >> > >> [1] The unfamous COW bug: > >> > https://github.com/apache/james-project/pull/280/commits/09b5554bbcbbb98757910d59bac54f97ee1f8b4f > >> > >> [2] The double nested reetrant read-write lock fix attempt: > >> https://github.com/apache/james-project/pull/280 > >> [3] The always copy fix: > >> https://github.com/apache/james-project/pull/282 > >> [4] Benchmarks: > >> https://github.com/apache/james-project/pull/280#issuecomment-745211736 > >> & > >> https://github.com/apache/james-project/pull/280#issuecomment-745701937 > >> [5] The JIRA ticket: https://issues.apache.org/jira/browse/JAMES-3477 > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org > >> For additional commands, e-mail: server-dev-h...@james.apache.org > >> > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org > > For additional commands, e-mail: server-dev-h...@james.apache.org > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org > For additional commands, e-mail: server-dev-h...@james.apache.org > >