Hello Jean,

I do support merging https://github.com/apache/james-project/pull/282
along with https://github.com/apache/james-project/pull/286 as soon as
we get a green build.

I did finally succeed to run an SMTP benchmark against a simple workload
mixing small and big messages, the outcome was good, eventually
answering people wanting Gatling performance tests to complete JMH
benchmarks:
https://github.com/apache/james-project/pull/286#issuecomment-780221455

Cheers,

Benoit

Le 15/02/2021 à 14:58, Jean Helou a écrit :
> Hi everyone,
> 
> In light of benoit's latest comments on the related PRs, I want to report
> that I spent some time setting up an infra to do benchmarks against a full
> server (mx only) I'm not an ops at core, I don't know james all that well
> and spare time is limited and I got sidetracked quite a few times, so I
> still haven't run the benchmarks but I'm getting there.
> 
> I am strongly in favor of integrating at least #282.
> 
> #285 is nice to have for me but I won't have time to investigate the config
> file option rapidly.
> 
> Jean
> 
> Le jeu. 7 janv. 2021 à 18:04, Raphaël Ouazana <[email protected]> a
> écrit :
> 
>> Hi,
>>
>> I completely agree with what Matthieu said.
>>
>> Correctness is needed, so we have to fix this issue.
>>
>> But before merging on master and/or deploying this version I would
>> expect some Gatling runs to show which potential performance decrease we
>> can expect on real use cases.
>>
>> Thanks,
>>
>> Raphaël.
>>
>> Le 04/01/2021 à 14:42, Matthieu Baechler a écrit :
>>> Hi there,
>>>
>>> Thank you for bringing this topic to the mailing-list.
>>>
>>> To me safety and correctness is much more important than raw
>>> performance so I would like the always-copy implementation to replace
>>> the COW version.
>>>
>>> However, keep in mind that the JMH benchmark figures did not told the
>>> full story about the consequences of this change and be ready to
>>> experience slower real-world performances.
>>>
>>> Cheers,
>>>
>>> -- Matthieu Baechler
>>>
>>> On Tue, 2020-12-29 at 12:54 +0700, Tellier Benoit wrote:
>>>> Hello there!
>>>>
>>>> We had been discussing on GitHub recently about an optimization in
>>>> james
>>>> core around the usage of MimeMessage.
>>>>
>>>> Javax.mail MimeMessage is currently used to represent a message of an
>>>> email as part of the mail processing in James. It is part of the Mail
>>>> interface (mailet-api).
>>>>
>>>> As Mail envelope is composed of several recipients, mail related
>>>> operations are performed once for all these recipients (we enqueue
>>>> the
>>>> mail one time, we strip bcc one time etc...). Troubles arise when we
>>>> need different behaviors as part of mail processing across recipients
>>>> (think remote recipients, that needs there mail to be relayed, versus
>>>> local recipients that needs to be locally delivered). The email get's
>>>> duplicated (in MatcherSplitter) and the processing will then be
>>>> distinct
>>>> for both entities. The underlying MimeMessage may - or may not be
>>>> modified.
>>>>
>>>> In order to prevent MimeMessage duplication in the event the
>>>> underlying
>>>> MimeMessage is not modified, a Copy On Write mechanism was introduced
>>>> (I
>>>> guess... Sorry, I was not there yet).
>>>>
>>>> Upon his CI effort, Jean Helou with the help of Matthieu Baechler
>>>> made
>>>> he unpleasant finding that this was not thread safe, that was leading
>>>> to
>>>> build instabilities. The mailet processing happens in Camel, which is
>>>> multi-threaded. Concurrency issues arised between modifications, and
>>>> message disposal, when a same MimeMessage instance was shared. [1]
>>>>
>>>> A first effort was to try to achieve thread-safety, which leaded to a
>>>> brittle double reetrant read-write locks in order to govern data
>>>> access.
>>>> However, another performance enhancement bypassed these lock
>>>> mechanism
>>>> (MimeMessageWrapper allows accessing the data as an InputStream
>>>> instead
>>>> of requiring to copy it). The effort seemed overwhelming, not to
>>>> metion
>>>> possible risks of dead-locks. [2]
>>>>
>>>> We then came up with an always copy implementation [3]. Simpler,
>>>> safer... The underlying logic is to avoid trying being smarter than
>>>> mutability, and leverage immutability to achieve thread safety, which
>>>> is
>>>> a classic functional programming idiom.
>>>>
>>>> JMH benchmarks were conducted. We highlighter little performance
>>>> difference for small messages, in the percent realm for both memory
>>>> allocation and compute time. Differences are however higher for
>>>> bigger
>>>> messages (~10%) for both metrics.
>>>>
>>>> Please note that above 100KB the MimeMessage would be stored on disk,
>>>> thus limiting memory impact (see MimeMessageInputStreamSource). Maybe
>>>> we
>>>> should make the threshold configurable, via a system property for
>>>> instance?
>>>>
>>>> I just want to further mentioned I encountered that very issue on a
>>>> production instance: the underlying email had been corrupted by the
>>>> above mentioned COW bug and kept throwing NullPointerExceptions every
>>>> time the content was accessed. This resulted (on top of
>>>> distributed-james) in a RabbitMQ nack of the message, that ended up
>>>> in a
>>>> dead-letter queue. Replaying its processing required admin
>>>> intervention
>>>> and had been interpreted by the user as an email loss...
>>>>
>>>> To conclude this effort we (Jean an I) would like to merge the
>>>> "Always
>>>> copy" pull request.
>>>>
>>>> Also, would it be beneficial to write an ADR about this topic?
>>>>
>>>> Thoughts?
>>>>
>>>> Cheers,
>>>>
>>>> Benoit
>>>>
>>>> [1] The unfamous COW bug:
>>>>
>> https://github.com/apache/james-project/pull/280/commits/09b5554bbcbbb98757910d59bac54f97ee1f8b4f
>>>>
>>>> [2] The double nested reetrant read-write lock fix attempt:
>>>> https://github.com/apache/james-project/pull/280
>>>> [3] The always copy fix:
>>>> https://github.com/apache/james-project/pull/282
>>>> [4] Benchmarks:
>>>> https://github.com/apache/james-project/pull/280#issuecomment-745211736
>>>> &
>>>> https://github.com/apache/james-project/pull/280#issuecomment-745701937
>>>> [5] The JIRA ticket: https://issues.apache.org/jira/browse/JAMES-3477
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to