Hi there,

Thank you for bringing this topic to the mailing-list.

To me safety and correctness is much more important than raw
performance so I would like the always-copy implementation to replace
the COW version.

However, keep in mind that the JMH benchmark figures did not told the
full story about the consequences of this change and be ready to
experience slower real-world performances.

Cheers,

-- Matthieu Baechler

On Tue, 2020-12-29 at 12:54 +0700, Tellier Benoit wrote:
> Hello there!
> 
> We had been discussing on GitHub recently about an optimization in
> james
> core around the usage of MimeMessage.
> 
> Javax.mail MimeMessage is currently used to represent a message of an
> email as part of the mail processing in James. It is part of the Mail
> interface (mailet-api).
> 
> As Mail envelope is composed of several recipients, mail related
> operations are performed once for all these recipients (we enqueue
> the
> mail one time, we strip bcc one time etc...). Troubles arise when we
> need different behaviors as part of mail processing across recipients
> (think remote recipients, that needs there mail to be relayed, versus
> local recipients that needs to be locally delivered). The email get's
> duplicated (in MatcherSplitter) and the processing will then be
> distinct
> for both entities. The underlying MimeMessage may - or may not be
> modified.
> 
> In order to prevent MimeMessage duplication in the event the
> underlying
> MimeMessage is not modified, a Copy On Write mechanism was introduced
> (I
> guess... Sorry, I was not there yet).
> 
> Upon his CI effort, Jean Helou with the help of Matthieu Baechler
> made
> he unpleasant finding that this was not thread safe, that was leading
> to
> build instabilities. The mailet processing happens in Camel, which is
> multi-threaded. Concurrency issues arised between modifications, and
> message disposal, when a same MimeMessage instance was shared. [1]
> 
> A first effort was to try to achieve thread-safety, which leaded to a
> brittle double reetrant read-write locks in order to govern data
> access.
> However, another performance enhancement bypassed these lock
> mechanism
> (MimeMessageWrapper allows accessing the data as an InputStream
> instead
> of requiring to copy it). The effort seemed overwhelming, not to
> metion
> possible risks of dead-locks. [2]
> 
> We then came up with an always copy implementation [3]. Simpler,
> safer... The underlying logic is to avoid trying being smarter than
> mutability, and leverage immutability to achieve thread safety, which
> is
> a classic functional programming idiom.
> 
> JMH benchmarks were conducted. We highlighter little performance
> difference for small messages, in the percent realm for both memory
> allocation and compute time. Differences are however higher for
> bigger
> messages (~10%) for both metrics.
> 
> Please note that above 100KB the MimeMessage would be stored on disk,
> thus limiting memory impact (see MimeMessageInputStreamSource). Maybe
> we
> should make the threshold configurable, via a system property for
> instance?
> 
> I just want to further mentioned I encountered that very issue on a
> production instance: the underlying email had been corrupted by the
> above mentioned COW bug and kept throwing NullPointerExceptions every
> time the content was accessed. This resulted (on top of
> distributed-james) in a RabbitMQ nack of the message, that ended up
> in a
> dead-letter queue. Replaying its processing required admin
> intervention
> and had been interpreted by the user as an email loss...
> 
> To conclude this effort we (Jean an I) would like to merge the
> "Always
> copy" pull request.
> 
> Also, would it be beneficial to write an ADR about this topic?
> 
> Thoughts?
> 
> Cheers,
> 
> Benoit
> 
> [1] The unfamous COW bug:
> https://github.com/apache/james-project/pull/280/commits/09b5554bbcbbb98757910d59bac54f97ee1f8b4f
> 
> [2] The double nested reetrant read-write lock fix attempt:
> https://github.com/apache/james-project/pull/280
> [3] The always copy fix:
> https://github.com/apache/james-project/pull/282
> [4] Benchmarks:
> https://github.com/apache/james-project/pull/280#issuecomment-745211736
> &
> https://github.com/apache/james-project/pull/280#issuecomment-745701937
> [5] The JIRA ticket: https://issues.apache.org/jira/browse/JAMES-3477
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> For additional commands, e-mail: server-dev-h...@james.apache.org
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org

Reply via email to