Re: RabbitMQ MailQueue & delay : implementation proposal

Tellier Benoit Fri, 20 Sep 2019 04:19:22 -0700

Hi Matthieu,

It looks like it is going to be a long thread...


Yes, that is definitely worth the discussion.

On 20/09/2019 16:00, Matthieu Baechler wrote:
> Hi Benoit,
> 
> Thank you for bringing that subject to the mailing list.
> 
> On Fri, 2019-09-20 at 13:46 +0700, Tellier Benoit wrote:
>> Hello all,
>>
>> As of 20/09/2019, delays are not supported on top of the RabbitMQ
>> MailQueue.
>>
>> While this is not a problem for a "Mail Delivery Agent" server, this is
>> a major concern for a "Mail Exchange" server, as pointed out by
>> @splainez on the gitter channel.
>>
>> A possible implementation came to my mind regarding this concern:
>>
>>  - When a delay is specified, we save the message in the object
>> storage, fire a message on a **MailQueueDelayExchange**, and persist
>> it on the MailQueueView.
>>  - Each James listens on a single Queue plugged to the
>> **MailQueueDelayExchange**.
>>  - For each incoming message, the receiver will set a timer until
>> the planned delivery (date).
>>  - Upon timer completion, we ack the message of MailQueueDelayExchange,
>> then we put the corresponding message in the mail RabbitMQMailQueue
>> (no need to update the mailQueueView nor store the blob again).
>>  - Upon connection loss, the message will be nacked and will then be
>> handled by another s/consumer/jamesServer/.
>>
>> Obviously:
>>  - We need synchronized clocks "best effort" - think NTP
>>  - This solution can duplicate emails upon connection loss - a local
>> James needs to invalidate the entries it is waiting for upon
>> connection loss.
>>
> 
> It may work. However, I'm not satisfied by the state of the RabbitMQ
> Mailqueue implementation.
> 
> The design is very complex (mostly because of the coupling with
> Cassandra) and it looks very brittle to me. We managed to break it
> recently without even noticing the problem (AFAIR, we broke the delete
> feature).
> 
> I would like to challenge the initial choice once more.
> 
> Here is a list of facts:
> 
> 1. Given that we are not able, for now, to set up a reliable cluster of
> RabbitMQ servers, we probably don't gain anything from using RabbitMQ vs
> not-embedded ActiveMQ
> 
> 2. Once we'll have a clustered RabbitMQ, chances are high that we'll
> need to fix some issues (see https://www.rabbitmq.com/ha.html about
> mirrored queues).
> 
> 3. The code is very complex and adds some load to Cassandra.
> 
> Here is the list of questions I think we should try to answer: 
> 
> 1. Do we have any evidence ActiveMQ is a limiting factor of MailQueue
> handling?
> 
> 2. Do we have any evidence that single-node RabbitMQ is better than
> ActiveMQ?
> 
> 3. What is the estimated load we think we can handle with ActiveMQ if
> we invest in reasonable optimizations?
> 
> 4. Do we think we'll ever get a robust MailQueue with that design? At
> which kind of cost?
> 
> All these questions should be evaluated in a short-term and long-term
> perspective: 
> 
> * Should we stop investing for now in RabbitMQ because it solves no
> issue without adding some investment?
> 
> * Do we think it's the right solution in a long-term perspective or
> will we switch to a better alternative?
> 
> Sorry to bring my uncertainties to this discussion but it's probably
> still time to look at what we've done, evaluate it and maybe change our
> strategy if needed.
> 
> Cheers,
> 

Regarding your facts:

1. That may be a fact for Linagora, I would not assume that for all
James users.

2. This doesn't look like a major change to me.

This also sounds like something that can be done via 'rabbitmqctl'. Of
course native support would be nicer, but that could be a solution for
users wanting RabbitMQ HA for James, no?

3. Switching to RabbitMQ unlocked a 30x enqueue speed enhancement.
ActiveMQ was slow and could barely keep up with 5 mails/sec, at the cost
of massive GC.

Given the code quality of this component, I would consider the
"reasonable optimizations" hard to implement, and for the record we
never tackled them in the years before we started implementing the
RabbitMQ MailQueue.

4. I think we do. An append-only time-series with tombstones (current
design) does not look very complex to me. And I believe we can get it at
the cost of:
 - A necessary projection for RabbitMQMailQueue :: getSize.
   A proof of concept was proposed here:
   https://github.com/linagora/james-project/pull/2565
 - And the aforementioned implementation for delays
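
To make the delay proposal a bit more concrete, here is a minimal sketch
of the receiver side: one timer per delayed message, ack then publish
upon timer completion, and invalidation of pending entries upon
connection loss. All names below (DelayedMailScheduler, Broker,
onDelayedMessage...) are hypothetical and do not exist in James; the
actual RabbitMQ client calls are deliberately hidden behind the Broker
interface, so take it as an illustration rather than a design.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names exist in James.
class DelayedMailScheduler {
    /** Stand-in for the broker channel (assumption, not a RabbitMQ client API). */
    interface Broker {
        void ack(String enqueueId);                // ack on the delay queue
        void publishToMailQueue(String enqueueId); // hand over to the mail queue
    }

    private final Broker broker;
    private final Clock clock;
    private final Map<String, ScheduledFuture<?>> pending = new HashMap<>();
    private final ScheduledExecutorService timers =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "delay-timers");
            thread.setDaemon(true);
            return thread;
        });

    DelayedMailScheduler(Broker broker, Clock clock) {
        this.broker = broker;
        this.clock = clock;
    }

    /** Called for each message received from the delay queue. */
    synchronized void onDelayedMessage(String enqueueId, Instant plannedDelivery) {
        // Relies on "best effort" synchronized clocks between producers and consumers.
        long delayMs = Math.max(0, Duration.between(clock.instant(), plannedDelivery).toMillis());
        pending.put(enqueueId,
            timers.schedule(() -> release(enqueueId), delayMs, TimeUnit.MILLISECONDS));
    }

    /** Upon timer completion: ack the delayed message, then enqueue it for delivery. */
    private synchronized void release(String enqueueId) {
        // Skip entries that were invalidated by a connection loss.
        if (pending.remove(enqueueId) != null) {
            broker.ack(enqueueId);
            broker.publishToMailQueue(enqueueId);
        }
    }

    /** Upon connection loss the broker redelivers elsewhere, so drop local timers. */
    synchronized void onConnectionLost() {
        pending.values().forEach(timer -> timer.cancel(false));
        pending.clear();
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    void shutdown() {
        timers.shutdownNow();
    }
}
```

Clearing the pending entries in onConnectionLost is what avoids the
duplicates mentioned in the proposal: a timer that already fired but
lost the race finds no entry and releases nothing.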

I would also add finer granularity on the use cases:
 - Mail Delivery Agent: RabbitMQ with no delays, and no management
capabilities would be a good match and very simple to implement. We
already have it, we just need to remove the associated Cassandra view.
 - Mail Exchange (MX). Here delays and management capabilities make sense.
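
For readers less familiar with the view behind these management
capabilities (browse, delete, size), here is a toy in-memory model of an
append-only series with tombstones plus a size projection. It is only a
sketch under my own assumptions: the class and method names are
hypothetical, and it does not reflect the actual Cassandra schema.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy in-memory model with hypothetical names, not the James Cassandra schema.
class MailQueueViewSketch {
    static final class Entry {
        final String enqueueId;
        final Instant enqueuedAt;

        Entry(String enqueueId, Instant enqueuedAt) {
            this.enqueueId = enqueueId;
            this.enqueuedAt = enqueuedAt;
        }
    }

    private final List<Entry> enqueued = new ArrayList<>(); // append-only series
    private final Set<String> tombstones = new HashSet<>(); // deleted entries
    private long size = 0;                                   // projection for getSize

    synchronized void enqueue(String enqueueId, Instant enqueuedAt) {
        enqueued.add(new Entry(enqueueId, enqueuedAt));
        size++;
    }

    /** Never rewrites the series: a delete only records a tombstone. */
    synchronized void delete(String enqueueId) {
        boolean known = enqueued.stream().anyMatch(e -> e.enqueueId.equals(enqueueId));
        // The projection is decremented only the first time an entry is tombstoned.
        if (known && tombstones.add(enqueueId)) {
            size--;
        }
    }

    /** Browsing reads the series in order and skips tombstoned entries. */
    synchronized List<String> browse() {
        return enqueued.stream()
            .filter(e -> !tombstones.contains(e.enqueueId))
            .map(e -> e.enqueueId)
            .collect(Collectors.toList());
    }

    /** Answered from the projection, without scanning the series. */
    synchronized long getSize() {
        return size;
    }
}
```

The point of the projection is that getSize does not have to scan the
series and subtract tombstones on every call.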

Thus I believe the short-term and long-term conclusions might depend on
the intended use of the distributed-james software (let's bring one more
topic to the table...).

And we need at the very least a clear documentation of today's behavior
on the "packaging / support matrix" documentation page.

Best regards,

Benoit

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
