[jira] [Updated] (JAMES-3599) Improve the design of the RabbitMQ eventbus

Benoit Tellier (Jira) Sun, 13 Jun 2021 20:17:06 -0700


     [ 
https://issues.apache.org/jira/browse/JAMES-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Benoit Tellier updated JAMES-3599:
----------------------------------
    Attachment: design_before.png
                design_after.png

> Improve the design of the RabbitMQ eventbus
> -------------------------------------------
>
>                 Key: JAMES-3599
>                 URL: https://issues.apache.org/jira/browse/JAMES-3599
>             Project: James Server
>          Issue Type: Task
>          Components: mailbox, rabbitmq
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: 3.7.0
>
>         Attachments: design_after.png, design_before.png, 
> rabbitmq-management.png
>
>
> Mailing list discussion: 
> https://www.mail-archive.com/server-dev@james.apache.org/msg70437.html
> I spent some time digging into RabbitMQ performance and stability.
> A few weeks ago I was surprised to discover how much work the play.json
> library performs, and I could not quite explain why it was consuming 3%
> of CPU time, making it the largest CPU consumer for mailbox events.
> RabbitMQ acks account for another 1.20% of CPU time.
> Investigating the RabbitMQ eventbus, I realized that events are routed
> to all group queues, dispatched and deserialized, then applied only if
> relevant. Given 200 events/s and given that the JMAP server has 10
> groups, we end up deserializing 2000 events/s (200 x 10), even when the
> events are irrelevant to the groups.
> As I recall, we wanted the event per group to be the unit of retry.
> A noble design goal.
> I think parallelizing groups is a non-goal: this kind of optimization
> would not improve response time, as the processing is asynchronous and
> runs in the background, and it makes little sense at thousands of
> requests per second.
> However, ending up with one queue per event is likely sub-optimal. I
> think the design can be improved by, in the nominal case, transmitting
> a single message for all groups. The receiver will then try to
> execute all groups. We can keep retries for individual groups (with
> their dedicated exchanges and queues): upon failure, we republish to the
> retry exchange of the failing listener. This makes the upgrade path
> easy too, as the group queues keep being consumed. One would just need
> to do some unbindings...
> Note that such an evolution would:
>  - also enable us, if we want, to enforce some execution order for
> listeners, opening the way to fix things like JAMES-3561
> <https://issues.apache.org/jira/browse/JAMES-3561> ...
>  - serve as an inspiration for future eventBus implementations
> like the Pulsar one, hence getting feedback on the existing design is
> IMO useful.
> I will create a JIRA ticket holding the design proposal (schema) and how
> it differs from the previous one, as well as some RabbitMQ management
> screenshots.
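
The difference between the two designs described above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not James code: `Group`, `Stats`, `dispatch_per_group_queue`, `dispatch_single_message` and `simulate` are hypothetical names, JSON parsing stands in for the play.json deserialization, and "republishing to the retry exchange" is modeled as a per-group counter instead of a real RabbitMQ publish.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    """Hypothetical stand-in for a listener group registered on the event bus."""
    name: str
    handle: Callable[[dict], None]


@dataclass
class Stats:
    """Counts the work done by a dispatch strategy."""
    deserializations: int = 0
    # Group name -> number of messages republished to that group's retry exchange.
    retry_publishes: Counter = field(default_factory=Counter)


def deserialize(raw: str, stats: Stats) -> dict:
    """Parse one serialized event and record that a deserialization happened."""
    stats.deserializations += 1
    return json.loads(raw)


def dispatch_per_group_queue(raw: str, groups: List[Group], stats: Stats) -> None:
    """Design before: the event is routed to every group queue,
    so each group deserializes its own copy of the message."""
    for group in groups:
        event = deserialize(raw, stats)
        try:
            group.handle(event)
        except Exception:
            stats.retry_publishes[group.name] += 1


def dispatch_single_message(raw: str, groups: List[Group], stats: Stats) -> None:
    """Proposed design: one message is deserialized once, then executed
    against all groups; only a failing group gets a retry message."""
    event = deserialize(raw, stats)
    for group in groups:
        try:
            group.handle(event)
        except Exception:
            # Assumption: modeled as a counter; a real implementation would
            # publish to the retry exchange dedicated to this group.
            stats.retry_publishes[group.name] += 1


def simulate(dispatch, groups: List[Group], event_count: int) -> Stats:
    """Push event_count synthetic events through the given dispatch strategy."""
    stats = Stats()
    for i in range(event_count):
        dispatch(json.dumps({"id": i}), groups, stats)
    return stats
```

With 10 groups and 200 events, `simulate(dispatch_per_group_queue, groups, 200)` records 2000 deserializations while `simulate(dispatch_single_message, groups, 200)` records 200, which matches the 200 events/s versus 2000 events/s figures in the description. In both variants a failure still produces a retry for the failing group only, so the event per group remains the unit of retry.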



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

