[ https://issues.apache.org/jira/browse/JAMES-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benoit Tellier updated JAMES-3599:
----------------------------------
    Attachment: design_before.png
                design_after.png

> Improve the design of the RabbitMQ eventbus
> -------------------------------------------
>
>                 Key: JAMES-3599
>                 URL: https://issues.apache.org/jira/browse/JAMES-3599
>             Project: James Server
>          Issue Type: Task
>          Components: mailbox, rabbitmq
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: 3.7.0
>
>         Attachments: design_after.png, design_before.png,
> rabbitmq-management.png
>
>
> Mailing list discussion:
> https://www.mail-archive.com/server-dev@james.apache.org/msg70437.html
> I spent a bit of time digging into RabbitMQ performance and
> stability.
> A few weeks ago I was surprised to discover the amount of work performed
> by the play.json library and could not quite explain why it was hogging
> 3% of CPU time, making it the largest CPU consumer for mailbox events.
> RabbitMQ acks account for another 1.20% of CPU time.
> Investigating the RabbitMQ eventbus, I realized the events are routed
> to all group queues, dispatched and deserialized, then applied if relevant.
> Given 200 events/s and given that the JMAP server has 10 groups, we end
> up performing 2000 deserializations/s (200 x 10), even when the event is
> irrelevant for the group.
> As I recall, we wanted the event per group to be the unit of retry.
> Noble design goal.
> I think parallelizing groups is a non-goal: this kind of optimization
> would not improve response time as it is asynchronous, running in the
> background, and makes little sense at thousands of requests per second.
> However, ending up with one queue per event is likely sub-optimal. I
> think the design can be improved by, in the nominal case, transmitting
> only one message to all groups. The receiving side will then try to
> execute all groups. We can keep retries for individual groups (with their
> dedicated exchanges and queues): upon failure, we republish to the retry
> exchange of the failing listener.
> This makes the upgrade path easy too, as the group queue keeps being
> consumed. One would just need to do some unbindings...
> Note that such an evolution would:
>  - also enable us, if we want, to enforce some execution order for
> listeners, opening the way to fix things like JAMES-3561
> <https://issues.apache.org/jira/browse/JAMES-3561> ...
>  - serve as an inspiration for future eventBus implementations
> like the Pulsar one, hence getting feedback on the existing design is
> IMO useful.
> I will create a JIRA ticket holding the design proposal (schema) and how
> it differs from the previous one, as well as some RabbitMQ management
> screenshots.
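
To make the deserialization arithmetic and the proposed retry flow concrete, here is a minimal in-memory sketch. It is not James code and does not talk to RabbitMQ: the function names, the `DispatchResult` type and the use of plain `json` in place of play.json are all assumptions made for illustration. It only models "deserialize once per group queue" versus "deserialize once, run every group, republish to the retry queue of the group that failed".

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a group listener is just a callable taking the decoded event.
Listener = Callable[[dict], None]


@dataclass
class DispatchResult:
    deserializations: int = 0
    # (group name, raw payload) pairs that would be republished to that
    # group's dedicated retry exchange in the proposed design.
    retries: List[Tuple[str, str]] = field(default_factory=list)


def dispatch_per_group_queue(payloads: List[str], groups: Dict[str, Listener]) -> int:
    """Model of the design described above: every group queue receives every
    event and deserializes its own copy. Returns the deserialization count."""
    deserializations = 0
    for raw in payloads:
        for listener in groups.values():
            event = json.loads(raw)  # one deserialization per group
            deserializations += 1
            listener(event)
    return deserializations


def dispatch_single_message(payloads: List[str], groups: Dict[str, Listener]) -> DispatchResult:
    """Model of the proposal: one message, deserialized once, executed against
    all groups; a failure is recorded as a retry for that group only."""
    result = DispatchResult()
    for raw in payloads:
        event = json.loads(raw)  # single deserialization for all groups
        result.deserializations += 1
        for name, listener in groups.items():
            try:
                listener(event)
            except Exception:
                # Stand-in for "republish to the retry exchange of this group".
                result.retries.append((name, raw))
    return result


if __name__ == "__main__":
    payloads = [json.dumps({"id": i}) for i in range(200)]  # one second at 200 events/s
    groups = {f"group-{g}": (lambda event: None) for g in range(10)}
    print(dispatch_per_group_queue(payloads, groups))                  # 2000
    print(dispatch_single_message(payloads, groups).deserializations)  # 200
```

The sketch keeps the group as the unit of retry: a failing listener only produces a retry entry for its own group, and the other groups are still executed for the same event.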