Hi, I just wanted to present my work on the James event system.

## What is the James event system?

The mailbox event system conveys notifications about changes to the state of mailboxes and messages. You can register a listener with it to be notified of these changes.

## What is it used for?

It is used for:

 - IMAP IDLE: lets a client subscribe to a specific mailbox and get notified about changes without having to poll the mailbox.
 - Quota system: updates to stored quotas are made outside the MailboxManager, as they may involve large quota calculations.
 - Indexing of messages for the Search feature (ElasticSearch and Lucene implementations).
 - IMAP Sequence Number handling.
 - Cache invalidation (caching project, not yet exposed to configuration).
 - Many others.

## Why do we need it to be distributed?

I want to see this feature distributed because I personally really love the IDLE feature, and I want my Thunderbird to be able to use it in a distributed environment. I also think some people will be interested in making several James servers work in parallel, whatever the architecture (quotas, message search indexes).

## What are the different configuration options?

I reviewed the event system. The first thing is to explicitly specify the distributed status of each listener. It can be one of:

 - Registered per mailbox.
 - The listener only needs to be notified about all local events.
 - The listener needs to be notified about all events in your James cluster.

We keep the default in-memory implementation (slightly reworked using Guava), and I added two other architectures for the event system.

#### Registration based event system

With this implementation, events are exchanged over the network, and a James server is only notified about the events it explicitly registered for. Because of that:

 - This approach is designed for architectures with a large number of James servers.
 - It does not support event listeners that need to be notified of all events in the cluster.
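
The rule above, that a server is only notified about events it explicitly registered for, can be pictured with a small in-memory model. This is only a sketch for illustration: `RegistrationRouter`, the server ids and the mailbox names are hypothetical, a plain dictionary stands in for whatever stores the registrations, and none of this is James code.

```python
from collections import defaultdict


class RegistrationRouter:
    """Toy model of registration-based event routing (hypothetical names)."""

    def __init__(self):
        # mailbox name -> set of server ids that registered interest in it
        self._registrations = defaultdict(set)

    def register(self, server_id, mailbox):
        self._registrations[mailbox].add(server_id)

    def unregister(self, server_id, mailbox):
        self._registrations[mailbox].discard(server_id)

    def targets(self, mailbox):
        # Only servers that explicitly registered for this mailbox
        # are returned; everyone else never sees the event.
        return sorted(self._registrations.get(mailbox, set()))


router = RegistrationRouter()
router.register("james-1", "benoit/INBOX")
router.register("james-2", "benoit/INBOX")
router.register("james-2", "benoit/Sent")

print(router.targets("benoit/INBOX"))  # ['james-1', 'james-2']
print(router.targets("benoit/Trash"))  # []
```

An event on a mailbox nobody registered for is sent to no one, which is why a listener that needs every event in the cluster cannot be served by this scheme.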

Each server listens on a message queue, and a registration mechanism identifies which servers each event must be sent to. Events are of course serialized and deserialized along the way. Today:

 - Kafka is used for the messaging.
 - Cassandra is used for registration management.

This solution was presented at the Paris Cassandra Meet-up.

#### Broadcast event system

With this implementation, several James servers work together, but you rely on Mailbox Listeners that need to be notified about every event in your data center. These listeners could be:

 - Lucene document indexing
 - In-memory quotas
 - In-memory cache

The idea here is to naively broadcast the events to all your James servers. Each server is notified about every event, so scalability will be limited. You also have to be aware that events can be duplicated or never emitted (James server crash, network partitions), so local data might be inconsistent. This seems acceptable for quota calculation, for instance.

## What do I need to know as an administrator?

Distributed use of Message Sequence Numbers, which demands a high degree of coordination, is risky. The inconsistency window between servers may be large, and the correspondence between UID and message sequence number is not eventually consistent. This topic is under discussion on the dev mailing list.

I fixed an issue I had spotted earlier: a faulty mailbox listener could stop the event delivery chain and make the IMAP service unavailable. I added a commit so that errors raised inside Mailbox Listeners are no longer propagated.

I want to finish this section with event serialization. You can choose either:

 - JSON
 - MessagePack

The first one is faster to compute but larger, so the choice lets you trade compute power against network usage.

## Event delivery modes

As you might have noticed, Mailbox Listeners can take a long time to execute, and some of them can safely be executed asynchronously (IDLE, indexing and even quotas). I added an Event Delivery abstraction.
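
Two points from above, not letting a failing listener break the delivery chain and running listeners off the caller's thread, can be sketched together. This is a minimal illustration under stated assumptions: `safe_call`, `deliver_sync` and `deliver_async` are hypothetical names, listeners are plain functions, and a Python thread pool stands in for the real machinery. It is not the James Event Delivery API.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, wait

log = logging.getLogger("event-delivery")


def safe_call(listener, event):
    """Run one listener; log and swallow its error so the others still run."""
    try:
        listener(event)
    except Exception as exc:
        log.warning("listener %s failed on %r: %s", listener.__name__, event, exc)


def deliver_sync(listeners, event):
    # The caller waits until every listener has handled the event.
    for listener in listeners:
        safe_call(listener, event)


def deliver_async(listeners, event, pool):
    # Returns futures immediately; listeners run in the thread pool.
    return [pool.submit(safe_call, listener, event) for listener in listeners]


received = []


def faulty(event):
    raise RuntimeError("index unavailable")


def recorder(event):
    received.append(event)


# The faulty listener comes first, yet the recorder still gets the event.
deliver_sync([faulty, recorder], "message added")

with ThreadPoolExecutor(max_workers=2) as pool:
    wait(deliver_async([faulty, recorder], "flags updated", pool))

print(received)  # ['message added', 'flags updated']
```

Wrapping each listener call individually is what keeps one faulty listener from starving the rest, whichever way the events are delivered.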

With the Event Delivery abstraction, you can configure your James to:

 - Synchronously deliver events (today's behavior).
 - Asynchronously deliver events (the call returns before events have been delivered; Mailbox Listeners are notified in parallel in a thread pool).
 - Mixed mode: every Mailbox Listener indicates whether it should be executed synchronously or asynchronously.

The asynchronous option can be considered risky. The mixed one is safe, and significantly reduces latency if you rely on document indexing.

## Re-indexers

I also added the ability to re-index documents in a Message Search index using the CLI:

 - per mailbox: the event system is used to track changes made to the given mailbox, which significantly reduces the window for concurrent changes.
 - all your James mailboxes: the event system is used to keep track of deleted mailboxes.

## My future work on the event system

Finish the work on MAILBOX-257: one should be able to recalculate quotas. Unfortunately, it is not yet on my todo list...

Benoit