Benoit,

This is very interesting work, thank you for contributing. I am interested in learning more about the event system.
Robert

> On Nov 28, 2015, at 10:09 AM, Tellier Benoit <btell...@apache.org> wrote:
>
> Hi,
>
> I just wanted to present my work on the James event system.
>
> ## What is the James event system?
>
> The mailbox event system conveys notifications about modifications of
> mailbox and message state. You can register a listener with it so
> that you are notified.
>
> ## What is it used for?
>
> It is used for:
>
> - IMAP IDLE: allows one to subscribe to a specific mailbox and get
> notified about changes without polling the mailbox.
>
> - Quota system: updates to stored quotas are made outside the
> MailboxManager, as they may involve large quota calculations.
>
> - Indexing of messages for the Search feature (ElasticSearch and Lucene
> implementations).
>
> - IMAP Sequence Number handling.
>
> - Cache invalidation (caching project, not yet exposed to configuration).
>
> - Many others.
>
> ## Why do we need it to be distributed?
>
> I want to see this feature distributed because I personally really love
> the IDLE feature. I want my Thunderbird to be able to use it in a
> distributed environment.
>
> I also think one might be interested in making several James servers
> work in parallel with any kind of architecture (quotas, message search
> indexes).
>
> ## What are the different configuration options?
>
> I reviewed the event system.
>
> The first thing is to explicitly specify a listener's distribution
> status. It can be one of:
>
> - Registered per mailbox.
> - The listener only needs to be notified about all local events.
> - The listener needs to be notified about all events in your James
> cluster.
>
> Then, we keep the default in-memory implementation (slightly reworked
> using Guava). And I added two other architectures for the event system.
>
> #### Registration based event system
>
> With this implementation, you want to exchange events on the network.
> You want a James server to be notified only about events it explicitly
> registered for.
> Because of that:
>
> - This approach is designed for architectures with a large number of
> James servers.
> - It does not support event listeners that need to be notified of all
> events in the cluster.
>
> Each server listens on a message queue, and a registration mechanism is
> used to identify which servers the events must be sent to. Of course,
> this involves event serialization / deserialization.
>
> Today:
> - Kafka is used for the messaging
> - Cassandra is used for registration management
>
> This solution was presented at the Paris Cassandra Meet-up.
>
> #### Broadcast event system
>
> With this implementation, you want to have several James servers
> working together, but you rely on Mailbox Listeners that need to be
> notified about every event in your data center.
>
> These listeners could be:
>
> - Lucene document indexing
> - In-memory quotas
> - In-memory cache
>
> The idea here is to naively broadcast the events to all your James
> servers. They are notified about every event (so scalability will be
> limited).
>
> You also have to be aware that events can be duplicated or not emitted
> (James server crash, network partitions), so local data might be
> inconsistent. This seems acceptable for quota calculation, for instance.
>
> ## What do I need to know as an administrator?
>
> Distributed use of Message Sequence Numbers (which demands a high
> degree of coordination) is risky. The inconsistency window between
> servers may be large, and the correspondence between UID and message
> sequence number is not eventually consistent. This topic is under
> discussion on the dev mailing list.
>
> I fixed an issue I spotted earlier: a faulty mailbox listener
> might stop the event delivery chain and make the IMAP service
> unavailable. I added a commit so that errors are not propagated out of
> mailbox listeners.
>
> I want to finish this section by talking about event serialization. You
> can choose either:
>
> - JSON
> - MessagePack
>
> The first one is faster to compute but larger.
> So it lets you trade compute power against network usage.
>
> ## Event delivery modes
>
> As you might have noticed, Mailbox Listeners can take a long time to
> execute, and some of them can safely be executed asynchronously
> (IDLE, indexing and even quotas).
>
> I added an Event Delivery abstraction. Thanks to this, you can configure
> your James to:
>
> - Deliver events synchronously (today's behavior).
> - Deliver events asynchronously (returns before the events have been
> delivered; Mailbox Listeners are notified in parallel in a thread pool).
> - Mixed mode: every Mailbox Listener indicates whether it should be
> executed synchronously or asynchronously.
>
> The asynchronous option can be considered risky. The mixed one is
> safe, and significantly reduces latency if you rely on document indexing.
>
> ## Re-indexers
>
> I also added the ability to re-index documents in a Message Search
> index using the CLI:
>
> - per mailbox: the event system is used to track changes made to the
> given mailbox and significantly reduce the window for concurrent
> changes.
> - all your James mailboxes: the event system is used to keep track
> of deleted mailboxes.
>
> ## My future work on the event system
>
> Finish the work on MAILBOX-257: one should be able to recalculate quotas.
>
> Unfortunately, it is not yet planned in my todo list...
>
> Benoit
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscr...@james.apache.org
> For additional commands, e-mail: server-user-h...@james.apache.org
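To make the mixed delivery mode described above more concrete, here is a minimal sketch of how such a dispatcher could look. It is not the James code: `MixedEventDelivery`, `ExecutionMode`, `Listener` and the `String` event type are all hypothetical names chosen for illustration. It only shows the two ideas from the message: each listener declares whether it runs synchronously or in a thread pool, and an exception thrown by one listener is caught so it does not stop delivery to the others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a mixed event delivery; none of these names come from James. */
class MixedEventDelivery {

    /** Each listener declares how it wants to be executed. */
    enum ExecutionMode { SYNCHRONOUS, ASYNCHRONOUS }

    /** Minimal stand-in for a mailbox listener; events are plain strings here. */
    interface Listener {
        ExecutionMode executionMode();
        void onEvent(String event);
    }

    private final ExecutorService pool;

    MixedEventDelivery(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Delivers one event to every listener, honoring each listener's declared mode. */
    void deliver(List<Listener> listeners, String event) {
        for (Listener listener : listeners) {
            if (listener.executionMode() == ExecutionMode.SYNCHRONOUS) {
                safelyNotify(listener, event);                     // runs before deliver() returns
            } else {
                pool.execute(() -> safelyNotify(listener, event)); // runs later in the pool
            }
        }
    }

    /** Errors stop here, so a faulty listener cannot break the delivery chain. */
    private static void safelyNotify(Listener listener, String event) {
        try {
            listener.onEvent(event);
        } catch (RuntimeException e) {
            System.err.println("Listener failed on '" + event + "': " + e);
        }
    }

    /** Stops accepting work and waits for pending asynchronous deliveries. */
    boolean shutdown(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Small factory so listeners can be written as lambdas. */
    static Listener listener(ExecutionMode mode, Consumer<String> action) {
        return new Listener() {
            @Override public ExecutionMode executionMode() { return mode; }
            @Override public void onEvent(String event) { action.accept(event); }
        };
    }

    public static void main(String[] args) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        MixedEventDelivery delivery = new MixedEventDelivery(2);
        List<Listener> listeners = Arrays.asList(
            listener(ExecutionMode.SYNCHRONOUS, e -> { throw new IllegalStateException("faulty"); }),
            listener(ExecutionMode.SYNCHRONOUS, e -> log.add("sequence numbers saw " + e)),
            listener(ExecutionMode.ASYNCHRONOUS, e -> log.add("indexer saw " + e)));
        delivery.deliver(listeners, "message added");
        delivery.shutdown(5);
        log.forEach(System.out::println);
    }
}
```

In this sketch the first listener always fails, yet the two others still receive the event. Whether a given listener is safe to run asynchronously is a per-listener decision, which is why the mode lives on the listener and not on the dispatcher.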