This is an automated email from the ASF dual-hosted git repository. rcordier pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/james-project.git
commit 8efc932aa6a942f292a0f118b3d9ce37005034c3
Author: Tran Tien Duc <[email protected]>
AuthorDate: Mon Feb 17 15:48:42 2020 +0700

    JAMES-3052 Solving Cassandra inconsistencies Administration Procedures
---
 .../server/manage-guice-distributed-james.md | 70 +++++++++++++++++++++-
 src/site/markdown/server/manage-webadmin.md  |  9 ++-
 2 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/src/site/markdown/server/manage-guice-distributed-james.md b/src/site/markdown/server/manage-guice-distributed-james.md
index 6be9cb3..126a02c 100644
--- a/src/site/markdown/server/manage-guice-distributed-james.md
+++ b/src/site/markdown/server/manage-guice-distributed-james.md
@@ -20,7 +20,8 @@ advanced users.
  - [Mailbox Event Bus](#mailbox-event-bus)
  - [Mail Processing](#mail-processing)
  - [ElasticSearch Indexing](#elasticsearch-indexing)
-
+ - [Solving Cassandra inconsistencies](#solving-cassandra-inconsistencies)
+
 ## Overall architecture
 
 Guice distributed James server intends to provide a horizontally scalable email server.
@@ -260,3 +261,70 @@ by setting the parameter `elasticsearch.index.mailbox.name` to the name of your
 re-creates index upon restart
 
 _Note_: keep in mind that reindexing can be a very long operation depending on the volume of mails you have stored.
+
+## Solving Cassandra inconsistencies
+
+The Cassandra backend uses data duplication to work around Cassandra query limitations.
+However, Cassandra writes to several tables are not transactional,
+which can lead to consistency issues for a given piece of data.
+The consequence could be data left in a transient state (one that should never appear outside of the system).
+
+Because of the lack of transactions, it is hard to prevent this kind of issue. We have developed some features to
+fix some existing Cassandra inconsistency issues that have been reported to James.
+
+Here is the list of known inconsistencies:
+ - [RRT (RecipientRewriteTable) mapping sources](#rrt-recipientrewritetable-mapping-sources)
+ - [JMAP message fast view projections](#jmap-message-fast-view-projections)
+ - [Mailboxes](#mailboxes)
+
+### RRT (RecipientRewriteTable) mapping sources
+
+The `rrt` and `mappings_sources` tables store information about address mappings.
+The source of truth is `rrt`; `mappings_sources` is the projection table containing all
+mapping sources.
+
+#### How to detect the inconsistencies
+
+Right now there is no tool for detecting them; we are proposing a [development plan](https://issues.apache.org/jira/browse/JAMES-3069).
+In the meantime, the recommendation is to execute the `SolveInconsistencies` task below
+on a regular basis.
+
+#### How to solve
+
+Execute the Cassandra mapping `SolveInconsistencies` task described in the [webadmin documentation](https://james.apache.org/server/manage-webadmin.html#Operations_on_mappings_sources).
+
+### JMAP message fast view projections
+
+When you read a JMAP message, some computed properties, such as `preview` and `hasAttachment`, are expected to be fast to retrieve.
+James achieves this by pre-computing them and storing them in a message projection table (`message_fast_view_projection`).
+Consequently, subsequent fetches are optimized by reading directly from the projection table instead of computing the properties again.
+The underlying data is immutable, so there is no inconsistency risk if a projection is outdated.
+You can still face a performance issue, whose severity depends on how far the projection lags behind.
+
+#### How to detect the outdated projections
+
+You can watch the `MessageFastViewProjection` health check described in the [webadmin documentation](manage-webadmin.html#Check_all_components).
+It provides a check based on the ratio of missed projection reads.
+
+#### How to solve
+
+Since the MessageFastViewProjection is self-healing, you should be concerned only if
+the health check keeps returning `degraded` for a while. In that case,
+look at the James logs for more clues.
+
+### Mailboxes
+
+The `mailboxPath` and `mailbox` tables share common fields like `mailboxId` and the mailbox `name`.
+Creating, renaming or deleting a mailbox succeeds only if both the `mailboxPath` and `mailbox` tables are updated.
+Any failure when creating, updating or deleting records in `mailboxPath` or `mailbox` can produce inconsistencies.
+
+#### How to detect the inconsistencies
+
+A suspicious `MailboxNotFoundException` in your logs can be a sign of such inconsistencies.
+Currently, there is no dedicated tool for detecting them. We recommend scheduling
+the SolveInconsistencies task below for the mailbox object on a regular basis,
+avoiding peak traffic, in order to cover both the diagnosis and the fixing of inconsistencies.
+
+#### How to solve
+
+Under development: Task for solving mailbox inconsistencies
\ No newline at end of file
diff --git a/src/site/markdown/server/manage-webadmin.md b/src/site/markdown/server/manage-webadmin.md
index 76dccc6..40d5d3f 100644
--- a/src/site/markdown/server/manage-webadmin.md
+++ b/src/site/markdown/server/manage-webadmin.md
@@ -99,7 +99,14 @@ Supported health checks include:
  - **ElasticSearch Backend**: ElasticSearch storage. Included in Cassandra Guice based products.
  - **Guice application lifecycle**: included in all Guice products.
  - **JPA Backend**: JPA storage. Included in JPA Guice based products.
- - **MessageFastViewProjection**: included in memory and Cassandra based Guice products.
+ - **MessageFastViewProjection**: included in memory and Cassandra based Guice products.
+   Health check of the component storing JMAP properties which are fast to retrieve.
+   Those properties are computed in advance from messages and persisted in order to achieve better performance.
+   There is some latency between a source update and the update of its projections.
+   Incoherence issues arise when reads are performed in this time window.
+   We piggyback the projection update on missed JMAP reads in order to decrease the outdated time window for a given entry.
+   The health is determined by the ratio of missed projection reads (a ratio higher than 10% causes `degraded`).
+
  - **RabbitMQ backend**: RabbitMQ messaging. Included in Distributed Guice based products.
 
 Response codes:
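
The health rule described for `MessageFastViewProjection` in the `manage-webadmin.md` hunk can be illustrated with a small sketch. This is not James code. The function name `classify_projection_health`, the two counters it receives, and the choice to report `healthy` when no reads have happened are all assumptions made for illustration. The 10% threshold comes from the documentation text, read here as "a miss ratio above 10% is `degraded`".

```python
# Illustrative sketch only, not the James implementation.
# Assumption: a miss ratio strictly above the threshold means "degraded".
MAX_MISS_RATIO = 0.10  # 10% threshold taken from the documentation text


def classify_projection_health(hits: int, misses: int) -> str:
    """Return "healthy" or "degraded" from projection read counters."""
    if hits < 0 or misses < 0:
        raise ValueError("counters must be non-negative")
    total = hits + misses
    if total == 0:
        # Assumption: with no reads recorded there is nothing to flag.
        return "healthy"
    miss_ratio = misses / total
    return "degraded" if miss_ratio > MAX_MISS_RATIO else "healthy"
```

Under this reading, 20 misses out of 100 reads give a ratio of 0.2 and are reported as `degraded`, while 5 misses out of 100 stay `healthy`.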

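
The RRT section of the diff recommends running the `SolveInconsistencies` task regularly through webadmin. A minimal sketch of how such a call could be prepared from a script follows. The endpoint path `/cassandra/mappings?action=SolveInconsistencies` is an assumption based on the webadmin documentation linked from the diff, not on this commit, and should be checked against that page. The helper name `build_solve_inconsistencies_request` and the `http://localhost:8000` base URL are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_solve_inconsistencies_request(base_url: str) -> Request:
    """Build (but do not send) a POST request that would trigger the task.

    Assumption: the task is exposed at /cassandra/mappings with an
    `action` query parameter; verify against the webadmin documentation.
    """
    query = urlencode({"action": "SolveInconsistencies"})
    url = f"{base_url.rstrip('/')}/cassandra/mappings?{query}"
    return Request(url, method="POST")


# Hypothetical webadmin address; adjust to your deployment.
request = build_solve_inconsistencies_request("http://localhost:8000")
```

The request object can then be passed to `urllib.request.urlopen` against a running server. Scheduling the script outside peak traffic matches the recommendation given in the diff for the mailbox case.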