This is an automated email from the ASF dual-hosted git repository. btellier pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/james-project.git
commit 0b26b27df158dfd961b6d66966afad63005c701d
Author: Benoit Tellier <btell...@linagora.com>
AuthorDate: Tue May 2 11:26:22 2023 +0700

    [DOCUMENTATION] Performance checklist
---
 .../distributed-app/docs/modules/ROOT/nav.adoc     |   1 +
 .../docs/modules/ROOT/pages/operate/index.adoc     |   2 +-
 .../ROOT/pages/operate/performanceChecklist.adoc   | 164 +++++++++++++++++++++
 3 files changed, 166 insertions(+), 1 deletion(-)

diff --git a/server/apps/distributed-app/docs/modules/ROOT/nav.adoc b/server/apps/distributed-app/docs/modules/ROOT/nav.adoc
index 14b3a379f2..d415d64d67 100644
--- a/server/apps/distributed-app/docs/modules/ROOT/nav.adoc
+++ b/server/apps/distributed-app/docs/modules/ROOT/nav.adoc
@@ -52,6 +52,7 @@
 **** xref:configure/dsn.adoc[ESMTP DSN support]
 ** xref:operate/index.adoc[Operate]
 *** xref:operate/guide.adoc[]
+*** xref:operate/performanceChecklist.adoc[]
 *** xref:operate/logging.adoc[]
 *** xref:operate/webadmin.adoc[]
 *** xref:operate/metrics.adoc[]
diff --git a/server/apps/distributed-app/docs/modules/ROOT/pages/operate/index.adoc b/server/apps/distributed-app/docs/modules/ROOT/pages/operate/index.adoc
index 487cdc8100..e3d2816cd5 100644
--- a/server/apps/distributed-app/docs/modules/ROOT/pages/operate/index.adoc
+++ b/server/apps/distributed-app/docs/modules/ROOT/pages/operate/index.adoc
@@ -20,7 +20,7 @@
 The xref:operate/metrics.adoc[metrics] allows to build latency and throughput graphs, that can be visualized, for instance in *Grafana*. We did put together a xref:operate/guide.adoc[detailed guide] for
-distributed James operators.
+distributed James operators. We also propose a xref:operate/performanceChecklist.adoc[performance checklist].
 We also included a guide for xref:operate/migrating.adoc[migrating existing data] into the distributed server.
diff --git a/server/apps/distributed-app/docs/modules/ROOT/pages/operate/performanceChecklist.adoc b/server/apps/distributed-app/docs/modules/ROOT/pages/operate/performanceChecklist.adoc
new file mode 100644
index 0000000000..b231880ae1
--- /dev/null
+++ b/server/apps/distributed-app/docs/modules/ROOT/pages/operate/performanceChecklist.adoc
@@ -0,0 +1,164 @@
+= Distributed James Server — Performance checklist
+:navtitle: Performance checklist
+
+This guide aims to help James operators refine their James configuration and setup to achieve better performance.
+
+== Database setup
+
+Cassandra, OpenSearch and RabbitMQ are large topics in themselves that we do not intend to cover here. Yet, here are some
+very basic recommendations that are always beneficial to keep in mind.
+
+We recommend:
+
+* Running Cassandra and OpenSearch on commodity hardware with attached SSDs. SAN disks are known to cause performance
+issues for these technologies. HDD disks are to be banned for these performance-related applications.
+* Getting an Object Storage SaaS offering that suits your needs. Most generalist S3 offers will suit
+James' needs.
+* We do provide a guide on xref:[Database benchmarks] that can help identify and fix issues.
+
+== James configuration
+
+=== Cassandra
+
+People tuning for performance would likely accept relaxing their consistency needs. James allows doing this.
+
+**LightWeight Transactions (LWT)** can be disabled where they are not essential. This can be done within
+xref:[cassandra.properties]:
+
+....
+mailbox.read.strong.consistency=false
+message.read.strong.consistency=false
+message.write.strong.consistency.unsafe=false
+mailrepository.strong.consistency=false
+....
+
+Also, James allows for **Read repairs**, where consistency checks are piggybacked on reads randomly. This of course
+comes at a performance cost as it generates extra reads, thus minimizing read repair probability can help improve
+performance.
This can be done within
+xref:[cassandra.properties]:
+
+....
+mailbox.read.repair.chance=0.00
+mailbox.counters.read.repair.chance.max=0.000
+mailbox.counters.read.repair.chance.one.hundred=0.000
+....
+
+One can also avoid some Cassandra requests by disabling ACLs (meaning users will only have access to the mailboxes they own,
+all mailbox-sharing features will thus not be achievable). This can be done within
+xref:[cassandra.properties]:
+
+....
+acl.enabled=false
+....
+
+Important settings in the `` file include:
+
+* Throttling: if too low then the Cassandra cluster is under-utilized. If too high, request bursts can cause significant
+Cassandra overload.
+
+....
+  advanced.throttler {
+    class = org.apache.james.backends.cassandra.utils.LockLessConcurrencyLimitingRequestThrottler
+
+    max-queue-size = 10000
+
+    max-concurrent-requests = 192
+  }
+....
+
+=== Object storage
+
+We recommend the use of the blob store cache, which is populated with email headers, as these are treated as metadata.
+
+`blob.properties`:
+
+....
+cache.enable=true
+cache.cassandra.ttl=1year
+cache.cassandra.timeout=100ms
+cache.sizeThresholdInBytes=16 KiB
+....
+
+=== RabbitMQ
+
+We recommend against the use of the CassandraMailQueueView, as browsing and advanced queue management features
+are unnecessary for a Mail Delivery Agent and are not meaningful in the absence of delays.
+
+Similarly, we recommend turning off queue size metrics, which are expensive to compute.
+
+We also recommend against the use of publish confirms, which come at a high performance price.
+
+In `rabbitmq.properties`:
+
+....
+cassandra.view.enabled=false
+
+mailqueue.size.metricsEnabled=false
+
+event.bus.publish.confirm.enabled=false
+mailqueue.publish.confirm.enabled=false
+....
+
+=== JMAP protocol
+
+If you are not using JMAP, disabling it spares you the cost of populating related projections and is thus recommended.
+Within `jmap.properties`:
+
+....
+enabled=false
+....
+
+We recommend turning on EmailQueryView as it enables resolution of mailbox listing against Cassandra, thus unlocking massive
+stability / performance gains. Within `jmap.properties`:
+
+....
+view.email.query.enabled=true
+....
+
+=== IMAP / SMTP
+
+We recommend against resolving client connection DNS names. This behaviour can be disabled via a system property within
+`jvm.properties`:
+
+....
+james.protocols.mdc.hostname=false
+....
+
+Concurrent IMAP request count is the critical setting. In `imapServer.xml`:
+
+....
+<concurrentRequests>200</concurrentRequests>
+<maxQueueSize>4096</maxQueueSize>
+....
+
+Other recommendations include avoiding unnecessary work upon IMAP IDLE and not starting dedicated BOSS threads:
+
+....
+<ignoreIDLEUponProcessing>false</ignoreIDLEUponProcessing>
+<bossWorkerCount>0</bossWorkerCount>
+....
+
+=== Other generic recommendations
+
+* Remove unneeded listeners / mailets
+* Reduce duplication of Matchers within mailetcontainer.xml
+* Limit usage of "DEBUG" loglevel. INFO should be more than sufficient in most cases.
+* While GC tuning is a science in itself, we had good results with G1GC and a low pause time:
+
+....
+-Xlog:gc*:file=/root/gc.log -XX:MaxGCPauseMillis=20 -XX:ParallelGCThreads=2
+....
+
+* We recommend tuning batch sizes in `batchsizes.properties`. This allows limiting parallel S3 reads while loading many
+messages concurrently on Cassandra, and improves support for massive IMAP operations.
+
+....
+fetch.metadata=200
+fetch.headers=30
+fetch.body=30
+fetch.full=30
+
+copy=8192
+
+move=8192
+....
\ No newline at end of file
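
Not part of the commit: the checklist in the patch above lists recommended values per configuration file, so an operator might want to compare an existing file against them. The following is a minimal sketch of such a comparison, not a James tool. The `CHECKLIST` table copies a few values from the patch; the function names `parse_properties` and `deviations` are hypothetical, and the parser assumes plain `key=value` lines (no line continuations or `:` separators as allowed by full Java properties syntax).

```python
# Hypothetical helper, not shipped with James: compares a properties file
# against a few values recommended in the performance checklist above.
CHECKLIST = {
    "rabbitmq.properties": {
        "cassandra.view.enabled": "false",
        "mailqueue.size.metricsEnabled": "false",
        "event.bus.publish.confirm.enabled": "false",
        "mailqueue.publish.confirm.enabled": "false",
    },
    "jmap.properties": {
        "view.email.query.enabled": "true",
    },
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def deviations(file_name, text):
    """Return {key: (found, recommended)} for keys that differ from the checklist.

    A key missing from the file is reported with found=None; whether the
    server default already matches the recommendation is not checked here.
    """
    found = parse_properties(text)
    expected = CHECKLIST.get(file_name, {})
    return {
        key: (found.get(key), want)
        for key, want in expected.items()
        if found.get(key) != want
    }


if __name__ == "__main__":
    sample = "cassandra.view.enabled=true\nmailqueue.size.metricsEnabled=false\n"
    for key, (have, want) in deviations("rabbitmq.properties", sample).items():
        print(f"{key}: found {have!r}, checklist suggests {want!r}")
```

Keeping the recommended values in a plain dictionary makes it easy to extend the table with the other files named in the checklist, such as `blob.properties` or `batchsizes.properties`.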