This is an automated email from the ASF dual-hosted git repository. btellier pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/james-project.git
The following commit(s) were added to refs/heads/master by this push:
     new 6cc42a7f51 ADR-67 Quota for JMAP uploads (#1688)
6cc42a7f51 is described below

commit 6cc42a7f5124e49fd482cc0a95a23583862b6794
Author: Benoit TELLIER <btell...@linagora.com>
AuthorDate: Wed Aug 23 22:29:43 2023 +0700

    ADR-67 Quota for JMAP uploads (#1688)
---
 src/adr/0048-cleanup-jmap-uploads.md   |  2 +
 src/adr/0067-quota-for-jmap-uploads.md | 90 ++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/src/adr/0048-cleanup-jmap-uploads.md b/src/adr/0048-cleanup-jmap-uploads.md
index 55a2499ca0..84aa833044 100644
--- a/src/adr/0048-cleanup-jmap-uploads.md
+++ b/src/adr/0048-cleanup-jmap-uploads.md
@@ -8,6 +8,8 @@ Accepted (lazy consensus).
 
 Implemented.
 
+Overridden by [Quota for JMAP uploads](0067-quota-for-jmap-uploads.md)
+
 ## Context
 
 JMAP allows users to upload binary content called blobs to be later referenced via method calls. This includes but is not
diff --git a/src/adr/0067-quota-for-jmap-uploads.md b/src/adr/0067-quota-for-jmap-uploads.md
new file mode 100644
index 0000000000..9d5f6f0228
--- /dev/null
+++ b/src/adr/0067-quota-for-jmap-uploads.md
@@ -0,0 +1,90 @@
+# 67. Quota for JMAP uploads
+
+Date: 2023-08-17
+
+## Status
+
+Accepted (lazy consensus).
+
+Not implemented yet.
+
+Overrides [ADR-48 Cleanup JMAP uploads](0048-cleanup-jmap-uploads.md).
+
+## Context
+
+The [JMAP] protocol offers a distinct API to upload blobs, that can later be referenced when creating emails. The
+specification mentions that implementers `SHOULD` enforce a quota for this use case. For security reasons this quota is
+set per user, and exceeding it should result in older data being deleted.
+
+Apache James currently does not implement limitations on data being uploaded by users, meaning that an authenticated user can
+essentially store an unlimited amount of binary data. This is especially problematic for deployments whose users can be
+attackers (e.g. SaaS).
+
+## Decision
+
+Implement quota for JMAP uploads. We need a generic interface for JMAP upload quota current values that existing
+implementations can implement.
+
+Store current values on a per user basis. The current value is increased upon uploads, and decreased when a blob is deleted.
+
+The limit is set globally via the JMAP configuration. Default value: 10MB.
+
+## Consequences
+
+Improved security for SaaS operation.
+
+Storing such values in Cassandra incurs a cost as it needs extra tables. The count of tables shall be limited (memory and
+operational overhead per table.) We plan complementary work to expose a technical Cassandra storage interface for quota,
+that can be used to implement arbitrary quota-like use cases.
+
+Cassandra counters that would be used to keep track of users' current space usage are easy to get out of synchronisation
+(namely because of counters consistency level ONE usage and non-idempotence causing the driver not to retry failed
+updates). We thus need a corrective task in order to recompute the current values.
+
+Care needs to be taken with concurrency. Given the nature of the quota, we expect data races (because 100MB of storage
+space is not much, exceeding the quota should be considered a regular operation. Clients uploading files in parallel might
+trigger data races upon older data deletion). In practice this means:
+ - Be eventually consistent and clean up data after the upload returns, as upfront quota validation with JMAP upload
+constraints on top of the Cassandra counter data model is especially prone to data races
+ - Upon cleanup, free at least 50% of the space: this would decrease the frequency of updates
+ - Expose a configurable probability of recomputing the upload quota
+ - If inconsistent space usage is reported, recompute the quota
+
+JMAP upload storage evolutions:
+ - As we add an application behaviour, common for any implementation, we need further layers in the design in
+order to mutualize quota handling for all implementations. A service layer `UploadService` would expose the JMAP facade
+(today `UploadRepository` interface) and would be responsible to enforce quotas, and related behaviour. Then it would
+act on the storage layer, `UploadRepository`, implemented by `cassandra` and `memory`.
+ - Upon exceeded quota, we need to delete older uploads. In order to do so, we need to add the date of upload to the
+upload metadata. Migration is trivial: we can assume UNIX timestamp when missing, causing the upload to be considered
+the oldest.
+ - Recomputation of JMAP upload quotas requires listing stored upload metadata, we need to add a way to list uploads
+of a user on `UploadRepository` (without retrieving the uploads contents).
+ - `UploadService` needs a method to delete
+
+Asynchronous storage based cleanup using Cassandra TTL and object storage buckets is furthermore out of question as the
+application needs to be aware of what is stored in order to expose a coherent quota. We will need to rework JMAP uploads
+in order to base it on the date of the items stored in the UploadRepository.
+
+## Alternatives
+
+Not operating in SaaS mode would allow us to better trust users. As such we might simply document the limitation and
+skip the work. Such a proposal is not acceptable for some members of the community.
+
+We might have chosen to store maximum limits for JMAP upload quotas. Doing so requires extensive webadmin endpoints, and
+incurs extra Cassandra reads upon uploads, which have a slight negative performance impact. Aggregating `global`,
+`domain` and `user` scopes together might also be some complex logic to write. All this work is of limited use as a moderate
+space like 100MB is plenty enough for dozens of mails to be composed in parallel without issues, even for power users.
+Furthermore, the JMAP specification behaviour is lenient once the space is exceeded, hence we never block the user.
+This clearly argues for the simpler option.
+
+Not storing the current value, and just listing actual uploads in order to retrieve the current value might lead to a huge
+tombstone read (queue use case) that we likely want to avoid, even if it solves concurrency issues. Furthermore, for such
+a frequent use case the performance cost of event sourcing would be a show stopper (as the Cassandra implementation is lightweight
+transaction based).
+
+## References
+
+ - [JMAP API for uploads](https://jmap.io/spec-core.html#binary-data)
+ - [JIRA ticket: JAMES-3925](https://issues.apache.org/jira/browse/JAMES-3925)
+ - [ADR pull request](https://github.com/apache/james-project/pull/1688)
\ No newline at end of file
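
The cleanup rule described in the ADR (once the quota is exceeded, delete the oldest uploads and free at least 50% of the space) could be sketched roughly as below. This is only an illustration under stated assumptions, not the James implementation: `UploadQuotaSketch`, `Upload` and `selectForDeletion` are hypothetical names, "free at least 50% of the space" is read here as "bring usage down to at most half of the limit", and a missing upload date is mapped to the UNIX epoch so that the upload sorts as the oldest.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the post-upload cleanup selection; not part of Apache James.
final class UploadQuotaSketch {

    // Hypothetical upload metadata: the ADR only states that an upload date must be added.
    static final class Upload {
        final String blobId;
        final long size;
        final Instant uploadDate;

        Upload(String blobId, long size, Instant uploadDate) {
            this.blobId = blobId;
            this.size = size;
            // Assumption: a missing date becomes the UNIX epoch, making the upload the oldest.
            this.uploadDate = uploadDate == null ? Instant.EPOCH : uploadDate;
        }
    }

    // Returns the uploads to delete, oldest first. Nothing is selected while usage
    // stays within the limit; otherwise uploads are selected until the remaining
    // usage is at most half of the limit (assumed reading of "free at least 50%").
    static List<Upload> selectForDeletion(List<Upload> uploads, long limit) {
        long usage = 0;
        for (Upload upload : uploads) {
            usage += upload.size;
        }
        List<Upload> toDelete = new ArrayList<>();
        if (usage <= limit) {
            return toDelete;
        }
        long target = limit / 2;
        List<Upload> oldestFirst = new ArrayList<>(uploads);
        oldestFirst.sort(Comparator.comparing((Upload upload) -> upload.uploadDate));
        for (Upload upload : oldestFirst) {
            if (usage <= target) {
                break;
            }
            toDelete.add(upload);
            usage -= upload.size;
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Upload> uploads = Arrays.asList(
            new Upload("a", 40, Instant.ofEpochSecond(100)),
            new Upload("b", 40, null), // no date recorded before the migration
            new Upload("c", 40, Instant.ofEpochSecond(200)));
        for (Upload upload : selectForDeletion(uploads, 100)) {
            System.out.println("delete " + upload.blobId);
        }
    }
}
```

Selecting on listed metadata rather than on a counter mirrors the ADR's remark that the application must know what is stored; in a real deployment the current usage would presumably come from the stored per-user value, with this kind of listing reserved for cleanup and recomputation.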