Benoit Tellier created JAMES-3544:
-------------------------------------
Summary: JMAP uploaded blobs are never deleted
Key: JAMES-3544
URL: https://issues.apache.org/jira/browse/JAMES-3544
Project: James Server
Issue Type: Sub-task
Components: Blob, JMAP
Affects Versions: 3.6.0
Reporter: Benoit Tellier
Assignee: Antoine Duprat
This is a concern for both privacy and cost control (as one needs to pay for
storage).
JMAP defines no method to delete uploaded blobs (maybe I could propose
something at the IETF).
https://jmap.io/spec-core.html#uploading-binary-data suggests that the server
might decide to delete the data:
{code:java}
Under rare circumstances, the server may have deleted the blob before the
client uses it;
the client should keep a reference to the local file so it can upload it again
in such a situation.
{code}
*Root cause of the issue*
We rely on the AttachmentManager for uploads, which is inherited from JMAP
draft.
AttachmentManager uses the following fallback mechanism to check access rights:
- First, check whether the user accessing the content holds a message
referencing that attachment
- If not, check whether that user uploaded the attachment.
AttachmentManager also holds some data referenced by user messages, so automatic
deletion without a clear separation of concepts looks scary...
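The two-step check above can be pictured with the following minimal sketch. It is a simplified, in-memory model and not the actual AttachmentManager code: the class name, the plain String identifiers and the two maps are assumptions made for illustration only.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the fallback right check (not James code).
final class AttachmentAccessSketch {
    // username -> ids of attachments referenced by messages that user holds
    private final Map<String, Set<String>> referencedByUserMessages;
    // attachment id -> usernames that uploaded it
    private final Map<String, Set<String>> uploaders;

    AttachmentAccessSketch(Map<String, Set<String>> referencedByUserMessages,
                           Map<String, Set<String>> uploaders) {
        this.referencedByUserMessages = referencedByUserMessages;
        this.uploaders = uploaders;
    }

    boolean canRead(String username, String attachmentId) {
        // 1. Does the user hold a message referencing that attachment?
        if (referencedByUserMessages.getOrDefault(username, Set.of()).contains(attachmentId)) {
            return true;
        }
        // 2. Fallback: did the user upload that attachment?
        return uploaders.getOrDefault(attachmentId, Set.of()).contains(username);
    }
}
```

Both paths go through the same component, which is why dropping uploads from it automatically risks touching data that messages still reference.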
*How:*
We should deprecate the following AttachmentMapper methods (and underlying
storage code) - and simplify AttachmentManager code accordingly:
{code:java}
public interface AttachmentMapper extends Mapper {
    // to be deprecated
    Publisher<AttachmentMetadata> storeAttachmentForOwner(ContentType contentType, InputStream attachmentContent, Username owner);

    // to be deprecated
    Collection<Username> getOwners(AttachmentId attachmentId) throws MailboxException;
}
{code}
We should write an UploadedContentRepository, holding only the content, the
content-type, the owner and the size of the data. The upload date can be useful
too, even if not requested by JMAP APIs. It would be backed by the BlobStore
(and thus ObjectStorage); we will also need a metadata system on top of it
(Cassandra).
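As a rough illustration, such a repository could look like the sketch below. Only the name UploadedContentRepository comes from this proposal: the Upload class, the method signatures, the String identifiers and the in-memory implementation are assumptions. A real contract would probably be reactive (returning Publisher, like AttachmentMapper) and be backed by the BlobStore plus Cassandra.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.time.Clock;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value object: content, content-type, owner, size and upload date.
final class Upload {
    final String id;
    final String owner;
    final String contentType;
    final long size;
    final Instant uploadDate;
    final byte[] content;

    Upload(String id, String owner, String contentType, Instant uploadDate, byte[] content) {
        this.id = id;
        this.owner = owner;
        this.contentType = contentType;
        this.uploadDate = uploadDate;
        this.content = content;
        this.size = content.length;
    }
}

// Hypothetical contract, kept synchronous to stay self-contained.
interface UploadedContentRepository {
    Upload upload(InputStream content, String contentType, String owner);

    // Only the owner can read an upload back: no message-based fallback here.
    Optional<Upload> retrieve(String uploadId, String owner);
}

// In-memory stand-in for the BlobStore + metadata implementation.
final class InMemoryUploadedContentRepository implements UploadedContentRepository {
    private final Map<String, Upload> uploads = new ConcurrentHashMap<>();
    private final Clock clock;

    InMemoryUploadedContentRepository(Clock clock) {
        this.clock = clock;
    }

    @Override
    public Upload upload(InputStream content, String contentType, String owner) {
        try {
            Upload upload = new Upload(UUID.randomUUID().toString(), owner, contentType,
                clock.instant(), content.readAllBytes());
            uploads.put(upload.id, upload);
            return upload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Optional<Upload> retrieve(String uploadId, String owner) {
        return Optional.ofNullable(uploads.get(uploadId))
            .filter(upload -> upload.owner.equals(owner));
    }
}
```

Keeping uploads in their own repository means they never share storage with attachments referenced by messages, so they can be expired without touching mailbox data.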
Data expiry would be achieved via bucket deletion: all data uploaded in a given
month is held in one bucket, and at month+2 the bucket can be dropped, which
ensures that no data younger than a month is deleted. We can likely accept
dangling metadata as no critical data is held there (user, size, content type).
If needed, a scroll could come and clean up expired metadata, but it might be
expensive to run.
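The month+2 rule can be made concrete with a small sketch. The class name, the "uploads-" bucket prefix and the use of UTC are assumptions chosen for illustration; only the rule itself comes from the proposal above.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

// Hypothetical helper: maps uploads to monthly buckets and decides when a bucket may go.
final class UploadBucketsSketch {
    // All data uploaded in the same (UTC) month lands in the same bucket.
    static String bucketFor(Instant uploadDate) {
        YearMonth month = YearMonth.from(uploadDate.atZone(ZoneOffset.UTC));
        return "uploads-" + month; // YearMonth prints as yyyy-MM
    }

    // A bucket for month M can be dropped once the current month is M+2 or later.
    static boolean canBeDropped(YearMonth bucketMonth, Clock clock) {
        YearMonth current = YearMonth.now(clock);
        return !current.isBefore(bucketMonth.plusMonths(2));
    }
}
```

For example, a blob uploaded on the last day of March sits in the March bucket, which becomes droppable on the first of May: the blob is then a full month old, which is the guarantee the proposal asks for.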
A webAdmin endpoint would expose the cleanup, and we would rely on an external
scheduler to trigger it.
We follow a similar design for the DeletedMessageVault
(https://issues.apache.org/jira/browse/JAMES-2811).
I bet my team could be working on this topic, but we do not have a plan for this
just yet.
*Impact*
- Blobs uploaded before this proposed change will remain accessible via the
AttachmentManager uploader right path (until its deletion), and will be
inaccessible afterwards
- Cleanup of blob content uploaded before this change gets applied is a
non-goal of my proposal. A separate batch could be used, reading Cassandra data
and deleting uploaded blobs. A task could maybe even be exposed for such needs...
*Definition of done*
Demonstrate data expiry in an integration test, playing with a mocked clock
injected via Guice.
Documentation needs to be written so that admins do not forget to schedule the
cleanup task.
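To give an idea of what such a test would exercise, here is a minimal sketch of a controllable clock driving the cleanup. MutableTestClock and ExpiringUploadsSketch are hypothetical names, the store is an in-memory stand-in for the bucket-backed one, and the Guice wiring is left out.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical test clock whose current instant is moved by the test.
final class MutableTestClock extends Clock {
    private Instant now;

    MutableTestClock(Instant start) {
        this.now = start;
    }

    void setInstant(Instant instant) {
        this.now = instant;
    }

    @Override
    public ZoneId getZone() {
        return ZoneOffset.UTC;
    }

    @Override
    public Clock withZone(ZoneId zone) {
        return this; // zone changes are ignored in this sketch
    }

    @Override
    public Instant instant() {
        return now;
    }
}

// Hypothetical in-memory stand-in: one set of blob ids per monthly bucket.
final class ExpiringUploadsSketch {
    private final Clock clock;
    private final Map<YearMonth, Set<String>> buckets = new TreeMap<>();

    ExpiringUploadsSketch(Clock clock) {
        this.clock = clock;
    }

    void upload(String blobId) {
        buckets.computeIfAbsent(YearMonth.now(clock), month -> new HashSet<>()).add(blobId);
    }

    boolean exists(String blobId) {
        return buckets.values().stream().anyMatch(bucket -> bucket.contains(blobId));
    }

    // What the cleanup task would do: drop every bucket that is at least two months old.
    void cleanup() {
        YearMonth current = YearMonth.now(clock);
        buckets.keySet().removeIf(month -> !current.isBefore(month.plusMonths(2)));
    }
}
```

A test would upload in mid-March, move the clock to the end of April and check that the blob survives a cleanup, then move it to the first of May and check that the blob is gone.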