Benoit Tellier created JAMES-3544:
-------------------------------------

             Summary: JMAP uploaded blobs are never deleted
                 Key: JAMES-3544
                 URL: https://issues.apache.org/jira/browse/JAMES-3544
             Project: James Server
          Issue Type: Sub-task
          Components: Blob, JMAP
    Affects Versions: 3.6.0
            Reporter: Benoit Tellier
            Assignee: Antoine Duprat


This is a concern for both privacy and cost control (as one needs to pay for 
storage).

JMAP defines no method to delete uploaded blobs (maybe I could propose 
something at the IETF).

https://jmap.io/spec-core.html#uploading-binary-data suggests that the server 
might decide to delete the data:

{quote}
Under rare circumstances, the server may have deleted the blob before the 
client uses it; the client should keep a reference to the local file so it can 
upload it again in such a situation.
{quote}

*Root cause of the issue*

We rely on the AttachmentManager for uploads - which is inherited from JMAP 
draft.

AttachmentManager uses the following fallback mechanism to check access rights:
 - First, check whether the user accessing the content holds a message 
referencing that attachment
 - If not, check whether that user uploaded the attachment.
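
The two-step check above can be sketched as follows. This is a simplified 
model, not the actual AttachmentManager code: the class name, the in-memory 
maps and the use of plain strings for users and attachment ids are all 
assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the fallback rights check.
final class AttachmentAccessSketch {
    // attachmentId -> users holding a message that references it (assumed lookup)
    private final Map<String, Set<String>> referencedBy;
    // attachmentId -> users who uploaded it (assumed lookup)
    private final Map<String, Set<String>> uploadedBy;

    AttachmentAccessSketch(Map<String, Set<String>> referencedBy,
                           Map<String, Set<String>> uploadedBy) {
        this.referencedBy = referencedBy;
        this.uploadedBy = uploadedBy;
    }

    boolean canAccess(String user, String attachmentId) {
        // 1. Does the user hold a message referencing the attachment?
        if (referencedBy.getOrDefault(attachmentId, Set.of()).contains(user)) {
            return true;
        }
        // 2. Fallback: did the user upload the attachment?
        return uploadedBy.getOrDefault(attachmentId, Set.of()).contains(user);
    }
}
```

Because both paths answer for the same attachment id, uploaded-only content 
and message-referenced content cannot be told apart from the outside, which 
is what makes blind deletion risky.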

AttachmentManager also holds data referenced by user messages, so automatic 
deletion without a clear separation of the two concepts looks scary...

*How:* 

We should deprecate the following AttachmentMapper methods (and underlying 
storage code) - and simplify AttachmentManager code accordingly:

{code:java}
public interface AttachmentMapper extends Mapper {
    // to be deprecated

    Publisher<AttachmentMetadata> storeAttachmentForOwner(ContentType contentType,
                                                          InputStream attachmentContent,
                                                          Username owner);

    Collection<Username> getOwners(AttachmentId attachmentId) throws MailboxException;
}
{code}

We should write an UploadedContentRepository, holding only the content, the 
content type, the owner and the size of the data. The upload date can be useful 
too, even if not requested by JMAP APIs. It would be backed by the BlobStore 
(and thus ObjectStorage); we will also need a metadata system on top of it 
(Cassandra).
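
To make the proposal more concrete, here is a rough sketch of what such a 
repository could look like. Only the name UploadedContentRepository comes 
from this ticket: the method names, the UploadMetadata class, the use of 
plain strings and the in-memory implementation are assumptions for 
illustration, not existing James APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.time.Clock;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical metadata record: owner, content type, size and upload date.
final class UploadMetadata {
    final String uploadId;
    final String owner;
    final String contentType;
    final long size;
    final Instant uploadDate;

    UploadMetadata(String uploadId, String owner, String contentType, long size, Instant uploadDate) {
        this.uploadId = uploadId;
        this.owner = owner;
        this.contentType = contentType;
        this.size = size;
        this.uploadDate = uploadDate;
    }
}

// Hypothetical contract; signatures are assumptions.
interface UploadedContentRepository {
    UploadMetadata upload(InputStream content, String contentType, String owner);

    // Returns the content only if 'owner' is the user who uploaded it.
    Optional<InputStream> retrieve(String uploadId, String owner);
}

// In-memory stand-in for the BlobStore (content) and Cassandra (metadata).
final class InMemoryUploadedContentRepository implements UploadedContentRepository {
    private final Clock clock;
    private final Map<String, UploadMetadata> metadata = new ConcurrentHashMap<>();
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    InMemoryUploadedContentRepository(Clock clock) {
        this.clock = clock;
    }

    @Override
    public UploadMetadata upload(InputStream content, String contentType, String owner) {
        try {
            byte[] bytes = content.readAllBytes();
            String id = UUID.randomUUID().toString();
            UploadMetadata entry = new UploadMetadata(id, owner, contentType, bytes.length, clock.instant());
            blobs.put(id, bytes);
            metadata.put(id, entry);
            return entry;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Optional<InputStream> retrieve(String uploadId, String owner) {
        UploadMetadata entry = metadata.get(uploadId);
        byte[] bytes = blobs.get(uploadId);
        if (entry == null || bytes == null || !entry.owner.equals(owner)) {
            return Optional.empty();
        }
        return Optional.of(new ByteArrayInputStream(bytes));
    }
}
```

Keeping the owner check inside the repository means uploads no longer need 
the message-reference path of AttachmentManager at all.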

Data expiry would be achieved via bucket deletion: all data uploaded in a given 
month is held in one bucket, and at month+2 the bucket can be dropped, which 
ensures that no data younger than a month is deleted. We can likely accept 
dangling metadata, as no critical data is held there (user, size, content type). 
If needed, a full scan could come and clean up expired metadata, but it might 
be expensive to run.
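
The month+2 rule can be illustrated with a small sketch. The class name, the 
"uploads-" bucket naming scheme and the choice of UTC are assumptions made 
for the example; only the rule itself comes from the proposal above.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

// Hypothetical helper illustrating monthly buckets and the month+2 drop rule.
final class MonthlyBuckets {
    private MonthlyBuckets() {
    }

    // Assumed naming scheme: one bucket per upload month, e.g. "uploads-2021-03".
    static String bucketFor(Instant uploadDate) {
        return "uploads-" + YearMonth.from(uploadDate.atZone(ZoneOffset.UTC));
    }

    // A bucket for month M can be dropped once the current month is M+2 or later,
    // so its youngest content is always older than one full calendar month.
    static boolean canBeDropped(YearMonth bucketMonth, Clock clock) {
        YearMonth currentMonth = YearMonth.now(clock);
        return !currentMonth.isBefore(bucketMonth.plusMonths(2));
    }
}
```

For instance, a blob uploaded on the last day of March lands in the March 
bucket, which stays in place throughout April and only becomes droppable on 
the first of May.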

A webAdmin endpoint would expose the cleanup, relying on an external scheduler 
to trigger it.

We already follow a similar design for the DeletedMessageVault 
(https://issues.apache.org/jira/browse/JAMES-2811)

I bet my team could work on this topic, but we do not have a plan for it just 
yet.

*Impact*

 - Blobs uploaded before this proposed change will remain accessible via the 
AttachmentManager uploader right path until that path is deleted, and will be 
inaccessible afterwards
 - Cleanup of blob content uploaded before this change gets applied is a 
non-goal of my proposal. A separate batch could be used, reading Cassandra 
data and deleting uploaded blobs. A task could maybe even be exposed for such 
needs... 

*Definition of done*

Demonstrate data expiry in an integration test, playing with a mocked clock 
injected via Guice.
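
As a rough idea of what such a test could exercise, here is a sketch with a 
clock the test can move forward. Both classes are hypothetical stand-ins: 
the real test would inject its clock through Guice and run against the 
actual storage, neither of which is shown here.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.TreeMap;

// Hypothetical test clock whose current instant is set by the test.
final class SettableClock extends Clock {
    private Instant now;

    SettableClock(Instant start) {
        this.now = start;
    }

    void setInstant(Instant instant) {
        this.now = instant;
    }

    @Override
    public ZoneId getZone() {
        return ZoneOffset.UTC;
    }

    @Override
    public Clock withZone(ZoneId zone) {
        return this; // zone changes are ignored in this sketch
    }

    @Override
    public Instant instant() {
        return now;
    }
}

// Hypothetical stand-in for the upload storage: counts uploads per monthly bucket.
final class BucketedUploads {
    private final Clock clock;
    private final TreeMap<YearMonth, Integer> uploadsPerBucket = new TreeMap<>();

    BucketedUploads(Clock clock) {
        this.clock = clock;
    }

    void upload() {
        uploadsPerBucket.merge(YearMonth.now(clock), 1, Integer::sum);
    }

    // Drops every bucket whose month is at least two months before the current month.
    void cleanup() {
        YearMonth newestDroppable = YearMonth.now(clock).minusMonths(2);
        uploadsPerBucket.headMap(newestDroppable, true).clear();
    }

    int storedUploads() {
        return uploadsPerBucket.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

A test would upload in March, move the clock to the end of April and check 
that cleanup keeps the data, then move it to the first of May and check that 
cleanup removes it.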

Documentation needs to be written so that admins do not forget to schedule the 
cleanup task.




