Folks:

I've never been satisfied with our garbage collection scheme. It requires that the client repeatedly refresh leases on its data, and if the client fails to do so then the server will eventually delete that data. It makes me feel unsafe about the longevity of my data. What if I get sick? What if the renewer script breaks and I don't notice that it broke?

A temporary workaround is to disable garbage collection so that the server never deletes any data! That's what LeastAuthority.com does, for now. The drawback is that if you sign up with us and upload some files, and then you decide you don't need to keep them and you unlink them from your directories, you're still going to pay for the storage space for us to hold copies of the ciphertext of those files ($1.00/GB/month) until you close that LeastAuthority.com account or until we fix this technical limitation.

Okay, so I've often expressed my dissatisfaction with this arrangement in the past, and wished for a more permanent agreement: "You, storage server, hold onto this *UNTIL I TELL YOU OTHERWISE*."

Brian has rightly questioned the usefulness of this instruction. First of all, why would the storage server agree to do you an infinitely costly service? It's not like you started by giving him an infinite amount of money. Secondly, you might decide you no longer need some data, and throw it out of your local worldview (i.e. delete all links you have that could lead you back to that data), but fail to inform the storage server that you are done with it. For example, your network connection or the storage server itself might be down right at the moment when your client was about to tell the storage server that you, the user, have permanently lost all interest in that data. If that happened, the storage server would be stuck holding onto it forever.

Okay, recently I think I've seen my way clear to how I want to solve these problems.

First of all, I *am* now convinced that it is useful for the storage server to make these kinds of commitments. The fleetingness of all things is a problem that can be dealt with at a higher layer, and *within the context* of that higher layer, the storage server will commit to indefinite-duration retention until positively notified to the contrary.
In particular, for LeastAuthority.com, we'll hold your data until death (or account cancellation, or credit card decline) do us part. (In fact, LeastAuthority.com actually won't delete your data even if you *do* stop paying. But we'll cease allowing you to upload or download until you bring your account into good standing. And we do reserve the right to change our minds and delete your data eventually.)

Secondly, a good way to handle the problem of forgotten garbage (stuff that you've deleted all references to but that the storage server is still holding the ciphertext of) is for you to run a "mark and sweep" or "lease renewal and garbage collection" process, when you are ready to do so. You tell your local Tahoe-LAFS gateway to do a "deep-add-lease" on all of your files which are reachable from a certain starting directory. Make sure that everything you care about is reachable from there! Then once that's done, you tell the storage server "Anything that hasn't been marked (lease-renewed) recently, you can delete that now."

The only difference between this and the current scheme is that the storage server will never do that on its own; it only does it when you tell it that it is okay to do it. (The current scheme is that the storage server does that automatically every so often, for example every month.)

There's one consequence of this use case request which affects the leasedb design. That is: what if there are two different users, Amber and Bryce, and Amber has said "Okay server, I've marked everything I care about, and I hereby cease paying for anything that I haven't marked, so if you want you can sweep it all out.", but Bryce hasn't (yet) said that? Bryce has said "Keep everything I've touched until I tell you otherwise, and I'll pay you to do so." In order to implement the right behavior here, the storage server is going to have to remember the fact that Amber has issued a sweep command but Bryce hasn't.
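
To make the Amber-and-Bryce case concrete, here is a minimal sketch of the bookkeeping it seems to imply. This is not the actual leasedb schema or any Tahoe-LAFS API: the class `LeaseDB`, its methods `add_lease`, `authorize_sweep` and `is_garbage`, and the `sweep_cutoffs` table are all hypothetical names, and timestamps are plain numbers passed in by the caller. The assumption is that a share becomes deletable only once every account holding a lease on it has authorized a sweep with a cutoff later than that account's last renewal.

```python
from dataclasses import dataclass, field


@dataclass
class LeaseDB:
    """Toy in-memory stand-in for the leasedb (all names are hypothetical)."""
    # (account, share_id) -> time that account last added or renewed its lease
    leases: dict = field(default_factory=dict)
    # account -> cutoff time of the latest sweep that account has authorized
    sweep_cutoffs: dict = field(default_factory=dict)

    def add_lease(self, account, share_id, now):
        """Mark: `account` still wants `share_id` kept, as of time `now`."""
        self.leases[(account, share_id)] = now

    def authorize_sweep(self, account, cutoff):
        """Sweep: `account` gives up whatever it has not renewed since `cutoff`."""
        self.sweep_cutoffs[account] = cutoff

    def is_garbage(self, share_id):
        """True if no account still holds a live lease on `share_id`."""
        for (account, sid), renewed in self.leases.items():
            if sid != share_id:
                continue
            cutoff = self.sweep_cutoffs.get(account)
            # An account that never authorized a sweep keeps its leases forever;
            # one that did keeps only leases renewed at or after its cutoff.
            if cutoff is None or renewed >= cutoff:
                return False
        return True
```

In this sketch the extra state is the per-account `sweep_cutoffs` entry: Amber's sweep only invalidates Amber's stale leases, so a share that Bryce also leased stays put until Bryce issues his own sweep.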
Remembering which accounts have issued a sweep command will require additional data to be stored in the leasedb beyond what we've already designed:

https://github.com/davidsarah/tahoe-lafs/blob/666-accounting/docs/specifications/leasedb.rst

Thanks for listening! Please let me know if I'm making an error here, including an error in what our customers want. ☺

Regards,

Zooko Wilcox-O'Hearn

Founder, CEO, and Customer Support Rep
https://LeastAuthority.com