Folks:

I've never been satisfied with our garbage collection scheme. It
requires that the client repeatedly refresh leases on its data, and if
it fails to do so then the server will eventually delete that data. It
makes me feel unsafe about the longevity of my data. What if I get
sick? What if the renewer script breaks and I don't notice that it
broke?

A temporary work-around is to disable garbage collection so that the
server never deletes any data! That's what LeastAuthority.com does,
for now. The drawback is that if you sign up with us and upload some
files, and then you decide you don't need to keep those and you unlink
those files from your directories, you're still going to pay for the
storage space for us to hold copies of the ciphertext of those files
($1.00/GB/month) until you close that LeastAuthority.com account or
until we fix this technical limitation.

Okay, so I've often expressed my dissatisfaction with this arrangement
in the past, and wished for a more permanent agreement: "You, storage
server, hold onto this *UNTIL I TELL YOU OTHERWISE*." Brian has
rightly questioned the usefulness of this instruction. First of all,
why would the storage server agree to do you an infinitely costly
service? It's not like you started by giving him an infinite amount of
money. Secondly, you might decide that you no longer need
some data, and throw it out of your local worldview (i.e. delete all
links you have that could lead you back to that data), but fail to
inform the storage server that you are done with it (for example, your
network connection or the storage server itself might be down at the
very moment your client was about to tell the storage server that
you, the user, have permanently lost all interest in that data). If
that happened, the storage server would be stuck holding onto it
forever.

Okay, recently I think I've seen my way clear to how I want to solve
these problems.

First of all, I *am* now convinced that it is useful to make these
kinds of commitments on the part of the storage server. The
fleetingness of all things is a problem that can be dealt with at a
higher layer, and *within the context* of that higher layer, the
storage server will commit to indefinite duration retention until
positively notified to the contrary. In particular, for
LeastAuthority.com, we'll hold your data until death — or account
cancellation or credit card decline — do us part.

(In fact, LeastAuthority.com actually won't delete your data even if
you *do* stop paying. But we'll cease allowing you to upload or
download until you bring your account into good standing. And we do
reserve the right to change our minds and delete your data
eventually.)

Secondly, a good way to handle the problem of forgotten garbage (stuff
that you've deleted all references to, but whose ciphertext the storage
server is still holding) is for you to run a "mark and sweep" or "lease
renewal and garbage collection" process when you are
ready to do so. You tell your local Tahoe-LAFS gateway to do a
"deep-add-lease" on all of your files which are reachable from a
certain starting directory. Make sure that everything you care about
is reachable from there! Then, once that's done, you tell the storage
server "Anything that hasn't been marked (lease-renewed) since I started
this pass, you can delete now."
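Here is a minimal sketch of that mark-and-sweep flow, assuming a single client and an in-memory map from storage index to the last time a lease on it was added or renewed. The names `StorageServer`, `add_lease`, `sweep`, and `deep_add_lease` are hypothetical illustrations, not the actual Tahoe-LAFS API or storage code.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical model of one server: storage index -> last lease renewal time."""
    lease_renewed_at: dict[str, float] = field(default_factory=dict)

    def add_lease(self, storage_index: str, now: float) -> None:
        # "Mark": the client says it still cares about this share.
        self.lease_renewed_at[storage_index] = now

    def sweep(self, cutoff: float) -> list[str]:
        # "Sweep": only ever called at the client's request, never on a timer.
        # Deletes every share whose lease was last renewed before `cutoff`.
        doomed = [si for si, t in self.lease_renewed_at.items() if t < cutoff]
        for si in doomed:
            del self.lease_renewed_at[si]
        return sorted(doomed)


def deep_add_lease(server: StorageServer, reachable: list[str], now: float) -> None:
    # Client side (hypothetical): renew the lease on everything reachable
    # from the chosen starting directory.
    for si in reachable:
        server.add_lease(si, now)


server = StorageServer()
for si in ["root", "a", "b", "old"]:
    server.add_lease(si, now=100.0)

# Later, "old" has been unlinked from every directory. Mark, then sweep.
start = 200.0
deep_add_lease(server, ["root", "a", "b"], now=start)
swept = server.sweep(cutoff=start)
print(swept)  # ['old']
```

The sketch uses the time the marking pass *started* as the sweep cutoff, so anything renewed during the pass survives no matter how long the pass takes.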

The only difference between this and the current scheme is that the
storage server will never sweep on its own; it does so only when you
tell it that it is okay.

(The current scheme is that the storage server does that automatically
every so often, for example every month.)

This use case has one consequence that affects the leasedb design.

That is: what if there are two different users, Amber and Bryce, and
Amber has said "Okay server, I've marked everything I care about, and
I hereby cease paying for anything that I haven't marked, so if you
want you can sweep it all out.", but Bryce hasn't (yet) said that.
Bryce has said "Keep everything I've touched until I tell you
otherwise, and I'll pay you to do so."

In order to implement the right behavior here, the storage server is
going to have to remember the fact that Amber has issued a sweep
command but Bryce hasn't. This will require additional data to be stored
in the leasedb beyond what we've already designed:

https://github.com/davidsarah/tahoe-lafs/blob/666-accounting/docs/specifications/leasedb.rst
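One way to picture that additional data is a per-account record of the sweep cutoff each account has authorized. The sketch below is a hypothetical schema using Python's standard `sqlite3` module; the table names, columns, and helper functions are illustrative assumptions and do not reflect the actual leasedb schema in the specification. A lease counts as "live" if its account has never authorized a sweep, or if it was renewed at or after that account's cutoff; a share may be deleted only when no live lease remains on it.

```python
import sqlite3

# Hypothetical schema for illustration only, not the real leasedb schema.
SCHEMA = """
CREATE TABLE shares (storage_index TEXT PRIMARY KEY);
CREATE TABLE leases (
    account       TEXT NOT NULL,
    storage_index TEXT NOT NULL REFERENCES shares(storage_index),
    renewed_at    REAL NOT NULL,
    PRIMARY KEY (account, storage_index)
);
-- The extra state: one row per account that has authorized a sweep.
-- No row means "keep everything I've touched until I tell you otherwise".
CREATE TABLE sweeps (
    account TEXT PRIMARY KEY,
    cutoff  REAL NOT NULL
);
"""


def add_lease(db, account, storage_index, now):
    db.execute("INSERT OR IGNORE INTO shares VALUES (?)", (storage_index,))
    # Renewing replaces the account's existing lease row on this share.
    db.execute("INSERT OR REPLACE INTO leases VALUES (?, ?, ?)",
               (account, storage_index, now))


def authorize_sweep(db, account, cutoff):
    # Remember that this account stopped paying for leases older than cutoff.
    db.execute("INSERT OR REPLACE INTO sweeps VALUES (?, ?)", (account, cutoff))


def collectable_shares(db):
    # Shares with no live lease: every lease on them belongs to an account
    # whose recorded sweep cutoff is later than the lease's renewal time.
    rows = db.execute("""
        SELECT s.storage_index FROM shares AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM leases AS l
            LEFT JOIN sweeps AS w ON w.account = l.account
            WHERE l.storage_index = s.storage_index
              AND (w.cutoff IS NULL OR l.renewed_at >= w.cutoff)
        )
        ORDER BY s.storage_index
    """)
    return [si for (si,) in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
add_lease(db, "amber", "shared", 100.0)
add_lease(db, "bryce", "shared", 100.0)
add_lease(db, "amber", "amber-old", 100.0)
add_lease(db, "amber", "amber-kept", 100.0)
add_lease(db, "bryce", "bryce-old", 100.0)

# Amber marks what she cares about and authorizes a sweep; Bryce does nothing.
add_lease(db, "amber", "amber-kept", 200.0)
authorize_sweep(db, "amber", cutoff=200.0)
print(collectable_shares(db))  # ['amber-old']
```

Note that "shared" survives because Bryce's lease on it is still live even though Amber's is not. Recording the cutoff, rather than deleting Amber's old leases on the spot, is one possible choice: it would let the server perform the actual deletion lazily, whenever it chooses.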

Thanks for listening! Please let me know if I'm making an error here,
including an error in what our customers want. ☺

Regards,

Zooko Wilcox-O'Hearn

Founder, CEO, and Customer Support Rep

https://LeastAuthority.com
_______________________________________________
tahoe-dev mailing list
tahoe-dev@tahoe-lafs.org
https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
