On 4/7/10 5:09 AM, Francois Deppierraz wrote:

> URI:LIT:ge3qaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>
> It looks weird to me to see a cryptographically-secure identifier
> which doesn't look random enough. Wouldn't such feature lead to
> potential attacks?

Ah! What an excellent question! The short answer is no, this is perfectly secure, but the reason why is a great thing to discuss.

(if you have time to think but not to read, just ponder the implications of this:

  % echo -n "" | tahoe put
  URI:LIT:

)

Filecaps (because they are capabilities) are required to be "necessary and sufficient" to access the resource that they represent (in this case, the ability to read/get/learn the bytes of the file).

 * necessary: if you don't know the filecap, you cannot get the file. This requires unforgeability (I cannot independently create a valid filecap for data that I don't know). In a distributed system, unforgeability is implemented with unguessable strings.

 * sufficient: If I have the filecap, I don't need any other secrets or abilities ("non-ambient authority") to access the file. Filecaps are transferrable without appealing to some central admin or gatekeeper.

LIT filecaps are simply the base32 encoding of the file data, and are used for very small files (I think the threshold is 65 bytes, which is the break-even point at which the LIT filecap is the same length as a typical CHK filecap). They are sufficient (you don't even need network access to turn the LIT filecap into the data), and necessary (if you don't know the filecap for my data, you can't figure out the data).

I frequently use a physical analogy. Suppose that we're sitting in a room next to each other (i.e. we have a pre-established secure connection) and I have a book in my hand that I want you to read, because I think it's cool. You have pockets. There are marauding intruders circling outside the room who can snatch things out of your hands or put other things in them (but your pockets are safe).
Our hands are safe as long as we're inside the room, but rooms are for meetings, not for reading, so you want to read my book at home later.

You want to read the book too, but your time is very limited, so you want to make sure you only read my cool book. One of the intruders is a furniture broker[1] who, the moment you leave the safety of the room, will fill your hands with an Ikea catalog and interior design magazines, and you don't want to accidentally read this garbage instead of my book (this is the integrity/sufficiency property).

Also, my book is about something embarrassing, sensitive, and controversial: our mutual admiration of the Git version control system, so we want the ability to keep the identity of the book secret from the Mercurial torch-and-pitchfork mob outside[2] (this is the confidentiality/necessity property). Of course, you might elect to reveal your Git-fondness, which is your own business, and over which I have no control, but the system must have the property that we *can* retain confidentiality if we want to.

Now, how can you get home with a copy of the right book, privately? There are two main options:

 1: I use the xerox machine in this room to copy the whole book, then hand you the big stack of paper that comes out. You jam the whole stack into your pocket.

 2: I use the xerox machine to copy just the back cover, which includes the ISBN number, and hand you the single page that comes out, and you put that in your pocket. (we assume you can later buy the book anonymously, and that ISBNs are strong/immutable references to a specific edition)

The first is equivalent to emailing me a file, or storing the whole file on your computer. The second is equivalent to uploading the file into Tahoe and then emailing me the filecap, or storing the filecap on your computer (perhaps as your "rootcap").
The "sufficient" property is provided either directly (you now have a copy of the full book in your pocket) or by the combination of the safe reference in your pocket and the immutable mapping property of ISBNs. The intruder who wants to cause you to read a different book cannot intercept+replace the thing you have in your pocket, nor can they subvert the publishing industry to violate the ISBN-to-content mapping.

The "necessary" property is provided by virtue of the fact that the intruder cannot see what I'm handing to you inside the room, or look inside your pocket later. The thing I give you is necessary: the intruders (who do not have it) cannot access the right book.

The first involves a full copy of the data, which is expensive (in bandwidth, or storage costs, or pockets), at least in the marginal case where you've already uploaded the file to tahoe and are now looking to hand out a new copy. The cost is proportional to the size of the file. The second is a cheap fixed cost, proportional to the size of a filecap.

Now, a LIT filecap is analogous to a really tiny book, perhaps just a single page. It's just as cheap for me to hand you a single page that contains the whole document as it is to hand you a single page that contains the ISBN of the document. It is sufficient, because you now have everything you need to read the book, and it is necessary, because without knowing what I handed you, the intruder cannot find out what book you're reading.

(ok, really, I delve into this sort of analogy when I'm talking about signatures and secure data distribution schemes, but I wanted to work out some of the terminology. Besides, the idea of stuffing a whole xeroxed book into my pants pocket makes me laugh.)

Another view:

The confidentiality of a CHK file can be evaluated by assuming the attacker gets the ciphertext (but not the filecap), and access to some sort of confirmation mechanism (known as an "oracle" in the cryptographic literature).
If they're trying to guess your login password, then the oracle is to try to use the password to actually log in. If the encrypted file contains a secure hash of the plaintext (or any error-checking mechanism at all), then the oracle is to try to decrypt the file and then check to see if the error-checking codes look ok. (in this case the oracle is not perfect: sometimes it will give you false positives. I think this is known as a "random oracle", which gives you some probability of saying "yes" that is influenced by the accuracy of your guess).

The CHK mechanism is considered secure if the effort the attacker must expend to get your plaintext is sufficiently high (no better than random guessing). But it's a relative thing: does the attacker who already knows thing X get any advantage by learning thing Y?

For Tahoe, we assume that attackers (including the storage server) get the shares that you upload, so they know the filesize and the ciphertext. We don't currently include a hash of the plaintext, but for argument's sake let's assume that there is enough error-checking data in the plaintext to allow the attacker to tell whether they've correctly decrypted the data. The plaintext could be any possible N-byte string (they know the file length, so they can rule out strings of all other lengths with no effort). Each guess requires a decryption attempt (and subsequent error-checking test) to confirm or deny. So their average effort is 2**(8*N)/2, no better than brute force.

LIT filecaps have the same property, but not derived from cryptography, because there is no ciphertext. The attacker gets nothing, and is asked to distinguish between hypothetical ciphertexts.
If you reveal to me that you have a LIT file (perhaps indirectly, by asking my storage server for a mutable-directory share but then not fetching any immutable shares immediately afterwards), then I can probably assume that it's shorter than 65 bytes, but that leaves nearly 2**(8*65) possibilities, and I have no way to distinguish between them (I don't even have a SHA256 hash to use as an oracle). Clearly the attacker has nothing to work with, so they can't do better than random chance. (they don't even get length with LITs).

Of course, if you tell me that you have a secret file that's only 2 bytes long, then there aren't very many possibilities, so if I have some other means to ask whether my guess is right or not, then I can figure out your "secret" file without too much work. I'd bet you a zillion dollars that I can guess your secret one-byte file in no more than 256 guesses, and your secret zero-byte file is even easier. I can name that tune in zero notes if it's a work by John Cage :-).

> URI:LIT:ge3qaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Oh, and the right way to evaluate the "random enough"-ness is to look at what the attacker gets to see, not what the filecap-holder sees. Two filecaps that we see might be:

 a: URI:LIT:mzxw6cq
 b: URI:CHK:onu7qbeeukr7hzijiuez27nmfi:jra4rqppihn6k4ki5tovlodr677nnszd255zzbh6ysjiijiluddq:3:10:291

what the attacker (or storage server) sees is basically:

 a: (nothing)
 b: URI:CHK-Verify:yd2musxnsi5lverlvf3hidzgcy:jra4rqppihn6k4ki5tovlodr677nnszd255zzbh6ysjiijiluddq:3:10:291

The "onu7q" encryption key is the thing that must remain unguessable, and the "yd2mus" storage-index is the thing that the attacker gets to use to try and guess it. Those strings must be long enough to be secure. The would-be LIT-cap attacker gets nothing.
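To make filecap (a) above concrete, here is a rough Python sketch of what a LIT cap holds. It assumes, going only by the examples in this message, that the body is ordinary base32 (RFC 4648 alphabet), lowercased, with the "=" padding stripped. The function names are made up for illustration and are not Tahoe's actual code. The last function is the "256 guesses" bet: given some yes/no oracle, a one-byte file falls to simple enumeration.

```python
import base64
from typing import Callable, Optional

LIT_PREFIX = "URI:LIT:"


def make_lit_cap(data: bytes) -> str:
    """Pack file bytes into a LIT-style cap (assumed: lowercase, unpadded base32)."""
    encoded = base64.b32encode(data).decode("ascii")
    return LIT_PREFIX + encoded.rstrip("=").lower()


def read_lit_cap(cap: str) -> bytes:
    """Recover the file bytes from a LIT-style cap, with no network access."""
    if not cap.startswith(LIT_PREFIX):
        raise ValueError("not a LIT cap")
    body = cap[len(LIT_PREFIX):].upper()
    # Restore the padding that was stripped, so the stdlib decoder accepts it.
    body += "=" * (-len(body) % 8)
    return base64.b32decode(body)


def guess_one_byte_file(oracle: Callable[[bytes], bool]) -> Optional[bytes]:
    """Try every one-byte file against a yes/no oracle: at most 256 guesses."""
    for value in range(256):
        candidate = bytes([value])
        if oracle(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(read_lit_cap("URI:LIT:mzxw6cq"))   # the data behind filecap (a)
    print(make_lit_cap(b""))                 # the zero-byte file
    secret = b"\x2a"
    print(guess_one_byte_file(lambda guess: guess == secret))
```

Nothing in the first two functions involves a key or a server: holding the cap is holding the data. The third only works because the caller supplies an oracle, which a LIT-cap attacker does not get from the storage servers.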
Huh, if the LIT file didn't base32-encode the data, this property might be even more obvious:

  % echo -n "here_is_my_secret" |tahoe put -
  URI:LIT:here_is_my_secret

The base32 encoding ("code", not "crypt") is necessary, of course, but it's interesting to see how it smells of security, when in fact it is merely there to let short Tahoe files contain arbitrary 8-bit data but Tahoe filecaps continue to be ascii-safe.

So, in short, LIT caps are just as secure as CHK caps, because the attacker never gets to see caps. LIT caps are even more secure than CHK, because attackers don't get error-checking information or ciphertext. But small files are just as guessable inside Tahoe as they are anywhere else.

cheers,
 -Brian

[1]: a furniture broker would, of course, be a middleman who negotiates the complex world of furniture sales, matching up buyers with sellers, because furniture, like stocks, bonds, and health insurance plans, is too complicated to simply buy from a store. I very much hope that these people do not actually exist.

[2]: as the PyCon talk comparing Git and hg pointed out: SVN is our common enemy, we must destroy them

_______________________________________________
tahoe-dev mailing list
tahoe-dev@allmydata.org
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev