So in my previous email I mentioned the need for an Invitation protocol. The idea is to allow someone who's in a Tahoe grid to type:
tahoe admin invite Bob # -> "ixyn6bxeq6ydr3us6k3emwa23yq" and get back a short "Invitation Code",$IC. Then they deliver this to someone else, via some out-of-band mechanism (maybe they email it, or send it over IM, or even transcribe it over the phone). The recipient installs Tahoe, creates a node (no Introducer, no parameters), and does: tahoe admin accept Alice ixyn6bxeq6ydr3us6k3emwa23yq and then, after the message-passing dust has settled, Alice and Bob will wind up with each others' public keys, associated with the petnames they offered, and maybe some extra contact information (hostname/port). Everything else can be bootstrapped from those. My assumption is that we're allowed to bake-in a "relay": some server that provides a broadcast channel which all clients know how to access. This can be application-specific (e.g. all Tahoe clients would use http://relay.tahoe-lafs.org). By embedding the contact information (URL) into the application, we don't have to put it in the IC code. (you should be able to use your own relay, if you like, but that'll make the IC longer). I also assume that the IC remains secret until the protocol is complete, but is likely to become public after that point, so if we need any forward-secrecy, it must come from something other than the IC. I've spent a lot of time thinking about different protocols to make this safe, at various tradeoffs between length of the invitation code, restrictions imposed on how you use it, and resistance to attack. The attacker's goal here is to get her own keys injected into Alice and Bob's tables, allowing her to perform a persistent MitM attack on them. Getting her key into just one of their tables might be interesting too. A less interesting DoS attack is to flood the channel with messages, preventing the protocol from completion. ## Protocol If the IC can be at least 128 bits (plus a one-byte version marker), like "ixyn6bxeq6ydr3us6k3emwa23yq", and if the data being exchanged is public, then the simplest safe thing to do is this: Alice: IC = "i"+base32(os.urandom(128/8)) HMACkey,DestroyCap = HKDF(IC) ChannelID = HKDF(DestroyCap) relay.create(ChannelID, (msg1, HMAC(HMACkey, msg1))) Relay: creates messages[ChannelID] = [] stores (msg1, HMAC) under messages[ChannelID] Bob: receives IC computes HMACkey, ChannelID does relay.get(ChannelID), fetches all messages[ChannelID] checks HMAC, accepts msg1 relay.add(ChannelID, (msg2, HMAC(HMACkey, msg2))) Relay: adds (msg2, HMAC) to messages[ChannelID] Alice: polls relay.get(ChannelID) waiting for msg2 checks HMAC, accepts msg2 relay.destroy(DestroyCap) Relay: computes ChannelID=HKDF(DestroyCap) deletes messages[ChannelID] (here HKDF and HMAC are both using SHA256) The IC is base32-encoded in case it needs to be verbally transcribed, and contains no punctuation to be cut-and-paste -friendly and avoid line-wrapping inside email bodies. If the client wants to use an alternate relay, the relay URL should be base32-encoded (to retain those properties) and appended to the IC, taking advantage of the fixed length of the random part. A URL of "http://1.2.3.4:5678/relay" would result in an IC like: qa3wihm2bwtpewpepohz644kg4uhi5dqhixs6mjogixdglruhi2tmnzyf5zgk3dbpe If the data being exchanged needs to be kept secret, then msg1 and msg2 should be one-time Curve25519 public keys, and you add a msg3/msg4 which are both boxed (using nacl_box, so encrypted and authenticated by the curve25519-generated shared key). This provides forward-secrecy: as long as Mallory doesn't successfully inject her own curve25519 pubkey in the msg1/msg2 phase, and if Alice and Bob delete the generated private keys before anyone learns them, then Mallory is forever locked out of the msg3/msg4 contents, even after the IC is revealed. (there's an alternate scheme in which the IC is used to generate a symmetric auth+encrypt key, but that fails to provide forward secrecy once the IC is revealed, and might require a 256-bit key to achieve 128-bit security, not sure). ## Attacks The best attack is for Mallory to find a pre-image of the public ChannelID, allowing her to forge the HMAC and get Bob (and then Alice) to accept an alternate msg1. With a 128-bit IC, this attack ought to require 2^128 operations. (someone please tell me if I'm wrong.. I believe that we care about pre-images rather than collisions, so we don't need a 256-bit IC to achieve a 128-bit security level). Mallory can compute this in a Rainbow Table ahead of time, and use the same table against everybody, so shorter codes (e.g. 64-bit) are breakable (I estimate about $600K and 256GB to build that table). Since Mallory learns the ChannelID by eavesdropping, she can call relay.add() and inject bogus messages into the channel. The relay has no way to distinguish these from real ones. Alice and Bob will ignore them, but they might be annoying. This could be avoided by deriving a private Ed25519 signing key from IC, making ChannelID be the corresponding public key, and making the relay (and clients) check signatures. (again, this might require a 256-bit IC to achieve 128-bit security against this noise, but even 64-bit security against this noise would be plenty). ## Shorter Invitation Codes I've also thought about what needs to happen if you want to use a shorter code than 128 bits. I haven't put any numbers to it, but in general, as the code gets shorter, you need to add in more context, to reduce the attacker's ability to brute-force the IC (given whatever information they have available). The basic form is A=KDF(IC,stuff) and then derive HMACkey and the ChannelID from A. Some tools include: * add key-stretching into the derivation of A, like a few seconds of scrypt(). This increases the cost of the pre-image attack. * include public context into A: things like the sender's email address, and the receiver's email address. This prevents the attacker from amortizing guessing attacks across all users, so they must attack (Alice->Bob) separately from Alice->Carol, or other->Bob, or even Bob->Alice. On the downside, it requires that Alice accurately tell her client about Bob's email address (easy if her client is sending the email anyways, harder if the IC will go over IM), and the same for Bob (requires manual transcription of the email's "From" header, so error-prone) * for mobile-to-mobile applications (think Bump), include the rough location of the two nodes into A (trying multiple times to deal with fencepost errors), so that the attacker needs to know Bob's location to successfully spoof him. This may be easy for Alice (who is face-to-face with Bob), but expensive/annoying for Mallory. * when integrated with instant-messaging clients, include the user-account or chat-room-name information in A. This turns into OTR. * for interactive applications, when IC is too short to identify a single node (imagine millions of people are performing invitations at the same time), have the relay return a list of potential applicants, winnowed by name or proximity, then have Alice and Bob select icons on a screen with personal non-private information (picture, avatar, full name) to indicate the expected partner. Once partner is selected, perform the protocol. * put more trust into the relay (allowing it to perform attacks that other intruders cannot): * use secure transport to the relay, by embedding the relay's pubkey into the application. This removes eavesdropping and offline attacks. The external attacker must constantly poll for likely ChannelIDs, hoping to beat Bob to his relay.add() call. * have the relay publish a frequently-changing 256-bit salt. Include the salt when computing A. The receiver fetches salts too, and must try several different (older) ones until they find one that matches, depending upon how stale the invitation is. * The attacker must re-compute their preimage table for each salt. The attack window thus starts when a salt is published, and closes when Bob responds to the message. Needing to constantly rebuild the preimage table, even for windows during which Alice and Bob do not run the protocol (almost all of them), can be a large ongoing expense. The cost depends upon the size of the window (the relay could publish a new salt every 5 minutes, and Bob could tolerate invitations up to 20 minutes old), which depends upon how long an Invitation lasts before it expires. * to deter a colluding Relay from pre-computing salts (giving the attacker more time to build their pre-image table), require them to include a hash of some public predictable value, like a stock price (q.v. "geohashing"), or the bitcoin block chain. Clients must verify, of course. * relay can actively rate-limit guessing attacks, forcing attacker to use more IP (expensive) addresses. Other rate-limiting tools are available: CAPTCHA, proof-of-work, bitcoin-based bonds. Using Bitcoin could let you set arbitrarily high costs-per-guess, at the expense of including settlement time in the overall latency budget. * derive ChannelID only from public context, turning offline attacks ("find preimage of channelid") into online (guessing channelid, asking "is this channelid valid?"), which can be subject to rate-limiting * adaptive IC lengths: if more context information is available (e.g. sending invitation over email means Bob's address is easy to use, or integration with an IRC client makes the channel name available), then reduce the IC length. If Invitations are to be valid for several days (e.g. delivery over email rather than over IM or phone), make the IC longer. With the right context, rate-limiting techniques, and UI work, I suspect this could allow safe use of codes as short as a few bits. But, for Tahoe's current purposes, I think the 128-bit code is short enough, plenty simple, and entirely safe. cheers, -Brian _______________________________________________ tahoe-dev mailing list tahoe-dev@tahoe-lafs.org https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev