First, I am not a member of the security mafia (IANAMOTSM?), so question everything I say here.
Second, please send follow-ups to the [email protected] list.

I've been researching Jingle (and, more generally, end-to-end) security. The landscape is a bit confused, so I'm attempting to clarify things (at least in my own mind). As far as I can see, we are interested in several goals:

1. Most pressingly, proper negotiation of a secure data transport for voice and video (or, more generally, any RTP traffic per XEP-0167).

2. A bit less pressingly, proper negotiation of a secure data transport for file transfer, where the transport method could be In-Band Bytestreams ("IBB"; XEP-0047), SOCKS5 Bytestreams (XEP-0065), etc.

3. As a generalization of #1 and #2, proper negotiation of transport method security no matter which streaming or datagram transport is used.

4. Use of Jingle to negotiate end-to-end encryption of XMPP traffic (a.k.a. "XTLS"), where the transport might be IBB or some other streaming transport (this *might* simply be a special case of #2).

This email focuses mainly on Goal #1 because that's what I've researched so far. By research I mean a reading of the following specs:

http://tools.ietf.org/html/rfc3711 (SRTP)
http://tools.ietf.org/html/rfc4347 (DTLS)
http://tools.ietf.org/html/rfc4567 (key management extensions for SDP)
http://tools.ietf.org/html/rfc4568 (SDP Security Descriptions)
http://tools.ietf.org/html/rfc4572 (TLS media transport in SDP; defines a=fingerprint)
http://tools.ietf.org/html/draft-ietf-sip-media-security-requirements
http://tools.ietf.org/html/draft-ietf-sip-dtls-srtp-framework
http://tools.ietf.org/html/draft-ietf-avt-dtls-srtp

The following slide deck is also helpful (pretty pictures!):

http://www.ietf.org/proceedings/06mar/slides/raiarea-1/sld1.htm

For Goal #1, the IETF has settled on SRTP (RFC 3711) because it is optimized for media traffic. (Another alternative would have been RTP over DTLS, but DTLS is not optimized for media in that way.) However, SRTP does not solve the problem of communicating the keying material that will be used in the transport channel.
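To make "keying material" concrete, here is a minimal Python sketch of what SDP Security Descriptions (RFC 4568) put on the wire for the AES_CM_128_HMAC_SHA1_80 suite: a 128-bit SRTP master key and a 112-bit master salt, concatenated and base64-encoded into an a=crypto line. The helper names and the CryptoAttr type are hypothetical, and the parser only handles a single inline key with optional lifetime and MKI fields, not the full RFC 4568 grammar. The thing to notice is that the master key itself travels in the signalling, which is why RFC 4568 places strong requirements on the signalling channel.

```python
import base64
import os
from dataclasses import dataclass
from typing import Optional

# (master key bytes, master salt bytes) for the two RFC 4568 suites that use a
# 128-bit key and a 112-bit salt. Other suites are not handled in this sketch.
SUITE_LENGTHS = {
    "AES_CM_128_HMAC_SHA1_80": (16, 14),
    "AES_CM_128_HMAC_SHA1_32": (16, 14),
}


@dataclass
class CryptoAttr:  # hypothetical container, not from any library
    tag: int
    suite: str
    master_key: bytes
    master_salt: bytes
    lifetime: Optional[str] = None
    mki: Optional[str] = None


def build_crypto_line(tag: int, suite: str, key: bytes, salt: bytes) -> str:
    """Render an a=crypto line carrying one inline master key and salt."""
    key_len, salt_len = SUITE_LENGTHS[suite]
    if len(key) != key_len or len(salt) != salt_len:
        raise ValueError("key/salt length does not match the crypto suite")
    key_salt = base64.b64encode(key + salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{key_salt}"


def parse_crypto_line(line: str) -> CryptoAttr:
    """Parse an a=crypto line with a single inline key; session params are ignored."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto line")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto-suite and key-params")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError(f"unsupported key method: {method!r}")
    key_salt_b64, *extras = key_info.split("|")
    raw = base64.b64decode(key_salt_b64, validate=True)
    key_len, salt_len = SUITE_LENGTHS[suite]
    if len(raw) != key_len + salt_len:
        raise ValueError("decoded key||salt has the wrong length")
    lifetime = mki = None
    for extra in extras:
        # An MKI is written "value:length"; a lifetime never contains ':'.
        if ":" in extra:
            mki = extra
        else:
            lifetime = extra
    return CryptoAttr(int(tag), suite, raw[:key_len], raw[key_len:], lifetime, mki)


if __name__ == "__main__":
    line = build_crypto_line(1, "AES_CM_128_HMAC_SHA1_80", os.urandom(16), os.urandom(14))
    print(line)  # anyone who can read this line can decrypt the SRTP stream
    print(parse_crypto_line(line + "|2^20|1:4"))
```

Any XMPP server that relays a stanza containing such a line sees the key in the clear unless the stanza is protected end to end.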
There are several major proposals for communicating that keying material:

- SDP Security Descriptions
  <http://tools.ietf.org/html/rfc4568>
  (this defines the a=crypto SDP line, which is currently re-used in XEP-0167)

- ZRTP
  <http://tools.ietf.org/html/draft-zimmermann-avt-zrtp>

- DTLS-SRTP
  <http://tools.ietf.org/html/draft-ietf-avt-dtls-srtp> and
  <http://tools.ietf.org/html/draft-ietf-sip-dtls-srtp-framework>
  (these re-use the a=fingerprint SDP line from RFC 4572 and define a method for using it: set up a DTLS association over the media's host/port quartet, then pull the SRTP keying material out of that DTLS association)

The "Requirements and Analysis of Media Security Management Protocols" draft <http://tools.ietf.org/html/draft-ietf-sip-media-security-requirements-09> provides an overview of these and other approaches.

According to my reading of RFC 4568, SDP Security Descriptions MUST NOT be used unless the signalling channel (that's XMPP for us) can "provide strong message authentication and packet-payload encryption, as well as effective replay protection". XMPP doesn't provide those services end to end out of the box (TLS protects each client-to-server and server-to-server hop, but the servers still see the stanzas in the clear), so I don't think we can securely use a=crypto (or our XMLish flavor of a=crypto as currently described in XEP-0167). But we might be able to use it if we negotiate XTLS (or some other e2e method) first.

That leaves ZRTP or DTLS-SRTP.

ZRTP is completely independent of the signalling channel (or can be; see Section 8 of the ZRTP spec), so we don't need to define anything in Jingle to support it. However, we could provide some hints in the Jingle signalling.

For DTLS-SRTP, we'd need to define an XMPP-friendly mapping of the SDP a=fingerprint line and the various SDP parameters discussed in http://tools.ietf.org/html/draft-ietf-sip-dtls-srtp-framework and http://tools.ietf.org/html/draft-ietf-avt-dtls-srtp -- but this seems fairly straightforward.
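With DTLS-SRTP, by contrast, the signalling carries only a certificate fingerprint (which still needs integrity protection in transit, but no longer needs to be kept secret); the SRTP keys come out of the DTLS handshake on the media path. The minimal Python sketch below shows the pieces an XMPP mapping would have to carry or check: formatting an RFC 4572 a=fingerprint value from a DER-encoded certificate, and comparing it with the certificate the peer actually presents during the handshake. It also splits the exported keying material into per-direction SRTP keys using the layout from the version of draft-ietf-avt-dtls-srtp published as RFC 5764 (exporter label "EXTRACTOR-dtls_srtp"; client key, server key, client salt, server salt). No DTLS stack is involved: the certificate bytes and exported material are stand-ins, and all function names are hypothetical.

```python
import hashlib
import hmac
from typing import NamedTuple


def sdp_fingerprint(cert_der: bytes, hash_func: str = "sha-256") -> str:
    """Format an RFC 4572 a=fingerprint line: uppercase hex byte pairs joined by ':'."""
    # RFC 4572 names hashes "sha-1", "sha-256", ...; hashlib wants "sha1", "sha256".
    # hashlib.new raises ValueError for hash functions it does not know.
    digest = hashlib.new(hash_func.lower().replace("-", ""), cert_der).digest()
    return f"a=fingerprint:{hash_func.lower()} " + ":".join(f"{b:02X}" for b in digest)


def fingerprint_matches(signalled_line: str, peer_cert_der: bytes) -> bool:
    """Compare a fingerprint received via signalling with the peer's DTLS certificate."""
    line = signalled_line.strip()
    prefix = "a=fingerprint:"
    if not line.lower().startswith(prefix):
        return False
    parts = line[len(prefix):].split()  # ["sha-256", "AB:CD:..."]
    if len(parts) != 2:
        return False
    expected = sdp_fingerprint(peer_cert_der, parts[0])
    # Hex digits are case-insensitive, so compare uppercased forms in constant time.
    return hmac.compare_digest(expected.upper().encode(), line.upper().encode())


class SrtpKeys(NamedTuple):
    client_write_key: bytes
    server_write_key: bytes
    client_write_salt: bytes
    server_write_salt: bytes


def split_dtls_srtp_material(material: bytes, key_len: int = 16, salt_len: int = 14) -> SrtpKeys:
    """Split exporter output into per-direction SRTP master keys and salts.

    Layout per RFC 5764: client key | server key | client salt | server salt.
    The defaults are the AES_CM_128_HMAC_SHA1_80 lengths (60 bytes in total).
    """
    if len(material) != 2 * (key_len + salt_len):
        raise ValueError("keying material has the wrong length for this profile")
    k, s = key_len, salt_len
    return SrtpKeys(
        material[:k],
        material[k:2 * k],
        material[2 * k:2 * k + s],
        material[2 * k + s:],
    )


if __name__ == "__main__":
    fake_cert = b"not a real certificate"  # stand-in for DER-encoded certificate bytes
    line = sdp_fingerprint(fake_cert)
    print(line, fingerprint_matches(line, fake_cert))
    print(split_dtls_srtp_material(bytes(60)))  # stand-in for real exporter output
```

In a real implementation the 60 bytes would come from the DTLS library's keying-material exporter after the handshake, and each side would use the client or server half according to its DTLS role.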
I have not yet sketched out any of the Jingle (or more general XMPP) protocol bits to make this happen, but I figured I would share the fruits of my research so far. Please do correct me where I'm wrong.

More soon.

/psa
