Following some discussions at the all-hands around audio sandboxing and around mtransport blocking Linux sandbox tightening, plus discussions with the Necko team, I decided to explore the options available to us before we got too far down a path of least resistance. The following is the result of that analysis and of discussions with the people involved (the Media, WebRTC, Networking, and Sandboxing teams).
The full document, with details on the pluses and minuses of each option, is here: https://docs.google.com/document/d/1cwc153l1Vo6CDuzCf7M7WbfFyHLqOcPq3JMwwYuJMRQ

Media, WebRTC and Network Sandboxing plans (Randell Jesup)

This document is meant to lay out the options for Media and WebRTC sandboxing, as well as options for Necko. Changes need to be made here in order to tighten the Content sandboxes further, in particular for audio input/output, and because some mtransport code (ICE/TURN) uses OS calls for IP/interface discovery that we want to lock down. Before we start making changes, we should vet the design and determine if it makes sense to move more code, and where the moved code should live: Master ("Chrome Process" in MDN docs), Content, or a separate sandbox.

Conclusion: I recommend two new sandboxes in the short term (option 4.5 below): one for audio in/out and video in, and one for mtransport. (We can combine it with the audio/video one if separate sandboxes cause too much memory or startup overhead.) For later stages, we should first consider sandboxing Necko protocols (especially new/external ones) a la GMP, and finally we should consider adding MSG, WebAudio, gUM, and the PeerConnection/pipeline code to the media sandbox. That last would be a much larger change, though potentially with some significant architectural security advantages.

The current state:
* Video input lives in Master (CamerasParent) and talks to CamerasChild in Content
* Video display is handled by the Compositor (via an independent sandbox and IPC channel)
* Audio input and output are handled in Content: both by cubeb in full-duplex mode; for non-full-duplex, output by cubeb and input by webrtc.org code.
* WebRTC networking code (media/mtransport) handles ICE/TURN and routing RTP/RTCP over the ICE/TURN channels via IPC, using UDPSocket/TCPSocket code with packet filters.
* mtransport talks to PeerConnections and JSEP/signaling
* Some IP/interface discovery code in mtransport makes OS calls from Content
* PeerConnections run in the Content process
* WebRTC codecs run there too, except for things like OpenH264 (in the GMP sandbox)
* MediaDecoders run in Content (jya has made some changes here on some platforms; DXVA decoders run in the GPU process)
* Codecs run in Content, GPU, or GMP sandboxes (especially EME)
  * Mac may do similarly in the future
* MediaStreamGraph (MSG) runs in each Content process, as needed. In some cases, if used by chrome code, it will run in Master as well.
  * MSG runs either off a timer (if there's no sound output) or off of audio driver callbacks on OS driver threads (with sound output).
  * There can be multiple MSGs in the same process (for example, disconnected non-realtime MSGs used for WebAudio, and in the future multiple MSGs to handle output-device selection)
* WebAudio runs as part of MSG
* Necko code (used by HTTP/etc. and by mtransport for low-level IO) runs in Master
  * This includes protocols, encryption, cache, and many other bits

Needs:
* For sandboxing work, we want to lock down the Content process such that audio must move out of Content.
  * Deadlines for output data must be met to avoid underruns; tougher, though doable, when the request must transit IPC
* mtransport needs to stop making OS calls from Content

Wants:
* Minimize complexity
  * IPC adds complexity, especially when anything is synchronous.
  * Complexity adds bugs
  * Complexity slows down maintenance and makes code harder to understand
* Security
  * A sandbox is more secure than the Master process
  * Especially for code that touches arbitrary data from outside, i.e. codec data (encode (which can come from canvas) or especially decode) and network data (packets can contain most anything if they get by the packet filter, which is mostly about origin).
  * Complex code (codecs) often has hidden bugs; fuzzing helps, but doesn't guarantee holes don't exist.
  * ICE/TURN is pretty complex networking code, and a good chunk of it is in legacy C (no refcounting, etc.).
  * Developing new wireline protocols (such as QUIC) in userspace adds new risks, especially as fixes and improvements will be rapidly iterated on, and they're exposed to crafted packets.
  * Failure of the code in a sandbox doesn't give up the whole system.
  * Firewalling (sandboxing) vulnerable media/networking code off from Content is also useful, since a break of the media/networking code would then require a sandbox escape to get access to raw content (passwords, bank data, etc.).
    * This is part of why GMP is used (though not the only reason).
  * Running imported code in a sandbox, especially large, complicated code that changes often, reduces the risk of a vulnerability being usefully exploited.
* Performance
  * IPC and use of shmems is slower than sharing between threads, though for video input we found the extra processing to be unimportant.
  * Avoiding IPC for audio is a win (OS-priority threads, less chance of underrun)
  * Avoid/minimize memory bloat

Options:
Note: options that move items into the Master process are included here, since people have been planning or suggesting just that, and I want to document the pros and cons of those options as well as the more complex ones.

1) Move cubeb to Master; provide new IPC interfaces for mtransport for IP discovery
2) Move cubeb to Master; move mtransport to Master
3) Move cubeb to Master; create a new sandbox process with direct network access and move mtransport there
4) Create a new sandbox process with network access and audio/video device access, and move cubeb, video input, and mtransport there
4.5) Create two new sandbox processes, one for audio/video and one for mtransport. (Note: this is the solution I'm recommending for the short term.)
5) Create a new sandbox process with network access and audio/video device access; move cubeb, video input, and mtransport there, and also move MSG and WebAudio there
6) Create a new sandbox process with network access and audio/video device access; move cubeb, video input, and mtransport there, and also move MSG, WebAudio, and PeerConnection there
7) Create a new sandbox process with network access and audio/video device access; move cubeb, video input, and mtransport there, and also move MSG, WebAudio, PeerConnection, and MediaDecoders there
8) Move Necko to a new sandbox process (combined with #3-7, or as an independent sandbox). Moving just the protocol pieces or the STS stuff may be (relatively) simpler, and would protect the most vulnerable bits, which are exposed to raw external packets. This would also keep PSM (I think) and cache access out of the sandbox, and avoid having to remote them. Note: we could use one sandbox for all protocols, or one per protocol - very similar to GMP media sandboxing.

Detailed pros and cons of each option are in the document: https://docs.google.com/document/d/1cwc153l1Vo6CDuzCf7M7WbfFyHLqOcPq3JMwwYuJMRQ

-- 
Randell Jesup, Mozilla Corp