Following some discussions at the all-hands around audio sandboxing and
around mtransport blocking Linux sandbox tightening, as well as
discussions with the Necko team, I decided to explore the options
available to us before we got too far down a path of least resistance.
The following is the result of that analysis and of discussions with
the people involved (Media/WebRTC/Networking/Sandboxing teams).

The full document with details on the pluses and minuses of each option
is here:
https://docs.google.com/document/d/1cwc153l1Vo6CDuzCf7M7WbfFyHLqOcPq3JMwwYuJMRQ

  Randell Jesup

Media, WebRTC and Network Sandboxing plans

This document is meant to lay out the options for Media and WebRTC
sandboxing, as well as options for Necko.  Changes need to be made here
in order to tighten the Content sandboxes further, in particular for
audio input/output, and because some mtransport code (ICE/TURN) uses OS
calls for IP/interface discovery that we want to lock down.  Before we
start making changes, we should vet the design and determine whether it
makes sense to move more code, and where the moved code should live:
Master ("Chrome Process" in the MDN docs), Content, or a separate
sandbox.

Conclusion: I recommend two new sandboxes in the short term (option 4.5
below) - one for Audio in/out and Video in, one for mtransport.  (We
would combine the mtransport sandbox with the audio/video one if two
sandboxes cause too much memory or startup overhead.)  For later
stages, we should first consider sandboxing Necko protocols (especially
new/external ones) à la GMP, and finally we should consider adding MSG,
WebAudio, gUM, and the PeerConnection/pipeline code to the media
sandbox.  That last step would be a much larger change, though
potentially with significant architectural security advantages.

The current state:
* Video input lives in Master (CamerasParent) and talks to CamerasChild
  in Content
* Video display is handled by the Compositor (via an independent
  sandbox and IPC channel)
* Audio input and output are handled in Content - both by cubeb in the
  full-duplex case; in the non-full-duplex case, output by cubeb and
  input by webrtc.org code
* WebRTC networking code (media/mtransport) handles ICE/TURN and routes
  RTP/RTCP over the ICE/TURN channels via IPC, using UDPSocket/TCPSocket
  code with packet filters (see the filter sketch after this list)
* mtransport talks to PeerConnections and JSEP/signaling
* Some IP/interface discovery code in mtransport makes OS calls from
  Content (see the interface-discovery sketch after this list)
* PeerConnections run in Content
* WebRTC codecs run there too, except for things like OpenH264 (in the
  GMP sandbox)
* MediaDecoders run in Content (jya has made some changes here on some
  platforms; DXVA decoders run in the GPU process)
* Codecs run in Content, GPU, or GMP sandboxes (especially EME)
* Mac may do something similar in the future
* MediaStreamGraph (MSG) runs in each Content process, as needed.  In
  some cases if used by chrome code it will run in Master as well.
* MSG runs either off a timer (if there's no sound output) or off audio
  driver callbacks on OS driver threads (when there is sound output).
* There can be multiple MSGs in the same process (for example for
  disconnected non-realtime MSGs used for WebAudio, and in the future
  multiple MSGs to handle output device selection) 
* WebAudio runs as part of MSG
* Necko code (used by HTTP/etc and by mtransport for low-level IO) runs
  in Master.  This includes protocols, encryption, the cache, and many
  other bits.
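
To make the packet-filter bullet above concrete, here's a minimal
sketch of the concept (hypothetical code, not Gecko's actual filter
implementation): the parent-side socket only admits datagrams from
remote endpoints that ICE has already sent STUN to, so arbitrary hosts
can't inject traffic toward Content.

  // Hypothetical sketch, not the actual Gecko filter: admit only
  // datagrams whose remote endpoint ICE has already sent STUN to.
  #include <cstdint>
  #include <set>
  #include <string>
  #include <utility>

  using Endpoint = std::pair<std::string, uint16_t>;  // (address, port)

  class StunStylePacketFilter {
    std::set<Endpoint> mAllowed;
   public:
    // Called when ICE sends an outgoing STUN request; remembers the
    // remote endpoint as a legitimate peer.
    void OnOutgoingStun(const Endpoint& remote) { mAllowed.insert(remote); }

    // Called for each incoming datagram; unknown senders are dropped
    // before their payload ever reaches Content.
    bool AllowIncoming(const Endpoint& remote) const {
      return mAllowed.count(remote) != 0;
    }
  };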

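For the interface-discovery bullet, the OS calls in question look
roughly like this (a standalone POSIX sketch, not the mtransport code
itself).  On Linux, getifaddrs() is implemented over netlink sockets,
which is exactly the kind of thing a tightened seccomp policy wants to
deny - hence the need to remote this over IPC or move it to a process
that's allowed to make the call.

  #include <arpa/inet.h>
  #include <ifaddrs.h>
  #include <netinet/in.h>
  #include <cstdio>

  int main() {
    struct ifaddrs* addrs = nullptr;
    // The call a tightened Content sandbox would block (it uses
    // netlink sockets under the hood on Linux).
    if (getifaddrs(&addrs) != 0) {
      perror("getifaddrs");
      return 1;
    }
    for (struct ifaddrs* ifa = addrs; ifa; ifa = ifa->ifa_next) {
      if (ifa->ifa_addr && ifa->ifa_addr->sa_family == AF_INET) {
        char buf[INET_ADDRSTRLEN];
        auto* sin = reinterpret_cast<struct sockaddr_in*>(ifa->ifa_addr);
        inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
        printf("%s: %s\n", ifa->ifa_name, buf);
      }
    }
    freeifaddrs(addrs);
    return 0;
  }
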
Needs:
* For the sandboxing work, we want to lock down the Content process to
  the point that audio handling must move out of Content
* Deadlines for output data must be met to avoid underruns; this is
  tougher, though doable, when requests must transit IPC (see the
  ring-buffer sketch after this list)
* mtransport needs to stop making OS calls from Content
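
On the deadline point above, the standard shape of the fix is to hand
audio across the process boundary via shared memory rather than
per-request IPC.  What follows is a sketch of that general technique
under assumptions of my own (names and sizes are made up; this is not a
design for our cubeb remoting): the real-time callback side is
wait-free, so meeting the deadline becomes a matter of keeping the
buffer fed, not of IPC latency on the callback path.

  // A sketch of the general technique, not our actual remoting design:
  // a single-producer/single-consumer ring buffer that would live in a
  // shmem segment mapped into both processes.  The consumer (the OS
  // audio callback) does only atomic loads/stores plus a copy - no
  // locks, no IPC.
  #include <atomic>
  #include <cstddef>

  struct SpscRing {
    static constexpr size_t kSize = 16384;  // samples; power of two
    float buf[kSize];
    std::atomic<size_t> readPos{0};   // advanced only by the consumer
    std::atomic<size_t> writePos{0};  // advanced only by the producer

    // Producer side (Content): queue mixed samples ahead of the deadline.
    size_t Write(const float* src, size_t n) {
      size_t w = writePos.load(std::memory_order_relaxed);
      size_t r = readPos.load(std::memory_order_acquire);
      size_t space = kSize - (w - r);
      if (n > space) n = space;
      for (size_t i = 0; i < n; ++i) buf[(w + i) & (kSize - 1)] = src[i];
      writePos.store(w + n, std::memory_order_release);
      return n;
    }

    // Consumer side (audio callback thread): never blocks; a short
    // read means an underrun, and the caller zero-fills the rest.
    size_t Read(float* dst, size_t n) {
      size_t r = readPos.load(std::memory_order_relaxed);
      size_t w = writePos.load(std::memory_order_acquire);
      size_t avail = w - r;
      if (n > avail) n = avail;
      for (size_t i = 0; i < n; ++i) dst[i] = buf[(r + i) & (kSize - 1)];
      readPos.store(r + n, std::memory_order_release);
      return n;
    }
  };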

Wants:
* Minimize complexity
  * IPC adds complexity, especially when anything is synchronous.
  * Complexity adds bugs.
  * Complexity slows down maintenance and makes code harder to
    understand.
* Security
  * A sandbox is more secure than the Master process.
  * This especially matters for code that touches arbitrary data from
    outside - i.e. codec data (encode, which can come from canvas, or
    especially decode) and network data (packets can contain most
    anything that gets by the packet filter, which is mostly about
    origin).
  * Complex code (codecs in particular) often has hidden bugs; fuzzing
    helps, but doesn't guarantee holes don't exist.
  * ICE/TURN is pretty complex networking code, and a good chunk of it
    is in legacy C (no refcounting/etc).
  * Developing new wireline protocols (such as QUIC) in userspace adds
    new risks, especially as fixes and improvements will be rapidly
    iterated on, and they're exposed to crafted packets.
  * Failure of code in a sandbox doesn't give up the whole system.
  * Firewalling (sandboxing) vulnerable media/networking code from
    Content is also useful, since a break of the media/networking code
    would then require a sandbox escape to get access to raw content
    (passwords, bank data, etc).
  * This is part of why GMP is used (though not the only reason).
  * Running imported code in a sandbox, especially large complicated
    code that changes often, reduces the risk of a vulnerability being
    usefully exploited.
* Performance
  * IPC and use of shmems is slower than sharing between threads,
    though for video input we found the extra processing to be
    unimportant.
  * Avoiding IPC for audio is a win (OS-priority threads, less chance
    of underrun).
  * Avoid/minimize memory bloat.

Options:
Note: options that move items into the Master process are included
here, since people have been planning or suggesting just that, and I
want to document their pros and cons as well as those of the more
complex options.

1) Move cubeb to Master; provide new IPC interfaces for mtransport for IP 
discovery
2) Move cubeb to Master; move mtransport to Master
3) Move cubeb to Master; create new sandbox process with direct network
   access and move mtransport there 
4) Create new sandbox process with network access and audio/video device
   access, and move cubeb, video input, and mtransport there 
4.5) Create two new sandbox processes, one for audio/video, one for
   mtransport.  (Note: this is the solution I'm recommending for the
   short term.)
5) Create new sandbox process with network access and audio/video device
   access, and move cubeb, video input, and mtransport there, and also
   move MSG, WebAudio there 
6) Create new sandbox process with network access and audio/video device
   access, and move cubeb, video input, and mtransport there, and also
   move MSG, WebAudio and PeerConnection there 
7) Create new sandbox process with network access and audio/video device
   access, and move cubeb, video input, and mtransport there, and also
   move MSG, WebAudio, PeerConnection and MediaDecoders there 

8) Move Necko to a new sandbox process (combined with #3-7, or as an
   independent sandbox).  Moving just the protocol pieces or the STS
   stuff may be (relatively) simpler, and would protect the most
   vulnerable bits - those exposed to raw external packets.  It would
   also keep PSM (I think) and cache access out of the sandbox, and
   avoid having to remote them.  Note: we could use one sandbox for all
   protocols, or one per protocol - very similar to GMP media
   sandboxing.


Detailed Pros/Cons of each one are in the document:
https://docs.google.com/document/d/1cwc153l1Vo6CDuzCf7M7WbfFyHLqOcPq3JMwwYuJMRQ

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email