On Mon, Oct 23, 2017 at 05:59:44PM +0100, Julien Grall wrote:
> Hi Volodymyr,

Hi Julien,

> Let me begin the e-mail with I am not totally adversed to putting the TEE
> mediator in Xen. At the moment, I am trying to understand the whole picture.

Thanks for the clarification. This is really reassuring :)
For my part, I'm not totally against TEE mediators in stubdoms. I'm only
concerned about the required effort.

> On 20/10/17 18:37, Volodymyr Babchuk wrote:
> >On Fri, Oct 20, 2017 at 02:11:14PM +0100, Julien Grall wrote:
> >>On 17/10/17 16:59, Volodymyr Babchuk wrote:
> >>>On Mon, Oct 16, 2017 at 01:00:21PM +0100, Julien Grall wrote:
> >>>>On 11/10/17 20:01, Volodymyr Babchuk wrote:
> >>>>>I want to present TEE mediator, that was discussed earlier ([1]).
> >>>>>
> >>>>>I selected design with built-in mediators. This is easiest way,
> >>>>>it removes many questions, it is easy to implement and maintain
> >>>>>(at least I hope so).
> >>>>
> >>>>Well, it may close the technical questions but still leave the security
> >>>>impact unanswered. I would have appreciated a summary of each approach and
> >>>>explain the pros/cons.
> >>>This is the most secure way also. In terms of trust between guests and
> >>>Xen at least. I'm worked with OP-TEE guys mostly, so when I hear about
> >>>"security", my first thoughts are "Can TEE OS trust to XEN as a
> >>>mediator? Can TEE client trust to XEN as a mediator?". And with
> >>>current approach answer is "yes, they can, especially if XEN is a part
> >>>of a chain of trust".
> >>>
> >>>But you probably wanted to ask "Can guest compromise whole system by
> >>>using TEE mediator or TEE OS?". This is an interesting question.
> >>>First let's discuss requirements for a TEE mediator. So, mediator
> >>>should be able to:
> >>>
> >>> * Receive request to handle trapped SMC. This request should include
> >>>   user registers + some information about guest (at least domain id).
> >>> * Pin/unpin domain memory pages.
> >>> * Map domain memory pages into own address space with RW access.
> >>> * Issue real SMC to a TEE.
> >>> * Receive information about guest creation and destruction.
> >>> * (Probably) inject IRQs into a domain (this can be not a requester
> >>>   domain, but some other domain, that also called to TEE).
> >>>
> >>>This is a minimal list of requirements. I think, this should be enough to
> >>>implement mediator for OP-TEE. But I can't say for sure for other TEEs.
> >>>
> >>>Let's consider possible approaches:
> >>>
> >>>1. Mediator right in XEN, works at EL2.
> >>>   Pros:
> >>>    * Mediator can use all XEN APIs
> >>>    * As mediator resides in XEN, it can be checked together with XEN
> >>>      for a validity (trusted boot).
> >>>    * Mediator is initialized before Dom0. Dom0 can work with a TEE.
> >>>    * No extra context switches, no special ABI between XEN and mediator.
> >>>
> >>>   Cons:
> >>>    * Because it lives in EL2, it can compromise whole hypervisor,
> >>>      if there is a security bug in mediator code.
> >>>    * No support for closed source TEEs.
> >>
> >>Another cons is you assume TEE API is fully stable and will not change.
> >>Imagine a new function is added, or a vendor decided to hence with a new set
> >>of API. How will you know Xen is safe to use it?
> >With whitelisting, as you correctly suggested below. XEN will process
> >only know requests. Anything that looks unfimiliar should be rejected.
>
> Let's imagine the guest is running on a platform with a newer version of
> TEE. This guest will probe the version of OP-TEE and knows the new function
> is present. This request will be handled mediator.

At the moment, the OP-TEE client does not use versions. Instead it uses
capability flags. So, the mediator should filter out all unknown caps. This
will force the guest to use only the supported subset of features. If, in the
future, the client comes to rely on versions (e.g. due to a dramatic protocol
change), the mediator can either downgrade the version or refuse to work at
all.

> If as you said Xen is using a whitelist, this means the hypervisor will
> return unimplemented.
> How do you expect the guest to behave in that case?

As I said above, the guest should downgrade to the supported subset of
features.

> Note that I think a whitelist is a good idea, but I think we need to think a
> bit more about the implication.

At least for now, OP-TEE is designed in such a way that it is compatible in
both directions. I'm sure that future OP-TEE development will be done with
virtualization support in mind, so it will not break existing setups.

> >
> >>If it is not safe, this means you have a whitelist solution and therefore
> >>tie Xen to a specific OP-TEE version. So if you need to use a new function
> >>you would need to upgrade Xen making the code of using new version
> >>potentially high.
> >Yes, any ABI change between OP-TEE and its clients will require mediator
> >upgrade. Luckilly, OP-TEE maintains ABI backward-compatible, so if you'll
> >install old XEN and new OP-TEE, OP-TEE will use only that subset of ABI,
> >which is known to XEN.
> >
> >>Also, correct me if I am wrong, OP-TEE is a BSD 2-Clause. This means you
> >>impose anyone wanted to modify OP-TEE for their own purpose can make a
> >>closed version of the TEE. But if you need to introspect/whitelist call, you
> >>impose the vendor to expose their API.
> >Basically yes. Is this bad? OP-TEE driver in Linux is licensed under GPL v2.
> >If vendor modifies interface between OP-TEE and Linux, they anyways obligued
> >to expose API.
>
> Pardon me for potential stupid questions, my knowledge of OP-TEE is limited.
>
> My understanding is the OP-TEE will provide a generic way to access
> different Trusted Application. While OP-TEE API may be generic, the TA API
> is custom. AFAICT the latter is not part of Linux driver.

Yes, you are perfectly right there.

> So here my questions:
> 1) Are you planning allow all the guests to access every Trusted
> Applications?

This is a good question. There are two types of TAs supported in OP-TEE:
real TAs (as they are described in GlobalPlatform specs) and PseudoTAs.
The latter ones are statically linked right into the OP-TEE kernel and execute
at S-EL1. Real TAs are provided by the client: the normal world (NW)
userspace supplicant loads the TA into OP-TEE. OP-TEE checks the TA's
signature and then runs it in S-EL0. So, I'm planning to allow a client to
work with any real TA. I can't see a real problem there.
PseudoTAs can be used to access some platform-specific features, and thus it
can be quite dangerous to allow anyone to call them. But generic OP-TEE
includes only test and benchmark PseudoTAs, which should be disabled in
production builds. So, I don't see why a generic mediator should distinguish
them. I think XSM can be employed later to control which guest can access
which PseudoTA. But this is not a target for the first version.

> 2) Will you ever need to introspect those messages?

No, I won't.

> >>>
> >>>2. Mediator in a stubdomain. Works at EL1.
> >>>   Pros:
> >>>    * Mediator is isolated from hypervisor (but it still can do
> >>>      potentially dangerous things like mapping domain memory or
> >>>      pining pages).
> >>>    * One can legally create and use mediator for a closed-source TEE.
> >>
> >> * Easier to upgrade to a new version of OP-TEE.
> >Yes, this is true. But what about interface between XEN and mediator?
> >This is a new entity that should be maintained. Will I abe able to use
> >new XEN with old mediator? Or new mediator with old XEN?
>
> Why would you need to specific interface for the mediator? (see more below)

At least the following features are missing from the XEN control API (I hope
this is the right term) right now:

 - domain creation/destruction hooks
 - ability to intercept only certain SMCs
 - a way to inject IRQs into other guests

Also, see more below.

> >
> >>>   Cons:
> >>>    * Overhead in XEN<->Mediator communication.
> >>>    * XEN needs to be modified to boot mediator domain before Dom0.
> >>
> >>Is it a really cons? In the past, we had discussion to allow Xen creating
> >>multiple domain, avoiding the overhead of Dom0.
> >>This could also benefits here.
> >As I understand, this is a significant change in XEN. What are the chances,
> >that community will accept this change? As I can see, immediate benefit
> >of this is only TEE mediator support. Looks like no one except us
> >interested in this topic.
>
> The GSOC project was not added because of TEE mediator. We had companies
> showing interest to start multiple domains at the same time. This would
> significantly shrink down the boot time of the whole platform.

Yes. Actually, we are also interested in a faster boot. But my point was that
what we need for the mediator is not the same as what is described in the
GSOC project. The functionality described on the GSOC page has multiple uses.
But for the mediator we need something more intricate: as discussed below,
the ability to delay the boot of hwdom (and other domains).

> >
> >BTW, I checked "Xen on ARM: create multiple guests from device
> >tree" at [1]. This is close, to what we need, but not exactly. You see,
> >TEE mediator should be created *before* Dom0. So actually TEE mediator
> >will receive domid 0. I suspect that this only change will break
> >many things.
>
> Can you please give example?

I'm sure that I have seen checks for domid == 0 before, but now I can't find
any. Probably that was in closed-source backends. So, sorry for the false
accusation :)

> Technically none of the hypervisor, Linux and the toolstack should rely on
> dom0 to be domid 0.
>
> AFAIK, the hypervisor and Linux are free of them. It might be possible to
> have few hardcoded in the toolstack, but they should really disappear.

Totally agree there.

> However, I can't see why you require the mediator to use domid 0. You could
> for example keep the hardware domain paused until the mediator has started.

So it will be like this: construct dom0, construct and run the mediator
domain, then run dom0 on a signal from DomMediator? Probably this will work.

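The ordering discussed above (build dom0, build and start the mediator
domain, then let dom0 run once the mediator signals readiness) could be
modelled roughly as follows. This is only a sketch under assumptions:
`struct sketch_domain`, `struct boot_ctx`, `boot_construct()`,
`boot_start_mediator()` and `boot_mediator_ready()` are hypothetical names
invented for illustration and are not Xen APIs, and giving the mediator
domid 1 is likewise just an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the boot ordering; NOT Xen's real structures. */
enum dom_state { DOM_PAUSED, DOM_RUNNING };

struct sketch_domain {
    int domid;
    bool is_hwdom;
    enum dom_state state;
};

struct boot_ctx {
    struct sketch_domain hwdom;    /* dom0 keeps domid 0 */
    struct sketch_domain mediator; /* mediator gets another domid (assumed 1) */
    bool mediator_ready;
};

/* Step 1: construct both domains, leaving both paused. */
void boot_construct(struct boot_ctx *c)
{
    assert(c != NULL);
    c->hwdom = (struct sketch_domain){
        .domid = 0, .is_hwdom = true, .state = DOM_PAUSED };
    c->mediator = (struct sketch_domain){
        .domid = 1, .is_hwdom = false, .state = DOM_PAUSED };
    c->mediator_ready = false;
}

/* Step 2: start only the mediator; the hardware domain stays paused. */
void boot_start_mediator(struct boot_ctx *c)
{
    assert(c != NULL);
    c->mediator.state = DOM_RUNNING;
}

/*
 * Step 3: "ready" signal. It is honoured only when it comes from the
 * running mediator domain; then the hardware domain is unpaused.
 * Returns true if the signal was accepted.
 */
bool boot_mediator_ready(struct boot_ctx *c, int caller_domid)
{
    assert(c != NULL);
    if (caller_domid != c->mediator.domid ||
        c->mediator.state != DOM_RUNNING)
        return false;

    c->mediator_ready = true;
    if (c->hwdom.state == DOM_PAUSED)
        c->hwdom.state = DOM_RUNNING;
    return true;
}
```

With this ordering dom0 keeps domid 0, and the only new piece of state is the
"mediator ready" signal. A signal coming from any domain other than the
running mediator is ignored, so dom0 stays paused until the mediator itself
reports that it is up.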
> >
> >>>
> >>>And yes, it seems obvious, but I want to say this explicitly: generic
> >>>TEE mediator framework should and will use XSM to control which domain
> >>>can work with TEE. So, if you don't trust your guest - don't let it
> >>>to call TEE at all.
> >>
> >>Correct me if I am wrong. TEE could be used by Android guest which likely
> >>run the user apps... right? So are you saying you fully trust that guest and
> >>obviously the user installing rogue app?
> >I don't think that app downloaded from Play Marget can access OP-TEE
> >directly.
> >OP-TEE can be used by Android itself as a key storage or to access to a SE,
> >for example. But 3rd app that issues TEE calls... I don't think so.
>
> You didn't get my point here. That rogue app may be able to break into
> kernel via an exploit or have enough privilege to break the guest. Who knows
> what it will be able to do after...

Only what the hypervisor and the TEE will allow it to do.

Look, OP-TEE was not designed to rule the machine. There is ARM TF for that :)
OP-TEE's task is to provide a safer environment for sensitive data and code.
This environment has well-defined interfaces and is designed to be as safe
as possible. If a rogue app breaks into the kernel, then it can issue any SMC
it wants. But OP-TEE does not trust the NW. The hypervisor does not trust
guests. The mediator should be written in the same way.

So, what can a rogue kernel do? As far as I know, it can cause a DoS in
OP-TEE. This is a known issue. If there is a security bug in OP-TEE, it can
probably take over the whole system. But this is true for any system running
OP-TEE. If there is a security flaw in the mediator, it can compromise either
the hypervisor, or DomMediator and all TEE-capable guests. Yes, this is a
risk.

> The whole point of using an hypervisor is to isolate guest from each other.
> So what is the isolation model with OP-TEE and the mediator?

OP-TEE is written to isolate TAs, resources and clients from each other.
Currently there are no plans for interaction between TAs from different VMs,
no resource sharing, nothing like this. What do you mean by "isolation model"?
Can you give an example?

> >
> >>>This feature is not implemented in this RFC only because
> >>>currently only Dom0 calls are supported.
> >>>
> >>>>This would help to understand that maybe it is an easy way but also still
> >>>>secure...
> >>>In previous discussion we considered only two variants: in XEN or outside
> >>>XEN. Stubdomain approach looks more secure, but I'm not sure that it is
> >>>true.
> >>>Such stubdomain will need access to all guests memory. If you managed to
> >>>gain control on mediator stubdomain, you can do anything you want with all
> >>>guests.
> >>
> >>That's slightly untrue. The stubdomain will only be able to mess with
> >>domains using TEE.
> >Yes, this is more strict. Then either you are not allowing your privileged
> >domain to use TEE, or your system may be compromised anyways.
>
> Can you give an example of privilege domain for you? Do you consider Android
> a privilege domain?

In this case I used the term "privileged domain" in the XEN sense:
is_privileged == 1. Android is not a privileged domain, by any means.
I wanted to say that if you allow Dom0 to access the TEE, then a hacked
DomMediator can compromise Dom0 and the hypervisor.

> >>>
> >>>>To be clear, this series don't look controversial at least for OP-TEE.
> >>>>What I am more concerned is about DomU supports.
> >>>Your concern is that rogue DomU can compromise whole system, right?
> >>
> >>Yes. You seem to assume that DomU using TEE will always be trusted, I think
> >>this is the wrong approach if the use is able to interact directly with
> >>those guests. See above.
> >No, I am not assuming that DomU that calls TEE should be trusted. Why do you
> >think so? It should be able to use TEE services, but this does not mean that
> >XEN should trust it.
>
> In a previous answer you said: "So, if you don't trust your guest - don't
> let it". For me, this clearly means you consider that DomU using TEE are
> trusted.
>
> So can you clarify by what you mean by trust then?

Well... In the real world "trust" isn't a binary option. You don't want to
allow all domains to access the TEE. A breached TEE user domain doesn't
automatically mean that your whole system is compromised. But it certainly
increases the attack surface. So it is safer to give TEE access only to those
domains which really require it. You can call them slightly more trusted than
others.

> >Even now, XEN processes requests from DomUs without
> >trusting them. Why do you think, that TEE mediator usage will differ?
>
> I guess you are comparing with vGIC and PL011? IHMO, the main difference is
> Xen is taking care alone of the isolation between guest. Here in the TEE
> case, you rely on a combination of both TEE and Xen to do the isolation.

Yes. This will be less secure than a TEE-only or hypervisor-only system.

> >
> >Look, I generally not against idea of TEE mediator in stubdoms. But this
> >approach require many changes in existing XEN code:
> >
> >1. Load domains before Dom0.
> >
> >2. Add special API for mediator. Or alter existing ones. You can't use
> >   existing APIs as it, because you need to enforce stricter XSM rules
> >   on them.
>
> Mind giving more explanation....? Xen has a default policy for XSM and
> indeed may not fit your use case. But you can write your own policy and load
> it.

Yes. You need a policy like "allow this stubdom to map memory only from
TEE-enabled guests". AFAIK, this is not possible right now. But I may be
wrong, I'm not very familiar with XSM.

> >
> >3. Changes in scheduling to allow TEE mediator use credits/slices of
> >   calling guest.
> >
> >4. Support boilerplate code in stubdom. You know, you can't simply
> >   write mediator in stubdom. You need a kernel. You need to
> >   maintain it.
>
> Well, in a way or another someone will have to maintain the mediator... The
> kernel does not need to be specific to TEE, it could be a unikernel.

Right. But to me XEN looks like a better maintained "kernel" :) IMHO, XEN is
mature, and it has fewer bugs (especially security ones) than any other
kernel.

> And before you say again no-one in the community seem to be interested. I
> should remind you that Arm is working on it (see development update).

Are you talking about that "unicore" project by the NEC guys? Sorry, I can't
find the development update you mention. Looks like search on markmail is
down (or I'm doing something terribly wrong).

> >
> >This is a lot of a work. It requires changes in generic parts of XEN.
> >I fear it will be very hard to upstream such changes, because no one
> >sees an immediate value in them. How do you think, what are my chances
> >to upstream this?
>
> It is fairly annoying to see you justifying back most of this thread with
> "no one sees an immediate value in them".
>
> I am not the only maintainers in Xen, so effectively can't promise whether
> it is going to be upstreamed. But I believe the community has been very
> supportive so far, a lot of discussions happened (see [2]) because of the
> OP-TEE support. So what more do you expect from us?

I'm sorry, I didn't mean to offend you or anyone else. You guys can be harsh
sometimes, but I really appreciate the help provided by the community. And I
certainly don't ask you for any guarantees or anything of that sort.
I'm just bothered by the amount of required work and by the upstreaming
process. But this is not a strong argument against mediators in stubdoms,
I think :)

Currently I'm developing virtualization support in OP-TEE, so in the meantime
we'll have plenty of time to discuss mediators and the stubdomain approach
(if you have time). To test this feature in OP-TEE I'm extending this RFC,
making optee.c look like a full-scale mediator. I need to do this anyway, to
test OP-TEE.
When I finish, I can show you what the mediator can look like. Maybe this
will persuade you towards one approach or the other.

> >
> >Approach in this RFC is much simpler. Few hooks in arch code + additional
> >subsystem, which can be easily turned off.
>
> Stefano do you have any opinion on this discussion?
>
> Regards,
>
> >
> >[1] https://wiki.xenproject.org/wiki/Outreach_Program_Projects
>
> [2]
> https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg01931.html
> --

WBR, Volodymyr Babchuk