Re: [Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-18 Thread Julien Grall



On 09/12/2018 02:51 PM, Volodymyr Babchuk wrote:

Hi,


Hi,



On 12.09.18 13:59, Julien Grall wrote:

Hi Volodymyr,

On 09/11/2018 08:30 PM, Volodymyr Babchuk wrote:

On 11.09.18 14:53, Julien Grall wrote:

On 10/09/18 18:44, Volodymyr Babchuk wrote:

On 10.09.18 16:01, Julien Grall wrote:

On 03/09/18 17:54, Volodymyr Babchuk wrote:

OP-TEE usually uses the same idea as with command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it requests the NW to allocate a shared
buffer for other RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated
until shutdown.

The mediator needs to pin this buffer (or buffers) to make sure that the
domain can't transfer it to someone else. It should also be mapped into
the Xen address space, because the mediator needs to check responses
from guests.


Can you explain why you always need to keep the shared buffer
mapped in Xen? Why not use access_guest_memory_by_ipa every time
you want to get information from the guest?
Sorry, I just didn't know about this mechanism. But for performance
reasons, I'd like to keep these buffers always mapped. You see, RPC
returns are very frequent (for every IRQ, actually). So I think it
will be costly to map/unmap this buffer every time.


This is a bit misleading... This copy will *only* happen for an IRQ
during an RPC. What are the chances of that? Fairly limited. If
this is happening too often, then the map/unmap here will be your
least concern.
Now, this copy will happen for every IRQ when the CPU is in S-EL1/S-EL0
mode. Chances are quite high, I must say.
Look: OP-TEE (or a TA) is doing something, like encrypting some buffer,
for example. An IRQ fires, and OP-TEE immediately executes an RPC return
(right from the interrupt handler), so the NW can handle the interrupt.
Then the NW returns control to OP-TEE, if it wants to.


I understand this... But the map/unmap should be negligible over the
rest of the context.
I thought that map/unmap is quite a costly operation, but I could be
wrong there.


At the moment, map/unmap is nearly a nop on Arm64 because all the RAM is
mapped (I would avoid assuming that, though :)). The only cost is going
through the p2m to translate the IPA to a PA.


For Arm32, each CPU has its own page tables, and the map/unmap (and TLB
flush) will be done locally. I would still expect the impact to be minimal.


Note that today map_domain_page on Arm32 is quite simplistic. It would
be possible to optimize it to lower the impact of map/unmap.


[...]





It feels quite suspicious to free the memory in Xen before calling
OP-TEE. I think this needs to be done afterwards.


No, it is OP-TEE that asked to free the buffer. This function is called
when the NW returns from the RPC. So at this moment the NW has freed the buffer.


But you forward that call to OP-TEE after. So what would OP-TEE do 
with that?

Happily resume the interrupted work. This is how RPC works:

1. The NW client issues a STD call (or a yielding call in SMCCC terms).
2. OP-TEE starts its work, but it needs to be interrupted for some
    reason: an IRQ arrived, it wants to block on a mutex, or it asks the
    NW to do some work (like allocating memory or loading a TA). This is
    called an "RPC return".
3. OP-TEE suspends the thread and returns from the SMC call with code
    OPTEE_SMC_RPC_VAL(SOME_CMD) in a0 and some optional parameters in
    other registers.
4. The NW sees that this is an RPC and not a completed STD call, so it
    performs SOME_CMD and issues another SMC with code
    OPTEE_SMC_CALL_RETURN_FROM_RPC in a0.
5. OP-TEE wakes up the suspended thread and continues execution.
6. Steps 2-5 are repeated until OP-TEE finishes the work.
7. It returns from the last SMC call with code OPTEE_SMC_RETURN_SUCCESS
    or OPTEE_SMC_RETURN_some_error in a0.
8. The optee driver sees that the call from step 1 has finally finished
    and returns control to the client.
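To make the sequence above concrete, here is a user-space C model of the Normal World side of one STD call. It is only a sketch: `fake_smc()` stands in for the secure monitor, and the `DEMO_*` constants are hypothetical values loosely modelled on the RPC-prefix convention of OP-TEE's `optee_smc.h` (check that header for the real encodings).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encodings, loosely modelled on the RPC-prefix idea in
 * OP-TEE's optee_smc.h; these are NOT the real function IDs. */
#define DEMO_RPC_PREFIX       0xFFFF0000u
#define DEMO_RPC_VAL(cmd)     (DEMO_RPC_PREFIX | (cmd))
#define DEMO_IS_RPC(ret)      (((ret) & DEMO_RPC_PREFIX) == DEMO_RPC_PREFIX)
#define DEMO_RETURN_OK        0x0u
#define DEMO_RETURN_EBADCMD   0x5u
#define DEMO_CALL_STD         0x1u   /* step 1: yielding (STD) call */
#define DEMO_RETURN_FROM_RPC  0x2u   /* step 4: resume after an RPC */
#define DEMO_RPC_CMD_IRQ      0x4u   /* an example SOME_CMD */

/* Fake secure world: suspends the thread `rpcs_left` times, then finishes. */
static int rpcs_left;

static uint32_t fake_smc(uint32_t a0)
{
    if ( a0 != DEMO_CALL_STD && a0 != DEMO_RETURN_FROM_RPC )
        return DEMO_RETURN_EBADCMD;
    if ( rpcs_left > 0 )
    {
        rpcs_left--;
        return DEMO_RPC_VAL(DEMO_RPC_CMD_IRQ);   /* steps 2-3: RPC return */
    }
    return DEMO_RETURN_OK;                       /* step 7: call is done */
}

/* Normal World side: returns the number of RPCs serviced, or -1. */
static int nw_std_call(void)
{
    int serviced = 0;
    uint32_t ret = fake_smc(DEMO_CALL_STD);      /* step 1 */

    while ( DEMO_IS_RPC(ret) )                   /* steps 2-6 */
    {
        serviced++;                              /* step 4: do SOME_CMD */
        ret = fake_smc(DEMO_RETURN_FROM_RPC);    /* step 5: resume OP-TEE */
    }
    return ret == DEMO_RETURN_OK ? serviced : -1;    /* step 8 */
}
```

With `rpcs_left = 3`, `nw_std_call()` returns 3: one loop iteration per RPC round trip, each of which is an SMC the mediator has to handle.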


Thank you for the explanation. As I mentioned in another thread, it
would be good to have some kind of high-level explanation of all those
interactions in the tree. If one already exists, then a pointer to it
in the code.
The high level is covered at [1], and the low level is covered in the
already-mentioned header files.


Could you add those pointers at the top of the OP-TEE file?


But I don't know of any explanation at the level of detail I gave you above.


That's fine. Can you add that to the commit message?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-12 Thread Volodymyr Babchuk

Hi,

On 12.09.18 13:59, Julien Grall wrote:

Hi Volodymyr,

On 09/11/2018 08:30 PM, Volodymyr Babchuk wrote:

On 11.09.18 14:53, Julien Grall wrote:

On 10/09/18 18:44, Volodymyr Babchuk wrote:

On 10.09.18 16:01, Julien Grall wrote:

On 03/09/18 17:54, Volodymyr Babchuk wrote:

OP-TEE usually uses the same idea as with command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it requests the NW to allocate a shared
buffer for other RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated
until shutdown.

The mediator needs to pin this buffer (or buffers) to make sure that the
domain can't transfer it to someone else. It should also be mapped into
the Xen address space, because the mediator needs to check responses
from guests.


Can you explain why you always need to keep the shared buffer
mapped in Xen? Why not use access_guest_memory_by_ipa every time
you want to get information from the guest?
Sorry, I just didn't know about this mechanism. But for performance
reasons, I'd like to keep these buffers always mapped. You see, RPC
returns are very frequent (for every IRQ, actually). So I think it
will be costly to map/unmap this buffer every time.


This is a bit misleading... This copy will *only* happen for an IRQ
during an RPC. What are the chances of that? Fairly limited. If this
is happening too often, then the map/unmap here will be your least
concern.
Now, this copy will happen for every IRQ when the CPU is in S-EL1/S-EL0
mode. Chances are quite high, I must say.
Look: OP-TEE (or a TA) is doing something, like encrypting some buffer,
for example. An IRQ fires, and OP-TEE immediately executes an RPC return
(right from the interrupt handler), so the NW can handle the interrupt.
Then the NW returns control to OP-TEE, if it wants to.


I understand this... But the map/unmap should be negligible over the
rest of the context.
I thought that map/unmap is quite a costly operation, but I could be
wrong there.




This is how a long job in OP-TEE can be preempted by the Linux kernel,
for example. The timer IRQ ensures that control will be returned to
Linux, the scheduler schedules some other task, and OP-TEE patiently
waits until its caller is scheduled back so it can resume the work.




However, I would like to see a performance comparison here to
weigh against the memory impact in Xen (Arm32 has a limited amount of
VA space available).

With the current configuration, this is a maximum of 16 pages per guest.
As for a performance comparison... This is doable, but will take some
time.
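The 16-page cap mentioned here comes from the patch's `atomic_add_unless(&ctx->shm_rpc_count, 1, MAX_RPC_SHMS)` check, which relies on the helper returning the value seen before the addition. A minimal sketch of those semantics with C11 atomics (`add_unless()` and `reserve_shm_slot()` are hypothetical names, not Xen functions):

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_RPC_SHMS 16

/* Add `a` to *v unless *v == u; return the value seen before the
 * (possible) addition, mirroring how the patch uses the result. */
static int add_unless(atomic_int *v, int a, int u)
{
    int c = atomic_load(v);

    /* On failure, compare_exchange reloads c with the current value. */
    while ( c != u && !atomic_compare_exchange_weak(v, &c, c + a) )
        ;
    return c;
}

/* Reserve one RPC SHM slot: 0 on success, -1 if the guest already
 * holds MAX_RPC_SHMS buffers. */
static int reserve_shm_slot(atomic_int *count)
{
    return add_unless(count, 1, MAX_RPC_SHMS) == MAX_RPC_SHMS ? -1 : 0;
}
```

The compare-and-swap loop means two vCPUs racing for the last slot cannot both succeed, so the per-guest bound holds without taking a lock.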


Let me put it differently: I will always choose the safe side unless
this is strictly necessary or the performance benefit has been proven.
I might be convinced for just 16 pages, although it feels like a
premature optimization...


Okay, then I'll stick with memory copy helpers for now.
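Since the conclusion here is to use memory copy helpers instead of a permanent mapping, the pattern can be sketched as "copy, then validate the private copy". This is a user-space model with hypothetical names: `copy_from_guest_ipa()` only plays the role that `access_guest_memory_by_ipa()` would play in Xen, `guest_ram` simulates guest memory, and `struct demo_msg_arg` is a trimmed stand-in for `struct optee_msg_arg`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GUEST_RAM_SIZE  4096u
#define DEMO_MAX_PARAMS 4u

/* Hypothetical, trimmed-down stand-in for struct optee_msg_arg. */
struct demo_msg_arg {
    uint32_t cmd;
    uint32_t num_params;
};

static uint8_t guest_ram[GUEST_RAM_SIZE];   /* simulated guest memory */

/* Hypothetical helper: copy len bytes at guest address ipa into
 * hypervisor-private memory, rejecting out-of-range accesses. */
static int copy_from_guest_ipa(void *dst, uint64_t ipa, size_t len)
{
    if ( ipa > GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - ipa )
        return -1;
    memcpy(dst, guest_ram + ipa, len);
    return 0;
}

/* Read the guest's RPC argument into a private copy and validate only
 * that copy, so the guest cannot change it between check and use. */
static int read_rpc_arg(uint64_t ipa, struct demo_msg_arg *out)
{
    if ( copy_from_guest_ipa(out, ipa, sizeof(*out)) )
        return -1;
    if ( out->num_params > DEMO_MAX_PARAMS )
        return -1;
    return 0;
}
```

Validating the copy rather than the guest page closes the window in which another vCPU could rewrite the buffer after the mediator has checked it.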





It feels quite suspicious to free the memory in Xen before calling
OP-TEE. I think this needs to be done afterwards.


No, it is OP-TEE that asked to free the buffer. This function is called
when the NW returns from the RPC. So at this moment the NW has freed the buffer.


But you forward that call to OP-TEE after. So what would OP-TEE do 
with that?

Happily resume the interrupted work. This is how RPC works:

1. The NW client issues a STD call (or a yielding call in SMCCC terms).
2. OP-TEE starts its work, but it needs to be interrupted for some
    reason: an IRQ arrived, it wants to block on a mutex, or it asks the
    NW to do some work (like allocating memory or loading a TA). This is
    called an "RPC return".
3. OP-TEE suspends the thread and returns from the SMC call with code
    OPTEE_SMC_RPC_VAL(SOME_CMD) in a0 and some optional parameters in
    other registers.
4. The NW sees that this is an RPC and not a completed STD call, so it
    performs SOME_CMD and issues another SMC with code
    OPTEE_SMC_CALL_RETURN_FROM_RPC in a0.
5. OP-TEE wakes up the suspended thread and continues execution.
6. Steps 2-5 are repeated until OP-TEE finishes the work.
7. It returns from the last SMC call with code OPTEE_SMC_RETURN_SUCCESS
    or OPTEE_SMC_RETURN_some_error in a0.
8. The optee driver sees that the call from step 1 has finally finished
    and returns control to the client.


Thank you for the explanation. As I mentioned in another thread, it
would be good to have some kind of high-level explanation of all those
interactions in the tree. If one already exists, then a pointer to it
in the code.
The high level is covered at [1], and the low level is covered in the
already-mentioned header files.

But I don't know of any explanation at the level of detail I gave you above.




Looking at that code, I just noticed a potential race condition
here. Nothing prevents a guest from calling twice with the same
optee_thread_id.

OP-TEE has an internal check against this.


I am not sure how the OP-TEE internal check would help here. The user may
know that thread-id 1 exists and call it from 2 vCPUs concurrently.


So handle_rpc will fin

Re: [Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-12 Thread Julien Grall

Hi Volodymyr,

On 09/11/2018 08:30 PM, Volodymyr Babchuk wrote:

On 11.09.18 14:53, Julien Grall wrote:

On 10/09/18 18:44, Volodymyr Babchuk wrote:

On 10.09.18 16:01, Julien Grall wrote:

On 03/09/18 17:54, Volodymyr Babchuk wrote:

OP-TEE usually uses the same idea as with command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it requests the NW to allocate a shared
buffer for other RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated
until shutdown.

The mediator needs to pin this buffer (or buffers) to make sure that the
domain can't transfer it to someone else. It should also be mapped into
the Xen address space, because the mediator needs to check responses
from guests.


Can you explain why you always need to keep the shared buffer mapped
in Xen? Why not use access_guest_memory_by_ipa every time you want
to get information from the guest?
Sorry, I just didn't know about this mechanism. But for performance
reasons, I'd like to keep these buffers always mapped. You see, RPC
returns are very frequent (for every IRQ, actually). So I think it
will be costly to map/unmap this buffer every time.


This is a bit misleading... This copy will *only* happen for an IRQ
during an RPC. What are the chances of that? Fairly limited. If this
is happening too often, then the map/unmap here will be your least
concern.
Now, this copy will happen for every IRQ when the CPU is in S-EL1/S-EL0
mode. Chances are quite high, I must say.
Look: OP-TEE (or a TA) is doing something, like encrypting some buffer,
for example. An IRQ fires, and OP-TEE immediately executes an RPC return
(right from the interrupt handler), so the NW can handle the interrupt.
Then the NW returns control to OP-TEE, if it wants to.


I understand this... But the map/unmap should be negligible over the 
rest of the context.




This is how a long job in OP-TEE can be preempted by the Linux kernel,
for example. The timer IRQ ensures that control will be returned to
Linux, the scheduler schedules some other task, and OP-TEE patiently
waits until its caller is scheduled back so it can resume the work.




However, I would like to see a performance comparison here to weigh
against the memory impact in Xen (Arm32 has a limited amount of VA
space available).

With the current configuration, this is a maximum of 16 pages per guest.
As for a performance comparison... This is doable, but will take some time.


Let me put it differently: I will always choose the safe side unless
this is strictly necessary or the performance benefit has been proven.
I might be convinced for just 16 pages, although it feels like a
premature optimization...






It feels quite suspicious to free the memory in Xen before calling
OP-TEE. I think this needs to be done afterwards.


No, it is OP-TEE that asked to free the buffer. This function is called
when the NW returns from the RPC. So at this moment the NW has freed the buffer.


But you forward that call to OP-TEE after. So what would OP-TEE do 
with that?

Happily resume the interrupted work. This is how RPC works:

1. The NW client issues a STD call (or a yielding call in SMCCC terms).
2. OP-TEE starts its work, but it needs to be interrupted for some
    reason: an IRQ arrived, it wants to block on a mutex, or it asks the
    NW to do some work (like allocating memory or loading a TA). This is
    called an "RPC return".
3. OP-TEE suspends the thread and returns from the SMC call with code
    OPTEE_SMC_RPC_VAL(SOME_CMD) in a0 and some optional parameters in
    other registers.
4. The NW sees that this is an RPC and not a completed STD call, so it
    performs SOME_CMD and issues another SMC with code
    OPTEE_SMC_CALL_RETURN_FROM_RPC in a0.
5. OP-TEE wakes up the suspended thread and continues execution.
6. Steps 2-5 are repeated until OP-TEE finishes the work.
7. It returns from the last SMC call with code OPTEE_SMC_RETURN_SUCCESS
    or OPTEE_SMC_RETURN_some_error in a0.
8. The optee driver sees that the call from step 1 has finally finished
    and returns control to the client.


Thank you for the explanation. As I mentioned in another thread, it
would be good to have some kind of high-level explanation of all those
interactions in the tree. If one already exists, then a pointer to it
in the code.





Looking at that code, I just noticed a potential race condition
here. Nothing prevents a guest from calling twice with the same
optee_thread_id.

OP-TEE has an internal check against this.


I am not sure how the OP-TEE internal check would help here. The user may
know that thread-id 1 exists and call it from 2 vCPUs concurrently.


So handle_rpc will find a context associated with it and use it for
execute_std_call. If OP-TEE returns an error (or is done with it), you
will end up freeing the same context twice.


Did I miss anything?
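One way to close the race described here is to let only one vCPU own a call context at a time. The sketch below is a hypothetical user-space model, not the patch's code: `struct demo_call_ctx` and its `in_use` flag are assumptions, and C11 `atomic_exchange()` stands in for whatever primitive the Xen code would actually use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread call context; `in_use` exists only in this
 * sketch and is not a field from the patch. */
struct demo_call_ctx {
    int optee_thread_id;
    atomic_bool in_use;
};

/* Only the first vCPU to claim the context wins. A concurrent caller
 * with the same thread ID gets false and must fail its request instead
 * of sharing (and later double-freeing) the context. */
static bool claim_call_ctx(struct demo_call_ctx *call)
{
    return !atomic_exchange(&call->in_use, true);
}

/* Called once the owning vCPU is done with the context. */
static void release_call_ctx(struct demo_call_ctx *call)
{
    atomic_store(&call->in_use, false);
}
```

Because the exchange is atomic, two vCPUs resuming the same `optee_thread_id` at once cannot both observe the context as free.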



So it would be possible for two vCPUs to concurrently call the same
command and free it.

Maybe you noticed that the mediator uses a shadow buffer to read the cookie ID.


Re: [Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-11 Thread Volodymyr Babchuk

Hi Julien,

On 11.09.18 14:53, Julien Grall wrote:



On 10/09/18 18:44, Volodymyr Babchuk wrote:

Hi Julien,

On 10.09.18 16:01, Julien Grall wrote:

Hi Volodymyr,

On 03/09/18 17:54, Volodymyr Babchuk wrote:

OP-TEE usually uses the same idea as with command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it requests the NW to allocate a shared
buffer for other RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated
until shutdown.

The mediator needs to pin this buffer (or buffers) to make sure that the
domain can't transfer it to someone else. It should also be mapped into
the Xen address space, because the mediator needs to check responses
from guests.


Can you explain why you always need to keep the shared buffer mapped
in Xen? Why not use access_guest_memory_by_ipa every time you want
to get information from the guest?
Sorry, I just didn't know about this mechanism. But for performance
reasons, I'd like to keep these buffers always mapped. You see, RPC
returns are very frequent (for every IRQ, actually). So I think it
will be costly to map/unmap this buffer every time.


This is a bit misleading... This copy will *only* happen for an IRQ during
an RPC. What are the chances of that? Fairly limited. If this is
happening too often, then the map/unmap here will be your least concern.
Now, this copy will happen for every IRQ when the CPU is in S-EL1/S-EL0
mode. Chances are quite high, I must say.
Look: OP-TEE (or a TA) is doing something, like encrypting some buffer,
for example. An IRQ fires, and OP-TEE immediately executes an RPC return
(right from the interrupt handler), so the NW can handle the interrupt.
Then the NW returns control to OP-TEE, if it wants to.


This is how a long job in OP-TEE can be preempted by the Linux kernel,
for example. The timer IRQ ensures that control will be returned to
Linux, the scheduler schedules some other task, and OP-TEE patiently
waits until its caller is scheduled back so it can resume the work.




However, I would like to see a performance comparison here to weigh
against the memory impact in Xen (Arm32 has a limited amount of VA space available).

With the current configuration, this is a maximum of 16 pages per guest.
As for a performance comparison... This is doable, but will take some time.

[...]

+static void free_shm_rpc(struct domain_ctx *ctx, uint64_t cookie)
+{
+    struct shm_rpc *shm_rpc;
+    bool found = false;
+
+    spin_lock(&ctx->lock);
+
+    list_for_each_entry( shm_rpc, &ctx->shm_rpc_list, list )
+    {
+        if ( shm_rpc->cookie == cookie )


What guarantees that the cookie will be unique?

The Normal World guarantees it. This is part of the protocol.


By NW, do you mean the guest? You should know by now that we should not trust
what the guest is doing. If you think it is still fine, then I would
like a write-up explaining the impact of a guest using the same
cookie ID twice.
Ah, I see your point. Yes, I'll add a check to ensure that a cookie is not
reused.

Thank you for pointing this out.
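A sketch of the promised check, assuming a simple singly linked list in place of Xen's `list_head` (all names are hypothetical). The important property is that the duplicate lookup and the insertion happen in the same critical section; this single-threaded model leaves the locking out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for struct shm_rpc: only the fields needed here. */
struct demo_shm_rpc {
    struct demo_shm_rpc *next;
    uint64_t cookie;
};

static bool cookie_in_use(const struct demo_shm_rpc *head, uint64_t cookie)
{
    for ( ; head; head = head->next )
        if ( head->cookie == cookie )
            return true;
    return false;
}

/* Track a buffer only if the guest-supplied cookie is not already in
 * the list. In Xen, the lookup and the insertion would both have to be
 * done under ctx->lock. Returns 0 on success, -1 on a duplicate cookie
 * or allocation failure. */
static int add_shm_rpc(struct demo_shm_rpc **head, uint64_t cookie)
{
    struct demo_shm_rpc *n;

    if ( cookie_in_use(*head, cookie) )
        return -1;
    n = malloc(sizeof(*n));
    if ( !n )
        return -1;
    n->cookie = cookie;
    n->next = *head;
    *head = n;
    return 0;
}
```

Rejecting the duplicate up front means a later "free cookie X" can only ever match one tracked buffer.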



It feels quite suspicious to free the memory in Xen before calling
OP-TEE. I think this needs to be done afterwards.


No, it is OP-TEE that asked to free the buffer. This function is called
when the NW returns from the RPC. So at this moment the NW has freed the buffer.


But you forward that call to OP-TEE after. So what would OP-TEE do with 
that?

Happily resume the interrupted work. This is how RPC works:

1. The NW client issues a STD call (or a yielding call in SMCCC terms).
2. OP-TEE starts its work, but it needs to be interrupted for some
   reason: an IRQ arrived, it wants to block on a mutex, or it asks the
   NW to do some work (like allocating memory or loading a TA). This is
   called an "RPC return".
3. OP-TEE suspends the thread and returns from the SMC call with code
   OPTEE_SMC_RPC_VAL(SOME_CMD) in a0 and some optional parameters in
   other registers.
4. The NW sees that this is an RPC and not a completed STD call, so it
   performs SOME_CMD and issues another SMC with code
   OPTEE_SMC_CALL_RETURN_FROM_RPC in a0.
5. OP-TEE wakes up the suspended thread and continues execution.
6. Steps 2-5 are repeated until OP-TEE finishes the work.
7. It returns from the last SMC call with code OPTEE_SMC_RETURN_SUCCESS
   or OPTEE_SMC_RETURN_some_error in a0.
8. The optee driver sees that the call from step 1 has finally finished
   and returns control to the client.


Looking at that code, I just noticed a potential race condition
here. Nothing prevents a guest from calling twice with the same optee_thread_id.

OP-TEE has an internal check against this.

So it would be possible for two vCPUs to concurrently call the same
command and free it.
Maybe you noticed that the mediator uses a shadow buffer to read the cookie ID.
So it will free the buffer mentioned by OP-TEE.

Basically, this is what happens:

1. OP-TEE asks to "free buffer with cookie X" in an RPC return.
2. The guest says "I freed that buffer" in an SMC call.
3. mediator frees b

Re: [Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-11 Thread Julien Grall



On 10/09/18 18:44, Volodymyr Babchuk wrote:

Hi Julien,

On 10.09.18 16:01, Julien Grall wrote:

Hi Volodymyr,

On 03/09/18 17:54, Volodymyr Babchuk wrote:

OP-TEE usually uses the same idea as with command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it requests the NW to allocate a shared
buffer for other RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated
until shutdown.

The mediator needs to pin this buffer (or buffers) to make sure that the
domain can't transfer it to someone else. It should also be mapped into
the Xen address space, because the mediator needs to check responses
from guests.


Can you explain why you always need to keep the shared buffer mapped
in Xen? Why not use access_guest_memory_by_ipa every time you want
to get information from the guest?
Sorry, I just didn't know about this mechanism. But for performance
reasons, I'd like to keep these buffers always mapped. You see, RPC
returns are very frequent (for every IRQ, actually). So I think it
will be costly to map/unmap this buffer every time.


This is a bit misleading... This copy will *only* happen for an IRQ during
an RPC. What are the chances of that? Fairly limited. If this is
happening too often, then the map/unmap here will be your least concern.


However, I would like to see a performance comparison here to weigh
against the memory impact in Xen (Arm32 has a limited amount of VA space available).






Signed-off-by: Volodymyr Babchuk 
---
  xen/arch/arm/tee/optee.c | 121 ++-
  1 file changed, 119 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 1008eba..6d6b51d 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -21,6 +21,7 @@
  #include 
  #define MAX_STD_CALLS   16
+#define MAX_RPC_SHMS    16
  /*
   * Call context. OP-TEE can issue multiple RPC returns during one call.
@@ -35,11 +36,22 @@ struct std_call_ctx {
  int rpc_op;
  };
+/* Pre-allocated SHM buffer for RPC commands */
+struct shm_rpc {
+    struct list_head list;
+    struct optee_msg_arg *guest_arg;
+    struct page *guest_page;
+    mfn_t guest_mfn;
+    uint64_t cookie;
+};
+
  struct domain_ctx {
  struct list_head list;
  struct list_head call_ctx_list;
+    struct list_head shm_rpc_list;
  struct domain *domain;
  atomic_t call_ctx_count;
+    atomic_t shm_rpc_count;
  spinlock_t lock;
  };
@@ -145,8 +157,10 @@ static int optee_enable(struct domain *d)
  ctx->domain = d;
  INIT_LIST_HEAD(&ctx->call_ctx_list);
+    INIT_LIST_HEAD(&ctx->shm_rpc_list);
  atomic_set(&ctx->call_ctx_count, 0);
+    atomic_set(&ctx->shm_rpc_count, 0);
  spin_lock_init(&ctx->lock);
  spin_lock(&domain_ctx_list_lock);
@@ -256,11 +270,81 @@ static struct std_call_ctx *find_call_ctx(struct domain_ctx *ctx, int thread_id)
  return NULL;
  }
+static struct shm_rpc *allocate_and_map_shm_rpc(struct domain_ctx *ctx, paddr_t gaddr,


I would prefer if you pass a gfn instead of the address here.


+    uint64_t cookie)


NIT: Indentation


+{
+    struct shm_rpc *shm_rpc;
+    int count;
+
+    count = atomic_add_unless(&ctx->shm_rpc_count, 1, MAX_RPC_SHMS);
+    if ( count == MAX_RPC_SHMS )
+        return NULL;
+
+    shm_rpc = xzalloc(struct shm_rpc);
+    if ( !shm_rpc )
+        goto err;
+
+    shm_rpc->guest_mfn = lookup_and_pin_guest_ram_addr(gaddr, NULL);
+
+    if ( mfn_eq(shm_rpc->guest_mfn, INVALID_MFN) )
+        goto err;
+
+    shm_rpc->guest_arg = map_domain_page_global(shm_rpc->guest_mfn);
+    if ( !shm_rpc->guest_arg )
+    {
+        gprintk(XENLOG_INFO, "Could not map domain page\n");


You don't unpin the guest page if Xen can't map the page.


+        goto err;
+    }
+    shm_rpc->cookie = cookie;
+
+    spin_lock(&ctx->lock);
+    list_add_tail(&shm_rpc->list, &ctx->shm_rpc_list);
+    spin_unlock(&ctx->lock);
+
+    return shm_rpc;
+
+err:
+    atomic_dec(&ctx->shm_rpc_count);
+    xfree(shm_rpc);
+    return NULL;
+}
+
+static void free_shm_rpc(struct domain_ctx *ctx, uint64_t cookie)
+{
+    struct shm_rpc *shm_rpc;
+    bool found = false;
+
+    spin_lock(&ctx->lock);
+
+    list_for_each_entry( shm_rpc, &ctx->shm_rpc_list, list )
+    {
+        if ( shm_rpc->cookie == cookie )


What guarantees that the cookie will be unique?

The Normal World guarantees it. This is part of the protocol.


By NW, do you mean the guest? You should know by now that we should not trust
what the guest is doing. If you think it is still fine, then I would
like a write-up explaining the impact of a guest using the same
cookie ID twice.


[...]

It feels quite suspicious to free the memory in Xen before calling
OP-TEE. I think this needs to be done afterwards.


No, it is OP-TEE that asked to free the buffer. This func

Re: [Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-10 Thread Volodymyr Babchuk

Hi Julien,

On 10.09.18 16:01, Julien Grall wrote:

Hi Volodymyr,

On 03/09/18 17:54, Volodymyr Babchuk wrote:

OP-TEE usually uses the same idea as with command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it requests the NW to allocate a shared
buffer for other RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated
until shutdown.

The mediator needs to pin this buffer (or buffers) to make sure that the
domain can't transfer it to someone else. It should also be mapped into
the Xen address space, because the mediator needs to check responses
from guests.


Can you explain why you always need to keep the shared buffer mapped in
Xen? Why not use access_guest_memory_by_ipa every time you want to get
information from the guest?

Sorry, I just didn't know about this mechanism. But for performance reasons,
I'd like to keep these buffers always mapped. You see, RPC returns are
very frequent (for every IRQ, actually). So I think it will be costly
to map/unmap this buffer every time.



Signed-off-by: Volodymyr Babchuk 
---
  xen/arch/arm/tee/optee.c | 121 ++-
  1 file changed, 119 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 1008eba..6d6b51d 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -21,6 +21,7 @@
  #include 
  #define MAX_STD_CALLS   16
+#define MAX_RPC_SHMS    16
  /*
   * Call context. OP-TEE can issue multiple RPC returns during one call.
@@ -35,11 +36,22 @@ struct std_call_ctx {
  int rpc_op;
  };
+/* Pre-allocated SHM buffer for RPC commands */
+struct shm_rpc {
+    struct list_head list;
+    struct optee_msg_arg *guest_arg;
+    struct page *guest_page;
+    mfn_t guest_mfn;
+    uint64_t cookie;
+};
+
  struct domain_ctx {
  struct list_head list;
  struct list_head call_ctx_list;
+    struct list_head shm_rpc_list;
  struct domain *domain;
  atomic_t call_ctx_count;
+    atomic_t shm_rpc_count;
  spinlock_t lock;
  };
@@ -145,8 +157,10 @@ static int optee_enable(struct domain *d)
  ctx->domain = d;
  INIT_LIST_HEAD(&ctx->call_ctx_list);
+    INIT_LIST_HEAD(&ctx->shm_rpc_list);
  atomic_set(&ctx->call_ctx_count, 0);
+    atomic_set(&ctx->shm_rpc_count, 0);
  spin_lock_init(&ctx->lock);
  spin_lock(&domain_ctx_list_lock);
@@ -256,11 +270,81 @@ static struct std_call_ctx *find_call_ctx(struct domain_ctx *ctx, int thread_id)
  return NULL;
  }
+static struct shm_rpc *allocate_and_map_shm_rpc(struct domain_ctx *ctx, paddr_t gaddr,


I would prefer if you pass a gfn instead of the address here.


+    uint64_t cookie)


NIT: Indentation


+{
+    struct shm_rpc *shm_rpc;
+    int count;
+
+    count = atomic_add_unless(&ctx->shm_rpc_count, 1, MAX_RPC_SHMS);
+    if ( count == MAX_RPC_SHMS )
+        return NULL;
+
+    shm_rpc = xzalloc(struct shm_rpc);
+    if ( !shm_rpc )
+        goto err;
+
+    shm_rpc->guest_mfn = lookup_and_pin_guest_ram_addr(gaddr, NULL);
+
+    if ( mfn_eq(shm_rpc->guest_mfn, INVALID_MFN) )
+        goto err;
+
+    shm_rpc->guest_arg = map_domain_page_global(shm_rpc->guest_mfn);
+    if ( !shm_rpc->guest_arg )
+    {
+        gprintk(XENLOG_INFO, "Could not map domain page\n");


You don't unpin the guest page if Xen can't map the page.


+        goto err;
+    }
+    shm_rpc->cookie = cookie;
+
+    spin_lock(&ctx->lock);
+    list_add_tail(&shm_rpc->list, &ctx->shm_rpc_list);
+    spin_unlock(&ctx->lock);
+
+    return shm_rpc;
+
+err:
+    atomic_dec(&ctx->shm_rpc_count);
+    xfree(shm_rpc);
+    return NULL;
+}
+
+static void free_shm_rpc(struct domain_ctx *ctx, uint64_t cookie)
+{
+    struct shm_rpc *shm_rpc;
+    bool found = false;
+
+    spin_lock(&ctx->lock);
+
+    list_for_each_entry( shm_rpc, &ctx->shm_rpc_list, list )
+    {
+        if ( shm_rpc->cookie == cookie )


What guarantees that the cookie will be unique?

The Normal World guarantees it. This is part of the protocol.


+        {
+            found = true;
+            list_del(&shm_rpc->list);
+            break;
+        }
+    }
+    spin_unlock(&ctx->lock);


At this point you have a shm_rpc in hand to free. But what guarantees
that no one else will use it?

This is a valid point. I'll revisit this part of the code, thank you.
It looks like I need some refcount there.
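A refcount along these lines could look like the following user-space sketch (names are hypothetical; C11 atomics stand in for Xen's `atomic_t`). It assumes the list holds one reference and that anyone who looks a buffer up takes a reference while still holding `ctx->lock`, so `free_shm_rpc()` would only drop the list's reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted buffer. The list owns one reference; each
 * user that looks the buffer up takes another. */
struct demo_shm_rpc {
    atomic_int refcnt;
};

static int releases;   /* counts teardowns, for demonstration only */

static struct demo_shm_rpc *shm_rpc_alloc(void)
{
    struct demo_shm_rpc *s = malloc(sizeof(*s));

    if ( s )
        atomic_init(&s->refcnt, 1);   /* reference owned by the list */
    return s;
}

static void shm_rpc_get(struct demo_shm_rpc *s)
{
    atomic_fetch_add(&s->refcnt, 1);
}

static void shm_rpc_put(struct demo_shm_rpc *s)
{
    /* fetch_sub returns the old value: 1 means this was the last ref. */
    if ( atomic_fetch_sub(&s->refcnt, 1) == 1 )
    {
        releases++;   /* the patch's unmap/unpin/xfree would go here */
        free(s);
    }
}
```

With this split, removing the buffer from the list and actually tearing it down become separate events, so a vCPU still using the buffer keeps it alive.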


+
+    if ( !found ) {
+        return;
+    }


No need for the {} in a one-liner.


+
+    if ( shm_rpc->guest_arg ) {


Coding style:

if ( ... )
{


+        unpin_guest_ram_addr(shm_rpc->guest_mfn);
+        unmap_domain_page_global(shm_rpc->guest_arg);
+    }
+
+    xfree(shm_rpc);
+}
+
  static void optee_domain_destroy(struct domain *d)
  {
  struct arm_smccc_res resp;
  struct domain_ctx *ctx;
  struct std_call_ctx *call, *call_tmp;
+    struct shm_r

Re: [Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-10 Thread Julien Grall

Hi Volodymyr,

On 03/09/18 17:54, Volodymyr Babchuk wrote:

OP-TEE usually uses the same idea as with command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it requests the NW to allocate a shared
buffer for other RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated
until shutdown.

The mediator needs to pin this buffer (or buffers) to make sure that the
domain can't transfer it to someone else. It should also be mapped into
the Xen address space, because the mediator needs to check responses
from guests.


Can you explain why you always need to keep the shared buffer mapped in
Xen? Why not use access_guest_memory_by_ipa every time you want to get
information from the guest?




Signed-off-by: Volodymyr Babchuk 
---
  xen/arch/arm/tee/optee.c | 121 ++-
  1 file changed, 119 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 1008eba..6d6b51d 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -21,6 +21,7 @@
  #include 
  
  #define MAX_STD_CALLS   16
+#define MAX_RPC_SHMS    16
  
  /*
   * Call context. OP-TEE can issue multiple RPC returns during one call.
@@ -35,11 +36,22 @@ struct std_call_ctx {
  int rpc_op;
  };
  
+/* Pre-allocated SHM buffer for RPC commands */
+struct shm_rpc {
+    struct list_head list;
+    struct optee_msg_arg *guest_arg;
+    struct page *guest_page;
+    mfn_t guest_mfn;
+    uint64_t cookie;
+};
+
  struct domain_ctx {
  struct list_head list;
  struct list_head call_ctx_list;
+    struct list_head shm_rpc_list;
  struct domain *domain;
  atomic_t call_ctx_count;
+    atomic_t shm_rpc_count;
  spinlock_t lock;
  };
  
@@ -145,8 +157,10 @@ static int optee_enable(struct domain *d)
  
  ctx->domain = d;
  INIT_LIST_HEAD(&ctx->call_ctx_list);
+    INIT_LIST_HEAD(&ctx->shm_rpc_list);
  
  atomic_set(&ctx->call_ctx_count, 0);
+    atomic_set(&ctx->shm_rpc_count, 0);
  spin_lock_init(&ctx->lock);
  
  spin_lock(&domain_ctx_list_lock);

@@ -256,11 +270,81 @@ static struct std_call_ctx *find_call_ctx(struct 
domain_ctx *ctx, int thread_id)
  return NULL;
  }
  
+static struct shm_rpc *allocate_and_map_shm_rpc(struct domain_ctx *ctx, paddr_t gaddr,


I would prefer if you pass a gfn instead of the address here.


+uint64_t cookie)


NIT: Indentation


+{
+    struct shm_rpc *shm_rpc;
+    int count;
+
+    count = atomic_add_unless(&ctx->shm_rpc_count, 1, MAX_RPC_SHMS);
+    if ( count == MAX_RPC_SHMS )
+        return NULL;
+
+    shm_rpc = xzalloc(struct shm_rpc);
+    if ( !shm_rpc )
+        goto err;
+
+    shm_rpc->guest_mfn = lookup_and_pin_guest_ram_addr(gaddr, NULL);
+
+    if ( mfn_eq(shm_rpc->guest_mfn, INVALID_MFN) )
+        goto err;
+
+    shm_rpc->guest_arg = map_domain_page_global(shm_rpc->guest_mfn);
+    if ( !shm_rpc->guest_arg )
+    {
+        gprintk(XENLOG_INFO, "Could not map domain page\n");


You don't unpin the guest page if Xen can't map the page.
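A standalone sketch of the unwinding this implies: each acquisition gets its own label, and a failure jumps to the label that undoes only what has been acquired so far, in reverse order. pin_page(), map_page() and their counterparts are hypothetical stubs that just count pins and mappings; they stand in for lookup_and_pin_guest_ram_addr(), map_domain_page_global() and the matching release calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stubs: they only count pinned and mapped pages. */
static int pinned, mapped;
static bool fail_pin, fail_map;
static char dummy_page[64];

static bool pin_page(void)
{
    if ( fail_pin )
        return false;
    pinned++;
    return true;
}

static void unpin_page(void) { pinned--; }

static void *map_page(void)
{
    if ( fail_map )
        return NULL;
    mapped++;
    return dummy_page;
}

static void unmap_page(void) { mapped--; }

struct shm_rpc_model {
    void *guest_arg;
};

static struct shm_rpc_model *alloc_model(void)
{
    struct shm_rpc_model *s = calloc(1, sizeof(*s));

    if ( !s )
        return NULL;
    if ( !pin_page() )
        goto err_free;
    s->guest_arg = map_page();
    if ( !s->guest_arg )
        goto err_unpin;    /* the unpin step missing from the patch */
    return s;

 err_unpin:
    unpin_page();
 err_free:
    free(s);
    return NULL;
}

/* Teardown mirrors allocation in reverse: unmap first, then unpin. */
static void free_model(struct shm_rpc_model *s)
{
    assert(pinned > 0 && mapped > 0);
    unmap_page();
    unpin_page();
    free(s);
}
```

Releasing in strict reverse order also means the page stays pinned for as long as Xen holds a mapping of it.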


+        goto err;
+    }
+    shm_rpc->cookie = cookie;
+
+    spin_lock(&ctx->lock);
+    list_add_tail(&shm_rpc->list, &ctx->shm_rpc_list);
+    spin_unlock(&ctx->lock);
+
+    return shm_rpc;
+
+err:
+    atomic_dec(&ctx->shm_rpc_count);
+    xfree(shm_rpc);
+    return NULL;
+}
+
+static void free_shm_rpc(struct domain_ctx *ctx, uint64_t cookie)
+{
+    struct shm_rpc *shm_rpc;
+    bool found = false;
+
+    spin_lock(&ctx->lock);
+
+    list_for_each_entry( shm_rpc, &ctx->shm_rpc_list, list )
+    {
+        if ( shm_rpc->cookie == cookie )


What guarantees that the cookie will be unique?
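One way to make the cookie a reliable key, sketched under assumptions: registration rejects a cookie that is already on the list, and the lookup plus insertion run under the same lock (ctx->lock in the patch), so two vCPUs cannot both pass the check with the same value. struct rpc_buf, struct registry and the helpers are hypothetical; locking is left to the caller to keep the sketch standalone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rpc_buf {
    struct rpc_buf *next;
    uint64_t cookie;
};

/* Hypothetical per-domain registry; in the patch this is ctx->shm_rpc_list. */
struct registry {
    struct rpc_buf *head;
};

/* Caller must hold the registry lock. */
static struct rpc_buf *find_locked(struct registry *r, uint64_t cookie)
{
    for ( struct rpc_buf *b = r->head; b; b = b->next )
        if ( b->cookie == cookie )
            return b;
    return NULL;
}

/*
 * Caller must hold the registry lock, so that the duplicate check and the
 * insertion cannot interleave with another vCPU registering the same cookie.
 */
static bool add_unique_locked(struct registry *r, struct rpc_buf *b)
{
    if ( find_locked(r, b->cookie) )
        return false;          /* guest reused a live cookie: reject it */
    b->next = r->head;
    r->head = b;
    return true;
}
```

Rejecting the duplicate at allocation time means a later free by cookie can only ever match one entry.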


+        {
+            found = true;
+            list_del(&shm_rpc->list);
+            break;
+        }
+    }
+    spin_unlock(&ctx->lock);


At this point you have a shm_rpc in hand to free. But what guarantees
that no one else will use it?
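A common answer to this kind of question is reference counting, sketched here with C11 atomics; it is not what the patch does. Any user looks the buffer up and takes a reference while still holding the list lock; the free path then only drops the list's own reference, and whoever drops the last one performs the unmap, unpin and free. struct rc_buf and the rc_* helpers are hypothetical, and the released flag stands in for that teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted buffer; the list itself owns one reference. */
struct rc_buf {
    atomic_int refcnt;
    bool released;   /* stands in for unmap + unpin + xfree */
};

static void rc_init(struct rc_buf *b)
{
    atomic_init(&b->refcnt, 1);   /* reference held by the list */
    b->released = false;
}

/* Must be called under the list lock, at lookup time. */
static void rc_get(struct rc_buf *b)
{
    atomic_fetch_add(&b->refcnt, 1);
}

/* The last put performs the teardown, whoever that turns out to be. */
static void rc_put(struct rc_buf *b)
{
    int old = atomic_fetch_sub(&b->refcnt, 1);

    assert(old > 0);
    if ( old == 1 )
        b->released = true;
}
```

In this scheme the free path would call list_del() under the lock and then rc_put(), instead of tearing the buffer down directly.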



+
+    if ( !found ) {
+        return;
+    }


No need for the {} in a one-liner.


+
+    if ( shm_rpc->guest_arg ) {


Coding style:

if ( ... )
{


+        unpin_guest_ram_addr(shm_rpc->guest_mfn);
+        unmap_domain_page_global(shm_rpc->guest_arg);
+    }
+
+    xfree(shm_rpc);
+}
+
  static void optee_domain_destroy(struct domain *d)
  {
      struct arm_smccc_res resp;
      struct domain_ctx *ctx;
      struct std_call_ctx *call, *call_tmp;
+    struct shm_rpc *shm_rpc, *shm_rpc_tmp;
      bool found = false;
  
      /* At this time all domain VCPUs should be stopped */
@@ -290,7 +374,11 @@ static void optee_domain_destroy(struct domain *d)
      list_for_each_entry_safe( call, call_tmp, &ctx->call_ctx_list, list )
          free_std_call_ctx(ctx, call);
  
+    list_for_each_entry_safe( shm_rpc, shm_rpc_tmp, &ctx->shm_rpc_list, list )
+        free_shm_rpc(ctx, shm_rpc->cookie);
+
      ASSERT

[Xen-devel] [PATCH v2 08/13] optee: add support for RPC SHM buffers

2018-09-03 Thread Volodymyr Babchuk
OP-TEE uses the same approach as for command buffers (see the
previous commit) to issue RPC requests. The problem is that initially
it has no buffer where it can write a request. So the first RPC
request it makes is special: it asks the NW to allocate a shared
buffer for subsequent RPC requests. Usually this buffer is allocated
only once for every OP-TEE thread, and it remains allocated until
shutdown.

The mediator needs to pin these buffers to make sure that the domain
can't transfer them to someone else. They should also be mapped into
Xen's address space, because the mediator needs to check responses
from guests.

Signed-off-by: Volodymyr Babchuk 
---
 xen/arch/arm/tee/optee.c | 121 ++-
 1 file changed, 119 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 1008eba..6d6b51d 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -21,6 +21,7 @@
 #include 
 
 #define MAX_STD_CALLS   16
+#define MAX_RPC_SHMS    16
 
 /*
  * Call context. OP-TEE can issue multiple RPC returns during one call.
@@ -35,11 +36,22 @@ struct std_call_ctx {
     int rpc_op;
 };
 
+/* Pre-allocated SHM buffer for RPC commands */
+struct shm_rpc {
+    struct list_head list;
+    struct optee_msg_arg *guest_arg;
+    struct page *guest_page;
+    mfn_t guest_mfn;
+    uint64_t cookie;
+};
+
 struct domain_ctx {
     struct list_head list;
     struct list_head call_ctx_list;
+    struct list_head shm_rpc_list;
     struct domain *domain;
     atomic_t call_ctx_count;
+    atomic_t shm_rpc_count;
     spinlock_t lock;
 };
 
@@ -145,8 +157,10 @@ static int optee_enable(struct domain *d)
 
     ctx->domain = d;
     INIT_LIST_HEAD(&ctx->call_ctx_list);
+    INIT_LIST_HEAD(&ctx->shm_rpc_list);
 
     atomic_set(&ctx->call_ctx_count, 0);
+    atomic_set(&ctx->shm_rpc_count, 0);
     spin_lock_init(&ctx->lock);
 
     spin_lock(&domain_ctx_list_lock);
@@ -256,11 +270,81 @@ static struct std_call_ctx *find_call_ctx(struct domain_ctx *ctx, int thread_id)
     return NULL;
 }
 
+static struct shm_rpc *allocate_and_map_shm_rpc(struct domain_ctx *ctx, paddr_t gaddr,
+uint64_t cookie)
+{
+    struct shm_rpc *shm_rpc;
+    int count;
+
+    count = atomic_add_unless(&ctx->shm_rpc_count, 1, MAX_RPC_SHMS);
+    if ( count == MAX_RPC_SHMS )
+        return NULL;
+
+    shm_rpc = xzalloc(struct shm_rpc);
+    if ( !shm_rpc )
+        goto err;
+
+    shm_rpc->guest_mfn = lookup_and_pin_guest_ram_addr(gaddr, NULL);
+
+    if ( mfn_eq(shm_rpc->guest_mfn, INVALID_MFN) )
+        goto err;
+
+    shm_rpc->guest_arg = map_domain_page_global(shm_rpc->guest_mfn);
+    if ( !shm_rpc->guest_arg )
+    {
+        gprintk(XENLOG_INFO, "Could not map domain page\n");
+        goto err;
+    }
+    shm_rpc->cookie = cookie;
+
+    spin_lock(&ctx->lock);
+    list_add_tail(&shm_rpc->list, &ctx->shm_rpc_list);
+    spin_unlock(&ctx->lock);
+
+    return shm_rpc;
+
+err:
+    atomic_dec(&ctx->shm_rpc_count);
+    xfree(shm_rpc);
+    return NULL;
+}
+
+static void free_shm_rpc(struct domain_ctx *ctx, uint64_t cookie)
+{
+    struct shm_rpc *shm_rpc;
+    bool found = false;
+
+    spin_lock(&ctx->lock);
+
+    list_for_each_entry( shm_rpc, &ctx->shm_rpc_list, list )
+    {
+        if ( shm_rpc->cookie == cookie )
+        {
+            found = true;
+            list_del(&shm_rpc->list);
+            break;
+        }
+    }
+    spin_unlock(&ctx->lock);
+
+    if ( !found ) {
+        return;
+    }
+
+    if ( shm_rpc->guest_arg ) {
+        unpin_guest_ram_addr(shm_rpc->guest_mfn);
+        unmap_domain_page_global(shm_rpc->guest_arg);
+    }
+
+    xfree(shm_rpc);
+}
+
 static void optee_domain_destroy(struct domain *d)
 {
     struct arm_smccc_res resp;
     struct domain_ctx *ctx;
     struct std_call_ctx *call, *call_tmp;
+    struct shm_rpc *shm_rpc, *shm_rpc_tmp;
     bool found = false;
 
     /* At this time all domain VCPUs should be stopped */
@@ -290,7 +374,11 @@ static void optee_domain_destroy(struct domain *d)
     list_for_each_entry_safe( call, call_tmp, &ctx->call_ctx_list, list )
         free_std_call_ctx(ctx, call);
 
+    list_for_each_entry_safe( shm_rpc, shm_rpc_tmp, &ctx->shm_rpc_list, list )
+        free_shm_rpc(ctx, shm_rpc->cookie);
+
     ASSERT(!atomic_read(&ctx->call_ctx_count));
+    ASSERT(!atomic_read(&ctx->shm_rpc_count));
 
     xfree(ctx);
 }
@@ -452,6 +540,32 @@ out:
     return ret;
 }
 
+static void handle_rpc_func_alloc(struct domain_ctx *ctx,
+                                  struct cpu_user_regs *regs)
+{
+    paddr_t ptr = get_user_reg(regs, 1) << 32 | get_user_reg(regs, 2);
+
+    if ( ptr & (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1) )
+        gprintk(XENLOG_WARNING, "Domain returned invalid RPC command buffer\n");
+
+    if ( ptr ) {
+        uint64_t cookie = get_user_reg(regs, 4) << 32 | get_user_reg(regs, 5);
+        struct shm_rpc *shm_rpc;
+
+        shm_rpc = allocate_and_map_shm