Re: [Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-18 Thread Julien Grall



On 09/12/2018 01:45 PM, Volodymyr Babchuk wrote:

Hi,


Hi Volodymyr,


On 12.09.18 14:02, Julien Grall wrote:

On 09/11/2018 08:33 PM, Volodymyr Babchuk wrote:
However, 2MB might be too big considering that you also need to 
account for the SMC call. Can such a buffer be passed in a fast call?
No, all such calls are yielding calls, so you can ignore the time used 
for the SMC call itself.


How come you can ignore it? It has a cost to trap to EL3.

Strictly speaking, yes. All steps have a cost: trap to EL3, dispatch in EL3,
switch to S-EL1, new thread preparation in OP-TEE, context switch in 
OP-TEE.
I meant that, in my opinion, this is negligible in comparison 
with the actual call processing. But maybe I'm wrong there.


It would be interesting to have a breakdown on the time spent in the 
call (with virtualization). Did you have a chance to do that?


Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-12 Thread Volodymyr Babchuk

Hi,

On 12.09.18 14:02, Julien Grall wrote:



On 09/11/2018 08:33 PM, Volodymyr Babchuk wrote:

Hi Julien,


Hi,


On 11.09.18 16:37, Julien Grall wrote:

Hi Volodymyr,

On 10/09/18 19:04, Volodymyr Babchuk wrote:

On 10.09.18 17:02, Julien Grall wrote:

On 03/09/18 17:54, Volodymyr Babchuk wrote:

[...]



+    if ( !pages_data_xen_start )
+    return false;
+
+    shm_buf = allocate_shm_buf(ctx, param->u.tmem.shm_ref, 
num_pages);


In allocate_shm_buf you are now globally limiting the number of 
pages (16384) to pin. However, there is no limit per call.


With the current limit, you could call up to 16384 times 
lookup_and_pin_guest_ram_addr(...). On Arm, for p2m related 
operations, we limit ourselves to 512 iterations in one go before 
checking for preemption.

So I think 16384 times is far too much.
So, in other words, I can translate only a 2MB buffer (if 4KB pages 
are used), is that right?


2MB for the whole command. So if you have 5 buffers in the command, 
then the sum of the buffers should not be bigger than 2MB.
4 buffers, but yes, it can be up to 8MB. Okay, I'll add a per-call 
counter to limit memory usage for the whole call.


That would need to be reduced to 2MB in total per call. You probably 
want to look at max_order(...).

Yes, this is what I was saying. 512 pages per call.
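
For illustration, a minimal sketch of such a per-call counter, with a
hypothetical translated_pages field added to struct std_call_ctx (the
names and placement are illustrative, not part of the patch):

    /* Hypothetical cap matching the 512-iteration p2m preemption
     * limit: 512 pages of 4KB = 2MB per call. */
    #define MAX_SHM_BUF_PG_PER_CALL 512

    static bool reserve_call_pages(struct std_call_ctx *call,
                                   unsigned int num_pages)
    {
        /* Reject the command once the sum of all its buffers
         * would exceed the per-call budget. */
        if ( call->translated_pages + num_pages > MAX_SHM_BUF_PG_PER_CALL )
            return false;

        call->translated_pages += num_pages;
        return true;
    }

translate_noncontig(...) would call this before pinning anything, so a
single command cannot monopolise the CPU regardless of the global
MAX_TOTAL_SMH_BUF_PG limit.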



However, 2MB might be too big considering that you also need to 
account for the SMC call. Can such a buffer be passed in a fast call?
No, all such calls are yielding calls, so you can ignore the time used 
for the SMC call itself.


How come you can ignore it? It has a cost to trap to EL3.

Strictly speaking, yes. All steps have a cost: trap to EL3, dispatch in EL3,
switch to S-EL1, new thread preparation in OP-TEE, context switch in 
OP-TEE.
I meant that, in my opinion, this is negligible in comparison 
with the actual call processing. But maybe I'm wrong there.


--
Volodymyr Babchuk


Re: [Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-12 Thread Julien Grall



On 09/11/2018 08:33 PM, Volodymyr Babchuk wrote:

Hi Julien,


Hi,


On 11.09.18 16:37, Julien Grall wrote:

Hi Volodymyr,

On 10/09/18 19:04, Volodymyr Babchuk wrote:

On 10.09.18 17:02, Julien Grall wrote:

On 03/09/18 17:54, Volodymyr Babchuk wrote:

[...]



+    if ( !pages_data_xen_start )
+    return false;
+
+    shm_buf = allocate_shm_buf(ctx, param->u.tmem.shm_ref, 
num_pages);


In allocate_shm_buf you are now globally limiting the number of pages 
(16384) to pin. However, there is no limit per call.


With the current limit, you could call up to 16384 times 
lookup_and_pin_guest_ram_addr(...). On Arm, for p2m related 
operations, we limit ourselves to 512 iterations in one go before 
checking for preemption.

So I think 16384 times is far too much.
So, in other words, I can translate only a 2MB buffer (if 4KB pages 
are used), is that right?


2MB for the whole command. So if you have 5 buffers in the command, 
then the sum of the buffers should not be bigger than 2MB.
4 buffers, but yes, it can be up to 8MB. Okay, I'll add a per-call 
counter to limit memory usage for the whole call.


That would need to be reduced to 2MB in total per call. You probably 
want to look at max_order(...).




However, 2MB might be too big considering that you also need to 
account for the SMC call. Can such a buffer be passed in a fast call?
No, all such calls are yielding calls, so you can ignore the time used 
for the SMC call itself.


How come you can ignore it? It has a cost to trap to EL3.

Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-11 Thread Volodymyr Babchuk

Hi Julien,

On 11.09.18 16:37, Julien Grall wrote:

Hi Volodymyr,

On 10/09/18 19:04, Volodymyr Babchuk wrote:

On 10.09.18 17:02, Julien Grall wrote:

On 03/09/18 17:54, Volodymyr Babchuk wrote:

[...]



+    if ( !pages_data_xen_start )
+    return false;
+
+    shm_buf = allocate_shm_buf(ctx, param->u.tmem.shm_ref, num_pages);


In allocate_shm_buf you are now globally limiting the number of pages 
(16384) to pin. However, there is no limit per call.


With the current limit, you could call up to 16384 times 
lookup_and_pin_guest_ram_addr(...). On Arm, for p2m related 
operations, we limit ourselves to 512 iterations in one go before 
checking for preemption.

So I think 16384 times is far too much.
So, in other words, I can translate only a 2MB buffer (if 4KB pages 
are used), is that right?


2MB for the whole command. So if you have 5 buffers in the command, then 
the sum of the buffers should not be bigger than 2MB.
4 buffers, but yes, it can be up to 8MB. Okay, I'll add a per-call 
counter to limit memory usage for the whole call.


However, 2MB might be too big considering that you also need to account 
for the SMC call. Can such a buffer be passed in a fast call?
No, all such calls are yielding calls, so you can ignore the time used 
for the SMC call itself.
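
For background: the SMC Calling Convention encodes the call type in
bit 31 of the function ID, so whether a call is fast or yielding is
fixed by its ID. A minimal illustration (the macro and helper names
here are mine, not from the patch or the OP-TEE headers):

    /* Per SMCCC, bit 31 set means "fast call" (atomic from the
     * caller's point of view); bit 31 clear means "yielding"
     * (standard) call, which the secure side may preempt and
     * resume, as OP-TEE does for the calls discussed here. */
    #define SMCCC_FAST_CALL_BIT (UINT32_C(1) << 31)

    static inline bool smc_is_fast_call(uint32_t funcid)
    {
        return funcid & SMCCC_FAST_CALL_BIT;
    }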



--
Volodymyr Babchuk


Re: [Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-11 Thread Julien Grall

Hi Volodymyr,

On 10/09/18 19:04, Volodymyr Babchuk wrote:

On 10.09.18 17:02, Julien Grall wrote:

On 03/09/18 17:54, Volodymyr Babchuk wrote:

+    struct {
+    uint64_t pages_list[PAGELIST_ENTRIES_PER_PAGE];
+    uint64_t next_page_data;
+    } *pages_data_guest, *pages_data_xen, *pages_data_xen_start;
+    struct shm_buf *shm_buf;
+
+    page_offset = param->u.tmem.buf_ptr & 
(OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1);

+
+    size = ROUNDUP(param->u.tmem.size + page_offset,
+   OPTEE_MSG_NONCONTIG_PAGE_SIZE);
+
+    num_pages = DIV_ROUND_UP(size, OPTEE_MSG_NONCONTIG_PAGE_SIZE);
+
+    order = get_order_from_bytes(get_pages_list_size(num_pages));
+
+    pages_data_xen_start = alloc_xenheap_pages(order, 0);


This could be replaced by _xmalloc, which would avoid allocating more 
memory than necessary as the order gets bigger.
Thanks. Would it allocate a page-aligned buffer? This is crucial in this 
case. I can't find any documentation on it, so I don't know which 
alignment it guarantees.


_xmalloc takes in argument the alignment required for the allocation.
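
A sketch of how the allocation could look with _xmalloc, assuming the
alignment OP-TEE requires is OPTEE_MSG_NONCONTIG_PAGE_SIZE; the matching
exit paths would then use xfree() instead of free_xenheap_pages():

    size_t list_size = get_pages_list_size(num_pages);

    /* Allocate exactly what the translated page list needs,
     * page-aligned for OP-TEE, instead of rounding up to a
     * power-of-two number of xenheap pages. */
    pages_data_xen_start = _xmalloc(list_size,
                                    OPTEE_MSG_NONCONTIG_PAGE_SIZE);
    if ( !pages_data_xen_start )
        return false;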






+    if ( !pages_data_xen_start )
+    return false;
+
+    shm_buf = allocate_shm_buf(ctx, param->u.tmem.shm_ref, num_pages);


In allocate_shm_buf you are now globally limiting the number of pages 
(16384) to pin. However, there is no limit per call.


With the current limit, you could call up to 16384 times 
lookup_and_pin_guest_ram_addr(...). On Arm, for p2m related operations, 
we limit ourselves to 512 iterations in one go before checking for 
preemption.

So I think 16384 times is far too much.
So, in other words, I can translate only a 2MB buffer (if 4KB pages are 
used), is that right?


2MB for the whole command. So if you have 5 buffers in the command, then 
the sum of the buffers should not be bigger than 2MB.


However, 2MB might be too big considering that you also need to account 
for the SMC call. Can such a buffer be passed in a fast call?



I think it will be okay to implement such a limitation for this initial
version of the mediator. In the future, it would be possible to do an RPC 
return from Xen (as OP-TEE does) to finish the request later.





+    if ( !shm_buf )
+    goto err_free;
+
+    gaddr = param->u.tmem.buf_ptr & ~(OPTEE_MSG_NONCONTIG_PAGE_SIZE 
- 1);

+    guest_mfn = lookup_and_pin_guest_ram_addr(gaddr, NULL);
+    if ( mfn_eq(guest_mfn, INVALID_MFN) )
+    goto err_free;
+
+    pages_data_guest = map_domain_page(guest_mfn);
+    if ( !pages_data_guest )
+    goto err_free;
+
+    pages_data_xen = pages_data_xen_start;
+    while ( num_pages ) {
+    struct page_info *page;
+    mfn_t entry_mfn = lookup_and_pin_guest_ram_addr(
+    pages_data_guest->pages_list[entries_on_page], &page);
+
+    if ( mfn_eq(entry_mfn, INVALID_MFN) )
+    goto err_unmap;
+
+    shm_buf->pages[shm_buf->page_cnt++] = page;
+    pages_data_xen->pages_list[entries_on_page] = 
mfn_to_maddr(entry_mfn);

+    entries_on_page++;
+
+    if ( entries_on_page == PAGELIST_ENTRIES_PER_PAGE ) {
+    pages_data_xen->next_page_data = 
virt_to_maddr(pages_data_xen + 1);

+    pages_data_xen++;
+    gaddr = pages_data_guest->next_page_data;


next_page_data is not a guest address but a machine address. For 
anything related to addresses, the variable should be named accordingly 
to avoid confusion.

Why? In this case it is an IPA that comes from the guest.


Because I misread the variables.


+
+    unmap_domain_page(pages_data_guest);
+    unpin_guest_ram_addr(guest_mfn);
+    return true;
+
+err_unmap:
+    unmap_domain_page(pages_data_guest);
+    unpin_guest_ram_addr(guest_mfn);
+    free_shm_buf(ctx, shm_buf->cookie);
+
+err_free:
+    free_xenheap_pages(pages_data_xen_start, order);
+
+    return false;
+}
+
+static bool translate_params(struct domain_ctx *ctx,
+ struct std_call_ctx *call)
+{
+    unsigned int i;
+    uint32_t attr;
+
+    for ( i = 0; i < call->xen_arg->num_params; i++ ) {


Please pay attention to the Xen coding style. I haven't pointed it out 
everywhere, but I would like all instances to be fixed in the next 
version.

Yes, I'm sorry for that. I work on several projects simultaneously,
and sometimes it is hard to keep track of coding styles. I'll fix all
such problems.


IIRC your team is working on the checkpatch. It might be worth using it 
to see how it performs on your series.


Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-10 Thread Volodymyr Babchuk

Hi,

On 10.09.18 17:02, Julien Grall wrote:

Hi,

On 03/09/18 17:54, Volodymyr Babchuk wrote:

Shared memory is widely used by NW to communicate with
TAs in OP-TEE. NW can share part of its own memory with
a TA or the OP-TEE core, by registering it in OP-TEE, or by providing
a temporary reference. Either way, information about such memory
buffers is sent to OP-TEE as a list of pages. This mechanism
is described in optee_msg.h.

The mediator should step in when NW tries to share memory with
OP-TEE, for two reasons:

1. Do address translation from IPA to PA.
2. Pin domain pages while they are mapped into OP-TEE or TA
    address space, so the domain can't transfer these pages to
    another domain or baloon them out.


s/baloon/balloon/



Address translation is done by the translate_noncontig(...) function.
It allocates a new buffer from the xenheap and then walks the guest
provided list of pages, translates the addresses and stores the PAs
into the newly allocated buffer. This buffer will be provided to OP-TEE
instead of the original buffer from the guest, and will
be freed at the end of the standard call.

At the same time this function pins the pages and stores them in a
struct shm_buf object. This object will live as long as the
given SHM buffer is known to OP-TEE. It will be freed
after the guest unregisters the shared buffer. At that time the pages
will be unpinned.

Signed-off-by: Volodymyr Babchuk 
---
  xen/arch/arm/tee/optee.c | 245 
++-

  1 file changed, 244 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 6d6b51d..8bfcfdc 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -22,6 +22,8 @@
  #define MAX_STD_CALLS   16
  #define MAX_RPC_SHMS    16
+#define MAX_TOTAL_SMH_BUF_PG    16384


So that's 64MB worth of guest memory. Do we expect them to be mapped in 
Xen? Or just pinned?

Just pinned. We are not interested in the contents of this memory.
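
To illustrate the distinction: pinning only takes a reference on the
page so the domain cannot free, transfer or balloon it out; no mapping
into Xen's address space is created. A sketch of the idea:

    /* Take a reference: the page stays allocated to the domain,
     * but is never mapped into Xen. */
    if ( !get_page(page, d) )
        return -EINVAL;

    /* ... later, once OP-TEE no longer knows the buffer: */
    put_page(page);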


+#define MAX_NONCONTIG_ENTRIES   5
  /*
   * Call context. OP-TEE can issue multiple RPC returns during one call.
@@ -31,6 +33,9 @@ struct std_call_ctx {
  struct list_head list;
  struct optee_msg_arg *guest_arg;
  struct optee_msg_arg *xen_arg;
+    /* Buffer for translated page addresses, shared with OP-TEE */
+    void *non_contig[MAX_NONCONTIG_ENTRIES];
+    int non_contig_order[MAX_NONCONTIG_ENTRIES];


Can you please introduce a structure with the order and mapping?
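
One illustrative way to do that (the struct and field names are
hypothetical, not from the patch):

    /* Bundle each translated page list with the order it was
     * allocated with, so the free path needs no parallel array. */
    struct optee_pg_list {
        void *va;            /* xenheap buffer shared with OP-TEE */
        unsigned int order;  /* allocation order, used when freeing */
    };

    /* struct std_call_ctx would then carry:
     *     struct optee_pg_list non_contig[MAX_NONCONTIG_ENTRIES];
     */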


  mfn_t guest_arg_mfn;
  int optee_thread_id;
  int rpc_op;
@@ -45,13 +50,24 @@ struct shm_rpc {
  uint64_t cookie;
  };
+/* Shared memory buffer for arbitrary data */
+struct shm_buf {
+    struct list_head list;
+    uint64_t cookie;
+    int max_page_cnt;
+    int page_cnt;


AFAICT, max_page_cnt and page_cnt should never be negative. If so, then 
they should be unsigned.



+    struct page_info *pages[];
+};
+
  struct domain_ctx {
  struct list_head list;
  struct list_head call_ctx_list;
  struct list_head shm_rpc_list;
+    struct list_head shm_buf_list;
  struct domain *domain;
  atomic_t call_ctx_count;
  atomic_t shm_rpc_count;
+    atomic_t shm_buf_pages;
  spinlock_t lock;
  };
@@ -158,9 +174,12 @@ static int optee_enable(struct domain *d)
  ctx->domain = d;
  INIT_LIST_HEAD(&ctx->call_ctx_list);
  INIT_LIST_HEAD(&ctx->shm_rpc_list);
+    INIT_LIST_HEAD(&ctx->shm_buf_list);
  atomic_set(&ctx->call_ctx_count, 0);
  atomic_set(&ctx->shm_rpc_count, 0);
+    atomic_set(&ctx->shm_buf_pages, 0);
+
  spin_lock_init(&ctx->lock);
  spin_lock(&domain_ctx_list_lock);
@@ -339,12 +358,76 @@ static void free_shm_rpc(struct domain_ctx *ctx, 
uint64_t cookie)

  xfree(shm_rpc);
  }
+static struct shm_buf *allocate_shm_buf(struct domain_ctx *ctx,
+    uint64_t cookie,
+    int pages_cnt)


Ditto.


+{
+    struct shm_buf *shm_buf;
+
+    while(1)
+    {
+    int old = atomic_read(&ctx->shm_buf_pages);
+    int new = old + pages_cnt;
+    if ( new >= MAX_TOTAL_SMH_BUF_PG )
+    return NULL;
+    if ( likely(old == atomic_cmpxchg(&ctx->shm_buf_pages, old, new)) )

+    break;
+    }
+
+    shm_buf = xzalloc_bytes(sizeof(struct shm_buf) +
+    pages_cnt * sizeof(struct page *));
+    if ( !shm_buf ) {


Coding style:

if ( ... )
{


+    atomic_sub(pages_cnt, &ctx->shm_buf_pages);
+    return NULL;
+    }
+
+    shm_buf->cookie = cookie;
+    shm_buf->max_page_cnt = pages_cnt;
+
+    spin_lock(&ctx->lock);
+    list_add_tail(&shm_buf->list, &ctx->shm_buf_list);
+    spin_unlock(&ctx->lock);
+
+    return shm_buf;
+}
+
+static void free_shm_buf(struct domain_ctx *ctx, uint64_t cookie)
+{
+    struct shm_buf *shm_buf;
+    bool found = false;
+
+    spin_lock(&ctx->lock);
+    list_for_each_entry( shm_buf, &ctx->shm_buf_list, list )
+    {
+    if ( shm_buf->cookie == cookie )


What guarantees that the cookie will be unique?


+    {
+    found = true;
+    list_del(&shm_buf->list);
+
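
One illustrative answer to the uniqueness question above would be to
reject duplicates when the buffer is registered, under the same lock
that protects the list (the helper name is hypothetical):

    /* Called from allocate_shm_buf() with ctx->lock held:
     * refuse to register a second buffer with the same cookie. */
    static bool shm_buf_cookie_in_use(struct domain_ctx *ctx,
                                      uint64_t cookie)
    {
        struct shm_buf *shm_buf;

        list_for_each_entry( shm_buf, &ctx->shm_buf_list, list )
            if ( shm_buf->cookie == cookie )
                return true;

        return false;
    }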

Re: [Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-10 Thread Julien Grall

Hi,

On 03/09/18 17:54, Volodymyr Babchuk wrote:

Shared memory is widely used by NW to communicate with
TAs in OP-TEE. NW can share part of its own memory with
a TA or the OP-TEE core, by registering it in OP-TEE, or by providing
a temporary reference. Either way, information about such memory
buffers is sent to OP-TEE as a list of pages. This mechanism
is described in optee_msg.h.

The mediator should step in when NW tries to share memory with
OP-TEE, for two reasons:

1. Do address translation from IPA to PA.
2. Pin domain pages while they are mapped into OP-TEE or TA
address space, so the domain can't transfer these pages to
another domain or baloon them out.


s/baloon/balloon/



Address translation is done by the translate_noncontig(...) function.
It allocates a new buffer from the xenheap and then walks the guest
provided list of pages, translates the addresses and stores the PAs
into the newly allocated buffer. This buffer will be provided to OP-TEE
instead of the original buffer from the guest, and will
be freed at the end of the standard call.

At the same time this function pins the pages and stores them in a
struct shm_buf object. This object will live as long as the
given SHM buffer is known to OP-TEE. It will be freed
after the guest unregisters the shared buffer. At that time the pages
will be unpinned.

Signed-off-by: Volodymyr Babchuk 
---
  xen/arch/arm/tee/optee.c | 245 ++-
  1 file changed, 244 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 6d6b51d..8bfcfdc 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -22,6 +22,8 @@
  
  #define MAX_STD_CALLS   16

  #define MAX_RPC_SHMS16
+#define MAX_TOTAL_SMH_BUF_PG16384


So that's 64MB worth of guest memory. Do we expect them to be mapped in 
Xen? Or just pinned?



+#define MAX_NONCONTIG_ENTRIES   5
  
  /*

   * Call context. OP-TEE can issue multiple RPC returns during one call.
@@ -31,6 +33,9 @@ struct std_call_ctx {
  struct list_head list;
  struct optee_msg_arg *guest_arg;
  struct optee_msg_arg *xen_arg;
+/* Buffer for translated page addresses, shared with OP-TEE */
+void *non_contig[MAX_NONCONTIG_ENTRIES];
+int non_contig_order[MAX_NONCONTIG_ENTRIES];


Can you please introduce a structure with the order and mapping?


  mfn_t guest_arg_mfn;
  int optee_thread_id;
  int rpc_op;
@@ -45,13 +50,24 @@ struct shm_rpc {
  uint64_t cookie;
  };
  
+/* Shared memory buffer for arbitrary data */

+struct shm_buf {
+struct list_head list;
+uint64_t cookie;
+int max_page_cnt;
+int page_cnt;


AFAICT, max_page_cnt and page_cnt should never be negative. If so, then 
they should be unsigned.



+struct page_info *pages[];
+};
+
  struct domain_ctx {
  struct list_head list;
  struct list_head call_ctx_list;
  struct list_head shm_rpc_list;
+struct list_head shm_buf_list;
  struct domain *domain;
  atomic_t call_ctx_count;
  atomic_t shm_rpc_count;
+atomic_t shm_buf_pages;
  spinlock_t lock;
  };
  
@@ -158,9 +174,12 @@ static int optee_enable(struct domain *d)

  ctx->domain = d;
  INIT_LIST_HEAD(&ctx->call_ctx_list);
  INIT_LIST_HEAD(&ctx->shm_rpc_list);
+INIT_LIST_HEAD(&ctx->shm_buf_list);
  
  atomic_set(&ctx->call_ctx_count, 0);

  atomic_set(&ctx->shm_rpc_count, 0);
+atomic_set(&ctx->shm_buf_pages, 0);
+
  spin_lock_init(&ctx->lock);
  
  spin_lock(&domain_ctx_list_lock);

@@ -339,12 +358,76 @@ static void free_shm_rpc(struct domain_ctx *ctx, uint64_t 
cookie)
  xfree(shm_rpc);
  }
  
+static struct shm_buf *allocate_shm_buf(struct domain_ctx *ctx,

+uint64_t cookie,
+int pages_cnt)


Ditto.


+{
+struct shm_buf *shm_buf;
+
+while(1)
+{
+int old = atomic_read(&ctx->shm_buf_pages);
+int new = old + pages_cnt;
+if ( new >= MAX_TOTAL_SMH_BUF_PG )
+return NULL;
+if ( likely(old == atomic_cmpxchg(&ctx->shm_buf_pages, old, new)) )
+break;
+}
+
+shm_buf = xzalloc_bytes(sizeof(struct shm_buf) +
+pages_cnt * sizeof(struct page *));
+if ( !shm_buf ) {


Coding style:

if ( ... )
{


+atomic_sub(pages_cnt, &ctx->shm_buf_pages);
+return NULL;
+}
+
+shm_buf->cookie = cookie;
+shm_buf->max_page_cnt = pages_cnt;
+
+spin_lock(&ctx->lock);
+list_add_tail(&shm_buf->list, &ctx->shm_buf_list);
+spin_unlock(&ctx->lock);
+
+return shm_buf;
+}
+
+static void free_shm_buf(struct domain_ctx *ctx, uint64_t cookie)
+{
+struct shm_buf *shm_buf;
+bool found = false;
+
+spin_lock(&ctx->lock);
+list_for_each_entry( shm_buf, &ctx->shm_buf_list, list )
+{
+if ( shm_buf->cookie == cookie )


What guarantees that the cookie will be unique?


+{
+found = true;
+list_del(&shm_buf->list);
+break;
+}
+}
+spin_unlock(&ctx->lock);



At this point you have 

[Xen-devel] [PATCH v2 09/13] optee: add support for arbitrary shared memory

2018-09-03 Thread Volodymyr Babchuk
Shared memory is widely used by NW to communicate with
TAs in OP-TEE. NW can share part of its own memory with
a TA or the OP-TEE core, by registering it in OP-TEE, or by providing
a temporary reference. Either way, information about such memory
buffers is sent to OP-TEE as a list of pages. This mechanism
is described in optee_msg.h.

The mediator should step in when NW tries to share memory with
OP-TEE, for two reasons:

1. Do address translation from IPA to PA.
2. Pin domain pages while they are mapped into OP-TEE or TA
   address space, so the domain can't transfer these pages to
   another domain or baloon them out.

Address translation is done by the translate_noncontig(...) function.
It allocates a new buffer from the xenheap and then walks the guest
provided list of pages, translates the addresses and stores the PAs
into the newly allocated buffer. This buffer will be provided to OP-TEE
instead of the original buffer from the guest, and will
be freed at the end of the standard call.

At the same time this function pins the pages and stores them in a
struct shm_buf object. This object will live as long as the
given SHM buffer is known to OP-TEE. It will be freed
after the guest unregisters the shared buffer. At that time the pages
will be unpinned.

Signed-off-by: Volodymyr Babchuk 
---
 xen/arch/arm/tee/optee.c | 245 ++-
 1 file changed, 244 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 6d6b51d..8bfcfdc 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -22,6 +22,8 @@
 
 #define MAX_STD_CALLS   16
 #define MAX_RPC_SHMS16
+#define MAX_TOTAL_SMH_BUF_PG16384
+#define MAX_NONCONTIG_ENTRIES   5
 
 /*
  * Call context. OP-TEE can issue multiple RPC returns during one call.
@@ -31,6 +33,9 @@ struct std_call_ctx {
 struct list_head list;
 struct optee_msg_arg *guest_arg;
 struct optee_msg_arg *xen_arg;
+/* Buffer for translated page addresses, shared with OP-TEE */
+void *non_contig[MAX_NONCONTIG_ENTRIES];
+int non_contig_order[MAX_NONCONTIG_ENTRIES];
 mfn_t guest_arg_mfn;
 int optee_thread_id;
 int rpc_op;
@@ -45,13 +50,24 @@ struct shm_rpc {
 uint64_t cookie;
 };
 
+/* Shared memory buffer for arbitrary data */
+struct shm_buf {
+struct list_head list;
+uint64_t cookie;
+int max_page_cnt;
+int page_cnt;
+struct page_info *pages[];
+};
+
 struct domain_ctx {
 struct list_head list;
 struct list_head call_ctx_list;
 struct list_head shm_rpc_list;
+struct list_head shm_buf_list;
 struct domain *domain;
 atomic_t call_ctx_count;
 atomic_t shm_rpc_count;
+atomic_t shm_buf_pages;
 spinlock_t lock;
 };
 
@@ -158,9 +174,12 @@ static int optee_enable(struct domain *d)
 ctx->domain = d;
 INIT_LIST_HEAD(&ctx->call_ctx_list);
 INIT_LIST_HEAD(&ctx->shm_rpc_list);
+INIT_LIST_HEAD(&ctx->shm_buf_list);
 
 atomic_set(&ctx->call_ctx_count, 0);
 atomic_set(&ctx->shm_rpc_count, 0);
+atomic_set(&ctx->shm_buf_pages, 0);
+
 spin_lock_init(&ctx->lock);
 
 spin_lock(&domain_ctx_list_lock);
@@ -339,12 +358,76 @@ static void free_shm_rpc(struct domain_ctx *ctx, uint64_t 
cookie)
 xfree(shm_rpc);
 }
 
+static struct shm_buf *allocate_shm_buf(struct domain_ctx *ctx,
+uint64_t cookie,
+int pages_cnt)
+{
+struct shm_buf *shm_buf;
+
+while(1)
+{
+int old = atomic_read(&ctx->shm_buf_pages);
+int new = old + pages_cnt;
+if ( new >= MAX_TOTAL_SMH_BUF_PG )
+return NULL;
+if ( likely(old == atomic_cmpxchg(&ctx->shm_buf_pages, old, new)) )
+break;
+}
+
+shm_buf = xzalloc_bytes(sizeof(struct shm_buf) +
+pages_cnt * sizeof(struct page *));
+if ( !shm_buf ) {
+atomic_sub(pages_cnt, &ctx->shm_buf_pages);
+return NULL;
+}
+
+shm_buf->cookie = cookie;
+shm_buf->max_page_cnt = pages_cnt;
+
+spin_lock(&ctx->lock);
+list_add_tail(&shm_buf->list, &ctx->shm_buf_list);
+spin_unlock(&ctx->lock);
+
+return shm_buf;
+}
+
+static void free_shm_buf(struct domain_ctx *ctx, uint64_t cookie)
+{
+struct shm_buf *shm_buf;
+bool found = false;
+
+spin_lock(&ctx->lock);
+list_for_each_entry( shm_buf, &ctx->shm_buf_list, list )
+{
+if ( shm_buf->cookie == cookie )
+{
+found = true;
+list_del(&shm_buf->list);
+break;
+}
+}
+spin_unlock(&ctx->lock);
+
+if ( !found ) {
+return;
+}
+
+for ( int i = 0; i < shm_buf->page_cnt; i++ )
+if ( shm_buf->pages[i] )
+put_page(shm_buf->pages[i]);
+
+atomic_sub(shm_buf->max_page_cnt, &ctx->shm_buf_pages);
+
+xfree(shm_buf);
+}
+
 static void optee_domain_destroy(struct domain *d)
 {
 struct arm_smccc_res resp;
 struct domain_ctx *ctx;
 struct std_call_ctx *call, *call_tmp;
 struct shm_rpc *shm_rpc, *shm_rpc_tmp;
+struct shm_buf *shm_buf, *shm_buf_tmp;