date:20230502

Re: [PATCH v4 10/13] tools/xenstore: switch transaction accounting to generic accounting

2023-05-02 Thread Juergen Gross


On 02.05.23 21:19, Julien Grall wrote:

Hi,

On 05/04/2023 08:03, Juergen Gross wrote:

As transaction accounting is active for unprivileged domains only, it
can easily be added to the generic per-domain accounting.

Signed-off-by: Juergen Gross 
---
  tools/xenstore/xenstored_core.c    |  3 +--
  tools/xenstore/xenstored_core.h    |  1 -
  tools/xenstore/xenstored_domain.c  | 21 ++---
  tools/xenstore/xenstored_domain.h  |  4 
  tools/xenstore/xenstored_transaction.c | 12 +---
  5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 2d481fcad9..88c569b7d5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2083,7 +2083,7 @@ static void consider_message(struct connection *conn)
   * stalled. This will ignore new requests until Live-Update happened
   * or it was aborted.
   */
-    if (lu_is_pending() && conn->transaction_started == 0 &&
+    if (lu_is_pending() && conn->ta_start_time == 0 &&


NIT: I know there are some places in the code checking for conn->ta_start_time == 0.
But it feels like a better replacement for "conn->transaction_started" is
"list_empty(...)".


Fine with me.



I agree this is going to be more expensive. But you are switching the
transaction accounting to a generic infrastructure which is pretty heavy compared
to a simple addition/subtraction. So I think a "list_empty()" would be OK here.
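
Julien's suggestion amounts to deciding "no transaction in progress" from the connection's transaction list rather than from ta_start_time. The sketch below assumes a simplified Linux-style intrusive list (struct connection does carry a transaction_list, per the xenstored_core.h hunk); conn_sketch and delay_transaction_start() are hypothetical names, not xenstored code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified intrusive doubly linked list, modeled on the Linux-style
 * list_head (an assumption for this sketch). */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static bool list_empty(const struct list_head *head)
{
    return head->next == head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical stand-in for struct connection: only the field needed. */
struct conn_sketch {
    struct list_head transaction_list;
};

/* Hypothetical helper: while a live update is pending, delay a new
 * XS_TRANSACTION_START only if the connection has no open transaction. */
static bool delay_transaction_start(const struct conn_sketch *conn,
                                    bool lu_pending)
{
    assert(conn != NULL);
    return lu_pending && list_empty(&conn->transaction_list);
}
```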



  conn->in->hdr.msg.type == XS_TRANSACTION_START) {
  trace("Delaying transaction start for connection %p req_id %u\n",
    conn, conn->in->hdr.msg.req_id);
@@ -2190,7 +2190,6 @@ struct connection *new_connection(const struct interface_funcs *funcs)
  new->funcs = funcs;
  new->is_ignored = false;
  new->is_stalled = false;
-    new->transaction_started = 0;
  INIT_LIST_HEAD(&new->out_list);
  INIT_LIST_HEAD(&new->acc_list);
  INIT_LIST_HEAD(&new->ref_list);
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 5a11dc1231..3564d85d7d 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -151,7 +151,6 @@ struct connection
  /* List of in-progress transactions. */
  struct list_head transaction_list;
  uint32_t next_transaction_id;
-    unsigned int transaction_started;
  time_t ta_start_time;
  /* List of delayed requests. */
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 1caa60bb14..40bcc1dbfa 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -419,12 +419,10 @@ int domain_get_quota(const void *ctx, struct connection *conn,
  {
  struct domain *d = find_domain_struct(domid);
  char *resp;
-    int ta;
  if (!d)
  return ENOENT;
-    ta = d->conn ? d->conn->transaction_started : 0;
  resp = talloc_asprintf(ctx, "Domain %u:\n", domid);
  if (!resp)
  return ENOMEM;
@@ -435,7 +433,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
  ent(nodes, d->acc[ACC_NODES]);
  ent(watches, d->acc[ACC_WATCH]);
-    ent(transactions, ta);
+    ent(transactions, d->acc[ACC_TRANS]);
  ent(outstanding, d->acc[ACC_OUTST]);
  ent(memory, d->acc[ACC_MEM]);
@@ -1297,6 +1295,23 @@ void domain_outstanding_dec(struct connection *conn, unsigned int domid)
  domain_acc_add(conn, domid, ACC_OUTST, -1, true);
  }
+void domain_transaction_inc(struct connection *conn)
+{
+    domain_acc_add(conn, conn->id, ACC_TRANS, 1, true);
+}
+
+void domain_transaction_dec(struct connection *conn)
+{
+    domain_acc_add(conn, conn->id, ACC_TRANS, -1, true);
+}
+
+unsigned int domain_transaction_get(struct connection *conn)
+{
+    return (domain_is_unprivileged(conn))
+    ? domain_acc_add(conn, conn->id, ACC_TRANS, 0, true)
+    : 0;
+}
+
  static wrl_creditt wrl_config_writecost  = WRL_FACTOR;
  static wrl_creditt wrl_config_rate   = WRL_RATE   * WRL_FACTOR;
  static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 0d61bf4344..abc766f343 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -31,6 +31,7 @@ enum accitem {
  ACC_WATCH = ACC_TR_N,
  ACC_OUTST,
  ACC_MEM,
+    ACC_TRANS,
  ACC_N,    /* Number of elements per domain. */
  };
@@ -112,6 +113,9 @@ void domain_watch_dec(struct connection *conn);
  int domain_watch(struct connection *conn);
  void domain_outstanding_inc(struct connection *conn, unsigned int domid);
  void domain_outstanding_dec(struct connection *conn, unsigned int domid);
+void domain_transaction_inc(struct connection *conn);
+void domain_transaction_dec(struct connection *conn);
+unsigned int domain_transaction_get(struct connection *conn);
  int domain_get_quo

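The patch folds the per-connection transaction_started counter into the generic per-domain accounting array as ACC_TRANS. A simplified model follows; domain_sketch, acc_add() and the transaction_*() helpers are hypothetical stand-ins for struct domain, domain_acc_add() and domain_transaction_{inc,dec,get}(), and the real domain_acc_add() takes more parameters (connection, domid and a flag). As in the patch, privileged domains report 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Accounting items as in the patch, with ACC_TRANS added before ACC_N.
 * ACC_NODES being the first entry is an assumption of this sketch. */
enum accitem {
    ACC_NODES,
    ACC_REQ_N,
    ACC_TR_N = ACC_REQ_N,
    ACC_WATCH = ACC_TR_N,
    ACC_OUTST,
    ACC_MEM,
    ACC_TRANS,
    ACC_N,
};

/* Hypothetical, simplified domain record: one counter per item. */
struct domain_sketch {
    bool privileged;
    int acc[ACC_N];
};

/* Simplified stand-in for domain_acc_add(): adjust one counter and
 * return the new value. Passing add == 0 just reads the counter. */
static int acc_add(struct domain_sketch *d, enum accitem what, int add)
{
    assert(what < ACC_N);
    d->acc[what] += add;
    return d->acc[what];
}

static void transaction_inc(struct domain_sketch *d)
{
    acc_add(d, ACC_TRANS, 1);
}

static void transaction_dec(struct domain_sketch *d)
{
    acc_add(d, ACC_TRANS, -1);
}

/* Mirrors domain_transaction_get(): only unprivileged domains are
 * subject to the transaction quota, so privileged ones report 0. */
static unsigned int transaction_get(struct domain_sketch *d)
{
    return d->privileged ? 0 : (unsigned int)acc_add(d, ACC_TRANS, 0);
}
```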
[qemu-mainline test] 180507: tolerable FAIL - PUSHED

2023-05-02 Thread osstest service owner

flight 180507 qemu-mainline real [real]
flight 180512 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180507/
http://logs.test-lab.xenproject.org/osstest/logs/180512/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-xsm 7 xen-install  fail  pass in 180512-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check  fail  like 180487
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail  like 180487
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail  like 180487
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check  fail  like 180487
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail  like 180487
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop  fail  like 180487
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail  like 180487
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop  fail  like 180487
 test-amd64-i386-xl-pvshim 14 guest-start  fail  never pass
 test-amd64-amd64-libvirt 15 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 16 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check  fail  never pass

version targeted for testing:
 qemuu  b5f47ba73b7c1457d2f18d71c00e1a91a76fe60b
baseline version:
 qemuu  7c18f2d663521f1b31b821a13358ce38075eaf7d

Last test of basis   180487  2023-04-30 07:39:02 Z    2 days
Testing same since   180507  2023-05-02 15:37:40 Z    0 days    1 attempts


People who touched revisions under test:
  Alexander Bulekov 
  Fabiano Rosas 
  Richard Henderson 
  Thomas Huth 

jobs:
 build-amd64-xsm

Re: [PATCH v4 07/13] tools/xenstore: use accounting data array for per-domain values

2023-05-02 Thread Juergen Gross


On 02.05.23 21:09, Julien Grall wrote:

Hi Juergen,

On 05/04/2023 08:03, Juergen Gross wrote:
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 5cfd730cf6..0d61bf4344 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -28,7 +28,10 @@ enum accitem {
  ACC_NODES,
  ACC_REQ_N,    /* Number of elements per request. */
  ACC_TR_N = ACC_REQ_N,    /* Number of elements per transaction. */
-    ACC_N = ACC_TR_N,    /* Number of elements per domain. */
+    ACC_WATCH = ACC_TR_N,
+    ACC_OUTST,
+    ACC_MEM,
+    ACC_N,    /* Number of elements per domain. */
  };
  void handle_event(void);
@@ -107,9 +110,8 @@ static inline void domain_memory_add_nochk(struct connection *conn,
  void domain_watch_inc(struct connection *conn);
  void domain_watch_dec(struct connection *conn);
  int domain_watch(struct connection *conn);
-void domain_outstanding_inc(struct connection *conn);
-void domain_outstanding_dec(struct connection *conn);
-void domain_outstanding_domid_dec(unsigned int domid);
+void domain_outstanding_inc(struct connection *conn, unsigned int domid);


AFAICT, all the callers of domain_outstanding_inc() will pass 'conn->id'. So it
is not entirely clear what the benefit of adding the extra parameter is.


domain_acc_add() will need conn. I agree that I should drop the domid
parameter.


Juergen
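
The enum change in this patch decides which counters exist at which level: per-request and per-transaction buffers cover only the first ACC_REQ_N / ACC_TR_N items, while a domain tracks all ACC_N. The sketch reproduces the enum from the hunk (assuming ACC_NODES is its first entry) and checks the resulting layout; the *_acc_sketch structs and is_domain_only() are hypothetical illustrations.

```c
#include <assert.h>
#include <stdbool.h>

/* Enum layout from the patch (ACC_NODES as the first entry is assumed). */
enum accitem {
    ACC_NODES,
    ACC_REQ_N,                /* Number of elements per request. */
    ACC_TR_N = ACC_REQ_N,     /* Number of elements per transaction. */
    ACC_WATCH = ACC_TR_N,
    ACC_OUTST,
    ACC_MEM,
    ACC_N,                    /* Number of elements per domain. */
};

/* Hypothetical containers sized by the enum markers. */
struct req_acc_sketch    { int acc[ACC_REQ_N]; };
struct trans_acc_sketch  { int acc[ACC_TR_N]; };
struct domain_acc_sketch { int acc[ACC_N]; };

/* Items at or above ACC_TR_N have no slot in a request or transaction
 * buffer, so they are only ever accounted per domain. */
static bool is_domain_only(enum accitem item)
{
    assert(item < ACC_N);
    return item >= ACC_TR_N;
}

_Static_assert(ACC_WATCH == ACC_TR_N, "watches start the domain-only range");
_Static_assert(ACC_N == 4, "nodes, watches, outstanding requests, memory");
```

With this layout only ACC_NODES falls in the buffered range, which matches the node accounting being the part that needs undoing on failure.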




Re: [PATCH v4 05/13] tools/xenstore: use accounting buffering for node accounting

2023-05-02 Thread Juergen Gross


On 02.05.23 20:55, Julien Grall wrote:

Hi Juergen,

On 05/04/2023 08:03, Juergen Gross wrote:

Add the node accounting to the accounting information buffering in
order to avoid having to undo it in case of failure.

Signed-off-by: Juergen Gross 
---
  tools/xenstore/xenstored_core.c   | 21 ++---
  tools/xenstore/xenstored_domain.h |  4 ++--
  2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 84335f5f3d..92a40ccf3f 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1452,7 +1452,6 @@ static void destroy_node_rm(struct connection *conn, struct node *node)
  static int destroy_node(struct connection *conn, struct node *node)
  {
  destroy_node_rm(conn, node);
-    domain_nbentry_dec(conn, get_node_owner(node));
  /*
   * It is not possible to easily revert the changes in a transaction.
@@ -1797,27 +1796,11 @@ static int do_set_perms(const void *ctx, struct connection *conn,
  old_perms = node->perms;
  domain_nbentry_dec(conn, get_node_owner(node));


IIRC, we originally said that domain_nbentry_dec() could never fail in a 
non-transaction case. But with your current rework, the function can now fail 
because of an allocation failure.


How could that happen?

domain_nbentry_dec() can only be called if a node is owned by an already
known domain. So no allocation can happen, as anything else would indicate a
major error in xenstored.


Therefore, shouldn't we now check the error? (Possibly in a patch beforehand).


I don't think so. I can add a comment if you want.


Juergen
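
Juergen's argument is that the decrement path only ever touches a domain that is already known, so it never reaches the allocating branch. A generic find-or-create lookup illustrates the distinction. The list-based table, acc_entry, find_or_create() and nbentry_dec() are hypothetical and do not reproduce xenstored's actual accounting structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-domain accounting entry kept in a singly linked list. */
struct acc_entry {
    unsigned int domid;
    int nodes;
    struct acc_entry *next;
};

/* Return the entry for domid. Only when the domain is unknown and
 * create is true does this allocate, so only that path can fail. */
static struct acc_entry *find_or_create(struct acc_entry **head,
                                        unsigned int domid, bool create)
{
    struct acc_entry *e;

    for (e = *head; e; e = e->next)
        if (e->domid == domid)
            return e;

    if (!create)
        return NULL;

    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;            /* Allocation failure. */
    e->domid = domid;
    e->next = *head;
    *head = e;
    return e;
}

/* Decrement for a node owner. The owner of an existing node is always a
 * known domain, so the lookup succeeds without allocating; a miss would
 * be an internal accounting bug, not memory pressure. */
static void nbentry_dec(struct acc_entry **head, unsigned int owner)
{
    struct acc_entry *e = find_or_create(head, owner, false);

    assert(e != NULL);
    e->nodes--;
}
```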



RE: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-02 Thread Michael Kelley (LINUX)

From: Juergen Gross  Sent: Tuesday, May 2, 2023 5:09 AM
> 
> This series tries to fix the rather special case of PAT being available
> without having MTRRs (either due to CONFIG_MTRR being not set, or
> because the feature has been disabled e.g. by a hypervisor).
> 
> The main use cases are Xen PV guests and SEV-SNP guests running under
> Hyper-V.
> 
> Instead of trying to work around all the issues by adding if statements
> here and there, just try to use the complete available infrastructure
> by setting up a read-only MTRR state when needed.
> 
> In the Xen PV case the current MTRR MSR values can be read from the
> hypervisor, while for the SEV-SNP case all that is needed is to set the
> default caching mode to "WB".
> 
> I have added more cleanup which has been discussed when looking into
> the most recent failures.
> 
> Note that I couldn't test the Hyper-V related change (patch 3).
> 
> Running on bare metal and with Xen didn't show any problems with the
> series applied.
> 
> It should be noted that patches 9+10 replace today's way of looking up
> the MTRR cache type for a memory region: instead of evaluating the
> MTRR register values, a memory map with the cache types is built.
> This should make the lookup much faster and much easier to understand.
> 
> Changes in V2:
> - replaced former patches 1+2 with new patches 1-4, avoiding especially
>   the rather hacky approach of V1, while making all the MTRR type
>   conflict tests available for the Xen PV case
> - updated patch 6 (was patch 4 in V1)
> 
> Changes in V3:
> - dropped patch 5 of V2, as already applied
> - split patch 1 of V2 into 2 patches
> - new patches 6-10
> - addressed comments
> 
> Changes in V4:
> - addressed comments
> 
> Changes in V5
> - addressed comments
> - some other small fixes
> - new patches 3, 8 and 15
> 
> Changes in V6:
> - patch 1 replaces patches 1+2 of V5
> - new patches 8+12
> - addressed comments
> 
> Juergen Gross (16):
>   x86/mtrr: remove physical address size calculation
>   x86/mtrr: replace some constants with defines
>   x86/mtrr: support setting MTRR state for software defined MTRRs
>   x86/hyperv: set MTRR state when running as SEV-SNP Hyper-V guest
>   x86/xen: set MTRR state when running as Xen PV initial domain
>   x86/mtrr: replace vendor tests in MTRR code
>   x86/mtrr: have only one set_mtrr() variant
>   x86/mtrr: move 32-bit code from mtrr.c to legacy.c
>   x86/mtrr: allocate mtrr_value array dynamically
>   x86/mtrr: add get_effective_type() service function
>   x86/mtrr: construct a memory map with cache modes
>   x86/mtrr: add mtrr=debug command line option
>   x86/mtrr: use new cache_map in mtrr_type_lookup()
>   x86/mtrr: don't let mtrr_type_lookup() return MTRR_TYPE_INVALID
>   x86/mm: only check uniform after calling mtrr_type_lookup()
>   x86/mtrr: remove unused code
> 
>  .../admin-guide/kernel-parameters.txt |   4 +
>  arch/x86/hyperv/ivm.c |   4 +
>  arch/x86/include/asm/mtrr.h   |  43 +-
>  arch/x86/include/uapi/asm/mtrr.h  |   6 +-
>  arch/x86/kernel/cpu/mtrr/Makefile |   2 +-
>  arch/x86/kernel/cpu/mtrr/amd.c|   2 +-
>  arch/x86/kernel/cpu/mtrr/centaur.c|  11 +-
>  arch/x86/kernel/cpu/mtrr/cleanup.c|  22 +-
>  arch/x86/kernel/cpu/mtrr/cyrix.c  |   2 +-
>  arch/x86/kernel/cpu/mtrr/generic.c| 677 --
>  arch/x86/kernel/cpu/mtrr/legacy.c |  90 +++
>  arch/x86/kernel/cpu/mtrr/mtrr.c   | 195 ++---
>  arch/x86/kernel/cpu/mtrr/mtrr.h   |  18 +-
>  arch/x86/kernel/setup.c   |   2 +
>  arch/x86/mm/pgtable.c |  24 +-
>  arch/x86/xen/enlighten_pv.c   |  52 ++
>  16 files changed, 721 insertions(+), 433 deletions(-)
>  create mode 100644 arch/x86/kernel/cpu/mtrr/legacy.c
> 
> --
> 2.35.3

I've tested the full v6 series in a normal Hyper-V guest and in an SEV-SNP guest.

In the SNP guest, the page attributes in /sys/kernel/debug/x86/pat_memtype_list
are "write-back" in the expected cases.  The "mtrr" x86 feature no longer appears
in the "flags" output of "lscpu" or /proc/cpuinfo.  /proc/mtrr does not exist, again
as expected.

In a normal VM, the "mtrr" x86 feature appears in the flags, and /proc/mtrr
shows expected values.  The boot option mtrr=debug works as expected.

Tested-by: Michael Kelley
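
The "mtrr" flag check described in the test report can be automated. A minimal sketch, assuming the x86 /proc/cpuinfo layout where each CPU has a "flags : ..." line; has_flag() and cpuinfo_has_flag() are hypothetical helpers, and on a real system the stream would come from fopen("/proc/cpuinfo", "r").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if the whitespace-separated list 'flags' contains 'flag'
 * as a whole word (so "mtrr" does not match a longer flag name). */
static bool has_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    assert(len > 0);
    while ((p = strstr(p, flag)) != NULL) {
        bool start_ok = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool end_ok = p[len] == '\0' || p[len] == ' ' ||
                      p[len] == '\t' || p[len] == '\n';

        if (start_ok && end_ok)
            return true;
        p += len;
    }
    return false;
}

/* Scan a cpuinfo-style stream for a "flags" line containing 'flag'.
 * Assumes each flags line fits in the buffer. */
static bool cpuinfo_has_flag(FILE *f, const char *flag)
{
    char line[8192];

    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');

            if (colon && has_flag(colon + 1, flag))
                return true;
        }
    }
    return false;
}
```

In the SEV-SNP guest above, cpuinfo_has_flag() would be expected to return false for "mtrr", and true in the normal VM.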

[xen-unstable test] 180506: tolerable FAIL - PUSHED

2023-05-02 Thread osstest service owner

flight 180506 xen-unstable real [real]
flight 180510 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/180506/
http://logs.test-lab.xenproject.org/osstest/logs/180510/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-livepatch 7 xen-install fail pass in 180510-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop  fail  like 180501
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop  fail  like 180501
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail  like 180501
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail  like 180501
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop  fail  like 180501
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop  fail  like 180501
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check  fail  like 180501
 test-armhf-armhf-libvirt 16 saverestore-support-check  fail  like 180501
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop  fail  like 180501
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail  like 180501
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop  fail  like 180501
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail  like 180501
 test-amd64-amd64-libvirt 15 migrate-support-check  fail  never pass
 test-amd64-i386-xl-pvshim 14 guest-start  fail  never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check  fail  never pass

version targeted for testing:
 xen  b033eddc9779109c06a26936321d27a2ef4e088b
baseline version:
 xen  ef841d2a2377f5297add27e637b725426bb4840a

Last test of basis   180501  2023-05-02 01:52:02 Z0

Re: [RFC PATCH] xen/arm: arm32: Enable smpboot on Arm32 based systems

2023-05-02 Thread Stefano Stabellini

On Tue, 2 May 2023, Ayan Kumar Halder wrote:
> On some of the Arm32 based systems (eg Cortex-R52), smpboot is supported.
> In these systems PSCI may not always be supported. In case of Cortex-R52, there
> is no EL3 or secure mode. Thus, PSCI is not supported as it requires EL3.
> 
> Thus, we use 'spin-table' mechanism to boot the secondary cpus. The primary
> cpu provides the startup address of the secondary cores. This address is
> provided using the 'cpu-release-addr' property.
> 
> To support smpboot, we have copied the code from xen/arch/arm/arm64/smpboot.c
> with the following changes :-
> 
> 1. 'enable-method' is an optional property. Refer to the comment in
> https://www.kernel.org/doc/Documentation/devicetree/bindings/arm/cpus.yaml
> "  # On ARM 32-bit systems this property is optional"
> 
> 2. psci is not currently supported as a value for 'enable-method'.
> 
> 3. update_identity_mapping() is not invoked as we are not sure if it is
> required.
> 
> Signed-off-by: Ayan Kumar Halder 
> ---
> 
> The dts snippet with which this has been validated is :-
> 
> cpus {
>     #address-cells = <0x02>;
>     #size-cells = <0x00>;
>
>     cpu-map {
>
>         cluster0 {
>
>             core0 {
>
>                 thread0 {
>                     cpu = <0x02>;
>                 };
>             };
>             core1 {
>
>                 thread0 {
>                     cpu = <0x03>;
>                 };
>             };
>         };
>     };
>
>     cpu@0 {
>         device_type = "cpu";
>         compatible = "arm,armv8";
>         reg = <0x00 0x00>;
>         phandle = <0x02>;
>     };
>
>     cpu@1 {
>         device_type = "cpu";
>         compatible = "arm,armv8";
>         reg = <0x00 0x01>;
>         enable-method = "spin-table";
>         cpu-release-addr = <0xEB58C010>;
>         phandle = <0x03>;
>     };
> };
> 
> Although currently I have tested this on Cortex-R52, I feel this may be helpful
> to enable smp on other Arm32 based systems as well. Happy to hear opinions.

I think you are right


>  xen/arch/arm/arm32/smpboot.c | 84 ++--
>  1 file changed, 80 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/smpboot.c b/xen/arch/arm/arm32/smpboot.c
> index 518e9f9c7e..feb249d3f8 100644
> --- a/xen/arch/arm/arm32/smpboot.c
> +++ b/xen/arch/arm/arm32/smpboot.c
> @@ -1,24 +1,100 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
> +struct smp_enable_ops {
> +int (*prepare_cpu)(int);
> +};

coding style


> +static uint32_t cpu_release_addr[NR_CPUS];
> +static struct smp_enable_ops smp_enable_ops[NR_CPUS];

they could be __initdata


>  int __init arch_smp_init(void)
>  {
>      return platform_smp_init();
>  }
>  
> -int __init arch_cpu_init(int cpu, struct dt_device_node *dn)
> +static int __init smp_spin_table_cpu_up(int cpu)
> +{
> +    uint32_t __iomem *release;
> +
> +    if (!cpu_release_addr[cpu])

code style


> +    {
> +        printk("CPU%d: No release addr\n", cpu);
> +        return -ENODEV;
> +    }
> +
> +    release = ioremap_nocache(cpu_release_addr[cpu], 4);
> +    if ( !release )
> +    {
> +        dprintk(XENLOG_ERR, "CPU%d: Unable to map release address\n", cpu);
> +        return -EFAULT;
> +    }
> +
> +    writel(__pa(init_secondary), release);
> +
> +    iounmap(release);

I think we need a wmb() ?
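
A barrier at this point is about ordering: whatever the boot CPU wrote before releasing the secondary must be visible once the secondary observes the new release address (and, on hardware, before sev() wakes it). The userspace model below shows only that ordering, using C11 release/acquire atomics with a pthread standing in for the secondary CPU; release_addr, boot_data, secondary_cpu() and release_secondary() are hypothetical, and the sketch says nothing about cache maintenance or the MMIO semantics of writel()/ioremap_nocache().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model of the spin-table release location: 0 means "keep waiting". */
static _Atomic uint32_t release_addr;

/* Ordinary data the boot CPU prepares before releasing the secondary. */
static uint32_t boot_data;

/* What the simulated secondary CPU saw once released. */
struct observed {
    uint32_t entry;
    uint32_t data;
};

/* Simulated secondary CPU: spin until the release location is non-zero.
 * The acquire load pairs with the release store in release_secondary(),
 * so boot_data is guaranteed to hold the published value afterwards. */
static void *secondary_cpu(void *arg)
{
    struct observed *obs = arg;
    uint32_t entry;

    while ((entry = atomic_load_explicit(&release_addr,
                                         memory_order_acquire)) == 0)
        ;   /* A real secondary CPU would wait in WFE here. */

    obs->entry = entry;
    obs->data = boot_data;
    return NULL;
}

/* Simulated boot CPU: write the data, then publish the entry point with
 * release ordering (the role a barrier before sev() would play).
 * Returns 0 on success, -1 if the thread could not be run. */
static int release_secondary(uint32_t entry, uint32_t data,
                             struct observed *obs)
{
    pthread_t t;

    assert(entry != 0);
    atomic_store_explicit(&release_addr, 0, memory_order_relaxed);
    if (pthread_create(&t, NULL, secondary_cpu, obs) != 0)
        return -1;

    boot_data = data;
    atomic_store_explicit(&release_addr, entry, memory_order_release);
    /* Real code would execute SEV here to wake CPUs sleeping in WFE. */

    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```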


> +    sev();
> +
> +    return 0;
> +}
> +
> +static void __init smp_spin_table_init(int cpu, struct dt_device_node *dn)
>  {
> -    /* Not needed on ARM32, as there is no relevant information in
> -     * the CPU device tree node for ARMv7 CPUs.
> +    if ( !dt_property_read_u32(dn, "cpu-release-addr", &cpu_release_addr[cpu]) )

It looks like cpu-release-addr could be u64 or u32. Can we detect the
size of the property and act accordingly? If the address is u64 and
above 4GB it is fine to abort.
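
Detecting the size amounts to looking at the property length: device tree cells are big-endian 32-bit values, so cpu-release-addr is either one cell (4 bytes) or two (8 bytes). A minimal sketch of the decoding under that assumption; decode_release_addr() is hypothetical, and in Xen the length would presumably come from the lenp argument of dt_get_property() (the patch already calls it with NULL there).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a raw "cpu-release-addr" property value. Accepts one or two
 * big-endian 32-bit cells. Returns 0 and stores the address on success,
 * or -1 if the size is unexpected or the address is above 4GB. */
static int decode_release_addr(const uint8_t *prop, size_t len,
                               uint32_t *addr)
{
    uint64_t val = 0;
    size_t i;

    assert(prop != NULL && addr != NULL);
    if (len != 4 && len != 8)
        return -1;

    /* Big-endian: most significant byte first. */
    for (i = 0; i < len; i++)
        val = (val << 8) | prop[i];

    if (val > UINT32_MAX)
        return -1;      /* Not reachable from this 32-bit path: abort. */

    *addr = (uint32_t)val;
    return 0;
}
```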


> +    {
> +        printk("CPU%d has no cpu-release-addr\n", cpu);
> +        return;
> +    }
> +
> +    smp_enable_ops[cpu].prepare_cpu = smp_spin_table_cpu_up;
> +}
> +
> +static int __init dt_arch_cpu_init(int cpu, struct dt_device_node *dn)
> +{
> +    const char *enable_method;
> +
> +    /*
> +     * Refer Documentation/devicetree/bindings/arm/cpus.yaml, it says on
> +     * ARM 32-bit systems this property is optional.
>       */
> +    enable_method = dt_get_property(dn, "enable-method", NULL);
> +    if (!enable_method)

coding style


> +    {
> +        return 0;
> +    }
> +
> +    if ( !strcmp(enable_method, "spin-table") )
> +        smp_spin_table_init(cpu, dn);
> +    else
> +    {
> +        printk("CPU%d has unknown enable method \"%s\"\n", cpu, enable_method);
> +        return -EINVAL;
> +    }
> +
>      return 0;
>  }
>  
> +int __init arch_cpu_init(int cpu, struct dt_device_node *dn)
> +{
> +    return dt_arch_cpu_i

[XEN][PATCH v6 17/19] tools/libs/ctrl: Implement new xc interfaces for dt overlay

2023-05-02 Thread Vikram Garhwal

xc_dt_overlay() sends the device tree binary overlay (.dtbo), its size and the
overlay operation type (i.e. add or remove) to Xen.

Signed-off-by: Vikram Garhwal 
---
 tools/include/xenctrl.h |  5 
 tools/libs/ctrl/Makefile.common |  1 +
 tools/libs/ctrl/xc_dt_overlay.c | 48 +
 3 files changed, 54 insertions(+)
 create mode 100644 tools/libs/ctrl/xc_dt_overlay.c

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 752fc87580..1a99c06561 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2666,6 +2666,11 @@ int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout, uint32
 int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
  xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
 
+#if defined(__arm__) || defined(__aarch64__)
+int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
+                  uint32_t overlay_fdt_size, uint8_t overlay_op);
+#endif
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libs/ctrl/Makefile.common b/tools/libs/ctrl/Makefile.common
index 0a09c28fd3..247afbe5f9 100644
--- a/tools/libs/ctrl/Makefile.common
+++ b/tools/libs/ctrl/Makefile.common
@@ -24,6 +24,7 @@ OBJS-y   += xc_hcall_buf.o
 OBJS-y   += xc_foreign_memory.o
 OBJS-y   += xc_kexec.o
 OBJS-y   += xc_resource.o
+OBJS-$(CONFIG_ARM)  += xc_dt_overlay.o
 OBJS-$(CONFIG_X86) += xc_psr.o
 OBJS-$(CONFIG_X86) += xc_pagetab.o
 OBJS-$(CONFIG_Linux) += xc_linux.o
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
new file mode 100644
index 00..202fc906f4
--- /dev/null
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -0,0 +1,48 @@
+/*
+ *
+ * Device Tree Overlay functions.
+ * Copyright (C) 2021 Xilinx Inc.
+ * Author Vikram Garhwal 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see .
+ */
+
+#include "xc_private.h"
+
+int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
+                  uint32_t overlay_fdt_size, uint8_t overlay_op)
+{
+    int err;
+    DECLARE_SYSCTL;
+
+    DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+    if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+        goto err;
+
+    sysctl.cmd = XEN_SYSCTL_dt_overlay;
+    sysctl.u.dt_overlay.overlay_op = overlay_op;
+    sysctl.u.dt_overlay.overlay_fdt_size = overlay_fdt_size;
+
+    set_xen_guest_handle(sysctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+    if ( (err = do_sysctl(xch, &sysctl)) != 0 )
+        PERROR("%s failed", __func__);
+
+err:
+    xc_hypercall_bounce_post(xch, overlay_fdt);
+
+    return err;
+}
-- 
2.17.1

[XEN][PATCH v6 19/19] tools/xl: Add new xl command overlay for device tree overlay support

2023-05-02 Thread Vikram Garhwal

Signed-off-by: Vikram Garhwal 
---
 tools/xl/xl.h   |  1 +
 tools/xl/xl_cmdtable.c  |  6 +
 tools/xl/xl_vmcontrol.c | 52 +
 3 files changed, 59 insertions(+)

diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 72538d6a81..a923daccd3 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -138,6 +138,7 @@ int main_shutdown(int argc, char **argv);
 int main_reboot(int argc, char **argv);
 int main_list(int argc, char **argv);
 int main_vm_list(int argc, char **argv);
+int main_dt_overlay(int argc, char **argv);
 int main_create(int argc, char **argv);
 int main_config_update(int argc, char **argv);
 int main_button_press(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index ccf4d83584..db0acff62a 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -630,6 +630,12 @@ const struct cmd_spec cmd_table[] = {
   "Issue a qemu monitor command to the device model of a domain",
   " ",
 },
+    { "dt-overlay",
+      &main_dt_overlay, 0, 1,
+      "Add/Remove a device tree overlay",
+      "add/remove <.dtbo>"
+      "-h print this help\n"
+    },
 };
 
 const int cmdtable_len = ARRAY_SIZE(cmd_table);
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 5518c78dc6..de56e00d8b 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1265,6 +1265,58 @@ int main_create(int argc, char **argv)
 return 0;
 }
 
+int main_dt_overlay(int argc, char **argv)
+{
+    const char *overlay_ops = NULL;
+    const char *overlay_config_file = NULL;
+    void *overlay_dtb = NULL;
+    int rc;
+    uint8_t op;
+    int overlay_dtb_size = 0;
+    const int overlay_add_op = 1;
+    const int overlay_remove_op = 2;
+
+    if (argc < 2) {
+        help("dt_overlay");
+        return EXIT_FAILURE;
+    }
+
+    overlay_ops = argv[1];
+    overlay_config_file = argv[2];
+
+    if (strcmp(overlay_ops, "add") == 0)
+        op = overlay_add_op;
+    else if (strcmp(overlay_ops, "remove") == 0)
+        op = overlay_remove_op;
+    else {
+        fprintf(stderr, "Invalid dt overlay operation\n");
+        return EXIT_FAILURE;
+    }
+
+    if (overlay_config_file) {
+        rc = libxl_read_file_contents(ctx, overlay_config_file,
+                                      &overlay_dtb, &overlay_dtb_size);
+
+        if (rc) {
+            fprintf(stderr, "failed to read the overlay device tree file %s\n",
+                    overlay_config_file);
+            free(overlay_dtb);
+            return ERROR_FAIL;
+        }
+    } else {
+        fprintf(stderr, "overlay dtbo file not provided\n");
+        return ERROR_FAIL;
+    }
+
+    rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
+
+    free(overlay_dtb);
+
+    if (rc)
+        return EXIT_FAILURE;
+
+    return rc;
+}
 /*
  * Local variables:
  * mode: C
-- 
2.17.1
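The argument handling in main_dt_overlay() reduces to mapping the sub-command to the op codes it passes on: 1 for "add" and 2 for "remove". A minimal sketch of that mapping as a standalone helper follows; `parse_overlay_op` and the enum names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Op codes as used by main_dt_overlay() (overlay_add_op / overlay_remove_op). */
enum { OVERLAY_OP_ADD = 1, OVERLAY_OP_REMOVE = 2 };

/* Hypothetical helper: map the sub-command string to an op code,
 * returning -1 for a missing argument or anything other than add/remove. */
static int parse_overlay_op(const char *ops)
{
    if (ops == NULL)
        return -1;
    if (strcmp(ops, "add") == 0)
        return OVERLAY_OP_ADD;
    if (strcmp(ops, "remove") == 0)
        return OVERLAY_OP_REMOVE;
    return -1;
}
```

Keeping the mapping in one place makes the "Invalid dt overlay operation" path easy to reach from a single check on the return value.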

[XEN][PATCH v6 18/19] tools/libs/light: Implement new libxl functions for device tree overlay ops

2023-05-02 Thread Vikram Garhwal

Signed-off-by: Vikram Garhwal 
---
 tools/include/libxl.h   | 11 +
 tools/libs/light/Makefile   |  3 ++
 tools/libs/light/libxl_dt_overlay.c | 71 +
 3 files changed, 85 insertions(+)
 create mode 100644 tools/libs/light/libxl_dt_overlay.c

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index cfa1a19131..1c5e8abaae 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -250,6 +250,12 @@
  */
 #define LIBXL_HAVE_DEVICETREE_PASSTHROUGH 1
 
+#if defined(__arm__) || defined(__aarch64__)
+/**
+ * This means Device Tree Overlay is supported.
+ */
+#define LIBXL_HAVE_DT_OVERLAY 1
+#endif
 /*
  * libxl_domain_build_info has device_model_user to specify the user to
  * run the device model with. See docs/misc/qemu-deprivilege.txt.
@@ -2453,6 +2459,11 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid,
 int *num);
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
+#if defined(__arm__) || defined(__aarch64__)
+int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
+ uint32_t overlay_size, uint8_t overlay_op);
+#endif
+
 /*
  * Turns the current process into a backend device service daemon
  * for a driver domain.
diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 96daeabc47..563a1e8d0a 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -112,6 +112,9 @@ OBJS-y += _libxl_types.o
 OBJS-y += libxl_flask.o
 OBJS-y += _libxl_types_internal.o
 
+# Device tree overlay is enabled only for ARM architecture.
+OBJS-$(CONFIG_ARM) += libxl_dt_overlay.o
+
 ifeq ($(CONFIG_LIBNL),y)
 CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
 endif
diff --git a/tools/libs/light/libxl_dt_overlay.c b/tools/libs/light/libxl_dt_overlay.c
new file mode 100644
index 00..a6c709a6dc
--- /dev/null
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (C) 2021 Xilinx Inc.
+ * Author Vikram Garhwal 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+#include 
+#include 
+
+static int check_overlay_fdt(libxl__gc *gc, void *fdt, size_t size)
+{
+int r;
+
+if (fdt_magic(fdt) != FDT_MAGIC) {
+LOG(ERROR, "Overlay FDT is not a valid Flat Device Tree");
+return ERROR_FAIL;
+}
+
+r = fdt_check_header(fdt);
+if (r) {
+LOG(ERROR, "Failed to check the overlay FDT (%d)", r);
+return ERROR_FAIL;
+}
+
+if (fdt_totalsize(fdt) > size) {
+LOG(ERROR, "Overlay FDT totalsize is too big");
+return ERROR_FAIL;
+}
+
+return 0;
+}
+
+int libxl_dt_overlay(libxl_ctx *ctx, void *overlay_dt, uint32_t overlay_dt_size,
+ uint8_t overlay_op)
+{
+int rc;
+int r;
+GC_INIT(ctx);
+
+if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) {
+LOG(ERROR, "Overlay DTB check failed");
+rc = ERROR_FAIL;
+goto out;
+} else {
+LOG(DEBUG, "Overlay DTB check passed");
+rc = 0;
+}
+
+r = xc_dt_overlay(ctx->xch, overlay_dt, overlay_dt_size, overlay_op);
+
+if (r) {
+LOG(ERROR, "%s: Adding/Removing overlay dtb failed.", __func__);
+rc = ERROR_FAIL;
+}
+
+out:
+GC_FREE;
+return rc;
+}
+
-- 
2.17.1
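check_overlay_fdt() rejects a blob whose magic is wrong or whose declared total size exceeds the buffer it was read into. The sketch below shows those two checks without libfdt, assuming the standard FDT header layout (big-endian magic 0xd00dfeed at offset 0, totalsize at offset 4). It does not reproduce the further header validation that fdt_check_header() performs, and `sketch_check_fdt` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FDT_MAGIC 0xd00dfeedU   /* same value as libfdt's FDT_MAGIC */

/* FDT header fields are stored big-endian. */
static uint32_t be32_at(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if the blob starts with the FDT magic and its declared
 * totalsize fits in the buffer, -1 otherwise. */
static int sketch_check_fdt(const void *blob, size_t size)
{
    const unsigned char *p = blob;

    if (size < 8)                         /* need magic + totalsize */
        return -1;
    if (be32_at(p) != SKETCH_FDT_MAGIC)   /* like fdt_magic() != FDT_MAGIC */
        return -1;
    if (be32_at(p + 4) > size)            /* like fdt_totalsize() > size */
        return -1;
    return 0;
}
```

The libxl code itself uses libfdt's fdt_magic(), fdt_check_header() and fdt_totalsize() for these checks.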

[XEN][PATCH v6 15/19] xen/arm: Implement device tree node removal functionalities

2023-05-02 Thread Vikram Garhwal

Introduce sysctl XEN_SYSCTL_dt_overlay to remove device tree nodes that were
added using a device tree overlay.

xl dt-overlay remove file.dtbo:
Removes all the nodes in a given dtbo.
First, it removes IRQ permissions and MMIO accesses. Next, it finds the nodes
in dt_host and deletes the device node entries from dt_host.

A node is removed only if it is not in use by dom0 or domio.

Also, add an overlay_track struct to keep track of the nodes added through a
device tree overlay. overlay_track holds dt_host_new, which is the unflattened
form of the updated fdt, and the names of the overlay nodes. When a node is
removed, the memory used by its overlay_track entry is freed as well.

Nested overlay removal is supported only sequentially, i.e. if overlay_child
nests under overlay_parent, the user is expected to remove overlay_child first
and then overlay_parent.
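Removing a node means unlinking it from its parent's child/sibling list and making the flat depth-first allnext chain skip the node's entire subtree. A minimal sketch with a simplified node struct that keeps only those links; `struct node`, `last_descendant` and `remove_node` are hypothetical. Unlike dt_overlay_remove_node(), which distinguishes the first-child case from later siblings, this sketch finds the allnext predecessor by walking the chain from the root.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct dt_device_node. */
struct node {
    struct node *parent, *child, *sibling, *allnext;
};

/* Last node of n's subtree in allnext (depth-first) order; n if childless. */
static struct node *last_descendant(struct node *n)
{
    while (n->child) {
        n = n->child;
        while (n->sibling)
            n = n->sibling;
    }
    return n;
}

/* Unlink n and its subtree from the tree rooted at root. Returns 0 on success. */
static int remove_node(struct node *root, struct node *n)
{
    struct node **link, *prev, *last;

    if (n->parent == NULL)
        return -1;   /* the root (or a detached node) cannot be removed */

    /* Find the pointer in the parent's child/sibling list that refers to n. */
    for (link = &n->parent->child; *link && *link != n; link = &(*link)->sibling)
        ;
    /* Find n's predecessor in the allnext chain. */
    for (prev = root; prev && prev->allnext != n; prev = prev->allnext)
        ;
    if (*link == NULL || prev == NULL)
        return -1;

    /* Unlink n from its siblings and make the chain skip n's whole subtree. */
    last = last_descendant(n);
    *link = n->sibling;
    prev->allnext = last->allnext;

    last->allnext = NULL;
    n->parent = n->sibling = NULL;
    return 0;
}
```

Removing a parent before its nested children works here because the whole subtree is detached at once, which matches the requirement that nested overlays be removed child first only for bookkeeping, not for list integrity.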

Signed-off-by: Vikram Garhwal 
---
 xen/arch/arm/sysctl.c|  16 +-
 xen/common/Makefile  |   1 +
 xen/common/dt-overlay.c  | 419 +++
 xen/include/public/sysctl.h  |  23 ++
 xen/include/xen/dt-overlay.h |  58 +
 5 files changed, 516 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/dt-overlay.c
 create mode 100644 xen/include/xen/dt-overlay.h

diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c
index b0a78a8b10..456358166c 100644
--- a/xen/arch/arm/sysctl.c
+++ b/xen/arch/arm/sysctl.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -21,7 +22,20 @@ void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
 long arch_do_sysctl(struct xen_sysctl *sysctl,
 XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
-return -ENOSYS;
+long ret = 0;
+
+switch ( sysctl->cmd )
+{
+case XEN_SYSCTL_dt_overlay:
+ret = dt_sysctl(&sysctl->u.dt_overlay);
+break;
+
+default:
+ret = -ENOSYS;
+break;
+}
+
+return ret;
 }
 
 /*
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 46049eac35..e7e96b1087 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
 obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domain.o
+obj-$(CONFIG_OVERLAY_DTB) += dt-overlay.o
 obj-y += event_2l.o
 obj-y += event_channel.o
 obj-y += event_fifo.o
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
new file mode 100644
index 00..b89cceab84
--- /dev/null
+++ b/xen/common/dt-overlay.c
@@ -0,0 +1,419 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/common/dt-overlay.c
+ *
+ * Device tree overlay support in Xen.
+ *
+ * Copyright (C) 2023, Advanced Micro Devices, Inc. All Rights Reserved.
+ * Written by Vikram Garhwal 
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static LIST_HEAD(overlay_tracker);
+static DEFINE_SPINLOCK(overlay_lock);
+
+/* Find the last descendant of the device_node. */
+static struct dt_device_node *
+find_last_descendants_node(struct dt_device_node *device_node)
+{
+struct dt_device_node *child_node;
+
+for ( child_node = device_node->child; child_node->sibling != NULL;
+  child_node = child_node->sibling );
+
+/* If the last child_node also has children. */
+if ( child_node->child )
+child_node = find_last_descendants_node(child_node);
+
+return child_node;
+}
+
+static int dt_overlay_remove_node(struct dt_device_node *device_node)
+{
+struct dt_device_node *np;
+struct dt_device_node *parent_node;
+struct dt_device_node *device_node_last_descendant = device_node->child;
+
+parent_node = device_node->parent;
+
+if ( parent_node == NULL )
+{
+dt_dprintk("%s's parent node not found\n", device_node->name);
+return -EFAULT;
+}
+
+np = parent_node->child;
+
+if ( np == NULL )
+{
+dt_dprintk("parent node %s's not found\n", parent_node->name);
+return -EFAULT;
+}
+
+/* If node to be removed is only child node or first child. */
+if ( !dt_node_cmp(np->full_name, device_node->full_name) )
+{
+parent_node->child = np->sibling;
+
+/*
+ * Iterate over all child nodes of device_node. Given that we are
+ * removing the parent node, we need to remove all its descendants too.
+ */
+if ( device_node_last_descendant )
+{
+device_node_last_descendant =
+find_last_descendants_node(device_node);
+parent_node->allnext = device_node_last_descendant->allnext;
+}
+else
+parent_node->allnext = np->allnext;
+
+return 0;
+}
+
+for ( np = parent_node->child; np->sibling != NULL; np = np->sibling )
+{
+if ( !dt_node_cmp(np->sibling->full_name, device_node->full_name) )
+{
+/* Found the node. Now we remove it. */
+np->si

[XEN][PATCH v6 10/19] xen/iommu: protect iommu_add_dt_device() with dtdevs_lock

2023-05-02 Thread Vikram Garhwal

Protect iommu_add_dt_device() with dtdevs_lock to prevent concurrent access while adding a device.
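The patch turns the early `return -EINVAL` into `goto fail` so that every path taken after spin_lock() reaches the single spin_unlock(). A minimal sketch of that pattern, assuming a stand-in "lock" that only counts acquisitions so the balance can be checked; `sketch_lock`, `sketch_add_device` and `struct sketch_ops` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for a spinlock: it only records whether it is held. */
static int lock_held;
static void sketch_lock(void)   { assert(lock_held == 0); lock_held++; }
static void sketch_unlock(void) { assert(lock_held == 1); lock_held--; }

struct sketch_ops {
    int (*add_device)(int dev);   /* optional driver callback */
};

/* Analogue of iommu_add_dt_device(): once the lock is taken, every
 * path, including the error path, leaves through the "fail" label. */
static int sketch_add_device(const struct sketch_ops *ops, int dev)
{
    int rc;

    sketch_lock();

    if (ops->add_device == NULL) {
        rc = -EINVAL;
        goto fail;
    }

    rc = ops->add_device(dev);

fail:
    sketch_unlock();
    return rc;
}

static int add_ok(int dev) { (void)dev; return 0; }
```

As in the patch, the label is named "fail" even though the success path also passes through it.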

Signed-off-by: Vikram Garhwal 
Reviewed-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
 xen/drivers/passthrough/device_tree.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index c386fda3e4..f3867ef1a6 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -145,6 +145,8 @@ int iommu_add_dt_device(struct dt_device_node *np)
 if ( dev_iommu_fwspec_get(dev) )
 return 0;
 
+spin_lock(&dtdevs_lock);
+
 /*
  * According to the Documentation/devicetree/bindings/iommu/iommu.txt
  * from Linux.
@@ -157,7 +159,10 @@ int iommu_add_dt_device(struct dt_device_node *np)
  * these callback implemented.
  */
 if ( !ops->add_device || !ops->dt_xlate )
-return -EINVAL;
+{
+rc = -EINVAL;
+goto fail;
+}
 
 if ( !dt_device_is_available(iommu_spec.np) )
 break;
@@ -188,6 +193,8 @@ int iommu_add_dt_device(struct dt_device_node *np)
 if ( rc < 0 )
 iommu_fwspec_free(dev);
 
+fail:
+spin_unlock(&dtdevs_lock);
 return rc;
 }
 
-- 
2.17.1

[XEN][PATCH v6 16/19] xen/arm: Implement device tree node addition functionalities

2023-05-02 Thread Vikram Garhwal

Update sysctl XEN_SYSCTL_dt_overlay to support adding dtbo nodes using a
device tree overlay.

xl dt-overlay add file.dtbo:
Each time overlay nodes are added using a .dtbo, a new fdt (a memcpy of
device_tree_flattened) is created and updated with the overlay nodes. This
updated fdt is then unflattened into dt_host_new. Next, it checks whether any
of the overlay nodes already exist in dt_host. If they don't, it finds the
overlay nodes in dt_host_new, finds each overlay node's parent in dt_host and
adds the nodes as children of their parent in dt_host. Each node is attached
as the last child of its target parent.

Finally, it adds IRQs, adds the device to the IOMMUs, sets permissions and
maps MMIO for the overlay node.

When a node is added using an overlay, a new entry is allocated in
overlay_track to keep track of the memory allocated for that node. This makes
it possible to free that memory when the device tree node is removed.

The main purpose of this is to address the first part of dynamic programming,
i.e. making Xen aware of new device tree nodes by updating dt_host with the
overlay node information. Here we add/remove nodes in dt_host and check/set
IOMMU and IRQ permissions, but never map them to any domain. Right now,
mapping/unmapping happens only when a new domU is created/destroyed using
"xl create".
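Attaching a node "as the last child of its target parent" involves two lists: the parent's child/sibling list and the flat allnext chain that dt_for_each_device_node() walks. In allnext order the new subtree belongs right after the parent's current last descendant (or after the parent itself when it has no children), which is the case split dt_overlay_add_node() makes. A minimal sketch with a simplified node struct; `struct node`, `last_descendant` and `add_node_last` are hypothetical, and the node's own children are assumed to be linked already, as they are in dt_host_new.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct dt_device_node. */
struct node {
    struct node *parent, *child, *sibling, *allnext;
};

/* Last node of n's subtree in allnext (depth-first) order; n if childless. */
static struct node *last_descendant(struct node *n)
{
    while (n->child) {
        n = n->child;
        while (n->sibling)
            n = n->sibling;
    }
    return n;
}

/* Attach n (with its existing subtree) as the last child of parent. */
static void add_node_last(struct node *parent, struct node *n)
{
    /* Compute both ends before relinking anything. */
    struct node *prev = last_descendant(parent);  /* parent if childless */
    struct node *last = last_descendant(n);

    /* Splice n's subtree into the allnext chain after prev. */
    last->allnext = prev->allnext;
    prev->allnext = n;

    /* Append n to the parent's child/sibling list. */
    n->parent = parent;
    n->sibling = NULL;
    if (parent->child == NULL) {
        parent->child = n;
    } else {
        struct node *np = parent->child;
        while (np->sibling)
            np = np->sibling;
        np->sibling = n;
    }
}
```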

Signed-off-by: Vikram Garhwal 
---
 xen/common/dt-overlay.c | 510 
 1 file changed, 510 insertions(+)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index b89cceab84..09ea46111b 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -33,6 +33,25 @@ static struct dt_device_node *
 return child_node;
 }
 
+/*
+ * Returns next node to the input node. If node has children then return
+ * last descendant's next node.
+*/
+static struct dt_device_node *
+dt_find_next_node(struct dt_device_node *dt, const struct dt_device_node *node)
+{
+struct dt_device_node *np;
+
+dt_for_each_device_node(dt, np)
+if ( np == node )
+break;
+
+if ( np->child )
+np = find_last_descendants_node(np);
+
+return np->allnext;
+}
+
 static int dt_overlay_remove_node(struct dt_device_node *device_node)
 {
 struct dt_device_node *np;
@@ -106,6 +125,76 @@ static int dt_overlay_remove_node(struct dt_device_node *device_node)
 return 0;
 }
 
+static int dt_overlay_add_node(struct dt_device_node *device_node,
+   const char *parent_node_path)
+{
+struct dt_device_node *parent_node;
+struct dt_device_node *next_node;
+
+parent_node = dt_find_node_by_path(parent_node_path);
+
+if ( parent_node == NULL )
+{
+dt_dprintk("Parent node %s not found. Overlay node will not be added\n",
+   parent_node_path);
+return -EINVAL;
+}
+
+/* If parent has no child. */
+if ( parent_node->child == NULL )
+{
+next_node = parent_node->allnext;
+device_node->parent = parent_node;
+parent_node->allnext = device_node;
+parent_node->child = device_node;
+}
+else
+{
+struct dt_device_node *np;
+/* If parent has at least one child node.
+ * Iterate to the last child node of parent.
+ */
+for ( np = parent_node->child; np->sibling != NULL; np = np->sibling );
+
+/* Iterate over all child nodes of np node. */
+if ( np->child )
+{
+struct dt_device_node *np_last_descendant;
+
+np_last_descendant = find_last_descendants_node(np);
+
+next_node = np_last_descendant->allnext;
+np_last_descendant->allnext = device_node;
+}
+else
+{
+next_node = np->allnext;
+np->allnext = device_node;
+}
+
+device_node->parent = parent_node;
+np->sibling = device_node;
+np->sibling->sibling = NULL;
+}
+
+/* Iterate over all child nodes of device_node to add children too. */
+if ( device_node->child )
+{
+struct dt_device_node *device_node_last_descendant;
+
+device_node_last_descendant = find_last_descendants_node(device_node);
+/* Plug next_node at the end of last children of device_node. */
+device_node_last_descendant->allnext = next_node;
+}
+else
+{
+/* Now plug next_node at the end of device_node. */
+device_node->allnext = next_node;
+}
+
+return 0;
+}
+
 /* Basic sanity check for the dtbo tool stack provided to Xen. */
 static int check_overlay_fdt(const void *overlay_fdt, uint32_t overlay_fdt_size)
 {
@@ -145,6 +234,82 @@ static unsigned int overlay_node_count(const void *overlay_fdt)
 return num_overlay_nodes;
 }
 
+/*
+ * overlay_get_nodes_info gets full name with path for all the nodes which
+ * are in one level of __overlay__ tag. This is us

[XEN][PATCH v6 13/19] asm/smp.h: Fix circular dependency for device_tree.h and rwlock.h

2023-05-02 Thread Vikram Garhwal

Dynamic programming ops will modify dt_host, and other functions might be
browsing dt_host at the same time. To avoid race conditions, an rwlock is
added for browsing dt_host. However, adding the rwlock in device_tree.h
causes the following circular dependency:
device_tree.h->rwlock.h->smp.h->asm/smp.h->device_tree.h

To fix this, remove the #include that pulls device_tree.h into asm/smp.h and
forward declare "struct dt_device_node" instead.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Henry Wang 
Reviewed-by: Michal Orzel 
---
 xen/arch/arm/include/asm/smp.h | 3 ++-
 xen/arch/arm/smpboot.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/include/asm/smp.h b/xen/arch/arm/include/asm/smp.h
index a37ca55bff..b12949ba8a 100644
--- a/xen/arch/arm/include/asm/smp.h
+++ b/xen/arch/arm/include/asm/smp.h
@@ -3,13 +3,14 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
 #include 
 #endif
 
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 
+struct dt_device_node;
+
 #define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
 
 #define smp_processor_id() get_processor_id()
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 4a89b3a834..255bbcc967 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.17.1
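The fix relies on C allowing a header to declare and pass pointers to an incomplete struct type, so asm/smp.h only needs `struct dt_device_node;` rather than the full definition; files that dereference the struct, such as smpboot.c, include the defining header themselves. A minimal illustration follows; the single-field struct and `node_id()` are hypothetical and much smaller than the real dt_device_node.

```c
#include <assert.h>

/* Forward declaration: enough for prototypes that take pointers. */
struct dt_device_node;

/* What a lightweight header can declare without the full definition. */
static int node_id(const struct dt_device_node *dn);

/* Full definition, normally provided by the heavier header. */
struct dt_device_node {
    int id;
};

/* Dereferencing (or sizeof) needs the complete type, so this lives
 * where the definition is visible. */
static int node_id(const struct dt_device_node *dn)
{
    return dn->id;
}
```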

[XEN][PATCH v6 11/19] xen/iommu: Introduce iommu_remove_dt_device()

2023-05-02 Thread Vikram Garhwal

Add iommu_remove_dt_device() to remove a master device from the IOMMU. This
will be helpful when removing overlay nodes using dynamic programming at run time.
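iommu_remove_dt_device() refuses devices that are still assigned (-EBUSY), requires the optional remove_device callback (-EOPNOTSUPP), and frees the firmware spec only when the driver reports success. The sketch below shows that ordering with stand-in types and omits dtdevs_lock; `struct sketch_dev`, `struct sketch_iommu_ops` and `sketch_remove_device` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_dev {
    bool assigned;     /* stands in for iommu_dt_device_is_assigned_locked() */
    bool has_fwspec;   /* stands in for the per-device iommu_fwspec */
};

struct sketch_iommu_ops {
    int (*remove_device)(struct sketch_dev *dev);   /* optional callback */
};

/* Same order of checks as iommu_remove_dt_device(), minus the locking. */
static int sketch_remove_device(const struct sketch_iommu_ops *ops,
                                struct sketch_dev *dev)
{
    int rc;

    if (ops == NULL)
        return -EOPNOTSUPP;
    if (dev->assigned)
        return -EBUSY;
    if (ops->remove_device == NULL)
        return -EOPNOTSUPP;

    rc = ops->remove_device(dev);
    if (rc == 0)
        dev->has_fwspec = false;   /* like iommu_fwspec_free(dev) */
    return rc;
}

static int remove_ok(struct sketch_dev *dev) { (void)dev; return 0; }
```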

Signed-off-by: Vikram Garhwal 
Reviewed-by: Michal Orzel 
---
 xen/drivers/passthrough/device_tree.c | 41 +++
 xen/include/xen/iommu.h   |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index f3867ef1a6..46f9080c8f 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -125,6 +125,47 @@ int iommu_release_dt_devices(struct domain *d)
 return 0;
 }
 
+int iommu_remove_dt_device(struct dt_device_node *np)
+{
+const struct iommu_ops *ops = iommu_get_ops();
+struct device *dev = dt_to_dev(np);
+int rc;
+
+if ( !ops )
+return -EOPNOTSUPP;
+
+spin_lock(&dtdevs_lock);
+
+if ( iommu_dt_device_is_assigned_locked(np) )
+{
+rc = -EBUSY;
+goto fail;
+}
+
+/*
+ * The driver which supports generic IOMMU DT bindings must have this
+ * callback implemented.
+ */
+if ( !ops->remove_device )
+{
+rc = -EOPNOTSUPP;
+goto fail;
+}
+
+/*
+ * Remove master device from the IOMMU if latter is present and available.
+ * The driver is responsible for removing is_protected flag.
+ */
+rc = ops->remove_device(0, dev);
+
+if ( !rc )
+iommu_fwspec_free(dev);
+
+fail:
+spin_unlock(&dtdevs_lock);
+return rc;
+}
+
 int iommu_add_dt_device(struct dt_device_node *np)
 {
 const struct iommu_ops *ops = iommu_get_ops();
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 76add226ec..6ba8d73966 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -219,6 +219,8 @@ int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev);
 int iommu_dt_domain_init(struct domain *d);
 int iommu_release_dt_devices(struct domain *d);
 
+int iommu_remove_dt_device(struct dt_device_node *np);
+
 /*
  * Helper to add master device to the IOMMU using generic IOMMU DT bindings.
  *
-- 
2.17.1

[XEN][PATCH v6 14/19] common/device_tree: Add rwlock for dt_host

2023-05-02 Thread Vikram Garhwal

Dynamic programming ops will modify dt_host, and other functions might be
browsing dt_host at the same time. To avoid race conditions, add an rwlock
that protects browsing dt_host during runtime.
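An rwlock lets any number of readers browse dt_host concurrently while an overlay add/remove (the writer) waits for exclusive access; each reader must drop its read lock on every exit path. The sketch below models only those semantics with a single-threaded stand-in instead of Xen's rwlock_t; `struct sketch_rwlock`, the `sk_*` functions and the toy path table are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Single-threaded stand-in for rwlock_t: many readers or one writer. */
struct sketch_rwlock { int readers; bool writer; };

static void sk_read_lock(struct sketch_rwlock *l)   { assert(!l->writer); l->readers++; }
static void sk_read_unlock(struct sketch_rwlock *l) { assert(l->readers > 0); l->readers--; }

static bool sk_write_trylock(struct sketch_rwlock *l)
{
    if (l->writer || l->readers)
        return false;   /* a writer must wait for all readers to leave */
    l->writer = true;
    return true;
}

static void sk_write_unlock(struct sketch_rwlock *l) { assert(l->writer); l->writer = false; }

static struct sketch_rwlock tree_lock;
static const char *tree[] = { "/soc", "/soc/uart", NULL };

/* Reader: browses the "tree" under the read lock, with a single exit
 * so the lock is always released. */
static bool tree_has(const char *path)
{
    bool found = false;

    sk_read_lock(&tree_lock);
    for (int i = 0; tree[i]; i++) {
        if (strcmp(tree[i], path) == 0) {
            found = true;
            break;
        }
    }
    sk_read_unlock(&tree_lock);
    return found;
}
```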

Signed-off-by: Vikram Garhwal 
---
 xen/common/device_tree.c  |  4 
 xen/drivers/passthrough/device_tree.c | 18 ++
 xen/include/xen/device_tree.h |  6 ++
 3 files changed, 28 insertions(+)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 426a809f42..48cb68bcd9 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2109,7 +2109,11 @@ int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes)
 
 dt_dprintk(" <- unflatten_device_tree()\n");
 
+/* Init r/w lock for host device tree. */
+rwlock_init(&dt_host->lock);
+
 return 0;
+
 }
 
 static void dt_alias_add(struct dt_alias_prop *ap,
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 46f9080c8f..e3be8e3f91 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -111,6 +111,8 @@ int iommu_release_dt_devices(struct domain *d)
 if ( !is_iommu_enabled(d) )
 return 0;
 
+read_lock(&dt_host->lock);
+
 list_for_each_entry_safe(dev, _dev, &hd->dt_devices, domain_list)
 {
 rc = iommu_deassign_dt_device(d, dev);
@@ -118,10 +120,14 @@ int iommu_release_dt_devices(struct domain *d)
 {
 dprintk(XENLOG_ERR, "Failed to deassign %s in domain %u\n",
 dt_node_full_name(dev), d->domain_id);
+
+read_unlock(&dt_host->lock);
 return rc;
 }
 }
 
+read_unlock(&dt_host->lock);
+
 return 0;
 }
 
@@ -245,6 +251,8 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 int ret;
 struct dt_device_node *dev;
 
+read_lock(&dt_host->lock);
+
 switch ( domctl->cmd )
 {
 case XEN_DOMCTL_assign_device:
@@ -294,7 +302,10 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 spin_unlock(&dtdevs_lock);
 
 if ( d == dom_io )
+{
+read_unlock(&dt_host->lock);
 return -EINVAL;
+}
 
 ret = iommu_add_dt_device(dev);
 if ( ret < 0 )
@@ -310,6 +321,8 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 printk(XENLOG_G_ERR "XEN_DOMCTL_assign_dt_device: assign \"%s\""
" to dom%u failed (%d)\n",
dt_node_full_name(dev), d->domain_id, ret);
+
+read_unlock(&dt_host->lock);
 break;
 
 case XEN_DOMCTL_deassign_device:
@@ -328,11 +341,15 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 break;
 
 ret = xsm_deassign_dtdevice(XSM_HOOK, d, dt_node_full_name(dev));
+
 if ( ret )
 break;
 
 if ( d == dom_io )
+{
+read_unlock(&dt_host->lock);
 return -EINVAL;
+}
 
 ret = iommu_deassign_dt_device(d, dev);
 
@@ -347,5 +364,6 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 break;
 }
 
+read_unlock(&dt_host->lock);
 return ret;
 }
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index d6366d3dac..e616dd7e9c 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DEVICE_TREE_MAX_DEPTH 16
 
@@ -106,6 +107,11 @@ struct dt_device_node {
 struct list_head domain_list;
 
 struct device dev;
+
+/*
+ * Lock that protects r/w updates to unflattened device tree i.e. dt_host.
+ */
+rwlock_t lock;
 };
 
 #define dt_to_dev(dt_node)  (&(dt_node)->dev)
-- 
2.17.1

[XEN][PATCH v6 12/19] xen/smmu: Add remove_device callback for smmu_iommu ops

2023-05-02 Thread Vikram Garhwal

Add a remove_device callback that removes the device entry from the SMMU
masters using the following steps:
1. Find if SMMU master exists for the device node.
2. Check if device is currently in use.
3. Remove the SMMU master.
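The three steps above map onto a lookup, an in-use check and an unlink. The sketch below uses a singly linked list in place of the driver's rb-tree of masters (an intentional simplification) and returns the same error codes the patch uses: -EINVAL when no registration exists and -EBUSY when the device is still assigned. `struct master` and `remove_master` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct master {
    const void *dev_node;   /* key: the device tree node */
    bool in_use;            /* stands in for "assigned to a domain" */
    struct master *next;
};

/* On success, *out receives the unlinked master (the caller frees it). */
static int remove_master(struct master **head, const void *dev_node,
                         struct master **out)
{
    struct master **link;

    /* 1. Find the master registered for this device node. */
    for (link = head; *link && (*link)->dev_node != dev_node;
         link = &(*link)->next)
        ;
    if (*link == NULL)
        return -EINVAL;

    /* 2. Refuse to remove a device that is still in use. */
    if ((*link)->in_use)
        return -EBUSY;

    /* 3. Unlink the master. */
    *out = *link;
    *link = (*link)->next;
    (*out)->next = NULL;
    return 0;
}
```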

Signed-off-by: Vikram Garhwal 
Reviewed-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
 xen/drivers/passthrough/arm/smmu.c | 58 ++
 1 file changed, 58 insertions(+)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 0a514821b3..39d3a5c345 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -816,6 +816,19 @@ static int insert_smmu_master(struct arm_smmu_device *smmu,
return 0;
 }
 
+static int remove_smmu_master(struct arm_smmu_device *smmu,
+ struct arm_smmu_master *master)
+{
+   if (!smmu->masters.rb_node) {
+   ASSERT_UNREACHABLE();
+   return -ENOENT;
+   }
+
+   rb_erase(&master->node, &smmu->masters);
+
+   return 0;
+}
+
 static int arm_smmu_dt_add_device_legacy(struct arm_smmu_device *smmu,
 struct device *dev,
 struct iommu_fwspec *fwspec)
@@ -853,6 +866,34 @@ static int arm_smmu_dt_add_device_legacy(struct arm_smmu_device *smmu,
return insert_smmu_master(smmu, master);
 }
 
+static int arm_smmu_dt_remove_device_legacy(struct arm_smmu_device *smmu,
+struct device *dev)
+{
+   struct arm_smmu_master *master;
+   struct device_node *dev_node = dev_get_dev_node(dev);
+   int ret;
+
+   master = find_smmu_master(smmu, dev_node);
+   if (master == NULL) {
+   dev_err(dev,
+   "No registrations found for master device %s\n",
+   dev_node->name);
+   return -EINVAL;
+   }
+
+   if (iommu_dt_device_is_assigned_locked(dev_to_dt(dev)))
+   return -EBUSY;
+
+   ret = remove_smmu_master(smmu, master);
+   if (ret)
+   return ret;
+
+   dev_node->is_protected = false;
+
+   kfree(master);
+   return 0;
+}
+
 static int register_smmu_master(struct arm_smmu_device *smmu,
struct device *dev,
struct of_phandle_args *masterspec)
@@ -876,6 +917,22 @@ static int register_smmu_master(struct arm_smmu_device *smmu,
 fwspec);
 }
 
+static int arm_smmu_dt_remove_device_generic(u8 devfn, struct device *dev)
+{
+   struct arm_smmu_device *smmu;
+   struct iommu_fwspec *fwspec;
+
+   fwspec = dev_iommu_fwspec_get(dev);
+   if (fwspec == NULL)
+   return -ENXIO;
+
+   smmu = find_smmu(fwspec->iommu_dev);
+   if (smmu == NULL)
+   return -ENXIO;
+
+   return arm_smmu_dt_remove_device_legacy(smmu, dev);
+}
+
 static int arm_smmu_dt_add_device_generic(u8 devfn, struct device *dev)
 {
struct arm_smmu_device *smmu;
@@ -2858,6 +2915,7 @@ static const struct iommu_ops arm_smmu_iommu_ops = {
 .init = arm_smmu_iommu_domain_init,
 .hwdom_init = arch_iommu_hwdom_init,
 .add_device = arm_smmu_dt_add_device_generic,
+.remove_device = arm_smmu_dt_remove_device_generic,
 .teardown = arm_smmu_iommu_domain_teardown,
 .iotlb_flush = arm_smmu_iotlb_flush,
 .assign_device = arm_smmu_assign_dev,
-- 
2.17.1

[XEN][PATCH v6 07/19] libfdt: overlay: change overlay_get_target()

2023-05-02 Thread Vikram Garhwal

Rename overlay_get_target() to fdt_overlay_target_offset() and drop its static
qualifier.

This is done to get the target path for the overlay nodes, which is useful in
many cases. For example, the Xen hypervisor needs it when applying overlays
because Xen needs to do further processing of the overlay nodes, e.g. mapping
resources (IRQs and IOMMUs) to other VMs, creating SMMU pagetables, etc.
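fdt_overlay_target_offset() resolves a fragment's target in a fixed order: a "target" phandle takes precedence, a phandle of 0 means the fragment has no "target" property and "target-path" is used instead, and (uint32_t)-1 signals a malformed "target" property. The sketch below reproduces only that lookup order over a toy array instead of a real blob; the error constants, `struct sk_node` and `sk_target_offset` are hypothetical and do not correspond to libfdt's FDT_ERR_* values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_ERR_BADPHANDLE (-1)
#define SK_ERR_NOTFOUND   (-2)

/* Toy base tree: node "offsets" are array indices. */
struct sk_node { uint32_t phandle; const char *path; };
static const struct sk_node base[] = {
    { 0,    "/" },
    { 0x10, "/soc" },
    { 0x11, "/soc/uart" },
};
#define NBASE (sizeof(base) / sizeof(base[0]))

static int sk_target_offset(uint32_t phandle, const char *target_path)
{
    if (phandle == (uint32_t)-1)   /* malformed "target" property */
        return SK_ERR_BADPHANDLE;

    for (size_t i = 0; i < NBASE; i++) {
        /* Phandle-based lookup first... */
        if (phandle != 0 && base[i].phandle == phandle)
            return (int)i;
        /* ...otherwise fall back to the "target-path" lookup. */
        if (phandle == 0 && target_path && strcmp(base[i].path, target_path) == 0)
            return (int)i;
    }
    return SK_ERR_NOTFOUND;
}
```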

Signed-off-by: Vikram Garhwal 
Message-Id: <1637204036-382159-2-git-send-email-fnu.vik...@xilinx.com>
Signed-off-by: David Gibson 
Origin: git://git.kernel.org/pub/scm/utils/dtc/dtc.git 45f3d1a095dd

Signed-off-by: Vikram Garhwal 
Reviewed-by: Michal Orzel 
---
 xen/common/libfdt/fdt_overlay.c | 29 +++--
 xen/common/libfdt/version.lds   |  1 +
 xen/include/xen/libfdt/libfdt.h | 18 ++
 3 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/xen/common/libfdt/fdt_overlay.c b/xen/common/libfdt/fdt_overlay.c
index 7b95e2b639..acf0c4c2a6 100644
--- a/xen/common/libfdt/fdt_overlay.c
+++ b/xen/common/libfdt/fdt_overlay.c
@@ -41,37 +41,22 @@ static uint32_t overlay_get_target_phandle(const void *fdto, int fragment)
return fdt32_to_cpu(*val);
 }
 
-/**
- * overlay_get_target - retrieves the offset of a fragment's target
- * @fdt: Base device tree blob
- * @fdto: Device tree overlay blob
- * @fragment: node offset of the fragment in the overlay
- * @pathp: pointer which receives the path of the target (or NULL)
- *
- * overlay_get_target() retrieves the target offset in the base
- * device tree of a fragment, no matter how the actual targeting is
- * done (through a phandle or a path)
- *
- * returns:
- *  the targeted node offset in the base device tree
- *  Negative error code on error
- */
-static int overlay_get_target(const void *fdt, const void *fdto,
- int fragment, char const **pathp)
+int fdt_overlay_target_offset(const void *fdt, const void *fdto,
+ int fragment_offset, char const **pathp)
 {
uint32_t phandle;
const char *path = NULL;
int path_len = 0, ret;
 
/* Try first to do a phandle based lookup */
-   phandle = overlay_get_target_phandle(fdto, fragment);
+   phandle = overlay_get_target_phandle(fdto, fragment_offset);
if (phandle == (uint32_t)-1)
return -FDT_ERR_BADPHANDLE;
 
/* no phandle, try path */
if (!phandle) {
/* And then a path based lookup */
-   path = fdt_getprop(fdto, fragment, "target-path", &path_len);
+   path = fdt_getprop(fdto, fragment_offset, "target-path", &path_len);
if (path)
ret = fdt_path_offset(fdt, path);
else
@@ -638,7 +623,7 @@ static int overlay_merge(void *fdt, void *fdto)
if (overlay < 0)
return overlay;
 
-   target = overlay_get_target(fdt, fdto, fragment, NULL);
+   target = fdt_overlay_target_offset(fdt, fdto, fragment, NULL);
if (target < 0)
return target;
 
@@ -781,7 +766,7 @@ static int overlay_symbol_update(void *fdt, void *fdto)
return -FDT_ERR_BADOVERLAY;
 
/* get the target of the fragment */
-   ret = overlay_get_target(fdt, fdto, fragment, &target_path);
+   ret = fdt_overlay_target_offset(fdt, fdto, fragment, &target_path);
if (ret < 0)
return ret;
target = ret;
@@ -803,7 +788,7 @@ static int overlay_symbol_update(void *fdt, void *fdto)
 
if (!target_path) {
/* again in case setprop_placeholder changed it */
-   ret = overlay_get_target(fdt, fdto, fragment, &target_path);
+   ret = fdt_overlay_target_offset(fdt, fdto, fragment, &target_path);
if (ret < 0)
return ret;
target = ret;
diff --git a/xen/common/libfdt/version.lds b/xen/common/libfdt/version.lds
index 7ab85f1d9d..cbce5d4a8b 100644
--- a/xen/common/libfdt/version.lds
+++ b/xen/common/libfdt/version.lds
@@ -77,6 +77,7 @@ LIBFDT_1.2 {
fdt_appendprop_addrrange;
fdt_setprop_inplace_namelen_partial;
fdt_create_with_flags;
+   fdt_overlay_target_offset;
local:
*;
 };
diff --git a/xen/include/xen/libfdt/libfdt.h b/xen/include/xen/libfdt/libfdt.h
index c71689e2be..fabddbee8c 100644
--- a/xen/include/xen/libfdt/libfdt.h
+++ b/xen/include/xen/libfdt/libfdt.h
@@ -2109,6 +2109,24 @@ int fdt_del_node(void *fdt, int nodeoffset);
  */
 int fdt_overlay_apply(void *fdt, void *fdto);
 
+/**
+ * fdt_overlay_target_offset - retrieves the offset of a fragment's target
+ * @fdt: Base device tree blob
+ * @fdto: Device tree overlay blob
+ * @fragment_offset: node offse

[XEN][PATCH v6 08/19] xen/device-tree: Add device_tree_find_node_by_path() to find nodes in device tree

2023-05-02 Thread Vikram Garhwal

Add device_tree_find_node_by_path() to find the node matching a given path in
any unflattened device tree, not only in dt_host.

Reason behind this function:
Each time overlay nodes are added using a .dtbo, a new fdt (a memcpy of
device_tree_flattened) is created and updated with the overlay nodes. This
updated fdt is then unflattened into dt_host_new. Next, we need to find the
overlay nodes in dt_host_new, find each overlay node's parent in dt_host and
add the nodes as children of their parent in dt_host. Thus we need this
function to search for a node in different unflattened device trees.

Also, make dt_find_node_by_path() static inline.
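The change splits the lookup into a generic walk over any tree's allnext chain and a thin wrapper bound to the global tree. A minimal sketch of that split; `struct sk_dt_node`, `sk_find_by_path`, `sk_host` and `sk_find_in_host` are hypothetical stand-ins for dt_device_node, device_tree_find_node_by_path(), dt_host and dt_find_node_by_path(), and strcmp() is assumed to behave like dt_node_cmp() here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sk_dt_node {
    const char *full_name;
    struct sk_dt_node *allnext;
};

/* Generic search: works on any unflattened tree, e.g. dt_host_new. */
static struct sk_dt_node *sk_find_by_path(struct sk_dt_node *dt, const char *path)
{
    struct sk_dt_node *np;

    for (np = dt; np != NULL; np = np->allnext)
        if (np->full_name && strcmp(np->full_name, path) == 0)
            break;
    return np;   /* NULL if not found */
}

/* The global tree and the wrapper that keeps existing callers unchanged. */
static struct sk_dt_node *sk_host;

static inline struct sk_dt_node *sk_find_in_host(const char *path)
{
    return sk_find_by_path(sk_host, path);
}
```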

Signed-off-by: Vikram Garhwal 
---
 xen/common/device_tree.c  |  5 +++--
 xen/include/xen/device_tree.h | 17 +++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 47ab2f7940..426a809f42 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -358,11 +358,12 @@ struct dt_device_node *dt_find_node_by_type(struct dt_device_node *from,
 return np;
 }
 
-struct dt_device_node *dt_find_node_by_path(const char *path)
+struct dt_device_node *device_tree_find_node_by_path(struct dt_device_node *dt,
+ const char *path)
 {
 struct dt_device_node *np;
 
-dt_for_each_device_node(dt_host, np)
+dt_for_each_device_node(dt, np)
 if ( np->full_name && (dt_node_cmp(np->full_name, path) == 0) )
 break;
 
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index eef0335b79..d6366d3dac 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -534,13 +534,26 @@ struct dt_device_node *dt_find_node_by_type(struct dt_device_node *from,
 struct dt_device_node *dt_find_node_by_alias(const char *alias);
 
 /**
- * dt_find_node_by_path - Find a node matching a full DT path
+ * device_tree_find_node_by_path - Generic function to find a node matching the
+ * full DT path for any given unflattened device tree
+ * @dt: The device tree to search
  * @path: The full path to match
  *
  * Returns a node pointer.
  */
-struct dt_device_node *dt_find_node_by_path(const char *path);
+struct dt_device_node *device_tree_find_node_by_path(struct dt_device_node *dt,
+ const char *path);
 
+/**
+ * dt_find_node_by_path - Find a node matching a full DT path in dt_host
+ * @path: The full path to match
+ *
+ * Returns a node pointer.
+ */
+static inline struct dt_device_node *dt_find_node_by_path(const char *path)
+{
+return device_tree_find_node_by_path(dt_host, path);
+}
 
 /**
  * dt_find_node_by_gpath - Same as dt_find_node_by_path but retrieve the
-- 
2.17.1

[XEN][PATCH v6 09/19] xen/iommu: Move spin_lock from iommu_dt_device_is_assigned to caller

2023-05-02 Thread Vikram Garhwal

Rename iommu_dt_device_is_assigned() to iommu_dt_device_is_assigned_locked().
Drop its static qualifier so that SMMU drivers can also use it to check whether
a device is in use before removing it.

The spin_lock is moved to the caller to prevent concurrent access to
iommu_dt_device_is_assigned_locked() during add/remove/assign/deassign.
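The `_locked` suffix marks a helper that expects its caller to already hold the lock, so the caller can combine the check with a follow-up action inside one critical section. A minimal sketch of that convention, assuming a boolean flag as a stand-in for holding dtdevs_lock and a domain counter in place of the device's domain_list; `sk_is_assigned_locked` and `sk_try_assign` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool devs_lock_held;   /* stand-in for holding dtdevs_lock */

struct sk_dev { bool is_protected; int nr_domains; };

/* Caller must hold the lock; the helper only asserts it. */
static bool sk_is_assigned_locked(const struct sk_dev *dev)
{
    assert(devs_lock_held);
    if (!dev->is_protected)
        return false;
    return dev->nr_domains > 0;   /* like !list_empty(&dev->domain_list) */
}

/* Check and update under a single lock hold, so no other path can
 * assign the device between the check and the update. */
static bool sk_try_assign(struct sk_dev *dev)
{
    bool ok;

    devs_lock_held = true;    /* spin_lock(&dtdevs_lock) */
    ok = dev->is_protected && !sk_is_assigned_locked(dev);
    if (ok)
        dev->nr_domains++;
    devs_lock_held = false;   /* spin_unlock(&dtdevs_lock) */
    return ok;
}
```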

Signed-off-by: Vikram Garhwal 
Reviewed-by: Luca Fancellu 
---
 xen/drivers/passthrough/device_tree.c | 19 +++
 xen/include/xen/iommu.h   |  1 +
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 1c32d7b50c..c386fda3e4 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -83,16 +83,14 @@ fail:
 return rc;
 }
 
-static bool_t iommu_dt_device_is_assigned(const struct dt_device_node *dev)
+bool_t iommu_dt_device_is_assigned_locked(const struct dt_device_node *dev)
 {
 bool_t assigned = 0;
 
 if ( !dt_device_is_protected(dev) )
 return 0;
 
-spin_lock(&dtdevs_lock);
 assigned = !list_empty(&dev->domain_list);
-spin_unlock(&dtdevs_lock);
 
 return assigned;
 }
@@ -213,27 +211,40 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
 if ( (d && d->is_dying) || domctl->u.assign_device.flags )
 break;
 
+spin_lock(&dtdevs_lock);
+
 ret = dt_find_node_by_gpath(domctl->u.assign_device.u.dt.path,
 domctl->u.assign_device.u.dt.size,
 &dev);
 if ( ret )
+{
+spin_unlock(&dtdevs_lock);
 break;
+}
 
 ret = xsm_assign_dtdevice(XSM_HOOK, d, dt_node_full_name(dev));
 if ( ret )
+{
+spin_unlock(&dtdevs_lock);
 break;
+}
 
 if ( domctl->cmd == XEN_DOMCTL_test_assign_device )
 {
-if ( iommu_dt_device_is_assigned(dev) )
+
+if ( iommu_dt_device_is_assigned_locked(dev) )
 {
 printk(XENLOG_G_ERR "%s already assigned.\n",
dt_node_full_name(dev));
 ret = -EINVAL;
 }
+
+spin_unlock(&dtdevs_lock);
 break;
 }
 
+spin_unlock(&dtdevs_lock);
+
 if ( d == dom_io )
 return -EINVAL;
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 405db59971..76add226ec 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -214,6 +214,7 @@ struct msi_msg;
 #include 
 
 int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev);
+bool_t iommu_dt_device_is_assigned_locked(const struct dt_device_node *dev);
 int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev);
 int iommu_dt_domain_init(struct domain *d);
 int iommu_release_dt_devices(struct domain *d);
-- 
2.17.1

[XEN][PATCH v6 06/19] libfdt: Keep fdt functions after init for CONFIG_OVERLAY_DTB.

2023-05-02 Thread Vikram Garhwal

This keeps the fdt library functions, which are required for adding device
tree overlay nodes during dynamic programming, available after init.

Signed-off-by: Vikram Garhwal 
Acked-by: Julien Grall 
---
 xen/common/libfdt/Makefile | 4 
 1 file changed, 4 insertions(+)

diff --git a/xen/common/libfdt/Makefile b/xen/common/libfdt/Makefile
index 75aaefa2e3..d50487aa6e 100644
--- a/xen/common/libfdt/Makefile
+++ b/xen/common/libfdt/Makefile
@@ -1,7 +1,11 @@
 include $(src)/Makefile.libfdt
 
 SECTIONS := text data $(SPECIAL_DATA_SECTIONS)
+
+# For CONFIG_OVERLAY_DTB, libfdt functionalities will be needed during runtime.
+ifneq ($(CONFIG_OVERLAY_DTB),y)
 OBJCOPYFLAGS := $(foreach s,$(SECTIONS),--rename-section .$(s)=.init.$(s))
+endif
 
 obj-y += libfdt.o
 nocov-y += libfdt.o
-- 
2.17.1

[XEN][PATCH v6 04/19] common/device_tree.c: unflatten_device_tree() propagate errors

2023-05-02 Thread Vikram Garhwal

This will be useful for dynamic node programming, where new dt nodes are
unflattened at runtime. Errors caused by invalid device tree nodes should be
propagated back to the caller.

Signed-off-by: Vikram Garhwal 
---
 xen/common/device_tree.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 5daf5197bd..47ab2f7940 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2071,6 +2071,9 @@ int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes)
 /* First pass, scan for size */
 start = ((unsigned long)fdt) + fdt_off_dt_struct(fdt);
 size = unflatten_dt_node(fdt, 0, &start, NULL, NULL, 0);
+if ( !size )
+return -EINVAL;
+
 size = (size | 3) + 1;
 
 dt_dprintk("  size is %#lx allocating...\n", size);
@@ -2088,11 +2091,19 @@ int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes)
 start = ((unsigned long)fdt) + fdt_off_dt_struct(fdt);
 unflatten_dt_node(fdt, mem, &start, NULL, &allnextp, 0);
 if ( be32_to_cpup((__be32 *)start) != FDT_END )
-printk(XENLOG_WARNING "Weird tag at end of tree: %08x\n",
+{
+printk(XENLOG_ERR "Weird tag at end of tree: %08x\n",
   *((u32 *)start));
+return -EINVAL;
+}
+
 if ( be32_to_cpu(((__be32 *)mem)[size / 4]) != 0xdeadbeef )
-printk(XENLOG_WARNING "End of tree marker overwritten: %08x\n",
+{
+printk(XENLOG_ERR "End of tree marker overwritten: %08x\n",
   be32_to_cpu(((__be32 *)mem)[size / 4]));
+return -EINVAL;
+}
+
 *allnextp = NULL;
 
 dt_dprintk(" <- unflatten_device_tree()\n");
-- 
2.17.1

[XEN][PATCH v6 03/19] common/device_tree: change __unflatten_device_tree() type

2023-05-02 Thread Vikram Garhwal

The following changes are made to __unflatten_device_tree():
1. Rename __unflatten_device_tree() to unflatten_device_tree().
2. Remove the __init and static qualifiers.

Signed-off-by: Vikram Garhwal 
---
 xen/common/device_tree.c  | 9 -
 xen/include/xen/device_tree.h | 5 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index fc38a0b3dd..5daf5197bd 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2047,7 +2047,7 @@ static unsigned long unflatten_dt_node(const void *fdt,
 }
 
 /**
- * __unflatten_device_tree - create tree of device_nodes from flat blob
+ * unflatten_device_tree - create tree of device_nodes from flat blob
  *
  * unflattens a device-tree, creating the
  * tree of struct device_node. It also fills the "name" and "type"
@@ -2056,8 +2056,7 @@ static unsigned long unflatten_dt_node(const void *fdt,
  * @fdt: The fdt to expand
  * @mynodes: The device_node tree created by the call
  */
-static int __init __unflatten_device_tree(const void *fdt,
-  struct dt_device_node **mynodes)
+int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes)
 {
 unsigned long start, mem, size;
 struct dt_device_node **allnextp = mynodes;
@@ -2183,9 +2182,9 @@ dt_find_interrupt_controller(const struct dt_device_match *matches)
 
 void __init dt_unflatten_host_device_tree(void)
 {
-int error = __unflatten_device_tree(device_tree_flattened, &dt_host);
+int error = unflatten_device_tree(device_tree_flattened, &dt_host);
 if ( error )
-panic("__unflatten_device_tree failed with error %d\n", error);
+panic("unflatten_device_tree failed with error %d\n", error);
 
 dt_alias_scan();
 }
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 19a74909ce..eef0335b79 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -178,6 +178,11 @@ int device_tree_for_each_node(const void *fdt, int node,
  */
 void dt_unflatten_host_device_tree(void);
 
+/**
+ * unflatten any device tree.
+ */
+int unflatten_device_tree(const void *fdt, struct dt_device_node **mynodes);
+
 /**
  * IRQ translation callback
  * TODO: For the moment we assume that we only have ONE
-- 
2.17.1

[XEN][PATCH v6 05/19] xen/arm: Add CONFIG_OVERLAY_DTB

2023-05-02 Thread Vikram Garhwal

Introduce a config option that lets the user enable support for adding and
removing device tree nodes using a device tree binary overlay (dtbo).

Signed-off-by: Vikram Garhwal 
---
 SUPPORT.md   | 6 ++
 xen/arch/arm/Kconfig | 5 +
 2 files changed, 11 insertions(+)

diff --git a/SUPPORT.md b/SUPPORT.md
index aa1940e55f..e40ec4fba2 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -822,6 +822,12 @@ No support for QEMU backends in a 16K or 64K domain.
 
 Status: Supported
 
+### Device Tree Overlays
+
+Add/Remove device tree nodes using a device tree overlay binary(.dtbo).
+
+Status, ARM: Experimental
+
 ### ARM: Guest ACPI support
 
 Status: Supported
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 239d3aed3c..1fe3d698a5 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -53,6 +53,11 @@ config HAS_ITS
 bool "GICv3 ITS MSI controller support (UNSUPPORTED)" if UNSUPPORTED
 depends on GICV3 && !NEW_VGIC && !ARM_32
 
+config OVERLAY_DTB
+   bool "DTB overlay support (UNSUPPORTED)" if UNSUPPORTED
+   help
+ Dynamic addition/removal of Xen device tree nodes using a dtbo.
+
 config HVM
 def_bool y
 
-- 
2.17.1

[XEN][PATCH v6 02/19] common/device_tree: handle memory allocation failure in __unflatten_device_tree()

2023-05-02 Thread Vikram Garhwal

Change __unflatten_device_tree() return type to integer so it can propagate
memory allocation failure. Add panic() in dt_unflatten_host_device_tree() for
memory allocation failure during boot.

Signed-off-by: Vikram Garhwal 
---
 xen/common/device_tree.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 5f7ae45304..fc38a0b3dd 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2056,8 +2056,8 @@ static unsigned long unflatten_dt_node(const void *fdt,
  * @fdt: The fdt to expand
  * @mynodes: The device_node tree created by the call
  */
-static void __init __unflatten_device_tree(const void *fdt,
-   struct dt_device_node **mynodes)
+static int __init __unflatten_device_tree(const void *fdt,
+  struct dt_device_node **mynodes)
 {
 unsigned long start, mem, size;
 struct dt_device_node **allnextp = mynodes;
@@ -2078,6 +2078,8 @@ static void __init __unflatten_device_tree(const void *fdt,
 
 /* Allocate memory for the expanded device tree */
 mem = (unsigned long)_xmalloc (size + 4, __alignof__(struct dt_device_node));
+if ( !mem )
+return -ENOMEM;
 
 ((__be32 *)mem)[size / 4] = cpu_to_be32(0xdeadbeef);
 
@@ -2095,6 +2097,8 @@ static void __init __unflatten_device_tree(const void *fdt,
 *allnextp = NULL;
 
 dt_dprintk(" <- unflatten_device_tree()\n");
+
+return 0;
 }
 
 static void dt_alias_add(struct dt_alias_prop *ap,
@@ -2179,7 +2183,10 @@ dt_find_interrupt_controller(const struct dt_device_match *matches)
 
 void __init dt_unflatten_host_device_tree(void)
 {
-__unflatten_device_tree(device_tree_flattened, &dt_host);
+int error = __unflatten_device_tree(device_tree_flattened, &dt_host);
+if ( error )
+panic("__unflatten_device_tree failed with error %d\n", error);
+
 dt_alias_scan();
 }
 
-- 
2.17.1

[XEN][PATCH v6 01/19] xen/arm/device: Remove __init from function type

2023-05-02 Thread Vikram Garhwal

Remove __init from the following functions so they can be used at runtime:
1. map_irq_to_domain()
2. handle_device_interrupts()
3. map_range_to_domain()
4. unflatten_dt_node()

Move the map_irq_to_domain() prototype from domain_build.h to setup.h.

To avoid breaking the build, the following changes are also made:
1. Move map_irq_to_domain(), handle_device_interrupts() and
map_range_to_domain() to device.c. Once __init is removed, these functions
are no longer specific to domain building, so they are moved out of
domain_build.c into device.c.
2. Remove the static qualifier from handle_device_interrupts().

Overall, these changes support the dynamic programming of nodes, where an
overlay node is added to the fdt and the unflattened node is added to
dt_host. IRQ and MMIO mappings are then done for the added node.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Michal Orzel 
---
 xen/arch/arm/device.c   | 144 
 xen/arch/arm/domain_build.c | 142 ---
 xen/arch/arm/include/asm/domain_build.h |   2 -
 xen/arch/arm/include/asm/setup.h|   6 +
 xen/common/device_tree.c|  12 +-
 5 files changed, 156 insertions(+), 150 deletions(-)

diff --git a/xen/arch/arm/device.c b/xen/arch/arm/device.c
index ca8539dee5..84197981a0 100644
--- a/xen/arch/arm/device.c
+++ b/xen/arch/arm/device.c
@@ -9,8 +9,10 @@
  */
 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 
 extern const struct device_desc _sdevice[], _edevice[];
@@ -75,6 +77,148 @@ enum device_class device_get_class(const struct dt_device_node *dev)
 return DEVICE_UNKNOWN;
 }
 
+int map_irq_to_domain(struct domain *d, unsigned int irq,
+  bool need_mapping, const char *devname)
+{
+int res;
+
+res = irq_permit_access(d, irq);
+if ( res )
+{
+printk(XENLOG_ERR "Unable to permit to dom%u access to IRQ %u\n",
+   d->domain_id, irq);
+return res;
+}
+
+if ( need_mapping )
+{
+/*
+ * Checking the return of vgic_reserve_virq is not
+ * necessary. It should not fail except when we try to map
+ * the IRQ twice. This can legitimately happen if the IRQ is shared
+ */
+vgic_reserve_virq(d, irq);
+
+res = route_irq_to_guest(d, irq, irq, devname);
+if ( res < 0 )
+{
+printk(XENLOG_ERR "Unable to map IRQ%"PRId32" to dom%d\n",
+   irq, d->domain_id);
+return res;
+}
+}
+
+dt_dprintk("  - IRQ: %u\n", irq);
+return 0;
+}
+
+int map_range_to_domain(const struct dt_device_node *dev,
+u64 addr, u64 len, void *data)
+{
+struct map_range_data *mr_data = data;
+struct domain *d = mr_data->d;
+int res;
+
+/*
+ * reserved-memory regions are RAM carved out for a special purpose.
+ * They are not MMIO and therefore a domain should not be able to
+ * manage them via the IOMEM interface.
+ */
+if ( strncasecmp(dt_node_full_name(dev), "/reserved-memory/",
+ strlen("/reserved-memory/")) != 0 )
+{
+res = iomem_permit_access(d, paddr_to_pfn(addr),
+paddr_to_pfn(PAGE_ALIGN(addr + len - 1)));
+if ( res )
+{
+printk(XENLOG_ERR "Unable to permit to dom%d access to"
+" 0x%"PRIx64" - 0x%"PRIx64"\n",
+d->domain_id,
+addr & PAGE_MASK, PAGE_ALIGN(addr + len) - 1);
+return res;
+}
+}
+
+if ( !mr_data->skip_mapping )
+{
+res = map_regions_p2mt(d,
+   gaddr_to_gfn(addr),
+   PFN_UP(len),
+   maddr_to_mfn(addr),
+   mr_data->p2mt);
+
+if ( res < 0 )
+{
+printk(XENLOG_ERR "Unable to map 0x%"PRIx64
+   " - 0x%"PRIx64" in domain %d\n",
+   addr & PAGE_MASK, PAGE_ALIGN(addr + len) - 1,
+   d->domain_id);
+return res;
+}
+}
+
+dt_dprintk("  - MMIO: %010"PRIx64" - %010"PRIx64" P2MType=%x\n",
+   addr, addr + len, mr_data->p2mt);
+
+return 0;
+}
+
+/*
+ * handle_device_interrupts retrieves the interrupts configuration from
+ * a device tree node and maps those interrupts to the target domain.
+ *
+ * Returns:
+ *   < 0 error
+ *   0   success
+ */
+int handle_device_interrupts(struct domain *d,
+ struct dt_device_node *dev,
+ bool need_mapping)
+{
+unsigned int i, nirq;
+int res;
+struct dt_raw_irq rirq;
+
+nirq = dt_number_of_irq(dev);
+
+/* Give permission and map IRQs */
+for ( i = 0; i < nirq; i++ )
+{
+res = dt_device_get_raw_irq(dev, i, &rirq);
+if ( res )
+{
+printk(XENLOG_ERR "Unable to retrieve irq %u for

[XEN][PATCH v6 00/19] dynamic node programming using overlay dtbo

2023-05-02 Thread Vikram Garhwal

Hi,
This patch series introduces dynamic programming, i.e. adding/removing
devices at runtime. Using "xl dt_overlay", a device can be added/removed
with a dtbo.

For adding a node using dynamic programming:
1. The flattened device tree overlay node is added to an fdt.
2. The updated fdt is unflattened into a new dt_host_new.
3. The newly added node's information is extracted from dt_host_new.
4. The added node is attached under the correct parent in the original dt_host.
5. Interrupts and iomem regions are mapped/permitted as required.

For removing a node:
1. Find the node with the given path.
2. Check whether the node is used by any domU. The node is removed only when
it is not used by any domain.
3. Remove IRQ permissions and MMIO access.
4. Find the node in dt_host and delete the device node entry from dt_host.
5. Free the overlay_tracker entry, which also frees dt_host_new (created
in the add-node step).

The main purpose of this series is to address the first part of dynamic
programming, i.e. making Xen aware of new device tree nodes, which means
updating dt_host with the overlay node information. Here we add/remove nodes
from dt_host and check/set IOMMU and IRQ permissions, but never map them to
any domain. Right now, mapping/unmapping happens only when a new domU is
created/destroyed using "xl create".

To map IRQs and IOMMU during runtime, there will be another small series after
this one, which will do the actual IOMMU and IRQ mapping to a running domain
and call unmap_mmio_regions() to remove the mapping.

Change Log:
 v5 -> v6:
Add separate patch for memory allocation failure in __unflatten_device_tree().
Move __unflatten_device_tree() function type changes to single patch.
Add error propagation for failures in unflatten_dt_node.
Change CONFIG_OVERLAY_DTB status to "ARM: Tech Preview".
xen/smmu: Add remove_device callback for smmu_iommu ops:
Added check to see if device is currently used.
common/device_tree: Add rwlock for dt_host:
Addressed feedback from Henry to rearrange code.
xen/arm: Implement device tree node removal functionalities:
Changed file name to dash format.
Addressed Michal's comments.
Rectified formatting-related errors pointed out by Michal.

 v4 -> v5:
Split patch 01/16 to two patches. One with function type changes and another
with changes inside unflatten_device_tree().
Change dt_overlay xl command to dt-overlay.
Protect overlay functionality with CONFIG(arm).
Fix rwlock issues.
Move include "device_tree.h" to c file where arch_cpu_init() is called and
forward declare dt_device_node. This was done to avoid circular deps b/w
device_tree.h and rwlock.h
Address Michal's comment on coding style.

 v3 -> v4:
Add support for adding node's children.
Add rwlock to dt_host functions.
Corrected fdt size issue when applying overlay into it.
Add memory allocation fail handling for unflatten_device_tree().
Changed xl overlay to xl dt_overlay.
Correct commit messages.
Addressed code issue from v3 review.

 v2 -> v3:
Moved overlay functionalities to dt_overlay.c file.
Renamed XEN_SYSCTL_overlay to XEN_SYSCTL_dt_overlay.
Add dt_* prefix to overlay_add/remove_nodes.
Added dtdevs_lock to protect iommu_add_dt_device().
For iommu, moved spin_lock to caller.
Address code issue from v2 review.

 v1 -> v2:
Add support for multiple node addition/removal using dtbo.
Replaced fpga-add and fpga-remove with one hypercall overlay_op.
Moved common domain_build.c function to device.c
Add OVERLAY_DTB configuration.
Renamed overlay_get_target() to fdt_overlay_get_target().
Split remove_device patch into two patches.
Moved overlay_add/remove code to sysctl and changed it from domctl to sysctl.
Added all overlay code under CONFIG_OVERLAY_DTB
Renamed all tool domains fpga function to overlay
Addressed code issues from v1 review.

Regards,
Vikram

Vikram Garhwal (19):
  xen/arm/device: Remove __init from function type
  common/device_tree: handle memory allocation failure in
__unflatten_device_tree()
  common/device_tree: change __unflatten_device_tree() type
  common/device_tree.c: unflatten_device_tree() propagate errors
  xen/arm: Add CONFIG_OVERLAY_DTB
  libfdt: Keep fdt functions after init for CONFIG_OVERLAY_DTB.
  libfdt: overlay: change overlay_get_target()
  xen/device-tree: Add device_tree_find_node_by_path() to find nodes in
device tree
  xen/iommu: Move spin_lock from iommu_dt_device_is_assigned to caller
  xen/iommu: protect iommu_add_dt_device() with dtdevs_lock
  xen/iommu: Introduce iommu_remove_dt_device()
  xen/smmu: Add remove_device callback for smmu_iommu ops
  asm/smp.h: Fix circular dependency for device_tree.h and rwlock.h
  common/device_tree: Add rwlock for dt_host
  xen/arm: Implement device tree node removal functionalities
  xen/arm: Implement de

Re: [PATCH v3 2/2] acpi: Add TPM2 interface definition.

2023-05-02 Thread Jennifer Herbert


On 02/05/2023 14:41, Jan Beulich wrote:

On 25.04.2023 19:47, Jennifer Herbert wrote:

--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -121,6 +121,36 @@ struct acpi_20_tcpa {
  };
  #define ACPI_2_0_TCPA_LAML_SIZE (64*1024)
  
+/*

+ * TPM2
+ */

Nit: While I'm willing to accept the comment style violation here as
(apparently) intentional, ...


Well, I was trying to keep the file consistent. As far as I can tell,
this styling is used throughout the file, unless I'm misunderstanding
your 'Nit'. (Do you object to a multi-line comment used for a single line?)

But I'm code-style blind, so just say how you want it.



+struct acpi_20_tpm2 {
+struct acpi_header header;
+uint16_t platform_class;
+uint16_t reserved;
+uint64_t control_area_address;
+uint32_t start_method;
+uint8_t start_method_params[12];
+uint32_t log_area_minimum_length;
+uint64_t log_area_start_address;
+};
+#define TPM2_ACPI_CLASS_CLIENT  0
+#define TPM2_START_METHOD_CRB   7
+
+/* TPM register I/O Mapped region, location of which defined in the
+ * TCG PC Client Platform TPM Profile Specification for TPM 2.0.
+ * See table 9 - Only Locality 0 is used here. This is emulated by QEMU.
+ * Definition of Register space is found in table 12.
+ */

... this comment wants adjusting to hypervisor style (/* on its own line),
as that looks to be the aimed-at style in this file.


Will do.



@@ -352,6 +353,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
  struct acpi_20_tcpa *tcpa;
  unsigned char *ssdt;
  void *lasa;
+struct acpi_20_tpm2 *tpm2;

Could I talk you into moving this up by two lines, such that it'll be
adjacent to "tcpa"?



No problem.



@@ -450,6 +452,43 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
   tcpa->header.length);
  }
  break;
+
+case 2:
+/* Check VID stored in bits 37:32 (3rd 16 bit word) of CRB
+ * identifier register.  See table 16 of TCG PC client platform
+ * TPM profile specification for TPM 2.0.
+ */

Nit: This comment again wants a style adjustment.


ok



--- /dev/null
+++ b/tools/libacpi/ssdt_tpm2.asl
@@ -0,0 +1,36 @@
+/*
+ * ssdt_tpm2.asl
+ *
+ * Copyright (c) 2018-2022, Citrix Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */

While the full conversion to SPDX was done in the hypervisor only so far,
I think new tool stack source files would better use the much shorter
SPDX equivalent, too.


OK, this is where I get a bit confused. I believe I copied the licence
from ssdt_tpm.asl, for consistency.

So I think I need to use 'LGPL-2.1-only', but then it says it's using
exceptions on linking as described in LICENSE, but um, which LICENSE
file? So I'm not sure what exception I should be adding. Do you know?




Then on top of Jason's R-b,
Acked-by: Jan Beulich 

Jan



Thanks,

-jenny

Re: [PATCH v3 1/2] acpi: Make TPM version configurable.

2023-05-02 Thread Jennifer Herbert




On 02/05/2023 12:54, Jan Beulich wrote:

On 25.04.2023 19:47, Jennifer Herbert wrote:

This patch makes the TPM version for which the ACPI library probes
configurable. If acpi_config.tpm_version is set to 1, it indicates that
1.2 (TCPA) should be probed. I have also added an option to hvmloader to
allow setting this new config, which can be triggered by setting the
platform/tpm_version xenstore key.

Signed-off-by: Jennifer Herbert 
---
  docs/misc/xenstore-paths.pandoc |  9 +
  tools/firmware/hvmloader/util.c | 19 ++---
  tools/libacpi/build.c   | 69 +++--
  tools/libacpi/libacpi.h |  3 +-
  4 files changed, 64 insertions(+), 36 deletions(-)

Please can you get used to providing a brief rev log somewhere here?


Yes, ok.


--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -994,13 +994,22 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
  if ( !strncmp(xenstore_read("platform/acpi_laptop_slate", "0"), "1", 1)  )
  config->table_flags |= ACPI_HAS_SSDT_LAPTOP_SLATE;
  
-config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC |

-ACPI_HAS_WAET | ACPI_HAS_PMTIMER |
-ACPI_HAS_BUTTONS | ACPI_HAS_VGA |
-ACPI_HAS_8042 | ACPI_HAS_CMOS_RTC);
+config->table_flags |= (ACPI_HAS_IOAPIC | ACPI_HAS_WAET |
+ACPI_HAS_PMTIMER | ACPI_HAS_BUTTONS |
+ACPI_HAS_VGA | ACPI_HAS_8042 |
+ACPI_HAS_CMOS_RTC);
  config->acpi_revision = 4;
  
-config->tis_hdr = (uint16_t *)ACPI_TIS_HDR_ADDRESS;

+s = xenstore_read("platform/tpm_version", "1");
+config->tpm_version = strtoll(s, NULL, 0);

Due to field width, someone specifying 257 will also get a 1.2 TPM,
if I'm not mistaken.


Seems likely. And a few other wacky values would give you 1.2 as well,
I'd think. There could also be trailing junk on the version number.

I was a bit fazed by the lack of any real error cases in
hvmloader_acpi_build_tables. It seemed the approach was that if you put in
junk, you'll get something, but possibly not what you're expecting.

Do I take it you'd prefer it to only accept a strict '1' for 1.2, with any
other value resulting in no TPM being probed? Or is it only the
overflow cases you're concerned about?




+switch( config->tpm_version )

Nit: Style (missing blank).

yup

--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -409,38 +409,47 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
  memcpy(ssdt, ssdt_laptop_slate, sizeof(ssdt_laptop_slate));
  table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
  }
-
-/* TPM TCPA and SSDT. */
-if ( (config->table_flags & ACPI_HAS_TCPA) &&
- (config->tis_hdr[0] != 0 && config->tis_hdr[0] != 0xffff) &&
- (config->tis_hdr[1] != 0 && config->tis_hdr[1] != 0xffff) )
+/* TPM and its SSDT. */
+if ( config->table_flags & ACPI_HAS_TPM )
  {
-ssdt = ctxt->mem_ops.alloc(ctxt, sizeof(ssdt_tpm), 16);
-if (!ssdt) return -1;
-memcpy(ssdt, ssdt_tpm, sizeof(ssdt_tpm));
-table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
-
-tcpa = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_tcpa), 16);
-if (!tcpa) return -1;
-memset(tcpa, 0, sizeof(*tcpa));
-table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, tcpa);
-
-tcpa->header.signature = ACPI_2_0_TCPA_SIGNATURE;
-tcpa->header.length= sizeof(*tcpa);
-tcpa->header.revision  = ACPI_2_0_TCPA_REVISION;
-fixed_strcpy(tcpa->header.oem_id, ACPI_OEM_ID);
-fixed_strcpy(tcpa->header.oem_table_id, ACPI_OEM_TABLE_ID);
-tcpa->header.oem_revision = ACPI_OEM_REVISION;
-tcpa->header.creator_id   = ACPI_CREATOR_ID;
-tcpa->header.creator_revision = ACPI_CREATOR_REVISION;
-if ( (lasa = ctxt->mem_ops.alloc(ctxt, ACPI_2_0_TCPA_LAML_SIZE, 16)) != NULL )
+switch ( config->tpm_version )
  {
-tcpa->lasa = ctxt->mem_ops.v2p(ctxt, lasa);
-tcpa->laml = ACPI_2_0_TCPA_LAML_SIZE;
-memset(lasa, 0, tcpa->laml);
-set_checksum(tcpa,
- offsetof(struct acpi_header, checksum),
- tcpa->header.length);
+case 0: /* Assume legacy code wanted tpm 1.2 */

Along the lines of what Jason said: Unless this is known to be needed for
anything, I'd prefer if it was omitted.


I'm not aware of anything, but your comment 2 lines down from version 2
made me think you knew of some. So if you're happy with me removing this
line, I am!




Jan

[PATCH v1] tools: drop bogus and obsolete ptyfuncs.m4

2023-05-02 Thread Olaf Hering

According to openpty(3) it is required to include  to get the
prototypes for openpty() and login_tty(). But this is not what the
function AX_CHECK_PTYFUNCS actually does. It makes no attempt to include
the required header.

The two source files which call openpty() and login_tty() already contain
the conditionals to include the required header.

Remove the bogus m4 file to fix the build with clang, which complains about
calls to undeclared functions.

Signed-off-by: Olaf Hering 
---
 m4/ptyfuncs.m4 | 35 ---
 tools/configure.ac |  1 -
 2 files changed, 36 deletions(-)
 delete mode 100644 m4/ptyfuncs.m4

diff --git a/m4/ptyfuncs.m4 b/m4/ptyfuncs.m4
deleted file mode 100644
index 3e37b5a23c..0000000000
--- a/m4/ptyfuncs.m4
+++ /dev/null
@@ -1,35 +0,0 @@
-AC_DEFUN([AX_CHECK_PTYFUNCS], [
-dnl This is a workaround for a bug in Debian package
-dnl libbsd-dev-0.3.0-1. Once we no longer support that
-dnl package we can remove the addition of -Werror to
-dnl CPPFLAGS.
-AX_SAVEVAR_SAVE(CPPFLAGS)
-CPPFLAGS="$CPPFLAGS -Werror"
-AC_CHECK_HEADER([libutil.h],[
-  AC_DEFINE([INCLUDE_LIBUTIL_H],[],[libutil header file name])
-])
-AX_SAVEVAR_RESTORE(CPPFLAGS)
-AC_CACHE_CHECK([for openpty et al], [ax_cv_ptyfuncs_libs], [
-for ax_cv_ptyfuncs_libs in -lutil "" NOT_FOUND; do
-if test "x$ax_cv_ptyfuncs_libs" = "xNOT_FOUND"; then
-AC_MSG_FAILURE([Unable to find library for openpty and login_tty])
-fi
-AX_SAVEVAR_SAVE(LIBS)
-LIBS="$LIBS $ax_cv_ptyfuncs_libs"
-AC_LINK_IFELSE([AC_LANG_SOURCE([
-#ifdef INCLUDE_LIBUTIL_H
-#include INCLUDE_LIBUTIL_H
-#endif
-int main(void) {
-  openpty(0,0,0,0,0);
-  login_tty(0);
-}
-])],[
-break
-],[])
-AX_SAVEVAR_RESTORE(LIBS)
-done
-])
-PTYFUNCS_LIBS="$ax_cv_ptyfuncs_libs"
-AC_SUBST(PTYFUNCS_LIBS)
-])
diff --git a/tools/configure.ac b/tools/configure.ac
index 9bcf42f233..c94257f751 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -70,7 +70,6 @@ m4_include([../m4/uuid.m4])
 m4_include([../m4/pkg.m4])
 m4_include([../m4/curses.m4])
 m4_include([../m4/pthread.m4])
-m4_include([../m4/ptyfuncs.m4])
 m4_include([../m4/extfs.m4])
 m4_include([../m4/fetcher.m4])
 m4_include([../m4/ax_compare_version.m4])

Re: [PATCH v2] ns16550: enable memory decoding on MMIO-based PCI console card

2023-05-02 Thread Marek Marczykowski-Górecki

On Tue, May 02, 2023 at 12:53:15PM +0200, Jan Beulich wrote:
> On 25.04.2023 16:39, Marek Marczykowski-Górecki wrote:
> > pci_serial_early_init() enables PCI_COMMAND_IO for IO-based UART
> > devices, add setting PCI_COMMAND_MEMORY for MMIO-based UART devices too.
> 
> This sentence is odd, as by its grammar it looks to describe the current
> situation only. The respective sentence in v1 did not have this issue.
> 
> > --- a/xen/drivers/char/ns16550.c
> > +++ b/xen/drivers/char/ns16550.c
> > @@ -272,7 +272,15 @@ static int cf_check ns16550_getc(struct serial_port *port, char *pc)
> >  static void pci_serial_early_init(struct ns16550 *uart)
> >  {
> >  #ifdef NS16550_PCI
> > -if ( !uart->ps_bdf_enable || uart->io_base >= 0x10000 )
> > +if ( uart->bar )
> > +{
> > +pci_conf_write16(PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
> > +  uart->ps_bdf[2]),
> > + PCI_COMMAND, PCI_COMMAND_MEMORY);
> > +return;
> > +}
> > +
> > +if ( !uart->ps_bdf_enable )
> >  return;
> >  
> >  if ( uart->pb_bdf_enable )
> 
> While I did suggest using uart->bar, my implication was that the io_base
> check would then remain in place. Otherwise, if I'm not mistaken, MMIO-
> based devices not specified via "com=...,pci" would then wrongly take
> the I/O port path.

I don't think MMIO-based devices specified manually have great chance to
work anyway (see the commit message), but indeed I shouldn't have broken
them even more.

> Furthermore - you can't use uart->bar alone here, can you? The field is
> set equally for MMIO and port based cards in pci_uart_config().

Right, I'll restore the io_base check.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature

[ovmf test] 180508: all pass - PUSHED

2023-05-02 Thread osstest service owner

flight 180508 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180508/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf d6b42ed7ed1b0c4584097f0d76798cff74c96379
baseline version:
 ovmf 23c71536efbebed57942947668f470f934324477

Last test of basis   180502  2023-05-02 07:42:13 Z    0 days
Testing same since   180508  2023-05-02 16:10:54 Z    0 days    1 attempts


People who touched revisions under test:
  Gerd Hoffmann 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   23c71536ef..d6b42ed7ed  d6b42ed7ed1b0c4584097f0d76798cff74c96379 -> xen-tested-master

[PATCH v1] automation: provide example for downloading an existing container

2023-05-02 Thread Olaf Hering

Signed-off-by: Olaf Hering 
---
 automation/build/README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/automation/build/README.md b/automation/build/README.md
index 2d07cafe0e..8ad89a259a 100644
--- a/automation/build/README.md
+++ b/automation/build/README.md
@@ -12,6 +12,12 @@ can be pulled with Docker from the following path:
 docker pull registry.gitlab.com/xen-project/xen/DISTRO:VERSION
 ```
 
+This example shows how to pull the existing container for Tumbleweed:
+
+```
+docker pull registry.gitlab.com/xen-project/xen/suse:opensuse-tumbleweed
+```
+
 To see the list of available containers run `make` in this
 directory. You will have to replace the `/` with a `:` to use
 them.

Re: [PATCH v4 10/20] block: drain from main loop thread in bdrv_co_yield_to_drain()

2023-05-02 Thread Stefan Hajnoczi

On Tue, May 02, 2023 at 06:21:20PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > For simplicity, always run BlockDevOps .drained_begin/end/poll()
> > callbacks in the main loop thread. This makes it easier to implement the
> > callbacks and avoids extra locks.
> > 
> > Move the function pointer declarations from the I/O Code section to the
> > Global State section in block-backend-common.h.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> If we're updating function pointers, we should probably update them in
> BdrvChildClass and BlockDriver, too.

I'll do that in the next revision.

> This means that a non-coroutine caller can't run in an iothread, not
> even the home iothread of the BlockDriverState. (I'm not sure if it was
> allowed previously. I don't think we're actually doing this, but in
> theory it could have worked.) Maybe put a GLOBAL_STATE_CODE() after
> handling the bdrv_co_yield_to_drain() case? Or would that look too odd?
> 
> IO_OR_GS_CODE();
> 
> if (qemu_in_coroutine()) {
>     bdrv_co_yield_to_drain(bs, true, parent, poll);
>     return;
> }
> 
> GLOBAL_STATE_CODE();

That looks good to me. It makes explicit that IO_OR_GS_CODE() only
applies until the end of the if statement.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server

2023-05-02 Thread Stefan Hajnoczi

On Tue, May 02, 2023 at 06:04:24PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > vhost-user activity must be suspended during bdrv_drained_begin/end().
> > This prevents new requests from interfering with whatever is happening
> > in the drained section.
> > 
> > Previously this was done using aio_set_fd_handler()'s is_external
> > argument. In a multi-queue block layer world the aio_disable_external()
> > API cannot be used since multiple AioContexts may be processing I/O, not
> > just one.
> > 
> > Switch to BlockDevOps->drained_begin/end() callbacks.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  block/export/vhost-user-blk-server.c | 43 ++--
> >  util/vhost-user-server.c | 10 +++
> >  2 files changed, 26 insertions(+), 27 deletions(-)
> > 
> > diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> > index 092b86aae4..d20f69cd74 100644
> > --- a/block/export/vhost-user-blk-server.c
> > +++ b/block/export/vhost-user-blk-server.c
> > @@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
> >  .process_msg   = vu_blk_process_msg,
> >  };
> >  
> > -static void blk_aio_attached(AioContext *ctx, void *opaque)
> > -{
> > -VuBlkExport *vexp = opaque;
> > -
> > -vexp->export.ctx = ctx;
> > -vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
> > -}
> > -
> > -static void blk_aio_detach(void *opaque)
> > -{
> > -VuBlkExport *vexp = opaque;
> > -
> > -vhost_user_server_detach_aio_context(&vexp->vu_server);
> > -vexp->export.ctx = NULL;
> > -}
> 
> So for changing the AioContext, we now rely on the fact that the node to
> be changed is always drained, so the drain callbacks implicitly cover
> this case, too?

Yes.

> >  static void
> >  vu_blk_initialize_config(BlockDriverState *bs,
> >   struct virtio_blk_config *config,
> > @@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
> >  vu_config_change_msg(&vexp->vu_server.vu_dev);
> >  }
> >  
> > +/* Called with vexp->export.ctx acquired */
> > +static void vu_blk_drained_begin(void *opaque)
> > +{
> > +VuBlkExport *vexp = opaque;
> > +
> > +vhost_user_server_detach_aio_context(&vexp->vu_server);
> > +}
> 
> Compared to the old code, we're losing the vexp->export.ctx = NULL. This
> is correct at this point because after drained_begin we still keep
> processing requests until we arrive at a quiescent state.
> 
> However, if we detach the AioContext because we're deleting the
> iothread, won't we end up with a dangling pointer in vexp->export.ctx?
> Or can we be certain that nothing interesting happens before drained_end
> updates it with a new valid pointer again?

Would you like me to add the detach() callback back and set ctx to
NULL there?

Stefan
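The point in this exchange, that an AioContext change always happens inside a drained section so the drain callbacks also cover it, can be illustrated with a toy model. This is a minimal sketch under stated assumptions: Context, Export, DrainOps and export_set_context() are hypothetical names, not QEMU's AioContext, BlockDevOps or blk_set_aio_context(), and the "event loop" simply completes one request per iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not QEMU's AioContext or BlockExport. */
typedef struct { int id; } Context;

typedef struct {
    Context *ctx;              /* context the request handlers run in */
    bool handlers_attached;    /* accepting new requests? */
    unsigned int in_flight;    /* requests still being processed */
} Export;

/* Hypothetical callback table, loosely modelled on drained_begin/poll/end. */
typedef struct {
    void (*drained_begin)(void *opaque);
    bool (*drained_poll)(void *opaque);   /* true = not quiescent yet */
    void (*drained_end)(void *opaque);
} DrainOps;

static void exp_drained_begin(void *opaque)
{
    Export *e = opaque;
    e->handlers_attached = false;          /* stop taking new requests */
}

static bool exp_drained_poll(void *opaque)
{
    Export *e = opaque;
    return e->in_flight > 0;
}

static void exp_drained_end(void *opaque)
{
    Export *e = opaque;
    e->handlers_attached = true;           /* resume in the current e->ctx */
}

static const DrainOps exp_ops = {
    .drained_begin = exp_drained_begin,
    .drained_poll  = exp_drained_poll,
    .drained_end   = exp_drained_end,
};

/* Stand-in for the event loop completing one in-flight request. */
static void run_event_loop_once(Export *e)
{
    if (e->in_flight > 0)
        e->in_flight--;
}

/* Change the export's context only while it is drained. */
static void export_set_context(Export *e, Context *new_ctx)
{
    exp_ops.drained_begin(e);
    while (exp_ops.drained_poll(e))
        run_event_loop_once(e);
    assert(e->in_flight == 0);             /* quiescent: nothing uses e->ctx */
    e->ctx = new_ctx;
    exp_ops.drained_end(e);
}
```

In this model e->ctx still points at the old context between drained_begin and the switch; if that context could be freed in the meantime, clearing the pointer in a separate detach step would keep it from dangling.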



[PATCH v1] automation: remove python2 from opensuse images

2023-05-02 Thread Olaf Hering

The upcoming Leap 15.5 will come without a binary named 'python'.
Prepare the suse images for that change.

Starting with Xen 4.14, python3 can be used for the build.

Signed-off-by: Olaf Hering 
---
 automation/build/suse/opensuse-leap.dockerfile   | 2 --
 automation/build/suse/opensuse-tumbleweed.dockerfile | 1 -
 2 files changed, 3 deletions(-)

diff --git a/automation/build/suse/opensuse-leap.dockerfile b/automation/build/suse/opensuse-leap.dockerfile
index c7973dd6ab..79de83ac20 100644
--- a/automation/build/suse/opensuse-leap.dockerfile
+++ b/automation/build/suse/opensuse-leap.dockerfile
@@ -58,8 +58,6 @@ RUN zypper install -y --no-recommends \
 'pkgconfig(libpci)' \
 'pkgconfig(sdl)' \
 'pkgconfig(sdl2)' \
-python \
-python-devel \
 python3-devel \
 systemd-devel \
 tar \
diff --git a/automation/build/suse/opensuse-tumbleweed.dockerfile b/automation/build/suse/opensuse-tumbleweed.dockerfile
index 7e5f22acef..abb25c8c84 100644
--- a/automation/build/suse/opensuse-tumbleweed.dockerfile
+++ b/automation/build/suse/opensuse-tumbleweed.dockerfile
@@ -61,7 +61,6 @@ RUN zypper install -y --no-recommends \
 'pkgconfig(libpci)' \
 'pkgconfig(sdl)' \
 'pkgconfig(sdl2)' \
-python-devel \
 python3-devel \
 systemd-devel \
 tar \

Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug

2023-05-02 Thread Stefan Hajnoczi

On Tue, May 02, 2023 at 03:19:52PM +0200, Kevin Wolf wrote:
> Am 01.05.2023 um 17:09 hat Stefan Hajnoczi geschrieben:
> > On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> > > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > > This patch is part of an effort to remove the aio_disable_external()
> > > > API because it does not fit in a multi-queue block layer world where
> > > > many AioContexts may be submitting requests to the same disk.
> > > > 
> > > > The SCSI emulation code is already in good shape to stop using
> > > > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > > > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > > > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > > > driver is submitting I/O.
> > > > 
> > > > Ensure virtio_scsi_hotunplug() is safe as follows:
> > > > 
> > > > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> > > >device_set_realized() calls qatomic_set(&dev->realized, false) so
> > > >that future scsi_device_get() calls return NULL because they exclude
> > > >SCSIDevices with realized=false.
> > > > 
> > > >That means virtio-scsi will reject new I/O requests to this
> > > >SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> > > >virtio_scsi_hotunplug() is still executing. We are protected against
> > > >new requests!
> > > > 
> > > > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> > > >that in-flight requests are cancelled synchronously. This ensures
> > > >that no in-flight requests remain once qdev_simple_device_unplug_cb()
> > > >returns.
> > > > 
> > > > Thanks to these two conditions we don't need aio_disable_external()
> > > > anymore.
> > > > 
> > > > Cc: Zhengui Li 
> > > > Reviewed-by: Paolo Bonzini 
> > > > Reviewed-by: Daniil Tatianin 
> > > > Signed-off-by: Stefan Hajnoczi 
> > > 
> > > qemu-iotests 040 starts failing for me after this patch, with what looks
> > > like a use-after-free error of some kind.
> > > 
> > > (gdb) bt
> > > #0  0x55b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> > > #1  0x55b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> > > #2  0x55b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> > > #3  0x55b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> > > #4  0x55b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> > > #5  0x55b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 ) at ../hw/core/qdev.c:599
> > > #6  0x55b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 )
> > >     at ../qom/object.c:2285
> > > #7  0x55b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 ) at ../qom/object.c:1420
> > > #8  0x55b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 )
> > >     at ../qom/qom-qobject.c:28
> > > #9  0x55b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 ) at ../qom/object.c:1489
> > > #10 0x55b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> > > #11 0x55b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> > > #12 0x55b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> > > #13 0x55b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> > > #14 0x55b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> > > #15 0x55b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> > > #16 0x55b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> > > #17 0x55b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> > > #18 0x55b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> > > #19 0x55b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> > > #20 0x55b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> > > #21 0x55b6e4009544 in aio_ctx_dispatch (sour
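The two conditions in the commit message above (lookups ignore a device whose realized flag has been cleared, and unrealize purges in-flight requests synchronously) can be modelled in a few lines. This is a simplified sketch with hypothetical names (Device, device_get(), device_unrealize()), not QEMU's SCSIDevice or scsi_device_get() code, and it does not reproduce the use-after-free shown in the backtrace.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 8

/* Hypothetical, heavily simplified device and request model. */
typedef struct {
    int lun;
    atomic_bool realized;
    int pending[MAX_REQS];   /* ids of in-flight requests */
    size_t n_pending;
} Device;

/* Condition 1: lookups skip devices that are being unrealized. */
static Device *device_get(Device *d, int lun)
{
    if (d->lun == lun && atomic_load(&d->realized))
        return d;
    return NULL;
}

/* Returns false when the request is rejected (cf. "bad target"). */
static bool submit_request(Device *d, int lun, int req_id)
{
    Device *dev = device_get(d, lun);

    if (!dev || dev->n_pending == MAX_REQS)
        return false;
    dev->pending[dev->n_pending++] = req_id;
    return true;
}

/* Condition 2: cancel every in-flight request before returning. */
static void device_purge_requests(Device *d)
{
    d->n_pending = 0;    /* a real implementation would complete each one */
}

static void device_unrealize(Device *d)
{
    atomic_store(&d->realized, false);   /* new lookups now fail */
    device_purge_requests(d);            /* nothing is left in flight */
    assert(d->n_pending == 0);
}
```

Clearing the flag first means no new request can be queued while the purge runs, so once device_unrealize() returns, neither old nor new requests can reference the device.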

Re: [PATCH v6 10/12] xen/tools: add sve parameter in XL configuration

2023-05-02 Thread Luca Fancellu

Hi Anthony,

Thank you for your review.

> On 2 May 2023, at 18:06, Anthony PERARD  wrote:
> 
> On Mon, Apr 24, 2023 at 07:02:46AM +0100, Luca Fancellu wrote:
>> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
>> index ddc7b2a15975..1e69dac2c4fa 100644
>> --- a/tools/libs/light/libxl_arm.c
>> +++ b/tools/libs/light/libxl_arm.c
>> @@ -211,6 +213,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>> return ERROR_FAIL;
>> }
>> 
>> +/* Parameter is sanitised in libxl__arch_domain_build_info_setdefault */
>> +if (d_config->b_info.arch_arm.sve_vl) {
>> +/* Vector length is divided by 128 in struct xen_domctl_createdomain */
>> +config->arch.sve_vl = d_config->b_info.arch_arm.sve_vl / 128U;
>> +}
>> +
>> return 0;
>> }
>> 
>> @@ -1681,6 +1689,26 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
>> /* ACPI is disabled by default */
>> libxl_defbool_setdefault(&b_info->acpi, false);
>> 
>> +/* Sanitise SVE parameter */
>> +if (b_info->arch_arm.sve_vl) {
>> +unsigned int max_sve_vl =
>> +arch_capabilities_arm_sve(physinfo->arch_capabilities);
>> +
>> +if (!max_sve_vl) {
>> +LOG(ERROR, "SVE is unsupported on this machine.");
>> +return ERROR_FAIL;
>> +}
>> +
>> +if (LIBXL_SVE_TYPE_HW == b_info->arch_arm.sve_vl) {
>> +b_info->arch_arm.sve_vl = max_sve_vl;
>> +} else if (b_info->arch_arm.sve_vl > max_sve_vl) {
>> +LOG(ERROR,
>> +"Invalid sve value: %d. Platform supports up to %u bits",
>> +b_info->arch_arm.sve_vl, max_sve_vl);
>> +return ERROR_FAIL;
>> +}
> 
> You still need to check that sve_vl is one of the values from the enum,
> or that the value is divisible by 128.

I have probably missed something: I thought that, by specifying the input as
below, I would get for free the guarantee that the value is 0 or divisible by
128. Is that not the case? Who can write a value different from the enum we
specified in the .idl to b_info->arch_arm.sve_vl?
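On the question of who could store another value: in C an enum type is just an integer type, so a program calling libxl directly can assign any value representable by that type to an enum-typed field; only values produced by parsing (for instance through libxl_sve_type_from_string()) are guaranteed to match an enumerator. A minimal sketch of an explicit check, assuming the encoding from the patch (-1 for "hw", 0 for disabled, otherwise a multiple of 128 up to the platform maximum, stored divided by 128); sve_vl_sanitise() is a hypothetical helper, not libxl code.

```c
#include <assert.h>

/* Values mirroring the patch's libxl_sve_type encoding (assumed). */
#define SVE_VL_HW       (-1)  /* "hw": use the platform maximum */
#define SVE_VL_DISABLED   0
#define SVE_VL_STEP     128   /* vector lengths are multiples of 128 bits */

/*
 * Hypothetical helper: check a requested vector length (in bits) against
 * the platform maximum. Returns the value to put in the domctl field
 * (length / 128), or -1 if the request is invalid.
 */
static int sve_vl_sanitise(int requested, unsigned int max_vl)
{
    assert(max_vl % SVE_VL_STEP == 0);   /* platform max is a multiple of 128 */

    if (requested == SVE_VL_DISABLED)
        return 0;
    if (max_vl == 0)
        return -1;                       /* SVE unsupported on this machine */
    if (requested == SVE_VL_HW)
        requested = (int)max_vl;

    /* An enum-typed field can hold any int, so check the value explicitly. */
    if (requested < SVE_VL_STEP || requested % SVE_VL_STEP != 0 ||
        (unsigned int)requested > max_vl)
        return -1;

    return requested / SVE_VL_STEP;
}
```

For example, sve_vl_sanitise(200, 512) returns -1: 200 is not a multiple of 128, yet nothing in C prevents a caller from storing it in an enum-typed field.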

> 
>> +}
>> +
>> if (b_info->type != LIBXL_DOMAIN_TYPE_PV)
>> return 0;
>> 
>> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
>> index fd31dacf7d5a..9e48bb772646 100644
>> --- a/tools/libs/light/libxl_types.idl
>> +++ b/tools/libs/light/libxl_types.idl
>> @@ -523,6 +523,27 @@ libxl_tee_type = Enumeration("tee_type", [
>> (1, "optee")
>> ], init_val = "LIBXL_TEE_TYPE_NONE")
>> 
>> +libxl_sve_type = Enumeration("sve_type", [
>> +(-1, "hw"),
>> +(0, "disabled"),
>> +(128, "128"),
>> +(256, "256"),
>> +(384, "384"),
>> +(512, "512"),
>> +(640, "640"),
>> +(768, "768"),
>> +(896, "896"),
>> +(1024, "1024"),
>> +(1152, "1152"),
>> +(1280, "1280"),
>> +(1408, "1408"),
>> +(1536, "1536"),
>> +(1664, "1664"),
>> +(1792, "1792"),
>> +(1920, "1920"),
>> +(2048, "2048")
>> +], init_val = "LIBXL_SVE_TYPE_DISABLED")
> 
> I'm not sure if I like that or not. Is there a reason to stop at 2048?
> Is it possible that more values will be available in the future?

Uhm... possibly there might be some extension. I thought that, when that
happens, the only thing to do would be to add another entry. I also used this
approach to get the checks for multiples of 128 and the 2048 maximum for free.

> 
> Also, this means that users of libxl (like libvirt) would be supposed to
> use e.g. LIBXL_SVE_TYPE_1024, or use libxl_sve_type_from_string().
> 
> Also, it feels weird to me to mostly use the numerical values of the enum
> rather than the enum itself.
> 
> Anyway, hopefully that enum will work fine.
> 
>> libxl_rdm_reserve = Struct("rdm_reserve", [
>> ("strategy",libxl_rdm_reserve_strategy),
>> ("policy",  libxl_rdm_reserve_policy),
> 
> Thanks,
> 
> -- 
> Anthony PERARD

Re: [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining

2023-05-02 Thread Stefan Hajnoczi

On Tue, May 02, 2023 at 05:42:51PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > Each vhost-user-blk request runs in a coroutine. When the BlockBackend
> > enters a drained section we need to enter a quiescent state. Currently
> > any in-flight requests race with bdrv_drained_begin() because it is
> > unaware of vhost-user-blk requests.
> > 
> > When blk_co_preadv/pwritev()/etc returns it wakes the
> > bdrv_drained_begin() thread but vhost-user-blk request processing has
> > not yet finished. The request coroutine continues executing while the
> > main loop thread thinks it is in a drained section.
> > 
> > One example where this is unsafe is for blk_set_aio_context() where
> > bdrv_drained_begin() is called before .aio_context_detached() and
> > .aio_context_attach(). If request coroutines are still running after
> > bdrv_drained_begin(), then the AioContext could change underneath them
> > and they race with new requests processed in the new AioContext. This
> > could lead to virtqueue corruption, for example.
> > 
> > (This example is theoretical: I came across this while reading the
> > code and have not tried to reproduce it.)
> > 
> > It's easy to make bdrv_drained_begin() wait for in-flight requests: add
> > a .drained_poll() callback that checks the VuServer's in-flight counter.
> > VuServer just needs an API that returns true when there are requests in
> > flight. The in-flight counter needs to be atomic.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/qemu/vhost-user-server.h |  4 +++-
> >  block/export/vhost-user-blk-server.c | 16 
> >  util/vhost-user-server.c | 14 ++
> >  3 files changed, 29 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
> > index bc0ac9ddb6..b1c1cda886 100644
> > --- a/include/qemu/vhost-user-server.h
> > +++ b/include/qemu/vhost-user-server.h
> > @@ -40,8 +40,9 @@ typedef struct {
> >  int max_queues;
> >  const VuDevIface *vu_iface;
> >  
> > +unsigned int in_flight; /* atomic */
> > +
> >  /* Protected by ctx lock */
> > -unsigned int in_flight;
> >  bool wait_idle;
> >  VuDev vu_dev;
> >  QIOChannel *ioc; /* The I/O channel with the client */
> > @@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
> >  
> >  void vhost_user_server_inc_in_flight(VuServer *server);
> >  void vhost_user_server_dec_in_flight(VuServer *server);
> > +bool vhost_user_server_has_in_flight(VuServer *server);
> >  
> >  void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
> >  void vhost_user_server_detach_aio_context(VuServer *server);
> > diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> > index 841acb36e3..092b86aae4 100644
> > --- a/block/export/vhost-user-blk-server.c
> > +++ b/block/export/vhost-user-blk-server.c
> > @@ -272,7 +272,20 @@ static void vu_blk_exp_resize(void *opaque)
> >  vu_config_change_msg(&vexp->vu_server.vu_dev);
> >  }
> >  
> > +/*
> > + * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
> > + *
> > + * Called with vexp->export.ctx acquired.
> > + */
> > +static bool vu_blk_drained_poll(void *opaque)
> > +{
> > +VuBlkExport *vexp = opaque;
> > +
> > +return vhost_user_server_has_in_flight(&vexp->vu_server);
> > +}
> > +
> >  static const BlockDevOps vu_blk_dev_ops = {
> > +.drained_poll  = vu_blk_drained_poll,
> >  .resize_cb = vu_blk_exp_resize,
> >  };
> 
> You're adding a new function pointer to an existing BlockDevOps...
> 
> > @@ -314,6 +327,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> >  vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
> >   logical_block_size, num_queues);
> >  
> > +blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
> >  blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
> >   vexp);
> >  
> >  blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
> 
> ..but still add a second blk_set_dev_ops(). Maybe a bad merge conflict
> resolution with commit ca858a5fe94?

Thanks, I probably didn't have ca858a5fe94 in my tree when writing this
code.

> > @@ -323,6 +337,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
> >   num_queues, &vu_blk_iface, errp)) {
> >  blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
> >  blk_aio_detach, vexp);
> > +blk_set_dev_ops(exp->blk, NULL, NULL);
> >  g_free(vexp->handler.serial);
> >  return -EADDRNOTAVAIL;
> >  }
> > @@ -336,6 +351,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
> >  
> >  blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
> >
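The requirement from this commit message, a counter of in-flight requests that the drain path can poll, with the counter made atomic (the patch moves it out of the "Protected by ctx lock" group), can be sketched with C11 atomics. The names below are hypothetical; QEMU uses its own qatomic_*() helpers, and a real decrement path may also need to wake a waiter blocked in the drain, which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical server state; only the counter matters here. */
typedef struct {
    atomic_uint in_flight;   /* updated by requests, read by the drain path */
} Server;

static void server_inc_in_flight(Server *s)
{
    atomic_fetch_add(&s->in_flight, 1);
}

static void server_dec_in_flight(Server *s)
{
    unsigned int old = atomic_fetch_sub(&s->in_flight, 1);

    assert(old > 0);         /* an unbalanced decrement would wrap around */
}

/* What a drained_poll-style callback returns: true = keep waiting. */
static bool server_has_in_flight(Server *s)
{
    return atomic_load(&s->in_flight) > 0;
}
```

In the patch, vu_blk_drained_poll() plays the role of server_has_in_flight() here: as long as it returns true, the drain keeps polling.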

Re: [PATCH v4 10/13] tools/xenstore: switch transaction accounting to generic accounting

2023-05-02 Thread Julien Grall


Hi,

On 05/04/2023 08:03, Juergen Gross wrote:

As transaction accounting is active for unprivileged domains only, it
can easily be added to the generic per-domain accounting.

Signed-off-by: Juergen Gross 
---
  tools/xenstore/xenstored_core.c|  3 +--
  tools/xenstore/xenstored_core.h|  1 -
  tools/xenstore/xenstored_domain.c  | 21 ++---
  tools/xenstore/xenstored_domain.h  |  4 
  tools/xenstore/xenstored_transaction.c | 12 +---
  5 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 2d481fcad9..88c569b7d5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2083,7 +2083,7 @@ static void consider_message(struct connection *conn)
 * stalled. This will ignore new requests until Live-Update happened
 * or it was aborted.
 */
-   if (lu_is_pending() && conn->transaction_started == 0 &&
+   if (lu_is_pending() && conn->ta_start_time == 0 &&


NIT: I know there are some places in the code checking for
conn->ta_start_time == 0. But it feels like a better replacement for
"conn->transaction_started" is "list_empty(...)".


I agree this is going to be more expensive. But you are switching the
transaction accounting to a generic infrastructure which is pretty heavy
compared to a simple addition/subtraction. So I think a "list_empty()"
would be OK here.
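The list_empty() suggestion relies on the connection already keeping its open transactions in transaction_list, so "is a transaction open?" can be answered from the list itself. A self-contained sketch of the idea: the list helpers are reimplemented here in the style of a list_head intrusive list so the example compiles on its own, and struct conn, struct transaction and transaction_end() are hypothetical, trimmed-down stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list in list_head style (sketch). */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Hypothetical connection/transaction with only the fields needed here. */
struct conn { struct list_head transaction_list; };
struct transaction { struct list_head list; };

/* "Has this connection got a transaction open?" without a counter. */
static bool conn_has_transactions(const struct conn *c)
{
    return !list_empty(&c->transaction_list);
}

static void transaction_end(struct conn *c, struct transaction *t)
{
    assert(!list_empty(&c->transaction_list));
    list_del(&t->list);
}
```

With this layout, list_empty() only compares the head's next pointer with the head itself, so no separate transaction_started counter has to be kept in sync.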



conn->in->hdr.msg.type == XS_TRANSACTION_START) {
trace("Delaying transaction start for connection %p req_id %u\n",
  conn, conn->in->hdr.msg.req_id);
@@ -2190,7 +2190,6 @@ struct connection *new_connection(const struct interface_funcs *funcs)
new->funcs = funcs;
new->is_ignored = false;
new->is_stalled = false;
-   new->transaction_started = 0;
INIT_LIST_HEAD(&new->out_list);
INIT_LIST_HEAD(&new->acc_list);
INIT_LIST_HEAD(&new->ref_list);
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 5a11dc1231..3564d85d7d 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -151,7 +151,6 @@ struct connection
/* List of in-progress transactions. */
struct list_head transaction_list;
uint32_t next_transaction_id;
-   unsigned int transaction_started;
time_t ta_start_time;
  
  	/* List of delayed requests. */
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 1caa60bb14..40bcc1dbfa 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -419,12 +419,10 @@ int domain_get_quota(const void *ctx, struct connection *conn,
  {
struct domain *d = find_domain_struct(domid);
char *resp;
-   int ta;
  
  	if (!d)
return ENOENT;
  
-	ta = d->conn ? d->conn->transaction_started : 0;

resp = talloc_asprintf(ctx, "Domain %u:\n", domid);
if (!resp)
return ENOMEM;
@@ -435,7 +433,7 @@ int domain_get_quota(const void *ctx, struct connection *conn,
  
  	ent(nodes, d->acc[ACC_NODES]);
ent(watches, d->acc[ACC_WATCH]);
-   ent(transactions, ta);
+   ent(transactions, d->acc[ACC_TRANS]);
ent(outstanding, d->acc[ACC_OUTST]);
ent(memory, d->acc[ACC_MEM]);
  
@@ -1297,6 +1295,23 @@ void domain_outstanding_dec(struct connection *conn, unsigned int domid)
domain_acc_add(conn, domid, ACC_OUTST, -1, true);
  }
  
+void domain_transaction_inc(struct connection *conn)
+{
+   domain_acc_add(conn, conn->id, ACC_TRANS, 1, true);
+}
+
+void domain_transaction_dec(struct connection *conn)
+{
+   domain_acc_add(conn, conn->id, ACC_TRANS, -1, true);
+}
+
+unsigned int domain_transaction_get(struct connection *conn)
+{
+   return (domain_is_unprivileged(conn))
+   ? domain_acc_add(conn, conn->id, ACC_TRANS, 0, true)
+   : 0;
+}
+
  static wrl_creditt wrl_config_writecost  = WRL_FACTOR;
  static wrl_creditt wrl_config_rate   = WRL_RATE   * WRL_FACTOR;
  static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 0d61bf4344..abc766f343 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -31,6 +31,7 @@ enum accitem {
ACC_WATCH = ACC_TR_N,
ACC_OUTST,
ACC_MEM,
+   ACC_TRANS,
ACC_N,  /* Number of elements per domain. */
  };
  
@@ -112,6 +113,9 @@ void domain_watch_dec(struct connection *conn);
  int domain_watch(struct connection *conn);
  void domain_outstanding_inc(struct connection *conn, unsigned int domid);
  void domain_outstanding_dec(struct connection *conn, unsigned int domid);
+void domain_transaction_inc(struct connection *conn);
+void domain_transaction_dec(struct connection *conn);
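The accounting enum used across this series packs all per-domain counters into one array and reuses its prefixes: as the enum comments indicate, indices below ACC_REQ_N (and ACC_TR_N) are the ones buffered per request and per transaction, while items such as ACC_TRANS that follow ACC_TR_N are kept per domain only. A minimal sketch that copies the enum from this revision and models the counters with a hypothetical acc_add(); xenstored's real domain_acc_add() takes a connection and domid and may allocate, and the privileged-domain check in domain_transaction_get() is omitted.

```c
#include <assert.h>

/* Index layout copied from the patch: array prefixes are reused. */
enum accitem {
    ACC_NODES,
    ACC_REQ_N,              /* Number of elements per request. */
    ACC_TR_N = ACC_REQ_N,   /* Number of elements per transaction. */
    ACC_WATCH = ACC_TR_N,
    ACC_OUTST,
    ACC_MEM,
    ACC_TRANS,
    ACC_N,                  /* Number of elements per domain. */
};

/* Hypothetical per-domain record; xenstored's struct domain has more. */
struct acc_domain {
    unsigned int acc[ACC_N];
};

/*
 * Simplified stand-in for domain_acc_add(): apply a delta and return the
 * new value, so a delta of 0 works as a plain "get".
 */
static unsigned int acc_add(struct acc_domain *d, enum accitem what, int add)
{
    assert(what < ACC_N);
    assert((long long)d->acc[what] + add >= 0);   /* no underflow */
    d->acc[what] += add;
    return d->acc[what];
}

/* Transaction counting expressed through the generic array. */
static void transaction_inc(struct acc_domain *d) { acc_add(d, ACC_TRANS, 1); }
static void transaction_dec(struct acc_domain *d) { acc_add(d, ACC_TRANS, -1); }

static unsigned int transaction_get(struct acc_domain *d)
{
    return acc_add(d, ACC_TRANS, 0);
}
```

With this layout ACC_N is 5 and ACC_TRANS is 4, so adding a new per-domain counter only means inserting an enumerator before ACC_N.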

Re: [PATCH v4 07/13] tools/xenstore: use accounting data array for per-domain values

2023-05-02 Thread Julien Grall


Hi Juergen,

On 05/04/2023 08:03, Juergen Gross wrote:

diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 5cfd730cf6..0d61bf4344 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -28,7 +28,10 @@ enum accitem {
ACC_NODES,
ACC_REQ_N,  /* Number of elements per request. */
ACC_TR_N = ACC_REQ_N,   /* Number of elements per transaction. */
-   ACC_N = ACC_TR_N,   /* Number of elements per domain. */
+   ACC_WATCH = ACC_TR_N,
+   ACC_OUTST,
+   ACC_MEM,
+   ACC_N,  /* Number of elements per domain. */
  };
  
  void handle_event(void);
@@ -107,9 +110,8 @@ static inline void domain_memory_add_nochk(struct connection *conn,
  void domain_watch_inc(struct connection *conn);
  void domain_watch_dec(struct connection *conn);
  int domain_watch(struct connection *conn);
-void domain_outstanding_inc(struct connection *conn);
-void domain_outstanding_dec(struct connection *conn);
-void domain_outstanding_domid_dec(unsigned int domid);
+void domain_outstanding_inc(struct connection *conn, unsigned int domid);


AFAICT, all the callers of domain_outstanding_inc() pass 'conn->id'.
So it is not entirely clear what the benefit of adding the extra parameter is.


I am not against this change (and same for removing *domid_dec()). But I 
think this ought to be explained in the commit message as this feels 
unrelated.



+void domain_outstanding_dec(struct connection *conn, unsigned int domid);
  int domain_get_quota(const void *ctx, struct connection *conn,
 unsigned int domid);
  


Cheers,

--
Julien Grall

Re: [PATCH v4 06/13] tools/xenstore: add current connection to domain_memory_add() parameters

2023-05-02 Thread Julien Grall


Hi Juergen,

On 05/04/2023 08:03, Juergen Gross wrote:

In order to enable switching memory accounting to the generic array
based accounting, add the current connection to the parameters of
domain_memory_add().

This requires adding the connection to some other functions, too.

Signed-off-by: Juergen Gross 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall

Re: [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event

2023-05-02 Thread Stefan Hajnoczi

On Tue, May 02, 2023 at 05:19:46PM +0200, Kevin Wolf wrote:
> Am 25.04.2023 um 19:26 hat Stefan Hajnoczi geschrieben:
> > Only report a transport reset event to the guest after the SCSIDevice
> > has been unrealized by qdev_simple_device_unplug_cb().
> > 
> > qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> > to false so that scsi_device_find/get() no longer see it.
> > 
> > scsi_target_emulate_report_luns() also needs to be updated to filter out
> > SCSIDevices that are unrealized.
> > 
> > These changes ensure that the guest driver does not see the SCSIDevice
> > that's being unplugged if it responds very quickly to the transport
> > reset event.
> > 
> > Reviewed-by: Paolo Bonzini 
> > Reviewed-by: Michael S. Tsirkin 
> > Reviewed-by: Daniil Tatianin 
> > Signed-off-by: Stefan Hajnoczi 
> 
> > @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >  blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
> >  virtio_scsi_release(s);
> >  }
> > +
> > +if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> > +virtio_scsi_acquire(s);
> > +virtio_scsi_push_event(s, sd,
> > +   VIRTIO_SCSI_T_TRANSPORT_RESET,
> > +   VIRTIO_SCSI_EVT_RESET_REMOVED);
> > +scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> > +virtio_scsi_release(s);
> > +}
> >  }
> 
> s, sd and s->bus are all unrealized at this point, whereas before this
> patch they were still realized. I couldn't find any practical problem
> with it, but it made me nervous enough that I thought I should comment
> on it at least.
> 
> Should we maybe have documentation on these functions that says that
> they accept unrealized objects as their parameters?

s is the VirtIOSCSI controller, not the SCSIDevice that is being
unplugged. The VirtIOSCSI controller is still realized.

s->bus is the VirtIOSCSI controller's bus, it is still realized.

You are right that the SCSIDevice (sd) has been unrealized at this
point:
- sd->conf.blk is safe because qdev properties stay alive until the
  Object is deleted, but I'm not sure we should rely on that.
- virtio_scsi_push_event(.., sd, ...) is questionable because the LUN
  that's fetched from sd no longer belongs to the unplugged SCSIDevice.

How about I change the code to fetch sd->conf.blk and the LUN before
unplugging?

Stefan



Re: [PATCH v4 05/13] tools/xenstore: use accounting buffering for node accounting

2023-05-02 Thread Julien Grall


Hi Juergen,

On 05/04/2023 08:03, Juergen Gross wrote:

Add the node accounting to the accounting information buffering in
order to avoid having to undo it in case of failure.

Signed-off-by: Juergen Gross 
---
  tools/xenstore/xenstored_core.c   | 21 ++---
  tools/xenstore/xenstored_domain.h |  4 ++--
  2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 84335f5f3d..92a40ccf3f 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1452,7 +1452,6 @@ static void destroy_node_rm(struct connection *conn, struct node *node)
  static int destroy_node(struct connection *conn, struct node *node)
  {
destroy_node_rm(conn, node);
-   domain_nbentry_dec(conn, get_node_owner(node));
  
  	/*
 * It is not possible to easily revert the changes in a transaction.
@@ -1797,27 +1796,11 @@ static int do_set_perms(const void *ctx, struct connection *conn,
old_perms = node->perms;
domain_nbentry_dec(conn, get_node_owner(node));


IIRC, we originally said that domain_nbentry_dec() could never fail in a 
non-transaction case. But with your current rework, the function can now 
fail because of an allocation failure.


Therefore, shouldn't we now check the error? (Possibly in a patch 
beforehand).



node->perms = perms;
-   if (domain_nbentry_inc(conn, get_node_owner(node))) {
-   node->perms = old_perms;
-   /*
-* This should never fail because we had a reference on the
-* domain before and Xenstored is single-threaded.
-*/
-   domain_nbentry_inc(conn, get_node_owner(node));
+   if (domain_nbentry_inc(conn, get_node_owner(node)))
return ENOMEM;
-   }
  
-	if (write_node(conn, node, false)) {
-   int saved_errno = errno;
-
-   domain_nbentry_dec(conn, get_node_owner(node));
-   node->perms = old_perms;
-   /* No failure possible as above. */
-   domain_nbentry_inc(conn, get_node_owner(node));
-
-   errno = saved_errno;
+   if (write_node(conn, node, false))
return errno;
-   }
  
  	fire_watches(conn, ctx, name, node, false, &old_perms);
send_ack(conn, XS_SET_PERMS);
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 6355ad4f37..e669f57b80 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -25,9 +25,9 @@
   * a per transaction array.
   */
  enum accitem {
+   ACC_NODES,
ACC_REQ_N,  /* Number of elements per request. */
-   ACC_NODES = ACC_REQ_N,
-   ACC_TR_N,   /* Number of elements per transaction. */
+   ACC_TR_N = ACC_REQ_N,   /* Number of elements per transaction. */
ACC_N = ACC_TR_N,   /* Number of elements per domain. */
  };
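Working through the enumerator values may help when reading this hunk. Before the change, ACC_REQ_N and ACC_NODES were both 0 and ACC_TR_N was 1, so the node count sat in the per-transaction range; after it, ACC_NODES is 0 and ACC_REQ_N, ACC_TR_N and ACC_N are all 1, so it becomes a per-request item. The enum below is copied from the new side of the hunk; `req_acc` is a hypothetical array added only to show the indexing.

```c
/* New layout: ACC_NODES is the first (and only) per-request item. */
enum accitem {
    ACC_NODES,
    ACC_REQ_N,              /* Number of elements per request. */
    ACC_TR_N = ACC_REQ_N,   /* Number of elements per transaction. */
    ACC_N = ACC_TR_N,       /* Number of elements per domain. */
};

/* Hypothetical per-request accounting array, indexed by enum accitem;
 * sized by ACC_REQ_N, it now has room for ACC_NODES. */
static int req_acc[ACC_REQ_N];
```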
  


Cheers,

--
Julien Grall

[linux-linus test] 180504: regressions - FAIL

2023-05-02 Thread osstest service owner

flight 180504 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180504/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 180278

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-credit2  12 debian-install fail pass in 180500
 test-amd64-amd64-xl-vhd  21 guest-start/debian.repeat  fail pass in 180500

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail in 180500 never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail in 180500 never pass
 test-armhf-armhf-xl-credit2   8 xen-boot fail  like 180278
 test-armhf-armhf-libvirt  8 xen-boot fail  like 180278
 test-armhf-armhf-libvirt-raw  8 xen-boot fail  like 180278
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 180278
 test-armhf-armhf-xl-vhd   8 xen-boot fail  like 180278
 test-armhf-armhf-examine  8 reboot   fail  like 180278
 test-armhf-armhf-xl-rtds  8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 180278
 test-armhf-armhf-xl   8 xen-boot fail  like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-armhf-armhf-libvirt-qcow2  8 xen-boot fail like 180278
 test-armhf-armhf-xl-multivcpu  8 xen-boot fail like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 180278
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass

version targeted for testing:
 linux                865fdb08197e657c59e74a35fa32362b12397f58
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   15 days
Failing since        180281  2023-04-17 06:24:36 Z   15 days   27 attempts
Testing same since   180500  2023-05-02 01:41:05 Z    0 days    2 attempts


2156 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops    pass
 build-arm64-pvops    pass
 build-armhf-pvops    pass
 build-i386-pvops

Re: [PULL 05/27] hw/xen: Watches on XenStore transactions

2023-05-02 Thread Peter Maydell

On Tue, 7 Mar 2023 at 18:27, David Woodhouse  wrote:
>
> From: David Woodhouse 
>
> Firing watches on the nodes that still exist is relatively easy; just
> walk the tree and look at the nodes with refcount of one.
>
> Firing watches on *deleted* nodes is more fun. We add 'modified_in_tx'
> and 'deleted_in_tx' flags to each node. Nodes with those flags cannot
> be shared, as they will always be unique to the transaction in which
> they were created.
>
> When xs_node_walk would need to *create* a node as scaffolding and it
> encounters a deleted_in_tx node, it can resurrect it simply by clearing
> its deleted_in_tx flag. If that node originally had any *data*, they're
> gone, and the modified_in_tx flag will have been set when it was first
> deleted.
>
> We then attempt to send appropriate watches when the transaction is
> committed, properly delete the deleted_in_tx nodes, and remove the
> modified_in_tx flag from the others.
>
> Signed-off-by: David Woodhouse 
> Reviewed-by: Paul Durrant 

Hi; Coverity's "is there missing error handling?"
heuristic fired for a change in this code (CID 1508359):

>  static int transaction_commit(XenstoreImplState *s, XsTransaction *tx)
>  {
> +struct walk_op op;
> +XsNode **n;
> +
>  if (s->root_tx != tx->base_tx) {
>  return EAGAIN;
>  }
> @@ -720,10 +861,18 @@ static int transaction_commit(XenstoreImplState *s, XsTransaction *tx)
>  s->root_tx = tx->tx_id;
>  s->nr_nodes = tx->nr_nodes;
>
> +init_walk_op(s, &op, XBT_NULL, tx->dom_id, "/", &n);

This is the only call to init_walk_op() which ignores its
return value. Intentional, or missing error handling?
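One way to address the Coverity report is to check the initializer's result before using the walk state. A minimal sketch, with a hypothetical `init_walk()` and a trimmed `struct walk_op` (not QEMU's actual definitions). Since the path here is the constant "/" and the transaction has already been made current (`s->root_tx = tx->tx_id`) by the time the walk is set up, asserting success might be the more natural fix in the real code than returning an error.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical, trimmed walk state loosely modelled on the quoted walk_op. */
struct walk_op {
    const char *path;
    int dom_id;
    int mutating;
};

/* Hypothetical initializer that, like init_walk_op(), can fail. */
static int init_walk(struct walk_op *op, int dom_id, const char *path)
{
    if (path == NULL || path[0] != '/')
        return EINVAL;          /* only absolute paths can be walked */
    memset(op, 0, sizeof(*op));
    op->path = path;
    op->dom_id = dom_id;
    return 0;
}

/* Check the initializer's result before using op, instead of ignoring it. */
static int commit_walk(int dom_id, const char *path)
{
    struct walk_op op;
    int ret = init_walk(&op, dom_id, path);

    if (ret)
        return ret;             /* op is not valid here; do not walk */

    op.mutating = 1;
    /* ... walk the tree and fire watches using op ... */
    return 0;
}
```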

> +op.deleted_in_tx = false;
> +op.mutating = true;
> +
>  /*
> - * XX: Walk the new root and fire watches on any node which has a
> + * Walk the new root and fire watches on any node which has a
>   * refcount of one (which is therefore unique to this transaction).
>   */
> +if (s->root->children) {
> +g_hash_table_foreach_remove(s->root->children, tx_commit_walk, &op);
> +}
> +
>  return 0;
>  }

thanks
-- PMM

Re: [PATCH v6 10/12] xen/tools: add sve parameter in XL configuration

2023-05-02 Thread Anthony PERARD

On Mon, Apr 24, 2023 at 07:02:46AM +0100, Luca Fancellu wrote:
> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> index ddc7b2a15975..1e69dac2c4fa 100644
> --- a/tools/libs/light/libxl_arm.c
> +++ b/tools/libs/light/libxl_arm.c
> @@ -211,6 +213,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>  return ERROR_FAIL;
>  }
>  
> +/* Parameter is sanitised in libxl__arch_domain_build_info_setdefault */
> +if (d_config->b_info.arch_arm.sve_vl) {
> +/* Vector length is divided by 128 in struct xen_domctl_createdomain */
> +config->arch.sve_vl = d_config->b_info.arch_arm.sve_vl / 128U;
> +}
> +
>  return 0;
>  }
>  
> @@ -1681,6 +1689,26 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
>  /* ACPI is disabled by default */
>  libxl_defbool_setdefault(&b_info->acpi, false);
>  
> +/* Sanitise SVE parameter */
> +if (b_info->arch_arm.sve_vl) {
> +unsigned int max_sve_vl =
> +arch_capabilities_arm_sve(physinfo->arch_capabilities);
> +
> +if (!max_sve_vl) {
> +LOG(ERROR, "SVE is unsupported on this machine.");
> +return ERROR_FAIL;
> +}
> +
> +if (LIBXL_SVE_TYPE_HW == b_info->arch_arm.sve_vl) {
> +b_info->arch_arm.sve_vl = max_sve_vl;
> +} else if (b_info->arch_arm.sve_vl > max_sve_vl) {
> +LOG(ERROR,
> +"Invalid sve value: %d. Platform supports up to %u bits",
> +b_info->arch_arm.sve_vl, max_sve_vl);
> +return ERROR_FAIL;
> +}

You still need to check that sve_vl is one of the values from the enum,
or that the value is divisible by 128.
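A minimal sketch of the extra validation being asked for: accept 0 (disabled), the "hw" value (-1), or a positive multiple of 128 that does not exceed the platform maximum. `sanitise_sve_vl()` and the macros are hypothetical and only mirror the values in the quoted enum; this is not libxl's implementation.

```c
#include <stdbool.h>

/* Values mirroring the quoted libxl_sve_type enum (assumed here). */
#define SVE_VL_HW       (-1)   /* "hw": use the platform maximum */
#define SVE_VL_DISABLED 0
#define SVE_VL_STEP     128    /* valid lengths are multiples of 128 bits */

/*
 * Hypothetical sanitiser: returns 0 (possibly rewriting *vl) on success,
 * -1 if the requested length is unusable. max_vl is the platform maximum
 * in bits, or 0 if SVE is unsupported.
 */
static int sanitise_sve_vl(int *vl, unsigned int max_vl)
{
    if (*vl == SVE_VL_DISABLED)
        return 0;
    if (max_vl == 0)
        return -1;                      /* SVE unsupported on this machine */
    if (*vl == SVE_VL_HW) {
        *vl = (int)max_vl;
        return 0;
    }
    if (*vl < 0 || *vl % SVE_VL_STEP != 0)
        return -1;                      /* not one of the enum values */
    if ((unsigned int)*vl > max_vl)
        return -1;                      /* larger than the platform allows */
    return 0;
}
```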

> +}
> +
>  if (b_info->type != LIBXL_DOMAIN_TYPE_PV)
>  return 0;
>  
> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> index fd31dacf7d5a..9e48bb772646 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -523,6 +523,27 @@ libxl_tee_type = Enumeration("tee_type", [
>  (1, "optee")
>  ], init_val = "LIBXL_TEE_TYPE_NONE")
>  
> +libxl_sve_type = Enumeration("sve_type", [
> +(-1, "hw"),
> +(0, "disabled"),
> +(128, "128"),
> +(256, "256"),
> +(384, "384"),
> +(512, "512"),
> +(640, "640"),
> +(768, "768"),
> +(896, "896"),
> +(1024, "1024"),
> +(1152, "1152"),
> +(1280, "1280"),
> +(1408, "1408"),
> +(1536, "1536"),
> +(1664, "1664"),
> +(1792, "1792"),
> +(1920, "1920"),
> +(2048, "2048")
> +], init_val = "LIBXL_SVE_TYPE_DISABLED")

I'm not sure if I like that or not. Is there a reason to stop at 2048?
Is it possible that there will be more values available in the future?

Also this means that users of libxl (like libvirt) would be supposed to
use e.g. LIBXL_SVE_TYPE_1024, or use libxl_sve_type_from_string().

Also, it feels weird to me to mostly use the numerical values of the enum
rather than the enum itself.

Anyway, hopefully that enum will work fine.

>  libxl_rdm_reserve = Struct("rdm_reserve", [
>  ("strategy",libxl_rdm_reserve_strategy),
>  ("policy",  libxl_rdm_reserve_policy),

Thanks,

-- 
Anthony PERARD

Re: [PATCH] xen/blkfront: Only check REQ_FUA for writes

2023-05-02 Thread Ross Lagerwall

> From: Roger Pau Monne 
> Sent: Tuesday, May 2, 2023 4:57 PM
> To: Ross Lagerwall 
> Cc: xen-devel@lists.xenproject.org ; 
> jgr...@suse.com ; sstabell...@kernel.org 
> ; oleksandr_tyshche...@epam.com 
> ; ax...@kernel.dk 
> Subject: Re: [PATCH] xen/blkfront: Only check REQ_FUA for writes 
>  
> On Wed, Apr 26, 2023 at 05:40:05PM +0100, Ross Lagerwall wrote:
> > The existing code silently converts read operations with the
> > REQ_FUA bit set into write-barrier operations. This results in data
> > loss as the backend scribbles zeroes over the data instead of returning
> > it.
> > 
> > While the REQ_FUA bit doesn't make sense on a read operation, at least
> > one well-known out-of-tree kernel module does set it and since it
> > results in data loss, let's be safe here and only look at REQ_FUA for
> > writes.
> 
> Do we know what the intention of the out-of-tree kernel module is with
> its usage of FUA for reads?

It was just a plain bug that has now been fixed:

https://github.com/veeam/blksnap/commit/e3b3e7369642b59e01c647934789e5e20b380c62

I think this patch is still worthwhile since reads being turned into
writes is asking for data corruption.
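The difference between the old and the patched condition can be modelled in user space. This is a minimal sketch: `enum op`, `FLAG_FUA` and both helper functions are hypothetical simplifications, not the kernel's `struct request` or `req_op()` API.

```c
#include <stdbool.h>

/* Hypothetical, simplified request model. */
enum op { OP_READ, OP_WRITE, OP_FLUSH };
#define FLAG_FUA 0x1u

/* Pre-patch condition: any request with FUA set, including a read,
 * gets flush/barrier treatment. */
static bool needs_flush_semantics_old(enum op op, unsigned int flags)
{
    return op == OP_FLUSH || (flags & FLAG_FUA);
}

/* Patched condition: FUA is only honoured on writes. */
static bool needs_flush_semantics(enum op op, unsigned int flags)
{
    return op == OP_FLUSH || (op == OP_WRITE && (flags & FLAG_FUA));
}
```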

> 
> Should this maybe be translated to a pair of flush cache and read
> requests?
> 
> > Signed-off-by: Ross Lagerwall 
> > ---
> >  drivers/block/xen-blkfront.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index 23ed258b57f0..c1890c8a9f6e 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -780,7 +780,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
> >    ring_req->u.rw.handle = info->handle;
> >    ring_req->operation = rq_data_dir(req) ?
> >    BLKIF_OP_WRITE : BLKIF_OP_READ;
> > - if (req_op(req) == REQ_OP_FLUSH || req->cmd_flags & REQ_FUA) {
> > + if (req_op(req) == REQ_OP_FLUSH ||
> > + (req_op(req) == REQ_OP_WRITE && (req->cmd_flags & REQ_FUA))) {
> 
> Should we print some kind of warning maybe once that we have received
> a READ request with the FUA flag set, and the FUA flag will have no
> effect?
> 

I thought of adding something like this but I couldn't find any other
block layer code doing a similar check (also it seems more appropriate
in the core block layer).

WARN_ON_ONCE(req_op(req) != REQ_OP_WRITE && (req->cmd_flags & REQ_FUA));

I can add it if the maintainers want it.
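For illustration, a user-space sketch of the warn-once check suggested above. `warn_on_once()`, `queue_req()` and the request model are hypothetical; the helper mimics the kernel's warn-once semantics only loosely, since it keeps a single flag for the whole program, whereas the kernel macros keep one per call site.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified request model. */
enum op { OP_READ, OP_WRITE };
#define FLAG_FUA 0x1u

static int warnings_emitted;   /* counts printed warnings */

/* Print a warning the first time cond is true; stay silent afterwards.
 * Returns cond so it can be used inside an if, like the kernel macros. */
static bool warn_on_once(bool cond, const char *what)
{
    static bool warned;

    if (cond && !warned) {
        warned = true;
        warnings_emitted++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

static void queue_req(enum op op, unsigned int flags)
{
    warn_on_once(op != OP_WRITE && (flags & FLAG_FUA),
                 "FUA set on a non-write request; ignoring it");
    /* ... queue the request, honouring FUA only for writes ... */
}
```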

Thanks,
Ross

Re: [PATCH v4 10/20] block: drain from main loop thread in bdrv_co_yield_to_drain()

2023-05-02 Thread Kevin Wolf

Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> For simplicity, always run BlockDevOps .drained_begin/end/poll()
> callbacks in the main loop thread. This makes it easier to implement the
> callbacks and avoids extra locks.
> 
> Move the function pointer declarations from the I/O Code section to the
> Global State section in block-backend-common.h.
> 
> Signed-off-by: Stefan Hajnoczi 

If we're updating function pointers, we should probably update them in
BdrvChildClass and BlockDriver, too.

This means that a non-coroutine caller can't run in an iothread, not
even the home iothread of the BlockDriverState. (I'm not sure if it was
allowed previously. I don't think we're actually doing this, but in
theory it could have worked.) Maybe put a GLOBAL_STATE_CODE() after
handling the bdrv_co_yield_to_drain() case? Or would that look too odd?

    IO_OR_GS_CODE();

    if (qemu_in_coroutine()) {
        bdrv_co_yield_to_drain(bs, true, parent, poll);
        return;
    }

    GLOBAL_STATE_CODE();

Kevin

Re: [PATCH v6 09/12] tools: add physinfo arch_capabilities handling for Arm

2023-05-02 Thread Anthony PERARD

On Mon, Apr 24, 2023 at 07:02:45AM +0100, Luca Fancellu wrote:
> diff --git a/tools/include/xen-tools/arm-arch-capabilities.h b/tools/include/xen-tools/arm-arch-capabilities.h
> new file mode 100644
> index ..ac44c8b14344
> --- /dev/null
> +++ b/tools/include/xen-tools/arm-arch-capabilities.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 */

Do you mean GPL-2.0-only?

GPL-2.0 is deprecated by the SPDX project.

https://spdx.org/licenses/GPL-2.0.html


Besides that, patch looks fine:
Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v4 07/20] block/export: stop using is_external in vhost-user-blk server

2023-05-02 Thread Kevin Wolf

Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> vhost-user activity must be suspended during bdrv_drained_begin/end().
> This prevents new requests from interfering with whatever is happening
> in the drained section.
> 
> Previously this was done using aio_set_fd_handler()'s is_external
> argument. In a multi-queue block layer world the aio_disable_external()
> API cannot be used since multiple AioContext may be processing I/O, not
> just one.
> 
> Switch to BlockDevOps->drained_begin/end() callbacks.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/export/vhost-user-blk-server.c | 43 ++--
>  util/vhost-user-server.c | 10 +++
>  2 files changed, 26 insertions(+), 27 deletions(-)
> 
> diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> index 092b86aae4..d20f69cd74 100644
> --- a/block/export/vhost-user-blk-server.c
> +++ b/block/export/vhost-user-blk-server.c
> @@ -208,22 +208,6 @@ static const VuDevIface vu_blk_iface = {
>  .process_msg   = vu_blk_process_msg,
>  };
>  
> -static void blk_aio_attached(AioContext *ctx, void *opaque)
> -{
> -VuBlkExport *vexp = opaque;
> -
> -vexp->export.ctx = ctx;
> -vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
> -}
> -
> -static void blk_aio_detach(void *opaque)
> -{
> -VuBlkExport *vexp = opaque;
> -
> -vhost_user_server_detach_aio_context(&vexp->vu_server);
> -vexp->export.ctx = NULL;
> -}

So for changing the AioContext, we now rely on the fact that the node to
be changed is always drained, so the drain callbacks implicitly cover
this case, too?

>  static void
>  vu_blk_initialize_config(BlockDriverState *bs,
>   struct virtio_blk_config *config,
> @@ -272,6 +256,25 @@ static void vu_blk_exp_resize(void *opaque)
>  vu_config_change_msg(&vexp->vu_server.vu_dev);
>  }
>  
> +/* Called with vexp->export.ctx acquired */
> +static void vu_blk_drained_begin(void *opaque)
> +{
> +VuBlkExport *vexp = opaque;
> +
> +vhost_user_server_detach_aio_context(&vexp->vu_server);
> +}

Compared to the old code, we're losing the vexp->export.ctx = NULL. This
is correct at this point because after drained_begin we still keep
processing requests until we arrive at a quiescent state.

However, if we detach the AioContext because we're deleting the
iothread, won't we end up with a dangling pointer in vexp->export.ctx?
Or can we be certain that nothing interesting happens before drained_end
updates it with a new valid pointer again?

Kevin

Re: [PATCH] xen/blkfront: Only check REQ_FUA for writes

2023-05-02 Thread Roger Pau Monné

On Wed, Apr 26, 2023 at 05:40:05PM +0100, Ross Lagerwall wrote:
> The existing code silently converts read operations with the
> REQ_FUA bit set into write-barrier operations. This results in data
> loss as the backend scribbles zeroes over the data instead of returning
> it.
> 
> While the REQ_FUA bit doesn't make sense on a read operation, at least
> one well-known out-of-tree kernel module does set it and since it
> results in data loss, let's be safe here and only look at REQ_FUA for
> writes.

Do we know what the intention of the out-of-tree kernel module is with
its usage of FUA for reads?

Should this maybe be translated to a pair of flush cache and read
requests?

> Signed-off-by: Ross Lagerwall 
> ---
>  drivers/block/xen-blkfront.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 23ed258b57f0..c1890c8a9f6e 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -780,7 +780,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>   ring_req->u.rw.handle = info->handle;
>   ring_req->operation = rq_data_dir(req) ?
>   BLKIF_OP_WRITE : BLKIF_OP_READ;
> - if (req_op(req) == REQ_OP_FLUSH || req->cmd_flags & REQ_FUA) {
> + if (req_op(req) == REQ_OP_FLUSH ||
> + (req_op(req) == REQ_OP_WRITE && (req->cmd_flags & REQ_FUA))) {

Should we print some kind of warning maybe once that we have received
a READ request with the FUA flag set, and the FUA flag will have no
effect?

Thanks, Roger.

Re: [PATCH v4 06/20] block/export: wait for vhost-user-blk requests when draining

2023-05-02 Thread Kevin Wolf

Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> Each vhost-user-blk request runs in a coroutine. When the BlockBackend
> enters a drained section we need to enter a quiescent state. Currently
> any in-flight requests race with bdrv_drained_begin() because it is
> unaware of vhost-user-blk requests.
> 
> When blk_co_preadv/pwritev()/etc returns it wakes the
> bdrv_drained_begin() thread but vhost-user-blk request processing has
> not yet finished. The request coroutine continues executing while the
> main loop thread thinks it is in a drained section.
> 
> One example where this is unsafe is for blk_set_aio_context() where
> bdrv_drained_begin() is called before .aio_context_detached() and
> .aio_context_attach(). If request coroutines are still running after
> bdrv_drained_begin(), then the AioContext could change underneath them
> and they race with new requests processed in the new AioContext. This
> could lead to virtqueue corruption, for example.
> 
> (This example is theoretical, I came across this while reading the
> code and have not tried to reproduce it.)
> 
> It's easy to make bdrv_drained_begin() wait for in-flight requests: add
> a .drained_poll() callback that checks the VuServer's in-flight counter.
> VuServer just needs an API that returns true when there are requests in
> flight. The in-flight counter needs to be atomic.
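The in-flight accounting described above can be sketched with C11 atomics. This is an illustrative model only: `Server` and the `server_*()` helpers are hypothetical, and QEMU itself uses its own qatomic helpers rather than <stdatomic.h>. A real implementation would presumably also wake a waiting drain when the count drops to zero; that is omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, trimmed server state: only the atomic counter. */
typedef struct {
    atomic_uint in_flight;
} Server;

/* Called when a request coroutine starts. */
static void server_inc_in_flight(Server *s)
{
    atomic_fetch_add(&s->in_flight, 1u);
}

/* Called when a request coroutine has completely finished. */
static void server_dec_in_flight(Server *s)
{
    atomic_fetch_sub(&s->in_flight, 1u);
}

/* Drain poll callback: keep polling while any request is in flight. */
static bool server_has_in_flight(Server *s)
{
    return atomic_load(&s->in_flight) > 0;
}
```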
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/qemu/vhost-user-server.h |  4 +++-
>  block/export/vhost-user-blk-server.c | 16 
>  util/vhost-user-server.c | 14 ++
>  3 files changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
> index bc0ac9ddb6..b1c1cda886 100644
> --- a/include/qemu/vhost-user-server.h
> +++ b/include/qemu/vhost-user-server.h
> @@ -40,8 +40,9 @@ typedef struct {
>  int max_queues;
>  const VuDevIface *vu_iface;
>  
> +unsigned int in_flight; /* atomic */
> +
>  /* Protected by ctx lock */
> -unsigned int in_flight;
>  bool wait_idle;
>  VuDev vu_dev;
>  QIOChannel *ioc; /* The I/O channel with the client */
> @@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
>  
>  void vhost_user_server_inc_in_flight(VuServer *server);
>  void vhost_user_server_dec_in_flight(VuServer *server);
> +bool vhost_user_server_has_in_flight(VuServer *server);
>  
>  void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
>  void vhost_user_server_detach_aio_context(VuServer *server);
> diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
> index 841acb36e3..092b86aae4 100644
> --- a/block/export/vhost-user-blk-server.c
> +++ b/block/export/vhost-user-blk-server.c
> @@ -272,7 +272,20 @@ static void vu_blk_exp_resize(void *opaque)
>  vu_config_change_msg(&vexp->vu_server.vu_dev);
>  }
>  
> +/*
> + * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
> + *
> + * Called with vexp->export.ctx acquired.
> + */
> +static bool vu_blk_drained_poll(void *opaque)
> +{
> +VuBlkExport *vexp = opaque;
> +
> +return vhost_user_server_has_in_flight(&vexp->vu_server);
> +}
> +
>  static const BlockDevOps vu_blk_dev_ops = {
> +.drained_poll  = vu_blk_drained_poll,
>  .resize_cb = vu_blk_exp_resize,
>  };

You're adding a new function pointer to an existing BlockDevOps...

> @@ -314,6 +327,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
>  vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
>   logical_block_size, num_queues);
>  
> +blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
>  blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
>   vexp);
>  
>  blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);

..but still add a second blk_set_dev_ops(). Maybe a bad merge conflict
resolution with commit ca858a5fe94?

> @@ -323,6 +337,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
>   num_queues, &vu_blk_iface, errp)) {
>  blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
>  blk_aio_detach, vexp);
> +blk_set_dev_ops(exp->blk, NULL, NULL);
>  g_free(vexp->handler.serial);
>  return -EADDRNOTAVAIL;
>  }
> @@ -336,6 +351,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
>  
>  blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
>  vexp);
> +blk_set_dev_ops(exp->blk, NULL, NULL);
>  g_free(vexp->handler.serial);
>  }

These two hunks are then probably already fixes for ca858a5fe94 and
should be a separate patch if so.

> diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
> index 1622f8cfb3..2e6b640050 100644
> --- a/util/vhost-user-server.c
> +++ b/

Re: [PATCH v4 03/20] virtio-scsi: avoid race between unplug and transport event

2023-05-02 Thread Kevin Wolf

Am 25.04.2023 um 19:26 hat Stefan Hajnoczi geschrieben:
> Only report a transport reset event to the guest after the SCSIDevice
> has been unrealized by qdev_simple_device_unplug_cb().
> 
> qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> to false so that scsi_device_find/get() no longer see it.
> 
> scsi_target_emulate_report_luns() also needs to be updated to filter out
> SCSIDevices that are unrealized.
> 
> These changes ensure that the guest driver does not see the SCSIDevice
> that's being unplugged if it responds very quickly to the transport
> reset event.
> 
> Reviewed-by: Paolo Bonzini 
> Reviewed-by: Michael S. Tsirkin 
> Reviewed-by: Daniil Tatianin 
> Signed-off-by: Stefan Hajnoczi 

> @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
>  virtio_scsi_release(s);
>  }
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> +virtio_scsi_acquire(s);
> +virtio_scsi_push_event(s, sd,
> +   VIRTIO_SCSI_T_TRANSPORT_RESET,
> +   VIRTIO_SCSI_EVT_RESET_REMOVED);
> +scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +virtio_scsi_release(s);
> +}
>  }

s, sd and s->bus are all unrealized at this point, whereas before this
patch they were still realized. I couldn't find any practical problem
with it, but it made me nervous enough that I thought I should comment
on it at least.

Should we maybe have documentation on these functions that says that
they accept unrealized objects as their parameters?

Kevin

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Olaf Hering

Tue, 2 May 2023 14:41:25 +0100 Andrew Cooper :

> Does this improve things for you?

./checker: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./checker)
make[2]: *** [Makefile:24: check-headers] Error 1

I think as soon as tools/ or stubdom/ is built, more issues like that will appear.


Olaf



Re: [PATCH v3 2/4] tools/xendevicemodel: Introduce ..._get_ioreq_server_info_ext

2023-05-02 Thread Anthony PERARD

On Thu, Apr 06, 2023 at 08:05:04AM +0200, Juergen Gross wrote:
> On 06.04.23 05:57, Marek Marczykowski-Górecki wrote:
> > Add xendevicemodel_get_ioreq_server_info_ext() which additionally
> > returns output flags that XEN_DMOP_get_ioreq_server_info can now return.
> > Do not change signature of existing
> > xendevicemodel_get_ioreq_server_info() so existing users will not need
> > to be changed.
> > 
> > This advertises behavior change of "x86/msi: passthrough all MSI-X
> > vector ctrl writes to device model" patch.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > ---
> > v3:
> >   - new patch
> > 
> > Should there be some HAVE_* #define in the header? Does this change
> > require soname bump (I hope it doesn't...).
> 
> You need to add version 1.5 to libxendevicemodel.map which should define
> the new function.

And update MINOR in the Makefile.

-- 
Anthony PERARD

[PATCH v2 0/2] x86: init improvements

2023-05-02 Thread Roger Pau Monne

Hello,

The following series contains two minor improvements for early boot: the
first one is an alignment check when building the initial page tables,
the second is a consistency fix for the GDT used by the BSP for the
trampoline code.

Both are the result of some debugging work done on a system with broken
firmware that resulted in Xen text not being loaded at a 2Mb aligned
address.  This resulted in corrupted page tables, which manifested as
the ljmp from compatibility mode in trampoline_protmode_entry causing a
triple fault: the GDT is located in the Xen text section, and the page
table entry for that address was corrupt because Xen was not loaded at
a 2Mb boundary.

The aim of the series (especially the first patch) is not to allow
booting on such broken firmware, but to print an error message instead
of causing a triple fault.

Thanks, Roger.

Roger Pau Monne (2):
  x86/head: check base address alignment
  x86/trampoline: load the GDT located in the trampoline page

 xen/arch/x86/boot/head.S   | 14 ++
 xen/arch/x86/boot/trampoline.S |  3 +++
 2 files changed, 17 insertions(+)

-- 
2.40.0

[PATCH v2 2/2] x86/trampoline: load the GDT located in the trampoline page

2023-05-02 Thread Roger Pau Monne

When booting the BSP the portion of the code executed from the
trampoline page will be using the GDT located in the hypervisor
.text.head section rather than the GDT located in the relocated
trampoline page.

If skip_realmode is not set the GDT located in the trampoline page
will be loaded after having executed the BIOS call, otherwise the GDT
from .text.head will be used for all the protected mode trampoline
code execution.

Note that both gdt_boot_descr and gdt_48 contain the same entries, but
the former is located inside the hypervisor .text section, while the
latter lives in the relocated trampoline page.

This is not harmful as-is, as both GDTs contain the same entries, but
for consistency with the APs, switch the BSP trampoline code to also
use the GDT on the relocated trampoline page.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Andrew Cooper 
---
Changes since v1:
 - Reword comment.
---
 xen/arch/x86/boot/trampoline.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index cdecf949b410..c6005fa33d1f 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -164,6 +164,9 @@ GLOBAL(trampoline_cpu_started)
 
 .code32
 trampoline_boot_cpu_entry:
+/* Switch to relocated trampoline GDT. */
+lgdt    bootsym_rel(gdt_48, 4)
+
 cmpb    $0,bootsym_rel(skip_realmode,5)
 jnz .Lskip_realmode
 
-- 
2.40.0

[PATCH v2 1/2] x86/head: check base address alignment

2023-05-02 Thread Roger Pau Monne

Ensure that the base address is 2M aligned, or else the page table
entries created would be corrupt as reserved bits on the PDE end up
set.

We have encountered a broken firmware where grub2 would end up loading
Xen at a non-2M-aligned region when using the multiboot2 protocol, and
that caused a triple fault that was very difficult to debug.

If the alignment is not as required by the page tables, print an error
message and stop the boot.  Also add a build time check that the
calculation of symbol offsets doesn't break the alignment of passed
addresses.
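The two checks described above can be modelled in C. This is a sketch, not the head.S code: the 2 MiB superpage size follows from L2_PAGETABLE_SHIFT being 21 on x86-64, while `VIRT_START_ASSUMED`, `SYM_OFFS()` and `load_base_ok()` are hypothetical, with the virtual base value chosen purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2_PAGETABLE_SHIFT 21                            /* 2 MiB superpages */
#define SUPERPAGE_SIZE     (UINT64_C(1) << L2_PAGETABLE_SHIFT)
/* a must be a power of two */
#define IS_ALIGNED(x, a)   ((((uint64_t)(x)) & ((a) - 1)) == 0)

/* Assumed link-time virtual base, for illustration only. */
#define VIRT_START_ASSUMED UINT64_C(0xffff82d040000000)
#define SYM_OFFS(sym)      ((uint64_t)(sym) - VIRT_START_ASSUMED)

/* Build-time check, in the spirit of the .if/.error added to head.S. */
_Static_assert(IS_ALIGNED(SYM_OFFS(0), SUPERPAGE_SIZE),
               "Symbol offset calculation breaks alignment");

/* Run-time check, in the spirit of `test $(1 << L2_PAGETABLE_SHIFT) - 1`:
 * the load base must have none of its low 21 bits set. */
static bool load_base_ok(uint64_t base)
{
    return IS_ALIGNED(base, SUPERPAGE_SIZE);
}
```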

The check could be performed earlier, but so far the alignment is
required by the page tables, and hence it feels more natural for the
check to live near the piece of code that requires it.

Note that when booted as an EFI application from the PE entry point
the alignment check is already performed by
efi_arch_load_addr_check(), and hence there's no need to add another
check at the point where page tables get built in
efi_arch_memory_setup().

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Use test instead of and instruction.
 - Add a build time check for sym_offs correctness.
 - Reword part of the commit message.
---
 xen/arch/x86/boot/head.S | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 0fb7dd3029f2..b9c9447df9df 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -1,3 +1,4 @@
+#include 
 #include 
 #include 
 #include 
@@ -121,6 +122,7 @@ multiboot2_header:
 .Lbad_ldr_nst: .asciz "ERR: EFI SystemTable is not provided by bootloader!"
 .Lbad_ldr_nih: .asciz "ERR: EFI ImageHandle is not provided by bootloader!"
 .Lbad_efi_msg: .asciz "ERR: EFI IA-32 platforms are not supported!"
+.Lbag_alg_msg: .asciz "ERR: Xen must be loaded at a 2Mb boundary!"
 
 .section .init.data, "aw", @progbits
 .align 4
@@ -146,6 +148,9 @@ bad_cpu:
 not_multiboot:
 add $sym_offs(.Lbad_ldr_msg),%esi   # Error message
 jmp .Lget_vtb
+not_aligned:
+add $sym_offs(.Lbag_alg_msg),%esi   # Error message
+jmp .Lget_vtb
 .Lmb2_no_st:
 /*
  * Here we are on EFI platform. vga_text_buffer was zapped earlier
@@ -670,6 +675,15 @@ trampoline_setup:
 cmp %edi, %eax
 jb  1b
 
+.if !IS_ALIGNED(sym_offs(0), 1 << L2_PAGETABLE_SHIFT)
+.error "Symbol offset calculation breaks alignment"
+.endif
+
+/* Check that the image base is aligned. */
+lea sym_esi(_start), %eax
+test    $(1 << L2_PAGETABLE_SHIFT) - 1, %eax
+jnz not_aligned
+
 /* Map Xen into the higher mappings using 2M superpages. */
 lea _PAGE_PSE + PAGE_HYPERVISOR_RWX + sym_esi(_start), %eax
 mov $sym_offs(_start),   %ecx   /* %eax = PTE to write ^  */
-- 
2.40.0

Re: [PATCH] xen/evtchn: Introduce new IOCTL to bind static evtchn

2023-05-02 Thread Rahul Singh

 Hi Ayan,

On 28 Apr 2023, at 2:30 pm, Ayan Kumar Halder  wrote:

Hi Rahul,

On 28/04/2023 13:36, Rahul Singh wrote:

Xen 4.17 supports the creation of static evtchns. To allow user space
applications to bind static evtchns, introduce a new ioctl
"IOCTL_EVTCHN_BIND_STATIC". The existing IOCTLs do more than binding,
which is why we need to introduce a new IOCTL that only binds the
static event channels.

Also, static evtchns need to be available for use during the lifetime
of the guest. When the application exits, __unbind_from_irq() ends up
being called from the release() fop, and as a result static evtchns
get closed. To avoid closing the static event channel, add the new
bool variable "is_static" in "struct irq_info" to mark the event
channel as static when creating it.

Signed-off-by: Rahul Singh 
---
 drivers/xen/events/events_base.c |  7 +--
 drivers/xen/evtchn.c | 22 +-
 include/uapi/xen/evtchn.h|  9 +
 include/xen/events.h |  2 +-
 4 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c7715f8bd452..31f2d3634ad5 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -112,6 +112,7 @@ struct irq_info {
unsigned int irq_epoch; /* If eoi_cpu valid: irq_epoch of event */
u64 eoi_time;   /* Time in jiffies when to EOI. */
raw_spinlock_t lock;
+   u8 is_static;   /* Is event channel static */

I think we should avoid u8/u16/u32 and instead use uint8_t/uint16_t/uint32_t.

However in this case, you can use bool.

Makes sense. I will change it to bool in the next version of the patch.

Regards,
Rahul

Re: [PATCH] libxl: arm: Allow grant mappings for backends running on Dom0

2023-05-02 Thread Anthony PERARD

On Thu, Mar 30, 2023 at 02:13:08PM +0530, Viresh Kumar wrote:
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index 10f37990be57..4879f136aab8 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -1616,6 +1616,10 @@ properties in the Device Tree, the type field must be set to "virtio,device".
>  Specifies the transport mechanism for the Virtio device, only "mmio" is
>  supported for now.
>  
> +=item B
> +
> +Allows Xen Grant memory mapping to be done from Dom0.

I think this description is missing context. I'm not sure what "from
dom0" would mean without reading the patch. Also, it says "allows",
which may not convey the meaning of "forced". How about something
like:

Always use grant mapping, even when the backend is run in dom0.
(Grants are already used if the backend is in another domain.)

> +
>  =back
>  
>  =item B
> diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
> index faada49e184e..e1f15344ef97 100644
> --- a/tools/libs/light/libxl_virtio.c
> +++ b/tools/libs/light/libxl_virtio.c
> @@ -48,11 +48,13 @@ static int libxl__set_xenstore_virtio(libxl__gc *gc, uint32_t domid,
>  flexarray_append_pair(back, "base", GCSPRINTF("%#"PRIx64, virtio->base));
>  flexarray_append_pair(back, "type", GCSPRINTF("%s", virtio->type));
>  flexarray_append_pair(back, "transport", GCSPRINTF("%s", transport));
> +flexarray_append_pair(back, "forced_grant", GCSPRINTF("%u", virtio->forced_grant));
>  
>  flexarray_append_pair(front, "irq", GCSPRINTF("%u", virtio->irq));
>  flexarray_append_pair(front, "base", GCSPRINTF("%#"PRIx64, virtio->base));
>  flexarray_append_pair(front, "type", GCSPRINTF("%s", virtio->type));
>  flexarray_append_pair(front, "transport", GCSPRINTF("%s", transport));
> +flexarray_append_pair(front, "forced_grant", GCSPRINTF("%u", virtio->forced_grant));

This "forced_grant" feels weird to me in the protocol; I feel like
whether or not to use grants could be handled by the backend. For
example, in the "blkif" protocol there are plenty of "feature-*" keys
which allow both front-end and back-end to advertise which features
they can or want to use.
But maybe the fact that the device tree needs to be modified to be able
to accommodate grant mapping means that libxl needs to ask the backend to
use grants or not, and the frontend needs to know if it needs to use
grants.

>  
>  return 0;
>  }
> @@ -104,6 +106,15 @@ static int libxl__virtio_from_xenstore(libxl__gc *gc, const char *libxl_path,
>  }
>  }
>  
> +tmp = NULL;
> +rc = libxl__xs_read_checked(gc, XBT_NULL,
> + GCSPRINTF("%s/forced_grant", be_path), &tmp);
> +if (rc) goto out;
> +
> +if (tmp) {
> +virtio->forced_grant = strtoul(tmp, NULL, 0);
> +}
> +
>  tmp = NULL;
>  rc = libxl__xs_read_checked(gc, XBT_NULL,
>   GCSPRINTF("%s/type", be_path), &tmp);
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index 1f6f47daf4e1..3e34da099785 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -1215,6 +1215,8 @@ static int parse_virtio_config(libxl_device_virtio *virtio, char *token)
>  } else if (MATCH_OPTION("transport", token, oparg)) {
>  rc = libxl_virtio_transport_from_string(oparg, &virtio->transport);
>  if (rc) return rc;
> +} else if (MATCH_OPTION("forced_grant", token, oparg)) {
> +virtio->forced_grant = strtoul(oparg, NULL, 0);

Maybe store only !!strtoul()?
I don't think having values other than 0 or 1 is going to be good.

>  } else {
>  fprintf(stderr, "Unknown string \"%s\" in virtio spec\n", token);
>  return -1;

Thanks,

-- 
Anthony PERARD
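The `!!strtoul()` suggestion collapses any non-zero number to 1, so the stored option can only ever be 0 or 1. A minimal C sketch; `parse_forced_grant()` is a hypothetical helper, not part of xl_parse.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse the "forced_grant" option and collapse any
 * non-zero value to 1, as suggested with !!strtoul(). */
static unsigned int parse_forced_grant(const char *oparg)
{
    assert(oparg != NULL);
    /* Base 0 accepts decimal, octal ("010") and hex ("0x10") input. */
    return !!strtoul(oparg, NULL, 0);
}
```

Note that `strtoul()` still maps unparsable input such as "yes" to 0; rejecting such input would need an end-pointer check, which this sketch leaves out.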

[PATCH] 9pfs/xen: Fix segfault on shutdown

2023-05-02 Thread Jason Andryuk

xen_9pfs_free can't use gnttabdev since it is already closed and NULL-ed
out when free is called.  Do the teardown in _disconnect().  This
matches the setup done in _connect().

trace-events are also added for the XenDevOps functions.

Signed-off-by: Jason Andryuk 
---
 hw/9pfs/trace-events |  5 +
 hw/9pfs/xen-9p-backend.c | 36 +++-
 2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/hw/9pfs/trace-events b/hw/9pfs/trace-events
index 6c77966c0b..7b5b0b5a48 100644
--- a/hw/9pfs/trace-events
+++ b/hw/9pfs/trace-events
@@ -48,3 +48,8 @@ v9fs_readlink(uint16_t tag, uint8_t id, int32_t fid) "tag %d id %d fid %d"
 v9fs_readlink_return(uint16_t tag, uint8_t id, char* target) "tag %d id %d name %s"
 v9fs_setattr(uint16_t tag, uint8_t id, int32_t fid, int32_t valid, int32_t mode, int32_t uid, int32_t gid, int64_t size, int64_t atime_sec, int64_t mtime_sec) "tag %u id %u fid %d iattr={valid %d mode %d uid %d gid %d size %"PRId64" atime=%"PRId64" mtime=%"PRId64" }"
 v9fs_setattr_return(uint16_t tag, uint8_t id) "tag %u id %u"
+
+xen_9pfs_alloc(char *name) "name %s"
+xen_9pfs_connect(char *name) "name %s"
+xen_9pfs_disconnect(char *name) "name %s"
+xen_9pfs_free(char *name) "name %s"
diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 0e266c552b..c646a0b3d1 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -25,6 +25,8 @@
 #include "qemu/iov.h"
 #include "fsdev/qemu-fsdev.h"
 
+#include "trace.h"
+
 #define VERSIONS "1"
 #define MAX_RINGS 8
 #define MAX_RING_ORDER 9
@@ -337,6 +339,8 @@ static void xen_9pfs_disconnect(struct XenLegacyDevice *xendev)
 Xen9pfsDev *xen_9pdev = container_of(xendev, Xen9pfsDev, xendev);
 int i;
 
+trace_xen_9pfs_disconnect(xendev->name);
+
 for (i = 0; i < xen_9pdev->num_rings; i++) {
 if (xen_9pdev->rings[i].evtchndev != NULL) {
 qemu_set_fd_handler(qemu_xen_evtchn_fd(xen_9pdev->rings[i].evtchndev),
@@ -345,40 +349,42 @@ static void xen_9pfs_disconnect(struct XenLegacyDevice *xendev)
xen_9pdev->rings[i].local_port);
 xen_9pdev->rings[i].evtchndev = NULL;
 }
-}
-}
-
-static int xen_9pfs_free(struct XenLegacyDevice *xendev)
-{
-Xen9pfsDev *xen_9pdev = container_of(xendev, Xen9pfsDev, xendev);
-int i;
-
-if (xen_9pdev->rings[0].evtchndev != NULL) {
-xen_9pfs_disconnect(xendev);
-}
-
-for (i = 0; i < xen_9pdev->num_rings; i++) {
 if (xen_9pdev->rings[i].data != NULL) {
 xen_be_unmap_grant_refs(&xen_9pdev->xendev,
 xen_9pdev->rings[i].data,
 xen_9pdev->rings[i].intf->ref,
 (1 << xen_9pdev->rings[i].ring_order));
+xen_9pdev->rings[i].data = NULL;
 }
 if (xen_9pdev->rings[i].intf != NULL) {
 xen_be_unmap_grant_ref(&xen_9pdev->xendev,
xen_9pdev->rings[i].intf,
xen_9pdev->rings[i].ref);
+xen_9pdev->rings[i].intf = NULL;
 }
 if (xen_9pdev->rings[i].bh != NULL) {
 qemu_bh_delete(xen_9pdev->rings[i].bh);
+xen_9pdev->rings[i].bh = NULL;
 }
 }
 
 g_free(xen_9pdev->id);
+xen_9pdev->id = NULL;
 g_free(xen_9pdev->tag);
+xen_9pdev->tag = NULL;
 g_free(xen_9pdev->path);
+xen_9pdev->path = NULL;
 g_free(xen_9pdev->security_model);
+xen_9pdev->security_model = NULL;
 g_free(xen_9pdev->rings);
+xen_9pdev->rings = NULL;
+return;
+}
+
+static int xen_9pfs_free(struct XenLegacyDevice *xendev)
+{
+trace_xen_9pfs_free(xendev->name);
+
 return 0;
 }
 
@@ -390,6 +396,8 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev)
 V9fsState *s = &xen_9pdev->state;
 QemuOpts *fsdev;
 
+trace_xen_9pfs_connect(xendev->name);
+
 if (xenstore_read_fe_int(&xen_9pdev->xendev, "num-rings",
  &xen_9pdev->num_rings) == -1 ||
 xen_9pdev->num_rings > MAX_RINGS || xen_9pdev->num_rings < 1) {
@@ -499,6 +507,8 @@ out:
 
 static void xen_9pfs_alloc(struct XenLegacyDevice *xendev)
 {
+trace_xen_9pfs_alloc(xendev->name);
+
 xenstore_write_be_str(xendev, "versions", VERSIONS);
 xenstore_write_be_int(xendev, "max-rings", MAX_RINGS);
 xenstore_write_be_int(xendev, "max-ring-page-order", MAX_RING_ORDER);
-- 
2.40.1
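Moving the teardown into `_disconnect()` and NULL-ing each pointer after it is released makes the teardown safe to run more than once, and leaves `_free()` with nothing that depends on already-closed handles. A minimal C sketch of that free-then-NULL pattern, using a hypothetical `struct dev_sketch` with plain `malloc()`/`free()` standing in for QEMU's `g_free()`, grant-unmap and bottom-half calls:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced model of the state released in the reworked
 * xen_9pfs_disconnect(). */
struct dev_sketch {
    char *id;
    char *tag;
    int releases;   /* counts real releases, to show nothing is freed twice */
};

static void release_field(struct dev_sketch *d, char **field)
{
    if (*field != NULL) {
        free(*field);
        *field = NULL;   /* a later teardown now finds nothing to release */
        d->releases++;
    }
}

static void disconnect_sketch(struct dev_sketch *d)
{
    assert(d != NULL);
    release_field(d, &d->id);
    release_field(d, &d->tag);
}
```

Calling `disconnect_sketch()` twice releases each field exactly once.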

[xen-unstable-smoke test] 180505: tolerable all pass - PUSHED

2023-05-02 Thread osstest service owner

flight 180505 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180505/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt             15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-xsm              15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-xsm              16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl                  15 migrate-support-check     fail  never pass
 test-armhf-armhf-xl                  16 saverestore-support-check fail  never pass

version targeted for testing:
 xen  b033eddc9779109c06a26936321d27a2ef4e088b
baseline version:
 xen  ef841d2a2377f5297add27e637b725426bb4840a

Last test of basis   180494  2023-05-01 11:03:16 Z    1 days
Testing same since   180505  2023-05-02 11:00:27 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Ayan Kumar Halder 
  Jan Beulich 
  Juergen Gross 

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   ef841d2a23..b033eddc97  b033eddc9779109c06a26936321d27a2ef4e088b -> smoke

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Olaf Hering

Tue, 2 May 2023 15:44:41 +0200 Jan Beulich :

> How would an out-of-tree build help (which for the hypervisor we now
> have support for)? An incremental build there will hit exactly the same
> issue afaict.

Each container target will use a separate output directory. The Leap container 
will only see Leap things, the Tumbleweed container will only see Tumbleweed 
things.

A toolchain update within a container will be no different than it is today. 
But there will be no unexpected jumps anymore.


Olaf

[xen-unstable test] 180501: tolerable FAIL

2023-05-02 Thread osstest service owner

flight 180501 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/180501/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 180496 pass in 180501
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 20 guest-start/debianhvm.repeat fail in 180496 pass in 180501
 test-amd64-amd64-xl-qcow2 21 guest-start/debian.repeat fail in 180496 pass in 180501
 test-amd64-i386-migrupgrade  10 xen-install/src_host   fail pass in 180496

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                fail  like 180496
 test-amd64-i386-xl-qemuu-win7-amd64  19 guest-stop                fail  like 180496
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                fail  like 180496
 test-amd64-amd64-qemuu-nested-amd    20 debian-hvm-install/l1/l2  fail  like 180496
 test-amd64-i386-xl-qemut-ws16-amd64  19 guest-stop                fail  like 180496
 test-amd64-i386-xl-qemut-win7-amd64  19 guest-stop                fail  like 180496
 test-armhf-armhf-libvirt-raw         15 saverestore-support-check fail  like 180496
 test-armhf-armhf-libvirt             16 saverestore-support-check fail  like 180496
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                fail  like 180496
 test-armhf-armhf-libvirt-qcow2       15 saverestore-support-check fail  like 180496
 test-amd64-i386-xl-qemuu-ws16-amd64  19 guest-stop                fail  like 180496
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                fail  like 180496
 test-amd64-amd64-libvirt             15 migrate-support-check     fail  never pass
 test-amd64-i386-xl-pvshim            14 guest-start               fail  never pass
 test-amd64-i386-libvirt-xsm          15 migrate-support-check     fail  never pass
 test-amd64-i386-libvirt              15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-thunderx         15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-thunderx         16 saverestore-support-check fail  never pass
 test-arm64-arm64-xl                  15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl                  16 saverestore-support-check fail  never pass
 test-arm64-arm64-xl-credit1          15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-credit1          16 saverestore-support-check fail  never pass
 test-arm64-arm64-xl-xsm              15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-xsm              16 saverestore-support-check fail  never pass
 test-arm64-arm64-xl-credit2          15 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-credit2          16 saverestore-support-check fail  never pass
 test-arm64-arm64-libvirt-xsm         15 migrate-support-check     fail  never pass
 test-arm64-arm64-libvirt-xsm         16 saverestore-support-check fail  never pass
 test-amd64-amd64-libvirt-xsm         15 migrate-support-check     fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail  never pass
 test-amd64-i386-libvirt-raw          14 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-credit2          15 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-credit2          16 saverestore-support-check fail  never pass
 test-amd64-amd64-libvirt-vhd         14 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-multivcpu        15 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-multivcpu        16 saverestore-support-check fail  never pass
 test-arm64-arm64-libvirt-raw         14 migrate-support-check     fail  never pass
 test-arm64-arm64-libvirt-raw         15 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-credit1          15 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-credit1          16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl                  15 migrate-support-check     fail  never pass
 test-armhf-armhf-xl                  16 saverestore-support-check fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail  never pass
 test-arm64-arm64-xl-vhd              14 migrate-support-check     fail  never pass
 test-arm64-arm64-xl-vhd              15 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-rtds             15 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-rtds             16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-vhd              14 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-vhd              15 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-arndale          15 migrate-support-check     fail  never pass
 test-armhf-armhf-xl-arndale          16 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt-raw         14 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt             15 migrate-support-check     fail  never pass
 test-armhf-armhf-libvirt-qcow2       14 migrate-support-check     fail  never pass

version targeted for testing:
 xen

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Jan Beulich

On 02.05.2023 15:36, Olaf Hering wrote:
> Tue, 2 May 2023 15:29:19 +0200 Jan Beulich :
> 
>> Getting this to work automatically is a continued subject of discussion.
> 
> I think the only real solution is an out-of-tree build. Essentially every 
> single component needs to detect a toolchain change. This is unrealistic.

How would an out-of-tree build help (which for the hypervisor we now
have support for)? An incremental build there will hit exactly the same
issue afaict.

Jan

Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Daniel P. Smith


On 5/2/23 09:30, Juergen Gross wrote:

On 02.05.23 15:23, Daniel P. Smith wrote:

On 5/2/23 09:13, Juergen Gross wrote:

On 02.05.23 15:03, Daniel P. Smith wrote:

On 4/30/23 10:46, Juergen Gross wrote:

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross 
---
  xen/common/sysctl.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 02505ab044..0cbfe8bd44 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
  if ( num_domains == op->u.getdomaininfolist.max_domains )
  break;
-    ret = xsm_getdomaininfo(XSM_HOOK, d);
-    if ( ret )
+    if ( xsm_getdomaininfo(XSM_HOOK, d) )
  continue;
  getdomaininfo(d, &info);



This change does not match the commit message. This says it fixes an 
issue, but unless I am totally missing something, this change is 
nothing more than formatting that drops the use of an intermediate 
variable. Please feel free to correct me if I am wrong here, 
otherwise I believe the commit message should be changed to reflect 
the code change.


You are missing the fact that ret getting set by a failing xsm_getdomaininfo()
call might result in the ret value being propagated to the sysctl caller. And
this should not happen. So the fix is to NOT modify ret here.


You are correct, my apologies for that.


No need to apologize. :-)


I believe it is proper to admit when you are wrong.

Second, as far as the problem description goes. The *only* time the 
call to xsm_getdomaininfo() at this location will return anything 
other than 0, is when FLASK is being used and a domain whose type is 
not allowed getdomaininfo is making the call. XSM_HOOK signals a 
no-op check for the default/dummy policy, and the SILO policy does 
not override the default/dummy policy for this check.


Your statement sounds as if xsm_getdomaininfo() would always return the same
value for a given caller domain. Isn't that return value also depending on the
domain specified via the second parameter? In case it isn't, why does that
parameter even exist?


It would if the default action was something other than XSM_HOOK. Look
at line 82 of include/xsm/dummy.h. XSM_HOOK will always return 0
regardless of the src or dest domains. The function
xsm_default_action() is the policy for both default/dummy and SILO,
with the exception of the evtchn, grant, and argo checks for SILO.


Ah, okay. I didn't analyze all of the involved xsm code.


No worries! I am always willing to help in any way that I can. While I 
don't have the bandwidth to be proactive and keep up with everything on 
xen-devel, please do not hesitate to ask me or ping me on anything XSM 
related. I will gladly take a look and provide what insights I might 
have on your query.


v/r,
dps
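Juergen's point is that `ret` is reassigned on every loop iteration, so only a denial on the *last* scanned domain survives the loop and is returned to the caller. A minimal C model of the loop before and after the fix follows. `check_perm()` is a hypothetical stand-in for `xsm_getdomaininfo()`, and the `perm` array (non-zero meaning denied) replaces the real domain list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical permission check: non-zero perm[i] means domain i is denied. */
static int check_perm(const int *perm, size_t i)
{
    return perm[i] ? -EPERM : 0;
}

/* Pre-fix shape: the check result is stored in ret, so a denial of the
 * last scanned domain leaks out as the return value. */
static long list_domains_buggy(const int *perm, size_t n, size_t max,
                               size_t *num_out)
{
    long ret = 0;
    size_t num = 0;

    for (size_t i = 0; i < n; i++) {
        if (num == max)
            break;
        ret = check_perm(perm, i);
        if (ret)
            continue;
        num++;
    }
    *num_out = num;
    return ret;
}

/* Fixed shape: denied domains are skipped without touching ret. */
static long list_domains_fixed(const int *perm, size_t n, size_t max,
                               size_t *num_out)
{
    long ret = 0;
    size_t num = 0;

    for (size_t i = 0; i < n; i++) {
        if (num == max)
            break;
        if (check_perm(perm, i))
            continue;
        num++;
    }
    *num_out = num;
    return ret;
}
```

With `perm = {0, 0, 1}`, the buggy version reports two domains yet returns `-EPERM`, while the fixed version returns 0.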

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Andrew Cooper

On 02/05/2023 1:04 pm, Olaf Hering wrote:
> Tue, 2 May 2023 13:33:13 +0200 Olaf Hering :
>
>> I will investigate why it failed to build for me.
> This happens if one builds first with the Tumbleweed container, and later 
> with the Leap container, without a 'git clean -dffx' in between.
>
> Is there a way to invalidate everything if the toolchain changes?

I thought we had a fix for this.  But it turns out it's still on the list.

https://lore.kernel.org/xen-devel/20230320152836.43205-1-anthony.per...@citrix.com/

Does this improve things for you?

~Andrew

Re: [PATCH v3 2/2] acpi: Add TPM2 interface definition.

2023-05-02 Thread Jan Beulich

On 25.04.2023 19:47, Jennifer Herbert wrote:
> --- a/tools/libacpi/acpi2_0.h
> +++ b/tools/libacpi/acpi2_0.h
> @@ -121,6 +121,36 @@ struct acpi_20_tcpa {
>  };
>  #define ACPI_2_0_TCPA_LAML_SIZE (64*1024)
>  
> +/*
> + * TPM2
> + */

Nit: While I'm willing to accept the comment style violation here as
(apparently) intentional, ...

> +struct acpi_20_tpm2 {
> +struct acpi_header header;
> +uint16_t platform_class;
> +uint16_t reserved;
> +uint64_t control_area_address;
> +uint32_t start_method;
> +uint8_t start_method_params[12];
> +uint32_t log_area_minimum_length;
> +uint64_t log_area_start_address;
> +};
> +#define TPM2_ACPI_CLASS_CLIENT  0
> +#define TPM2_START_METHOD_CRB   7
> +
> +/* TPM register I/O Mapped region, location of which defined in the
> + * TCG PC Client Platform TPM Profile Specification for TPM 2.0.
> + * See table 9 - Only Locality 0 is used here. This is emulated by QEMU.
> + * Definition of Register space is found in table 12.
> + */

... this comment wants adjusting to hypervisor style (/* on its own line),
as that looks to be the aimed-at style in this file.

> @@ -352,6 +353,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>  struct acpi_20_tcpa *tcpa;
>  unsigned char *ssdt;
>  void *lasa;
> +struct acpi_20_tpm2 *tpm2;

Could I talk you into moving this up by two lines, such that it'll be
adjacent to "tcpa"?

> @@ -450,6 +452,43 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>   tcpa->header.length);
>  }
>  break;
> +
> +case 2:
> +/* Check VID stored in bits 37:32 (3rd 16 bit word) of CRB
> + * identifier register.  See table 16 of TCG PC client platform
> + * TPM profile specification for TPM 2.0.
> + */

Nit: This comment again wants a style adjustment.

> --- /dev/null
> +++ b/tools/libacpi/ssdt_tpm2.asl
> @@ -0,0 +1,36 @@
> +/*
> + * ssdt_tpm2.asl
> + *
> + * Copyright (c) 2018-2022, Citrix Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU Lesser General Public License as published
> + * by the Free Software Foundation; version 2.1 only. with the special
> + * exception on linking described in file LICENSE.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + */

While the full conversion to SPDX was done in the hypervisor only so far,
I think new tool stack source files would better use the much shorter
SPDX equivalent, too.

Then on top of Jason's R-b,
Acked-by: Jan Beulich 

Jan
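Because ACPI consumers parse the TPM2 table by byte offset, the field order in `struct acpi_20_tpm2` must yield fixed offsets with no compiler padding. A C sketch that checks this at build time follows. `struct acpi_header_sketch` is a hypothetical 36-byte stand-in for libacpi's `struct acpi_header`. The sketch assumes the real definitions are byte-packed as well, and it assumes the TCG layout with a 12-byte start-method parameter area (log area minimum length at offset 64, log area start address at 68).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 36-byte stand-in for the common ACPI table header. */
struct acpi_header_sketch {
    char signature[4];
    uint32_t length;
    uint8_t revision;
    uint8_t checksum;
    char oem_id[6];
    char oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
} __attribute__((packed));

/* Same field order as struct acpi_20_tpm2 in the patch. */
struct acpi_20_tpm2_sketch {
    struct acpi_header_sketch header;
    uint16_t platform_class;
    uint16_t reserved;
    uint64_t control_area_address;
    uint32_t start_method;
    uint8_t start_method_params[12];
    uint32_t log_area_minimum_length;
    uint64_t log_area_start_address;
} __attribute__((packed));

static_assert(sizeof(struct acpi_header_sketch) == 36, "header size");
static_assert(offsetof(struct acpi_20_tpm2_sketch, control_area_address) == 40,
              "control area offset");
static_assert(offsetof(struct acpi_20_tpm2_sketch, start_method) == 48,
              "start method offset");
static_assert(offsetof(struct acpi_20_tpm2_sketch, log_area_minimum_length) == 64,
              "LAML offset");
static_assert(offsetof(struct acpi_20_tpm2_sketch, log_area_start_address) == 68,
              "LASA offset");
static_assert(sizeof(struct acpi_20_tpm2_sketch) == 76, "table size");
```

Without packing, an ABI that aligns `uint64_t` to 8 bytes would likely pad before `log_area_start_address`, and the last two assertions would then fail.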

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Olaf Hering

Tue, 2 May 2023 15:29:19 +0200 Jan Beulich :

> Getting this to work automatically is a continued subject of discussion.

I think the only real solution is an out-of-tree build. Essentially every 
single component needs to detect a toolchain change. This is unrealistic.


Olaf

Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Juergen Gross


On 02.05.23 15:23, Daniel P. Smith wrote:

On 5/2/23 09:13, Juergen Gross wrote:

On 02.05.23 15:03, Daniel P. Smith wrote:

On 4/30/23 10:46, Juergen Gross wrote:

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross 
---
  xen/common/sysctl.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 02505ab044..0cbfe8bd44 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
  if ( num_domains == op->u.getdomaininfolist.max_domains )
  break;
-    ret = xsm_getdomaininfo(XSM_HOOK, d);
-    if ( ret )
+    if ( xsm_getdomaininfo(XSM_HOOK, d) )
  continue;
  getdomaininfo(d, &info);



This change does not match the commit message. This says it fixes an issue, 
but unless I am totally missing something, this change is nothing more than 
formatting that drops the use of an intermediate variable. Please feel free 
to correct me if I am wrong here, otherwise I believe the commit message 
should be changed to reflect the code change.


You are missing the fact that ret getting set by a failing xsm_getdomaininfo()
call might result in the ret value being propagated to the sysctl caller. And
this should not happen. So the fix is to NOT modify ret here.


You are correct, my apologies for that.


No need to apologize. :-)

Second, as far as the problem description goes. The *only* time the call to 
xsm_getdomaininfo() at this location will return anything other than 0, is 
when FLASK is being used and a domain whose type is not allowed getdomaininfo 
is making the call. XSM_HOOK signals a no-op check for the default/dummy 
policy, and the SILO policy does not override the default/dummy policy for 
this check.


Your statement sounds as if xsm_getdomaininfo() would always return the same
value for a given caller domain. Isn't that return value also depending on the
domain specified via the second parameter? In case it isn't, why does that
parameter even exist?


It would if the default action was something other than XSM_HOOK. Look at line
82 of include/xsm/dummy.h. XSM_HOOK will always return 0 regardless of the src
or dest domains. The function xsm_default_action() is the policy for both
default/dummy and SILO, with the exception of the evtchn, grant, and argo checks
for SILO.


Ah, okay. I didn't analyze all of the involved xsm code.


Juergen

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Jan Beulich

On 02.05.2023 14:04, Olaf Hering wrote:
> Tue, 2 May 2023 13:33:13 +0200 Olaf Hering :
> 
>> I will investigate why it failed to build for me.
> 
> This happens if one builds first with the Tumbleweed container, and later 
> with the Leap container, without a 'git clean -dffx' in between.
> 
> Is there a way to invalidate everything if the toolchain changes?

Getting this to work automatically is a continued subject of discussion.
Touching xen/.config before starting the build ought to work, though.

Jan

Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Daniel P. Smith


On 5/2/23 09:23, Daniel P. Smith wrote:

On 5/2/23 09:13, Juergen Gross wrote:

On 02.05.23 15:03, Daniel P. Smith wrote:

On 4/30/23 10:46, Juergen Gross wrote:

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross 
---
  xen/common/sysctl.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 02505ab044..0cbfe8bd44 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
  if ( num_domains == op->u.getdomaininfolist.max_domains )
  break;
-    ret = xsm_getdomaininfo(XSM_HOOK, d);
-    if ( ret )
+    if ( xsm_getdomaininfo(XSM_HOOK, d) )
  continue;
  getdomaininfo(d, &info);



This change does not match the commit message. This says it fixes an 
issue, but unless I am totally missing something, this change is 
nothing more than formatting that drops the use of an intermediate 
variable. Please feel free to correct me if I am wrong here, 
otherwise I believe the commit message should be changed to reflect 
the code change.


You are missing the fact that ret getting set by a failing xsm_getdomaininfo()
call might result in the ret value being propagated to the sysctl caller. And
this should not happen. So the fix is to NOT modify ret here.


You are correct, my apologies for that.

Second, as far as the problem description goes. The *only* time the 
call to xsm_getdomaininfo() at this location will return anything 
other than 0, is when FLASK is being used and a domain whose type is 
not allowed getdomaininfo is making the call. XSM_HOOK signals a 
no-op check for the default/dummy policy, and the SILO policy does 
not override the default/dummy policy for this check.


Your statement sounds as if xsm_getdomaininfo() would always return the same
value for a given caller domain. Isn't that return value also depending on the
domain specified via the second parameter? In case it isn't, why does that
parameter even exist?


It would if the default action was something other than XSM_HOOK. Look
at line 82 of include/xsm/dummy.h. XSM_HOOK will always return 0
regardless of the src or dest domains. The function xsm_default_action()
is the policy for both default/dummy and SILO, with the exception of the
evtchn, grant, and argo checks for SILO.


Sorry, one last clarification. xsm_default_action() is also what is used 
when XSM=n. The difference is that for XSM=n, xsm_default_action() is 
in-lined at the call site whereas with XSM=y and not using FLASK results 
in a function call xsm_default_action().


v/r,
dps

Re: [PATCH 1/2] x86/head: check base address alignment

2023-05-02 Thread Jan Beulich

On 02.05.2023 15:02, Roger Pau Monné wrote:
> On Tue, May 02, 2023 at 01:11:12PM +0200, Jan Beulich wrote:
>> On 02.05.2023 13:05, Jan Beulich wrote:
>>> On 02.05.2023 12:51, Roger Pau Monné wrote:
>>>> On Tue, May 02, 2023 at 12:28:55PM +0200, Jan Beulich wrote:
>>>>> On 02.05.2023 11:54, Andrew Cooper wrote:
>>>>>> On 02/05/2023 10:22 am, Roger Pau Monne wrote:
>>>>>>> @@ -670,6 +674,11 @@ trampoline_setup:
>>>>>>>  cmp %edi, %eax
>>>>>>>  jb  1b
>>>>>>>  
>>>>>>> +/* Check that the image base is aligned. */
>>>>>>> +lea sym_esi(_start), %eax
>>>>>>> +and $(1 << L2_PAGETABLE_SHIFT) - 1, %eax
>>>>>>> +jnz not_aligned
>>>>>>
>>>>>> You just want to check the value in %esi, which is the base of the Xen
>>>>>> image.  Something like:
>>>>>>
>>>>>> mov %esi, %eax
>>>>>> and ...
>>>>>> jnz
>>>>>
>>>>> Or yet more simply "test $..., %esi" and then "jnz ..."?
>>>>
>>>> As replied to Andrew, I would rather keep this inline with the address
>>>> used to build the PDE, which is sym_esi(_start).
>>>
>>> Well, I won't insist, and you've got Andrew's R-b already.
>>
>> Actually, one more remark here: While using sym_esi() is more in line
>> with the actual consumer of the data, the check triggering because of
>> the transformation yielding a misaligned value (in turn because of a
>> bug elsewhere) would yield a misleading error message: We might well
>> have been loaded at a 2Mb-aligned boundary, and instead it's the internal
>> logic which would then have been wrong. (I'm sorry, now you'll get to
>> judge whether keeping the check in line with other code or with the
>> diagnostic is going to be better. Or split things into a build-time
>> and a runtime check, as previously suggested.)
> 
> What about adding a build time check that XEN_VIRT_START is 2MB
> aligned, and then just switching to test instead of and, would that be
> acceptable?

Hmm, yes, why not. (Except I would still express it as sym_offs(0)
rather than a direct use of XEN_VIRT_START, once again to better
match surrounding code.)

Jan

> I know that using sym_esi(_start) instead of just esi won't change the
> result of the check if XEN_VIRT_START is aligned, but I would prefer
> to keep the usage of sym_esi(_start) for consistency with the value
> used to build the page tables, as I think it's clearer.
> 
> Thanks, Roger.

Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Daniel P. Smith


On 5/2/23 09:13, Juergen Gross wrote:

On 02.05.23 15:03, Daniel P. Smith wrote:

On 4/30/23 10:46, Juergen Gross wrote:

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross 
---
  xen/common/sysctl.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 02505ab044..0cbfe8bd44 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)

  if ( num_domains == op->u.getdomaininfolist.max_domains )
  break;
-    ret = xsm_getdomaininfo(XSM_HOOK, d);
-    if ( ret )
+    if ( xsm_getdomaininfo(XSM_HOOK, d) )
  continue;
  getdomaininfo(d, &info);



This change does not match the commit message. This says it fixes an 
issue, but unless I am totally missing something, this change is 
nothing more than formatting that drops the use of an intermediate 
variable. Please feel free to correct me if I am wrong here, otherwise 
I believe the commit message should be changed to reflect the code 
change.


You are missing the fact that ret getting set by a failing xsm_getdomaininfo()
call might result in the ret value being propagated to the sysctl caller. And
this should not happen. So the fix is to NOT modify ret here.


You are correct, my apologies for that.

Second, as far as the problem description goes: the *only* time the
call to xsm_getdomaininfo() at this location will return anything 
other than 0, is when FLASK is being used and a domain whose type is 
not allowed getdomaininfo is making the call. XSM_HOOK signals a no-op 
check for the default/dummy policy, and the SILO policy does not 
override the default/dummy policy for this check.


Your statement sounds as if xsm_getdomaininfo() would always return the same
value for a given caller domain. Isn't that return value also depending on the
domain specified via the second parameter? In case it isn't, why does that
parameter even exist?


It would if the default action was something other than XSM_HOOK. Look 
at line 82 of include/xsm/dummy.h. XSM_HOOK will always return 0 
regardless of the src or dest domains. The function xsm_default_action()
is the policy for both default/dummy and SILO, with the exception of the
evtchn, grant, and Argo checks for SILO.


v/r,
DPS

Re: [PATCH] sysctl: XSM hook should not cause XEN_SYSCTL_getdomaininfolist to (appear to) fail

2023-05-02 Thread Jan Beulich

On 02.05.2023 15:13, Daniel P. Smith wrote:
> On 5/2/23 07:00, Roger Pau Monné wrote:
>> On Tue, May 02, 2023 at 06:43:33AM -0400, Daniel P. Smith wrote:
>>> On 5/2/23 03:17, Jan Beulich wrote:
>>>> Unlike for XEN_DOMCTL_getdomaininfo, where the XSM check is intended to
>>>> cause the operation to fail, in the loop here it ought to merely
>>>> determine whether information for the domain at hand may be reported
>>>> back. Therefore if on the last iteration the hook results in denial,
>>>> this should not affect the sub-op's return value.
>>>>
>>>> Fixes: d046f361dc93 ("Xen Security Modules: XSM")
>>>> Signed-off-by: Jan Beulich
>>>> ---
>>>> The hook being able to deny access to data for certain domains means
>>>> that no caller can assume to have a system-wide picture when holding the
>>>> results.
>>>>
>>>> Wouldn't it make sense to permit the function to merely "count" domains?
>>>> While racy in general (including in its present, "normal" mode of
>>>> operation), within a tool stack this could be used as long as creation
>>>> of new domains is suppressed between obtaining the count and then using
>>>> it.
>>>>
>>>> In XEN_DOMCTL_getpageframeinfo2 said commit had introduced a 2nd such
>>>> issue, but luckily that sub-op and xsm_getpageframeinfo() are long gone.
>>>>
>>>
>>> I understand there is a larger issue at play here but neutering the security
>>> control/XSM check is not the answer. This literally changes the way a FLASK
>>> policy that people currently have would be enforced, as well as contrary to
>>> how they understand the access control that it provides. Even though the
>>> code path does not fall under XSM maintainer, I would NACK this patch. IMHO,
>>> it is better to find a solution that does not abuse, misuse, or invalidate
>>> the purpose of the XSM calls.
>>>
>>> On a side note, I am a little concerned that only one person thought to
>>> include the XSM maintainer, or any of the XSM reviewers, onto a patch and
>>> the discussion around a patch that clearly relates to XSM for us to gauge
>>> the consequences of the patch. I am not assuming intentions here, only
>>> wanting to raise the concern.
>>>
>>> So for what it is worth, NACK.
>>
>> I assume the NACK is to the remarks after the '---'?
>>
>> The patch itself doesn't change the enforcement of the XSM checks,
>> just prevents returning an error when the information from the last
>> domain in the loop cannot be fetched.
>>
>> Am I missing something?
> 
> Actually, I should have finished my first cup of tea and looked closer 
> at the patch in the larger context instead of the description, as the 
> two do not align. You are correct, and provided I am not wrong here, the 
> change is a no-op formatting change that removes an intermediate 
> variable. I do not see how directly checking the return in an if differs
> from checking the return stored in a variable. Additionally, the claim is
> that this occurs when XSM is enabled, which is also incorrect. The only 
> difference at this location in code between not having XSM enabled and 
> having it enabled is that for the latter, xsm_getdomaininfo() is an 
> in-lined version versus a function call. In either case, both will 
> return 0 unless you are using FLASK and have a policy blocking the 
> domain from making the call.

While perhaps imprecise, "XSM enabled" typically is taken for "Flask
is in use". Then again, looking back, neither title nor description
say "XSM enabled". And it truly was the XSM hook which might have
caused the sub-op to wrongly be reported as failed (given, as you say,
a policy is in place which actually can cause failure from that hook).

Jan

Re: [PATCH v4 04/20] virtio-scsi: stop using aio_disable_external() during unplug

2023-05-02 Thread Kevin Wolf

Am 01.05.2023 um 17:09 hat Stefan Hajnoczi geschrieben:
> On Fri, Apr 28, 2023 at 04:22:55PM +0200, Kevin Wolf wrote:
> > Am 25.04.2023 um 19:27 hat Stefan Hajnoczi geschrieben:
> > > This patch is part of an effort to remove the aio_disable_external()
> > > API because it does not fit in a multi-queue block layer world where
> > > many AioContexts may be submitting requests to the same disk.
> > > 
> > > The SCSI emulation code is already in good shape to stop using
> > > aio_disable_external(). It was only used by commit 9c5aad84da1c
> > > ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
> > > disk") to ensure that virtio_scsi_hotunplug() works while the guest
> > > driver is submitting I/O.
> > > 
> > > Ensure virtio_scsi_hotunplug() is safe as follows:
> > > 
> > > 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
> > >device_set_realized() calls qatomic_set(&dev->realized, false) so
> > >that future scsi_device_get() calls return NULL because they exclude
> > >SCSIDevices with realized=false.
> > > 
> > >That means virtio-scsi will reject new I/O requests to this
> > >SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
> > >virtio_scsi_hotunplug() is still executing. We are protected against
> > >new requests!
> > > 
> > > 2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
> > >that in-flight requests are cancelled synchronously. This ensures
> > >that no in-flight requests remain once qdev_simple_device_unplug_cb()
> > >returns.
> > > 
> > > Thanks to these two conditions we don't need aio_disable_external()
> > > anymore.
> > > 
> > > Cc: Zhengui Li 
> > > Reviewed-by: Paolo Bonzini 
> > > Reviewed-by: Daniil Tatianin 
> > > Signed-off-by: Stefan Hajnoczi 
> > 
> > qemu-iotests 040 starts failing for me after this patch, with what looks
> > like a use-after-free error of some kind.
> > 
> > (gdb) bt
> > #0  0x55b6e3e1f31c in job_type (job=0xe3e3e3e3e3e3e3e3) at ../job.c:238
> > #1  0x55b6e3e1cee5 in is_block_job (job=0xe3e3e3e3e3e3e3e3) at ../blockjob.c:41
> > #2  0x55b6e3e1ce7d in block_job_next_locked (bjob=0x55b6e72b7570) at ../blockjob.c:54
> > #3  0x55b6e3df6370 in blockdev_mark_auto_del (blk=0x55b6e74af0a0) at ../blockdev.c:157
> > #4  0x55b6e393e23b in scsi_qdev_unrealize (qdev=0x55b6e7c04d40) at ../hw/scsi/scsi-bus.c:303
> > #5  0x55b6e3db0d0e in device_set_realized (obj=0x55b6e7c04d40, value=false, errp=0x55b6e497c918 ) at ../hw/core/qdev.c:599
> > #6  0x55b6e3dba36e in property_set_bool (obj=0x55b6e7c04d40, v=0x55b6e7d7f290, name=0x55b6e41bd6d8 "realized", opaque=0x55b6e7246d20, errp=0x55b6e497c918 )
> >     at ../qom/object.c:2285
> > #7  0x55b6e3db7e65 in object_property_set (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", v=0x55b6e7d7f290, errp=0x55b6e497c918 ) at ../qom/object.c:1420
> > #8  0x55b6e3dbd84a in object_property_set_qobject (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=0x55b6e74c1890, errp=0x55b6e497c918 )
> >     at ../qom/qom-qobject.c:28
> > #9  0x55b6e3db8570 in object_property_set_bool (obj=0x55b6e7c04d40, name=0x55b6e41bd6d8 "realized", value=false, errp=0x55b6e497c918 ) at ../qom/object.c:1489
> > #10 0x55b6e3daf2b5 in qdev_unrealize (dev=0x55b6e7c04d40) at ../hw/core/qdev.c:306
> > #11 0x55b6e3db509d in qdev_simple_device_unplug_cb (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/qdev-hotplug.c:72
> > #12 0x55b6e3c520f9 in virtio_scsi_hotunplug (hotplug_dev=0x55b6e81c3630, dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/scsi/virtio-scsi.c:1065
> > #13 0x55b6e3db4dec in hotplug_handler_unplug (plug_handler=0x55b6e81c3630, plugged_dev=0x55b6e7c04d40, errp=0x7ffec5519200) at ../hw/core/hotplug.c:56
> > #14 0x55b6e3a28f84 in qdev_unplug (dev=0x55b6e7c04d40, errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:935
> > #15 0x55b6e3a290fa in qmp_device_del (id=0x55b6e74c1760 "scsi0", errp=0x7ffec55192e0) at ../softmmu/qdev-monitor.c:955
> > #16 0x55b6e3fb0a5f in qmp_marshal_device_del (args=0x7f61cc005eb0, ret=0x7f61d5a8ae38, errp=0x7f61d5a8ae40) at qapi/qapi-commands-qdev.c:114
> > #17 0x55b6e3fd52e1 in do_qmp_dispatch_bh (opaque=0x7f61d5a8ae08) at ../qapi/qmp-dispatch.c:128
> > #18 0x55b6e4007b9e in aio_bh_call (bh=0x55b6e7dea730) at ../util/async.c:155
> > #19 0x55b6e4007d2e in aio_bh_poll (ctx=0x55b6e72447c0) at ../util/async.c:184
> > #20 0x55b6e3fe3b45 in aio_dispatch (ctx=0x55b6e72447c0) at ../util/aio-posix.c:421
> > #21 0x55b6e4009544 in aio_ctx_dispatch (source=0x55b6e72447c0, callback=0x0, user_data=0x0) at ../util/async.c:326
> > #22 0x7f61ddc14c7f in g_main_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:3454
> > #23 g_main_context_dispatch (context=0x55b6e7244b20) at ../glib/gmain.c:4

Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Daniel P. Smith


On 5/2/23 09:10, Roger Pau Monné wrote:

On Tue, May 02, 2023 at 09:03:00AM -0400, Daniel P. Smith wrote:

On 4/30/23 10:46, Juergen Gross wrote:

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross 
---
   xen/common/sysctl.c | 3 +--
   1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 02505ab044..0cbfe8bd44 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
   if ( num_domains == op->u.getdomaininfolist.max_domains )
   break;
-ret = xsm_getdomaininfo(XSM_HOOK, d);
-if ( ret )
+if ( xsm_getdomaininfo(XSM_HOOK, d) )
   continue;
   getdomaininfo(d, &info);



This change does not match the commit message. This says it fixes an issue,
but unless I am totally missing something, this change is nothing more than
formatting that drops the use of an intermediate variable. Please feel free
to correct me if I am wrong here, otherwise I believe the commit message
should be changed to reflect the code change.


By dropping that intermediate variable it prevents returning an error
as the result of the hypercall if xsm_getdomaininfo() for the last
domain fails.


Ah, understood. I missed that ret is tracking state.


Note that xsm_getdomaininfo() failing for domains other than the last
one doesn't cause the return value of the hypercall to be an error
code, because the variable containing the error gets overwritten by
further loop iterations.


In the end, this is just addressing an issue that has not been seen by 
anyone and happened upon while debugging another issue.


V/r,
DPS

Re: [PATCH] sysctl: XSM hook should not cause XEN_SYSCTL_getdomaininfolist to (appear to) fail

2023-05-02 Thread Jan Beulich

On 02.05.2023 14:54, Daniel P. Smith wrote:
> On 5/2/23 06:59, Jan Beulich wrote:
>> On 02.05.2023 12:43, Daniel P. Smith wrote:
>>> On 5/2/23 03:17, Jan Beulich wrote:
>>>> Unlike for XEN_DOMCTL_getdomaininfo, where the XSM check is intended to
>>>> cause the operation to fail, in the loop here it ought to merely
>>>> determine whether information for the domain at hand may be reported
>>>> back. Therefore if on the last iteration the hook results in denial,
>>>> this should not affect the sub-op's return value.
>>>>
>>>> Fixes: d046f361dc93 ("Xen Security Modules: XSM")
>>>> Signed-off-by: Jan Beulich
>>>> ---
>>>> The hook being able to deny access to data for certain domains means
>>>> that no caller can assume to have a system-wide picture when holding the
>>>> results.
>>>>
>>>> Wouldn't it make sense to permit the function to merely "count" domains?
>>>> While racy in general (including in its present, "normal" mode of
>>>> operation), within a tool stack this could be used as long as creation
>>>> of new domains is suppressed between obtaining the count and then using
>>>> it.
>>>>
>>>> In XEN_DOMCTL_getpageframeinfo2 said commit had introduced a 2nd such
>>>> issue, but luckily that sub-op and xsm_getpageframeinfo() are long gone.
>>>>
>>>
>>> I understand there is a larger issue at play here but neutering the
>>> security control/XSM check is not the answer. This literally changes the
>>> way a FLASK policy that people currently have would be enforced, as well
>>> as contrary to how they understand the access control that it provides.
>>> Even though the code path does not fall under XSM maintainer, I would
>>> NACK this patch. IMHO, it is better to find a solution that does not
>>> abuse, misuse, or invalidate the purpose of the XSM calls.
>>>
>>> On a side note, I am a little concerned that only one person thought to
>>> include the XSM maintainer, or any of the XSM reviewers, onto a patch
>>> and the discussion around a patch that clearly relates to XSM for us to
>>> gauge the consequences of the patch. I am not assuming intentions here,
>>> only wanting to raise the concern.
>>
>> Well, yes, for the discussion items I could have remembered to include
>> you. The code change itself, otoh, doesn't require your ack, even if it
>> is the return value of an XSM function which was used wrongly here.
> 
> I beg to disagree, not that you could have, but that you should have. 
> This is now the second XSM issue, that I am aware of at least, that the
> XSM reviewers and I have been left out of. How and where the XSM hooks
> are deployed is critical to how XSM functions, regardless of how mundane
> the change may be. By your logic, as the XSM maintainer I could make
> changes to the XSM code that change how the system behaves for x86 and
> claim you have no Ack/Nack authority since it is XSM code. These
> subsystems are symbiotic, and we owe each other the due respect of
> including one another when these systems touch or influence each other.

No, that's not a proper representation of "my logic". Everyone can comment
on any patch, and pending objections will prevent it from going in. Still
not everyone needs to be Cc-ed on every patch. If you want to get to see
ones you're not Cc-ed on, you'll need to be subscribed to the list, to
look at (and perhaps comment on) all the ones of interest to you.

Jan

Re: [PATCH] sysctl: XSM hook should not cause XEN_SYSCTL_getdomaininfolist to (appear to) fail

2023-05-02 Thread Daniel P. Smith


On 5/2/23 07:00, Roger Pau Monné wrote:

On Tue, May 02, 2023 at 06:43:33AM -0400, Daniel P. Smith wrote:



On 5/2/23 03:17, Jan Beulich wrote:

Unlike for XEN_DOMCTL_getdomaininfo, where the XSM check is intended to
cause the operation to fail, in the loop here it ought to merely
determine whether information for the domain at hand may be reported
back. Therefore if on the last iteration the hook results in denial,
this should not affect the sub-op's return value.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Jan Beulich 
---
The hook being able to deny access to data for certain domains means
that no caller can assume to have a system-wide picture when holding the
results.

Wouldn't it make sense to permit the function to merely "count" domains?
While racy in general (including in its present, "normal" mode of
operation), within a tool stack this could be used as long as creation
of new domains is suppressed between obtaining the count and then using
it.

In XEN_DOMCTL_getpageframeinfo2 said commit had introduced a 2nd such
issue, but luckily that sub-op and xsm_getpageframeinfo() are long gone.



I understand there is a larger issue at play here but neutering the security
control/XSM check is not the answer. This literally changes the way a FLASK
policy that people currently have would be enforced, as well as contrary to
how they understand the access control that it provides. Even though the
code path does not fall under XSM maintainer, I would NACK this patch. IMHO,
it is better to find a solution that does not abuse, misuse, or invalidate
the purpose of the XSM calls.

On a side note, I am a little concerned that only one person thought to
include the XSM maintainer, or any of the XSM reviewers, onto a patch and
the discussion around a patch that clearly relates to XSM for us to gauge
the consequences of the patch. I am not assuming intentions here, only
wanting to raise the concern.

So for what it is worth, NACK.


I assume the NACK is to the remarks after the '---'?

The patch itself doesn't change the enforcement of the XSM checks,
just prevents returning an error when the information from the last
domain in the loop cannot be fetched.

Am I missing something?


Actually, I should have finished my first cup of tea and looked closer 
at the patch in the larger context instead of the description, as the 
two do not align. You are correct, and provided I am not wrong here, the 
change is a no-op formatting change that removes an intermediate 
variable. I do not see how directly checking the return in an if differs
from checking the return stored in a variable. Additionally, the claim is
that this occurs when XSM is enabled, which is also incorrect. The only 
difference at this location in code between not having XSM enabled and 
having it enabled is that for the latter, xsm_getdomaininfo() is an 
in-lined version versus a function call. In either case, both will 
return 0 unless you are using FLASK and have a policy blocking the 
domain from making the call.


V/r,
Daniel P. Smith

Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Juergen Gross


On 02.05.23 15:03, Daniel P. Smith wrote:

On 4/30/23 10:46, Juergen Gross wrote:

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross 
---
  xen/common/sysctl.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 02505ab044..0cbfe8bd44 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
  if ( num_domains == op->u.getdomaininfolist.max_domains )
  break;
-    ret = xsm_getdomaininfo(XSM_HOOK, d);
-    if ( ret )
+    if ( xsm_getdomaininfo(XSM_HOOK, d) )
  continue;
  getdomaininfo(d, &info);



This change does not match the commit message. This says it fixes an issue, but 
unless I am totally missing something, this change is nothing more than 
formatting that drops the use of an intermediate variable. Please feel free to 
correct me if I am wrong here, otherwise I believe the commit message should be 
changed to reflect the code change.


You are missing the fact that ret getting set by a failing xsm_getdomaininfo()
call might result in the ret value being propagated to the sysctl caller. And
this should not happen. So the fix is to NOT modify ret here.

Second, as far as the problem description goes: the *only* time the call to
xsm_getdomaininfo() at this location will return anything other than 0, is when 
FLASK is being used and a domain whose type is not allowed getdomaininfo is 
making the call. XSM_HOOK signals a no-op check for the default/dummy policy, 
and the SILO policy does not override the default/dummy policy for this check.


Your statement sounds as if xsm_getdomaininfo() would always return the same
value for a given caller domain. Isn't that return value also depending on the
domain specified via the second parameter? In case it isn't, why does that
parameter even exist?


Juergen



Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Roger Pau Monné

On Tue, May 02, 2023 at 09:03:00AM -0400, Daniel P. Smith wrote:
> On 4/30/23 10:46, Juergen Gross wrote:
> > In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
> > can fail if the last domain scanned isn't allowed to be accessed by
> > the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).
> > 
> > Fix that by just ignoring scanned domains where xsm_getdomaininfo()
> > is returning an error, like it is effectively done when such a
> > situation occurs for a domain not being the last one scanned.
> > 
> > Fixes: d046f361dc93 ("Xen Security Modules: XSM")
> > Signed-off-by: Juergen Gross 
> > ---
> >   xen/common/sysctl.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> > index 02505ab044..0cbfe8bd44 100644
> > --- a/xen/common/sysctl.c
> > +++ b/xen/common/sysctl.c
> > @@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
> >   if ( num_domains == op->u.getdomaininfolist.max_domains )
> >   break;
> > -ret = xsm_getdomaininfo(XSM_HOOK, d);
> > -if ( ret )
> > +if ( xsm_getdomaininfo(XSM_HOOK, d) )
> >   continue;
> >   getdomaininfo(d, &info);
> 
> 
> This change does not match the commit message. This says it fixes an issue,
> but unless I am totally missing something, this change is nothing more than
> formatting that drops the use of an intermediate variable. Please feel free
> to correct me if I am wrong here, otherwise I believe the commit message
> should be changed to reflect the code change.

By dropping that intermediate variable it prevents returning an error
as the result of the hypercall if xsm_getdomaininfo() for the last
domain fails.

Note that xsm_getdomaininfo() failing for domains other than the last
one doesn't cause the return value of the hypercall to be an error
code, because the variable containing the error gets overwritten by
further loop iterations.

Regards, Roger.

Re: [PATCH 1/2] x86/head: check base address alignment

2023-05-02 Thread Roger Pau Monné

On Tue, May 02, 2023 at 01:11:12PM +0200, Jan Beulich wrote:
> On 02.05.2023 13:05, Jan Beulich wrote:
> > On 02.05.2023 12:51, Roger Pau Monné wrote:
> >> On Tue, May 02, 2023 at 12:28:55PM +0200, Jan Beulich wrote:
> >>> On 02.05.2023 11:54, Andrew Cooper wrote:
> >>>> On 02/05/2023 10:22 am, Roger Pau Monne wrote:
> >>>>> @@ -670,6 +674,11 @@ trampoline_setup:
> >>>>>  cmp %edi, %eax
> >>>>>  jb  1b
> >>>>>
> >>>>> +/* Check that the image base is aligned. */
> >>>>> +lea sym_esi(_start), %eax
> >>>>> +and $(1 << L2_PAGETABLE_SHIFT) - 1, %eax
> >>>>> +jnz not_aligned
> >>>>
> >>>> You just want to check the value in %esi, which is the base of the Xen
> >>>> image.  Something like:
> >>>>
> >>>> mov %esi, %eax
> >>>> and ...
> >>>> jnz
> >>>
> >>> Or yet more simply "test $..., %esi" and then "jnz ..."?
> >>
> >> As replied to Andrew, I would rather keep this inline with the address
> >> used to build the PDE, which is sym_esi(_start).
> > 
> > Well, I won't insist, and you've got Andrew's R-b already.
> 
> Actually, one more remark here: While using sym_esi() is more in line
> with the actual consumer of the data, the check triggering because of
> the transformation yielding a misaligned value (in turn because of a
> bug elsewhere) would yield a misleading error message: We might well
> have been loaded at a 2Mb-aligned boundary, and instead it's the internal
> logic which would then have been wrong. (I'm sorry, now you'll get to
> judge whether keeping the check in line with other code or with the
> diagnostic is going to be better. Or split things into a build-time
> and a runtime check, as previously suggested.)

What about adding a build time check that XEN_VIRT_START is 2MB
aligned, and then just switching to test instead of and, would that be
acceptable?

I know that using sym_esi(_start) instead of just esi won't change the
result of the check if XEN_VIRT_START is aligned, but I would prefer
to keep the usage of sym_esi(_start) for consistency with the value
used to build the page tables, as I think it's clearer.

Thanks, Roger.

Re: [PATCH] xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

2023-05-02 Thread Daniel P. Smith


On 4/30/23 10:46, Juergen Gross wrote:

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross 
---
  xen/common/sysctl.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 02505ab044..0cbfe8bd44 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -89,8 +89,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
  if ( num_domains == op->u.getdomaininfolist.max_domains )
  break;
  
-ret = xsm_getdomaininfo(XSM_HOOK, d);
-if ( ret )
+if ( xsm_getdomaininfo(XSM_HOOK, d) )
  continue;
  
  getdomaininfo(d, &info);



This change does not match the commit message. This says it fixes an 
issue, but unless I am totally missing something, this change is nothing 
more than formatting that drops the use of an intermediate variable. 
Please feel free to correct me if I am wrong here, otherwise I believe 
the commit message should be changed to reflect the code change.


Second, as far as the problem description goes: the *only* time the call
to xsm_getdomaininfo() at this location will return anything other than 
0, is when FLASK is being used and a domain whose type is not allowed 
getdomaininfo is making the call. XSM_HOOK signals a no-op check for the 
default/dummy policy, and the SILO policy does not override the 
default/dummy policy for this check.


V/r,
Daniel P. Smith

Re: [PATCH v3 2/3] tools: Use new xc function for some xc_domain_getinfo() calls

2023-05-02 Thread Andrew Cooper

On 02/05/2023 12:18 pm, Alejandro Vallejo wrote:
> On Tue, May 02, 2023 at 12:13:37PM +0100, Alejandro Vallejo wrote:
>> Move calls that require information about a single precisely identified
>> domain to the new xc_domain_getinfo_single().
>>
>> Signed-off-by: Alejandro Vallejo 
>> Reviewed-by: Andrew Cooper 
>>
>> ---
>> Cc: Andrew Cooper 
>> Cc: Wei Liu 
>> Cc: Anthony PERARD 
>> Cc: Tim Deegan 
>> Cc: George Dunlap 
>> Cc: Juergen Gross 
>>
>> v3:
>>  * Stylistic changes that fell through the cracks on v2
>>  * Reinserted -errno convention from v1 that had been
>>removed in v2
> Mistake here. It's _NOT_ supposed to have that "Reviewed-by"

Nevertheless, v3 now looks good.

Reviewed-by: Andrew Cooper

Re: [PATCH] sysctl: XSM hook should not cause XEN_SYSCTL_getdomaininfolist to (appear to) fail

2023-05-02 Thread Daniel P. Smith


On 5/2/23 06:59, Jan Beulich wrote:

On 02.05.2023 12:43, Daniel P. Smith wrote:

On 5/2/23 03:17, Jan Beulich wrote:

Unlike for XEN_DOMCTL_getdomaininfo, where the XSM check is intended to
cause the operation to fail, in the loop here it ought to merely
determine whether information for the domain at hand may be reported
back. Therefore if on the last iteration the hook results in denial,
this should not affect the sub-op's return value.

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Jan Beulich 
---
The hook being able to deny access to data for certain domains means
that no caller can assume to have a system-wide picture when holding the
results.

Wouldn't it make sense to permit the function to merely "count" domains?
While racy in general (including in its present, "normal" mode of
operation), within a tool stack this could be used as long as creation
of new domains is suppressed between obtaining the count and then using
it.

In XEN_DOMCTL_getpageframeinfo2 said commit had introduced a 2nd such
issue, but luckily that sub-op and xsm_getpageframeinfo() are long gone.



I understand there is a larger issue at play here but neutering the
security control/XSM check is not the answer. This literally changes the
way a FLASK policy that people currently have would be enforced, and it
is contrary to how they understand the access control that it provides.
Even though the code path does not fall under XSM maintainer, I would
NACK this patch. IMHO, it is better to find a solution that does not
abuse, misuse, or invalidate the purpose of the XSM calls.

On a side note, I am a little concerned that only one person thought to
include the XSM maintainer, or any of the XSM reviewers, onto a patch
and the discussion around a patch that clearly relates to XSM for us to
gauge the consequences of the patch. I am not assuming intentions here,
only wanting to raise the concern.


Well, yes, for the discussion items I could have remembered to include
you. The code change itself, otoh, doesn't require your ack, even if it
is the return value of an XSM function which was used wrongly here.


I beg to disagree, not that you could have, but that you should have. 
This is now the second XSM issue, that I am aware of at least, that 
myself and the XSM reviewers have been left out of. How and where the 
XSM hooks are deployed are critical to how XSM functions, regardless of 
how mundane the change may be. By your logic, as the XSM maintainer I 
can make changes to the XSM code that changes how the system behaves for 
x86 and claim you have no Ack/Nack authority since it is XSM code. These 
subsystems are symbiotic, and we owe each other the due respect to 
include the other when these systems touch or influence each other.



So for what it is worth, NACK.


I'm puzzled: I hope you don't mean NACK to the patch (or effectively
Jürgen's identical one, which I had noticed only after sending mine).
Yet beyond that I don't see anything here which could be NACKed. I've
merely raised a couple of points for discussion.


I will comment on Jurgen's patch.

Re: [PATCH v3 1/3] tools: Modify single-domid callers of xc_domain_getinfolist()

2023-05-02 Thread Christian Lindig




> On 2 May 2023, at 12:13, Alejandro Vallejo  
> wrote:
> 
> xc_domain_getinfolist() internally relies on a sysctl that performs
> a linear search for the domids. Many callers of xc_domain_getinfolist()
> who require information about a precise domid are much better off calling
> xc_domain_getinfo_single() instead, that will use the getdomaininfo domctl
> instead and ensure the returned domid matches the requested one. The domctl
> will find the domid faster too, because that uses hashed lists.
> 
> Signed-off-by: Alejandro Vallejo 

Acked-by: Christian Lindig 

I mostly care about the OCaml bindings - looks good to me.

— C

[PATCH v6 05/16] x86/xen: set MTRR state when running as Xen PV initial domain

2023-05-02 Thread Juergen Gross

When running as Xen PV initial domain (aka dom0), MTRRs are disabled
by the hypervisor, but the system should nevertheless use correct
cache memory types. This has always kind of worked, as disabled MTRRs
resulted in disabled PAT, too, so that the kernel avoided code paths
resulting in inconsistencies. This bypassed all of the sanity checks
the kernel is doing with enabled MTRRs in order to avoid memory
mappings with conflicting memory types.

This has been changed recently, leading to PAT being enabled while
MTRRs stayed disabled. As a result, mtrr_type_lookup() no longer
accepted all memory type requests, but started to return WB even if
UC- was requested. This led to driver failures during initialization
of some devices.

In reality MTRRs are still in effect, but they are under complete
control of the Xen hypervisor. It is possible, however, to retrieve
the MTRR settings from the hypervisor.

In order to fix those problems, overwrite the MTRR state via
mtrr_overwrite_state() with the MTRR data from the hypervisor, if the
system is running as a Xen dom0.

Fixes: 72cbc8f04fe2 ("x86/PAT: Have pat_enabled() properly reflect state when running on Xen")
Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
---
V2:
- new patch
V3:
- move the call of mtrr_overwrite_state() to xen_pv_init_platform()
V4:
- only call mtrr_overwrite_state() if any MTRR got from Xen
  (Boris Ostrovsky)
---
 arch/x86/xen/enlighten_pv.c | 52 +
 1 file changed, 52 insertions(+)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 093b78c8bbec..8732b85d5650 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -119,6 +120,54 @@ static int __init parse_xen_msr_safe(char *str)
 }
 early_param("xen_msr_safe", parse_xen_msr_safe);
 
+/* Get MTRR settings from Xen and put them into mtrr_state. */
+static void __init xen_set_mtrr_data(void)
+{
+#ifdef CONFIG_MTRR
+   struct xen_platform_op op = {
+   .cmd = XENPF_read_memtype,
+   .interface_version = XENPF_INTERFACE_VERSION,
+   };
+   unsigned int reg;
+   unsigned long mask;
+   uint32_t eax, width;
+   static struct mtrr_var_range var[MTRR_MAX_VAR_RANGES] __initdata;
+
+   /* Get physical address width (only 64-bit cpus supported). */
+   width = 36;
+   eax = cpuid_eax(0x80000000);
+   if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
+   eax = cpuid_eax(0x80000008);
+   width = eax & 0xff;
+   }
+
+   for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
+   op.u.read_memtype.reg = reg;
+   if (HYPERVISOR_platform_op(&op))
+   break;
+
+   /*
+* Only called in dom0, which has all RAM PFNs mapped at
+* RAM MFNs, and all PCI space etc. is identity mapped.
+* This means we can treat MFN == PFN regarding MTRR settings.
+*/
+   var[reg].base_lo = op.u.read_memtype.type;
+   var[reg].base_lo |= op.u.read_memtype.mfn << PAGE_SHIFT;
+   var[reg].base_hi = op.u.read_memtype.mfn >> (32 - PAGE_SHIFT);
+   mask = ~((op.u.read_memtype.nr_mfns << PAGE_SHIFT) - 1);
+   mask &= (1UL << width) - 1;
+   if (mask)
+   mask |= MTRR_PHYSMASK_V;
+   var[reg].mask_lo = mask;
+   var[reg].mask_hi = mask >> 32;
+   }
+
+   /* Only overwrite MTRR state if any MTRR could be got from Xen. */
+   if (reg)
+   mtrr_overwrite_state(var, reg, MTRR_TYPE_UNCACHABLE);
+#endif
+}
+
 static void __init xen_pv_init_platform(void)
 {
/* PV guests can't operate virtio devices without grants. */
@@ -135,6 +184,9 @@ static void __init xen_pv_init_platform(void)
 
/* pvclock is in shared info area */
xen_init_time_ops();
+
+   if (xen_initial_domain())
+   xen_set_mtrr_data();
 }
 
 static void __init xen_pv_guest_late_init(void)
-- 
2.35.3

[PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-02 Thread Juergen Gross

This series tries to fix the rather special case of PAT being available
without having MTRRs (either due to CONFIG_MTRR not being set, or
because the feature has been disabled e.g. by a hypervisor).

The main use cases are Xen PV guests and SEV-SNP guests running under
Hyper-V.

Instead of trying to work around all the issues by adding if statements
here and there, just try to use the complete available infrastructure
by setting up a read-only MTRR state when needed.

In the Xen PV case the current MTRR MSR values can be read from the
hypervisor, while for the SEV-SNP case all that is needed is to set the
default caching mode to "WB".

I have added more cleanup which has been discussed when looking into
the most recent failures.

Note that I couldn't test the Hyper-V related change (patch 3).

Running on bare metal and with Xen didn't show any problems with the
series applied.

It should be noted that patches 9+10 replace today's way of looking up
the MTRR cache type for a memory region, which inspects the MTRR
register values, with building a memory map of the cache types.
This should make the lookup much faster and much easier to understand.

Changes in V2:
- replaced former patches 1+2 with new patches 1-4, avoiding especially
  the rather hacky approach of V1, while making all the MTRR type
  conflict tests available for the Xen PV case
- updated patch 6 (was patch 4 in V1)

Changes in V3:
- dropped patch 5 of V2, as already applied
- split patch 1 of V2 into 2 patches
- new patches 6-10
- addressed comments

Changes in V4:
- addressed comments

Changes in V5:
- addressed comments
- some other small fixes
- new patches 3, 8 and 15

Changes in V6:
- patch 1 replaces patches 1+2 of V5
- new patches 8+12
- addressed comments

Juergen Gross (16):
  x86/mtrr: remove physical address size calculation
  x86/mtrr: replace some constants with defines
  x86/mtrr: support setting MTRR state for software defined MTRRs
  x86/hyperv: set MTRR state when running as SEV-SNP Hyper-V guest
  x86/xen: set MTRR state when running as Xen PV initial domain
  x86/mtrr: replace vendor tests in MTRR code
  x86/mtrr: have only one set_mtrr() variant
  x86/mtrr: move 32-bit code from mtrr.c to legacy.c
  x86/mtrr: allocate mtrr_value array dynamically
  x86/mtrr: add get_effective_type() service function
  x86/mtrr: construct a memory map with cache modes
  x86/mtrr: add mtrr=debug command line option
  x86/mtrr: use new cache_map in mtrr_type_lookup()
  x86/mtrr: don't let mtrr_type_lookup() return MTRR_TYPE_INVALID
  x86/mm: only check uniform after calling mtrr_type_lookup()
  x86/mtrr: remove unused code

 .../admin-guide/kernel-parameters.txt |   4 +
 arch/x86/hyperv/ivm.c |   4 +
 arch/x86/include/asm/mtrr.h   |  43 +-
 arch/x86/include/uapi/asm/mtrr.h  |   6 +-
 arch/x86/kernel/cpu/mtrr/Makefile |   2 +-
 arch/x86/kernel/cpu/mtrr/amd.c|   2 +-
 arch/x86/kernel/cpu/mtrr/centaur.c|  11 +-
 arch/x86/kernel/cpu/mtrr/cleanup.c|  22 +-
 arch/x86/kernel/cpu/mtrr/cyrix.c  |   2 +-
 arch/x86/kernel/cpu/mtrr/generic.c| 677 --
 arch/x86/kernel/cpu/mtrr/legacy.c |  90 +++
 arch/x86/kernel/cpu/mtrr/mtrr.c   | 195 ++---
 arch/x86/kernel/cpu/mtrr/mtrr.h   |  18 +-
 arch/x86/kernel/setup.c   |   2 +
 arch/x86/mm/pgtable.c |  24 +-
 arch/x86/xen/enlighten_pv.c   |  52 ++
 16 files changed, 721 insertions(+), 433 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mtrr/legacy.c

-- 
2.35.3

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Olaf Hering

Tue, 2 May 2023 13:33:13 +0200 Olaf Hering :

> I will investigate why it failed to build for me.

This happens if one builds first with the Tumbleweed container, and later with 
the Leap container, without a 'git clean -dffx' in between.

Is there a way to invalidate everything if the toolchain changes?


Olaf



Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Andrew Cooper

On 02/05/2023 12:33 pm, Olaf Hering wrote:
> Tue, 2 May 2023 09:31:56 +0200 Jan Beulich :
>
>> How does 2.37 vs 2.39 matter? CET-IBT support is present in gas as of 2.29.
> I have no idea. It turned out, the previous Leap image was based on 15.3, while the current one will be 15.4.
>
> If I run this manually, it appears the error is produced properly:
>
> gcc -Wall -fcf-protection=branch -mmanual-endbr -mindirect-branch=thunk-extern -c -x assembler -o /dev/null - ; echo $?
> gcc: error: unrecognized command line option ‘-fcf-protection=branch’; did you mean ‘-fno-protect-parens’?
> gcc: error: unrecognized command line option ‘-mmanual-endbr’
> 1
>
> And for some reason there is no failure with the refreshed image on gitlab:
>
> https://gitlab.com/xen-project/xen/-/jobs/4210269545/artifacts/external_file/build.log
>
> I will investigate why it failed to build for me.

CET-IBT is far more dependent on the compiler, than it is on binutils.

The minimum version of GCC necessary is 9, but if you've backported the
requisite options then an older GCC will work too.

~Andrew

Re: [PATCH v3 1/2] acpi: Make TPM version configurable.

2023-05-02 Thread Jan Beulich

On 25.04.2023 19:47, Jennifer Herbert wrote:
> This patch makes the TPM version, for which the ACPI library probes, configurable.
> If acpi_config.tpm_version is set to 1, it indicates that 1.2 (TCPA) should be probed.
> I have also added to hvmloader an option to allow setting this new config, which can
> be triggered by setting the platform/tpm_version xenstore key.
> Signed-off-by: Jennifer Herbert 
> ---
>  docs/misc/xenstore-paths.pandoc |  9 +
>  tools/firmware/hvmloader/util.c | 19 ++---
>  tools/libacpi/build.c   | 69 +++--
>  tools/libacpi/libacpi.h |  3 +-
>  4 files changed, 64 insertions(+), 36 deletions(-)

Please can you get used to providing a brief rev log somewhere here?

> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -994,13 +994,22 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>  if ( !strncmp(xenstore_read("platform/acpi_laptop_slate", "0"), "1", 1)  )
>  config->table_flags |= ACPI_HAS_SSDT_LAPTOP_SLATE;
>  
> -config->table_flags |= (ACPI_HAS_TCPA | ACPI_HAS_IOAPIC |
> -ACPI_HAS_WAET | ACPI_HAS_PMTIMER |
> -ACPI_HAS_BUTTONS | ACPI_HAS_VGA |
> -ACPI_HAS_8042 | ACPI_HAS_CMOS_RTC);
> +config->table_flags |= (ACPI_HAS_IOAPIC | ACPI_HAS_WAET |
> +ACPI_HAS_PMTIMER | ACPI_HAS_BUTTONS |
> +ACPI_HAS_VGA | ACPI_HAS_8042 |
> +ACPI_HAS_CMOS_RTC);
>  config->acpi_revision = 4;
>  
> -config->tis_hdr = (uint16_t *)ACPI_TIS_HDR_ADDRESS;
> +s = xenstore_read("platform/tpm_version", "1");
> +config->tpm_version = strtoll(s, NULL, 0);

Due to field width, someone specifying 257 will also get a 1.2 TPM,
if I'm not mistaken.

> +switch( config->tpm_version )

Nit: Style (missing blank).

> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -409,38 +409,47 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
>  memcpy(ssdt, ssdt_laptop_slate, sizeof(ssdt_laptop_slate));
>  table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
>  }
> -
> -/* TPM TCPA and SSDT. */
> -if ( (config->table_flags & ACPI_HAS_TCPA) &&
> - (config->tis_hdr[0] != 0 && config->tis_hdr[0] != 0x) &&
> - (config->tis_hdr[1] != 0 && config->tis_hdr[1] != 0x) )
> +/* TPM and its SSDT. */
> +if ( config->table_flags & ACPI_HAS_TPM )
>  {
> -ssdt = ctxt->mem_ops.alloc(ctxt, sizeof(ssdt_tpm), 16);
> -if (!ssdt) return -1;
> -memcpy(ssdt, ssdt_tpm, sizeof(ssdt_tpm));
> -table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
> -
> -tcpa = ctxt->mem_ops.alloc(ctxt, sizeof(struct acpi_20_tcpa), 16);
> -if (!tcpa) return -1;
> -memset(tcpa, 0, sizeof(*tcpa));
> -table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, tcpa);
> -
> -tcpa->header.signature = ACPI_2_0_TCPA_SIGNATURE;
> -tcpa->header.length= sizeof(*tcpa);
> -tcpa->header.revision  = ACPI_2_0_TCPA_REVISION;
> -fixed_strcpy(tcpa->header.oem_id, ACPI_OEM_ID);
> -fixed_strcpy(tcpa->header.oem_table_id, ACPI_OEM_TABLE_ID);
> -tcpa->header.oem_revision = ACPI_OEM_REVISION;
> -tcpa->header.creator_id   = ACPI_CREATOR_ID;
> -tcpa->header.creator_revision = ACPI_CREATOR_REVISION;
> -if ( (lasa = ctxt->mem_ops.alloc(ctxt, ACPI_2_0_TCPA_LAML_SIZE, 16)) != NULL )
> +switch ( config->tpm_version )
>  {
> -tcpa->lasa = ctxt->mem_ops.v2p(ctxt, lasa);
> -tcpa->laml = ACPI_2_0_TCPA_LAML_SIZE;
> -memset(lasa, 0, tcpa->laml);
> -set_checksum(tcpa,
> - offsetof(struct acpi_header, checksum),
> - tcpa->header.length);
> +case 0: /* Assume legacy code wanted tpm 1.2 */

Along the lines of what Jason said: Unless this is known to be needed for
anything, I'd prefer if it was omitted.

Jan

Re: HAS_CC_CET_IBT misdetected

2023-05-02 Thread Olaf Hering

Tue, 2 May 2023 09:31:56 +0200 Jan Beulich :

> How does 2.37 vs 2.39 matter? CET-IBT support is present in gas as of 2.29.

I have no idea. It turned out, the previous Leap image was based on 15.3, while 
the current one will be 15.4.

If I run this manually, it appears the error is produced properly:

gcc -Wall -fcf-protection=branch -mmanual-endbr -mindirect-branch=thunk-extern -c -x assembler -o /dev/null - ; echo $?
gcc: error: unrecognized command line option ‘-fcf-protection=branch’; did you mean ‘-fno-protect-parens’?
gcc: error: unrecognized command line option ‘-mmanual-endbr’
1

And for some reason there is no failure with the refreshed image on gitlab:

https://gitlab.com/xen-project/xen/-/jobs/4210269545/artifacts/external_file/build.log

I will investigate why it failed to build for me.


Olaf



Re: [PATCH RFC] SUPPORT.md: Make all security support explicit

2023-05-02 Thread Jan Beulich

On 28.04.2023 10:14, George Dunlap wrote:
> On Fri, Apr 28, 2023 at 9:12 AM George Dunlap  wrote:
> It occurred to me that in many (most? all?) cases it would be more
> effective to define the security support parameters in the
> documentation itself.

I think I agree; the alternative of needing to look in two places (one
telling the syntax, the other telling whether it's "legitimate" to use)
would be prone to people omitting the 2nd step. And this isn't going to
be meaningfully more work right now: Any option we don't mean to
security-support won't need annotating, i.e. like in SUPPORT.md absence
of an explicit statement would mean "not supported".

While in the examples you list only command line options, I guess the
same could apply to xl.cfg / xl.conf ones? Albeit I notice xl.cfg.5.pod.in
in its title specifically says "syntax" right now, which then may want
changing.

For Kconfig items it's not as clear, because I wouldn't consider the
various Kconfig files "documentation", yet I guess we shouldn't require
people to look at source code.

Jan

Re: [PATCH RFC] SUPPORT.md: Make all security support explicit

2023-05-02 Thread Jan Beulich

On 28.04.2023 10:12, George Dunlap wrote:
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -17,6 +17,36 @@ for the definitions of the support status levels etc.
>  Release Notes
>  :  href="https://wiki.xenproject.org/wiki/Xen_Project_X.YY_Release_Notes";>RN
>  
> +# General security support
> +
> +An XSA will always be issued for security-related bugs which are
> +present in a "plain vanilla" configuration.  A "plain vanilla"
> +configuration is defined as follows:
> +
> +* The Xen hypervisor is built from a tagged release of Xen, or a
> +  commit which was on the tip of one of the supported stable branches.
> +
> +* The Xen hypervisor was built with the default config for the platform
> +
> +* No Xen command-line parameters were specified
> +
> +* No parameters for Xen-related drivers in the Linux kernel were specified
> +
> +* No modifications were made to the default xl.conf
> +
> +* xl.cfg files use only core functionality
> +
> +* Alternate toolstacks only activate functionality activated by the
> +  core functionality of xl.cfg files.
> +
> +Any system outside this configuration will only be considered security
> +supported if the functionality is explicitly listed as supported in
> +this document.
> +
> +If a security-related bug exists only in a configuration listed as not
> +security supported, the security team will generally not issue an XSA;
> +the bug will simply be handled in public.

In this last paragraph, did you perhaps mean "not listed as security
supported"? Otherwise we wouldn't improve our situation, unless I'm
misunderstanding and word order doesn't matter here in English. In which
case some unambiguous wording would need to be found.

Jan

Re: [PATCH v3 2/3] tools: Use new xc function for some xc_domain_getinfo() calls

2023-05-02 Thread Alejandro Vallejo

On Tue, May 02, 2023 at 12:13:37PM +0100, Alejandro Vallejo wrote:
> Move calls that require information about a single precisely identified
> domain to the new xc_domain_getinfo_single().
> 
> Signed-off-by: Alejandro Vallejo 
> Reviewed-by: Andrew Cooper 
> 
> ---
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: Anthony PERARD 
> Cc: Tim Deegan 
> Cc: George Dunlap 
> Cc: Juergen Gross 
> 
> v3:
>  * Stylistic changes that fell through the cracks on v2
>  * Reinserted -errno convention from v1 that had been
>removed in v2

Mistake here. It's _NOT_ supposed to have that "Reviewed-by"

Re: [PATCH v3 1/3] tools: Modify single-domid callers of xc_domain_getinfolist()

2023-05-02 Thread Alejandro Vallejo

On Tue, May 02, 2023 at 12:13:36PM +0100, Alejandro Vallejo wrote:
> xc_domain_getinfolist() internally relies on a sysctl that performs
> a linear search for the domids. Many callers of xc_domain_getinfolist()
> who require information about a precise domid are much better off calling
> xc_domain_getinfo_single() instead, that will use the getdomaininfo domctl
> instead and ensure the returned domid matches the requested one. The domctl
> will find the domid faster too, because that uses hashed lists.
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: Anthony PERARD 
> Cc: Juergen Gross 
> Cc: Christian Lindig 
> 
> v3:
>  * Replaced single-domid xc_domain_getinfolist() call in ocaml stub with
>xc_domain_getinfo_single()

My mistake here. It's supposed to have a "R-by: Andrew Cooper"

[PATCH v3 2/3] tools: Use new xc function for some xc_domain_getinfo() calls

2023-05-02 Thread Alejandro Vallejo

Move calls that require information about a single precisely identified
domain to the new xc_domain_getinfo_single().

Signed-off-by: Alejandro Vallejo 
Reviewed-by: Andrew Cooper 

---
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: Anthony PERARD 
Cc: Tim Deegan 
Cc: George Dunlap 
Cc: Juergen Gross 

v3:
 * Stylistic changes that fell through the cracks on v2
 * Reinserted -errno convention from v1 that had been
   removed in v2

---
 tools/console/client/main.c |  7 ++---
 tools/debugger/kdd/kdd-xen.c|  5 ++--
 tools/libs/ctrl/xc_domain.c |  9 +++---
 tools/libs/ctrl/xc_pagetab.c|  7 ++---
 tools/libs/ctrl/xc_private.c|  9 +++---
 tools/libs/ctrl/xc_private.h|  7 +++--
 tools/libs/guest/xg_core.c  | 23 ++
 tools/libs/guest/xg_core.h  |  6 ++--
 tools/libs/guest/xg_core_arm.c  | 10 +++
 tools/libs/guest/xg_core_x86.c  | 18 +--
 tools/libs/guest/xg_cpuid_x86.c | 40 +
 tools/libs/guest/xg_dom_boot.c  | 16 +++---
 tools/libs/guest/xg_domain.c|  8 ++---
 tools/libs/guest/xg_offline_page.c  | 12 
 tools/libs/guest/xg_private.h   |  1 +
 tools/libs/guest/xg_resume.c| 20 ++---
 tools/libs/guest/xg_sr_common.h |  2 +-
 tools/libs/guest/xg_sr_restore.c| 17 ---
 tools/libs/guest/xg_sr_restore_x86_pv.c |  2 +-
 tools/libs/guest/xg_sr_save.c   | 27 +++--
 tools/libs/guest/xg_sr_save_x86_pv.c|  6 ++--
 tools/libs/light/libxl_sched.c  | 16 +-
 tools/libs/light/libxl_x86_acpi.c   |  4 +--
 tools/misc/xen-hvmcrash.c   |  6 ++--
 tools/misc/xen-lowmemd.c|  6 ++--
 tools/misc/xen-mfndump.c| 22 ++
 tools/misc/xen-vmtrace.c|  6 ++--
 tools/vchan/vchan-socket-proxy.c|  6 ++--
 tools/xenstore/xenstored_domain.c   | 15 +-
 tools/xentrace/xenctx.c |  8 ++---
 30 files changed, 158 insertions(+), 183 deletions(-)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index 1a6fa162f7..6775006488 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -408,17 +408,16 @@ int main(int argc, char **argv)
if (dom_path == NULL)
err(errno, "xs_get_domain_path()");
if (type == CONSOLE_INVAL) {
-   xc_dominfo_t xcinfo;
+   xc_domaininfo_t xcinfo;
xc_interface *xc_handle = xc_interface_open(0,0,0);
if (xc_handle == NULL)
err(errno, "Could not open xc interface");
-   if ( (xc_domain_getinfo(xc_handle, domid, 1, &xcinfo) != 1) ||
-(xcinfo.domid != domid) ) {
+   if (xc_domain_getinfo_single(xc_handle, domid, &xcinfo) < 0) {
xc_interface_close(xc_handle);
err(errno, "Failed to get domain information");
}
/* default to pv console for pv guests and serial for hvm guests */
-   if (xcinfo.hvm)
+   if (xcinfo.flags & XEN_DOMINF_hvm_guest)
type = CONSOLE_SERIAL;
else
type = CONSOLE_PV;
diff --git a/tools/debugger/kdd/kdd-xen.c b/tools/debugger/kdd/kdd-xen.c
index e78c9311c4..e63e267023 100644
--- a/tools/debugger/kdd/kdd-xen.c
+++ b/tools/debugger/kdd/kdd-xen.c
@@ -570,7 +570,7 @@ kdd_guest *kdd_guest_init(char *arg, FILE *log, int verbosity)
 kdd_guest *g = NULL;
 xc_interface *xch = NULL;
 uint32_t domid;
-xc_dominfo_t info;
+xc_domaininfo_t info;
 
 g = calloc(1, sizeof (kdd_guest));
 if (!g) 
@@ -590,7 +590,8 @@ kdd_guest *kdd_guest_init(char *arg, FILE *log, int verbosity)
 g->domid = domid;
 
 /* Check that the domain exists and is HVM */
-if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || !info.hvm)
+if (xc_domain_getinfo_single(xch, domid, &info) < 0 ||
+!(info.flags & XEN_DOMINF_hvm_guest))
 goto err;
 
 snprintf(g->id, (sizeof g->id) - 1, 
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index d5f0923088..66179e6f12 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1960,15 +1960,14 @@ int xc_domain_memory_mapping(
 uint32_t add_mapping)
 {
 DECLARE_DOMCTL;
-xc_dominfo_t info;
+xc_domaininfo_t info;
 int ret = 0, rc;
 unsigned long done = 0, nr, max_batch_sz;
 
-if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
- info.domid != domid )
+if ( xc_domain_getinfo_single(xch, domid, &info) < 0 )
 {
-PERROR("Could not get info for domain");
-return -EINVAL;
+PERROR("Could not get info for dom%u", domid);
+return -1;
 }
 if ( !xc_core_arch_auto_translated_physmap(&info) )
 return 0;
diff -

[PATCH v3 3/3] domctl: Modify XEN_DOMCTL_getdomaininfo to fail if domid is not found

2023-05-02 Thread Alejandro Vallejo

It previously mimicked the getdomaininfo sysctl semantics by returning
the first domid higher than the requested domid that does exist. This
unintuitive behaviour causes quite a few mistakes and makes the call
needlessly slow in its error path.

This patch removes the fallback search, returning -ESRCH if the requested
domain doesn't exist. Domain discovery can still be done through the sysctl
interface as that performs a linear search on the list of domains.

With this modification the xc_domain_getinfo() function is deprecated and
removed to make sure it's not mistakenly used expecting the old behaviour.
The new xc wrapper is xc_domain_getinfo_single().

All previous callers of xc_domain_getinfo() have been updated to use
xc_domain_getinfo_single() or xc_domain_getinfolist() instead. This also
means xc_dominfo_t is no longer used by anything and can be purged.

Resolves: xen-project/xen#105
Signed-off-by: Alejandro Vallejo 
Reviewed-by: Andrew Cooper 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Stefano Stabellini 
Cc: Wei Liu 
Cc: Anthony PERARD 
Cc: Juergen Gross 

v3:
 * No changes

---
 tools/include/xenctrl.h | 43 --
 tools/libs/ctrl/xc_domain.c | 73 -
 xen/common/domctl.c | 32 +---
 3 files changed, 2 insertions(+), 146 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 752fc87580..08a15c5911 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -444,28 +444,6 @@ typedef struct xc_core_header {
  * DOMAIN MANAGEMENT FUNCTIONS
  */
 
-typedef struct xc_dominfo {
-uint32_t  domid;
-uint32_t  ssidref;
-unsigned int  dying:1, crashed:1, shutdown:1,
-  paused:1, blocked:1, running:1,
-  hvm:1, debugged:1, xenstore:1, hap:1;
-unsigned int  shutdown_reason; /* only meaningful if shutdown==1 */
-unsigned long nr_pages; /* current number, not maximum */
-unsigned long nr_outstanding_pages;
-unsigned long nr_shared_pages;
-unsigned long nr_paged_pages;
-unsigned long shared_info_frame;
-uint64_t  cpu_time;
-unsigned long max_memkb;
-unsigned int  nr_online_vcpus;
-unsigned int  max_vcpu_id;
-xen_domain_handle_t handle;
-unsigned int  cpupool;
-uint8_t   gpaddr_bits;
-struct xen_arch_domainconfig arch_config;
-} xc_dominfo_t;
-
 typedef xen_domctl_getdomaininfo_t xc_domaininfo_t;
 
 static inline unsigned int dominfo_shutdown_reason(const xc_domaininfo_t *info)
@@ -721,27 +699,6 @@ int xc_domain_getinfo_single(xc_interface *xch,
  uint32_t domid,
  xc_domaininfo_t *info);
 
-/**
- * This function will return information about one or more domains. It is
- * designed to iterate over the list of domains. If a single domain is
- * requested, this function will return the next domain in the list - if
- * one exists. It is, therefore, important in this case to make sure the
- * domain requested was the one returned.
- *
- * @parm xch a handle to an open hypervisor interface
- * @parm first_domid the first domain to enumerate information from.  Domains
- *   are currently enumerate in order of creation.
- * @parm max_doms the number of elements in info
- * @parm info an array of max_doms size that will contain the information for
- *the enumerated domains.
- * @return the number of domains enumerated or -1 on error
- */
-int xc_domain_getinfo(xc_interface *xch,
-  uint32_t first_domid,
-  unsigned int max_doms,
-  xc_dominfo_t *info);
-
-
 /**
  * This function will set the execution context for the specified vcpu.
  *
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index 66179e6f12..724fa6f753 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -357,85 +357,12 @@ int xc_domain_getinfo_single(xc_interface *xch,
 if ( do_domctl(xch, &domctl) < 0 )
 return -1;
 
-if ( domctl.u.getdomaininfo.domain != domid )
-{
-errno = ESRCH;
-return -1;
-}
-
 if ( info )
 *info = domctl.u.getdomaininfo;
 
 return 0;
 }
 
-int xc_domain_getinfo(xc_interface *xch,
-  uint32_t first_domid,
-  unsigned int max_doms,
-  xc_dominfo_t *info)
-{
-unsigned int nr_doms;
-uint32_t next_domid = first_domid;
-DECLARE_DOMCTL;
-int rc = 0;
-
-memset(info, 0, max_doms*sizeof(xc_dominfo_t));
-
-for ( nr_doms = 0; nr_doms < max_doms; nr_doms++ )
-{
-domctl.cmd = XEN_DOMCTL_getdomaininfo;
-domctl.domain = next_domid;
-if ( (rc = do_domctl(xch, &domctl)) < 0 )
-break;
-info->domid  = domctl.domain;
-
-info->dying= !!(domctl.u.getdomaininfo.flags&XEN_DOMINF_dying);
-info->shutdown =

[PATCH v3 0/3] Rationalize usage of xc_domain_getinfo{,list}()

2023-05-02 Thread Alejandro Vallejo

The first 4 patches of v2 already made it to staging. This is a corrected
repost of the 3 remaining ones.

Original cover letter:

xc_domain_getinfo() returns the list of domains with domid >= first_domid.
It does so by repeatedly invoking XEN_DOMCTL_getdomaininfo, which leads to
unintuitive behaviour (asking for domid=1 might succeed returning domid=2).
Furthermore, N hypercalls are required whereas the equivalent functionality
can be achieved with XEN_SYSCTL_getdomaininfolist.

Ideally, we want a DOMCTL interface that operates over a single precisely
specified domain and a SYSCTL interface that can be used for bulk queries.

All callers of xc_domain_getinfo() that are better off using SYSCTL are
migrated to use that instead. That includes callers performing domain
discovery and those requesting info for more than 1 domain per hypercall.

A new xc_domain_getinfo_single() is introduced with stricter semantics than
xc_domain_getinfo() (failing if domid isn't found) to migrate the rest to.

With no callers left the xc_dominfo_t structure and the xc_domain_getinfo()
call itself can be cleanly removed, and the DOMCTL interface simplified to
only use its fastpath.

With the DOMCTL amended, the new xc_domain_getinfo_single() drops its
stricter check, becoming a simple wrapper to invoke the hypercall itself.

Alejandro Vallejo (3):
  tools: Modify single-domid callers of xc_domain_getinfolist()
  tools: Use new xc function for some xc_domain_getinfo() calls
  domctl: Modify XEN_DOMCTL_getdomaininfo to fail if domid is not found

 tools/console/client/main.c |  7 +--
 tools/debugger/kdd/kdd-xen.c|  5 +-
 tools/include/xenctrl.h | 43 -
 tools/libs/ctrl/xc_domain.c | 82 ++---
 tools/libs/ctrl/xc_pagetab.c|  7 +--
 tools/libs/ctrl/xc_private.c|  9 +--
 tools/libs/ctrl/xc_private.h|  7 ++-
 tools/libs/guest/xg_core.c  | 23 +++
 tools/libs/guest/xg_core.h  |  6 +-
 tools/libs/guest/xg_core_arm.c  | 10 +--
 tools/libs/guest/xg_core_x86.c  | 18 +++---
 tools/libs/guest/xg_cpuid_x86.c | 40 ++--
 tools/libs/guest/xg_dom_boot.c  | 16 ++---
 tools/libs/guest/xg_domain.c|  8 +--
 tools/libs/guest/xg_offline_page.c  | 12 ++--
 tools/libs/guest/xg_private.h   |  1 +
 tools/libs/guest/xg_resume.c| 20 +++---
 tools/libs/guest/xg_sr_common.h |  2 +-
 tools/libs/guest/xg_sr_restore.c| 17 ++---
 tools/libs/guest/xg_sr_restore_x86_pv.c |  2 +-
 tools/libs/guest/xg_sr_save.c   | 27 
 tools/libs/guest/xg_sr_save_x86_pv.c|  6 +-
 tools/libs/light/libxl_dom.c| 17 ++---
 tools/libs/light/libxl_dom_suspend.c|  7 +--
 tools/libs/light/libxl_domain.c | 13 ++--
 tools/libs/light/libxl_mem.c|  4 +-
 tools/libs/light/libxl_sched.c  | 26 
 tools/libs/light/libxl_x86_acpi.c   |  4 +-
 tools/misc/xen-hvmcrash.c   |  6 +-
 tools/misc/xen-lowmemd.c|  6 +-
 tools/misc/xen-mfndump.c| 22 +++
 tools/misc/xen-vmtrace.c|  6 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c |  6 +-
 tools/vchan/vchan-socket-proxy.c|  6 +-
 tools/xenpaging/xenpaging.c | 10 +--
 tools/xenstore/xenstored_domain.c   | 15 +++--
 tools/xentrace/xenctx.c |  8 +--
 xen/common/domctl.c | 32 +-
 38 files changed, 184 insertions(+), 372 deletions(-)

-- 
2.34.1
