date:20210809

flight 164144 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164144/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-examine  6 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-raw  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx  14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1  13 debian-fixup  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub  20 guest-stop  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2  13 debian-fixup  fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm  13 debian-fixup  fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm  14 guest-start  fail in 164138 REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub  20 guest-stop  fail in 164143 REGR. vs. 152332

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-qemuu-freebsd11-amd64  21 guest-start/freebsd.repeat  fail in 164138 pass in 164144
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  21 leak-check/check  fail in 164138 pass in 164144
 test-amd64-amd64-xl-shadow  14 guest-start  fail in 164143 pass in 164144
 test-arm64-arm64-xl-thunderx  13 debian-fixup  fail in 164143 pass in 164144
 test-arm64-arm64-xl-xsm  13 debian-fixup  fail pass in 164138
 test-amd64-amd64-amd64-pvgrub  19 guest-localmigrate/x10  fail pass in 164143
 test-amd64-amd64-examine  4 memdisk-try-append  fail pass in 164143
 test-armhf-armhf-xl-vhd  13 guest-start  fail pass in 164143

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-vhd  14 migrate-support-check  fail in 164143 never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check  fail in 164143 never pass
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop  fail like 152332
 test-armhf-armhf-libvirt  16 saverestore-support-check  fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop  fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop  fail like 152332
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop  fail like 152332
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail like 152332
 test-amd64-amd64-libvirt  15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-c

Re: [PATCH] xen-blkfront: Remove redundant assignment to variable err

2021-08-09 Thread Jens Axboe

On 8/6/21 5:06 AM, Colin King wrote:
> From: Colin Ian King 
> 
> The variable err is being assigned a value that is never read, the
> assignment is redundant and can be removed.

Added for 5.15, thanks.

-- 
Jens Axboe

Re: [PATCH] xen-blkfront: Remove redundant assignment to variable err

2021-08-09 Thread Boris Ostrovsky



On 8/6/21 7:06 AM, Colin King wrote:
> From: Colin Ian King 
>
> The variable err is being assigned a value that is never read, the
> assignment is redundant and can be removed.
>
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 


Reviewed-by: Boris Ostrovsky

Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-09 Thread Dave Hansen

On 8/9/21 10:56 AM, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.

From an x86/mm perspective:

Acked-by: Dave Hansen 

A tiny nit:

> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>  
>  bool hv_is_isolation_supported(void)
>  {
> +	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> +		return 0;
> +
> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> +		return 0;
> +
>  	return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }

This might be worthwhile to move to a header.  That ensures that
hv_is_isolation_supported() use can avoid even a function call.  But, I
see this is used in modules and its use here is also in a slow path, so
it's not a big deal.
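
The nit above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the kernel's code: `struct fake_platform` and the `sketch_*` helpers are hypothetical stand-ins for `cpu_feature_enabled(X86_FEATURE_HYPERVISOR)`, `hypervisor_is_type(X86_HYPER_MS_HYPERV)` and `hv_get_isolation_type()`, introduced only so the example compiles outside the kernel tree.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; these are not real kernel symbols. */
enum sketch_isolation_type {
	SKETCH_ISOLATION_NONE = 0,
	SKETCH_ISOLATION_SOME = 1,
};

struct fake_platform {
	bool hypervisor_bit;			/* models X86_FEATURE_HYPERVISOR */
	bool ms_hyperv;				/* models X86_HYPER_MS_HYPERV */
	enum sketch_isolation_type isolation;	/* models hv_get_isolation_type() */
};

static struct fake_platform fake_platform;

static inline bool sketch_cpu_has_hypervisor(void)
{
	return fake_platform.hypervisor_bit;
}

static inline bool sketch_is_ms_hyperv(void)
{
	return fake_platform.ms_hyperv;
}

static inline enum sketch_isolation_type sketch_get_isolation_type(void)
{
	return fake_platform.isolation;
}

/*
 * Header-style variant: as a static inline, the two cheap early-exit
 * checks can be folded into the caller instead of costing a call.
 */
static inline bool sketch_is_isolation_supported(void)
{
	if (!sketch_cpu_has_hypervisor())
		return false;

	if (!sketch_is_ms_hyperv())
		return false;

	return sketch_get_isolation_type() != SKETCH_ISOLATION_NONE;
}
```

As the review itself points out, the function is also used from modules and sits on a slow path, so the saving is likely small; the sketch only shows the shape of the change.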

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space

On 09.08.21 23:45, Julien Grall wrote:

Hi Julien

On 09/08/2021 19:24, Oleksandr wrote:

On 09.08.21 18:42, Julien Grall wrote:

Hi Oleksandr,

Hi Julien.

Thank you for the input.

On 07/08/2021 18:03, Oleksandr wrote:

On 06.08.21 03:30, Stefano Stabellini wrote:

Hi Stefano

On Wed, 4 Aug 2021, Julien Grall wrote:
+#define GUEST_SAFE_RANGE_BASE xen_mk_ullong(0xDE) /*
128GB */

+#define GUEST_SAFE_RANGE_SIZE xen_mk_ullong(0x02)

While the possible new DT bindings have not been agreed yet, I re-used
the existing "reg" property under the hypervisor node to pass the safe
range as a second region,
https://elixir.bootlin.com/linux/v5.14-rc4/source/Documentation/devicetree/bindings/arm/xen.txt#L10:

So a single region works for a guest today, but for dom0 we will need
multiple regions because it may be difficult to find enough contiguous
space for a single region.

That said, as dom0 is mapped 1:1 (including some guest mapping), there
is also the question of where to allocate the safe region. For the grant
table, we so far re-use the Xen address space because it is assumed that
space will always be bigger than the grant table.

I am not sure yet where we could allocate the safe regions. Stefano, do
you have any ideas?

The safest choice would be the address range corresponding to memory
(/memory) not already allocated to Dom0.

For instance from my last boot logs:
(XEN) Allocating 1:1 mappings totalling 1600MB for dom0:
(XEN) BANK[0] 0x00000010000000-0x00000070000000 (1536MB)
(XEN) BANK[1] 0x00000078000000-0x0000007c000000 (64MB)

All the other ranges could be given as unallocated space:

- 0x0 - 0x10000000
- 0x70000000 - 0x78000000
- 0x8_0000_0000 - 0x8_8000_0000
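
The arithmetic behind that list can be sketched as subtracting the dom0 banks from the host RAM regions. The following is only an illustration under stated assumptions: the bank addresses are read as 0x10000000-0x70000000 and 0x78000000-0x7c000000 (which matches the 1536MB and 64MB sizes in the log), the host RAM regions passed in are hypothetical, and `struct range` and `subtract_ranges()` are names made up for this example rather than Xen code.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range: [start, end). Hypothetical type. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Write the parts of `host` that are not covered by `used` into `out`.
 * `used` must be sorted by start address and must not overlap.
 * At most `max_out` ranges are written; the count written is returned.
 */
static size_t subtract_ranges(struct range host,
			      const struct range *used, size_t nr_used,
			      struct range *out, size_t max_out)
{
	size_t n = 0;
	uint64_t cursor = host.start;

	for (size_t i = 0; i < nr_used && cursor < host.end; i++) {
		if (used[i].end <= cursor)
			continue;	/* entirely below the cursor */
		if (used[i].start >= host.end)
			break;		/* entirely above this host region */
		if (used[i].start > cursor && n < max_out)
			out[n++] = (struct range){ cursor, used[i].start };
		cursor = used[i].end;
	}

	/* Whatever remains after the last used bank is free as well. */
	if (cursor < host.end && n < max_out)
		out[n++] = (struct range){ cursor, host.end };

	return n;
}
```

With an assumed host region of 0x0-0x7c000000 this yields 0x0-0x10000000 and 0x70000000-0x78000000, and with an assumed second host region of 0x8_0000_0000-0x8_8000_0000 (untouched by dom0) it yields that region unchanged.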

Thank you for the ideas.

If I got the idea correctly, yes, as these ranges represent the
real RAM, so no I/O would be in conflict with them and as the
result - no overlaps would be expected.
But, I wonder, would this work if we have IOMMU enabled for Dom0
and need to establish 1:1 mapping for the DMA devices to work with
grant mappings...
In arm_iommu_map_page() we call guest_physmap_add_entry() with gfn
= mfn, so the question is could we end up with this new gfn
replacing the valid mapping

(with gfn allocated from the safe region)?

Right, when we enable the IOMMU for dom0, Xen will add an extra
mapping with GFN == MFN for foreign and grant pages. This is because
Linux is not aware whether a device is protected by an IOMMU.
Therefore it assumes it is not and will use the MFN to configure
DMA transactions.

We can't remove the mapping without significant changes in Linux and
Xen. I would not mandate them for this work.

That said, I think it would be acceptable to have a different way to
find the region depending on the dom0 configuration. So we could use
the RAM not used by dom0 when the IOMMU is turned off.

The second best choice would be a hole: an address range not used by
anybody else (no reg property) and also not even mappable by a bus
(not covered by a ranges property). This is not the best choice because
there can be cases where physical resources appear afterwards.

Are you saying that the original device-tree doesn't even describe
them in any way (i.e. reserved...)?

Unfortunately, yes.

So the decision where the safe region is located will be done by
Xen. There is no involvement of the domain (it will discover the
region from the DT). Therefore, I don't think we need to think about
everything right now, as we could adapt this later: the exact region is
not part of the stable ABI.

The hotplug is one I would defer because this is not supported (and
quite likely not working) in Xen upstream today.

Sounds reasonable.

Now regarding the case where dom0 is using the IOMMU. The assumption
is Xen will be able to figure out all the regions used from the
firmware table (ACPI or DT).

AFAIK, this assumption would be correct for DT. However, for ACPI, I
remember we were not able to find all the MMIO regions in Xen (see
[1] and [2]). So even this solution would not work for ACPI.

If I am not mistaken, we don't support IOMMU with ACPI yet. So we
could defer the problem to when this is going to be supported.

Sounds reasonable.

To summarize:

0. Skip ACPI case for now, implement for DT case

Just to be clear, I suggested to skip it when the IOMMU is enabled
with ACPI. We should still support the case without IOMMU. The
implementation would be the same as 2.

yes, sorry for not being precise

1. If IOMMU is enabled for Dom0 -> provide holes found in Host DT as
safe ranges

I would take into account holes >= 1MB.

May I ask why 1MB?

Nothing special, just thinking to not bother with small regions which
would not be too useful overall, but could bloat resulting reg property.

Anyway, I would be ok with any sizes.

I am wondering, do we need a special alignment here other than
PAGE_SIZE?

It needs to be 64KB aligned so a guest using

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space

On 09/08/2021 19:24, Oleksandr wrote:

On 09.08.21 18:42, Julien Grall wrote:

Hi Oleksandr,

Hi Julien.

Thank you for the input.

On 07/08/2021 18:03, Oleksandr wrote:

On 06.08.21 03:30, Stefano Stabellini wrote:

Hi Stefano

On Wed, 4 Aug 2021, Julien Grall wrote:
+#define GUEST_SAFE_RANGE_BASE xen_mk_ullong(0xDE) /*
128GB */

+#define GUEST_SAFE_RANGE_SIZE xen_mk_ullong(0x02)

While the possible new DT bindings have not been agreed yet, I re-used
the existing "reg" property under the hypervisor node to pass the safe
range as a second region,
https://elixir.bootlin.com/linux/v5.14-rc4/source/Documentation/devicetree/bindings/arm/xen.txt#L10:

So a single region works for a guest today, but for dom0 we will need
multiple regions because it may be difficult to find enough contiguous
space for a single region.

That said, as dom0 is mapped 1:1 (including some guest mapping), there
is also the question of where to allocate the safe region. For the grant
table, we so far re-use the Xen address space because it is assumed that
space will always be bigger than the grant table.

I am not sure yet where we could allocate the safe regions. Stefano, do
you have any ideas?

The safest choice would be the address range corresponding to memory
(/memory) not already allocated to Dom0.

For instance from my last boot logs:
(XEN) Allocating 1:1 mappings totalling 1600MB for dom0:
(XEN) BANK[0] 0x00000010000000-0x00000070000000 (1536MB)
(XEN) BANK[1] 0x00000078000000-0x0000007c000000 (64MB)

All the other ranges could be given as unallocated space:

- 0x0 - 0x10000000
- 0x70000000 - 0x78000000
- 0x8_0000_0000 - 0x8_8000_0000

Thank you for the ideas.

If I got the idea correctly, yes, as these ranges represent the real
RAM, so no I/O would be in conflict with them and as the result - no
overlaps would be expected.
But, I wonder, would this work if we have IOMMU enabled for Dom0 and
need to establish 1:1 mapping for the DMA devices to work with grant
mappings...
In arm_iommu_map_page() we call guest_physmap_add_entry() with gfn =
mfn, so the question is could we end up with this new gfn replacing
the valid mapping

(with gfn allocated from the safe region)?

We can't remove the mapping without significant changes in Linux and
Xen. I would not mandate them for this work.

That said, I think it would be acceptable to have a different way to
find the region depending on the dom0 configuration. So we could use
the RAM not used by dom0 when the IOMMU is turned off.

The second best choice would be a hole: an address range not used by
anybody else (no reg property) and also not even mappable by a bus (not
covered by a ranges property). This is not the best choice because
there can be cases where physical resources appear afterwards.

Are you saying that the original device-tree doesn't even describe
them in any way (i.e. reserved...)?

Unfortunately, yes.

So the decision where the safe region is located will be done by Xen.
There is no involvement of the domain (it will discover the region
from the DT). Therefore, I don't think we need to think about
everything right now, as we could adapt this later: the exact region is
not part of the stable ABI.

The hotplug is one I would defer because this is not supported (and
quite likely not working) in Xen upstream today.

Sounds reasonable.

Now regarding the case where dom0 is using the IOMMU. The assumption
is Xen will be able to figure out all the regions used from the
firmware table (ACPI or DT).

AFAIK, this assumption would be correct for DT. However, for ACPI, I
remember we were not able to find all the MMIO regions in Xen (see [1]
and [2]). So even this solution would not work for ACPI.

If I am not mistaken, we don't support IOMMU with ACPI yet. So we
could defer the problem to when this is going to be supported.

Sounds reasonable.

To summarize:

0. Skip ACPI case for now, implement for DT case

Just to be clear, I suggested to skip it when the IOMMU is enabled with
ACPI. We should still support the case without IOMMU. The implementation
would be the same as 2.

1. If IOMMU is enabled for Dom0 -> provide holes found in Host DT as
safe ranges

I would take into account holes >= 1MB.

May I ask why 1MB?

I am wondering, do we need a special alignment here other than
PAGE_SIZE?

It needs to be 64KB aligned so a guest using 64KB pages can use it.

2. If IOMMU is disabled for Dom0 -> provide RAM which is not assigned to
Dom0 as safe ranges

We could even provide holes here as well.

I would rather not. We likely need a hack for the hotplug case. So I want
to keep them contained to IOMMU unless there is

Re: NULL scheduler DoS





On 09/08/2021 18:35, Julien Grall wrote:



On 09/08/2021 17:19, Ahmed, Daniele wrote:

Hi all,


Hi Daniele,

Thank you for the report!

The NULL scheduler is affected by an issue that triggers an assertion 
and reboots the hypervisor.


This issue arises when:

  * a guest is being created with a configuration specifying a file that
    does not exist
  * the hypervisor boots with the null scheduler

4.16 is affected and 4.15 also.

This is the stack trace from 4.16:

(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) [ Xen-4.16-unstable x86_64 debug=y Not tainted ]
(XEN) CPU: 3
(XEN) RIP: e008:[] common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) RFLAGS: 00010006 CONTEXT: hypervisor
(XEN) rax: 83005ce1c850 rbx: 0001 rcx: 0001
(XEN) rdx: 83007fde6fc0 rsi: 83005ce1c790 rdi: 83007ffb7850
(XEN) rbp: 83007ffdfda0 rsp: 83007ffdfd48 r8: 
(XEN) r9: 00048fee r10:  r11: 
(XEN) r12: 82d0405c9298 r13: 83007f7fd508 r14: 83005ce1c850
(XEN) r15: 82d0405e2680 cr0: 8005003b cr4: 003526e0
(XEN) cr3: 7f6b3000 cr2: 888072e79dc0
(XEN) fsb:  gsb: 888071ac gss: 
(XEN) ds: 002b es: 002b fs:  gs:  ss: e010 cs: e008
(XEN) Xen code around (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
(XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f b7 00 66
(XEN) Xen stack trace from rsp=83007ffdfd48:
(XEN) 83007ffdfd88 82d04023961c 0004 83005ce1cc50
(XEN) 0002 83007ffdfd90 83005ce1c790 82d0405c9298
(XEN) 83007f7fd508 83005ce1c850 82d0405e2680 83007ffdfde0
(XEN) 82d04024f889 83007ffb7850 83005dd63000 83005ce1c790
(XEN) 83005845ab28 83005845a000  83007ffdfe00
(XEN) 82d040253326 83005dd63000  83007ffdfe38
(XEN) 82d04020506b 83007a881080  
(XEN)  82d0405d6f80 83007ffdfe70 82d04022d9e5
(XEN) 00110003 82d0405cf100 82d0405cf100 
(XEN) 82d0405cef80 83007ffdfea8 82d04022e14b 0003
(XEN) 82d0405cf100 7fff 0003 0003
(XEN) 83007ffdfeb8 82d04022e1e6 83007ffdfef0 82d0403172b4
(XEN) 82d04031721d 83007fec1000 83007ffb6000 0003
(XEN) 83007ffcc000 83007ffdfe18  
(XEN)   0003 0003
(XEN) 0246 0003  1bf9dde5
(XEN)  810023aa 0003 deadbeefdeadf00d
(XEN) deadbeefdeadf00d 0100 810023aa e033
(XEN) 0246 c900400a3ea8 e02b 7ffdff707fffd140
(XEN) 00017fe37a6c 7ffe8010  e013
(XEN) Xen call trace:
(XEN) [] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) [] F common/sched/null.c#null_unit_remove+0xfc/0x136
(XEN) [] F sched_destroy_vcpu+0xca/0x199
(XEN) [] F common/domain.c#complete_domain_destroy+0x68/0x13f
(XEN) [] F common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
(XEN) [] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN) [] F do_softirq+0x13/0x15
(XEN) [] F arch/x86/domain.c#idle_loop+0x97/0xee
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 3:
(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

This is the line of the assertion that triggers the reboot: 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377 
 



To reproduce the vulnerability, I took the following steps:


Just to make it clear for the others in the thread, per SUPPORT.MD, the
NULL scheduler is not security supported. Hence why this is sent to
xen-devel directly.


Also, for completeness, debug builds are also not security supported. On
a production build, the ASSERT() would be turned into a NOP, which could
result in potentially more interesting issues. Anyway, that's not a
problem here. :)
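
To make the debug versus production distinction concrete, here is a small sketch of an assertion macro that aborts in a debug build and evaluates nothing otherwise. It is an illustration only: `SKETCH_DEBUG`, `SKETCH_ASSERT()`, the two structs and `sketch_unit_deassign()` are hypothetical names for this example, and the code is not taken from Xen's headers or from null.c.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical build switch for this sketch; not a Xen Kconfig symbol. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(p) do { if (!(p)) abort(); } while (0)
#else
/* The predicate is still type-checked by the compiler but never evaluated. */
#define SKETCH_ASSERT(p) do { if (0 && (p)) { } } while (0)
#endif

struct sketch_unit {
	int id;
};

struct sketch_cpu {
	struct sketch_unit *unit;	/* unit currently assigned to this CPU */
};

/*
 * Remove `unit` from `cpu`. Returns true if the precondition checked by
 * the assertion actually held, so a caller can observe the mismatch.
 */
static bool sketch_unit_deassign(struct sketch_cpu *cpu,
				 struct sketch_unit *unit)
{
	bool consistent = (cpu->unit == unit);

	SKETCH_ASSERT(cpu->unit == unit);	/* aborts only with SKETCH_DEBUG */
	cpu->unit = NULL;			/* otherwise execution carries on */

	return consistent;
}
```

Built without `SKETCH_DEBUG`, a mismatched call silently clears the other unit's assignment instead of stopping, which is the kind of follow-on problem the remark above alludes to.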




  * Install XEN; only 4.15+ seem to be vulnerable
  * Use the null scheduler (depends on your setup): edit
    /etc/default/grub adding at the end of the file:
    GRUB_CMDLINE_XEN="sched=null" and update grub
  * Reboot into xen
  * Create a file guest.cfg with the following contents

name="guest"
builder="hvm"
memory=512

serial = [ 'file:/tmp/log', 'pty' ]

disk = [ '/home/user/boot.iso,,hdc,cdrom' ]

on_reboot = "destroy"

vcpus=1


Make sure that the file /home/user/boot.iso does not exist

  * Create a guest wit

[linux-linus test] 164143: regressions - FAIL

flight 164143 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164143/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-examine  6 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-raw  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx  13 debian-fixup  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1  13 debian-fixup  fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub  20 guest-stop  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub  20 guest-stop  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2  13 debian-fixup  fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm  13 debian-fixup  fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm  14 guest-start  fail in 164138 REGR. vs. 152332

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-qemuu-freebsd11-amd64  21 guest-start/freebsd.repeat  fail in 164138 pass in 164143
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  21 leak-check/check  fail in 164138 pass in 164143
 test-amd64-amd64-xl-shadow  14 guest-start  fail pass in 164138
 test-arm64-arm64-xl-xsm  13 debian-fixup  fail pass in 164138

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop  fail like 152332
 test-armhf-armhf-libvirt  16 saverestore-support-check  fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop  fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop  fail like 152332
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop  fail like 152332
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail like 152332
 test-amd64-amd64-libvirt  15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-suppo

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space

On 09.08.21 18:42, Julien Grall wrote:

Hi Oleksandr,

Hi Julien.

Thank you for the input.

On 07/08/2021 18:03, Oleksandr wrote:

On 06.08.21 03:30, Stefano Stabellini wrote:

Hi Stefano

On Wed, 4 Aug 2021, Julien Grall wrote:
+#define GUEST_SAFE_RANGE_BASE xen_mk_ullong(0xDE) /*
128GB */

+#define GUEST_SAFE_RANGE_SIZE xen_mk_ullong(0x02)

While the possible new DT bindings have not been agreed yet, I re-used
the existing "reg" property under the hypervisor node to pass the safe
range as a second region,
https://elixir.bootlin.com/linux/v5.14-rc4/source/Documentation/devicetree/bindings/arm/xen.txt#L10:

So a single region works for a guest today, but for dom0 we will need
multiple regions because it may be difficult to find enough contiguous
space for a single region.

That said, as dom0 is mapped 1:1 (including some guest mapping), there
is also the question of where to allocate the safe region. For the grant
table, we so far re-use the Xen address space because it is assumed that
space will always be bigger than the grant table.

I am not sure yet where we could allocate the safe regions. Stefano, do
you have any ideas?

The safest choice would be the address range corresponding to memory
(/memory) not already allocated to Dom0.

For instance from my last boot logs:
(XEN) Allocating 1:1 mappings totalling 1600MB for dom0:
(XEN) BANK[0] 0x00000010000000-0x00000070000000 (1536MB)
(XEN) BANK[1] 0x00000078000000-0x0000007c000000 (64MB)

All the other ranges could be given as unallocated space:

- 0x0 - 0x10000000
- 0x70000000 - 0x78000000
- 0x8_0000_0000 - 0x8_8000_0000

Thank you for the ideas.

If I got the idea correctly, yes, as these ranges represent the real
RAM, so no I/O would be in conflict with them and as the result - no
overlaps would be expected.
But, I wonder, would this work if we have IOMMU enabled for Dom0 and
need to establish 1:1 mapping for the DMA devices to work with grant
mappings...
In arm_iommu_map_page() we call guest_physmap_add_entry() with gfn =
mfn, so the question is could we end up with this new gfn replacing
the valid mapping

(with gfn allocated from the safe region)?

We can't remove the mapping without significant changes in Linux and
Xen. I would not mandate them for this work.

That said, I think it would be acceptable to have a different way to
find the region depending on the dom0 configuration. So we could use
the RAM not used by dom0 when the IOMMU is turned off.

The second best choice would be a hole: an address range not used by
anybody else (no reg property) and also not even mappable by a bus (not
covered by a ranges property). This is not the best choice because
there can be cases where physical resources appear afterwards.

Are you saying that the original device-tree doesn't even describe
them in any way (i.e. reserved...)?

Unfortunately, yes.

So the decision where the safe region is located will be done by Xen.
There is no involvement of the domain (it will discover the region
from the DT). Therefore, I don't think we need to think about
everything right now, as we could adapt this later: the exact region is
not part of the stable ABI.

The hotplug is one I would defer because this is not supported (and
quite likely not working) in Xen upstream today.

Sounds reasonable.

Now regarding the case where dom0 is using the IOMMU. The assumption
is Xen will be able to figure out all the regions used from the
firmware table (ACPI or DT).

AFAIK, this assumption would be correct for DT. However, for ACPI, I
remember we were not able to find all the MMIO regions in Xen (see [1]
and [2]). So even this solution would not work for ACPI.

If I am not mistaken, we don't support IOMMU with ACPI yet. So we
could defer the problem to when this is going to be supported.

Sounds reasonable.

To summarize:

0. Skip ACPI case for now, implement for DT case

1. If IOMMU is enabled for Dom0 -> provide holes found in Host DT as
safe ranges

I would take into account holes >= 1MB. I am wondering, do we need a
special alignment here other than PAGE_SIZE?

2. If IOMMU is disabled for Dom0 -> provide RAM which is not assigned to
Dom0 as safe ranges

We could even provide holes here as well.

Cheers,

[1] https://marc.info/?l=linux-arm-kernel&m=148469169210500&w=2
[2] Xen commit 80f9c316708400cea4417e36337267d3b26591db

--
Regards,

Oleksandr Tyshchenko
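
As a rough illustration of the hole filtering discussed in the summary above, the sketch below trims a candidate hole inwards to an alignment boundary and rejects it if less than a minimum size remains. The 1MB threshold is the value floated in the message; the 64KB alignment is an assumption for this example (other replies in the thread ask for it so that guests with 64KB pages can use the range). `struct hole`, `trim_hole()` and the `SKETCH_*` constants are hypothetical names, not Xen code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_HOLE_SIZE	(UINT64_C(1) << 20)	/* 1MB threshold */
#define SKETCH_ALIGNMENT	(UINT64_C(64) << 10)	/* 64KB, assumed */

/* Half-open candidate range: [start, end). Hypothetical type. */
struct hole {
	uint64_t start;
	uint64_t end;
};

/*
 * Shrink `h` inwards to `align` boundaries (align must be a power of
 * two). Returns false, leaving `h` untouched, if the aligned remainder
 * is smaller than `min_size`.
 */
static bool trim_hole(struct hole *h, uint64_t align, uint64_t min_size)
{
	uint64_t start = (h->start + align - 1) & ~(align - 1);	/* round up */
	uint64_t end = h->end & ~(align - 1);			/* round down */

	if (start < h->start)		/* rounding up wrapped around */
		return false;
	if (end <= start || end - start < min_size)
		return false;

	h->start = start;
	h->end = end;
	return true;
}
```

A caller would run each hole found in the host DT through `trim_hole()` and only emit the survivors into the "reg" property, which keeps small fragments from bloating it.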

Re: [PATCH v2 0/6] PCI: Drop duplicated tracking of a pci_dev's bound driver

2021-08-09 Thread Bjorn Helgaas

On Sat, Aug 07, 2021 at 11:26:45AM +0200, Uwe Kleine-König wrote:
> On Fri, Aug 06, 2021 at 04:24:52PM -0500, Bjorn Helgaas wrote:
> > On Fri, Aug 06, 2021 at 08:46:23AM +0200, Uwe Kleine-König wrote:
> > > On Thu, Aug 05, 2021 at 06:42:34PM -0500, Bjorn Helgaas wrote:
> > 
> > > > I looked at all the bus_type.probe() methods, it looks like pci_dev is
> > > > not the only offender here.  At least the following also have a driver
> > > > pointer in the device struct:
> > > > 
> > > >   parisc_device.driver
> > > >   acpi_device.driver
> > > >   dio_dev.driver
> > > >   hid_device.driver
> > > >   pci_dev.driver
> > > >   pnp_dev.driver
> > > >   rio_dev.driver
> > > >   zorro_dev.driver
> > > 
> > > Right, when I converted zorro_dev it was pointed out that the code was
> > > copied from pci and the latter has the same construct. :-)
> > > See
> > > https://lore.kernel.org/r/20210730191035.1455248-5-u.kleine-koe...@pengutronix.de
> > > for the patch, I don't find where pci was pointed out, maybe it was on
> > > irc only.
> > 
> > Oh, thanks!  I looked to see if you'd done something similar
> > elsewhere, but I missed this one.
> > 
> > > > Looking through the places that care about pci_dev.driver (the ones
> > > > updated by patch 5/6), many of them are ... a little dubious to begin
> > > > with.  A few need the "struct pci_error_handlers *err_handler"
> > > > pointer, so that's probably legitimate.  But many just need a name,
> > > > and should probably be using dev_driver_string() instead.
> > > 
> > > Yeah, I considered adding a function to get the driver name from a
> > > pci_dev and a function to get the error handlers. Maybe it's an idea to
> > > introduce these two and then use to_pci_driver(pdev->dev.driver) for the
> > > few remaining users? Maybe doing that on top of my current series makes
> > > sense to have a clean switch from pdev->driver to pdev->dev.driver?!
> > 
> > I'd propose using dev_driver_string() for these places:
> > 
> >   eeh_driver_name() (could change callers to use dev_driver_string())
> >   bcma_host_pci_probe()
> >   qm_alloc_uacce()
> >   hns3_get_drvinfo()
> >   prestera_pci_probe()
> >   mlxsw_pci_probe()
> >   nfp_get_drvinfo()
> >   ssb_pcihost_probe()
> 
> So the idea is:
> 
>   PCI: Simplify pci_device_remove()
>   PCI: Drop useless check from pci_device_probe()
>   xen/pci: Drop some checks that are always true
> 
> are kept as is as preparation. (Do you want to take them from this v2,
> or should I include them again in v3?)

Easiest if you include them until we merge the series.

> Then convert the list of functions above to use dev_driver_string() in a
> 4th patch.
> 
> > The use in mpt_device_driver_register() looks unnecessary: it's only
> > to get a struct pci_device_id *, which is passed to ->probe()
> > functions that don't need it.
> 
> This is patch #5.
> 
> > The use in adf_enable_aer() looks wrong: it sets the err_handler
> > pointer in one of the adf_driver structs.  I think those structs
> > should be basically immutable, and the drivers that call
> > adf_enable_aer() from their .probe() methods should set
> > ".err_handler = &adf_err_handler" in their static adf_driver
> > definitions instead.
> 
> I don't understand that one without some research, probably this yields
> at least one patch.

Yeah, it's a little messy because you'd have to make adf_err_handler
non-static and add an extern for it.  Sample below.

> > I think that basically leaves these:
> > 
> >   uncore_pci_probe() # .id_table, custom driver "registration"
> >   match_id() # .id_table, arch/x86/kernel/probe_roms.c
> >   xhci_pci_quirks()  # .id_table
> > >   pci_error_handlers()   # roll-your-own AER handling, drivers/misc/cxl/guest.c
> > 
> > I think it would be fine to use to_pci_driver(pdev->dev.driver) for
> > these few.
> 
> Converting these will be patch 7 then and patch 8 can then drop the
> duplicated handling.
> 
> Sounds reasonable?

Sounds good to me.  Thanks for working on this!

Bjorn


diff --git a/drivers/crypto/qat/qat_4xxx/adf_drv.c b/drivers/crypto/qat/qat_4xxx/adf_drv.c
index a8805c815d16..75e6c5540523 100644
--- a/drivers/crypto/qat/qat_4xxx/adf_drv.c
+++ b/drivers/crypto/qat/qat_4xxx/adf_drv.c
@@ -310,6 +310,7 @@ static struct pci_driver adf_driver = {
.probe = adf_probe,
.remove = adf_remove,
.sriov_configure = adf_sriov_configure,
+   .err_handler = &adf_err_handler,
 };
 
 module_pci_driver(adf_driver);
diff --git a/drivers/crypto/qat/qat_common/adf_aer.c b/drivers/crypto/qat/qat_common/adf_aer.c
index d2ae293d0df6..701c3c5f8b9b 100644
--- a/drivers/crypto/qat/qat_common/adf_aer.c
+++ b/drivers/crypto/qat/qat_common/adf_aer.c
@@ -166,7 +166,7 @@ static void adf_resume(struct pci_dev *pdev)
dev_info(&pdev->dev, "Device is up and running\n");
 }
 
-static const struct pci_error_handlers adf_err_handler = {
+const struct pci_error_handlers adf_err_handler = {
.error_detected = adf_error_de
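
For readers unfamiliar with the to_pci_driver(pdev->dev.driver) idiom discussed
in the thread above, here is a minimal userspace sketch of the container_of()
lookup it relies on. All struct layouts are simplified stand-ins, the *_model
names are hypothetical, and the NULL check is an assumption of this sketch, not
a statement about the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures (assumed layouts). */
struct device_driver { const char *name; };
struct device { struct device_driver *driver; };
struct pci_error_handlers { int placeholder; };
struct pci_driver {
	const char *name;
	const struct pci_error_handlers *err_handler;
	struct device_driver driver;	/* embedded generic driver */
};
struct pci_dev { struct device dev; };

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical model of to_pci_driver(); tolerates an unbound device. */
static struct pci_driver *to_pci_driver_model(struct device_driver *drv)
{
	return drv ? container_of(drv, struct pci_driver, driver) : NULL;
}

/* Hypothetical helper for callers that only need a name. */
static const char *driver_name_model(const struct pci_dev *pdev)
{
	return pdev->dev.driver ? pdev->dev.driver->name : "";
}

/* Error handlers set statically, as suggested for the adf drivers. */
static const struct pci_error_handlers demo_err_handler = { 0 };
static struct pci_driver demo_driver = {
	.name = "demo",
	.err_handler = &demo_err_handler,
	.driver = { .name = "demo" },
};
```

With this shape, pci_dev needs no separate driver pointer: the PCI driver is
always reachable from the generic one, and name-only users never touch
struct pci_driver at all.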

[PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM

From: Tianyu Lan 

A Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force mode
to use the swiotlb bounce buffer for DMA transactions.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via extra address space which is above shared_gpa_boundary
(e.g. 39-bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.
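
As an illustration of the address arithmetic described above, here is a small
userspace sketch. The 39-bit boundary is only the example value from the text
(the real one is reported by the hypervisor), and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12	/* Hyper-V pages are 4 KiB */

/* Example value only: a boundary at bit 39, matching the text above. */
static const uint64_t example_shared_gpa_boundary = 1ULL << 39;

/* Physical address through which the shared alias of pa is accessed. */
static uint64_t shared_alias_pa(uint64_t pa, uint64_t boundary)
{
	return pa + boundary;
}

/* The same computation in page frame numbers, as hv_map_memory() does it. */
static uint64_t shared_alias_pfn(uint64_t pa, uint64_t boundary)
{
	return (pa >> HV_HYP_PAGE_SHIFT) + (boundary >> HV_HYP_PAGE_SHIFT);
}

/* Addresses below vTOM are private; addresses at or above it are shared. */
static int is_shared(uint64_t pa, uint64_t boundary)
{
	return pa >= boundary;
}
```

For example, with the 39-bit boundary a private page at 0x1000 is reached
through its shared alias at 0x8000001000.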

The swiotlb bounce buffer code calls dma_map_decrypted()
to mark the bounce buffer visible to the host and map it in the
extra address space. Populate the dma memory decrypted ops with
the hv map/unmap functions.

Hyper-V initializes the swiotlb bounce buffer, so the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override
the setting, hyperv_swiotlb_detect() needs to run before
these detect functions, which depend on pci_xen_swiotlb_init().
Make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect()
to keep the order.

The map function vmap_pfn() can't work at the early stage where
hyperv_iommu_swiotlb_init() runs, so initialize the swiotlb
bounce buffer in hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 28 ++
 arch/x86/include/asm/mshyperv.h |  2 +
 arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
 drivers/hv/vmbus_drv.c  |  3 ++
 drivers/iommu/hyperv-iommu.c| 65 +
 include/linux/hyperv.h  |  1 +
 6 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index c13ec5560d73..0f05e4d6fc62 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -265,3 +265,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
 
return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index a30c60f189a3..b247739f57ac 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -250,6 +250,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
   enum hv_mem_host_visibility visibility);
 int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
 void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
 void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 54f9aa7e8457..43bd031aa332 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 57bbbaa4e8f7..f068e22a5636 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..01e874b3b43a 100644
--- a/dr

[PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver

From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V DMA ops callback will use the swiotlb functions to allocate a
bounce buffer and copy data from/to it.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |   6 ++
 drivers/net/hyperv/netvsc.c   | 144 +-
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h|   5 ++
 4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855dff10..862419912bfb 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,8 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
+   u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1730,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd935412853..fc312e5db4d5 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   vunmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   vunmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
return nvchan->mrc.slots ? 0 : -ENOMEM;
 }
 
+static void *netvsc_remap_buf(void *buf, unsigned long size)
+{
+   unsigned long *pfns;
+   void *vaddr;
+   int i;
+
+   pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
+  GFP_KERNEL);
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
+   + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
 static int netvsc_init_buf(struct hv_device *device,
   struct netvsc_device *net_device,
   const struct netvsc_device_info *device_info)
@@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
unsigned int buf_size;
size_t map_words;
int i, ret = 0;
+   void *vaddr;
 
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->recv_original_buf = net_device->recv_buf;
+   net_device->recv_buf = vaddr;
+   }
+
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->send_original_buf = net_devi

[PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

From: Tianyu Lan 

In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
extra address space which is above shared_gpa_boundary
(e.g. 39-bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.

Use dma_map_decrypted() in the swiotlb code, store remap address returned
and use the remap address to copy data from/to swiotlb bounce buffer.

Signed-off-by: Tianyu Lan 
---
Change since v1:
   * Make swiotlb_init_io_tlb_mem() return error code and return
 error when dma_map_decrypted() fails.
---
 include/linux/swiotlb.h |  4 
 kernel/dma/swiotlb.c| 32 
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f507e3eacbea..584560ecaa8e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb
+ * memory pool may be remapped in the memory encrypted case and store
+ * virtual address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fa81c096c1d..29b6d888ef3b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -176,7 +176,7 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
@@ -194,14 +194,21 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].alloc_size = 0;
}
 
-   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+   mem->vaddr = dma_map_decrypted(vaddr, bytes);
+   if (!mem->vaddr) {
+   pr_err("Failed to decrypt memory.\n");
+   return -ENOMEM;
+   }
+
+   memset(mem->vaddr, 0, bytes);
+   return 0;
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
struct io_tlb_mem *mem;
size_t alloc_size;
+   int ret;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -216,7 +223,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+   ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+   if (ret) {
+   memblock_free(__pa(mem), alloc_size);
+   return ret;
+   }
 
io_tlb_default_mem = mem;
if (verbose)
@@ -304,6 +315,8 @@ int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
struct io_tlb_mem *mem;
+   int size = get_order(struct_size(mem, slots, nslabs));
+   int ret;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -312,12 +325,15 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (WARN_ON_ONCE(io_tlb_default_mem))
return -ENOMEM;
 
-   mem = (void *)__get_free_pages(GFP_KERNEL,
-   get_order(struct_size(mem, slots, nslabs)));
+   mem = (void *)__get_free_pages(GFP_KERNEL, size);
if (!mem)
return -ENOMEM;
 
-   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+   ret = swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+   if (ret) {
+   free_pages((unsigned long)mem, size);
+   return ret;
+   }
 
io_tlb_default_mem = mem;
swiotlb_print_info();
@@ -360,7 +376,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem-

[PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V DMA ops callback will use the swiotlb functions to allocate a
bounce buffer and copy data from/to it.
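
Mapping through the DMA API here is done one Hyper-V page at a time, so each
transfer has to be cut into chunks that never cross a 4 KiB boundary. The
userspace sketch below shows one way that split could look. The function names
are hypothetical, and it assumes the starting offset is smaller than the page
size and that every chunk after the first starts at offset 0.

```c
#include <assert.h>
#include <stddef.h>

#define HV_HYP_PAGE_SIZE 4096UL

/* Pages touched by [offset, offset + length), in the spirit of HVPFN_UP(). */
static unsigned long hvpg_count(unsigned long offset_in_hvpg,
				unsigned long length)
{
	return (offset_in_hvpg + length + HV_HYP_PAGE_SIZE - 1) /
	       HV_HYP_PAGE_SIZE;
}

/*
 * Fill sizes[] with the per-page chunk sizes for a buffer that starts
 * offset_in_hvpg bytes into its first page. Returns the number of chunks
 * written, which is at most max_chunks.
 */
static size_t split_into_hvpages(unsigned long offset_in_hvpg,
				 unsigned long length,
				 unsigned long *sizes, size_t max_chunks)
{
	size_t n = 0;

	while (length && n < max_chunks) {
		unsigned long room = HV_HYP_PAGE_SIZE - offset_in_hvpg;
		unsigned long size = length < room ? length : room;

		sizes[n++] = size;
		length -= size;
		offset_in_hvpg = 0;	/* later chunks are page aligned */
	}

	return n;
}
```

The number of chunks matches the page count used to size the dma_range array,
so each array entry can record exactly one mapping.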

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 68 +++---
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 328bb961c281..78320719bdd8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -427,6 +429,8 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   u32 hvpg_count;
+   struct hv_dma_range *dma_range;
 };
 
 
@@ -509,6 +513,14 @@ struct storvsc_scan_work {
u8 tgt_id;
 };
 
+#define storvsc_dma_map(dev, page, offset, size, dir) \
+   dma_map_page(dev, page, offset, size, dir)
+
+#define storvsc_dma_unmap(dev, dma_range, dir) \
+   dma_unmap_page(dev, dma_range.dma,  \
+  dma_range.mapping_size,  \
+  dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
+
 static void storvsc_device_scan(struct work_struct *work)
 {
struct storvsc_scan_work *wrk;
@@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
struct hv_device *device;
struct storvsc_device *stor_device;
struct Scsi_Host *shost;
+   int i;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
}
 
+   if (request->dma_range) {
+   for (i = 0; i < request->hvpg_count; i++)
+   storvsc_dma_unmap(&device->device,
+   request->dma_range[i],
+   request->vstor_packet.vm_srb.data_in == READ_TYPE);
+
+   kfree(request->dma_range);
+   }
+
storvsc_on_receive(stor_device, packet, request);
continue;
}
@@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
+   dma_addr_t dma;
u64 hvpfn;
+   u32 size;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   cmd_request->dma_range = kcalloc(hvpg_count,
+sizeof(*cmd_request->dma_range),
+GFP_ATOMIC);
+   if (!cmd_request->dma_range) {
+   ret = -ENOMEM;
+   goto free_payload;
+   }
 
for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
/*
@@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 * last sgl should be reached at the same time that
 * the PFN array is filled.
 */
-   while (hvpfns_to_add--)
-   payload->range.pfn_array[i++] = hvpfn++;
+   while (hvpfns_to_add--) {
+   size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
+  (unsigned long)length);
+   dma = storvsc_dma_map(&dev->device, pfn_to_page(hvpfn++),
+ offset_in_hvpg, size,
+ scmnd->sc_data_direction);
+   if (dma_mapping_error(&dev->device, dma)) {
+   ret = -ENOMEM;
+   goto free_dma_range;
+   }
+
+   if (offset_in_hvpg) {
+   payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
+

[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function

From: Tianyu Lan 

In a Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer
needs to be mapped into the address space above vTOM, so
introduce dma_map_decrypted()/dma_unmap_encrypted() to map/unmap
bounce buffer memory. The platform can populate the map/unmap
callbacks in the dma memory decrypted ops.
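
The optional-callback pattern described above can be sketched in userspace as
follows. The *_model names are hypothetical, the set_memory_decrypted() and
set_memory_encrypted() steps are left out, and demo_map() merely pretends the
remapped alias sits at a fixed offset.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the ops table: both hooks are optional. */
struct decrypted_ops_model {
	void *(*map)(void *addr, unsigned long size);
	void (*unmap)(void *addr);
};

/* Zero-initialised, so no platform hooks are installed by default. */
static struct decrypted_ops_model generic_ops_model;

/* Use the platform hook when present, else keep the original address. */
static void *map_decrypted_model(void *addr, unsigned long size)
{
	if (generic_ops_model.map)
		return generic_ops_model.map(addr, size);
	return addr;
}

/* Tear down the platform mapping, if any was installed. */
static void unmap_encrypted_model(void *addr)
{
	if (generic_ops_model.unmap)
		generic_ops_model.unmap(addr);
}

/* Hypothetical platform hooks used only to exercise the model. */
static char backing[64];
static int unmap_calls;

static void *demo_map(void *addr, unsigned long size)
{
	(void)size;
	return (char *)addr + 32;	/* pretend alias at a fixed offset */
}

static void demo_unmap(void *addr)
{
	(void)addr;
	unmap_calls++;
}
```

Platforms without a separate shared alias leave the table empty and get the
original address back; a platform such as the Hyper-V Isolation VM installs
its hooks once during early setup.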
---
 include/linux/dma-map-ops.h |  9 +
 kernel/dma/mapping.c| 22 ++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..01d60a024e45 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -71,6 +71,11 @@ struct dma_map_ops {
unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
+struct dma_memory_decrypted_ops {
+   void *(*map)(void *addr, unsigned long size);
+   void (*unmap)(void *addr);
+};
+
 #ifdef CONFIG_DMA_OPS
 #include 
 
@@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
 }
 #endif /* CONFIG_DMA_API_DEBUG */
 
+void *dma_map_decrypted(void *addr, unsigned long size);
+int dma_unmap_encrypted(void *addr, unsigned long size);
+
 extern const struct dma_map_ops dma_dummy_ops;
+extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 
 #endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..6fb150dc1750 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -13,11 +13,13 @@
 #include 
 #include 
 #include 
+#include 
 #include "debug.h"
 #include "direct.h"
 
 bool dma_default_coherent;
 
+struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 /*
  * Managed DMA API
  */
@@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+void *dma_map_decrypted(void *addr, unsigned long size)
+{
+   if (set_memory_decrypted((unsigned long)addr,
+size / PAGE_SIZE))
+   return NULL;
+
+   if (dma_memory_generic_decrypted_ops.map)
+   return dma_memory_generic_decrypted_ops.map(addr, size);
+   else
+   return addr;
+}
+
+int dma_unmap_encrypted(void *addr, unsigned long size)
+{
+   if (dma_memory_generic_decrypted_ops.unmap)
+   dma_memory_generic_decrypted_ops.unmap(addr);
+
+   return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
+}
-- 
2.25.1

[PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

From: Tianyu Lan 

VMbus ring buffers are shared with the host and need to
be accessed via the extra address space of an Isolation VM
with SNP support. This patch maps the ring buffer
address in the extra address space via ioremap(). The HV host
visibility hvcall smears data in the ring buffer, so
reset the ring buffer memory to zero after calling the
visibility hvcall.
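
The ring buffer is mapped with its data pages appearing twice back to back, so
a packet that wraps past the end of the ring can still be read linearly. Below
is a userspace sketch of how such a wraparound PFN list could be built. The
function name is hypothetical, and it assumes the ring's pages are physically
contiguous starting at first_pfn.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build the PFN list for a wraparound mapping: the first page (ring
 * header) appears once, followed by the remaining page_cnt - 1 data
 * pages listed twice in a row. Returns the number of entries written,
 * or 0 if the arguments are unusable.
 */
static size_t fill_wraparound_pfns(unsigned long first_pfn,
				   unsigned int page_cnt,
				   unsigned long *pfns, size_t capacity)
{
	size_t needed;
	unsigned int i;

	if (page_cnt == 0)
		return 0;

	needed = (size_t)page_cnt * 2 - 1;
	if (capacity < needed)
		return 0;

	pfns[0] = first_pfn;
	for (i = 0; i < 2 * (page_cnt - 1); i++)
		pfns[i + 1] = first_pfn + i % (page_cnt - 1) + 1;

	return needed;
}
```

For a four-page ring starting at PFN 100, the list is 100, 101, 102, 103, 101,
102, 103: seven entries for 2 * 4 - 1 mapped pages.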

Signed-off-by: Tianyu Lan 
---
 drivers/hv/Kconfig|  1 +
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 84 ++-
 4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index d1123ceb38f3..dd12af20e467 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -8,6 +8,7 @@ config HYPERV
|| (ARM64 && !CPU_BIG_ENDIAN))
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
+   select VMAP_PFN
help
  Select this option to run Linux as a Hyper-V client operating
  system.
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 4c4717c26240..60ef881a700c 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -712,6 +712,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+   err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+   err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 40bc0eff6665..15cd23a561f3 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt, u32 max_pkt_size);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2aee356840a2..d4f93fca1108 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
mutex_init(&channel->outbound.ring_buffer_mutex);
 }
 
-/* Initialize the ring buffer. */
-int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-  struct page *pages, u32 page_cnt, u32 max_pkt_size)
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
 {
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long *pfns_wraparound;
+   void *vaddr;
int i;
-   struct page **pages_wraparound;
 
-   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
 
/*
 * First page holds struct hv_ring_buffer, do wraparound mapping for
 * the rest.
 */
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+   pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
   GFP_KERNEL);
-   if (!pages_wraparound)
+   if (!pfns_wraparound)
return -ENOMEM;
 
-   pages_wraparound[0] = pages;
+   pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
for (i = 0; i < 2 * (page_cnt - 1); i++)
-   pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
-
-   ring_info->ring_buffer = (struct hv_ring_buffer *)
-   vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
-
-   kfree(pages_wraparound);
+   pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
+   i % (page_cnt - 1) + 1;
 
-
-   if (!ring_info->ring_buffer)
+   vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
+   kfree(pfns_wraparound);
+   if (!vaddr)
return -ENOMEM;
 
-   ring_info->ring_buffer->read_index =
-   ring_info->ring_buffer->write_index = 0;
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+

[PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message

From: Tianyu Lan 

The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
with the host in an Isolation VM, so it's necessary to use hvcall to
set them visible to the host. In an Isolation VM with AMD SEV SNP, the
access address should be in the extra space above the shared gpa
boundary, so remap these pages at the extra address (pa +
shared_gpa_boundary). Introduce monitor_pages_va to save the original
virtual addresses while the pages are remapped, and unmap the
remapped addresses when disconnecting vmbus.

Signed-off-by: Tianyu Lan 
---
Change since v1:
* Not remap monitor pages in the non-SNP isolation VM.
---
 drivers/hv/connection.c   | 65 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 66 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6d315c1465e0..bf0ac3167bd2 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hyperv_vmbus.h"
@@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_isolation_type_snp()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_isolation_type_snp()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[1]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -159,6 +191,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -216,6 +249,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_is_isolation_supported()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -284,6 +327,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -303,6 +348,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_is_isolation_supported()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   memunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+

[PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM

From: Tianyu Lan 

Hyper-V provides a GHCB hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
messages in an SNP Isolation VM. Add support for it.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 43 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 -
 drivers/hv/hv.c |  8 +-
 include/asm-generic/mshyperv.h  | 29 ++
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index ec0e5c259740..c13ec5560d73 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,49 @@
 #include 
 #include 
 
+#define GHCB_USAGE_HYPERV_CALL 1
+
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
+   hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0x;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 730985676ea3..a30c60f189a3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -255,6 +255,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5e479d54918c..6d315c1465e0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 59f7173c4d9f..e5c9fc467893 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+   if (hv_isolation_type_snp())
+   status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+   (void *)aligned_msg, NULL,
+   sizeof(struct hv_input_post_message));
+   else
+   status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+   aligned_msg, NULL);
 
/* Preemption must remain disabled until after the hypercall
 * so some other thread can't get scheduled onto this cpu and
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 90dac369a2dc..400181b855c1 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -31,6 +31,35 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reser

[PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page

From: Tianyu Lan 

Hyper-V provides a GHCB protocol for writing Synthetic Interrupt
Controller MSRs in an Isolation VM with AMD SEV-SNP, and these
registers are emulated by the hypervisor directly.
Hyper-V requires the SINTx MSRs to be written twice: first
via the GHCB page to communicate with the hypervisor,
and then with the wrmsr instruction to talk to the paravisor,
which runs in VMPL0. The Guest OS ID MSR also needs to be set
via the GHCB.
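
The register handling in hv_ghcb_msr_write() can be sketched in user space as follows. The split into a low and a high 32-bit half follows the diff (rax takes lower_32_bits(value), rdx takes value >> 32); the struct `msr_regs` and the helpers `msr_pack()` and `msr_unpack()` are hypothetical names, and the unpack side assumes the usual EDX:EAX convention of RDMSR.

```c
#include <assert.h>
#include <stdint.h>

/* Register image as it would be placed in the GHCB for an MSR exit. */
struct msr_regs {
	uint64_t rcx;   /* MSR index */
	uint64_t rax;   /* low 32 bits of the value */
	uint64_t rdx;   /* high 32 bits of the value */
};

/* Split a 64-bit MSR value into the rax/rdx halves. */
static struct msr_regs msr_pack(uint32_t msr, uint64_t value)
{
	struct msr_regs r;

	r.rcx = msr;
	r.rax = (uint32_t)value;
	r.rdx = value >> 32;
	return r;
}

/* Recombine the halves, ignoring anything above bit 31 in either register. */
static uint64_t msr_unpack(const struct msr_regs *r)
{
	return (uint64_t)(uint32_t)r->rax | ((uint64_t)(uint32_t)r->rdx << 32);
}
```

Masking both halves on the read side keeps stale upper bits in the saved registers from leaking into the result.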

Signed-off-by: Tianyu Lan 
---
Change since v1:
 * Introduce sev_es_ghcb_hv_call_simple() and share code
   between SEV and Hyper-V code.
---
 arch/x86/hyperv/hv_init.c   |  33 ++---
 arch/x86/hyperv/ivm.c   | 110 +
 arch/x86/include/asm/mshyperv.h |  78 +++-
 arch/x86/include/asm/sev.h  |   3 +
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 arch/x86/kernel/sev-shared.c|  63 ++---
 drivers/hv/hv.c | 121 ++--
 include/asm-generic/mshyperv.h  |  12 +++-
 8 files changed, 329 insertions(+), 94 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b3683083208a..ab0b33f621e7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -423,7 +423,7 @@ void __init hyperv_init(void)
goto clean_guest_os_id;
 
if (hv_isolation_type_snp()) {
-   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
if (!ms_hyperv.ghcb_base)
goto clean_guest_os_id;
 
@@ -432,6 +432,9 @@ void __init hyperv_init(void)
ms_hyperv.ghcb_base = NULL;
goto clean_guest_os_id;
}
+
+   /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -523,6 +526,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -596,30 +600,3 @@ bool hv_is_hyperv_initialized(void)
return hypercall_msr.enable;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.priv_high & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
-   return 0;
-
-   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
-   return 0;
-
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-
-DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
-
-bool hv_isolation_type_snp(void)
-{
-   return static_branch_unlikely(&isolation_type_snp);
-}
-EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 8c905ffdba7f..ec0e5c259740 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,6 +6,8 @@
  *  Tianyu Lan 
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -13,6 +15,114 @@
 #include 
 #include 
 
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+
+   if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
+   pr_warn("Fail to write msr via ghcb %llx.\n", msr);
+
+   local_irq_restore(flags);
+}
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
+   pr_warn("Fail to read msr via ghcb %llx.\n", msr);
+   else
+   *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+

[PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

From: Tianyu Lan 

Mark the vmbus ring buffer visible to the host with
set_memory_decrypted() when establishing the GPADL handle.
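
The per-channel bookkeeping needed to re-encrypt a buffer at teardown can be sketched in user space as below. This is not the patch's approach: the diff tracks slots with an atomic index, while this sketch uses a first-free-slot search and a lookup that returns NULL when the handle is unknown, so the caller never indexes past the array. All names (`gpadl_table`, `gpadl_record()`, `gpadl_find()`) are hypothetical; the three-slot size mirrors VMBUS_GPADL_RANGE_COUNT and handle 0 is treated as "free", as in the teardown path of the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPADL_SLOTS 3u   /* mirrors VMBUS_GPADL_RANGE_COUNT */

struct gpadl_slot {
	uint32_t handle;   /* 0 means the slot is free */
	uint32_t size;
	void *buffer;
};

struct gpadl_table {
	struct gpadl_slot slot[GPADL_SLOTS];
};

static void gpadl_table_init(struct gpadl_table *t)
{
	memset(t, 0, sizeof(*t));
}

/* Record a buffer in the first free slot; returns the slot index or -1 if full. */
static int gpadl_record(struct gpadl_table *t, uint32_t handle,
			void *buffer, uint32_t size)
{
	for (size_t i = 0; i < GPADL_SLOTS; i++) {
		if (t->slot[i].handle == 0) {
			t->slot[i].handle = handle;
			t->slot[i].size = size;
			t->slot[i].buffer = buffer;
			return (int)i;
		}
	}
	return -1;
}

/* Look up a slot by handle; returns NULL for handle 0 or an unknown handle. */
static struct gpadl_slot *gpadl_find(struct gpadl_table *t, uint32_t handle)
{
	if (handle == 0)
		return NULL;
	for (size_t i = 0; i < GPADL_SLOTS; i++)
		if (t->slot[i].handle == handle)
			return &t->slot[i];
	return NULL;
}
```

A teardown path built on `gpadl_find()` would re-encrypt and clear the slot only when the lookup succeeds.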

Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c   | 44 --
 include/linux/hyperv.h | 11 +++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c73b074..4c4717c26240 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -465,7 +466,14 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
struct list_head *curr;
u32 next_gpadl_handle;
unsigned long flags;
-   int ret = 0;
+   int ret = 0, index;
+
+   index = atomic_inc_return(&channel->gpadl_index) - 1;
+
+   if (index > VMBUS_GPADL_RANGE_COUNT - 1) {
+   pr_err("Gpadl handle position(%d) has been occupied.\n", index);
+   return -ENOSPC;
+   }
 
next_gpadl_handle =
(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
@@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
if (ret)
return ret;
 
+   ret = set_memory_decrypted((unsigned long)kbuffer,
+  HVPFN_UP(size));
+   if (ret) {
+   pr_warn("Failed to set host visibility.\n");
+   return ret;
+   }
+
init_completion(&msginfo->waitevent);
msginfo->waiting_channel = channel;
 
@@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
/* At this point, we received the gpadl created msg */
*gpadl_handle = gpadlmsg->gpadl;
 
+   channel->gpadl_array[index].size = size;
+   channel->gpadl_array[index].buffer = kbuffer;
+   channel->gpadl_array[index].gpadlhandle = *gpadl_handle;
+
 cleanup:
spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
list_del(&msginfo->msglistentry);
@@ -549,6 +568,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
}
 
kfree(msginfo);
+
+   if (ret) {
+   set_memory_encrypted((unsigned long)kbuffer,
+HVPFN_UP(size));
+   atomic_dec(&channel->gpadl_index);
+   }
+
return ret;
 }
 
@@ -676,6 +702,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 
/* Establish the gpadl for the ring buffer */
newchannel->ringbuffer_gpadlhandle = 0;
+   atomic_set(&newchannel->gpadl_index, 0);
 
err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
  page_address(newchannel->ringbuffer_page),
@@ -811,7 +838,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
struct vmbus_channel_gpadl_teardown *msg;
struct vmbus_channel_msginfo *info;
unsigned long flags;
-   int ret;
+   int ret, i;
 
info = kzalloc(sizeof(*info) +
   sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
@@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
kfree(info);
+
+   /* Find gpadl buffer virtual address and size. */
+   for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
+   if (channel->gpadl_array[i].gpadlhandle == gpadl_handle)
+   break;
+
+   if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer,
+   HVPFN_UP(channel->gpadl_array[i].size)))
+   pr_warn("Fail to set mem host visibility.\n");
+
+   channel->gpadl_array[i].gpadlhandle = 0;
+   atomic_dec(&channel->gpadl_index);
+
return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index ddc8713ce57b..90b542597143 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -803,6 +803,14 @@ struct vmbus_device {
 
 #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
 
+struct vmbus_gpadl {
+   u32 gpadlhandle;
+   u32 size;
+   void *buffer;
+};
+
+#define VMBUS_GPADL_RANGE_COUNT 3
+
 struct vmbus_channel {
struct list_head listentry;
 
@@ -823,6 +831,9 @@ struct vmbus_channel {
struct completion rescind_event;
 
u32 ringbuffer_gpadlhandle;
+   /* GPADL_RING and Send/Receive GPADL_BUFFER. */
+   struct vmbus_gpadl gpadl_array[VMBUS_GPADL_RANGE_COUNT];
+   atomic_t gpadl_index;
 
/* Allocated memory for ring buffer */
struct page *ringbuffer_page;
-- 
2.25.1

[PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

From: Tianyu Lan 

Add support for the new guest address host visibility hvcall,
which marks memory visible to the host. Call it inside
set_memory_decrypted()/set_memory_encrypted(). Add a HYPERVISOR
feature check to hv_is_isolation_supported() so that it returns
early in a non-virtualized environment.
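
The batching that __hv_set_mem_host_visibility() performs before each hypercall can be sketched in user space as follows. The batch limit of 4 stands in for HV_MAX_MODIFY_GPA_REP_COUNT, and `batch_fn`, `for_each_pfn_batch()`, and `count_batch()` are hypothetical names. Unlike the diff, which ORs the return values together, this sketch stops at the first failing batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 4u   /* stand-in for HV_MAX_MODIFY_GPA_REP_COUNT */

/* Hypothetical callback standing in for the visibility hypercall. */
typedef int (*batch_fn)(const uint64_t *pfns, size_t count, void *ctx);

/*
 * Walk page_count consecutive page frames starting at first_pfn and hand
 * them to fn in batches of at most BATCH_MAX entries. Returns 0 on success
 * or the first non-zero value returned by fn.
 */
static int for_each_pfn_batch(uint64_t first_pfn, size_t page_count,
			      batch_fn fn, void *ctx)
{
	uint64_t batch[BATCH_MAX];
	size_t n = 0;

	for (size_t i = 0; i < page_count; i++) {
		batch[n++] = first_pfn + i;

		/* Flush when the batch is full or this is the last page. */
		if (n == BATCH_MAX || i == page_count - 1) {
			int ret = fn(batch, n, ctx);

			if (ret)
				return ret;
			n = 0;
		}
	}
	return 0;
}

/* Example callback: records how many batches and pages it saw. */
struct batch_stats {
	size_t batches;
	size_t pages;
};

static int count_batch(const uint64_t *pfns, size_t count, void *ctx)
{
	struct batch_stats *s = ctx;

	(void)pfns;
	s->batches++;
	s->pages += count;
	return 0;
}
```

With ten pages and a limit of four, the callback sees batches of 4, 4 and 2 entries.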

Signed-off-by: Tianyu Lan 
---
Change since v2:
   * Rework __set_memory_enc_dec() and call Hyper-V and AMD function
 according to platform check.

Change since v1:
   * Use new static call x86_set_memory_enc to avoid adding Hyper-V
 specific checks in the set_memory code.
---
 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |   6 ++
 arch/x86/hyperv/ivm.c  | 114 +
 arch/x86/include/asm/hyperv-tlfs.h |  20 +
 arch/x86/include/asm/mshyperv.h|   4 +-
 arch/x86/mm/pat/set_memory.c   |  19 +++--
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |   1 +
 8 files changed, 160 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0bb4d9ca7a55..b3683083208a 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
 
 bool hv_is_isolation_supported(void)
 {
+   if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+   return 0;
+
+   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
+   return 0;
+
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
 
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..8c905ffdba7f
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In an Isolation VM, all guest memory is encrypted from the host and
+ * the guest needs to set memory visible to the host via hvcall before
+ * sharing memory with the host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
+  enum hv_mem_host_visibility visibility)
+{
+   struct hv_gpa_range_for_visibility **input_pcpu, *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_gpa_range_for_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -EINVAL;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+static int __hv_set_mem_host_visibility(void *kbuffer, int pagecount,
+ enum hv_mem_host_visibility visibility)
+{
+   u64 *pfn_array;
+   int ret = 0;
+   int i, pfn;
+
+   if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
+   return 0;
+
+   pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+   if (!pfn_array)
+   return -ENOMEM;
+
+   for (i = 0, pfn = 0; i < pagecount; i++) {
+   pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+   pfn++;
+
+   if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+   ret |= hv_mark_gpa_visibility(pfn, pfn_array,
+   visibility);
+   pfn = 0;
+
+

[PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.

From: Tianyu Lan 

Hyper-V exposes the shared memory boundary via the cpuid leaf
HYPERV_CPUID_ISOLATION_CONFIG. Store it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares
for sharing memory with the host in an SNP guest.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 12 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 6b5835a087a3..2b7f396ef1a5 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.priv_high & HV_ISOLATION) {
ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+   ms_hyperv.shared_gpa_boundary =
+   (u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 4269f3174e58..aa26d24a5ca9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,8 +35,18 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
-   u32 isolation_config_b;
+   union {
+   u32 isolation_config_b;
+   struct {
+   u32 cvm_type : 4;
+   u32 Reserved11 : 1;
+   u32 shared_gpa_boundary_active : 1;
+   u32 shared_gpa_boundary_bits : 6;
+   u32 Reserved12 : 20;
+   };
+   };
void  __percpu **ghcb_base;
+   u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1

[PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and
the guest needs to call it to mark memory visible to the host
before sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce buffers
are required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

There are two exceptions: packets sent by
vmbus_sendpacket_pagebuffer() and vmbus_sendpacket_mpb_desc(). These
packets contain IO stack memory addresses and the host will access this
memory. So add bounce buffer allocation support in vmbus for these packets.

For an SNP Isolation VM, the guest needs to access shared memory via
an extra address space which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
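
As a worked example of the address arithmetic, the sketch below decodes the boundary from the Group B configuration word and applies it to a GPA. The field positions follow the bitfield that patch 02 adds to struct ms_hyperv_info (bits 0-3 cvm_type, bit 4 reserved, bit 5 shared_gpa_boundary_active, bits 6-11 shared_gpa_boundary_bits) and assume the compiler allocates bitfields from the least significant bit, as on x86. The helper names are hypothetical, and returning the GPA unchanged when the boundary is inactive is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 5 of the Group B word: is the shared GPA boundary in use? */
static int shared_gpa_boundary_active(uint32_t config_b)
{
	return (config_b >> 5) & 1;
}

/* Bits 6-11 of the Group B word: log2 of the boundary. */
static unsigned int shared_gpa_boundary_bits(uint32_t config_b)
{
	return (config_b >> 6) & 0x3f;
}

/* Same computation as the patch: (u64)1 << shared_gpa_boundary_bits. */
static uint64_t shared_gpa_boundary(uint32_t config_b)
{
	return (uint64_t)1 << shared_gpa_boundary_bits(config_b);
}

/* Address the guest would use to reach a shared page above the boundary. */
static uint64_t shared_access_addr(uint64_t gpa, uint32_t config_b)
{
	if (!shared_gpa_boundary_active(config_b))
		return gpa;
	return gpa + shared_gpa_boundary(config_b);
}
```

For instance, with 39 boundary bits the page at GPA 0x1000 would be accessed at 0x8000001000.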


Change since V2:
   - Drop x86_set_memory_enc static call and use platform check
 in the __set_memory_enc_dec() to run platform callback of
 set memory encrypted or decrypted.

Change since V1:
   - Introduce x86_set_memory_enc static call and so platforms can
 override __set_memory_enc_dec() with their implementation
   - Introduce sev_es_ghcb_hv_call_simple() and share code
 between SEV and Hyper-V code.
   - Not remap monitor pages in the non-SNP isolation VM
   - Make swiotlb_init_io_tlb_mem() return error code and return
 error when dma_map_decrypted() fails.

Change since RFC V4:
   - Introduce dma map decrypted function to remap bounce buffer
  and provide dma map decrypted ops for platform to hook callback.
   - Split swiotlb and dma map decrypted change into two patches
   - Replace vstart with vaddr in swiotlb changes.

Change since RFC v3:
   - Add interface set_memory_decrypted_map() to decrypt memory and
 map bounce buffer in extra address space
   - Remove swiotlb remap function and store the remap address
 returned by set_memory_decrypted_map() in swiotlb mem data structure.
   - Introduce hv_set_mem_enc() to make code more readable in
 __set_memory_enc_dec().

Change since RFC v2:
   - Remove not UIO driver in Isolation VM patch
   - Use vmap_pfn() to replace ioremap_page_range function in
   order to avoid exposing symbol ioremap_page_range() and
   ioremap_page_range()
   - Call hv set mem host visibility hvcall in
 set_memory_encrypted/decrypted()
   - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
   - Fix code style


Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  75 ++--
 arch/x86/hyperv/ivm.c  | 295 +
 arch/x86/include/asm/hyperv-tlfs.h |  20 ++
 arch/x86/include/asm/mshyperv.h|  87 -
 arch/x86/include/asm/sev.h |   3 +
 arch/x86/kernel/cpu/mshyperv.c |   5 +
 arch/x86/kernel/sev-shared.c   |  63 +++---
 arch/x86/mm/pat/set_memory.c   |  19 +-
 arch/x86/xen/pci-swiotlb-xen.c |   3 +-
 drivers/hv/Kconfig |   1 +
 drivers/hv/channel.c   |  54 +-
 drivers/hv/connection.c|  71 ++-
 drivers/hv/hv.c| 129 +
 drivers/hv/hyperv_vmbus.h  |   3 +
 drivers/hv/ring_buffer.c   |  84 ++--
 drivers/hv/vmbus_drv.c |   3 +
 drivers/iommu/hyperv-iommu.c   |  65 +++
 drivers/net/hyperv/hyperv_net.h|   6 +
 drivers/net/hyperv/netvsc.c| 144 +-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c |  68 ++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  54 +-
 include/linux/

[PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

From: Tianyu Lan 

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR so that an
SNP guest can communicate with the hypervisor. Map the GHCB page
for all CPUs to read/write MSR registers and submit hvcall requests
via the GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   | 66 +++--
 arch/x86/include/asm/mshyperv.h |  2 +
 include/asm-generic/mshyperv.h  |  2 +
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 708a2712a516..0bb4d9ca7a55 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved;
 struct hv_vp_assist_page **hv_vp_assist_page;
 EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 
+static int hyperv_init_ghcb(void)
+{
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EINVAL;
+
+   /*
+* GHCB page is allocated by paravisor. The address
+* returned by MSR_AMD64_SEV_ES_GHCB is above shared
+* ghcb boundary and map it here.
+*/
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+
+   return 0;
+}
+
 static int hv_cpu_init(unsigned int cpu)
 {
union hv_vp_assist_msr_contents msr = { 0 };
@@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
}
}
 
+   hyperv_init_ghcb();
+
return 0;
 }
 
@@ -177,6 +205,14 @@ static int hv_cpu_die(unsigned int cpu)
 {
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
+   void **ghcb_va = NULL;
+
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   memunmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
 
hv_common_cpu_die(cpu);
 
@@ -383,9 +419,19 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   if (hyperv_init_ghcb()) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -456,7 +502,8 @@ void __init hyperv_init(void)
hv_query_ext_cap(0);
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -484,6 +531,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -559,3 +609,11 @@ bool hv_is_isolation_supported(void)
 {
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+bool hv_isolation_type_snp(void)
+{
+   return static_branch_unlikely(&isolation_type_snp);
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..6627cfd2bfba 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+
 typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
void *data);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c1ab6a6e72b5..4269f3174e58 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -36,6 +36,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+   void  __percpu **ghcb_base;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
@@ -237,6 +238,7 @@ bool hv_is_hyperv_initialized(void);
 bool hv_is_hibernation_supported(void);
 enum hv_isolation_type hv_get_isolation_type(void);
 bool

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space




On 09.08.21 20:18, Julien Grall wrote:


Hi Julien




On 09/08/2021 18:14, Oleksandr wrote:


On 09.08.21 17:51, Julien Grall wrote:
Hi Julien.


Hi Oleksandr,

I am writing down here what we discussed on another thread and on 
IRC. This will be easier to track in a single thread.


On 04/08/2021 23:00, Julien Grall wrote:

On 04/08/2021 21:56, Oleksandr wrote:
Now, I am wondering, would it be possible to update/clarify the 
current "reg" purpose and use it to pass a safe unallocated space 
for any Xen specific mappings (grant, foreign, whatever) instead 
of just for the grant table region. In case, it is not allowed for 
any reason (compatibility PoV, etc), would it be possible to 
extend a property by passing an extra range separately, something 
similar to how I described above?


I think it should be fine to re-use the same region so long as the 
size of the first bank is at least the size of the original region.


While answering the DT binding question on the DT ML, I realized 
that this is probably not going to be fine because there is a bug in 
Xen when mapping grant-table frames.


The function gnttab_map_frame() is used to map the grant table 
frame. If there is an old mapping, it will first remove it.


The function is using the helper gnttab_get_frame_gfn() to find the 
corresponding GFN or return INVALID_GFN if not mapped.


On Arm, gnttab_get_frame_gfn() is implemented using an array indexed by 
the grant table frame number. The trouble is we don't update the 
array when the page is unmapped. So if the GFN is re-used before the 
grant-table is remapped, then we will end up removing whatever was 
mapped there (this could be a foreign page...).


This behavior already happens today as the toolstack will use the 
first GFN of the region if Linux doesn't support the acquire 
resource interface. We are getting away with it in Linux because the 
toolstack only maps the first grant table frame and:
 - Newer Linux will not use the region provided by the DT and 
nothing will be mapped there.
 - Older Linux will use the region but still map the grant table 
frame 0 to the same GFN.


I am not sure about U-boot and other OSes here.

This is not new but it is going to become a bigger source of 
problems (read: more chance to hit it) as we try to re-use the first 
region.


This means the first region should be used exclusively for the 
grant-table (in a specific order) until the issue is properly fixed.


Thank you for the explanation, it is clear now.





A potential fix is to update the array in p2m_put_l3_page(). The 
default max size of the array is 1024, so it might be fine to just 
walk it (it would be simply a comparison).
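
The idea of walking the array on unmap can be modelled in plain C as below. This is only a toy model under stated assumptions: `gnttab_gfn_map`, `gfn_map_init()`, and `gfn_map_forget()` are hypothetical names, INVALID_GFN is represented by UINT64_MAX, and the real fix would live in Xen's P2M code rather than in a standalone helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GRANT_FRAMES 1024u       /* default array size quoted above */
#define INVALID_GFN      UINT64_MAX  /* stand-in for Xen's INVALID_GFN */

/* Toy model of the per-domain array: slot i holds the GFN of grant frame i. */
struct gnttab_gfn_map {
	uint64_t gfn[MAX_GRANT_FRAMES];
};

static void gfn_map_init(struct gnttab_gfn_map *m)
{
	for (size_t i = 0; i < MAX_GRANT_FRAMES; i++)
		m->gfn[i] = INVALID_GFN;
}

/*
 * Called when a GFN is removed from the P2M: forget any grant frame that
 * was recorded at that GFN, so a later remap of the grant table cannot
 * remove whatever has been mapped there since.
 * Returns the number of slots cleared.
 */
static unsigned int gfn_map_forget(struct gnttab_gfn_map *m, uint64_t gfn)
{
	unsigned int cleared = 0;

	for (size_t i = 0; i < MAX_GRANT_FRAMES; i++) {
		if (m->gfn[i] == gfn) {
			m->gfn[i] = INVALID_GFN;
			cleared++;
		}
	}
	return cleared;
}
```

Each unmap costs one pass of at most 1024 comparisons, which is why restricting the walk to pages that can actually be grant-table frames is attractive.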


I think, this would work. Looks like we don't need to walk for each 
gfn which is being freed, we could just filter it by p2m_is_ram() ...


Well. This would still potentially result in a few unnecessary walks. I 
would consider introducing a new P2M type or possibly adding a check if 
the page is in xenheap (grant-table pages are xenheap pages so far).


Indeed, this would be better, personally I would prefer to check if page 
is in xenheap.






Cheers,


--
Regards,

Oleksandr Tyshchenko

Re: NULL scheduler DoS





On 09/08/2021 17:19, Ahmed, Daniele wrote:

Hi all,


Hi Daniele,

Thank you for the report!

The NULL scheduler is affected by an issue that triggers an assertion 
and reboots the hypervisor.


This issue arises when:

  * a guest is being created with a configuration specifying a file that
does not exist
  * the hypervisor boots with the null scheduler

Both 4.16 and 4.15 are affected.

This is the stack trace from 4.16:

(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) [ Xen-4.16-unstable x86_64 debug=y Not tainted ]
(XEN) CPU: 3
(XEN) RIP: e008:[] 
common/sched/null.c#unit_deassign+0x1c3/0x2ec

(XEN) RFLAGS: 00010006 CONTEXT: hypervisor
(XEN) rax: 83005ce1c850 rbx: 0001 rcx: 0001
(XEN) rdx: 83007fde6fc0 rsi: 83005ce1c790 rdi: 83007ffb7850
(XEN) rbp: 83007ffdfda0 rsp: 83007ffdfd48 r8: 
(XEN) r9: 00048fee r10:  r11: 
(XEN) r12: 82d0405c9298 r13: 83007f7fd508 r14: 83005ce1c850
(XEN) r15: 82d0405e2680 cr0: 8005003b cr4: 003526e0
(XEN) cr3: 7f6b3000 cr2: 888072e79dc0
(XEN) fsb:  gsb: 888071ac gss: 
(XEN) ds: 002b es: 002b fs:  gs:  ss: e010 cs: e008
(XEN) Xen code around  
(common/sched/null.c#unit_deassign+0x1c3/0x2ec):
(XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f 
b7 00 66

(XEN) Xen stack trace from rsp=83007ffdfd48:
(XEN) 83007ffdfd88 82d04023961c 0004 83005ce1cc50
(XEN) 0002 83007ffdfd90 83005ce1c790 82d0405c9298
(XEN) 83007f7fd508 83005ce1c850 82d0405e2680 83007ffdfde0
(XEN) 82d04024f889 83007ffb7850 83005dd63000 83005ce1c790
(XEN) 83005845ab28 83005845a000  83007ffdfe00
(XEN) 82d040253326 83005dd63000  83007ffdfe38
(XEN) 82d04020506b 83007a881080  
(XEN)  82d0405d6f80 83007ffdfe70 82d04022d9e5
(XEN) 00110003 82d0405cf100 82d0405cf100 
(XEN) 82d0405cef80 83007ffdfea8 82d04022e14b 0003
(XEN) 82d0405cf100 7fff 0003 0003
(XEN) 83007ffdfeb8 82d04022e1e6 83007ffdfef0 82d0403172b4
(XEN) 82d04031721d 83007fec1000 83007ffb6000 0003
(XEN) 83007ffcc000 83007ffdfe18  
(XEN)   0003 0003
(XEN) 0246 0003  1bf9dde5
(XEN)  810023aa 0003 deadbeefdeadf00d
(XEN) deadbeefdeadf00d 0100 810023aa e033
(XEN) 0246 c900400a3ea8 e02b 7ffdff707fffd140
(XEN) 00017fe37a6c 7ffe8010  e013
(XEN) Xen call trace:
(XEN) [] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) [] F common/sched/null.c#null_unit_remove+0xfc/0x136
(XEN) [] F sched_destroy_vcpu+0xca/0x199
(XEN) [] F 
common/domain.c#complete_domain_destroy+0x68/0x13f
(XEN) [] F 
common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b

(XEN) [] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN) [] F do_softirq+0x13/0x15
(XEN) [] F arch/x86/domain.c#idle_loop+0x97/0xee
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 3:
(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

This is the line of the assertion that triggers the reboot: 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377 



To reproduce the vulnerability, I took the following steps:


Just to make clear for the others in the thread, per SUPPORT.MD, the 
NULL scheduler is not security supported. Hence why this is sent to 
xen-devel directly.


Also, for completeness, debug builds are also not security supported. On 
a production build, the ASSERT() would be turned into a NOP, which could 
result in potentially more interesting issues. Anyway, that's not a 
problem here. :)
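
To illustrate the point about debug versus production builds, here is a minimal C sketch. The names DEMO_ASSERT, DEMO_DEBUG, demo_unit, demo_pcpu and demo_deassign are hypothetical and only mimic the shape of the check quoted in the trace; this is not Xen's ASSERT() macro or its scheduler code. With DEMO_DEBUG undefined the check compiles to nothing, so the function carries on even when the invariant does not hold.

```c
#include <assert.h>   /* standard assert, used only for argument sanity */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hypervisor-style ASSERT(). */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT(cond)                                          \
    do {                                                           \
        if (!(cond)) {                                             \
            fprintf(stderr, "Assertion '%s' failed at %s:%d\n",    \
                    #cond, __FILE__, __LINE__);                    \
            abort();                                               \
        }                                                          \
    } while (0)
#else
/* Production-style build: the check disappears entirely. */
#define DEMO_ASSERT(cond) do { } while (0)
#endif

struct demo_unit { int id; };
struct demo_pcpu { struct demo_unit *unit; };

/*
 * Detach 'unit' from 'npc'. Returns 1 if the pCPU really held that unit,
 * 0 if the invariant was already broken and the code ran on regardless.
 */
int demo_deassign(struct demo_pcpu *npc, struct demo_unit *unit)
{
    int consistent;

    assert(npc != NULL);
    DEMO_ASSERT(npc->unit == unit);

    consistent = (npc->unit == unit);
    npc->unit = NULL;   /* executed even when the invariant is violated */
    return consistent;
}
```

Built with -DDEMO_DEBUG, a mismatching call aborts with a message similar in shape to the one in the log; built without it, the same call returns 0 and silently clears the pCPU, which is the kind of "more interesting" behaviour mentioned above.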




  * Install XEN; only 4.15+ seem to be vulnerable
  * Use the null scheduler (depends on your setup): edit
/etc/default/grub adding at the end of the file:
GRUB_CMDLINE_XEN="sched=null" and update grub
  * Reboot into xen
  * Create a file guest.cfg with the following contents

name="guest"
builder="hvm"
memory=512

serial = [ 'file:/tmp/log', 'pty' ]

disk = [ '/home/user/boot.iso,,hdc,cdrom' ]

on_reboot = "destroy"

vcpus=1


Make sure that the file /home/user/boot.iso does not exist

  * Create a guest with this configuration: xl create -c guest.cfg

CC’

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space





On 09/08/2021 18:14, Oleksandr wrote:


On 09.08.21 17:51, Julien Grall wrote:
Hi Julien.


Hi Oleksandr,

I am writing down here what we discussed on another thread and on IRC. 
This will be easier to track in a single thread.


On 04/08/2021 23:00, Julien Grall wrote:

On 04/08/2021 21:56, Oleksandr wrote:
Now, I am wondering, would it be possible to update/clarify the 
current "reg" purpose and use it to pass a safe unallocated space 
for any Xen specific mappings (grant, foreign, whatever) instead of 
just for the grant table region. In case, it is not allowed for any 
reason (compatibility PoV, etc), would it be possible to extend a 
property by passing an extra range separately, something similar to 
how I described above?


I think it should be fine to re-use the same region so long as the size 
of the first bank is at least the size of the original region.


While answering the DT binding question on the DT ML, I realized 
that this is probably not going to be fine because there is a bug in 
Xen when mapping a grant-table frame.


The function gnttab_map_frame() is used to map the grant table frame. 
If there is an old mapping, it will first remove it.


The function is using the helper gnttab_get_frame_gfn() to find the 
corresponding GFN or return INVALID_GFN if not mapped.


On Arm, gnttab_get_frame_gfn() is implemented using an array indexed by the 
grant table frame number. The trouble is we don't update the array 
when the page is unmapped. So if the GFN is re-used before the 
grant-table is remapped, then we will end up removing whatever was 
mapped there (this could be a foreign page...).


This behavior already happens today as the toolstack will use the 
first GFN of the region if Linux doesn't support the acquire resource 
interface. We are getting away with it on Linux because the toolstack only 
maps the first grant table frame and:
 - Newer Linux will not use the region provided by the DT and nothing 
will be mapped there.
 - Older Linux will use the region but still map the grant table frame 
0 to the same GFN.


I am not sure about U-boot and other OSes here.

This is not new but it is going to become a bigger source of 
problems (read: more chances to hit it) as we try to re-use the first 
region.


This means the first region should be used exclusively for the 
grant-table (in a specific order) until the issue is properly fixed.


Thank you for the explanation, it is clear now.





A potential fix is to update the array in p2m_put_l3_page(). The 
default max size of the array is 1024, so it might be fine to just 
walk it (it would be simply a comparison).


I think, this would work. Looks like we don't need to walk for each gfn 
which is being freed, we could just filter it by p2m_is_ram() ...


Well. This would still potentially result in a few unnecessary walks. I 
would consider introducing a new P2M type or possibly adding a check if 
the page is in xenheap (grant-table pages are xenheap pages so far).
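
The stale-array problem and the proposed "walk the array on unmap" fix can be sketched with a toy model in C. Everything here (demo_gnttab, demo_map_frame, demo_forget_gfn, DEMO_INVALID_GFN, the 4-entry array) is hypothetical and chosen for illustration; it is not the Xen implementation, and it ignores locking, P2M types and the xenheap check discussed above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INVALID_GFN UINT64_MAX
#define DEMO_NR_FRAMES   4u   /* toy size; the thread mentions a default max of 1024 */

/* Per-domain array indexed by grant-table frame number, holding the GFN. */
struct demo_gnttab {
    uint64_t shared_gfn[DEMO_NR_FRAMES];
};

void demo_gnttab_init(struct demo_gnttab *gt)
{
    for (unsigned int i = 0; i < DEMO_NR_FRAMES; i++)
        gt->shared_gfn[i] = DEMO_INVALID_GFN;
}

/*
 * Record that frame 'idx' is now mapped at 'gfn'. Returns the previously
 * recorded GFN, i.e. the mapping a caller would remove first. If that
 * record is stale, the caller would remove an unrelated mapping.
 */
uint64_t demo_map_frame(struct demo_gnttab *gt, unsigned int idx, uint64_t gfn)
{
    uint64_t old;

    assert(idx < DEMO_NR_FRAMES);
    old = gt->shared_gfn[idx];
    gt->shared_gfn[idx] = gfn;
    return old;
}

/*
 * Sketch of the proposed fix: when a page is unmapped from 'gfn', walk the
 * array and invalidate any entry that still points at it.
 */
void demo_forget_gfn(struct demo_gnttab *gt, uint64_t gfn)
{
    for (unsigned int i = 0; i < DEMO_NR_FRAMES; i++)
        if (gt->shared_gfn[i] == gfn)
            gt->shared_gfn[i] = DEMO_INVALID_GFN;
}
```

Without a call to demo_forget_gfn() at unmap time, a second demo_map_frame() for the same index still reports the old GFN, even if the guest has since put something else there. With the call, the second mapping sees DEMO_INVALID_GFN and nothing is removed by mistake.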


Cheers,

--
Julien Grall

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space




On 09.08.21 17:51, Julien Grall wrote:

Hi,



Hi Julien.




I am writing down here what we discussed on another thread and on IRC. 
This will be easier to track in a single thread.


On 04/08/2021 23:00, Julien Grall wrote:

On 04/08/2021 21:56, Oleksandr wrote:
Now, I am wondering, would it be possible to update/clarify the 
current "reg" purpose and use it to pass a safe unallocated space 
for any Xen specific mappings (grant, foreign, whatever) instead of 
just for the grant table region. In case, it is not allowed for any 
reason (compatibility PoV, etc), would it be possible to extend a 
property by passing an extra range separately, something similar to 
how I described above?


I think it should be fine to re-use the same region so long as the size 
of the first bank is at least the size of the original region.


While answering the DT binding question on the DT ML, I realized 
that this is probably not going to be fine because there is a bug in 
Xen when mapping a grant-table frame.


The function gnttab_map_frame() is used to map the grant table frame. 
If there is an old mapping, it will first remove it.


The function is using the helper gnttab_get_frame_gfn() to find the 
corresponding GFN or return INVALID_GFN if not mapped.


On Arm, gnttab_get_frame_gfn() is implemented using an array indexed by the 
grant table frame number. The trouble is we don't update the array 
when the page is unmapped. So if the GFN is re-used before the 
grant-table is remapped, then we will end up removing whatever was 
mapped there (this could be a foreign page...).


This behavior already happens today as the toolstack will use the 
first GFN of the region if Linux doesn't support the acquire resource 
interface. We are getting away with it on Linux because the toolstack only 
maps the first grant table frame and:
 - Newer Linux will not use the region provided by the DT and nothing 
will be mapped there.
 - Older Linux will use the region but still map the grant table frame 
0 to the same GFN.


I am not sure about U-boot and other OSes here.

This is not new but it is going to become a bigger source of 
problems (read: more chances to hit it) as we try to re-use the first 
region.


This means the first region should be used exclusively for the 
grant-table (in a specific order) until the issue is properly fixed.


Thank you for the explanation, it is clear now.





A potential fix is to update the array in p2m_put_l3_page(). The 
default max size of the array is 1024, so it might be fine to just 
walk it (it would be simply a comparison).


I think, this would work. Looks like we don't need to walk for each gfn 
which is being freed, we could just filter it by p2m_is_ram() ...






Note that this is not a problem on x86 because it is using the M2P. 
So when a mapping is removed, the mapping MFN -> GFN will also be 
removed.


Cheers,


--
Regards,

Oleksandr Tyshchenko

Re: [PATCH v2 5/6] PCI: Adapt all code locations to not use struct pci_dev::driver directly

2021-08-09 Thread Ido Schimmel

On Tue, Aug 03, 2021 at 12:01:49PM +0200, Uwe Kleine-König wrote:
> This prepares removing the driver member of struct pci_dev which holds the
> same information than struct pci_dev::dev->driver.
> 
> Signed-off-by: Uwe Kleine-König 
> ---
>  arch/powerpc/include/asm/ppc-pci.h|  3 +-
>  arch/powerpc/kernel/eeh_driver.c  | 12 ---
>  arch/x86/events/intel/uncore.c|  2 +-
>  arch/x86/kernel/probe_roms.c  |  2 +-
>  drivers/bcma/host_pci.c   |  6 ++--
>  drivers/crypto/hisilicon/qm.c |  2 +-
>  drivers/crypto/qat/qat_common/adf_aer.c   |  2 +-
>  drivers/message/fusion/mptbase.c  |  4 +--
>  drivers/misc/cxl/guest.c  | 21 +--
>  drivers/misc/cxl/pci.c| 25 +++--
>  .../ethernet/hisilicon/hns3/hns3_ethtool.c|  2 +-
>  .../ethernet/marvell/prestera/prestera_pci.c  |  2 +-
>  drivers/net/ethernet/mellanox/mlxsw/pci.c |  2 +-
>  .../ethernet/netronome/nfp/nfp_net_ethtool.c  |  2 +-
>  drivers/pci/iov.c | 23 +++-
>  drivers/pci/pci-driver.c  | 28 ---
>  drivers/pci/pci.c | 10 +++---
>  drivers/pci/pcie/err.c| 35 ++-
>  drivers/pci/xen-pcifront.c|  3 +-
>  drivers/ssb/pcihost_wrapper.c |  7 ++--
>  drivers/usb/host/xhci-pci.c   |  3 +-
>  21 files changed, 112 insertions(+), 84 deletions(-)

For mlxsw:

Tested-by: Ido Schimmel

Re: NULL scheduler DoS

2021-08-09 Thread Ahmed, Daniele

CC’ing jul...@xen.org

From: "Ahmed, Daniele" 
Date: Monday, 9 August 2021 at 17:19
To: "xen-devel@lists.xenproject.org" 
Cc: Dario Faggioli , Stefano Stabellini 
, "Grall, Julien" , "Doebel, 
Bjoern" , "Pohlack, Martin" 
Subject: NULL scheduler DoS

Hi all,
The NULL scheduler is affected by an issue that triggers an assertion and 
reboots the hypervisor.

This issue arises when:

  *   a guest is being created with a configuration specifying a file that does 
not exist
  *   the hypervisor boots with the null scheduler

4.16 is affected and 4.15 also.

This is the stack trace from 4.16:

(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) [ Xen-4.16-unstable x86_64 debug=y Not tainted ]
(XEN) CPU: 3
(XEN) RIP: e008:[] common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) RFLAGS: 00010006 CONTEXT: hypervisor
(XEN) rax: 83005ce1c850 rbx: 0001 rcx: 0001
(XEN) rdx: 83007fde6fc0 rsi: 83005ce1c790 rdi: 83007ffb7850
(XEN) rbp: 83007ffdfda0 rsp: 83007ffdfd48 r8: 
(XEN) r9: 00048fee r10:  r11: 
(XEN) r12: 82d0405c9298 r13: 83007f7fd508 r14: 83005ce1c850
(XEN) r15: 82d0405e2680 cr0: 8005003b cr4: 003526e0
(XEN) cr3: 7f6b3000 cr2: 888072e79dc0
(XEN) fsb:  gsb: 888071ac gss: 
(XEN) ds: 002b es: 002b fs:  gs:  ss: e010 cs: e008
(XEN) Xen code around (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
(XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f b7 00 66
(XEN) Xen stack trace from rsp=83007ffdfd48:
(XEN) 83007ffdfd88 82d04023961c 0004 83005ce1cc50
(XEN) 0002 83007ffdfd90 83005ce1c790 82d0405c9298
(XEN) 83007f7fd508 83005ce1c850 82d0405e2680 83007ffdfde0
(XEN) 82d04024f889 83007ffb7850 83005dd63000 83005ce1c790
(XEN) 83005845ab28 83005845a000  83007ffdfe00
(XEN) 82d040253326 83005dd63000  83007ffdfe38
(XEN) 82d04020506b 83007a881080  
(XEN)  82d0405d6f80 83007ffdfe70 82d04022d9e5
(XEN) 00110003 82d0405cf100 82d0405cf100 
(XEN) 82d0405cef80 83007ffdfea8 82d04022e14b 0003
(XEN) 82d0405cf100 7fff 0003 0003
(XEN) 83007ffdfeb8 82d04022e1e6 83007ffdfef0 82d0403172b4
(XEN) 82d04031721d 83007fec1000 83007ffb6000 0003
(XEN) 83007ffcc000 83007ffdfe18  
(XEN)   0003 0003
(XEN) 0246 0003  1bf9dde5
(XEN)  810023aa 0003 deadbeefdeadf00d
(XEN) deadbeefdeadf00d 0100 810023aa e033
(XEN) 0246 c900400a3ea8 e02b 7ffdff707fffd140
(XEN) 00017fe37a6c 7ffe8010  e013
(XEN) Xen call trace:
(XEN) [] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) [] F common/sched/null.c#null_unit_remove+0xfc/0x136
(XEN) [] F sched_destroy_vcpu+0xca/0x199
(XEN) [] F common/domain.c#complete_domain_destroy+0x68/0x13f
(XEN) [] F common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
(XEN) [] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN) [] F do_softirq+0x13/0x15
(XEN) [] F arch/x86/domain.c#idle_loop+0x97/0xee
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 3:
(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

This is the line of the assertion that triggers the reboot: 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377

To reproduce the vulnerability, I took the following steps:

  *   Install XEN; only 4.15+ seem to be vulnerable
  *   Use the null scheduler (depends on your setup): edit /etc/default/grub 
adding at the end of the file: GRUB_CMDLINE_XEN="sched=null" and update grub
  *   Reboot into xen
  *   Create a file guest.cfg with the following contents
name="guest"
builder="hvm"
memory=512

serial = [ 'file:/tmp/log', 'pty' ]

disk = [ '/home/user/boot.iso,,hdc,cdrom' ]

on_reboot = "destroy"

vcpus=1

Make sure that the file /home/user/boot.iso does not exist

  *   Create a guest with this configuration: xl create -c guest.cfg

CC’ing Dario, Stefano and Julien to whom I’ve shown this.

Thanks

Daniele

[PATCH] PCI: Fix general code style

2021-08-09 Thread Sergio Miguéns Iglesias

The code style for most files was fixed. This means that blank lines
were added when needed (normally after variable declarations), spaces
before tabs were removed, some code alignment issues were solved, block
comment style was fixed, every instance of "unsigned var" was replaced
with "unsigned int var"... Etc.

This commit does not change the logic of the code, it just fixes
aesthetic problems.

Signed-off-by: Sergio Miguéns Iglesias 
---
 drivers/pci/access.c   | 22 +-
 drivers/pci/bus.c  |  3 ++-
 drivers/pci/msi.c  | 12 +++-
 drivers/pci/pci-acpi.c |  3 ++-
 drivers/pci/pci-driver.c   | 19 +--
 drivers/pci/pci-sysfs.c| 14 --
 drivers/pci/pci.c  | 16 
 drivers/pci/proc.c | 15 +++
 drivers/pci/quirks.c   | 35 ---
 drivers/pci/remove.c   |  1 +
 drivers/pci/rom.c  |  2 +-
 drivers/pci/setup-bus.c|  5 -
 drivers/pci/setup-irq.c| 12 +++-
 drivers/pci/setup-res.c|  2 +-
 drivers/pci/slot.c |  5 -
 drivers/pci/syscall.c  |  5 +++--
 drivers/pci/xen-pcifront.c | 20 
 17 files changed, 133 insertions(+), 58 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 46935695cfb9..4f8d04a0ac1d 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -33,13 +33,15 @@ DEFINE_RAW_SPINLOCK(pci_lock);
 #endif
 
 #define PCI_OP_READ(size, type, len) \
-int noinline pci_bus_read_config_##size \
+noinline int pci_bus_read_config_##size \
(struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
 {  \
int res;\
unsigned long flags;\
+   \
u32 data = 0;   \
-   if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;   \
+   if (PCI_##size##_BAD)   \
+   return PCIBIOS_BAD_REGISTER_NUMBER; \
pci_lock_config(flags); \
res = bus->ops->read(bus, devfn, pos, len, &data);  \
*value = (type)data;\
@@ -48,12 +50,14 @@ int noinline pci_bus_read_config_##size \
 }
 
 #define PCI_OP_WRITE(size, type, len) \
-int noinline pci_bus_write_config_##size \
+noinline int pci_bus_write_config_##size \
(struct pci_bus *bus, unsigned int devfn, int pos, type value)  \
 {  \
int res;\
unsigned long flags;\
-   if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;   \
+   \
+   if (PCI_##size##_BAD)   \
+   return PCIBIOS_BAD_REGISTER_NUMBER; \
pci_lock_config(flags); \
res = bus->ops->write(bus, devfn, pos, len, value); \
pci_unlock_config(flags);   \
@@ -214,7 +218,7 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
 }
 
 /* Returns 0 on success, negative values indicate error. */
-#define PCI_USER_READ_CONFIG(size, type)   \
+#define PCI_USER_READ_CONFIG(size, type)   \
 int pci_user_read_config_##size \
(struct pci_dev *dev, int pos, type *val)   \
 {  \
@@ -222,12 +226,12 @@ int pci_user_read_config_##size   \
u32 data = -1;  \
if (PCI_##size##_BAD)   \
return -EINVAL; \
-   raw_spin_lock_irq(&pci_lock);   \
+   raw_spin_lock_irq(&pci_lock);   \
if (unlikely(dev->block_cfg_access))\
pci_wait_cfg(dev);  \
ret = dev->bus->ops->read(dev->bus, dev->devfn, \
pos, sizeof(type), &data);  \
-   raw_spin_unlock_irq(&pci_lock); \
+   raw_spin_unlock_irq(&pci_lock); \
*val = (type)data;

Re: [PATCH v2 5/6] PCI: Adapt all code locations to not use struct pci_dev::driver directly

2021-08-09 Thread Andrew Donnellan


On 3/8/21 8:01 pm, Uwe Kleine-König wrote:

This prepares removing the driver member of struct pci_dev which holds the
same information as struct pci_dev::dev->driver.

Signed-off-by: Uwe Kleine-König 


cxl hunks look alright.

Acked-by: Andrew Donnellan  # cxl

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited

NULL scheduler DoS

2021-08-09 Thread Ahmed, Daniele

Hi all,
The NULL scheduler is affected by an issue that triggers an assertion and 
reboots the hypervisor.

This issue arises when:

  *   a guest is being created with a configuration specifying a file that does 
not exist
  *   the hypervisor boots with the null scheduler

4.16 is affected and 4.15 also.

This is the stack trace from 4.16:

(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) [ Xen-4.16-unstable x86_64 debug=y Not tainted ]
(XEN) CPU: 3
(XEN) RIP: e008:[] common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) RFLAGS: 00010006 CONTEXT: hypervisor
(XEN) rax: 83005ce1c850 rbx: 0001 rcx: 0001
(XEN) rdx: 83007fde6fc0 rsi: 83005ce1c790 rdi: 83007ffb7850
(XEN) rbp: 83007ffdfda0 rsp: 83007ffdfd48 r8: 
(XEN) r9: 00048fee r10:  r11: 
(XEN) r12: 82d0405c9298 r13: 83007f7fd508 r14: 83005ce1c850
(XEN) r15: 82d0405e2680 cr0: 8005003b cr4: 003526e0
(XEN) cr3: 7f6b3000 cr2: 888072e79dc0
(XEN) fsb:  gsb: 888071ac gss: 
(XEN) ds: 002b es: 002b fs:  gs:  ss: e010 cs: e008
(XEN) Xen code around (common/sched/null.c#unit_deassign+0x1c3/0x2ec):
(XEN) 41 5e 41 5f 5d c3 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 49 8b 04 24 0f b7 00 66
(XEN) Xen stack trace from rsp=83007ffdfd48:
(XEN) 83007ffdfd88 82d04023961c 0004 83005ce1cc50
(XEN) 0002 83007ffdfd90 83005ce1c790 82d0405c9298
(XEN) 83007f7fd508 83005ce1c850 82d0405e2680 83007ffdfde0
(XEN) 82d04024f889 83007ffb7850 83005dd63000 83005ce1c790
(XEN) 83005845ab28 83005845a000  83007ffdfe00
(XEN) 82d040253326 83005dd63000  83007ffdfe38
(XEN) 82d04020506b 83007a881080  
(XEN)  82d0405d6f80 83007ffdfe70 82d04022d9e5
(XEN) 00110003 82d0405cf100 82d0405cf100 
(XEN) 82d0405cef80 83007ffdfea8 82d04022e14b 0003
(XEN) 82d0405cf100 7fff 0003 0003
(XEN) 83007ffdfeb8 82d04022e1e6 83007ffdfef0 82d0403172b4
(XEN) 82d04031721d 83007fec1000 83007ffb6000 0003
(XEN) 83007ffcc000 83007ffdfe18  
(XEN)   0003 0003
(XEN) 0246 0003  1bf9dde5
(XEN)  810023aa 0003 deadbeefdeadf00d
(XEN) deadbeefdeadf00d 0100 810023aa e033
(XEN) 0246 c900400a3ea8 e02b 7ffdff707fffd140
(XEN) 00017fe37a6c 7ffe8010  e013
(XEN) Xen call trace:
(XEN) [] R common/sched/null.c#unit_deassign+0x1c3/0x2ec
(XEN) [] F common/sched/null.c#null_unit_remove+0xfc/0x136
(XEN) [] F sched_destroy_vcpu+0xca/0x199
(XEN) [] F common/domain.c#complete_domain_destroy+0x68/0x13f
(XEN) [] F common/rcupdate.c#rcu_process_callbacks+0xdb/0x24b
(XEN) [] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN) [] F do_softirq+0x13/0x15
(XEN) [] F arch/x86/domain.c#idle_loop+0x97/0xee
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 3:
(XEN) Assertion 'npc->unit == unit' failed at null.c:377
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

This is the line of the assertion that triggers the reboot: 
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/common/sched/null.c;h=82d5d1baab853d24fcbb455fb3f3e8263c871277;hb=HEAD#l377

To reproduce the vulnerability, I took the following steps:

  *   Install XEN; only 4.15+ seem to be vulnerable
  *   Use the null scheduler (depends on your setup): edit /etc/default/grub 
adding at the end of the file: GRUB_CMDLINE_XEN="sched=null" and update grub
  *   Reboot into xen
  *   Create a file guest.cfg with the following contents
name="guest"
builder="hvm"
memory=512

serial = [ 'file:/tmp/log', 'pty' ]

disk = [ '/home/user/boot.iso,,hdc,cdrom' ]

on_reboot = "destroy"

vcpus=1

Make sure that the file /home/user/boot.iso does not exist

  *   Create a guest with this configuration: xl create -c guest.cfg

CC’ing Dario, Stefano and Julien to whom I’ve shown this.

Thanks

Daniele

Re: [XEN PATCH v6 25/31] build: remove unneeded deps of x86_emulate.o

On Fri, Aug 06, 2021 at 06:06:37PM +0200, Jan Beulich wrote:
> On 01.07.2021 16:10, Anthony PERARD wrote:
> > Those two dependencies already exist so make doesn't need to know
> > about them. The dependency will be generated by $(CC).
> > 
> > Signed-off-by: Anthony PERARD 
> 
> Unless I'm mistaken this was actually an omission by 68b1230ae393
> ("Auto-build dependency files in hypervisor build tree"), which
> would again suggest this can go in independently of all of the
> earlier patches? In any event

That sounds right. Yes.

> Reviewed-by: Jan Beulich 

Thanks,

-- 
Anthony PERARD

Re: [XEN PATCH v6 21/31] build: set XEN_BUILD_EFI earlier

On Thu, Aug 05, 2021 at 09:27:18AM +0200, Jan Beulich wrote:
> On 01.07.2021 16:10, Anthony PERARD wrote:
> > We are going to need the variable XEN_BUILD_EFI earlier.
> > 
> > This early check is using "try-run" to allow to have a temporary
> > output file in case it is needed for $(CC) to build the *.c file.
> > 
> > The "efi/check.o" file is still needed in "arch/x86/Makefile" so the
> > check is currently duplicated.
> 
> Why is this? Can't you ...
> 
> > --- a/xen/arch/x86/Makefile
> > +++ b/xen/arch/x86/Makefile
> > @@ -126,7 +126,7 @@ $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
> >  ifneq ($(efi-y),)
> >  
> >  # Check if the compiler supports the MS ABI.
> > -export XEN_BUILD_EFI := $(shell $(CC) $(XEN_CFLAGS) -c efi/check.c -o 
> > efi/check.o 2>/dev/null && echo y)
> > +XEN_BUILD_EFI := $(shell $(CC) $(XEN_CFLAGS) -c efi/check.c -o efi/check.o 
> > 2>/dev/null && echo y)
> >  CFLAGS-$(XEN_BUILD_EFI) += -DXEN_BUILD_EFI
> 
> ... use here what you ...
> 
> > --- a/xen/arch/x86/arch.mk
> > +++ b/xen/arch/x86/arch.mk
> > @@ -60,5 +60,10 @@ ifeq ($(CONFIG_UBSAN),y)
> >  $(call cc-option-add,CFLAGS_UBSAN,CC,-fno-sanitize=alignment)
> >  endif
> >  
> > +ifneq ($(CONFIG_PV_SHIM_EXCLUSIVE),y)
> > +# Check if the compiler supports the MS ABI.
> > +export XEN_BUILD_EFI := $(call try-run,$(CC) $(CFLAGS) -c 
> > arch/x86/efi/check.c -o "$$TMPO",y)
> > +endif
> 
> ... export here?

The problem with the check for EFI support is that there are several steps,
with each step depending on the binary produced by the previous one.

XEN_BUILD_EFI
    In addition to checking that the "__ms_abi__" attribute is supported by
    $CC, the file "efi/check.o" is produced.
XEN_BUILD_PE
    It uses "efi/check.o" to check for PE support and produces
    "efi/check.efi".
    "efi/check.efi" is also used by the Makefile for additional checks
    (mkreloc).

So, the reason I left the duplicated check for $(XEN_BUILD_EFI) is that it felt
wrong to produce "efi/check.o" in "arch/x86/arch.mk" and then later use
it in "arch/x86/Makefile". I could maybe move the command that creates
efi/check.o into the $(XEN_BUILD_PE) check, or I could try to move most of
the checks done for EFI into x86/arch.mk. Or maybe just create the
"efi/check.o" file in x86/arch.mk and use it in x86/Makefile, with a
comment.

What do you think?

Thanks,

-- 
Anthony PERARD

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space

Hi Oleksandr,

On 07/08/2021 18:03, Oleksandr wrote:

On 06.08.21 03:30, Stefano Stabellini wrote:

Hi Stefano

On Wed, 4 Aug 2021, Julien Grall wrote:
+#define GUEST_SAFE_RANGE_BASE xen_mk_ullong(0xDE) /* 128GB */

+#define GUEST_SAFE_RANGE_SIZE xen_mk_ullong(0x02)

While the possible new DT bindings have not been agreed yet, I re-used the
existing "reg" property under the hypervisor node to pass the safe range as a
second region,
https://elixir.bootlin.com/linux/v5.14-rc4/source/Documentation/devicetree/bindings/arm/xen.txt#L10:

So a single region works for a guest today, but for dom0 we will need multiple
regions because it may be difficult to find enough contiguous space for a
single region.

That said, as dom0 is mapped 1:1 (including some guest mapping), there is also
the question of where to allocate the safe region. For the grant table, we so
far re-use the Xen address space because it is assumed it will always
be bigger than the grant table.

I am not sure yet where we could allocate the safe regions. Stefano, do you
have any ideas?

The safest choice would be the address range corresponding to memory
(/memory) not already allocated to Dom0.

For instance from my last boot logs:
(XEN) Allocating 1:1 mappings totalling 1600MB for dom0:
(XEN) BANK[0] 0x001000-0x007000 (1536MB)
(XEN) BANK[1] 0x007800-0x007c00 (64MB)

All the other ranges could be given as unallocated space:

- 0x0 - 0x1000
- 0x7000 - 0x7800
- 0x8__ - 0x8_8000_
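
As a rough illustration of the idea of handing out the RAM not allocated to Dom0, the following C sketch computes the gaps between sorted, non-overlapping banks inside a window. The names demo_range and demo_find_gaps are hypothetical, and any addresses fed to it are the caller's assumptions; this is not code from Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct demo_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Collect the parts of [lo, hi) not covered by 'banks'. The banks must be
 * sorted by start address and must not overlap. At most 'max_gaps' entries
 * are written to 'gaps'; the number written is returned.
 */
size_t demo_find_gaps(uint64_t lo, uint64_t hi,
                      const struct demo_range *banks, size_t nr_banks,
                      struct demo_range *gaps, size_t max_gaps)
{
    size_t n = 0;
    uint64_t cursor = lo;

    assert(banks != NULL && gaps != NULL);

    for (size_t i = 0; i < nr_banks; i++) {
        /* Anything between the cursor and this bank is unallocated. */
        if (banks[i].start > cursor && n < max_gaps) {
            gaps[n].start = cursor;
            gaps[n].end = banks[i].start;
            n++;
        }
        if (banks[i].end > cursor)
            cursor = banks[i].end;
    }

    /* Trailing gap after the last bank, if any. */
    if (cursor < hi && n < max_gaps) {
        gaps[n].start = cursor;
        gaps[n].end = hi;
        n++;
    }
    return n;
}
```

For example, with two assumed banks of 1536MB and 64MB starting at 0x10000000 and 0x78000000, and a window of [0, 0x80000000), the function reports three gaps: below the first bank, between the two banks, and above the second one.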

Thank you for the ideas.

If I got the idea correctly, yes, as these ranges represent the real
RAM, no I/O would be in conflict with them and, as a result, no
overlaps would be expected.
But, I wonder, would this work if we have IOMMU enabled for Dom0 and
need to establish 1:1 mapping for the DMA devices to work with grant
mappings...
In arm_iommu_map_page() we call guest_physmap_add_entry() with gfn =
mfn, so the question is could we end up with this new gfn replacing the
valid mapping (with gfn allocated from the safe region)?

Right, when we enable the IOMMU for dom0, Xen will add an extra mapping
with GFN == MFN for foreign and grant pages. This is because Linux is
not aware whether a device is protected by an IOMMU. Therefore it
assumes it is not and will use the MFN to configure DMA transactions.

We can't remove the mapping without significant changes in Linux and
Xen. I would not mandate them for this work.

That said, I think it would be acceptable to have different ways to find
the region depending on the dom0 configuration. So we could use the RAM
not used by dom0 when the IOMMU is turned off.

The second best choice would be a hole: an address range not used by
anybody else (no reg property) and also not even mappable by a bus (not
covered by a ranges property). This is not the best choice because there
can be cases where physical resources appear afterwards.

Are you saying that the original device-tree doesn't even describe them
in any way (i.e. reserved...)?

Unfortunately, yes.

So the decision on where the safe region is located will be made by Xen.
There is no involvement of the domain (it will discover the region from
the DT). Therefore, I don't think we need to think about everything
right now, as we could adapt this later: the exact region is not part of the
stable ABI.

The hotplug is one I would defer because this is not supported (and
quite likely not working) in Xen upstream today.

Now regarding the case where dom0 is using the IOMMU. The assumption is
Xen will be able to figure out all the regions used from the firmware
table (ACPI or DT).

AFAIK, this assumption would be correct for DT. However, for ACPI, I
remember we were not able to find all the MMIO regions in Xen (see [1]
and [2]). So even this solution would not work for ACPI.

If I am not mistaken, we don't support IOMMU with ACPI yet. So we could
defer the problem to when this is going to be supported.

Cheers,

[1] https://marc.info/?l=linux-arm-kernel&m=148469169210500&w=2
[2] Xen commit 80f9c316708400cea4417e36337267d3b26591db

--
Julien Grall

Re: [PATCH v3 00/21] .map_sg() error cleanup

2021-08-09 Thread Christoph Hellwig

Thanks,

I've applied this to the dma-mapping tree with a few minor cosmetic
tweaks.

[PATCH 4/4] libs/gnttab: Use XEN_PAGE_* definitions

These changes refine the changes in d1b32abd which added a dependency to
xenctrl library. We use the XEN_PAGE_* definitions instead of the XC_PAGE_*
definitions and therefore we get rid of the unnecessary dependency.

Signed-off-by: Costin Lupu 
---
 tools/libs/gnttab/freebsd.c | 20 ++--
 tools/libs/gnttab/linux.c   | 20 ++--
 tools/libs/gnttab/netbsd.c  | 20 ++--
 3 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/tools/libs/gnttab/freebsd.c b/tools/libs/gnttab/freebsd.c
index e42ac3fbf3..7ecb0e3b38 100644
--- a/tools/libs/gnttab/freebsd.c
+++ b/tools/libs/gnttab/freebsd.c
@@ -28,9 +28,9 @@
 #include 
 #include 
 
+#include 
 #include 
 
-#include 
 #include 
 
 #include "private.h"
@@ -74,7 +74,7 @@ void *osdep_gnttab_grant_map(xengnttab_handle *xgt,
 int domids_stride;
 unsigned int refs_size = ROUNDUP(count *
  sizeof(struct ioctl_gntdev_grant_ref),
- XC_PAGE_SHIFT);
+ XEN_PAGE_SHIFT);
 int os_page_size = getpagesize();
 
 domids_stride = (flags & XENGNTTAB_GRANT_MAP_SINGLE_DOMAIN) ? 0 : 1;
@@ -105,7 +105,7 @@ void *osdep_gnttab_grant_map(xengnttab_handle *xgt,
 goto out;
 }
 
-addr = mmap(NULL, XC_PAGE_SIZE * count, prot, MAP_SHARED, fd,
+addr = mmap(NULL, XEN_PAGE_SIZE * count, prot, MAP_SHARED, fd,
 map.index);
 if ( addr != MAP_FAILED )
 {
@@ -114,7 +114,7 @@ void *osdep_gnttab_grant_map(xengnttab_handle *xgt,
 
 notify.index = map.index;
 notify.action = 0;
-if ( notify_offset < XC_PAGE_SIZE * count )
+if ( notify_offset < XEN_PAGE_SIZE * count )
 {
 notify.index += notify_offset;
 notify.action |= UNMAP_NOTIFY_CLEAR_BYTE;
@@ -129,7 +129,7 @@ void *osdep_gnttab_grant_map(xengnttab_handle *xgt,
 if ( rv )
 {
 GTERROR(xgt->logger, "ioctl SET_UNMAP_NOTIFY failed");
-munmap(addr, count * XC_PAGE_SIZE);
+munmap(addr, count * XEN_PAGE_SIZE);
 addr = MAP_FAILED;
 }
 }
@@ -187,7 +187,7 @@ int osdep_gnttab_unmap(xengnttab_handle *xgt,
 }
 
 /* Next, unmap the memory. */
-if ( (rc = munmap(start_address, count * XC_PAGE_SIZE)) )
+if ( (rc = munmap(start_address, count * XEN_PAGE_SIZE)) )
 return rc;
 
 /* Finally, unmap the driver slots used to store the grant information. */
@@ -254,7 +254,7 @@ void *osdep_gntshr_share_pages(xengntshr_handle *xgs,
 goto out;
 }
 
-area = mmap(NULL, count * XC_PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
+area = mmap(NULL, count * XEN_PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
 fd, gref_info.index);
 
 if ( area == MAP_FAILED )
@@ -266,7 +266,7 @@ void *osdep_gntshr_share_pages(xengntshr_handle *xgs,
 
 notify.index = gref_info.index;
 notify.action = 0;
-if ( notify_offset < XC_PAGE_SIZE * count )
+if ( notify_offset < XEN_PAGE_SIZE * count )
 {
 notify.index += notify_offset;
 notify.action |= UNMAP_NOTIFY_CLEAR_BYTE;
@@ -281,7 +281,7 @@ void *osdep_gntshr_share_pages(xengntshr_handle *xgs,
 if ( err )
 {
 GSERROR(xgs->logger, "ioctl SET_UNMAP_NOTIFY failed");
-munmap(area, count * XC_PAGE_SIZE);
+munmap(area, count * XEN_PAGE_SIZE);
 area = NULL;
 }
 
@@ -304,7 +304,7 @@ void *osdep_gntshr_share_pages(xengntshr_handle *xgs,
 int osdep_gntshr_unshare(xengntshr_handle *xgs,
  void *start_address, uint32_t count)
 {
-return munmap(start_address, count * XC_PAGE_SIZE);
+return munmap(start_address, count * XEN_PAGE_SIZE);
 }
 
 /*
diff --git a/tools/libs/gnttab/linux.c b/tools/libs/gnttab/linux.c
index 5628fd5719..11f1acb771 100644
--- a/tools/libs/gnttab/linux.c
+++ b/tools/libs/gnttab/linux.c
@@ -29,10 +29,10 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
-#include 
 #include 
 
 #include "private.h"
@@ -101,7 +101,7 @@ void *osdep_gnttab_grant_map(xengnttab_handle *xgt,
 map = alloca(map_size);
 else
 {
-map_size = ROUNDUP(map_size, XC_PAGE_SHIFT);
+map_size = ROUNDUP(map_size, XEN_PAGE_SHIFT);
 map = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_POPULATE, -1, 0);
 if ( map == MAP_FAILED )
@@ -125,7 +125,7 @@ void *osdep_gnttab_grant_map(xengnttab_handle *xgt,
 }
 
  retry:
-addr = mmap(NULL, XC_PAGE_SIZE * count, prot, MAP_SHARED, fd,
+addr = mmap(NULL, XEN_PAGE_SIZE * count, prot, MAP_SHARED, fd,
 map->index);
 
 if (addr == MAP_FAILED && errno == EAGAIN)
@@ -150,7 +150,7 @@ void *osdep_gnttab_grant_map(xengnttab_handle *xgt,
 struct ioctl_gntdev_unmap_notify notify;
 notify.index = map->index;
 notify.action = 0;
-if (notify_offset < XC_PAGE_SIZE * co

[PATCH 3/4] libs/foreignmemory: Use XEN_PAGE_* definitions

These changes refine the changes in 0dbb4be7 which added a dependency to
xenctrl library. We use the XEN_PAGE_* definitions instead of the XC_PAGE_*
definitions and therefore we get rid of the unnecessary dependency.

Signed-off-by: Costin Lupu 
---
 tools/libs/foreignmemory/core.c|  2 +-
 tools/libs/foreignmemory/freebsd.c | 10 +-
 tools/libs/foreignmemory/linux.c   | 18 +-
 tools/libs/foreignmemory/minios.c  | 10 +-
 tools/libs/foreignmemory/netbsd.c  | 10 +-
 tools/libs/foreignmemory/private.h |  2 +-
 tools/libs/foreignmemory/solaris.c |  6 +++---
 7 files changed, 25 insertions(+), 33 deletions(-)

diff --git a/tools/libs/foreignmemory/core.c b/tools/libs/foreignmemory/core.c
index 7edc6f0dbf..ad1ad9fc67 100644
--- a/tools/libs/foreignmemory/core.c
+++ b/tools/libs/foreignmemory/core.c
@@ -202,7 +202,7 @@ int xenforeignmemory_resource_size(
 if ( rc )
 return rc;
 
-*size = fres.nr_frames << XC_PAGE_SHIFT;
+*size = fres.nr_frames << XEN_PAGE_SHIFT;
 return 0;
 }
 
diff --git a/tools/libs/foreignmemory/freebsd.c b/tools/libs/foreignmemory/freebsd.c
index 2cf0fa1c38..9439c4ca6a 100644
--- a/tools/libs/foreignmemory/freebsd.c
+++ b/tools/libs/foreignmemory/freebsd.c
@@ -63,7 +63,7 @@ void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
 privcmd_mmapbatch_t ioctlx;
 int rc;
 
-addr = mmap(addr, num << XC_PAGE_SHIFT, prot, flags | MAP_SHARED, fd, 0);
+addr = mmap(addr, num << XEN_PAGE_SHIFT, prot, flags | MAP_SHARED, fd, 0);
 if ( addr == MAP_FAILED )
 return NULL;
 
@@ -78,7 +78,7 @@ void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
 {
 int saved_errno = errno;
 
-(void)munmap(addr, num << XC_PAGE_SHIFT);
+(void)munmap(addr, num << XEN_PAGE_SHIFT);
 errno = saved_errno;
 return NULL;
 }
@@ -89,7 +89,7 @@ void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
 int osdep_xenforeignmemory_unmap(xenforeignmemory_handle *fmem,
  void *addr, size_t num)
 {
-return munmap(addr, num << XC_PAGE_SHIFT);
+return munmap(addr, num << XEN_PAGE_SHIFT);
 }
 
 int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
@@ -101,7 +101,7 @@ int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
 int osdep_xenforeignmemory_unmap_resource(xenforeignmemory_handle *fmem,
 xenforeignmemory_resource_handle *fres)
 {
-return fres ? munmap(fres->addr, fres->nr_frames << XC_PAGE_SHIFT) : 0;
+return fres ? munmap(fres->addr, fres->nr_frames << XEN_PAGE_SHIFT) : 0;
 }
 
 int osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
@@ -120,7 +120,7 @@ int osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
 /* Request for resource size.  Skip mmap(). */
 goto skip_mmap;
 
-fres->addr = mmap(fres->addr, fres->nr_frames << XC_PAGE_SHIFT,
+fres->addr = mmap(fres->addr, fres->nr_frames << XEN_PAGE_SHIFT,
   fres->prot, fres->flags | MAP_SHARED, fmem->fd, 0);
 if ( fres->addr == MAP_FAILED )
 return -1;
diff --git a/tools/libs/foreignmemory/linux.c b/tools/libs/foreignmemory/linux.c
index 9062117407..9dabf28cae 100644
--- a/tools/libs/foreignmemory/linux.c
+++ b/tools/libs/foreignmemory/linux.c
@@ -134,7 +134,7 @@ static int retry_paged(int fd, uint32_t dom, void *addr,
 /* At least one gfn is still in paging state */
 ioctlx.num = 1;
 ioctlx.dom = dom;
-ioctlx.addr = (unsigned long)addr + (inr_frames << XC_PAGE_SHIFT) : 0;
+return fres ? munmap(fres->addr, fres->nr_frames << XEN_PAGE_SHIFT) : 0;
 }
 
 int osdep_xenforeignmemory_map_resource(
@@ -313,7 +313,7 @@ int osdep_xenforeignmemory_map_resource(
 /* Request for resource size.  Skip mmap(). */
 goto skip_mmap;
 
-fres->addr = mmap(fres->addr, fres->nr_frames << XC_PAGE_SHIFT,
+fres->addr = mmap(fres->addr, fres->nr_frames << XEN_PAGE_SHIFT,
   fres->prot, fres->flags | MAP_SHARED, fmem->fd, 0);
 if ( fres->addr == MAP_FAILED )
 return -1;
diff --git a/tools/libs/foreignmemory/minios.c b/tools/libs/foreignmemory/minios.c
index f2f4dfb2be..2454eb9af3 100644
--- a/tools/libs/foreignmemory/minios.c
+++ b/tools/libs/foreignmemory/minios.c
@@ -17,14 +17,6 @@
  * Copyright 2007-2008 Samuel Thibault .
  */
 
-/*
- * xenctrl.h currently defines __XEN_TOOLS__ which affects what is
- * exposed by Xen headers. As the define needs to be set consistently,
- * we want to include xenctrl.h before the mini-os headers (they include
- * public headers).
- */
-#include 
-
 #include 
 #include 
 #include 
@@ -63,7 +55,7 @@ void *osdep_xenforeignmemory_map(xenforeignmemory_handle *fmem,
 int osdep_xenforeignmemory_unmap(xenforeignmemory_handle *fmem,
  void *addr, size_t num)
 {
-return munmap(addr, n

Re: [RFC PATCH] xen/memory: Introduce a hypercall to provide unallocated space


Hi,

I am writing down here what we discussed on another thread and on IRC. 
This will be easier to track in a single thread.


On 04/08/2021 23:00, Julien Grall wrote:

On 04/08/2021 21:56, Oleksandr wrote:
Now, I am wondering, would it be possible to update/clarify the 
current "reg" purpose and use it to pass a safe unallocated space for 
any Xen specific mappings (grant, foreign, whatever) instead of just 
for the grant table region. In case, it is not allowed for any reason 
(compatibility PoV, etc), would it be possible to extend a property by 
passing an extra range separately, something similar to how I 
described above?


I think it should be fine to re-use the same region so long as the size of 
the first bank is at least the size of the original region.


While answering the DT binding question on the DT ML, I realized that 
this is probably not going to be fine because there is a bug in Xen when 
mapping a grant-table frame.


The function gnttab_map_frame() is used to map the grant table frame. If 
there is an old mapping, it will first remove it.


The function is using the helper gnttab_map_frame() to find the 
corresponding GFN or return INVALID_GFN if not mapped.


On Arm, gnttab_map_frame() is implemented using an array indexed by the 
grant table frame number. The trouble is we don't update the array when 
the page is unmapped. So if the GFN is re-used before the grant-table is 
remapped, then we will end up removing whatever was mapped there (this 
could be a foreign page...).


This behavior already happens today as the toolstack will use the first 
GFN of the region if Linux doesn't support the acquire resource 
interface. We get away with it on Linux because the toolstack only 
maps the first grant table frame and:
 - Newer Linux will not use the region provided by the DT and nothing 
will be mapped there.
 - Older Linux will use the region but still map the grant table frame 
0 to the same GFN.


I am not sure about U-boot and other OSes here.

This is not new but it is going to become a bigger source of problems 
(read: more chances to hit it) as we try to re-use the first region.


This means the first region should be used exclusively for the grant-table 
(in a specific order) until the issue is properly fixed.


A potential fix is to update the array in p2m_put_l3_page(). The default 
max size of the array is 1024, so it might be fine to just walk it (it 
would be simply a comparison).
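As a rough illustration of that idea, here is a minimal, self-contained sketch 
of "walk the array and forget the GFN when the page goes away". Everything in 
it (the sketch_* names, the struct layout, the sentinel value) is a 
hypothetical stand-in and not Xen's actual grant-table or P2M code; only the 
array size of 1024 comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT Xen's real types or names. */
#define SKETCH_INVALID_GFN  UINT64_MAX
#define SKETCH_MAX_FRAMES   1024   /* default max array size quoted above */

struct sketch_gnttab {
    /* GFN at which each grant table frame is mapped, indexed by frame number. */
    uint64_t frame_gfn[SKETCH_MAX_FRAMES];
};

/* Mark every grant table frame as "not mapped". */
static void sketch_gnttab_init(struct sketch_gnttab *gt)
{
    assert(gt != NULL);

    for ( size_t i = 0; i < SKETCH_MAX_FRAMES; i++ )
        gt->frame_gfn[i] = SKETCH_INVALID_GFN;
}

/*
 * To be called when the page mapped at 'gfn' is removed: drop any record
 * of a grant table frame living at that GFN, so that a later remap of the
 * frame does not tear down whatever has been mapped there in the meantime.
 * Returns the number of array slots that were cleared.
 */
static unsigned int sketch_gnttab_forget_gfn(struct sketch_gnttab *gt,
                                             uint64_t gfn)
{
    unsigned int cleared = 0;

    assert(gt != NULL);

    if ( gfn == SKETCH_INVALID_GFN )
        return 0;

    /* A plain linear walk: one comparison per slot. */
    for ( size_t i = 0; i < SKETCH_MAX_FRAMES; i++ )
    {
        if ( gt->frame_gfn[i] == gfn )
        {
            gt->frame_gfn[i] = SKETCH_INVALID_GFN;
            cleared++;
        }
    }

    return cleared;
}
```

With something along these lines, a lookup for a frame whose page has been 
unmapped would return the invalid sentinel instead of a stale GFN.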


Note that this is not a problem on x86 because x86 is using the M2P. So 
when a mapping is removed, the mapping MFN -> GFN will also be removed.


Cheers,

--
Julien Grall

[PATCH 1/4] public: Add page related definitions for accessing guests memory

These changes introduce the page-related definitions needed for mapping and
accessing guest memory. These values are intended to be used by any toolstack
component that needs to map guest memory. Until now, the values were defined
by the xenctrl.h header, so whenever a component had to use them it also
had to add a dependency on the xenctrl library.

For this patch we set the same values for both x86 and ARM architectures.

Signed-off-by: Costin Lupu 
---
 xen/include/public/arch-arm/page.h | 34 ++
 xen/include/public/arch-x86/page.h | 34 ++
 xen/include/public/page.h  | 38 ++
 3 files changed, 106 insertions(+)
 create mode 100644 xen/include/public/arch-arm/page.h
 create mode 100644 xen/include/public/arch-x86/page.h
 create mode 100644 xen/include/public/page.h

diff --git a/xen/include/public/arch-arm/page.h b/xen/include/public/arch-arm/page.h
new file mode 100644
index 00..e970feb49c
--- /dev/null
+++ b/xen/include/public/arch-arm/page.h
@@ -0,0 +1,34 @@
+/**
+ * page.h
+ *
+ * Page definitions for accessing guests memory on ARM
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2021, Costin Lupu
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_ARM_PAGE_H__
+#define __XEN_PUBLIC_ARCH_ARM_PAGE_H__
+
+#define XEN_PAGE_SHIFT   12
+#define XEN_PAGE_SIZE    (1UL << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK    (~(XEN_PAGE_SIZE - 1))
+
+#endif /* __XEN_PUBLIC_ARCH_ARM_PAGE_H__ */
diff --git a/xen/include/public/arch-x86/page.h b/xen/include/public/arch-x86/page.h
new file mode 100644
index 00..b1924ea3cb
--- /dev/null
+++ b/xen/include/public/arch-x86/page.h
@@ -0,0 +1,34 @@
+/**
+ * page.h
+ *
+ * Page definitions for accessing guests memory on x86
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2021, Costin Lupu
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_PAGE_H__
+#define __XEN_PUBLIC_ARCH_X86_PAGE_H__
+
+#define XEN_PAGE_SHIFT   12
+#define XEN_PAGE_SIZE    (1UL << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK    (~(XEN_PAGE_SIZE - 1))
+
+#endif /* __XEN_PUBLIC_ARCH_X86_PAGE_H__ */
diff --git a/xen/include/public/page.h b/xen/include/public/page.h
new file mode 100644
index 00..d3e95fdb4a
--- /dev/null
+++ b/xen/include/public/page.h
@@ -0,0 +1,38 @@
+/**
+ * page.h
+ *
+ * Page definitions for accessing guests memory
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the

[PATCH 2/4] libs/ctrl: Use Xen values for XC_PAGE_* definitions

We use the values provided by the Xen public interface for defining the
XC_PAGE_* macros.

Signed-off-by: Costin Lupu 
---
 tools/include/xenctrl.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 14adaa0c10..90bb969fa0 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -54,10 +54,11 @@
 #include 
 #include 
 #endif
+#include 
 
-#define XC_PAGE_SHIFT   12
-#define XC_PAGE_SIZE    (1UL << XC_PAGE_SHIFT)
-#define XC_PAGE_MASK    (~(XC_PAGE_SIZE-1))
+#define XC_PAGE_SHIFT   XEN_PAGE_SHIFT
+#define XC_PAGE_SIZE    XEN_PAGE_SIZE
+#define XC_PAGE_MASK    XEN_PAGE_MASK
 
 #define INVALID_MFN  (~0UL)
 
-- 
2.20.1

[PATCH 0/4] Introduce XEN_PAGE_* definitions for mapping guests memory

This series tries to fix a side-effect introduced by commits 0dbb4be7 and
d1b32abd, which added a dependency on xenctrl to the foreignmemory and gnttab
libraries only because they needed to use the XC_PAGE_* values.

These changes introduce the XEN_PAGE_* definitions, which can be used by any
toolstack component that doesn't otherwise need a dependency on the xenctrl
library.
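
For illustration only, the sketch below shows the kind of page arithmetic a
toolstack component could do with these definitions without pulling in
xenctrl. The three macros are copied from patch 1/4; the sketch_* helper names
are hypothetical and are not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as introduced by patch 1/4 (identical on x86 and ARM for now). */
#define XEN_PAGE_SHIFT   12
#define XEN_PAGE_SIZE    (1UL << XEN_PAGE_SHIFT)
#define XEN_PAGE_MASK    (~(XEN_PAGE_SIZE - 1))

/* Hypothetical helper: size in bytes of a mapping of 'nr_pages' pages. */
static size_t sketch_pages_to_bytes(size_t nr_pages)
{
    /* Refuse counts whose byte size would not fit in a size_t. */
    assert(nr_pages <= (SIZE_MAX >> XEN_PAGE_SHIFT));

    return nr_pages << XEN_PAGE_SHIFT;
}

/* Hypothetical helper: number of pages needed to cover 'bytes' bytes. */
static size_t sketch_bytes_to_pages(size_t bytes)
{
    /* Whole pages, plus one more if there is a partial page left over. */
    return (bytes >> XEN_PAGE_SHIFT) + ((bytes & (XEN_PAGE_SIZE - 1)) != 0);
}

/* Hypothetical helper: address of the start of the page containing 'addr'. */
static unsigned long sketch_page_base(unsigned long addr)
{
    return addr & XEN_PAGE_MASK;
}
```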

Costin Lupu (4):
  public: Add page related definitions for accessing guests memory
  libs/ctrl: Use Xen values for XC_PAGE_* definitions
  libs/foreignmemory: Use XEN_PAGE_* definitions
  libs/gnttab: Use XEN_PAGE_* definitions

 tools/include/xenctrl.h            |  7 +++---
 tools/libs/foreignmemory/core.c    |  2 +-
 tools/libs/foreignmemory/freebsd.c | 10
 tools/libs/foreignmemory/linux.c   | 18 +++---
 tools/libs/foreignmemory/minios.c  | 10 +---
 tools/libs/foreignmemory/netbsd.c  | 10
 tools/libs/foreignmemory/private.h |  2 +-
 tools/libs/foreignmemory/solaris.c |  6 ++---
 tools/libs/gnttab/freebsd.c        | 20
 tools/libs/gnttab/linux.c          | 20
 tools/libs/gnttab/netbsd.c         | 20
 xen/include/public/arch-arm/page.h | 34 ++
 xen/include/public/arch-x86/page.h | 34 ++
 xen/include/public/page.h          | 38 ++
 14 files changed, 165 insertions(+), 66 deletions(-)
 create mode 100644 xen/include/public/arch-arm/page.h
 create mode 100644 xen/include/public/arch-x86/page.h
 create mode 100644 xen/include/public/page.h

-- 
2.20.1

[PATCH] xen/bitmap: Make bitmap_long_to_byte() and bitmap_byte_to_long() static

2021-08-09 Thread Jane Malalane

Functions made static as there are no external callers.

Suggested-by: Andrew Cooper 
Signed-off-by: Jane Malalane 
---
CC: Andrew Cooper 
CC: George Dunlap 
CC: Ian Jackson 
CC: Jan Beulich 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Wei Liu 
---
 xen/common/bitmap.c  | 8 
 xen/include/xen/bitmap.h | 3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/xen/common/bitmap.c b/xen/common/bitmap.c
index b7aa2db7c7..bbc3554ae1 100644
--- a/xen/common/bitmap.c
+++ b/xen/common/bitmap.c
@@ -338,7 +338,7 @@ EXPORT_SYMBOL(bitmap_allocate_region);
 
 #ifdef __BIG_ENDIAN
 
-void bitmap_long_to_byte(uint8_t *bp, const unsigned long *lp, int nbits)
+static void bitmap_long_to_byte(uint8_t *bp, const unsigned long *lp, int nbits)
 {
unsigned long l;
int i, j, b;
@@ -354,7 +354,7 @@ void bitmap_long_to_byte(uint8_t *bp, const unsigned long *lp, int nbits)
clamp_last_byte(bp, nbits);
 }
 
-void bitmap_byte_to_long(unsigned long *lp, const uint8_t *bp, int nbits)
+static void bitmap_byte_to_long(unsigned long *lp, const uint8_t *bp, int nbits)
 {
unsigned long l;
int i, j, b;
@@ -371,13 +371,13 @@ void bitmap_byte_to_long(unsigned long *lp, const uint8_t *bp, int nbits)
 
 #elif defined(__LITTLE_ENDIAN)
 
-void bitmap_long_to_byte(uint8_t *bp, const unsigned long *lp, int nbits)
+static void bitmap_long_to_byte(uint8_t *bp, const unsigned long *lp, int nbits)
 {
memcpy(bp, lp, (nbits+7)/8);
clamp_last_byte(bp, nbits);
 }
 
-void bitmap_byte_to_long(unsigned long *lp, const uint8_t *bp, int nbits)
+static void bitmap_byte_to_long(unsigned long *lp, const uint8_t *bp, int nbits)
 {
/* We may need to pad the final longword with zeroes. */
if (nbits & (BITS_PER_LONG-1))
diff --git a/xen/include/xen/bitmap.h b/xen/include/xen/bitmap.h
index 634a259645..e9175ab54a 100644
--- a/xen/include/xen/bitmap.h
+++ b/xen/include/xen/bitmap.h
@@ -270,9 +270,6 @@ static inline void bitmap_clear(unsigned long *map, unsigned int start,
 #undef bitmap_switch
 #undef bitmap_bytes
 
-void bitmap_long_to_byte(uint8_t *bp, const unsigned long *lp, int nbits);
-void bitmap_byte_to_long(unsigned long *lp, const uint8_t *bp, int nbits);
-
 struct xenctl_bitmap;
 int xenctl_bitmap_to_bitmap(unsigned long *bitmap,
 const struct xenctl_bitmap *xenctl_bitmap,
-- 
2.11.0

Re: [XEN PATCH v6 20/31] build: generate "include/xen/compile.h" with filechk

On Thu, Aug 05, 2021 at 09:20:10AM +0200, Jan Beulich wrote:
> On 01.07.2021 16:10, Anthony PERARD wrote:
> > This will always try regenerate the content of compile.h, but if it
> > didn't change the file isn't updated.
> > 
> > Also, as it's currently the case, the file isn't regenerated during
> > `sudo make install` if it exist and does belong to a different user.
> > 
> > Thus, we can remove the target "delete-unfresh-files".
> > Target $(TARGET) still need a phony dependency, so add FORCE.
> > 
> > This patch imports the macro 'filechk' from Linux v5.12.
> 
> Would you mind clarifying why $(if_changed ...) cannot be used here
> (unlike for .banner in the earlier patch)?

if_changed can be used instead of filechk. I probably used "filechk"
because I was looking for an excuse to use it.

The advantage of filechk over if_changed is that the output of the command
is compared, so there is no need to generate an extra dependency (.*.cmd)
file. That is probably mostly an advantage when the generated file
changes often, or when the command is simple enough.
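
To make the "output is compared" idea concrete, here is a minimal sketch in C
of an update-only-if-different write. It is not the Kbuild macro itself, and
the helper name sketch_write_if_changed is hypothetical; it only illustrates
that the target is left untouched (so its timestamp does not change) when the
newly generated content matches what is already there.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: write 'content' to 'path' only if the file is
 * missing or its current content differs.
 * Returns 1 if the file was (re)written, 0 if it was left untouched,
 * -1 on error.
 */
static int sketch_write_if_changed(const char *path, const char *content)
{
    size_t len;
    FILE *f;

    assert(path != NULL && content != NULL);
    len = strlen(content);

    f = fopen(path, "rb");
    if ( f != NULL )
    {
        char *old = malloc(len + 1);
        size_t got;
        int same;

        if ( old == NULL )
        {
            fclose(f);
            return -1;
        }

        /* Read one byte more than expected so a longer file is detected. */
        got = fread(old, 1, len + 1, f);
        same = (got == len) && (memcmp(old, content, len) == 0);
        free(old);
        fclose(f);

        if ( same )
            return 0;   /* Up to date: do not touch the file. */
    }

    f = fopen(path, "wb");
    if ( f == NULL )
        return -1;

    if ( fwrite(content, 1, len, f) != len )
    {
        fclose(f);
        return -1;
    }

    return fclose(f) == 0 ? 1 : -1;
}
```

Calling it twice in a row with the same content writes the file the first
time and leaves it alone the second time.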

But it seems that "filechk" is only used once in this patch series, in
this patch. So I can rework the patch to use "if_changed" instead; that
would avoid the need to import another macro from Linux, and avoid the
weird need to have the command "cat" the target when an update isn't
wanted.

Thanks,

-- 
Anthony PERARD

[ovmf test] 164142: all pass - PUSHED

flight 164142 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164142/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf d02dbb53cd78de799e6afaa237e98771fb5148db
baseline version:
 ovmf 4de77ae9890d241271f543e9195ab3516f3abec6

Last test of basis   164139  2021-08-09 03:11:13 Z    0 days
Testing same since   164142  2021-08-09 11:42:25 Z    0 days    1 attempts


People who touched revisions under test:
  DunTan 
  Rodrigo Gonzalez del Cueto 
  Zhiguang Liu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   4de77ae989..d02dbb53cd  d02dbb53cd78de799e6afaa237e98771fb5148db -> xen-tested-master

Re: [XEN PATCH v6 19/31] build: rework .banner generation

On Thu, Aug 05, 2021 at 09:09:13AM +0200, Jan Beulich wrote:
> On 01.07.2021 16:09, Anthony PERARD wrote:
> > Avoid depending on Makefile but still allow to rebuild the banner when
> > $(XEN_FULLVERSION) changes.
> > 
> > Also add a dependency on tools/xen.flf, even if not expected to
> > change.
> > 
> > Signed-off-by: Anthony PERARD 
> 
> Reviewed-by: Jan Beulich 
> 
> This looks to be independent of earlier patches in this series? If so,

Yes, it's independent.

> I'd be happy to commit without waiting for earlier patches to get
> review comments addressed.

Thanks,

-- 
Anthony PERARD

Re: [XEN PATCH v6 18/31] xen: move include/asm-* to include/arch-*/asm

On Thu, Aug 05, 2021 at 09:04:18AM +0200, Jan Beulich wrote:
> On 01.07.2021 16:09, Anthony PERARD wrote:
> > This avoid the need to create the symbolic link "include/asm".
> > 
> > Signed-off-by: Anthony PERARD 
> > ---
> > 
> > Other possible locations that I could think of:
> > include/arch/*/asm
> > arch/*/include/asm
> 
> I thought it was always the plan to follow Linux (and kind of XTF) in
> this regard, using the latter of these options?

I'm not sure what the plan was, but putting the arch-specific headers
in arch/ sounds good. I'll rework the patch.

> > --- a/xen/include/xen/bitmap.h
> > +++ b/xen/include/xen/bitmap.h
> > @@ -14,7 +14,7 @@
> >   *
> >   * Function implementations generic to all architectures are in
> >   * lib/bitmap.c.  Functions implementations that are architecture
> > - * specific are in various include/asm-/bitops.h headers
> > + * specific are in various include/arch-/asm/bitops.h headers
> 
> Then, just to take this as an example, referring to just asm/bitops.h
> in comments might be enough (limiting churn on some of the ones that
> you're altering)?

Sounds good.

Thanks,

-- 
Anthony PERARD

[linux-linus test] 164138: regressions - FAIL

flight 164138 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164138/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-examine  6 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-raw  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-shadow  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  10 xen-install/src_host  fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair  11 xen-install/dst_host  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install  fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx  13 debian-fixup  fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm  14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1  13 debian-fixup  fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub  20 guest-stop  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub  20 guest-stop  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2  13 debian-fixup  fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm  13 debian-fixup  fail REGR. vs. 152332
 test-amd64-amd64-qemuu-freebsd11-amd64  21 guest-start/freebsd.repeat  fail REGR. vs. 152332
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  21 leak-check/check  fail REGR. vs. 152332
Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop  fail like 152332
 test-armhf-armhf-libvirt  16 saverestore-support-check  fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop  fail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop  fail like 152332
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop  fail like 152332
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail like 152332
 test-amd64-amd64-libvirt  15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd  14 migrate-support-check  fail never pass
 test-armhf-armhf-xl  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl  16 saveresto

[xen-unstable test] 164137: trouble: broken/fail/pass

flight 164137 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164137/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-xsm  broken

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-xsm   5 host-install(5)  broken pass in 164129

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail in 164129 never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail in 164129 never pass
 test-amd64-amd64-xl-qemut-win7-amd64  19 guest-stop  fail like 164129
 test-armhf-armhf-libvirt  16 saverestore-support-check  fail like 164129
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail like 164129
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop  fail like 164129
 test-amd64-i386-xl-qemut-ws16-amd64  19 guest-stop  fail like 164129
 test-amd64-i386-xl-qemut-win7-amd64  19 guest-stop  fail like 164129
 test-amd64-i386-xl-qemuu-win7-amd64  19 guest-stop  fail like 164129
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail like 164129
 test-amd64-amd64-xl-qemut-ws16-amd64  19 guest-stop  fail like 164129
 test-amd64-i386-xl-qemuu-ws16-amd64  19 guest-stop  fail like 164129
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop  fail like 164129
 test-arm64-arm64-xl-seattle  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt  15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-amd64-i386-xl-pvshim  14 guest-start  fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check  fail never pass
 test-amd64-i386-libvirt  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx  16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm  16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd  14 migrate-support-check  fail never pass
 test-armhf-armhf-xl  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-rtds  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-cubietruck  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check  fail never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check  fail never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw  14 migrate-support-check  fail never pass

version targeted for testing:
 xen  2b45ff60301a988badec526846e77b538383ae63
baseline version:
 xen  2b45ff60301a988badec526846e77b538383ae63

Last test of basis   164137  2021-08-09 01:53:59 Z    0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm

[ovmf test] 164139: all pass - PUSHED

flight 164139 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164139/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 4de77ae9890d241271f543e9195ab3516f3abec6
baseline version:
 ovmf 97fdcbda4e69d6f085ec3f2bd9d29a04af2b50a4

Last test of basis   164114  2021-08-05 21:40:02 Z    3 days
Testing same since   164139  2021-08-09 03:11:13 Z    0 days    1 attempts


People who touched revisions under test:
  Jason Lou 
  Lou, Yun 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass




Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   97fdcbda4e..4de77ae989  4de77ae9890d241271f543e9195ab3516f3abec6 -> xen-tested-master

[libvirt test] 164140: regressions - FAIL