Re: [linux-linus test] 183794: regressions - FAIL
Hi Juergen,

On 23/11/2023 05:57, Juergen Gross wrote:
> On 23.11.23 00:07, Stefano Stabellini wrote:
> > On Wed, 22 Nov 2023, Juergen Gross wrote:
> > > On 22.11.23 04:07, Stefano Stabellini wrote:
> > > > On Mon, 20 Nov 2023, Stefano Stabellini wrote:
> > > > > On Mon, 20 Nov 2023, Juergen Gross wrote:
> > > > > > On 20.11.23 03:21, osstest service owner wrote:
> > > > > > > flight 183794 linux-linus real [real]
> > > > > > > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > > > > > >
> > > > > > > Regressions :-(
> > > > > > >
> > > > > > > Tests which did not succeed and are blocking,
> > > > > > > including tests which could not be run:
> > > > > > >  test-arm64-arm64-examine  8 reboot  fail REGR. vs. 183766
> > > > > >
> > > > > > I'm seeing the following in the serial log:
> > > > > >
> > > > > > Nov 20 00:25:41.586712 [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > > > > > Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
> > > > > >
> > > > > > The related source code lines in the kernel are:
> > > > > >
> > > > > > 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
> > > > > > 				 &info);
> > > > > > 	BUG_ON(err);
> > > > > >
> > > > > > I suspect commit 20f3b8eafe0ba to be the culprit.
> > > > > >
> > > > > > Stefano, could you please have a look?
> > > >
> > > > The good news and the bad news is that I cannot reproduce this either
> > > > with or without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit
> > > > 20f3b8eafe0ba but I cannot see anything wrong with it. Looking at the
> > > > register dump, from:
> > > >
> > > > x0 : fffa
> > > >
> > > > I am guessing the error was -ENXIO, which is returned from
> > > > map_guest_area in Xen.
> > > >
> > > > Could it be that the struct is crossing a page boundary? Or that it is
> > > > not 64-bit aligned? Do we need to do something like the following?
> > > >
> > > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > > index 9afdc4c4a5dc..5326070c5dc0 100644
> > > > --- a/arch/arm/xen/enlighten.c
> > > > +++ b/arch/arm/xen/enlighten.c
> > > > @@ -484,7 +484,7 @@ static int __init xen_guest_init(void)
> > > >  	 * for secondary CPUs as they are brought up.
> > > >  	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
> > > >  	 */
> > > > -	xen_vcpu_info = alloc_percpu(struct vcpu_info);
> > > > +	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
> > > >  	if (xen_vcpu_info == NULL)
> > > >  		return -ENOMEM;
> > >
> > > May I suggest using a smaller alignment? What about:
> > >
> > > 1 << fls(sizeof(struct vcpu_info) - 1)
> >
> > See below
> >
> > ---
> > [PATCH] arm/xen: fix xen_vcpu_info allocation alignment

Stefano, are you going to submit the patch formally?

> >
> > xen_vcpu_info is a percpu area that needs to be mapped by Xen.
> > Currently, it could cross a page boundary, resulting in Xen being
> > unable to map it:
> >
> > [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
> >
> > Fix the issue by using __alloc_percpu and requesting alignment for the
> > memory allocation.
> >
> > Signed-off-by: Stefano Stabellini

I am guessing we want to backport it. So should this contain a tag to
indicate the intention?

> >
> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > index 9afdc4c4a5dc..09eb74a07dfc 100644
> > --- a/arch/arm/xen/enlighten.c
> > +++ b/arch/arm/xen/enlighten.c
> > @@ -484,7 +484,8 @@ static int __init xen_guest_init(void)
> >  	 * for secondary CPUs as they are brought up.
> >  	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
> >  	 */
> > -	xen_vcpu_info = alloc_percpu(struct vcpu_info);
> > +	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
> > +					       1 << fls(sizeof(struct vcpu_info) - 1));
>
> Nit: one tab less, please (can be fixed while committing).
>
> >  	if (xen_vcpu_info == NULL)
> >  		return -ENOMEM;
>
> Reviewed-by: Juergen Gross

Juergen, looking at the x86 code, you seem to use DEFINE_PER_CPU(). So
what guarantees that this is not going to cross a page?

Cheers,

-- 
Julien Grall
Re: [linux-linus test] 183794: regressions - FAIL
On 23.11.23 00:07, Stefano Stabellini wrote:
> On Wed, 22 Nov 2023, Juergen Gross wrote:
> > On 22.11.23 04:07, Stefano Stabellini wrote:
> > > On Mon, 20 Nov 2023, Stefano Stabellini wrote:
> > > > On Mon, 20 Nov 2023, Juergen Gross wrote:
> > > > > On 20.11.23 03:21, osstest service owner wrote:
> > > > > > flight 183794 linux-linus real [real]
> > > > > > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > > > > >
> > > > > > Regressions :-(
> > > > > >
> > > > > > Tests which did not succeed and are blocking,
> > > > > > including tests which could not be run:
> > > > > >  test-arm64-arm64-examine  8 reboot  fail REGR. vs. 183766
> > > > >
> > > > > I'm seeing the following in the serial log:
> > > > >
> > > > > Nov 20 00:25:41.586712 [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > > > > Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
> > > > >
> > > > > The related source code lines in the kernel are:
> > > > >
> > > > > 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
> > > > > 				 &info);
> > > > > 	BUG_ON(err);
> > > > >
> > > > > I suspect commit 20f3b8eafe0ba to be the culprit.
> > > > >
> > > > > Stefano, could you please have a look?
> > >
> > > The good news and the bad news is that I cannot reproduce this either
> > > with or without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit
> > > 20f3b8eafe0ba but I cannot see anything wrong with it. Looking at the
> > > register dump, from:
> > >
> > > x0 : fffa
> > >
> > > I am guessing the error was -ENXIO, which is returned from
> > > map_guest_area in Xen.
> > >
> > > Could it be that the struct is crossing a page boundary? Or that it is
> > > not 64-bit aligned? Do we need to do something like the following?
> > >
> > > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > > index 9afdc4c4a5dc..5326070c5dc0 100644
> > > --- a/arch/arm/xen/enlighten.c
> > > +++ b/arch/arm/xen/enlighten.c
> > > @@ -484,7 +484,7 @@ static int __init xen_guest_init(void)
> > >  	 * for secondary CPUs as they are brought up.
> > >  	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
> > >  	 */
> > > -	xen_vcpu_info = alloc_percpu(struct vcpu_info);
> > > +	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
> > >  	if (xen_vcpu_info == NULL)
> > >  		return -ENOMEM;
> >
> > May I suggest using a smaller alignment? What about:
> >
> > 1 << fls(sizeof(struct vcpu_info) - 1)
>
> See below
>
> ---
> [PATCH] arm/xen: fix xen_vcpu_info allocation alignment
>
> xen_vcpu_info is a percpu area that needs to be mapped by Xen.
> Currently, it could cross a page boundary, resulting in Xen being
> unable to map it:
>
> [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
>
> Fix the issue by using __alloc_percpu and requesting alignment for the
> memory allocation.
>
> Signed-off-by: Stefano Stabellini
>
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 9afdc4c4a5dc..09eb74a07dfc 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -484,7 +484,8 @@ static int __init xen_guest_init(void)
>  	 * for secondary CPUs as they are brought up.
>  	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
>  	 */
> -	xen_vcpu_info = alloc_percpu(struct vcpu_info);
> +	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
> +					       1 << fls(sizeof(struct vcpu_info) - 1));

Nit: one tab less, please (can be fixed while committing).

>  	if (xen_vcpu_info == NULL)
>  		return -ENOMEM;

Reviewed-by: Juergen Gross


Juergen
Re: [linux-linus test] 183794: regressions - FAIL
On Wed, 22 Nov 2023, Juergen Gross wrote:
> On 22.11.23 04:07, Stefano Stabellini wrote:
> > On Mon, 20 Nov 2023, Stefano Stabellini wrote:
> > > On Mon, 20 Nov 2023, Juergen Gross wrote:
> > > > On 20.11.23 03:21, osstest service owner wrote:
> > > > > flight 183794 linux-linus real [real]
> > > > > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > > > >
> > > > > Regressions :-(
> > > > >
> > > > > Tests which did not succeed and are blocking,
> > > > > including tests which could not be run:
> > > > >  test-arm64-arm64-examine  8 reboot  fail REGR. vs. 183766
> > > >
> > > > I'm seeing the following in the serial log:
> > > >
> > > > Nov 20 00:25:41.586712 [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > > > Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
> > > >
> > > > The related source code lines in the kernel are:
> > > >
> > > > 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
> > > > 				 &info);
> > > > 	BUG_ON(err);
> > > >
> > > > I suspect commit 20f3b8eafe0ba to be the culprit.
> > > >
> > > > Stefano, could you please have a look?
> >
> > The good news and the bad news is that I cannot reproduce this either
> > with or without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit
> > 20f3b8eafe0ba but I cannot see anything wrong with it. Looking at the
> > register dump, from:
> >
> > x0 : fffa
> >
> > I am guessing the error was -ENXIO, which is returned from
> > map_guest_area in Xen.
> >
> > Could it be that the struct is crossing a page boundary? Or that it is
> > not 64-bit aligned? Do we need to do something like the following?
> >
> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > index 9afdc4c4a5dc..5326070c5dc0 100644
> > --- a/arch/arm/xen/enlighten.c
> > +++ b/arch/arm/xen/enlighten.c
> > @@ -484,7 +484,7 @@ static int __init xen_guest_init(void)
> >  	 * for secondary CPUs as they are brought up.
> >  	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
> >  	 */
> > -	xen_vcpu_info = alloc_percpu(struct vcpu_info);
> > +	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
> >  	if (xen_vcpu_info == NULL)
> >  		return -ENOMEM;
>
> May I suggest using a smaller alignment? What about:
>
> 1 << fls(sizeof(struct vcpu_info) - 1)

See below

---
[PATCH] arm/xen: fix xen_vcpu_info allocation alignment

xen_vcpu_info is a percpu area that needs to be mapped by Xen.
Currently, it could cross a page boundary, resulting in Xen being
unable to map it:

[    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
[    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP

Fix the issue by using __alloc_percpu and requesting alignment for the
memory allocation.

Signed-off-by: Stefano Stabellini

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..09eb74a07dfc 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +484,8 @@ static int __init xen_guest_init(void)
 	 * for secondary CPUs as they are brought up.
 	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
 	 */
-	xen_vcpu_info = alloc_percpu(struct vcpu_info);
+	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info),
+					       1 << fls(sizeof(struct vcpu_info) - 1));
 	if (xen_vcpu_info == NULL)
 		return -ENOMEM;
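The patch relies on the fact that a power-of-two alignment at least as large as the object, and no larger than the page size, keeps the object inside a single page: the page size is then a multiple of the alignment, so an aligned start offset plus the object size can never run past the page end. The sketch below checks this exhaustively over two pages. It assumes 4 KiB pages, models only the page-boundary condition (not Xen's actual map_guest_area() logic), and crosses_page() and alignment_is_safe() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Nonzero if [addr, addr + size) touches more than one page (size >= 1). */
static int crosses_page(uintptr_t addr, size_t size)
{
	return (addr / SKETCH_PAGE_SIZE) !=
	       ((addr + size - 1) / SKETCH_PAGE_SIZE);
}

/*
 * Try every align-aligned start address within two pages and report
 * whether any placement of a size-byte object straddles a page boundary.
 * Returns 1 when no placement crosses, 0 otherwise.
 */
static int alignment_is_safe(size_t size, size_t align)
{
	assert(size >= 1 && align >= 1);
	for (uintptr_t addr = 0; addr < 2 * SKETCH_PAGE_SIZE; addr += align)
		if (crosses_page(addr, size))
			return 0;
	return 1;
}
```

For a hypothetical 48-byte object, an 8-byte alignment allows a start at offset 4088, which straddles two pages (the failure mode the commit message describes), whereas a 64-byte alignment never does.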
Re: [linux-linus test] 183794: regressions - FAIL
On 22.11.23 04:07, Stefano Stabellini wrote:
> On Mon, 20 Nov 2023, Stefano Stabellini wrote:
> > On Mon, 20 Nov 2023, Juergen Gross wrote:
> > > On 20.11.23 03:21, osstest service owner wrote:
> > > > flight 183794 linux-linus real [real]
> > > > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > > >
> > > > Regressions :-(
> > > >
> > > > Tests which did not succeed and are blocking,
> > > > including tests which could not be run:
> > > >  test-arm64-arm64-examine  8 reboot  fail REGR. vs. 183766
> > >
> > > I'm seeing the following in the serial log:
> > >
> > > Nov 20 00:25:41.586712 [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > > Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
> > >
> > > The related source code lines in the kernel are:
> > >
> > > 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
> > > 				 &info);
> > > 	BUG_ON(err);
> > >
> > > I suspect commit 20f3b8eafe0ba to be the culprit.
> > >
> > > Stefano, could you please have a look?
>
> The good news and the bad news is that I cannot reproduce this either
> with or without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit
> 20f3b8eafe0ba but I cannot see anything wrong with it. Looking at the
> register dump, from:
>
> x0 : fffa
>
> I am guessing the error was -ENXIO, which is returned from
> map_guest_area in Xen.
>
> Could it be that the struct is crossing a page boundary? Or that it is
> not 64-bit aligned? Do we need to do something like the following?
>
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 9afdc4c4a5dc..5326070c5dc0 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -484,7 +484,7 @@ static int __init xen_guest_init(void)
>  	 * for secondary CPUs as they are brought up.
>  	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
>  	 */
> -	xen_vcpu_info = alloc_percpu(struct vcpu_info);
> +	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
>  	if (xen_vcpu_info == NULL)
>  		return -ENOMEM;

May I suggest using a smaller alignment? What about:

1 << fls(sizeof(struct vcpu_info) - 1)


Juergen
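Juergen's suggested alignment rounds sizeof(struct vcpu_info) up to the next power of two: the kernel's fls() returns the 1-based index of the most significant set bit (0 for an input of 0), so 1 << fls(size - 1) is the smallest power of two that is at least size. A minimal userspace sketch follows; my_fls() is a portable stand-in for the kernel helper, vcpu_info_align() is a hypothetical name, and the sizes in the usage note are hypothetical examples, not the real size of struct vcpu_info.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's fls(): 1-based index of the most
 * significant set bit, or 0 when x == 0. */
static int my_fls(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* The alignment proposed in the thread: smallest power of two >= size. */
static size_t vcpu_info_align(size_t size)
{
	assert(size >= 1);
	return (size_t)1 << my_fls((unsigned int)(size - 1));
}
```

For a hypothetical 48-byte structure, vcpu_info_align(48) is 64; an exact power of two such as 64 maps to itself, and 65 rounds up to 128. Compared with a PAGE_SIZE alignment, this keeps the requested alignment close to the object size while still being at least as large as the object.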
Re: [linux-linus test] 183794: regressions - FAIL
On Mon, 20 Nov 2023, Stefano Stabellini wrote:
> On Mon, 20 Nov 2023, Juergen Gross wrote:
> > On 20.11.23 03:21, osstest service owner wrote:
> > > flight 183794 linux-linus real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-arm64-arm64-examine  8 reboot  fail REGR. vs. 183766
> >
> > I'm seeing the following in the serial log:
> >
> > Nov 20 00:25:41.586712 [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> > Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
> >
> > The related source code lines in the kernel are:
> >
> > 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
> > 				 &info);
> > 	BUG_ON(err);
> >
> > I suspect commit 20f3b8eafe0ba to be the culprit.
> >
> > Stefano, could you please have a look?

The good news and the bad news is that I cannot reproduce this either
with or without CONFIG_UNMAP_KERNEL_AT_EL0. I looked at commit
20f3b8eafe0ba but I cannot see anything wrong with it. Looking at the
register dump, from:

x0 : fffa

I am guessing the error was -ENXIO, which is returned from
map_guest_area in Xen.

Could it be that the struct is crossing a page boundary? Or that it is
not 64-bit aligned? Do we need to do something like the following?

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 9afdc4c4a5dc..5326070c5dc0 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -484,7 +484,7 @@ static int __init xen_guest_init(void)
 	 * for secondary CPUs as they are brought up.
 	 * For uniformity we use VCPUOP_register_vcpu_info even on cpu0.
 	 */
-	xen_vcpu_info = alloc_percpu(struct vcpu_info);
+	xen_vcpu_info = __alloc_percpu(sizeof(struct vcpu_info), PAGE_SIZE);
 	if (xen_vcpu_info == NULL)
 		return -ENOMEM;
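Reading x0 : fffa as -ENXIO assumes the hypercall's result is left in x0 as a 64-bit two's-complement integer, and that the full register contents were 0xfffffffffffffffa (the log excerpt shows only the low digits). Under those assumptions, and with Linux errno numbering where ENXIO is 6, the decoding can be sketched as below; decode_x0() and is_enxio() are hypothetical helpers, not kernel or Xen functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: interpret a raw 64-bit register value as a signed
 * return code. The magnitude of a negative value is computed in unsigned
 * arithmetic (~x + 1) to avoid an implementation-defined conversion.
 */
static long long decode_x0(uint64_t x0)
{
	if (x0 >> 63)                          /* sign bit set: negative */
		return -(long long)(~x0 + 1);  /* two's-complement magnitude */
	return (long long)x0;
}

/* Nonzero if the raw register value encodes -ENXIO. */
static int is_enxio(uint64_t x0)
{
	return decode_x0(x0) == -ENXIO;
}
```

Decoding 0xfffffffffffffffa this way yields -6, which matches -ENXIO under the stated errno assumption.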
Re: [linux-linus test] 183794: regressions - FAIL
On Mon, 20 Nov 2023, Juergen Gross wrote:
> On 20.11.23 03:21, osstest service owner wrote:
> > flight 183794 linux-linus real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/183794/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-arm64-arm64-examine  8 reboot  fail REGR. vs. 183766
>
> I'm seeing the following in the serial log:
>
> Nov 20 00:25:41.586712 [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
> Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP
>
> The related source code lines in the kernel are:
>
> 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
> 				 &info);
> 	BUG_ON(err);
>
> I suspect commit 20f3b8eafe0ba to be the culprit.
>
> Stefano, could you please have a look?

The original email somehow escaped my email filters and managed to skip
my inbox. Hence, this is the first time I am seeing this commit. Today I
ran out of time, but I'll look at it tomorrow.
Re: [linux-linus test] 183794: regressions - FAIL
On 20.11.23 03:21, osstest service owner wrote:
> flight 183794 linux-linus real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/183794/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-arm64-arm64-examine  8 reboot  fail REGR. vs. 183766

I'm seeing the following in the serial log:

Nov 20 00:25:41.586712 [    0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
Nov 20 00:25:41.598711 [    0.574002] Internal error: Oops - BUG: f2000800 [#1] PREEMPT SMP

The related source code lines in the kernel are:

	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, xen_vcpu_nr(cpu),
				 &info);
	BUG_ON(err);

I suspect commit 20f3b8eafe0ba to be the culprit.

Stefano, could you please have a look?


Juergen