[xen-unstable-smoke test] 186216: tolerable all pass - PUSHED

2024-05-31 Thread osstest service owner
flight 186216 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186216/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  5f7606c048f7cca1a4301b321af70791c1d22378
baseline version:
 xen  03147e6837ff045dbc328be876b9600f7040c771

Last test of basis   186213  2024-05-31 22:02:09 Z    0 days
Testing same since   186216  2024-06-01 02:00:22 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Oleksii Kurochko 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   03147e6837..5f7606c048  5f7606c048f7cca1a4301b321af70791c1d22378 -> smoke



[linux-linus test] 186212: tolerable FAIL - PUSHED

2024-05-31 Thread osstest service owner
flight 186212 linux-linus real [real]
flight 186215 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186212/
http://logs.test-lab.xenproject.org/osstest/logs/186215/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit2   8 xen-boot fail pass in 186215-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail in 186215 never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail in 186215 never pass
 test-armhf-armhf-xl   8 xen-boot fail  like 186196
 test-armhf-armhf-xl-arndale   8 xen-boot fail  like 186196
 test-armhf-armhf-libvirt  8 xen-boot fail  like 186196
 test-armhf-armhf-libvirt-vhd  8 xen-boot fail  like 186196
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 186202
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 186202
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 186202
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 186202
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186202
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-qcow2 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-qcow2 15 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-check fail   never pass

version targeted for testing:
 linux                d8ec19857b095b39d114ae299713bd8ea6c1e66a
baseline version:
 linux                4a4be1ad3a6efea16c56615f31117590fd881358

Last test of basis   186202  2024-05-30 22:43:12 Z    1 days
Testing same since   186212  2024-05-31 19:10:39 Z    0 days    1 attempts


People who touched revisions under test:
  Alexander Lobakin 
  Alexander Maltsev 
  Alexander Mikhalitsyn 
  Alexei Starovoitov 
  Andrii Nakryiko 
  Arun Ramadoss 
  Carolina Jubran 
  Christian Brauner 
  Daniel Borkmann 
  Dave Ertman 
  David S. Miller 
  Edward Adam Davis 
  Emil Renner Berthing 
  Eric Dumazet 
  Eric Garver 
  Florian Westphal 
  Friedrich Vock 
  Gal Pressman 
  Geliang Tang 
  Hariprasad Kelam 
  Hengqi Chen 
  Horatiu Vultur 
  Hui Wang 
  Ido Schimmel 
  Jacob Keller 
  Jakub Kicinski 
  Jakub Sitnicki 
  Jerry Ray 
  Jiri Olsa 
  John Fastabend 
  Jozsef Kadlecsik 
  Karim Ben Houcine 
  Kory Maincent 
  Krishneil Singh 
  Kuniyuki Iwashima 
  Linus Torvalds 
  Maher Sanalla 
  Mathieu Othacehe 
  Matt Jan 

Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-31 Thread Andrew Cooper
On 28/05/2024 2:12 pm, Jan Beulich wrote:
> On 28.05.2024 14:30, Andrew Cooper wrote:
>> On 27/05/2024 2:37 pm, Jan Beulich wrote:
>>> On 27.05.2024 15:27, Jan Beulich wrote:
 On 24.05.2024 22:03, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>  
>  static always_inline unsigned int arch_ffs(unsigned int x)
>  {
> -int r;
> +unsigned int r;
> +
> +if ( __builtin_constant_p(x > 0) && x > 0 )
> +{
> +/* Safe, when the compiler knows that x is nonzero. */
> +asm ( "bsf %[val], %[res]"
> +  : [res] "=r" (r)
> +  : [val] "rm" (x) );
> +}
 In patch 11 relevant things are all in a single patch, making it easier
 to spot that this is dead code: The sole caller already has a
 __builtin_constant_p(), hence I don't see how the one here could ever
 return true. With that the respective part of the description is then
 questionable, too, I'm afraid: Where did you observe any actual effect
 from this? Or if you did - what am I missing?
>>> Hmm, thinking about it: I suppose that's why you have
>>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
>>> I'm (positively) surprised that the former may return true when the latter
>>> doesn't.
>> So was I, but this recommendation came straight from the GCC mailing
>> list.  And it really does work, even back in obsolete versions of GCC.
>>
>> __builtin_constant_p() operates on an expression not a value, and is
>> documented as such.
> Of course.
>
>>>  Nevertheless I'm inclined to think this deserves a brief comment.
>> There is a comment, and it's even visible in the snippet.
> The comment is about the asm(); it is neither placed to clearly relate
> to __builtin_constant_p(), nor is it saying anything about this specific
> property of it. You said you were equally surprised; don't you think
> that when both of us are surprised, a specific (even if brief) comment
> is warranted?

Spell it out for me like I'm an idiot.

Because I'm looking at the patch I submitted, and at your request for "a
brief comment", and I still have no idea what you think is wrong at the
moment.

I'm also not inclined to write a comment saying "go and read the GCC
manual more carefully".

>
>>> As an aside, to better match the comment inside the if()'s body, how about
>>>
>>> if ( __builtin_constant_p(!!x) && x )
>>>
>>> ? That also may make a little more clear that this isn't just a style
>>> choice, but actually needed for the intended purpose.
>> I am not changing the logic.
>>
>> Apart from anything else, your suggestion is trivially buggy.  I care
>> about whether the RHS collapses to a constant, and the only way of doing
>> that correctly is asking the compiler about the *exact* expression. 
>> Asking about some other expression which you hope - but do not know -
>> that the compiler will treat equivalently is bogus.  It would be
>> strictly better to only take the else clause, than to have both halves
>> emitted.
>>
>> This is the form I've tested extensively.  It's also the clearest form
>> IMO.  You can experiment with alternative forms when we're not staring
>> down code freeze of 4.19.
> "Clearest form" is almost always a matter of taste. To me, comparing
> unsigned values with > or < against 0 is generally at least suspicious.
> Using != is typically better (again: imo), and simply omitting the != 0
> then is shorter with no difference in effect. Except in peculiar cases
> like this one, where indeed it took me some time to figure why the
> comparison operator may not be omitted.
>
> All that said: I'm not going to insist on any change; the R-b previously
> offered still stands. I would highly appreciate though if the (further)
> comment asked for could be added.
>
> What I definitely dislike here is you - not for the first time - turning
> down remarks because a change of yours is late.

Actually it's not to do with the release.  I'd reject it at any point
because it's an unreasonable request to make; to me, or to anyone else.

It would be a matter of taste (which again you have a singular view on),
if it wasn't for the fact that what you actually said was:

"I don't like it, and you should discard all the careful analysis you
did because here's a form I prefer, that I haven't tested concerning a
behaviour I didn't even realise until this email."

and even if it wasn't a buggy suggestion to begin with, it's still toxic
maintainer feedback.


Frankly, I'd have more time to review other people's patches if I wasn't
wasting all of my time on premium grade manure like this, while trying
to help Oleksii, who's had it far worse this release trying to clean up
droppings of maintainers-past.

~Andrew



[xen-unstable-smoke test] 186213: tolerable all pass - PUSHED

2024-05-31 Thread osstest service owner
flight 186213 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186213/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  03147e6837ff045dbc328be876b9600f7040c771
baseline version:
 xen  1250c73c1ae2eec0308b4efe3e345127e9dbdb2b

Last test of basis   186204  2024-05-31 01:02:04 Z    0 days
Testing same since   186213  2024-05-31 22:02:09 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Julien Grall 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   1250c73c1a..03147e6837  03147e6837ff045dbc328be876b9600f7040c771 -> smoke



Re: NULL pointer dereference in xenbus_thread->...

2024-05-31 Thread Marek Marczykowski-Górecki
On Tue, Mar 26, 2024 at 11:00:50AM +, Julien Grall wrote:
> Hi Marek,
> 
> +Juergen for visibility
> 
> When sending a bug report, I would suggest CCing the relevant people,
> as otherwise it can get lost (not many people monitor xen-devel if they
> are not CCed).
> 
> Cheers,
> 
> On 25/03/2024 16:17, Marek Marczykowski-Górecki wrote:
> > On Sun, Oct 22, 2023 at 04:14:30PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Aug 28, 2023 at 11:50:36PM +0200, Marek Marczykowski-Górecki 
> > > wrote:
> > > > Hi,
> > > > 
> > > > I've noticed in Qubes's CI failure like this:
> > > > 
> > > > [  871.271292] BUG: kernel NULL pointer dereference, address: 
> > > > 
> > > > [  871.275290] #PF: supervisor read access in kernel mode
> > > > [  871.277282] #PF: error_code(0x) - not-present page
> > > > [  871.279182] PGD 106fdb067 P4D 106fdb067 PUD 106fdc067 PMD 0
> > > > [  871.281071] Oops:  [#1] PREEMPT SMP NOPTI
> > > > [  871.282698] CPU: 1 PID: 28 Comm: xenbus Not tainted 
> > > > 6.1.43-1.qubes.fc37.x86_64 #1
> > > > [  871.285222] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> > > > [  871.23] RIP: e030:__wake_up_common+0x4c/0x180
> > > > [  871.292838] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 
> > > > a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 
> > > > 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> > > > [  871.299776] RSP: e02b:c900400f7e10 EFLAGS: 00010082
> > > > [  871.301656] RAX:  RBX: 88810541ce98 RCX: 
> > > > 
> > > > [  871.304255] RDX: 0001 RSI: 0003 RDI: 
> > > > 88810541ce90
> > > > [  871.306714] RBP: c900400f0280 R08: ffe8 R09: 
> > > > c900400f7e68
> > > > [  871.309937] R10: 7ff0 R11: 888100ad3000 R12: 
> > > > c900400f7e68
> > > > [  871.312326] R13:  R14:  R15: 
> > > > 
> > > > [  871.314647] FS:  () GS:88813ff0() 
> > > > knlGS:
> > > > [  871.317677] CS:  1e030 DS:  ES:  CR0: 80050033
> > > > [  871.319644] CR2:  CR3: 0001067fe000 CR4: 
> > > > 00040660
> > > > [  871.321973] Call Trace:
> > > > [  871.322782]  
> > > > [  871.323494]  ? show_trace_log_lvl+0x1d3/0x2ef
> > > > [  871.324901]  ? show_trace_log_lvl+0x1d3/0x2ef
> > > > [  871.326310]  ? show_trace_log_lvl+0x1d3/0x2ef
> > > > [  871.327721]  ? __wake_up_common_lock+0x82/0xd0
> > > > [  871.329147]  ? __die_body.cold+0x8/0xd
> > > > [  871.330378]  ? page_fault_oops+0x163/0x1a0
> > > > [  871.331691]  ? exc_page_fault+0x70/0x170
> > > > [  871.332946]  ? asm_exc_page_fault+0x22/0x30
> > > > [  871.334454]  ? __wake_up_common+0x4c/0x180
> > > > [  871.335777]  __wake_up_common_lock+0x82/0xd0
> > > > [  871.337183]  ? process_writes+0x240/0x240
> > > > [  871.338461]  process_msg+0x18e/0x2f0
> > > > [  871.339627]  xenbus_thread+0x165/0x1c0
> > > > [  871.340830]  ? cpuusage_read+0x10/0x10
> > > > [  871.342032]  kthread+0xe9/0x110
> > > > [  871.343317]  ? kthread_complete_and_exit+0x20/0x20
> > > > [  871.345020]  ret_from_fork+0x22/0x30
> > > > [  871.346239]  
> > > > [  871.347060] Modules linked in: snd_hda_codec_generic ledtrig_audio 
> > > > snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec 
> > > > snd_hda_core snd_hwdep snd_seq snd_seq_device joydev snd_pcm 
> > > > intel_rapl_msr ppdev intel_rapl_common snd_timer pcspkr e1000e snd 
> > > > soundcore i2c_piix4 parport_pc parport loop fuse xenfs dm_crypt 
> > > > crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni 
> > > > polyval_generic floppy ghash_clmulni_intel sha512_ssse3 serio_raw 
> > > > virtio_scsi virtio_console bochs xhci_pci xhci_pci_renesas xhci_hcd 
> > > > qemu_fw_cfg drm_vram_helper drm_ttm_helper ttm ata_generic pata_acpi 
> > > > xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn 
> > > > scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
> > > > [  871.368892] CR2: 
> > > > [  871.370160] ---[ end trace  ]---
> > > > [  871.371719] RIP: e030:__wake_up_common+0x4c/0x180
> > > > [  871.373273] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 
> > > > a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 
> > > > 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> > > > [  871.379866] RSP: e02b:c900400f7e10 EFLAGS: 00010082
> > > > [  871.381689] RAX:  RBX: 88810541ce98 RCX: 
> > > > 
> > > > [  871.383971] RDX: 0001 RSI: 0003 RDI: 
> > > > 88810541ce90
> > > > [  871.386235] RBP: c900400f0280 R08: ffe8 R09: 
> > > > c900400f7e68
> > > > [  871.388521] R10: 7ff0 R11: 888100ad3000 R12: 
> > > > 

Re: [PATCH v2 02/13] xen/bitops: Cleanup ahead of rearrangements

2024-05-31 Thread Andrew Cooper
On 27/05/2024 9:24 am, Jan Beulich wrote:
> On 24.05.2024 22:03, Andrew Cooper wrote:
>>  * Rename __attribute_pure__ to just __pure before it gains users.
>>  * Introduce __constructor which is going to be used in lib/, and is
>>unconditionally cf_check.
>>  * Identify the areas of xen/bitops.h which are a mess.
>>  * Introduce xen/boot-check.h as helpers for compile and boot time testing.
>>This provides a statement of the ABI, and a confirmation that 
>> arch-specific
>>implementations behave as expected.
>>
>> Sadly Clang 7 and older isn't happy with the compile time checks.  Skip them,
>> and just rely on the runtime checks.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks.

>
> Further remarks, though:
>
>> ---
>>  xen/include/xen/bitops.h | 13 ++--
>>  xen/include/xen/boot-check.h | 60 
>>  xen/include/xen/compiler.h   |  3 +-
>>  3 files changed, 72 insertions(+), 4 deletions(-)
>>  create mode 100644 xen/include/xen/boot-check.h
> The bulk of the changes isn't about bitops; it's just that you're intending
> to first use it for testing there. The subject prefix therefore is somewhat
> misleading.

I'll change to "Cleanup and infrastructure ahead ..." but the bitops
aspect is still reasonably important.
>> --- /dev/null
>> +++ b/xen/include/xen/boot-check.h
>> @@ -0,0 +1,60 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +
>> +/*
>> + * Helpers for boot-time checks of basic logic, including confirming that
>> + * examples which should be calculated by the compiler are.
>> + */
>> +#ifndef XEN_BOOT_CHECK_H
>> +#define XEN_BOOT_CHECK_H

Given that CONFIG_SELF_TESTS was subsequently approved, I've renamed
this file to match.

>> +
>> +#include 
>> +
>> +/* Hide a value from the optimiser. */
>> +#define HIDE(x) \
>> +({ typeof(x) _x = (x); asm volatile ( "" : "+r" (_x) ); _x; })
> In principle this is a macro that could be of use elsewhere. That's also
> reflected in its entirely generic name. It therefore feels mis-placed in
> this header.

I'd forgotten that we have several variations of this already.
compiler.h has both OPTIMIZER_HIDE_VAR() and RELOC_HIDE().

>  Otoh though the use of "+r" is more restricting than truly
> necessary: While I'm not sure if "+g" would work, i.e. if that wouldn't
> cause issues with literals,

OPTIMIZER_HIDE_VAR() is indeed buggy using "+g", and RELOC_HIDE() even
explains how "g" tickles a bug in a compiler we probably don't care
about any more.

[Slightly out of order] the use of OPTIMIZER_HIDE_VAR() in gsi_vioapic()
is bogus AFAICT, and is actively creating the problem the commit message
says it was trying to avoid.

>  pretty surely "+rm" ought to work, removing
> the strict requirement for the compiler to put a certain value in a
> register.

"+rm" would be ideal in theory, we can't use it in practice because
Clang will (still!) interpret it as "+m" and force a spill.

While that's not necessarily a problem for the SELF_TESTS, it really is
a problem in array_index_mask_nospec(), which is latently buggy even now.

If the compiler really uses the flexibility offered by
OPTIMIZER_HIDE_VAR() to spill the value, array_index_mask_nospec() has
entirely failed at its purpose.

> Assuming you may have reservations against "+g" / "+rm" (and hence the
> construct wants keeping here), maybe rename to e.g. BOOT_CHECK_HIDE()?
> Alternatively, if generalized, moving to xen/macros.h would seem
> appropriate to me.

I've moved it to macros.h (because we should consolidate around it), but
kept as "+r" for both Clang and array_index_mask_nospec() reasons.

I don't expect HIDE() is ever actually going to be used in a case where
letting the value stay in memory is a useful thing overall.  But if you
still feel strongly about it, we can debate further when consolidating
the other users.

~Andrew



[ovmf test] 186211: all pass - PUSHED

2024-05-31 Thread osstest service owner
flight 186211 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186211/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 3b36aa96de1d5f7a4660bec5c0cbad2616183dd6
baseline version:
 ovmf 7c584bb04874bb5bad16fcf3996f5a893cc81a1c

Last test of basis   186210  2024-05-31 12:44:34 Z    0 days
Testing same since   186211  2024-05-31 16:14:27 Z    0 days    1 attempts


People who touched revisions under test:
  Shang Qingyu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   7c584bb048..3b36aa96de  3b36aa96de1d5f7a4660bec5c0cbad2616183dd6 -> xen-tested-master



Re: [RFC PATCH v2] arm: dom0less: add TEE support

2024-05-31 Thread Volodymyr Babchuk


Hello all,

Volodymyr Babchuk  writes:

> Extend TEE mediator interface with two functions :
>
>  - tee_get_type_from_dts() returns TEE type based on input string
>  - tee_make_dtb_node() creates a DTB entry for the selected
>TEE mediator
>
[..]

>  bool __init is_dom0less_mode(void)
>  {
> @@ -650,6 +651,10 @@ static int __init prepare_dtb_domU(struct domain *d, 
> struct kernel_info *kinfo)
>  if ( ret )
>  goto err;
>

I forgot to add

#ifdef CONFIG_TEE

> +/* We are making assumption that every mediator sets d->arch.tee */
> +if ( d->arch.tee )
> +tee_make_dtb_node(kinfo->fdt);
> +
#endif

So build fails if TEE is disabled.

I'll fix this in the next version. Anyway, this is an RFC.

>  /*
>   * domain_handle_dtb_bootmodule has to be called before the rest of
>   * the device tree is generated because it depends on the value of
> @@ -871,6 +876,7 @@ void __init create_domUs(void)
>  unsigned int flags = 0U;
>  uint32_t val;
>  int rc;
> +const char *tee_name;

[...]

-- 
WBR, Volodymyr


[RFC PATCH v2] arm: dom0less: add TEE support

2024-05-31 Thread Volodymyr Babchuk
Extend TEE mediator interface with two functions :

 - tee_get_type_from_dts() returns TEE type based on input string
 - tee_make_dtb_node() creates a DTB entry for the selected
   TEE mediator

Use those new functions to parse "xen,tee" DTS property for dom0less
guests and enable appropriate TEE mediator.

Signed-off-by: Volodymyr Babchuk 

---

This is still an RFC because I am not happy with how I decide whether
tee_make_dtb_node() needs to be called.

TEE type is stored in d_cfg, but d_cfg is not passed to
construct_domU()->prepare_dtb_domU(). So right now I am relying on the
fact that every TEE mediator initializes d->arch.tee.

Also I am sorry about previous completely botched version of this
patch. I really messed it up, including absence of [RFC] tag :(

---
 docs/misc/arm/device-tree/booting.txt | 17 +
 xen/arch/arm/dom0less-build.c | 19 +++
 xen/arch/arm/include/asm/tee/tee.h| 13 ++
 xen/arch/arm/tee/ffa.c|  8 ++
 xen/arch/arm/tee/optee.c  | 35 +++
 xen/arch/arm/tee/tee.c| 21 
 6 files changed, 113 insertions(+)

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..711a6080b5 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -231,6 +231,23 @@ with the following properties:
 In the future other possible property values might be added to
 enable only selected interfaces.
 
+- xen,tee
+
+A string property that describes what TEE mediator should be enabled
+for the domain. Possible property values are:
+
+- "none" (or missing property value)
+No TEE will be available in the VM.
+
+- "OP-TEE"
+VM will have access to the OP-TEE using classic OP-TEE SMC interface.
+
+- "FF-A"
+VM will have access to a TEE using generic FF-A interface.
+
+In the future other TEE mediators may be added, extending possible
+values for this property.
+
 - xen,domain-p2m-mem-mb
 
 Optional. A 32-bit integer specifying the amount of megabytes of RAM
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index fb63ec6fd1..13fdd44eef 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 bool __init is_dom0less_mode(void)
 {
@@ -650,6 +651,10 @@ static int __init prepare_dtb_domU(struct domain *d, 
struct kernel_info *kinfo)
 if ( ret )
 goto err;
 
+/* We are making assumption that every mediator sets d->arch.tee */
+if ( d->arch.tee )
+tee_make_dtb_node(kinfo->fdt);
+
 /*
  * domain_handle_dtb_bootmodule has to be called before the rest of
  * the device tree is generated because it depends on the value of
@@ -871,6 +876,7 @@ void __init create_domUs(void)
 unsigned int flags = 0U;
 uint32_t val;
 int rc;
+const char *tee_name;
 
 if ( !dt_device_is_compatible(node, "xen,domain") )
 continue;
@@ -881,6 +887,19 @@ void __init create_domUs(void)
 if ( dt_find_property(node, "xen,static-mem", NULL) )
 flags |= CDF_staticmem;
 
+tee_name = dt_get_property(node, "xen,tee", NULL);
+if ( tee_name )
+{
+rc = tee_get_type_from_dts(tee_name);
+if ( rc < 0)
+panic("Can't enable requested TEE for domain: %d\n", rc);
+d_cfg.arch.tee_type = rc;
+}
+else
+{
+d_cfg.arch.tee_type = XEN_DOMCTL_CONFIG_TEE_NONE;
+}
+
 if ( dt_property_read_bool(node, "direct-map") )
 {
 if ( !(flags & CDF_staticmem) )
diff --git a/xen/arch/arm/include/asm/tee/tee.h 
b/xen/arch/arm/include/asm/tee/tee.h
index da324467e1..9626667545 100644
--- a/xen/arch/arm/include/asm/tee/tee.h
+++ b/xen/arch/arm/include/asm/tee/tee.h
@@ -36,6 +36,9 @@ struct tee_mediator_ops {
 int (*domain_init)(struct domain *d);
 int (*domain_teardown)(struct domain *d);
 
+/* Make DTB node that describes TEE. Used when creating a dom0less domain 
*/
+int (*make_dtb_node)(void *fdt);
+
 /*
  * Called during domain destruction to relinquish resources used
  * by mediator itself. This function can return -ERESTART to indicate
@@ -65,7 +68,9 @@ bool tee_handle_call(struct cpu_user_regs *regs);
 int tee_domain_init(struct domain *d, uint16_t tee_type);
 int tee_domain_teardown(struct domain *d);
 int tee_relinquish_resources(struct domain *d);
+int tee_make_dtb_node(void *fdt);
 uint16_t tee_get_type(void);
+int tee_get_type_from_dts(const char* prop_value);
 
 #define REGISTER_TEE_MEDIATOR(_name, _namestr, _type, _ops) \
 static const struct tee_mediator_desc __tee_desc_##_name __used \
@@ -105,6 +110,14 @@ static inline uint16_t tee_get_type(void)
 return XEN_DOMCTL_CONFIG_TEE_NONE;
 }
 
+static inline int 

[xen-unstable test] 186208: tolerable FAIL - PUSHED

2024-05-31 Thread osstest service owner
flight 186208 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186208/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 186200
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 186200
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 186200
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 186200
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 186200
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186200
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-qcow2 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-qcow2 15 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail   never pass

version targeted for testing:
 xen  1250c73c1ae2eec0308b4efe3e345127e9dbdb2b
baseline version:
 xen  9a905d7dc65883af082532b4dc91ce0131e54047

Last test of basis   186200  2024-05-30 18:38:49 Z0 days
Testing same since   186208  2024-05-31 05:23:30 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Oleksii Kurochko 
  Roger Pau Monne 
  Roger Pau Monné 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386  

Re: [XEN PATCH v5 7/7] xen/arm: ffa: support notification

2024-05-31 Thread Bertrand Marquis
Hi Jens,

> On 29 May 2024, at 09:25, Jens Wiklander  wrote:
> 
> Add support for FF-A notifications, currently limited to an SP (Secure
> Partition) sending an asynchronous notification to a guest.
> 
> Guests and Xen itself are made aware of pending notifications with an
> interrupt. The interrupt handler triggers a tasklet to retrieve the
> notifications using the FF-A ABI and deliver them to their destinations.
> 
> Update ffa_partinfo_domain_init() to return error code like
> ffa_notif_domain_init().
> 
> Signed-off-by: Jens Wiklander 
> ---
> v4->v5:
> - Move the freeing of d->arch.tee to the new TEE mediator free_domain_ctx
>  callback to make the context accessible during rcu_lock_domain_by_id()
>  from a tasklet
> - Initialize interrupt handlers for secondary CPUs from the new TEE mediator
>  init_interrupt() callback
> - Restore the ffa_probe() from v3, that is, remove the
>  presmp_initcall(ffa_init) approach and use ffa_probe() as usual now that we
>  have the init_interrupt callback.
> - A tasklet is added to handle the Schedule Receiver interrupt. The tasklet
>  finds each relevant domain with rcu_lock_domain_by_id() which now is enough
>  to guarantee that the FF-A context can be accessed.
> - The notification interrupt handler only schedules the notification
>  tasklet mentioned above
> 
> v3->v4:
> - Add another note on FF-A limitations
> - Clear secure_pending in ffa_handle_notification_get() if both SP and SPM
>  bitmaps are retrieved
> - ASSERT that ffa_rcu_lock_domain_by_vm_id() isn't passed the vm_id FF-A
>  uses for Xen itself
> - Replace the get_domain_by_id() call done via ffa_get_domain_by_vm_id() in
>  notif_irq_handler() with a call to rcu_lock_live_remote_domain_by_id() via
>  ffa_rcu_lock_domain_by_vm_id()
> - Remove spinlock in struct ffa_ctx_notif and use atomic functions as needed
>  to access and update the secure_pending field
> - In notif_irq_handler(), look for the first online CPU instead of assuming
>  that the first CPU is online
> - Initialize FF-A via presmp_initcall() before the other CPUs are online,
>  use register_cpu_notifier() to install the interrupt handler
>  notif_irq_handler()
> - Update commit message to reflect recent updates
> 
> v2->v3:
> - Add a GUEST_ prefix and move FFA_NOTIF_PEND_INTR_ID and
>  FFA_SCHEDULE_RECV_INTR_ID to public/arch-arm.h
> - Register the Xen SRI handler on each CPU using on_selected_cpus() and
>  setup_irq()
> - Check that the SGI ID retrieved with FFA_FEATURE_SCHEDULE_RECV_INTR
>  doesn't conflict with static SGI handlers
> 
> v1->v2:
> - Addressing review comments
> - Change ffa_handle_notification_{bind,unbind,set}() to take struct
>  cpu_user_regs *regs as argument.
> - Update ffa_partinfo_domain_init() and ffa_notif_domain_init() to return
>  an error code.
> - Fixing a bug in handle_features() for FFA_FEATURE_SCHEDULE_RECV_INTR.
> ---
> xen/arch/arm/tee/Makefile   |   1 +
> xen/arch/arm/tee/ffa.c  |  72 +-
> xen/arch/arm/tee/ffa_notif.c| 409 
> xen/arch/arm/tee/ffa_partinfo.c |   9 +-
> xen/arch/arm/tee/ffa_private.h  |  56 -
> xen/arch/arm/tee/tee.c  |   2 +-
> xen/include/public/arch-arm.h   |  14 ++
> 7 files changed, 548 insertions(+), 15 deletions(-)
> create mode 100644 xen/arch/arm/tee/ffa_notif.c
> 
> diff --git a/xen/arch/arm/tee/Makefile b/xen/arch/arm/tee/Makefile
> index f0112a2f922d..7c0f46f7f446 100644
> --- a/xen/arch/arm/tee/Makefile
> +++ b/xen/arch/arm/tee/Makefile
> @@ -2,5 +2,6 @@ obj-$(CONFIG_FFA) += ffa.o
> obj-$(CONFIG_FFA) += ffa_shm.o
> obj-$(CONFIG_FFA) += ffa_partinfo.o
> obj-$(CONFIG_FFA) += ffa_rxtx.o
> +obj-$(CONFIG_FFA) += ffa_notif.o
> obj-y += tee.o
> obj-$(CONFIG_OPTEE) += optee.o
> diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
> index 5209612963e1..d644ee97cd5b 100644
> --- a/xen/arch/arm/tee/ffa.c
> +++ b/xen/arch/arm/tee/ffa.c
> @@ -39,6 +39,12 @@
>  *   - at most 32 shared memory regions per guest
>  * o FFA_MSG_SEND_DIRECT_REQ:
>  *   - only supported from a VM to an SP
> + * o FFA_NOTIFICATION_*:
> + *   - only supports global notifications, that is, per vCPU notifications
> + * are not supported
> + *   - doesn't support signalling the secondary scheduler of pending
> + * notification for secure partitions
> + *   - doesn't support notifications for Xen itself
>  *
>  * There are some large locked sections with ffa_tx_buffer_lock and
>  * ffa_rx_buffer_lock. Especially the ffa_tx_buffer_lock spinlock used
> @@ -194,6 +200,8 @@ out:
> 
> static void handle_features(struct cpu_user_regs *regs)
> {
> +struct domain *d = current->domain;
> +struct ffa_ctx *ctx = d->arch.tee;
> uint32_t a1 = get_user_reg(regs, 1);
> unsigned int n;
> 
> @@ -240,6 +248,30 @@ static void handle_features(struct cpu_user_regs *regs)
> BUILD_BUG_ON(PAGE_SIZE != FFA_PAGE_SIZE);
> ffa_set_regs_success(regs, 0, 0);
> break;
> +case FFA_FEATURE_NOTIF_PEND_INTR:
> +if ( 

[ovmf test] 186210: all pass - PUSHED

2024-05-31 Thread osstest service owner
flight 186210 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186210/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 7c584bb04874bb5bad16fcf3996f5a893cc81a1c
baseline version:
 ovmf 746cc5cc40bef22d606cd22d1feb10d73a7b3d11

Last test of basis   186209  2024-05-31 08:13:05 Z0 days
Testing same since   186210  2024-05-31 12:44:34 Z0 days1 attempts


People who touched revisions under test:
  Qingyu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   746cc5cc40..7c584bb048  7c584bb04874bb5bad16fcf3996f5a893cc81a1c -> xen-tested-master



Re: [XEN PATCH v5 6/7] xen/arm: add and call tee_free_domain_ctx()

2024-05-31 Thread Bertrand Marquis
Hi Jens,

> On 29 May 2024, at 09:25, Jens Wiklander  wrote:
> 
> Add tee_free_domain_ctx() to the TEE mediator framework.
> tee_free_domain_ctx() is called from arch_domain_destroy() to allow late
> freeing of the d->arch.tee context. This will simplify access to
> d->arch.tee for domains retrieved with rcu_lock_domain_by_id().
> 
> Signed-off-by: Jens Wiklander 

Reviewed-by: Bertrand Marquis 

Cheers
Bertrand

> ---
> xen/arch/arm/domain.c  | 1 +
> xen/arch/arm/include/asm/tee/tee.h | 6 ++
> xen/arch/arm/tee/tee.c | 6 ++
> 3 files changed, 13 insertions(+)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 34cbfe699a68..61e46a157ccc 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -837,6 +837,7 @@ int arch_domain_teardown(struct domain *d)
> 
> void arch_domain_destroy(struct domain *d)
> {
> +tee_free_domain_ctx(d);
> /* IOMMU page table is shared with P2M, always call
>  * iommu_domain_destroy() before p2m_final_teardown().
>  */
> diff --git a/xen/arch/arm/include/asm/tee/tee.h b/xen/arch/arm/include/asm/tee/tee.h
> index eda8a8aa38f2..2e99a38184be 100644
> --- a/xen/arch/arm/include/asm/tee/tee.h
> +++ b/xen/arch/arm/include/asm/tee/tee.h
> @@ -38,6 +38,7 @@ struct tee_mediator_ops {
>  */
> int (*domain_init)(struct domain *d);
> int (*domain_teardown)(struct domain *d);
> +void (*free_domain_ctx)(struct domain *d);
> 
> /*
>  * Called during domain destruction to relinquish resources used
> @@ -70,6 +71,7 @@ int tee_domain_teardown(struct domain *d);
> int tee_relinquish_resources(struct domain *d);
> uint16_t tee_get_type(void);
> void init_tee_interrupt(void);
> +void tee_free_domain_ctx(struct domain *d);
> 
> #define REGISTER_TEE_MEDIATOR(_name, _namestr, _type, _ops) \
> static const struct tee_mediator_desc __tee_desc_##_name __used \
> @@ -113,6 +115,10 @@ static inline void init_tee_interrupt(void)
> {
> }
> 
> +static inline void tee_free_domain_ctx(struct domain *d)
> +{
> +}
> +
> #endif  /* CONFIG_TEE */
> 
> #endif /* __ARCH_ARM_TEE_TEE_H__ */
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index 8117fd55123e..cb65f187f51f 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -102,6 +102,12 @@ void __init init_tee_interrupt(void)
> cur_mediator->ops->init_interrupt();
> }
> 
> +void tee_free_domain_ctx(struct domain *d)
> +{
> +if ( cur_mediator && cur_mediator->ops->free_domain_ctx)
> +cur_mediator->ops->free_domain_ctx(d);
> +}
> +
> /*
>  * Local variables:
>  * mode: C
> -- 
> 2.34.1
> 




Re: [XEN PATCH v5 5/7] xen/arm: add and call init_tee_interrupt()

2024-05-31 Thread Bertrand Marquis
Hi Jens,

> On 29 May 2024, at 09:25, Jens Wiklander  wrote:
> 
> Add init_tee_interrupt() to the TEE mediator framework and call it from
> start_secondary() late enough that per-cpu interrupts can be configured
> on CPUs as they are initialized. This is needed in later patches.

Just a NIT:
The function you are adding is in fact just an init hook on secondary cores,
and whether you use it to initialise interrupts or something else does not
really matter, so I would suggest renaming it to "init_secondary" instead to
allow future use cases.

With that done, feel free to add my R-b on the next version.

Cheers
Bertrand

> 
> Signed-off-by: Jens Wiklander 
> ---
> xen/arch/arm/include/asm/tee/tee.h | 8 
> xen/arch/arm/smpboot.c | 2 ++
> xen/arch/arm/tee/tee.c | 6 ++
> 3 files changed, 16 insertions(+)
> 
> diff --git a/xen/arch/arm/include/asm/tee/tee.h b/xen/arch/arm/include/asm/tee/tee.h
> index da324467e130..eda8a8aa38f2 100644
> --- a/xen/arch/arm/include/asm/tee/tee.h
> +++ b/xen/arch/arm/include/asm/tee/tee.h
> @@ -28,6 +28,9 @@ struct tee_mediator_ops {
>  */
> bool (*probe)(void);
> 
> +/* Initialize eventual interrupt handlers on the secondary CPUs */
> +void (*init_interrupt)(void);
> +
> /*
>  * Called during domain construction if toolstack requests to enable
>  * TEE support so mediator can inform TEE about new
> @@ -66,6 +69,7 @@ int tee_domain_init(struct domain *d, uint16_t tee_type);
> int tee_domain_teardown(struct domain *d);
> int tee_relinquish_resources(struct domain *d);
> uint16_t tee_get_type(void);
> +void init_tee_interrupt(void);
> 
> #define REGISTER_TEE_MEDIATOR(_name, _namestr, _type, _ops) \
> static const struct tee_mediator_desc __tee_desc_##_name __used \
> @@ -105,6 +109,10 @@ static inline uint16_t tee_get_type(void)
> return XEN_DOMCTL_CONFIG_TEE_NONE;
> }
> 
> +static inline void init_tee_interrupt(void)
> +{
> +}
> +
> #endif  /* CONFIG_TEE */
> 
> #endif /* __ARCH_ARM_TEE_TEE_H__ */
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 93a10d7721b4..e1c1e20efd98 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -29,6 +29,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> /* Override macros from asm/page.h to make them work with mfn_t */
> #undef virt_to_mfn
> @@ -401,6 +402,7 @@ void asmlinkage start_secondary(void)
>  */
> init_maintenance_interrupt();
> init_timer_interrupt();
> +init_tee_interrupt();
> 
> local_abort_enable();
> 
> diff --git a/xen/arch/arm/tee/tee.c b/xen/arch/arm/tee/tee.c
> index ddd17506a9ff..8117fd55123e 100644
> --- a/xen/arch/arm/tee/tee.c
> +++ b/xen/arch/arm/tee/tee.c
> @@ -96,6 +96,12 @@ static int __init tee_init(void)
> 
> __initcall(tee_init);
> 
> +void __init init_tee_interrupt(void)
> +{
> +if ( cur_mediator && cur_mediator->ops->init_interrupt)
> +cur_mediator->ops->init_interrupt();
> +}
> +
> /*
>  * Local variables:
>  * mode: C
> -- 
> 2.34.1
> 




Re: [XEN PATCH v5 4/7] xen/arm: allow dynamically assigned SGI handlers

2024-05-31 Thread Bertrand Marquis
Hi Jens,

> On 29 May 2024, at 09:25, Jens Wiklander  wrote:
> 
> Updates so request_irq() can be used with a dynamically assigned SGI irq
> as input. This prepares for a later patch where an FF-A schedule
> receiver interrupt handler is installed for an SGI generated by the
> secure world.
> 
> From the Arm Base System Architecture v1.0C [1]:
> "The system shall implement at least eight Non-secure SGIs, assigned to
> interrupt IDs 0-7."
> 
> gic_route_irq_to_xen() doesn't call gic_set_irq_type() for SGIs since they
> are always edge triggered.
> 
> gic_interrupt() is updated to route the dynamically assigned SGIs to
> do_IRQ() instead of do_sgi(). The latter still handles the statically
> assigned SGI handlers like for instance GIC_SGI_CALL_FUNCTION.
> 
> [1] https://developer.arm.com/documentation/den0094/
> 
> Signed-off-by: Jens Wiklander 

Reviewed-by: Bertrand Marquis 

Cheers
Bertrand




Re: [PATCH] arm: dom0less: add TEE support

2024-05-31 Thread Julien Grall

Hi Bertrand,

On 30/05/2024 14:22, Bertrand Marquis wrote:

On 30 May 2024, at 12:35, Julien Grall  wrote:

Hi Bertrand,

On 30/05/2024 10:40, Bertrand Marquis wrote:

But we are making assumption that all TEE implementation will have its
node inside "/firmware/". I am not 100% sure that this is correct. For
example I saw that Google Trusty uses "/trusty" node (directly inside
the DTS root). On the other hand, it is not defined in dts bindings, as far
as I know.

Regarding the firmware part you can easily handle that by looking for /firmware
and creating it if it does not exist before creating your sub-node, and this
should be done in the optee node creation function, not in tee.c.


This would work if the node /firmware doesn't exist. But how would you handle 
the case where it is already present?

I looked at the libfdt API and AFAICT the DTB creation needs to be linear. IOW, 
you can't add a subnode to an already created node.

There is an helper to create a placeholder, but AFAIK this is only for a 
property. You also need to know the size in advance.


I thought it was possible but I can definitely be wrong.

As right now we have only one need for the node, we could delay a possible 
solution and just create it in the optee driver.


I am fine with that.

Cheers,

--
Julien Grall



Re: [PATCH 2/2] arch/irq: Centralise no_irq_type

2024-05-31 Thread Julien Grall

Hi Andrew,

On 30/05/2024 19:40, Andrew Cooper wrote:

Having no_irq_type defined per arch, but using common callbacks is a mess, and
particularly hard to bootstrap a new architecture with.

Now that the ack()/end() hooks have been exported suitably, move the
definition of no_irq_type into common/irq.c, and into .rodata for good
measure.

No functional change, but a whole lot less tangled.

Signed-off-by: Andrew Cooper 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [PATCH 1/2] arch/irq: Make irq_ack_none() mandatory

2024-05-31 Thread Julien Grall

Hi,

On 30/05/2024 19:40, Andrew Cooper wrote:

Any non-stub implementation of these is going to have to do something here.

irq_end_none() is more complicated and has arch-specific interactions with
irq_ack_none(), so make it optional.

For PPC, introduce a stub irq_ack_none().

For ARM and x86, export the existing {ack,end}_none() helpers, gaining an irq_
prefix for consistency with everything else in no_irq_type.

No functional change.

Signed-off-by: Andrew Cooper 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall



Re: convert the SCSI ULDs to the atomic queue limits API v2

2024-05-31 Thread Christoph Hellwig
On Fri, May 31, 2024 at 08:07:54AM -0400, Martin K. Petersen wrote:
> If you have other block layer changes depending on this series we'll
> probably need a shared branch. I'll need to make several changes to sd.c
> to fix reported issues, including a couple in the zeroing/discard
> department.

Yes, I also want to move the various feature flags that currently
abuse queue->flags into the queue limits.  In the worst case I could
delay that for another merge window, but a shared branch would be
a lot nicer.




Re: convert the SCSI ULDs to the atomic queue limits API v2

2024-05-31 Thread Martin K. Petersen


Christoph,

> The patches are against Jens' block-6.10 tree.  Due to the amount of
> block layer changes in here, and other that will depend on it, it
> would be good if this could eventually be merged through the block
> tree, or at least a shared branch between the SCSI and block trees.

If you have other block layer changes depending on this series we'll
probably need a shared branch. I'll need to make several changes to sd.c
to fix reported issues, including a couple in the zeroing/discard
department.

No objections to your series so far. Just trying to reconcile your
changes with mine...

-- 
Martin K. Petersen  Oracle Linux Engineering



Re: [PATCH 14/14] block: add special APIs for run-time disabling of discard and friends

2024-05-31 Thread Nitesh Shetty

On 31/05/24 09:48AM, Christoph Hellwig wrote:

A few drivers optimistically try to support discard, write zeroes and
secure erase and disable the features from the I/O completion handler
if the hardware can't support them.  This disable can't be done using
the atomic queue limits API because the I/O completion handlers can't
take sleeping locks or freeze the queue.  Keep the existing clearing
of the relevant field to zero, but replace the old blk_queue_max_*
APIs with new disable APIs that force the value to 0.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---

Reviewed-by: Nitesh Shetty 


Re: [PATCH 1/3] CI: Remove CI_COMMIT_REF_PROTECTED requirement for HW jobs

2024-05-31 Thread Marek Marczykowski-Górecki
On Thu, May 30, 2024 at 05:43:12PM -0700, Stefano Stabellini wrote:
> On Thu, 30 May 2024, Marek Marczykowski-Górecki wrote:
> > On Wed, May 29, 2024 at 03:19:43PM +0100, Andrew Cooper wrote:
> > > This restriction doesn't provide any security because anyone with suitable
> > > permissions on the HW runners can bypass it with this local patch.
> > > 
> > > Requiring branches to be protected hampers usability of transient testing
> > > branches (specifically, can't delete branches except via the Gitlab UI).
> > >
> > > Drop the requirement.
> > > 
> > > Fixes: 746774cd1786 ("automation: introduce a dom0less test run on Xilinx hardware")
> > > Fixes: 0ab316e7e15f ("automation: add a smoke and suspend test on an Alder Lake system")
> > > Signed-off-by: Andrew Cooper 
> > 
> > Runners used to be set to run only on protected branches. I think it
> > isn't the case anymore from what I see, but it needs checking (I don't
> > see specific settings in all the projects). If it were still the case,
> > removing variable check would result in jobs forever pending.
> 
> Andrew, thank you so much for pointing this out.
> 
> I think the idea was that we can specify the individual users with
> access to protected branches. We cannot add restrictions for unprotected
> branches. So if we set the gitlab runner to only run protected jobs,
> then the $CI_COMMIT_REF_PROTECTED check makes sense. Not for security,
> but to prevent the jobs from getting stuck waiting for a runner that
> will never arrive.
> 
> However, like Marek said, now the gitlab runners don't have the
> "Protected" check set, so it is all useless :-(
> 
> I would prefer to set "Protected" in the gitlab runners settings so that
> it becomes easier to specify users that can and cannot trigger the jobs.

Owners of subprojects can control branch protection rules, so this
feature doesn't help with limiting access to runners added to the whole
group. Qubes runners are not group runners, they are project runners
added only to select projects.

I don't remember why exactly runners got "protected" disabled, but AFAIR
there was some issue with that setting.

> Then, we'll need the $CI_COMMIT_REF_PROTECTED check, not for security,
> but to avoid pipelines getting stuck for unprotected branches.
> 
> It is really difficult to restrict users from triggering jobs in other
> way because they are all automatically added to all subprojects.
> 
> 
> Would you guys be OK if I set "Protected" in the Xilinx and Qubes gitlab
> runners as soon as possible?


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: [PATCH 11/14] sd: convert to the atomic queue limits API

2024-05-31 Thread John Garry

On 31/05/2024 08:48, Christoph Hellwig wrote:

Assign all queue limits through a local queue_limits variable and
queue_limits_commit_update so that we can't race updating them from
multiple places, and freeze the queue when updating them so that
in-progress I/O submissions don't see half-updated limits.

Signed-off-by: Christoph Hellwig
Reviewed-by: Damien Le Moal


FWIW,

Reviewed-by: John Garry 



Re: convert the SCSI ULDs to the atomic queue limits API v2

2024-05-31 Thread John Garry

On 31/05/2024 08:47, Christoph Hellwig wrote:

Hi all,


Just saying that b4 for some reason does not create a .mbox to apply for 
this series when using "b4 am". It also no longer does for v1 (when it 
did previously). I guess that the series versioning confused it.




this series converts the SCSI upper level drivers to the atomic queue
limits API.

The first patch is a bug fix for ubd that later patches depend on and
might be worth picking up for 6.10.

The second patch changes the max_sectors calculation to take the optimal
I/O size into account so that sd, nbd and rbd don't have to mess with
the user max_sector value.  I'd love to see a careful review from the
nbd and rbd maintainers for this one!

The following patches clean up a few lose ends in the sd driver, and
then convert sd and sr to the atomic queue limits API.  The final
patches remove the now unused block APIs, and convert a few to be
specific to their now more narrow use case.

The patches are against Jens' block-6.10 tree.  Due to the amount of
block layer changes in here, and other that will depend on it, it
would be good if this could eventually be merged through the block
tree, or at least a shared branch between the SCSI and block trees.

Changes since v1:
  - change the io_opt value for rbd
  - fix a left-over direct assignment to q->limits
  - add a new patch to refactor the ubd interrupt handler
  - use an else if to micro-optimize the ubd error handling
  - also remove disk_set_max_open_zones and disk_set_max_active_zones
  - use SECTOR_SHIFT in one more place
  - various spelling fixes
  - comment formatting fix

Diffstat:
  arch/um/drivers/ubd_kern.c   |   50 +++--
  block/blk-settings.c |  238 +--
  drivers/block/nbd.c  |2
  drivers/block/rbd.c  |3
  drivers/block/xen-blkfront.c |4
  drivers/scsi/sd.c|  222 
  drivers/scsi/sd.h|6 -
  drivers/scsi/sd_zbc.c|   27 ++--
  drivers/scsi/sr.c|   42 ---
  include/linux/blkdev.h   |   52 +++--
  10 files changed, 210 insertions(+), 436 deletions(-)






Re: [PATCH 14/14] block: add special APIs for run-time disabling of discard and friends

2024-05-31 Thread John Garry

On 31/05/2024 08:48, Christoph Hellwig wrote:

A few drivers optimistically try to support discard, write zeroes and
secure erase and disable the features from the I/O completion handler
if the hardware can't support them.  This disable can't be done using
the atomic queue limits API because the I/O completion handlers can't
take sleeping locks or freeze the queue.  Keep the existing clearing
of the relevant field to zero, but replace the old blk_queue_max_*
APIs with new disable APIs that force the value to 0.

Signed-off-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Reviewed-by: Damien Le Moal



Reviewed-by: John Garry 



Re: [PATCH 13/14] block: remove unused queue limits API

2024-05-31 Thread John Garry

On 31/05/2024 08:48, Christoph Hellwig wrote:

Remove all APIs that are unused now that sd and sr have been converted
to the atomic queue limits API.

Signed-off-by: Christoph Hellwig
Reviewed-by: Bart Van Assche


Reviewed-by: John Garry 



[libvirt test] 186206: tolerable all pass - PUSHED

2024-05-31 Thread osstest service owner
flight 186206 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186206/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186190
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass

version targeted for testing:
 libvirt  6f293f1fadc08660ba470d2cd3a91fde58cef617
baseline version:
 libvirt  2ea493598fd5a3efe4f25d842b2f09bc5c74c6c1

Last test of basis   186190  2024-05-30 04:22:11 Z1 days
Testing same since   186206  2024-05-31 04:22:35 Z0 days1 attempts


People who touched revisions under test:
  Fedora Weblate Translation 
  Göran Uddeborg 
  Michal Privoznik 
  Weblate 
  Yuri Chornoivan 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-amd64-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-amd64-amd64-libvirt-raw pass
 test-arm64-arm64-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/libvirt.git
   2ea493598f..6f293f1fad  6f293f1fadc08660ba470d2cd3a91fde58cef617 -> xen-tested-master



Re: [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area

2024-05-31 Thread Roger Pau Monné
On Thu, May 30, 2024 at 12:08:26PM +0100, Andrew Cooper wrote:
> On 29/05/2024 3:32 pm, Alejandro Vallejo wrote:
> > diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> > index f033d22785be..b70b22d55fcf 100644
> > --- a/xen/lib/x86/policy.c
> > +++ b/xen/lib/x86/policy.c
> > @@ -2,6 +2,17 @@
> >  
> >  #include 
> >  
> > +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> > +{
> > +/*
> > + * TODO: Derive x2APIC ID from the topology information inside `p`
> > + *   rather than from the vCPU ID alone. This bodge is a temporary
> > + *   measure until all infra is in place to retrieve or derive the
> > + *   initial x2APIC ID from migrated domains.
> > + */
> > +return id * 2;
> > +}
> > +
> 
> I'm afraid it's nonsensical to try and derive x2APIC ID from a
> policy+vcpu_id.
> 
> Take a step back, and think the data through.
> 
> A VM has:
> * A unique APIC_ID for each vCPU
> * Info in CPUID describing how to decompose the APIC_ID into topology
> 
> Right now, because this is all completely broken, we have:
> * Hardcoded APIC_ID = vCPU_ID * 2
> * Total nonsense in CPUID
> 
> 
> When constructing a VM, the toolstack (given suitable admin
> guidance/defaults) *must* choose both:
> * The APIC_ID themselves
> * The CPUID topo data to match
> 
> i.e. this series should be editing the toolstack's call to
> xc_domain_hvm_setcontext().
> 
> It's not, because AFAICT you're depending on the migration compatibility
> logic and inserting a new hardcoded assumption about symmetry of the layout.
> 
> 
> The data flows we need are:
> 
> (New) create:
> * Toolstack chooses both parts of topo information
> * Xen needs a default, which reasonably can be APIC_ID=vCPU_ID when the
> rest of the data flow has been cleaned up.  But this is needs to be
> explicit in vcpu_create() and without reference to the policy.

Doesn't using APIC_ID=vCPU_ID limit us to only being able to expose
certain topologies? (as vCPU IDs are contiguous).  For example
exposing a topology with 3 cores per package won't be possible?

Not saying it's a bad move to start this way, but if we want to
support exposing more exotic topology sooner or later we will need
some kind of logic that assigns the APIC IDs based on the knowledge of
the expected topology.  Whether is gets such knowledge from the CPU
policy or directly from the toolstack is another question.

Thanks, Roger.



[ANNOUNCE] Join us for free virtually at Xen Summit 2024 + FAQs!

2024-05-31 Thread Kelly Choi
Hello Xen Community,

Join us virtually at Xen Summit (4-6th June), for free using Jitsi!

We want to encourage as many of you as possible to participate in the Xen
Summit. Just because you are not physically attending doesn't mean you
can't get involved.
*(Please note, there will be no professional AV or audio equipment, and it
is a community effort to enable others to get involved.)*

On the day, please find the *schedule of talks listed here.*

I've also included some FAQs at the bottom of this email.

*Instructions:*

   - Our team will help host the main presentation talk sessions
   using Jitsi
   - Either myself or one attendee volunteer will help host each design
   session using Jitsi
   - Ensure you are using the Lisbon timezone
   - All sessions on Tuesday 4th June will be streamed using:
  - *SESSION PRESENTATIONS LINK *.
   - All sessions on Wednesday 5th June and Thursday 6th June will use the
   following links:
   - *DESIGN SESSION A  *(Liberdade
  Room)
  - *DESIGN SESSION B  *(Augusta
  Room)
  - Please look out on the schedule for the time and which session room
  it takes place in


   - Thursday design sessions will be finalized on the schedule by the end
   of day on Wednesday
   - The same links will be used throughout talks and sessions
   - (Optional) Join our Xen Summit matrix channel for updates on the day:
   https://matrix.to/#/#xen-project-summit:matrix.org

*Some ground rules to follow:*

   - Enter your full name on Jitsi so everyone knows who you are
   - Please mute yourself upon joining
   - Turning on cameras is optional, but we encourage doing this for design
   sessions
   - Do *not* shout out your questions during session presentations;
   instead, ask them via the chat function and we will do our best to ask
   on your behalf
   - During design sessions, we encourage you to unmute and participate
   freely
   - If multiple people wish to speak, please use the 'raise hand' function
   on Jitsi or chat
   - Should there be a need, moderators will have permission to remove
   anyone who is disruptive in sessions on Jitsi
   - If you face issues on the day, please let us know via Matrix - we will
   do our best to help, but please note this is a community effort

*Jitsi links:*

Session presentation link:

https://meet.jit.si/XenSummit24Main

Design Session A (Liberdade Room) link:
https://meet.jit.si/XenDesignSessionsA

Design Session B (Augusta Room) link: https://meet.jit.si/XenDesignSessionsB

See meeting dial-in numbers:
https://meet.jit.si/static/dialInInfo.html?room=XenSummit24Main

If also dialing-in through a room phone, join without connecting to audio:
https://meet.jit.si/XenSummit24Main#config.startSilent=true

See meeting dial-in numbers:
https://meet.jit.si/static/dialInInfo.html?room=XenDesignSessionsA

If also dialing-in through a room phone, join without connecting to audio:
https://meet.jit.si/XenDesignSessionsA#config.startSilent=true

See meeting dial-in numbers:
https://meet.jit.si/static/dialInInfo.html?room=XenDesignSessionsB

If also dialing-in through a room phone, join without connecting to audio:
https://meet.jit.si/XenDesignSessionsB#config.startSilent=true

Schedule Example:

Tuesday 4th & Wednesday 5th June 2024

(Lisbon timezone)

Schedule Example: Wednesday 5th & Thursday 6th June 2024

(Lisbon timezone)

09:00
Welcome & Opening Remarks - Kelly Choi, Community Manager, Cloud Software
Group, XenServer

09:10

Xen Project 2024 Weather Report - Kelly Choi, XenServer, Cloud Software
Group.

LIBERDADE I

09:10

Challenges and Status of Enabling TrenchBoot in Xen Hypervisor - Michał
Żygowski & Piotr Król, 3mdeb

13:45

The future of Xen Project physical events


LIBERDADE I

14:35

IOMMU paravirtualization and Xen IOMMU subsystem rework


LIBERDADE I
09:10

Using Xenalyze for Performance Analysis - George Dunlap, XenServer

AUGUSTA I
13:45

Downstream working group


AUGUSTA I

14:35

Xen Safety Requirements upstreaming


AUGUSTA I



*How to filter the schedule:*


*FAQs:*

   - *My company would like to sponsor the next Xen Summit, how do I get
   involved?*
  - *Please email *@communitymanager
* with
  your interest *
   - *Are sessions recorded?*
  - *Yes, all talks are recorded and will be available on YouTube after
  the event. Design sessions on Day 3 are not recorded. *
   - *Can I write 

Re: [PATCH v2 7/7] hw/xen: Register framebuffer backend via xen_backend_init()

2024-05-31 Thread Paul Durrant

On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote:

Align the framebuffer backend with the other legacy ones,
register it via xen_backend_init() when '-vga xenfb' is
used. It is safe because MODULE_INIT_XEN_BACKEND is called
in xen_bus_realize(), long after CLI processing initialized
the vga_interface_type variable.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/xen/xen-legacy-backend.h | 3 ---
  hw/display/xenfb.c  | 9 +++--
  hw/xenpv/xen_machine_pv.c   | 2 --
  3 files changed, 7 insertions(+), 7 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v2 6/7] hw/xen: register legacy backends via xen_backend_init

2024-05-31 Thread Paul Durrant

On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote:

From: Paolo Bonzini 

It is okay to register legacy backends in the middle of xen_bus_init().
All that the registration does is record the existence of the backend
in xenstore.

This makes it possible to remove them from the build without introducing
undefined symbols in xen_be_init().  It also removes the need for the
backend_register callback, whose only purpose is to avoid registering
nonfunctional backends.

Signed-off-by: Paolo Bonzini 
Message-ID: <20240509170044.190795-8-pbonz...@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/xen/xen-legacy-backend.h | 11 ++-
  include/hw/xen/xen_pvdev.h  |  1 -
  hw/9pfs/xen-9p-backend.c|  8 +++-
  hw/display/xenfb.c  |  8 +++-
  hw/usb/xen-usb.c| 14 --
  hw/xen/xen-legacy-backend.c | 16 
  6 files changed, 20 insertions(+), 38 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v2 5/7] hw/xen: initialize legacy backends from xen_bus_init()

2024-05-31 Thread Paul Durrant

On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote:

From: Paolo Bonzini 

Prepare for moving the calls to xen_be_register() under the
control of xen_bus_init(), using the normal xen_backend_init()
method that is used by the "modern" backends.

This requires the xenstore global variable to be initialized,
which is done by xen_be_init().  To ensure that everything is
ready at the time the xen_backend_init() functions are called,
remove the xen_be_init() function from all the boards and
place it directly in xen_bus_init().

Signed-off-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Message-ID: <20240509170044.190795-7-pbonz...@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/i386/pc.c  | 1 -
  hw/xen/xen-bus.c  | 4 
  hw/xen/xen-hvm-common.c   | 2 --
  hw/xenpv/xen_machine_pv.c | 5 +
  4 files changed, 5 insertions(+), 7 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v2 4/7] hw/xen: Make XenDevOps structures const

2024-05-31 Thread Paul Durrant

On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote:

Keep XenDevOps structures in .rodata.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/xen/xen-legacy-backend.h | 8 
  hw/9pfs/xen-9p-backend.c| 2 +-
  hw/display/xenfb.c  | 4 ++--
  hw/usb/xen-usb.c| 4 ++--
  4 files changed, 9 insertions(+), 9 deletions(-)



Reviewed-by: Paul Durrant 




[ovmf test] 186209: all pass - PUSHED

2024-05-31 Thread osstest service owner
flight 186209 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186209/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 746cc5cc40bef22d606cd22d1feb10d73a7b3d11
baseline version:
 ovmf 5f68a363d0d95bd0d383861ae21886d9824a8cd4

Last test of basis   186205  2024-05-31 02:11:12 Z0 days
Testing same since   186209  2024-05-31 08:13:05 Z0 days1 attempts


People who touched revisions under test:
  Shang Qingyu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   5f68a363d0..746cc5cc40  746cc5cc40bef22d606cd22d1feb10d73a7b3d11 -> 
xen-tested-master



Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway

2024-05-31 Thread Roger Pau Monné
On Fri, May 31, 2024 at 10:33:58AM +0200, Jan Beulich wrote:
> On 31.05.2024 09:31, Roger Pau Monné wrote:
> > On Fri, May 31, 2024 at 09:02:20AM +0200, Jan Beulich wrote:
> >> On 29.05.2024 18:14, Roger Pau Monné wrote:
> >>> On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
>  On 29.05.2024 17:03, Roger Pau Monné wrote:
> > On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
> >> On 29.05.2024 11:01, Roger Pau Monne wrote:
> >>> Due to the current rwlock logic, if the CPU calling get_cpu_maps() 
> >>> does so from
> >>> a cpu_hotplug_{begin,done}() region the function will still return 
> >>> success,
> >>> because a CPU taking the rwlock in read mode after having taken it in 
> >>> write
> >>> mode is allowed.  Such behavior however defeats the purpose of 
> >>> get_cpu_maps(),
> >>> as it should always return false when called while a CPU hot{,un}plug 
> >>> operation
> >>> is in progress.
> >>
> >> I'm not sure I can agree with this. The CPU doing said operation ought 
> >> to be
> >> aware of what it is itself doing. And all other CPUs will get back 
> >> false from
> >> get_cpu_maps().
> >
> > Well, the CPU is aware in the context of cpu_{up,down}(), but not in
> > the interrupts that might be handled while that operation is in
> > progress, see below for a concrete example.
> >
> >>>  Otherwise the logic in send_IPI_mask() for example is wrong,
> >>> as it could decide to use the shorthand even when a CPU operation is 
> >>> in
> >>> progress.
> >>
> >> It's also not becoming clear what's wrong there: As long as a CPU isn't
> >> offline enough to not be in cpu_online_map anymore, it may well need 
> >> to still
> >> be the target of IPIs, and targeting it with a shorthand then is still 
> >> fine.
> >
> > The issue is in the online path: there's a window where the CPU is
> > online (and the lapic active), but cpu_online_map hasn't been updated
> > yet.  A specific example would be time_calibration() being executed on
> > the CPU that is running cpu_up().  That could result in a shorthand
> > IPI being used, but the mask in r.cpu_calibration_map not containing
> > the CPU that's being brought up online because it's not yet added to
> > cpu_online_map.  Then the number of CPUs actually running
> > time_calibration_rendezvous_fn won't match the weight of the cpumask
> > in r.cpu_calibration_map.
> 
>  I see, but maybe only partly. Prior to the CPU having its bit set in
>  cpu_online_map, can it really take interrupts already? Shouldn't it be
>  running with IRQs off until later, thus preventing it from making it
>  into the rendezvous function in the first place? But yes, I can see
>  how the IRQ (IPI) then being delivered later (once IRQs are enabled)
>  might cause problems, too.
> >>>
> >>> The interrupt will get set in IRR and handled when interrupts are
> >>> enabled.
> >>>
> 
>  Plus, with how the rendezvous function is invoked (via
>  on_selected_cpus() with the mask copied from cpu_online_map), the
>  first check in smp_call_function_interrupt() ought to prevent the
>  function from being called on the CPU being onlined. A problem would
>  arise though if the IPI arrived later and call_data was already
>  (partly or fully) overwritten with the next request.
> >>>
> >>> Yeah, there's a small window where the fields in call_data are out of
> >>> sync.
> >>>
> >> In any event this would again affect only the CPU leading the CPU 
> >> operation,
> >> which should clearly know at which point(s) it is okay to send IPIs. 
> >> Are we
> >> actually sending any IPIs from within CPU-online or CPU-offline paths?
> >
> > Yes, I've seen the time rendezvous happening while in the middle of a
> > hotplug operation, and the CPU coordinating the rendezvous being the
> > one doing the CPU hotplug operation, so get_cpu_maps() returning true.
> 
>  Right, yet together with ...
> 
> >> Together with the earlier paragraph the critical window would be 
> >> between the
> >> CPU being taken off of cpu_online_map and the CPU actually going 
> >> "dead" (i.e.
> >> on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And 
> >> even
> >> then the question would be what bad, if any, would happen to that CPU 
> >> if an
> >> IPI was still targeted at it by way of using the shorthand. I'm pretty 
> >> sure
> >> it runs with IRQs off at that time, so no ordinary IRQ could be 
> >> delivered.
> >>
> >>> Adjust the logic in get_cpu_maps() to return false when the CPUs lock 
> >>> is
> >>> already held in write mode by the current CPU, as read_trylock() would
> >>> otherwise return true.
> >>>
> >>> Fixes: 868a01021c6f ('rwlock: allow 

Re: [PATCH 12/12] block: add special APIs for run-time disabling of discard and friends

2024-05-31 Thread John Garry

On 29/05/2024 06:04, Christoph Hellwig wrote:

A few drivers optimistically try to support discard, write zeroes and
secure erase and disable the features from the I/O completion handler
if the hardware can't support them.  This disable can't be done using
the atomic queue limits API because the I/O completion handlers can't
take sleeping locks or freeze the queue.  Keep the existing clearing
of the relevant field to zero, but replace the old blk_queue_max_*
APIs with new disable APIs that force the value to 0.

Signed-off-by: Christoph Hellwig


It would be nice to improve this eventually (to use the atomic queue 
limits API). Anyway:

Reviewed-by: John Garry 



Re: [PATCH 11/12] block: remove unused queue limits API

2024-05-31 Thread John Garry

On 29/05/2024 06:04, Christoph Hellwig wrote:

Remove all APIs that are unused now that sd and sr have been converted
to the atomic queue limits API.

Signed-off-by: Christoph Hellwig
---



Reviewed-by: John Garry 



Re: [PATCH 04/14] block: take io_opt and io_min into account for max_sectors

2024-05-31 Thread Ilya Dryomov
On Fri, May 31, 2024 at 9:48 AM Christoph Hellwig  wrote:
>
> The soft max_sectors limit is normally capped by the hardware limits and
> an arbitrary upper limit enforced by the kernel, but can be modified by
> the user.  A few drivers want to increase this limit (nbd, rbd) or
> adjust it up or down based on hardware capabilities (sd).
>
> Change blk_validate_limits to default max_sectors to the optimal I/O
> size, or upgrade it to the preferred minimal I/O size if that is
> larger than the kernel default if no optimal I/O size is provided based
> on the logic in the SD driver.
>
> This keeps the existing kernel default for drivers that do not provide
> an io_opt or very big io_min value, but picks a much more useful
> default for those who provide these hints, and allows to remove the
> hacks to set the user max_sectors limit in nbd, rbd and sd.
>
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Bart Van Assche 
> Reviewed-by: Damien Le Moal 
> ---
>  block/blk-settings.c |  7 +++
>  drivers/block/nbd.c  |  2 +-

For rbd

>  drivers/block/rbd.c  |  1 -

Acked-by: Ilya Dryomov 

Thanks,

Ilya
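The defaulting rule described in the commit message can be modeled with a short stand-alone sketch (names and the default constant are illustrative stand-ins, not the kernel's actual code): prefer the optimal I/O size, fall back to the minimal I/O size only when it exceeds the kernel default, and always cap by the hardware limit.

```c
#include <assert.h>

#define SECTOR_SHIFT        9
/* Assumed stand-in for the kernel's default cap, in 512-byte sectors. */
#define DEF_MAX_SECTORS     2560u

/* io_opt/io_min are in bytes; the limits are in sectors. */
static unsigned int default_max_sectors(unsigned int io_opt,
                                        unsigned int io_min,
                                        unsigned int max_hw_sectors)
{
    unsigned int max_sectors = DEF_MAX_SECTORS;

    if ( io_opt )
        max_sectors = io_opt >> SECTOR_SHIFT;        /* optimal I/O size wins */
    else if ( (io_min >> SECTOR_SHIFT) > max_sectors )
        max_sectors = io_min >> SECTOR_SHIFT;        /* upgrade to big io_min */

    /* The soft limit is always capped by the hardware limit. */
    if ( max_sectors > max_hw_sectors )
        max_sectors = max_hw_sectors;

    return max_sectors;
}
```

Drivers providing neither hint keep the old default; a driver advertising a 4 MB io_opt (as rbd does) gets 8192 sectors without touching max_user_sectors.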



Re: [PATCH v2 3/7] hw/xen: Constify xenstore_be::XenDevOps

2024-05-31 Thread Paul Durrant

On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote:

XenDevOps @ops is not updated, mark it const.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/xen/xen-legacy-backend.h | 2 +-
  hw/xen/xen-legacy-backend.c | 6 +++---
  2 files changed, 4 insertions(+), 4 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v2 2/7] hw/xen: Constify XenLegacyDevice::XenDevOps

2024-05-31 Thread Paul Durrant

On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote:

XenDevOps @ops is not updated, mark it const.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/xen/xen_pvdev.h  | 2 +-
  hw/xen/xen-legacy-backend.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH 03/14] rbd: increase io_opt again

2024-05-31 Thread Ilya Dryomov
On Fri, May 31, 2024 at 9:48 AM Christoph Hellwig  wrote:
>
> Commit 16d80c54ad42 ("rbd: set io_min, io_opt and discard_granularity to
> alloc_size") lowered the io_opt size for rbd from objset_bytes which is
> 4MB for typical setup to alloc_size which is typically 64KB.
>
> The commit mostly talks about discard behavior and does mention io_min
> in passing.  Reducing io_opt means reducing the readahead size, which
> seems counter-intuitive given that rbd currently abuses the user
> max_sectors setting to actually increase the I/O size.  Switch back
> to the old setting to allow larger reads (the readahead size despite it's
> name actually limits the size of any buffered read) and to prepare
> for using io_opt in the max_sectors calculation and getting drivers out
> of the business of overriding the max_user_sectors value.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/rbd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 26ff5cd2bf0abc..46dc487ccc17eb 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -4955,8 +4955,8 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
> struct queue_limits lim = {
> .max_hw_sectors = objset_bytes >> SECTOR_SHIFT,
> .max_user_sectors   = objset_bytes >> SECTOR_SHIFT,
> +   .io_opt = objset_bytes,
> .io_min = rbd_dev->opts->alloc_size,
> -   .io_opt = rbd_dev->opts->alloc_size,
> .max_segments   = USHRT_MAX,
> .max_segment_size   = UINT_MAX,
> };
> --
> 2.43.0
>

Acked-by: Ilya Dryomov 

Thanks,

Ilya



Re: [PATCH v2 1/7] hw/xen: Remove declarations left over in 'xen-legacy-backend.h'

2024-05-31 Thread Paul Durrant

On 10/05/2024 11:49, Philippe Mathieu-Daudé wrote:

'xen_blkdev_ops' was removed in commit 19f87870ba ("xen: remove
the legacy 'xen_disk' backend"), 'xen_netdev_ops' in commit
25967ff69f ("hw/xen: update Xen PV NIC to XenDevice model") and
'xen_console_ops' in commit 9b77374690 ("hw/xen: update Xen
console to XenDevice model"). Remove them.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/xen/xen-legacy-backend.h | 3 ---
  1 file changed, 3 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH v2 06/13] xen/bitops: Implement ffs() in common logic

2024-05-31 Thread Andrew Cooper
On 31/05/2024 9:34 am, Andrew Cooper wrote:
> On 31/05/2024 7:56 am, Nicola Vetrini wrote:
>> On 2024-05-31 03:14, Stefano Stabellini wrote:
>>> On Fri, 24 May 2024, Andrew Cooper wrote:
 Perform constant-folding unconditionally, rather than having it
 implemented
 inconsistently between architectures.

 Confirm the expected behaviour with compile time and boot time tests.

 For non-constant inputs, use arch_ffs() if provided but fall back to
 generic_ffsl() if not.  In particular, RISC-V doesn't have a builtin
 that
 works in all configurations.

 For x86, rename ffs() to arch_ffs() and adjust the prototype.

 For PPC, __builtin_ctz() is 1/3 of the size of the transform to
 generic_fls().  Drop the definition entirely.  ARM too benefits in
 the general
 case by using __builtin_ctz(), but less dramatically because it is using
 optimised asm().

 Signed-off-by: Andrew Cooper 
>>> This patch made me realize that we should add __builtin_ctz,
>>> __builtin_constant_p and always_inline to
>>> docs/misra/C-language-toolchain.rst as they don't seem to be currently
>>> documented and they are not part of the C standard
>>>
>>> Patch welcome :-)
>>>
>> I can send a patch for the builtins.
> That's very kind of you.
>
> In total by the end of this series, we've got __builtin_constant_p() 
> (definitely used elsewhere already), and __builtin_{ffs,ctz,clz}{,l}() 
> (3x primitives, 2x input types).
>
> If we're going for a list of the primitive operations, let's add
> __builtin_popcnt{,l}() too right away, because if it weren't for 4.19
> code freeze, I'd have cleaned up the hweight() helpers too.

Oh, and it's worth noting that __builtin_{ctz,clz}{,l}() have explicit
UB if given an input of 0.  (Sadly, even on architectures where the
underlying instruction emitted is safe with a 0 input. [0])

This is why every patch in the series using them checks for nonzero input.

UBSAN (with an adequate compiler) will instrument this, and Xen has
__ubsan_handle_invalid_builtin() to diagnose these.

~Andrew

[0] It turns out that Clang has a 2-argument form of the builtin with
the second being the "value forwarded" in case the first is 0.  I've not
investigated whether GCC has the same.
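The pattern Andrew describes - guarding __builtin_ctz() against a zero input - can be shown with a minimal sketch (function name is made up; this mirrors the pattern, not Xen's actual arch_ffs() wiring):

```c
#include <assert.h>

/*
 * ffs()-style helper: bit positions are 1-based, 0 means "no bit set".
 * __builtin_ctz() has explicitly undefined behaviour for a zero input,
 * hence the guard.  With a compile-time-constant argument the whole
 * expression constant-folds away, which is the behaviour the series
 * makes uniform across architectures.
 */
static inline int my_ffs(unsigned int x)
{
    return x ? 1 + (int)__builtin_ctz(x) : 0;
}
```

UBSAN-instrumented builds would flag an unguarded __builtin_ctz(0) via __ubsan_handle_invalid_builtin(), as noted above.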



[linux-linus test] 186202: tolerable FAIL - PUSHED

2024-05-31 Thread osstest service owner
flight 186202 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186202/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-raw   8 xen-boot fail  like 186174
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186174
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186174
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186174
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186174
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186174
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186174
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass

version targeted for testing:
 linux4a4be1ad3a6efea16c56615f31117590fd881358
baseline version:
 linuxe0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc

Last test of basis   186174  2024-05-28 18:10:31 Z2 days
Testing same since   186185  2024-05-29 17:42:32 Z1 days4 attempts


People who touched revisions under test:
  Dominique Martinet 
  Herbert Xu 
  Linus Torvalds 
  Nikita Zhandarovich 
  Nícolas F. R. A. Prado 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt   

Re: [PATCH v2 06/13] xen/bitops: Implement ffs() in common logic

2024-05-31 Thread Andrew Cooper
On 31/05/2024 7:56 am, Nicola Vetrini wrote:
> On 2024-05-31 03:14, Stefano Stabellini wrote:
>> On Fri, 24 May 2024, Andrew Cooper wrote:
>>> Perform constant-folding unconditionally, rather than having it
>>> implemented
>>> inconsistently between architectures.
>>>
>>> Confirm the expected behaviour with compile time and boot time tests.
>>>
>>> For non-constant inputs, use arch_ffs() if provided but fall back to
>>> generic_ffsl() if not.  In particular, RISC-V doesn't have a builtin
>>> that
>>> works in all configurations.
>>>
>>> For x86, rename ffs() to arch_ffs() and adjust the prototype.
>>>
>>> For PPC, __builtin_ctz() is 1/3 of the size of the transform to
>>> generic_fls().  Drop the definition entirely.  ARM too benefits in
>>> the general
>>> case by using __builtin_ctz(), but less dramatically because it is using
>>> optimised asm().
>>>
>>> Signed-off-by: Andrew Cooper 
>>
>> This patch made me realize that we should add __builtin_ctz,
>> __builtin_constant_p and always_inline to
>> docs/misra/C-language-toolchain.rst as they don't seem to be currently
>> documented and they are not part of the C standard
>>
>> Patch welcome :-)
>>
>
> I can send a patch for the builtins.

That's very kind of you.

In total by the end of this series, we've got __builtin_constant_p() 
(definitely used elsewhere already), and __builtin_{ffs,ctz,clz}{,l}() 
(3x primitives, 2x input types).

If we're going for a list of the primitive operations, let's add
__builtin_popcnt{,l}() too right away, because if it weren't for 4.19
code freeze, I'd have cleaned up the hweight() helpers too.

~Andrew



Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway

2024-05-31 Thread Jan Beulich
On 31.05.2024 09:31, Roger Pau Monné wrote:
> On Fri, May 31, 2024 at 09:02:20AM +0200, Jan Beulich wrote:
>> On 29.05.2024 18:14, Roger Pau Monné wrote:
>>> On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
 On 29.05.2024 17:03, Roger Pau Monné wrote:
> On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
>> On 29.05.2024 11:01, Roger Pau Monne wrote:
>>> Due to the current rwlock logic, if the CPU calling get_cpu_maps() does 
>>> so from
>>> a cpu_hotplug_{begin,done}() region the function will still return 
>>> success,
>>> because a CPU taking the rwlock in read mode after having taken it in 
>>> write
>>> mode is allowed.  Such behavior however defeats the purpose of 
>>> get_cpu_maps(),
>>> as it should always return false when called while a CPU hot{,un}plug 
>>> operation
>>> is in progress.
>>
>> I'm not sure I can agree with this. The CPU doing said operation ought 
>> to be
>> aware of what it is itself doing. And all other CPUs will get back false 
>> from
>> get_cpu_maps().
>
> Well, the CPU is aware in the context of cpu_{up,down}(), but not in
> the interrupts that might be handled while that operation is in
> progress, see below for a concrete example.
>
>>>  Otherwise the logic in send_IPI_mask() for example is wrong,
>>> as it could decide to use the shorthand even when a CPU operation is in
>>> progress.
>>
>> It's also not becoming clear what's wrong there: As long as a CPU isn't
>> offline enough to not be in cpu_online_map anymore, it may well need to 
>> still
>> be the target of IPIs, and targeting it with a shorthand then is still 
>> fine.
>
> The issue is in the online path: there's a window where the CPU is
> online (and the lapic active), but cpu_online_map hasn't been updated
> yet.  A specific example would be time_calibration() being executed on
> the CPU that is running cpu_up().  That could result in a shorthand
> IPI being used, but the mask in r.cpu_calibration_map not containing
> the CPU that's being brought up online because it's not yet added to
> cpu_online_map.  Then the number of CPUs actually running
> time_calibration_rendezvous_fn won't match the weight of the cpumask
> in r.cpu_calibration_map.

 I see, but maybe only partly. Prior to the CPU having its bit set in
 cpu_online_map, can it really take interrupts already? Shouldn't it be
 running with IRQs off until later, thus preventing it from making it
 into the rendezvous function in the first place? But yes, I can see
 how the IRQ (IPI) then being delivered later (once IRQs are enabled)
 might cause problems, too.
>>>
>>> The interrupt will get set in IRR and handled when interrupts are
>>> enabled.
>>>

 Plus, with how the rendezvous function is invoked (via
 on_selected_cpus() with the mask copied from cpu_online_map), the
 first check in smp_call_function_interrupt() ought to prevent the
 function from being called on the CPU being onlined. A problem would
 arise though if the IPI arrived later and call_data was already
 (partly or fully) overwritten with the next request.
>>>
>>> Yeah, there's a small window where the fields in call_data are out of
>>> sync.
>>>
>> In any event this would again affect only the CPU leading the CPU 
>> operation,
>> which should clearly know at which point(s) it is okay to send IPIs. Are 
>> we
>> actually sending any IPIs from within CPU-online or CPU-offline paths?
>
> Yes, I've seen the time rendezvous happening while in the middle of a
> hotplug operation, and the CPU coordinating the rendezvous being the
> one doing the CPU hotplug operation, so get_cpu_maps() returning true.

 Right, yet together with ...

>> Together with the earlier paragraph the critical window would be between 
>> the
>> CPU being taken off of cpu_online_map and the CPU actually going "dead" 
>> (i.e.
>> on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And 
>> even
>> then the question would be what bad, if any, would happen to that CPU if 
>> an
>> IPI was still targeted at it by way of using the shorthand. I'm pretty 
>> sure
>> it runs with IRQs off at that time, so no ordinary IRQ could be 
>> delivered.
>>
>>> Adjust the logic in get_cpu_maps() to return false when the CPUs lock is
>>> already hold in write mode by the current CPU, as read_trylock() would
>>> otherwise return true.
>>>
>>> Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already 
>>> locked in write mode')
>>
>> I'm puzzled by this as well: Prior to that and the change referenced by 
>> its
>> Fixes: tag, recursive spin locks were used. For the purposes here that's 
>> the
>> same as permitting read 
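The behaviour being debated can be captured in a toy model (this is not Xen's rwlock implementation; the struct and globals below are invented to illustrate the semantics). Recursive read locking lets the write-holding CPU succeed in read_trylock(), so get_cpu_maps() needs an explicit write-ownership check to fail on the CPU performing the hotplug operation:

```c
#include <assert.h>

/* Toy recursive rwlock: writer_cpu is -1 when not write-locked. */
struct toy_rwlock {
    int readers;
    int writer_cpu;
};

static int current_cpu; /* stand-in for smp_processor_id() */

static int toy_read_trylock(struct toy_rwlock *l)
{
    /* Recursion: the CPU already holding the lock for writing is
     * also allowed to take it for reading - the behaviour that made
     * get_cpu_maps() wrongly succeed. */
    if ( l->writer_cpu < 0 || l->writer_cpu == current_cpu )
    {
        l->readers++;
        return 1;
    }
    return 0;
}

/* Adjusted logic: fail outright if *this* CPU is mid-hotplug. */
static int toy_get_cpu_maps(struct toy_rwlock *l)
{
    if ( l->writer_cpu == current_cpu )
        return 0;
    return toy_read_trylock(l);
}
```

Interrupt handlers such as the time-calibration rendezvous would then see get_cpu_maps() fail even when they run on the CPU coordinating the hotplug operation.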

Re: [XEN PATCH v2 13/15] x86/ioreq: guard VIO_realmode_completion with CONFIG_VMX

2024-05-31 Thread Jan Beulich
On 31.05.2024 10:05, Sergiy Kibrik wrote:
> 16.05.24 15:11, Jan Beulich:
>> On 15.05.2024 11:24, Sergiy Kibrik wrote:
>>> --- a/xen/arch/x86/hvm/emulate.c
>>> +++ b/xen/arch/x86/hvm/emulate.c
>>> @@ -2667,7 +2667,9 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt 
>>> *hvmemul_ctxt,
>>>   break;
>>>   
>>>   case VIO_mmio_completion:
>>> +#ifdef CONFIG_VMX
>>>   case VIO_realmode_completion:
>>> +#endif
>>>   BUILD_BUG_ON(sizeof(hvio->mmio_insn) < 
>>> sizeof(hvmemul_ctxt->insn_buf));
>>>   hvio->mmio_insn_bytes = hvmemul_ctxt->insn_buf_bytes;
>>>   memcpy(hvio->mmio_insn, hvmemul_ctxt->insn_buf, 
>>> hvio->mmio_insn_bytes);
>>
>> This change doesn't buy us anything, does it?
> 
> why not? Code won't compile w/o it.
> Or do you mean hiding the whole VIO_realmode_completion enum under 
> CONFIG_VMX wasn't really useful?

That's what I meant, by implication. To me it's extra #ifdef-ary without
real gain.

Jan



Re: [PATCH 1/2] arch/irq: Make irq_ack_none() mandatory

2024-05-31 Thread Andrew Cooper
On 31/05/2024 7:42 am, Jan Beulich wrote:
> On 30.05.2024 20:40, Andrew Cooper wrote:
>> Any non-stub implementation of these is going to have to do something here.
> For whatever definition of "something", seeing ...
>
>> --- a/xen/arch/arm/irq.c
>> +++ b/xen/arch/arm/irq.c
>> @@ -31,12 +31,12 @@ struct irq_guest
>>  unsigned int virq;
>>  };
>>  
>> -static void ack_none(struct irq_desc *irq)
>> +void irq_ack_none(struct irq_desc *irq)
>>  {
>>  printk("unexpected IRQ trap at irq %02x\n", irq->irq);
>>  }
> ... this, which - perhaps apart from the word "trap" - is entirely Arm-
> independent, and could hence quite well live in a common code fallback
> implementation.

Not really.

On ARM, ack()+end() are both mandatory and it's end() which is taking
the useful action.

On x86, ack() has the effect and end() is optional (and has a different
prototype even!)


>  Nevertheless with patch 2 clearly being an improvement,
> both patches:
> Reviewed-by: Jan Beulich 

Thanks.

~Andrew



[XEN PATCH RFC] ioreq: make arch_vcpu_ioreq_completion() an optional callback

2024-05-31 Thread Sergiy Kibrik
In most cases the arch_vcpu_ioreq_completion() routine is just an empty stub,
except when handling VIO_realmode_completion, which only happens for HVM
domains running on a VT-x machine. When VT-x is disabled in the build
configuration, both the x86 & arm versions of the routine become empty stubs.
To dispose of these useless stubs we can make an optional call to an
arch-specific ioreq completion handler, if one is present, and drop the arm
and generic x86 handlers.
Actual handling of VIO_realmode_completion can then be done by VMX code.

Signed-off-by: Sergiy Kibrik 
---
 xen/arch/arm/ioreq.c   |  6 --
 xen/arch/x86/hvm/ioreq.c   | 25 -
 xen/arch/x86/hvm/vmx/vmx.c | 15 +++
 xen/common/ioreq.c |  5 -
 xen/include/xen/ioreq.h|  2 +-
 5 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 5df755b48b..2e829d2e7f 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -135,12 +135,6 @@ bool arch_ioreq_complete_mmio(void)
 return false;
 }
 
-bool arch_vcpu_ioreq_completion(enum vio_completion completion)
-{
-ASSERT_UNREACHABLE();
-return true;
-}
-
 /*
  * The "legacy" mechanism of mapping magic pages for the IOREQ servers
  * is x86 specific, so the following hooks don't need to be implemented on Arm:
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index b37bbd660b..088650e007 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -29,31 +29,6 @@ bool arch_ioreq_complete_mmio(void)
 return handle_mmio();
 }
 
-bool arch_vcpu_ioreq_completion(enum vio_completion completion)
-{
-switch ( completion )
-{
-#ifdef CONFIG_VMX
-case VIO_realmode_completion:
-{
-struct hvm_emulate_ctxt ctxt;
-
-hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
-vmx_realmode_emulate_one(&ctxt);
-hvm_emulate_writeback(&ctxt);
-
-break;
-}
-#endif
-
-default:
-ASSERT_UNREACHABLE();
-break;
-}
-
-return true;
-}
-
 static gfn_t hvm_alloc_legacy_ioreq_gfn(struct ioreq_server *s)
 {
 struct domain *d = s->target;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 8ba996546f..4f9be50fe7 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2749,6 +2749,20 @@ static void cf_check vmx_set_reg(struct vcpu *v, 
unsigned int reg, uint64_t val)
 vmx_vmcs_exit(v);
 }
 
+bool realmode_vcpu_ioreq_completion(enum vio_completion completion)
+{
+struct hvm_emulate_ctxt ctxt;
+
+if ( completion != VIO_realmode_completion )
+ASSERT_UNREACHABLE();
+
+hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+vmx_realmode_emulate_one(&ctxt);
+hvm_emulate_writeback(&ctxt);
+
+return true;
+}
+
 static struct hvm_function_table __initdata_cf_clobber vmx_function_table = {
 .name = "VMX",
 .cpu_up_prepare   = vmx_cpu_up_prepare,
@@ -3070,6 +3084,7 @@ const struct hvm_function_table * __init start_vmx(void)
 lbr_tsx_fixup_check();
 ler_to_fixup_check();
 
+arch_vcpu_ioreq_completion = realmode_vcpu_ioreq_completion;
 return &vmx_function_table;
 }
 
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 1257a3d972..94fde97ece 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -33,6 +33,8 @@
 #include 
 #include 
 
+bool (*arch_vcpu_ioreq_completion)(enum vio_completion completion) = NULL;
+
 void ioreq_request_mapcache_invalidate(const struct domain *d)
 {
 struct vcpu *v = current;
@@ -244,7 +246,8 @@ bool vcpu_ioreq_handle_completion(struct vcpu *v)
 break;
 
 default:
-res = arch_vcpu_ioreq_completion(completion);
+if ( arch_vcpu_ioreq_completion )
+res = arch_vcpu_ioreq_completion(completion);
 break;
 }
 
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index cd399adf17..880214ec41 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -111,7 +111,7 @@ void ioreq_domain_init(struct domain *d);
 int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op);
 
 bool arch_ioreq_complete_mmio(void);
-bool arch_vcpu_ioreq_completion(enum vio_completion completion);
+extern bool (*arch_vcpu_ioreq_completion)(enum vio_completion completion);
 int arch_ioreq_server_map_pages(struct ioreq_server *s);
 void arch_ioreq_server_unmap_pages(struct ioreq_server *s);
 void arch_ioreq_server_enable(struct ioreq_server *s);
-- 
2.25.1




Re: [XEN PATCH v2 13/15] x86/ioreq: guard VIO_realmode_completion with CONFIG_VMX

2024-05-31 Thread Sergiy Kibrik

16.05.24 15:11, Jan Beulich:

On 15.05.2024 11:24, Sergiy Kibrik wrote:

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2667,7 +2667,9 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt 
*hvmemul_ctxt,
  break;
  
  case VIO_mmio_completion:

+#ifdef CONFIG_VMX
  case VIO_realmode_completion:
+#endif
  BUILD_BUG_ON(sizeof(hvio->mmio_insn) < 
sizeof(hvmemul_ctxt->insn_buf));
  hvio->mmio_insn_bytes = hvmemul_ctxt->insn_buf_bytes;
  memcpy(hvio->mmio_insn, hvmemul_ctxt->insn_buf, 
hvio->mmio_insn_bytes);


This change doesn't buy us anything, does it?


why not? Code won't compile w/o it.
Or do you mean hiding the whole VIO_realmode_completion enum under 
CONFIG_VMX wasn't really useful?



--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -33,6 +33,7 @@ bool arch_vcpu_ioreq_completion(enum vio_completion 
completion)
  {
  switch ( completion )
  {
+#ifdef CONFIG_VMX
  case VIO_realmode_completion:
  {
  struct hvm_emulate_ctxt ctxt;
@@ -43,6 +44,7 @@ bool arch_vcpu_ioreq_completion(enum vio_completion 
completion)
  
  break;

  }
+#endif
  
  default:

  ASSERT_UNREACHABLE();


And while this change is needed for the goal of the series, I wonder whether
it wouldn't better be arch_vcpu_ioreq_completion() as whole that then gets
stubbed out.



I'll post a patch to this thread to confirm whether I got your point 
correctly.


 -Sergiy



[PATCH 12/14] sr: convert to the atomic queue limits API

2024-05-31 Thread Christoph Hellwig
Assign all queue limits through a local queue_limits variable and
queue_limits_commit_update so that we can't race updating them from
multiple places, and freeze the queue when updating them so that
in-progress I/O submissions don't see half-updated limits.

Also use the chance to clean up variable names to standard ones.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sr.c | 42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 7ab000942b97fc..3f491019103e0c 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -111,7 +111,7 @@ static struct lock_class_key sr_bio_compl_lkclass;
 static int sr_open(struct cdrom_device_info *, int);
 static void sr_release(struct cdrom_device_info *);
 
-static void get_sectorsize(struct scsi_cd *);
+static int get_sectorsize(struct scsi_cd *);
 static int get_capabilities(struct scsi_cd *);
 
 static unsigned int sr_check_events(struct cdrom_device_info *cdi,
@@ -473,15 +473,15 @@ static blk_status_t sr_init_command(struct scsi_cmnd 
*SCpnt)
return BLK_STS_IOERR;
 }
 
-static void sr_revalidate_disk(struct scsi_cd *cd)
+static int sr_revalidate_disk(struct scsi_cd *cd)
 {
struct scsi_sense_hdr sshdr;
 
/* if the unit is not ready, nothing more to do */
	if (scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr))
-   return;
+   return 0;
	sr_cd_check(&cd->cdi);
-   get_sectorsize(cd);
+   return get_sectorsize(cd);
 }
 
 static int sr_block_open(struct gendisk *disk, blk_mode_t mode)
@@ -494,13 +494,16 @@ static int sr_block_open(struct gendisk *disk, blk_mode_t 
mode)
return -ENXIO;
 
scsi_autopm_get_device(sdev);
-   if (disk_check_media_change(disk))
-   sr_revalidate_disk(cd);
+   if (disk_check_media_change(disk)) {
+   ret = sr_revalidate_disk(cd);
+   if (ret)
+   goto out;
+   }
 
	mutex_lock(&cd->lock);
	ret = cdrom_open(&cd->cdi, mode);
	mutex_unlock(&cd->lock);
-
+out:
scsi_autopm_put_device(sdev);
if (ret)
scsi_device_put(cd->device);
@@ -685,7 +688,9 @@ static int sr_probe(struct device *dev)
blk_pm_runtime_init(sdev->request_queue, dev);
 
dev_set_drvdata(dev, cd);
-   sr_revalidate_disk(cd);
+   error = sr_revalidate_disk(cd);
+   if (error)
+   goto unregister_cdrom;
 
	error = device_add_disk(&sdev->sdev_gendev, disk, NULL);
if (error)
@@ -714,13 +719,14 @@ static int sr_probe(struct device *dev)
 }
 
 
-static void get_sectorsize(struct scsi_cd *cd)
+static int get_sectorsize(struct scsi_cd *cd)
 {
+   struct request_queue *q = cd->device->request_queue;
static const u8 cmd[10] = { READ_CAPACITY };
unsigned char buffer[8] = { };
-   int the_result;
+   struct queue_limits lim;
+   int err;
int sector_size;
-   struct request_queue *queue;
struct scsi_failure failure_defs[] = {
{
.result = SCMD_FAILURE_RESULT_ANY,
@@ -736,10 +742,10 @@ static void get_sectorsize(struct scsi_cd *cd)
};
 
/* Do the command and wait.. */
-   the_result = scsi_execute_cmd(cd->device, cmd, REQ_OP_DRV_IN, buffer,
+   err = scsi_execute_cmd(cd->device, cmd, REQ_OP_DRV_IN, buffer,
  sizeof(buffer), SR_TIMEOUT, MAX_RETRIES,
  &exec_args);
-   if (the_result) {
+   if (err) {
cd->capacity = 0x1f;
sector_size = 2048; /* A guess, just in case */
} else {
@@ -789,10 +795,12 @@ static void get_sectorsize(struct scsi_cd *cd)
set_capacity(cd->disk, cd->capacity);
}
 
-   queue = cd->device->request_queue;
-   blk_queue_logical_block_size(queue, sector_size);
-
-   return;
+   lim = queue_limits_start_update(q);
+   lim.logical_block_size = sector_size;
+   blk_mq_freeze_queue(q);
+   err = queue_limits_commit_update(q, &lim);
+   blk_mq_unfreeze_queue(q);
+   return err;
 }
 
 static int get_capabilities(struct scsi_cd *cd)
-- 
2.43.0




[PATCH 13/14] block: remove unused queue limits API

2024-05-31 Thread Christoph Hellwig
Remove all APIs that are unused now that sd and sr have been converted
to the atomic queue limits API.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
---
 block/blk-settings.c   | 190 -
 include/linux/blkdev.h |  24 --
 2 files changed, 214 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index a49abdb3554834..0b038729608f4b 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -293,24 +293,6 @@ int queue_limits_set(struct request_queue *q, struct 
queue_limits *lim)
 }
 EXPORT_SYMBOL_GPL(queue_limits_set);
 
-/**
- * blk_queue_chunk_sectors - set size of the chunk for this queue
- * @q:  the request queue for the device
- * @chunk_sectors:  chunk sectors in the usual 512b unit
- *
- * Description:
- *If a driver doesn't want IOs to cross a given chunk size, it can set
- *this limit and prevent merging across chunks. Note that the block layer
- *must accept a page worth of data at any offset. So if the crossing of
- *chunks is a hard limitation in the driver, it must still be prepared
- *to split single page bios.
- **/
-void blk_queue_chunk_sectors(struct request_queue *q, unsigned int 
chunk_sectors)
-{
-   q->limits.chunk_sectors = chunk_sectors;
-}
-EXPORT_SYMBOL(blk_queue_chunk_sectors);
-
 /**
  * blk_queue_max_discard_sectors - set max sectors for a single discard
  * @q:  the request queue for the device
@@ -352,139 +334,6 @@ void blk_queue_max_write_zeroes_sectors(struct 
request_queue *q,
 }
 EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 
-/**
- * blk_queue_max_zone_append_sectors - set max sectors for a single zone append
- * @q:  the request queue for the device
- * @max_zone_append_sectors: maximum number of sectors to write per command
- *
- * Sets the maximum number of sectors allowed for zone append commands. If
- * Specifying 0 for @max_zone_append_sectors indicates that the queue does
- * not natively support zone append operations and that the block layer must
- * emulate these operations using regular writes.
- **/
-void blk_queue_max_zone_append_sectors(struct request_queue *q,
-   unsigned int max_zone_append_sectors)
-{
-   unsigned int max_sectors = 0;
-
-   if (WARN_ON(!blk_queue_is_zoned(q)))
-   return;
-
-   if (max_zone_append_sectors) {
-   max_sectors = min(q->limits.max_hw_sectors,
- max_zone_append_sectors);
-   max_sectors = min(q->limits.chunk_sectors, max_sectors);
-
-   /*
-* Signal eventual driver bugs resulting in the max_zone_append
-* sectors limit being 0 due to the chunk_sectors limit (zone
-* size) not set or the max_hw_sectors limit not set.
-*/
-   WARN_ON_ONCE(!max_sectors);
-   }
-
-   q->limits.max_zone_append_sectors = max_sectors;
-}
-EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors);
-
-/**
- * blk_queue_logical_block_size - set logical block size for the queue
- * @q:  the request queue for the device
- * @size:  the logical block size, in bytes
- *
- * Description:
- *   This should be set to the lowest possible block size that the
- *   storage device can address.  The default of 512 covers most
- *   hardware.
- **/
-void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
-{
-   struct queue_limits *limits = &q->limits;
-
-   limits->logical_block_size = size;
-
-   if (limits->discard_granularity < limits->logical_block_size)
-   limits->discard_granularity = limits->logical_block_size;
-
-   if (limits->physical_block_size < size)
-   limits->physical_block_size = size;
-
-   if (limits->io_min < limits->physical_block_size)
-   limits->io_min = limits->physical_block_size;
-
-   limits->max_hw_sectors =
-   round_down(limits->max_hw_sectors, size >> SECTOR_SHIFT);
-   limits->max_sectors =
-   round_down(limits->max_sectors, size >> SECTOR_SHIFT);
-}
-EXPORT_SYMBOL(blk_queue_logical_block_size);
-
-/**
- * blk_queue_physical_block_size - set physical block size for the queue
- * @q:  the request queue for the device
- * @size:  the physical block size, in bytes
- *
- * Description:
- *   This should be set to the lowest possible sector size that the
- *   hardware can operate on without reverting to read-modify-write
- *   operations.
- */
-void blk_queue_physical_block_size(struct request_queue *q, unsigned int size)
-{
-   q->limits.physical_block_size = size;
-
-   if (q->limits.physical_block_size < q->limits.logical_block_size)
-   q->limits.physical_block_size = q->limits.logical_block_size;
-
-   if (q->limits.discard_granularity < q->limits.physical_block_size)
-   q->limits.discard_granularity = q->limits.physical_block_size;
-
-   if (q->limits.io_min < q->limits.physical_block_size)

[PATCH 11/14] sd: convert to the atomic queue limits API

2024-05-31 Thread Christoph Hellwig
Assign all queue limits through a local queue_limits variable and
queue_limits_commit_update so that we can't race updating them from
multiple places, and freeze the queue when updating them so that
in-progress I/O submissions don't see half-updated limits.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sd.c | 130 --
 drivers/scsi/sd.h |   6 +-
 drivers/scsi/sd_zbc.c |  15 ++---
 3 files changed, 85 insertions(+), 66 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 39eddfac09ef8f..049071b5681989 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -101,12 +101,13 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
 #define SD_MINORS  16
 
-static void sd_config_discard(struct scsi_disk *, unsigned int);
-static void sd_config_write_same(struct scsi_disk *);
+static void sd_config_discard(struct scsi_disk *sdkp, struct queue_limits *lim,
+   unsigned int mode);
+static void sd_config_write_same(struct scsi_disk *sdkp,
+   struct queue_limits *lim);
 static int  sd_revalidate_disk(struct gendisk *);
 static void sd_unlock_native_capacity(struct gendisk *disk);
 static void sd_shutdown(struct device *);
-static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
 static void scsi_disk_release(struct device *cdev);
 
 static DEFINE_IDA(sd_index_ida);
@@ -456,7 +457,8 @@ provisioning_mode_store(struct device *dev, struct 
device_attribute *attr,
 {
struct scsi_disk *sdkp = to_scsi_disk(dev);
struct scsi_device *sdp = sdkp->device;
-   int mode;
+   struct queue_limits lim;
+   int mode, err;
 
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -472,8 +474,13 @@ provisioning_mode_store(struct device *dev, struct 
device_attribute *attr,
if (mode < 0)
return -EINVAL;
 
-   sd_config_discard(sdkp, mode);
-
+   lim = queue_limits_start_update(sdkp->disk->queue);
+   sd_config_discard(sdkp, &lim, mode);
+   blk_mq_freeze_queue(sdkp->disk->queue);
+   err = queue_limits_commit_update(sdkp->disk->queue, &lim);
+   blk_mq_unfreeze_queue(sdkp->disk->queue);
+   if (err)
+   return err;
return count;
 }
 static DEVICE_ATTR_RW(provisioning_mode);
@@ -556,6 +563,7 @@ max_write_same_blocks_store(struct device *dev, struct 
device_attribute *attr,
 {
struct scsi_disk *sdkp = to_scsi_disk(dev);
struct scsi_device *sdp = sdkp->device;
+   struct queue_limits lim;
unsigned long max;
int err;
 
@@ -577,8 +585,13 @@ max_write_same_blocks_store(struct device *dev, struct 
device_attribute *attr,
sdkp->max_ws_blocks = max;
}
 
-   sd_config_write_same(sdkp);
-
+   lim = queue_limits_start_update(sdkp->disk->queue);
+   sd_config_write_same(sdkp, &lim);
+   blk_mq_freeze_queue(sdkp->disk->queue);
+   err = queue_limits_commit_update(sdkp->disk->queue, &lim);
+   blk_mq_unfreeze_queue(sdkp->disk->queue);
+   if (err)
+   return err;
return count;
 }
 static DEVICE_ATTR_RW(max_write_same_blocks);
@@ -827,17 +840,15 @@ static void sd_disable_discard(struct scsi_disk *sdkp)
blk_queue_max_discard_sectors(sdkp->disk->queue, 0);
 }
 
-static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
+static void sd_config_discard(struct scsi_disk *sdkp, struct queue_limits *lim,
+   unsigned int mode)
 {
-   struct request_queue *q = sdkp->disk->queue;
unsigned int logical_block_size = sdkp->device->sector_size;
unsigned int max_blocks = 0;
 
-   q->limits.discard_alignment =
-   sdkp->unmap_alignment * logical_block_size;
-   q->limits.discard_granularity =
-   max(sdkp->physical_block_size,
-   sdkp->unmap_granularity * logical_block_size);
+   lim->discard_alignment = sdkp->unmap_alignment * logical_block_size;
+   lim->discard_granularity = max(sdkp->physical_block_size,
+   sdkp->unmap_granularity * logical_block_size);
sdkp->provisioning_mode = mode;
 
switch (mode) {
@@ -875,7 +886,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
break;
}
 
-   blk_queue_max_discard_sectors(q, max_blocks * (logical_block_size >> 
9));
+   lim->max_hw_discard_sectors = max_blocks *
+   (logical_block_size >> SECTOR_SHIFT);
 }
 
 static void *sd_set_special_bvec(struct request *rq, unsigned int data_len)
@@ -1010,9 +1022,9 @@ static void sd_disable_write_same(struct scsi_disk *sdkp)
blk_queue_max_write_zeroes_sectors(sdkp->disk->queue, 0);
 }
 
-static void sd_config_write_same(struct scsi_disk *sdkp)
+static void sd_config_write_same(struct scsi_disk *sdkp,
+   struct queue_limits *lim)
 {
-   struct request_queue *q = sdkp->disk->queue;
unsigned int 

[PATCH 14/14] block: add special APIs for run-time disabling of discard and friends

2024-05-31 Thread Christoph Hellwig
A few drivers optimistically try to support discard, write zeroes and
secure erase and disable the features from the I/O completion handler
if the hardware can't support them.  This disable can't be done using
the atomic queue limits API because the I/O completion handlers can't
take sleeping locks or freeze the queue.  Keep the existing clearing
of the relevant field to zero, but replace the old blk_queue_max_*
APIs with new disable APIs that force the value to 0.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 arch/um/drivers/ubd_kern.c   |  4 ++--
 block/blk-settings.c | 41 
 drivers/block/xen-blkfront.c |  4 ++--
 drivers/scsi/sd.c|  4 ++--
 include/linux/blkdev.h   | 28 ++--
 5 files changed, 28 insertions(+), 53 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 093c87879d08ba..cdcb75a68989dd 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -451,9 +451,9 @@ static void ubd_end_request(struct io_thread_req *io_req)
 {
if (io_req->error == BLK_STS_NOTSUPP) {
if (req_op(io_req->req) == REQ_OP_DISCARD)
-   blk_queue_max_discard_sectors(io_req->req->q, 0);
+   blk_queue_disable_discard(io_req->req->q);
else if (req_op(io_req->req) == REQ_OP_WRITE_ZEROES)
-   blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
+   blk_queue_disable_write_zeroes(io_req->req->q);
}
blk_mq_end_request(io_req->req, io_req->error);
kfree(io_req);
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 0b038729608f4b..996f247fc98e80 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -293,47 +293,6 @@ int queue_limits_set(struct request_queue *q, struct 
queue_limits *lim)
 }
 EXPORT_SYMBOL_GPL(queue_limits_set);
 
-/**
- * blk_queue_max_discard_sectors - set max sectors for a single discard
- * @q:  the request queue for the device
- * @max_discard_sectors: maximum number of sectors to discard
- **/
-void blk_queue_max_discard_sectors(struct request_queue *q,
-   unsigned int max_discard_sectors)
-{
-   struct queue_limits *lim = &q->limits;
-
-   lim->max_hw_discard_sectors = max_discard_sectors;
-   lim->max_discard_sectors =
-   min(max_discard_sectors, lim->max_user_discard_sectors);
-}
-EXPORT_SYMBOL(blk_queue_max_discard_sectors);
-
-/**
- * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
- * @q:  the request queue for the device
- * @max_sectors: maximum number of sectors to secure_erase
- **/
-void blk_queue_max_secure_erase_sectors(struct request_queue *q,
-   unsigned int max_sectors)
-{
-   q->limits.max_secure_erase_sectors = max_sectors;
-}
-EXPORT_SYMBOL(blk_queue_max_secure_erase_sectors);
-
-/**
- * blk_queue_max_write_zeroes_sectors - set max sectors for a single
- *  write zeroes
- * @q:  the request queue for the device
- * @max_write_zeroes_sectors: maximum number of sectors to write per command
- **/
-void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
-   unsigned int max_write_zeroes_sectors)
-{
-   q->limits.max_write_zeroes_sectors = max_write_zeroes_sectors;
-}
-EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
-
 void disk_update_readahead(struct gendisk *disk)
 {
	blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index fd7c0ff2139cee..9b4ec3e4908cce 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1605,8 +1605,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
blkif_req(req)->error = BLK_STS_NOTSUPP;
info->feature_discard = 0;
info->feature_secdiscard = 0;
-   blk_queue_max_discard_sectors(rq, 0);
-   blk_queue_max_secure_erase_sectors(rq, 0);
+   blk_queue_disable_discard(rq);
+   blk_queue_disable_secure_erase(rq);
}
break;
case BLKIF_OP_FLUSH_DISKCACHE:
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 049071b5681989..d957e29b17a98a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -837,7 +837,7 @@ static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd 
*scmd,
 static void sd_disable_discard(struct scsi_disk *sdkp)
 {
sdkp->provisioning_mode = SD_LBP_DISABLE;
-   blk_queue_max_discard_sectors(sdkp->disk->queue, 0);
+   blk_queue_disable_discard(sdkp->disk->queue);
 }
 
 static void sd_config_discard(struct scsi_disk *sdkp, struct queue_limits *lim,
@@ -1019,7 +1019,7 @@ static void 

[PATCH 09/14] sd: factor out a sd_discard_mode helper

2024-05-31 Thread Christoph Hellwig
Split the logic to pick the right discard mode into a little helper
to prepare for further changes.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sd.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0dbc6eb7a7cac3..39eddfac09ef8f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3201,6 +3201,25 @@ static void sd_read_app_tag_own(struct scsi_disk *sdkp, 
unsigned char *buffer)
return;
 }
 
+static unsigned int sd_discard_mode(struct scsi_disk *sdkp)
+{
+   if (!sdkp->lbpvpd) {
+   /* LBP VPD page not provided */
+   if (sdkp->max_unmap_blocks)
+   return SD_LBP_UNMAP;
+   return SD_LBP_WS16;
+   }
+
+   /* LBP VPD page tells us what to use */
+   if (sdkp->lbpu && sdkp->max_unmap_blocks)
+   return SD_LBP_UNMAP;
+   if (sdkp->lbpws)
+   return SD_LBP_WS16;
+   if (sdkp->lbpws10)
+   return SD_LBP_WS10;
+   return SD_LBP_DISABLE;
+}
+
 /**
  * sd_read_block_limits - Query disk device for preferred I/O sizes.
  * @sdkp: disk to query
@@ -3239,23 +3258,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
sdkp->unmap_alignment =
		get_unaligned_be32(&vpd->data[32]) & ~(1 << 31);
 
-   if (!sdkp->lbpvpd) { /* LBP VPD page not provided */
-
-   if (sdkp->max_unmap_blocks)
-   sd_config_discard(sdkp, SD_LBP_UNMAP);
-   else
-   sd_config_discard(sdkp, SD_LBP_WS16);
-
-   } else {/* LBP VPD page tells us what to use */
-   if (sdkp->lbpu && sdkp->max_unmap_blocks)
-   sd_config_discard(sdkp, SD_LBP_UNMAP);
-   else if (sdkp->lbpws)
-   sd_config_discard(sdkp, SD_LBP_WS16);
-   else if (sdkp->lbpws10)
-   sd_config_discard(sdkp, SD_LBP_WS10);
-   else
-   sd_config_discard(sdkp, SD_LBP_DISABLE);
-   }
+   sd_config_discard(sdkp, sd_discard_mode(sdkp));
}
 
  out:
-- 
2.43.0




[PATCH 07/14] sd: add a sd_disable_write_same helper

2024-05-31 Thread Christoph Hellwig
Add helper to disable WRITE SAME when it is not supported and use it
instead of sd_config_write_same in the I/O completion handler.  This
avoids touching more fields than required in the I/O completion handler
and  prepares for converting sd to use the atomic queue limits API.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sd.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f07d90474e682b..70211d0b187652 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1004,6 +1004,13 @@ static blk_status_t sd_setup_write_zeroes_cmnd(struct 
scsi_cmnd *cmd)
return sd_setup_write_same10_cmnd(cmd, false);
 }
 
+static void sd_disable_write_same(struct scsi_disk *sdkp)
+{
+   sdkp->device->no_write_same = 1;
+   sdkp->max_ws_blocks = 0;
+   blk_queue_max_write_zeroes_sectors(sdkp->disk->queue, 0);
+}
+
 static void sd_config_write_same(struct scsi_disk *sdkp)
 {
struct request_queue *q = sdkp->disk->queue;
@@ -2258,8 +2265,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
if (SCpnt->cmnd[1] & 8) { /* UNMAP */
sd_disable_discard(sdkp);
} else {
-   sdkp->device->no_write_same = 1;
-   sd_config_write_same(sdkp);
+   sd_disable_write_same(sdkp);
req->rq_flags |= RQF_QUIET;
}
break;
-- 
2.43.0




[PATCH 01/14] ubd: refactor the interrupt handler

2024-05-31 Thread Christoph Hellwig
Instead of a separate handler function that leaves no work in the
interrupt handler itself, split out a per-request end I/O helper and
clean up the coding style and variable naming while we're at it.

Signed-off-by: Christoph Hellwig 
---
 arch/um/drivers/ubd_kern.c | 49 ++
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index ef805eaa9e013d..0c9542d58c01b7 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -447,43 +447,30 @@ static int bulk_req_safe_read(
return n;
 }
 
-/* Called without dev->lock held, and only in interrupt context. */
-static void ubd_handler(void)
+static void ubd_end_request(struct io_thread_req *io_req)
 {
-   int n;
-   int count;
-
-   while(1){
-   n = bulk_req_safe_read(
-   thread_fd,
-   irq_req_buffer,
-   &irq_remainder,
-   &irq_remainder_size,
-   UBD_REQ_BUFFER_SIZE
-   );
-   if (n < 0) {
-   if(n == -EAGAIN)
-   break;
-   printk(KERN_ERR "spurious interrupt in ubd_handler, "
-  "err = %d\n", -n);
-   return;
-   }
-   for (count = 0; count < n/sizeof(struct io_thread_req *); 
count++) {
-   struct io_thread_req *io_req = (*irq_req_buffer)[count];
-
-   if ((io_req->error == BLK_STS_NOTSUPP) && 
(req_op(io_req->req) == REQ_OP_DISCARD)) {
-   blk_queue_max_discard_sectors(io_req->req->q, 
0);
-   
blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
-   }
-   blk_mq_end_request(io_req->req, io_req->error);
-   kfree(io_req);
-   }
+   if (io_req->error == BLK_STS_NOTSUPP &&
+   req_op(io_req->req) == REQ_OP_DISCARD) {
+   blk_queue_max_discard_sectors(io_req->req->q, 0);
+   blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
}
+   blk_mq_end_request(io_req->req, io_req->error);
+   kfree(io_req);
 }
 
 static irqreturn_t ubd_intr(int irq, void *dev)
 {
-   ubd_handler();
+   int len, i;
+
+   while ((len = bulk_req_safe_read(thread_fd, irq_req_buffer,
+   &irq_remainder, &irq_remainder_size,
+   UBD_REQ_BUFFER_SIZE)) >= 0) {
+   for (i = 0; i < len / sizeof(struct io_thread_req *); i++)
+   ubd_end_request((*irq_req_buffer)[i]);
+   }
+
+   if (len < 0 && len != -EAGAIN)
+   pr_err("spurious interrupt in %s, err = %d\n", __func__, len);
return IRQ_HANDLED;
 }
 
-- 
2.43.0




[PATCH 03/14] rbd: increase io_opt again

2024-05-31 Thread Christoph Hellwig
Commit 16d80c54ad42 ("rbd: set io_min, io_opt and discard_granularity to
alloc_size") lowered the io_opt size for rbd from objset_bytes which is
4MB for typical setup to alloc_size which is typically 64KB.

The commit mostly talks about discard behavior and does mention io_min
in passing.  Reducing io_opt means reducing the readahead size, which
seems counter-intuitive given that rbd currently abuses the user
max_sectors setting to actually increase the I/O size.  Switch back
to the old setting to allow larger reads (the readahead size, despite its
name, actually limits the size of any buffered read) and to prepare
for using io_opt in the max_sectors calculation and getting drivers out
of the business of overriding the max_user_sectors value.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/rbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 26ff5cd2bf0abc..46dc487ccc17eb 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4955,8 +4955,8 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
struct queue_limits lim = {
.max_hw_sectors = objset_bytes >> SECTOR_SHIFT,
.max_user_sectors   = objset_bytes >> SECTOR_SHIFT,
+   .io_opt = objset_bytes,
.io_min = rbd_dev->opts->alloc_size,
-   .io_opt = rbd_dev->opts->alloc_size,
.max_segments   = USHRT_MAX,
.max_segment_size   = UINT_MAX,
};
-- 
2.43.0




[PATCH 08/14] sd: simplify the disable case in sd_config_discard

2024-05-31 Thread Christoph Hellwig
Fall through to the main call to blk_queue_max_discard_sectors given that
max_blocks has been initialized to zero above instead of duplicating the
call.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 70211d0b187652..0dbc6eb7a7cac3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -844,8 +844,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
unsigned int mode)
 
case SD_LBP_FULL:
case SD_LBP_DISABLE:
-   blk_queue_max_discard_sectors(q, 0);
-   return;
+   break;
 
case SD_LBP_UNMAP:
max_blocks = min_not_zero(sdkp->max_unmap_blocks,
-- 
2.43.0




[PATCH 10/14] sd: cleanup zoned queue limits initialization

2024-05-31 Thread Christoph Hellwig
Consolidate setting zone-related queue limits in sd_zbc_read_zones
instead of splitting them between sd_zbc_revalidate_zones and
sd_zbc_read_zones, and move the early_zone_information initialization
in sd_zbc_read_zones above setting up the queue limits.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sd_zbc.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 806036e48abeda..1c24c844e8d178 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -565,12 +565,6 @@ int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)
sdkp->zone_info.zone_blocks = zone_blocks;
sdkp->zone_info.nr_zones = nr_zones;
 
-   blk_queue_chunk_sectors(q,
-   logical_to_sectors(sdkp->device, zone_blocks));
-
-   /* Enable block layer zone append emulation */
-   blk_queue_max_zone_append_sectors(q, 0);
-
flags = memalloc_noio_save();
ret = blk_revalidate_disk_zones(disk);
memalloc_noio_restore(flags);
@@ -625,6 +619,10 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, u8 
buf[SD_BUF_SIZE])
if (ret != 0)
goto err;
 
+   nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
+   sdkp->early_zone_info.nr_zones = nr_zones;
+   sdkp->early_zone_info.zone_blocks = zone_blocks;
+
/* The drive satisfies the kernel restrictions: set it up */
blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
if (sdkp->zones_max_open == U32_MAX)
@@ -632,10 +630,10 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, u8 
buf[SD_BUF_SIZE])
else
disk_set_max_open_zones(disk, sdkp->zones_max_open);
disk_set_max_active_zones(disk, 0);
-   nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
-
-   sdkp->early_zone_info.nr_zones = nr_zones;
-   sdkp->early_zone_info.zone_blocks = zone_blocks;
+   blk_queue_chunk_sectors(q,
+   logical_to_sectors(sdkp->device, zone_blocks));
+   /* Enable block layer zone append emulation */
+   blk_queue_max_zone_append_sectors(q, 0);
 
return 0;
 
-- 
2.43.0




[PATCH 05/14] sd: simplify the ZBC case in provisioning_mode_store

2024-05-31 Thread Christoph Hellwig
Don't reset the discard settings to no-op over and over when a user
writes to the provisioning attribute as that is already the default
mode for ZBC devices.  In hindsight we should have made writing to
the attribute fail for ZBC devices, but the code has probably been
around for far too long to change this now.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sd.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3dff9150ce11e2..83aa17fea39d39 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -461,14 +461,13 @@ provisioning_mode_store(struct device *dev, struct 
device_attribute *attr,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
 
-   if (sd_is_zoned(sdkp)) {
-   sd_config_discard(sdkp, SD_LBP_DISABLE);
-   return count;
-   }
-
if (sdp->type != TYPE_DISK)
return -EINVAL;
 
+   /* ignore the provisioning mode for ZBC devices */
+   if (sd_is_zoned(sdkp))
+   return count;
+
mode = sysfs_match_string(lbp_mode, buf);
if (mode < 0)
return -EINVAL;
-- 
2.43.0




[PATCH 06/14] sd: add a sd_disable_discard helper

2024-05-31 Thread Christoph Hellwig
Add helper to disable discard when it is not supported and use it
instead of sd_config_discard in the I/O completion handler.  This avoids
touching more fields than required in the I/O completion handler and
prepares for converting sd to use the atomic queue limits API.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 drivers/scsi/sd.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 83aa17fea39d39..f07d90474e682b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -821,6 +821,12 @@ static unsigned char sd_setup_protect_cmnd(struct 
scsi_cmnd *scmd,
return protect;
 }
 
+static void sd_disable_discard(struct scsi_disk *sdkp)
+{
+   sdkp->provisioning_mode = SD_LBP_DISABLE;
+   blk_queue_max_discard_sectors(sdkp->disk->queue, 0);
+}
+
 static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 {
struct request_queue *q = sdkp->disk->queue;
@@ -2245,12 +2251,12 @@ static int sd_done(struct scsi_cmnd *SCpnt)
case 0x24:  /* INVALID FIELD IN CDB */
switch (SCpnt->cmnd[0]) {
case UNMAP:
-   sd_config_discard(sdkp, SD_LBP_DISABLE);
+   sd_disable_discard(sdkp);
break;
case WRITE_SAME_16:
case WRITE_SAME:
if (SCpnt->cmnd[1] & 8) { /* UNMAP */
-   sd_config_discard(sdkp, SD_LBP_DISABLE);
+   sd_disable_discard(sdkp);
} else {
sdkp->device->no_write_same = 1;
sd_config_write_same(sdkp);
-- 
2.43.0




[PATCH 04/14] block: take io_opt and io_min into account for max_sectors

2024-05-31 Thread Christoph Hellwig
The soft max_sectors limit is normally capped by the hardware limits and
an arbitrary upper limit enforced by the kernel, but can be modified by
the user.  A few drivers want to increase this limit (nbd, rbd) or
adjust it up or down based on hardware capabilities (sd).

Change blk_validate_limits to default max_sectors to the optimal I/O
size or, if no optimal I/O size is provided, to the preferred minimal
I/O size when that is larger than the kernel default, following the
logic in the SD driver.

This keeps the existing kernel default for drivers that do not provide
an io_opt or very big io_min value, but picks a much more useful
default for those that provide these hints, and allows removing the
hacks to set the user max_sectors limit in nbd, rbd and sd.
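For illustration, the precedence described above can be sketched as a
standalone function (a hedged sketch, not the kernel implementation; the
helper name, the exact cap value and the byte-sized parameters are
illustrative):

```c
#include <assert.h>

#define SECTOR_SHIFT 9
/* Illustrative value; the real cap is defined by the kernel headers. */
#define BLK_DEF_MAX_SECTORS_CAP 2560u

static unsigned int umin(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * Sketch of the max_sectors defaulting: prefer io_opt, then a very
 * large io_min, then the kernel default.  Byte-sized hints are
 * converted to 512-byte sectors, and the hardware limit always caps
 * the result.
 */
static unsigned int default_max_sectors(unsigned int max_hw_sectors,
                                        unsigned int io_opt_bytes,
                                        unsigned int io_min_bytes)
{
    if (io_opt_bytes)
        return umin(max_hw_sectors, io_opt_bytes >> SECTOR_SHIFT);
    if (io_min_bytes &&
        io_min_bytes > (BLK_DEF_MAX_SECTORS_CAP << SECTOR_SHIFT))
        return umin(max_hw_sectors, io_min_bytes >> SECTOR_SHIFT);
    return umin(max_hw_sectors, BLK_DEF_MAX_SECTORS_CAP);
}
```

Under this sketch a 4MB io_opt (the rbd objset size) yields 8192
sectors instead of the generic cap, which is the behaviour the patch is
after.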

Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 block/blk-settings.c |  7 +++
 drivers/block/nbd.c  |  2 +-
 drivers/block/rbd.c  |  1 -
 drivers/scsi/sd.c| 29 +
 4 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index effeb9a639bb45..a49abdb3554834 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -153,6 +153,13 @@ static int blk_validate_limits(struct queue_limits *lim)
if (lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
return -EINVAL;
lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors);
+   } else if (lim->io_opt) {
+   lim->max_sectors =
+   min(max_hw_sectors, lim->io_opt >> SECTOR_SHIFT);
+   } else if (lim->io_min &&
+  lim->io_min > (BLK_DEF_MAX_SECTORS_CAP << SECTOR_SHIFT)) {
+   lim->max_sectors =
+   min(max_hw_sectors, lim->io_min >> SECTOR_SHIFT);
} else {
lim->max_sectors = min(max_hw_sectors, BLK_DEF_MAX_SECTORS_CAP);
}
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 22a79a62cc4eab..ad887d614d5b3f 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1808,7 +1808,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned 
int refs)
 {
struct queue_limits lim = {
.max_hw_sectors = 65536,
-   .max_user_sectors   = 256,
+   .io_opt = 256 << SECTOR_SHIFT,
.max_segments   = USHRT_MAX,
.max_segment_size   = UINT_MAX,
};
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 46dc487ccc17eb..22ad704f81d8b9 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4954,7 +4954,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
rbd_dev->layout.object_size * rbd_dev->layout.stripe_count;
struct queue_limits lim = {
.max_hw_sectors = objset_bytes >> SECTOR_SHIFT,
-   .max_user_sectors   = objset_bytes >> SECTOR_SHIFT,
.io_opt = objset_bytes,
.io_min = rbd_dev->opts->alloc_size,
.max_segments   = USHRT_MAX,
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f6c822c9cbd2d3..3dff9150ce11e2 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3593,7 +3593,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
struct request_queue *q = sdkp->disk->queue;
sector_t old_capacity = sdkp->capacity;
unsigned char *buffer;
-   unsigned int dev_max, rw_max;
+   unsigned int dev_max;
 
SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
  "sd_revalidate_disk\n"));
@@ -3675,34 +3675,15 @@ static int sd_revalidate_disk(struct gendisk *disk)
else
blk_queue_io_min(sdkp->disk->queue, 0);
 
-   if (sd_validate_opt_xfer_size(sdkp, dev_max)) {
-   q->limits.io_opt = logical_to_bytes(sdp, sdkp->opt_xfer_blocks);
-   rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
-   } else {
-   q->limits.io_opt = 0;
-   rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
- (sector_t)BLK_DEF_MAX_SECTORS_CAP);
-   }
-
/*
 * Limit default to SCSI host optimal sector limit if set. There may be
 * an impact on performance for when the size of a request exceeds this
 * host limit.
 */
-   rw_max = min_not_zero(rw_max, sdp->host->opt_sectors);
-
-   /* Do not exceed controller limit */
-   rw_max = min(rw_max, queue_max_hw_sectors(q));
-
-   /*
-* Only update max_sectors if previously unset or if the current value
-* exceeds the capabilities of the hardware.
-*/
-   if (sdkp->first_scan ||
-   q->limits.max_sectors > q->limits.max_dev_sectors ||
-   q->limits.max_sectors > q->limits.max_hw_sectors) {
-   

[PATCH 02/14] ubd: untangle discard vs write zeroes not-supported handling

2024-05-31 Thread Christoph Hellwig
Discard and Write Zeroes are different operations and are implemented
by different fallocate opcodes for ubd.  If one fails, the other one
can work and vice versa.

Split the code to disable the operations in ubd_handler to only
disable the operation that actually failed.

Fixes: 50109b5a03b4 ("um: Add support for DISCARD in the UBD Driver")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
Reviewed-by: Damien Le Moal 
---
 arch/um/drivers/ubd_kern.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 0c9542d58c01b7..093c87879d08ba 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -449,10 +449,11 @@ static int bulk_req_safe_read(
 
 static void ubd_end_request(struct io_thread_req *io_req)
 {
-   if (io_req->error == BLK_STS_NOTSUPP &&
-   req_op(io_req->req) == REQ_OP_DISCARD) {
-   blk_queue_max_discard_sectors(io_req->req->q, 0);
-   blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
+   if (io_req->error == BLK_STS_NOTSUPP) {
+   if (req_op(io_req->req) == REQ_OP_DISCARD)
+   blk_queue_max_discard_sectors(io_req->req->q, 0);
+   else if (req_op(io_req->req) == REQ_OP_WRITE_ZEROES)
+   blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
}
blk_mq_end_request(io_req->req, io_req->error);
kfree(io_req);
-- 
2.43.0




convert the SCSI ULDs to the atomic queue limits API v2

2024-05-31 Thread Christoph Hellwig
Hi all,

this series converts the SCSI upper level drivers to the atomic queue
limits API.

The first patch is a bug fix for ubd that later patches depend on and
might be worth picking up for 6.10.

The second patch changes the max_sectors calculation to take the optimal
I/O size into account so that sd, nbd and rbd don't have to mess with
the user max_sector value.  I'd love to see a careful review from the
nbd and rbd maintainers for this one!

The following patches clean up a few loose ends in the sd driver, and
then convert sd and sr to the atomic queue limits API.  The final
patches remove the now unused block APIs, and convert a few to be
specific to their now narrower use case.

The patches are against Jens' block-6.10 tree.  Due to the amount of
block layer changes in here, and other that will depend on it, it
would be good if this could eventually be merged through the block
tree, or at least a shared branch between the SCSI and block trees.

Changes since v1:
 - change the io_opt value for rbd
 - fix a left-over direct assignment to q->limits
 - add a new patch to refactor the ubd interrupt handler
 - use an else if to micro-optimize the ubd error handling
 - also remove disk_set_max_open_zones and disk_set_max_active_zones
 - use SECTOR_SHIFT in one more place
 - various spelling fixes
 - comment formatting fix

Diffstat:
 arch/um/drivers/ubd_kern.c   |   50 +++--
 block/blk-settings.c |  238 +--
 drivers/block/nbd.c  |2 
 drivers/block/rbd.c  |3 
 drivers/block/xen-blkfront.c |4 
 drivers/scsi/sd.c|  222 
 drivers/scsi/sd.h|6 -
 drivers/scsi/sd_zbc.c|   27 ++--
 drivers/scsi/sr.c|   42 ---
 include/linux/blkdev.h   |   52 +++--
 10 files changed, 210 insertions(+), 436 deletions(-)



Re: [PATCH for-4.19 4/9] x86/irq: describe how the interrupt CPU movement works

2024-05-31 Thread Roger Pau Monné
On Fri, May 31, 2024 at 09:06:10AM +0200, Jan Beulich wrote:
> On 29.05.2024 17:28, Roger Pau Monné wrote:
> > On Wed, May 29, 2024 at 03:57:19PM +0200, Jan Beulich wrote:
> >> On 29.05.2024 11:01, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/include/asm/irq.h
> >>> +++ b/xen/arch/x86/include/asm/irq.h
> >>> @@ -28,6 +28,32 @@ typedef struct {
> >>>  
> >>>  struct irq_desc;
> >>>  
> >>> +/*
> >>> + * Xen logic for moving interrupts around CPUs allows manipulating 
> >>> interrupts
> >>> + * that target remote CPUs.  The logic to move an interrupt from CPU(s) 
> >>> is as
> >>> + * follows:
> >>> + *
> >>> + * 1. cpu_mask and vector is copied to old_cpu_mask and old_vector.
> >>> + * 2. New cpu_mask and vector are set, vector is setup at the new 
> >>> destination.
> >>> + * 3. move_in_progress is set.
> >>> + * 4. Interrupt source is updated to target new CPU and vector.
> >>> + * 5. Interrupts arriving at old_cpu_mask are processed normally.
> >>> + * 6. When an interrupt is delivered at the new destination (cpu_mask) 
> >>> as part
> >>> + *of acking the interrupt move_in_progress is cleared and 
> >>> move_cleanup_count
> >>
> >> Nit: A comma after "interrupt" may help reading.
> >>
> >>> + *is set to the weight of online CPUs in old_cpu_mask.
> >>> + *IRQ_MOVE_CLEANUP_VECTOR is sent to all CPUs in old_cpu_mask.
> >>
> >> These last two steps aren't precise enough, compared to what the code does.
> >> old_cpu_mask is first reduced to online CPUs therein. If the result is non-
> >> empty, what you describe is done. If, however, the result is empty, the
> >> vector is released right away (this code may be there just in case, but I
> >> think it shouldn't be omitted here).
> > 
> > I've left that out because I got the impression it made the text more
> > complex to follow (with the extra branch) for no real benefit, but I'm
> > happy to attempt to add it.
> 
> Why "no real benefit"? Isn't the goal to accurately describe what code does
> (in various places)? If the result isn't an accurate description in one
> specific regard, how reliable would the rest be from a reader's perspective?

FWIW, it seemed to me the reduction of old_cpu_mask was (kind of) a
shortcut to what the normal path does, by releasing the vector early
if there are no online CPUs in old_cpu_mask.

Now that you made me look into it, I think after this series the
old_cpu_mask should never contain offline CPUs, as fixup_irqs() will
take care of removing offlined CPUs from old_cpu_mask, and freeing the
vector if the set becomes empty.

I will expand the comment to mention this case, and consider adjusting
it if this series gets merged.

> >>> + * 7. When receiving IRQ_MOVE_CLEANUP_VECTOR CPUs in old_cpu_mask clean 
> >>> the
> >>> + *vector entry and decrease the count in move_cleanup_count.  The 
> >>> CPU that
> >>> + *sets move_cleanup_count to 0 releases the vector.
> >>> + *
> >>> + * Note that when interrupt movement (either move_in_progress or
> >>> + * move_cleanup_count set) is in progress it's not possible to move the
> >>> + * interrupt to yet a different CPU.
> >>> + *
> >>> + * By keeping the vector in the old CPU(s) configured until the 
> >>> interrupt is
> >>> + * acked on the new destination Xen allows draining any pending 
> >>> interrupts at
> >>> + * the old destinations.
> >>> + */
> >>>  struct arch_irq_desc {
> >>>  s16 vector;  /* vector itself is only 8 bits, */
> >>>  s16 old_vector;  /* but we use -1 for unassigned  */
> >>
> >> I take it that it is not a goal to (also) describe under what conditions
> >> an IRQ move may actually be initiated (IRQ_MOVE_PENDING)? I ask not the
> >> least because the 2nd from last paragraph lightly touches that area.
> > 
> > Right, I was mostly focused on moves (forcefully) initiated from
> > fixup_irqs(), which is different from the opportunistic affinity
> > changes signaled by IRQ_MOVE_PENDING.
> > 
> > Not sure whether I want to mention this ahead of the list in a
> > paragraph, or just add it as a step.  Do you have any preference?
> 
> I think ahead might be better. But I also won't insist on it being added.
> Just if you don't, perhaps mention in the description that leaving that
> out is intentional.

No, I'm fine with adding it.

Thanks, Roger.
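For readers following the thread, the acknowledge/cleanup handshake in
steps 6-7 of the quoted comment can be modelled with a toy sketch
(illustrative only; the struct and function names are made up and only
mirror the arch_irq_desc fields, this is not Xen code):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the fields involved in steps 6-7 of the comment. */
struct toy_irq_desc {
    bool move_in_progress;
    unsigned int move_cleanup_count;
    bool old_vector_valid;   /* stands in for the old vector allocation */
};

/* Step 6: the first interrupt acked at the new destination clears
 * move_in_progress and arms the cleanup count with the number of
 * online CPUs in old_cpu_mask. */
static void ack_at_new_destination(struct toy_irq_desc *d,
                                   unsigned int online_old_cpus)
{
    d->move_in_progress = false;
    d->move_cleanup_count = online_old_cpus;
    /* IRQ_MOVE_CLEANUP_VECTOR would now be sent to each old CPU. */
}

/* Step 7: each old CPU handles IRQ_MOVE_CLEANUP_VECTOR; the CPU that
 * drops the count to zero releases the old vector. */
static void handle_move_cleanup(struct toy_irq_desc *d)
{
    if (--d->move_cleanup_count == 0)
        d->old_vector_valid = false;
}
```

The point of the model is that the old vector stays allocated until the
last old CPU has run its cleanup handler, which is what allows pending
interrupts at the old destinations to drain.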



Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway

2024-05-31 Thread Roger Pau Monné
On Fri, May 31, 2024 at 09:02:20AM +0200, Jan Beulich wrote:
> On 29.05.2024 18:14, Roger Pau Monné wrote:
> > On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
> >> On 29.05.2024 17:03, Roger Pau Monné wrote:
> >>> On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
>  On 29.05.2024 11:01, Roger Pau Monne wrote:
> > Due to the current rwlock logic, if the CPU calling get_cpu_maps() does 
> > so from
> > a cpu_hotplug_{begin,done}() region the function will still return 
> > success,
> > because a CPU taking the rwlock in read mode after having taken it in 
> > write
> > mode is allowed.  Such behavior however defeats the purpose of 
> > get_cpu_maps(),
> > as it should always return false when called while a CPU hot{,un}plug 
> > operation
> > is in progress.
> 
>  I'm not sure I can agree with this. The CPU doing said operation ought 
>  to be
>  aware of what it is itself doing. And all other CPUs will get back false 
>  from
>  get_cpu_maps().
> >>>
> >>> Well, the CPU is aware in the context of cpu_{up,down}(), but not in
> >>> the interrupts that might be handled while that operation is in
> >>> progress, see below for a concrete example.
> >>>
> >  Otherwise the logic in send_IPI_mask() for example is wrong,
> > as it could decide to use the shorthand even when a CPU operation is in
> > progress.
> 
>  It's also not becoming clear what's wrong there: As long as a CPU isn't
>  offline enough to not be in cpu_online_map anymore, it may well need to 
>  still
>  be the target of IPIs, and targeting it with a shorthand then is still 
>  fine.
> >>>
> >>> The issue is in the online path: there's a window where the CPU is
> >>> online (and the lapic active), but cpu_online_map hasn't been updated
> >>> yet.  A specific example would be time_calibration() being executed on
> >>> the CPU that is running cpu_up().  That could result in a shorthand
> >>> IPI being used, but the mask in r.cpu_calibration_map not containing
> >>> the CPU that's being brought up online because it's not yet added to
> >>> cpu_online_map.  Then the number of CPUs actually running
> >>> time_calibration_rendezvous_fn won't match the weight of the cpumask
> >>> in r.cpu_calibration_map.
> >>
> >> I see, but maybe only partly. Prior to the CPU having its bit set in
> >> cpu_online_map, can it really take interrupts already? Shouldn't it be
> >> running with IRQs off until later, thus preventing it from making it
> >> into the rendezvous function in the first place? But yes, I can see
> >> how the IRQ (IPI) then being delivered later (once IRQs are enabled)
> >> might cause problems, too.
> > 
> > The interrupt will get set in IRR and handled when interrupts are
> > enabled.
> > 
> >>
> >> Plus, with how the rendezvous function is invoked (via
> >> on_selected_cpus() with the mask copied from cpu_online_map), the
> >> first check in smp_call_function_interrupt() ought to prevent the
> >> function from being called on the CPU being onlined. A problem would
> >> arise though if the IPI arrived later and call_data was already
> >> (partly or fully) overwritten with the next request.
> > 
> > Yeah, there's a small window where the fields in call_data are out of
> > sync.
> > 
>  In any event this would again affect only the CPU leading the CPU 
>  operation,
>  which should clearly know at which point(s) it is okay to send IPIs. Are 
>  we
>  actually sending any IPIs from within CPU-online or CPU-offline paths?
> >>>
> >>> Yes, I've seen the time rendezvous happening while in the middle of a
> >>> hotplug operation, and the CPU coordinating the rendezvous being the
> >>> one doing the CPU hotplug operation, so get_cpu_maps() returning true.
> >>
> >> Right, yet together with ...
> >>
>  Together with the earlier paragraph the critical window would be between 
>  the
>  CPU being taken off of cpu_online_map and the CPU actually going "dead" 
>  (i.e.
>  on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And 
>  even
>  then the question would be what bad, if any, would happen to that CPU if 
>  an
>  IPI was still targeted at it by way of using the shorthand. I'm pretty 
>  sure
>  it runs with IRQs off at that time, so no ordinary IRQ could be 
>  delivered.
> 
> > Adjust the logic in get_cpu_maps() to return false when the CPUs lock is
> already held in write mode by the current CPU, as read_trylock() would
> > otherwise return true.
> >
> > Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already 
> > locked in write mode')
> 
>  I'm puzzled by this as well: Prior to that and the change referenced by 
>  its
>  Fixes: tag, recursive spin locks were used. For the purposes here that's 
>  the
>  same as permitting read locking even when the write lock is already 
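The recursive-read behaviour being debated in this thread can be
illustrated with a toy single-threaded model (illustrative only; Xen's
real cpu_add_remove_lock is a percpu rwlock and these helper names are
made up):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a rwlock that records its write owner. */
struct toy_rwlock {
    int write_owner;        /* CPU id holding it in write mode, or -1 */
    unsigned int readers;
};

/* Old behaviour: read_trylock() succeeds recursively for the CPU that
 * already holds the lock in write mode, so get_cpu_maps() reports
 * "no hotplug underway" to that CPU. */
static bool get_cpu_maps_old(struct toy_rwlock *l, int cpu)
{
    if (l->write_owner != -1 && l->write_owner != cpu)
        return false;       /* fails against a foreign writer */
    l->readers++;
    return true;            /* ...but succeeds for the owner itself */
}

/* Fixed behaviour: refuse whenever any CPU, including the caller,
 * holds the write lock, so callers such as send_IPI_mask() never see
 * success while a hot{,un}plug operation is underway. */
static bool get_cpu_maps_fixed(struct toy_rwlock *l, int cpu)
{
    (void)cpu;
    if (l->write_owner != -1)
        return false;
    l->readers++;
    return true;
}
```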

Re: [PATCH for-4.19 0/3] CI: Misc improvements

2024-05-31 Thread Oleksii K.
On Wed, 2024-05-29 at 15:19 +0100, Andrew Cooper wrote:
> All found while making extensive use of Gitlab CI for the bitops boot
> testing.
> 
> For 4.19.  It's all very low risk, and improves the
> utility/usability of our
> testing.
Release-Acked-by: Oleksii Kurochko 

~ Oleksii
> 
> Andrew Cooper (3):
>   CI: Remove CI_COMMIT_REF_PROTECTED requirement for HW jobs
>   CI: Use a debug build of Xen for the Xilinx HW tests
>   CI: Improve serial handling in qemu-smoke-ppc64le.sh
> 
>  automation/gitlab-ci/test.yaml   |  8 
>  automation/scripts/qemu-smoke-ppc64le.sh | 13 +++--
>  2 files changed, 11 insertions(+), 10 deletions(-)
> 




Re: [PATCH for-4.19 4/9] x86/irq: describe how the interrupt CPU movement works

2024-05-31 Thread Jan Beulich
On 29.05.2024 17:28, Roger Pau Monné wrote:
> On Wed, May 29, 2024 at 03:57:19PM +0200, Jan Beulich wrote:
>> On 29.05.2024 11:01, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/include/asm/irq.h
>>> +++ b/xen/arch/x86/include/asm/irq.h
>>> @@ -28,6 +28,32 @@ typedef struct {
>>>  
>>>  struct irq_desc;
>>>  
>>> +/*
>>> + * Xen logic for moving interrupts around CPUs allows manipulating 
>>> interrupts
>>> + * that target remote CPUs.  The logic to move an interrupt from CPU(s) is 
>>> as
>>> + * follows:
>>> + *
>>> + * 1. cpu_mask and vector is copied to old_cpu_mask and old_vector.
>>> + * 2. New cpu_mask and vector are set, vector is setup at the new 
>>> destination.
>>> + * 3. move_in_progress is set.
>>> + * 4. Interrupt source is updated to target new CPU and vector.
>>> + * 5. Interrupts arriving at old_cpu_mask are processed normally.
>>> + * 6. When an interrupt is delivered at the new destination (cpu_mask) as 
>>> part
>>> + *of acking the interrupt move_in_progress is cleared and 
>>> move_cleanup_count
>>
>> Nit: A comma after "interrupt" may help reading.
>>
>>> + *is set to the weight of online CPUs in old_cpu_mask.
>>> + *IRQ_MOVE_CLEANUP_VECTOR is sent to all CPUs in old_cpu_mask.
>>
>> These last two steps aren't precise enough, compared to what the code does.
>> old_cpu_mask is first reduced to online CPUs therein. If the result is non-
>> empty, what you describe is done. If, however, the result is empty, the
>> vector is released right away (this code may be there just in case, but I
>> think it shouldn't be omitted here).
> 
> I've left that out because I got the impression it made the text more
> complex to follow (with the extra branch) for no real benefit, but I'm
> happy to attempt to add it.

Why "no real benefit"? Isn't the goal to accurately describe what code does
(in various places)? If the result isn't an accurate description in one
specific regard, how reliable would the rest be from a reader's perspective?

>>> + * 7. When receiving IRQ_MOVE_CLEANUP_VECTOR CPUs in old_cpu_mask clean the
>>> + *vector entry and decrease the count in move_cleanup_count.  The CPU 
>>> that
>>> + *sets move_cleanup_count to 0 releases the vector.
>>> + *
>>> + * Note that when interrupt movement (either move_in_progress or
>>> + * move_cleanup_count set) is in progress it's not possible to move the
>>> + * interrupt to yet a different CPU.
>>> + *
>>> + * By keeping the vector in the old CPU(s) configured until the interrupt 
>>> is
>>> + * acked on the new destination Xen allows draining any pending interrupts 
>>> at
>>> + * the old destinations.
>>> + */
>>>  struct arch_irq_desc {
>>>  s16 vector;  /* vector itself is only 8 bits, */
>>>  s16 old_vector;  /* but we use -1 for unassigned  */
>>
>> I take it that it is not a goal to (also) describe under what conditions
>> an IRQ move may actually be initiated (IRQ_MOVE_PENDING)? I ask not the
>> least because the 2nd from last paragraph lightly touches that area.
> 
> Right, I was mostly focused on moves (forcefully) initiated from
> fixup_irqs(), which is different from the opportunistic affinity
> changes signaled by IRQ_MOVE_PENDING.
> 
> Not sure whether I want to mention this ahead of the list in a
> paragraph, or just add it as a step.  Do you have any preference?

I think ahead might be better. But I also won't insist on it being added.
Just if you don't, perhaps mention in the description that leaving that
out is intentional.

Jan



Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway

2024-05-31 Thread Jan Beulich
On 29.05.2024 18:14, Roger Pau Monné wrote:
> On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
>> On 29.05.2024 17:03, Roger Pau Monné wrote:
>>> On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
 On 29.05.2024 11:01, Roger Pau Monne wrote:
> Due to the current rwlock logic, if the CPU calling get_cpu_maps() does 
> so from
> a cpu_hotplug_{begin,done}() region the function will still return 
> success,
> because a CPU taking the rwlock in read mode after having taken it in 
> write
> mode is allowed.  Such behavior however defeats the purpose of 
> get_cpu_maps(),
> as it should always return false when called while a CPU hot{,un}plug 
> operation
> is in progress.

 I'm not sure I can agree with this. The CPU doing said operation ought to 
 be
 aware of what it is itself doing. And all other CPUs will get back false 
 from
 get_cpu_maps().
>>>
>>> Well, the CPU is aware in the context of cpu_{up,down}(), but not in
>>> the interrupts that might be handled while that operation is in
>>> progress, see below for a concrete example.
>>>
>  Otherwise the logic in send_IPI_mask() for example is wrong,
> as it could decide to use the shorthand even when a CPU operation is in
> progress.

 It's also not becoming clear what's wrong there: As long as a CPU isn't
 offline enough to not be in cpu_online_map anymore, it may well need to 
 still
 be the target of IPIs, and targeting it with a shorthand then is still 
 fine.
>>>
>>> The issue is in the online path: there's a window where the CPU is
>>> online (and the lapic active), but cpu_online_map hasn't been updated
>>> yet.  A specific example would be time_calibration() being executed on
>>> the CPU that is running cpu_up().  That could result in a shorthand
>>> IPI being used, but the mask in r.cpu_calibration_map not containing
>>> the CPU that's being brought up online because it's not yet added to
>>> cpu_online_map.  Then the number of CPUs actually running
>>> time_calibration_rendezvous_fn won't match the weight of the cpumask
>>> in r.cpu_calibration_map.
>>
>> I see, but maybe only partly. Prior to the CPU having its bit set in
>> cpu_online_map, can it really take interrupts already? Shouldn't it be
>> running with IRQs off until later, thus preventing it from making it
>> into the rendezvous function in the first place? But yes, I can see
>> how the IRQ (IPI) then being delivered later (once IRQs are enabled)
>> might cause problems, too.
> 
> The interrupt will get set in IRR and handled when interrupts are
> enabled.
> 
>>
>> Plus, with how the rendezvous function is invoked (via
>> on_selected_cpus() with the mask copied from cpu_online_map), the
>> first check in smp_call_function_interrupt() ought to prevent the
>> function from being called on the CPU being onlined. A problem would
>> arise though if the IPI arrived later and call_data was already
>> (partly or fully) overwritten with the next request.
> 
> Yeah, there's a small window where the fields in call_data are out of
> sync.
> 
 In any event this would again affect only the CPU leading the CPU 
 operation,
 which should clearly know at which point(s) it is okay to send IPIs. Are we
 actually sending any IPIs from within CPU-online or CPU-offline paths?
>>>
>>> Yes, I've seen the time rendezvous happening while in the middle of a
>>> hotplug operation, and the CPU coordinating the rendezvous being the
>>> one doing the CPU hotplug operation, so get_cpu_maps() returning true.
>>
>> Right, yet together with ...
>>
 Together with the earlier paragraph the critical window would be between 
 the
 CPU being taken off of cpu_online_map and the CPU actually going "dead" 
 (i.e.
 on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And even
 then the question would be what bad, if any, would happen to that CPU if an
 IPI was still targeted at it by way of using the shorthand. I'm pretty sure
 it runs with IRQs off at that time, so no ordinary IRQ could be delivered.

> Adjust the logic in get_cpu_maps() to return false when the CPUs lock is
> already held in write mode by the current CPU, as read_trylock() would
> otherwise return true.
>
> Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already 
> locked in write mode')

 I'm puzzled by this as well: Prior to that and the change referenced by its
 Fixes: tag, recursive spin locks were used. For the purposes here that's 
 the
 same as permitting read locking even when the write lock is already held by
 the local CPU.
>>>
>>> I see, so the Fixes should be:
>>>
>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Instead, which is the commit that started using get_cpu_maps() in
>>> send_IPI_mask().
>>
>> ... this I then wonder whether it's really only the condition in
>> 

Re: [PATCH v2 06/13] xen/bitops: Implement ffs() in common logic

2024-05-31 Thread Nicola Vetrini

On 2024-05-31 03:14, Stefano Stabellini wrote:

On Fri, 24 May 2024, Andrew Cooper wrote:
Perform constant-folding unconditionally, rather than having it implemented
inconsistently between architectures.

Confirm the expected behaviour with compile time and boot time tests.

For non-constant inputs, use arch_ffs() if provided but fall back to
generic_ffsl() if not.  In particular, RISC-V doesn't have a builtin that
works in all configurations.

For x86, rename ffs() to arch_ffs() and adjust the prototype.

For PPC, __builtin_ctz() is 1/3 of the size of the transform to
generic_flsl().  Drop the definition entirely.  ARM too benefits in the general
case by using __builtin_ctz(), but less dramatically because it was using
optimised asm().

Signed-off-by: Andrew Cooper 


This patch made me realize that we should add __builtin_ctz,
__builtin_constant_p and always_inline to
docs/misra/C-language-toolchain.rst as they don't seem to be currently
documented and they are not part of the C standard.

Patch welcome :-)



I can send a patch for the builtins. I think that for attributes it was 
decided to document the use of the __attribute__ token, rather than 
listing all the attributes used by Xen.



Reviewed-by: Stefano Stabellini 



---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * Fall back to generic, not builtin.
 * Extend the testing with multi-bit values.
 * Use always_inline for x86
 * Defer x86 optimisation to a later change
---
 xen/arch/arm/include/asm/bitops.h |  2 +-
 xen/arch/ppc/include/asm/bitops.h |  2 +-
 xen/arch/x86/include/asm/bitops.h |  3 ++-
 xen/common/Makefile   |  1 +
 xen/common/bitops.c   | 19 +++
 xen/include/xen/bitops.h  | 17 +
 6 files changed, 41 insertions(+), 3 deletions(-)
 create mode 100644 xen/common/bitops.c

diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index ec1cf7b9b323..a88ec2612e16 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
 }


-#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
+#define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
 #define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })


 /**
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index ab692d01717b..5c36a6cc0ce3 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -173,7 +173,7 @@ static inline int __test_and_clear_bit(int nr, volatile void *addr)


 #define flsl(x) generic_flsl(x)
 #define fls(x) generic_flsl(x)
-#define ffs(x) ({ unsigned int t_ = (x); fls(t_ & -t_); })
+#define arch_ffs(x)  ((x) ? 1 + __builtin_ctz(x) : 0)
 #define ffsl(x) ({ unsigned long t_ = (x); flsl(t_ & -t_); })

 /**
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 5a71afbc89d5..122767fc0d10 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -430,7 +430,7 @@ static inline int ffsl(unsigned long x)
 return (int)r+1;
 }

-static inline int ffs(unsigned int x)
+static always_inline unsigned int arch_ffs(unsigned int x)
 {
 int r;

@@ -440,6 +440,7 @@ static inline int ffs(unsigned int x)
   "1:" : "=r" (r) : "rm" (x));
 return r + 1;
 }
+#define arch_ffs arch_ffs

 /**
  * fls - find last bit set
diff --git a/xen/common/Makefile b/xen/common/Makefile
index d512cad5243f..21a4fb4c7166 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_ARGO) += argo.o
 obj-y += bitmap.o
+obj-bin-$(CONFIG_DEBUG) += bitops.init.o
 obj-$(CONFIG_GENERIC_BUG_FRAME) += bug.o
 obj-$(CONFIG_HYPFS_CONFIG) += config_data.o
 obj-$(CONFIG_CORE_PARKING) += core_parking.o
diff --git a/xen/common/bitops.c b/xen/common/bitops.c
new file mode 100644
index ..8c161b8ea7fa
--- /dev/null
+++ b/xen/common/bitops.c
@@ -0,0 +1,19 @@
+#include 
+#include 
+#include 
+
+static void __init test_ffs(void)
+{
+/* unsigned int ffs(unsigned int) */
+CHECK(ffs, 0, 0);
+CHECK(ffs, 1, 1);
+CHECK(ffs, 3, 1);
+CHECK(ffs, 7, 1);
+CHECK(ffs, 6, 2);
+CHECK(ffs, 0x80000000U, 32);
+}
+
+static void __init __constructor test_bitops(void)
+{
+test_ffs();
+}
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index cd405df96180..f7e90a2893a5 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -31,6 +31,23 @@ unsigned int __pure generic_flsl(unsigned long x);

 #include 

+/*
+ * Find First/Last Set bit (all forms).
+ *
+ * Bits are labelled from 1.  Returns 0 if given 0.
+ */
+static always_inline __pure unsigned int 

Re: [PATCH 02/12] block: take io_opt and io_min into account for max_sectors

2024-05-31 Thread Christoph Hellwig
On Fri, May 31, 2024 at 08:48:12AM +0200, Ilya Dryomov wrote:
> We should revert io_opt from opts->alloc_size to objset_bytes (I think
> it's what you meant to say but typoed).

Yes, sorry.

> How do you want to handle it?  I can put together a patch, send it to
> ceph-devel and it will be picked by linux-next sometime next week.  Then
> this patch would grow a contextual conflict and the description would
> need to be updated to not mention a behavior change for rbd anymore.

If that works for you I'd much prefer to include it with this series.
I'd be happy to write it myself.




Re: [PATCH 02/12] block: take io_opt and io_min into account for max_sectors

2024-05-31 Thread Ilya Dryomov
On Fri, May 31, 2024 at 7:54 AM Christoph Hellwig  wrote:
>
> On Thu, May 30, 2024 at 09:48:06PM +0200, Ilya Dryomov wrote:
> > For rbd, this change effectively lowers max_sectors from 4M to 64K or
> > less and that is definitely not desirable.  From previous interactions
> > with users we want max_sectors to match max_hw_sectors -- this has come
> > up quite a few times over the years.  Some people just aren't aware
> > of the soft cap and the fact that it's adjustable and get frustrated
> > over the time poured into debugging their iostat numbers for workloads
> > that can send object (set) size I/Os.
> >
> > Looking at the git history, we lowered io_opt from objset_bytes to
> > opts->alloc_size in commit [1], but I guess io_opt was lowered just
> > along for the ride.  What that commit was concerned with is really
> > discard_granularity and to a smaller extent io_min.
> >
> > How much difference does io_opt make in the real world?  If what rbd
> > does stands in the way of a tree-wide cleanup, I would much rather bump
> > io_opt back to objset_bytes (i.e. what max_user_sectors is currently
> > set to).
>
> The only existing in-kernel usage is to set the readahead size.
> Based on your comments I seems like we should revert io_opt to
> objset to ->alloc_size in a prep patch?

We should revert io_opt from opts->alloc_size to objset_bytes (I think
it's what you meant to say but typoed).

How do you want to handle it?  I can put together a patch, send it to
ceph-devel and it will be picked by linux-next sometime next week.  Then
this patch would grow a contextual conflict and the description would
need to be updated to not mention a behavior change for rbd anymore.

Thanks,

Ilya



Re: [PATCH 1/2] arch/irq: Make irq_ack_none() mandatory

2024-05-31 Thread Jan Beulich
On 30.05.2024 20:40, Andrew Cooper wrote:
> Any non-stub implementation of these is going to have to do something here.

For whatever definition of "something", seeing ...

> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -31,12 +31,12 @@ struct irq_guest
>  unsigned int virq;
>  };
>  
> -static void ack_none(struct irq_desc *irq)
> +void irq_ack_none(struct irq_desc *irq)
>  {
>  printk("unexpected IRQ trap at irq %02x\n", irq->irq);
>  }

... this, which - perhaps apart from the word "trap" - is entirely
Arm-independent, and could hence quite well live in a common code fallback
implementation. Nevertheless with patch 2 clearly being an improvement,
both patches:
Reviewed-by: Jan Beulich 

Jan



Re: [PATCH v12 5/8] xen/riscv: add minimal stuff to mm.h to build full Xen

2024-05-31 Thread Jan Beulich
On 30.05.2024 19:23, Andrew Cooper wrote:
> On 29/05/2024 8:55 pm, Oleksii Kurochko wrote:
>> Signed-off-by: Oleksii Kurochko 
>> Acked-by: Jan Beulich 
> 
> This patch looks like it can go in independently?  Or does it depend on
> having bitops.h working in practice?

The latter, iirc. In fact I had already tried at least twice to pull this ahead.

Jan



Re: [PATCH v12 8/8] xen/README: add compiler and binutils versions for RISC-V64

2024-05-31 Thread Jan Beulich
On 30.05.2024 21:52, Andrew Cooper wrote:
> On 29/05/2024 8:55 pm, Oleksii Kurochko wrote:
>> diff --git a/README b/README
>> index c8a108449e..30da5ff9c0 100644
>> --- a/README
>> +++ b/README
>> @@ -48,6 +48,10 @@ provided by your OS distributor:
>>- For ARM 64-bit:
>>  - GCC 5.1 or later
>>  - GNU Binutils 2.24 or later
>> +  - For RISC-V 64-bit:
>> +- GCC 12.2 or later
>> +- GNU Binutils 2.39 or later
> 
> I would like to petition for GCC 10 and Binutils 2.35.
> 
> These are the versions provided as cross-compilers by Debian, and
> therefore are the versions I would prefer to do smoke testing with.

See why I asked to amend the specified versions by a softening sentence that
you (only now) said you dislike? The "this is what we use in CI" makes it a
very random choice, entirely unrelated to the compiler's abilities.

Jan