Re: [PATCH] ALSA: xen-front: refactor deprecated strncpy

2023-07-27 Thread Elliott Mitchell
On Thu, Jul 27, 2023 at 09:53:24PM +, Justin Stitt wrote:
> Technically, my patch yields subtly different behavior. The original
> implementation with `strncpy` would fill the entire destination buffer
> with null bytes [3] while `strscpy` will leave the junk, uninitialized
> bytes trailing after the _mandatory_ NUL-termination. So, if somehow
> `pcm->name` or `card->driver/shortname/longname` require this
> NUL-padding behavior then `strscpy_pad` should be used. My
> interpretation, though, is that the aforementioned fields are just fine
> as NUL-terminated strings. Please correct my assumptions if needed and
> I'll send in a v2.

"uninitialized bytes" => "leak of sensitive information" => "security hole"

One hopes the uninitialized bytes don't contain sensitive information, but
that is the start of the chain.  One can hope the VM on the other end is
friendly, but that isn't something to rely on.

I'm not in charge of any of the appropriate subsystems; I just happened
to look at this message on a mailing list I'm on.  Could be the
maintainers will find this acceptable.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





[PATCH] ALSA: xen-front: refactor deprecated strncpy

2023-07-27 Thread Justin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1].

A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on its destination buffer argument which is
_not_ always the case for `strncpy`!

It should be noted that, in this case, the destination buffer has a
length strictly greater than the source string. Moreover, the source
string is NUL-terminated (and so is the destination) which means there
was no real bug happening here. Nonetheless, this patch would get us one
step closer to eliminating the `strncpy` API in the kernel, as its use
is too ambiguous. We need to favor less ambiguous replacements such as:
strscpy, strscpy_pad, strtomem and strtomem_pad (amongst others).

Technically, my patch yields subtly different behavior. The original
implementation with `strncpy` would fill the entire destination buffer
with null bytes [3] while `strscpy` will leave the junk, uninitialized
bytes trailing after the _mandatory_ NUL-termination. So, if somehow
`pcm->name` or `card->driver/shortname/longname` require this
NUL-padding behavior then `strscpy_pad` should be used. My
interpretation, though, is that the aforementioned fields are just fine
as NUL-terminated strings. Please correct my assumptions if needed and
I'll send in a v2.

[1]: 
www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
[2]: manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
[3]: https://linux.die.net/man/3/strncpy

Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt 
---
 sound/xen/xen_snd_front_alsa.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/xen/xen_snd_front_alsa.c b/sound/xen/xen_snd_front_alsa.c
index db917453a473..7a3dfce97c15 100644
--- a/sound/xen/xen_snd_front_alsa.c
+++ b/sound/xen/xen_snd_front_alsa.c
@@ -783,7 +783,7 @@ static int new_pcm_instance(struct xen_snd_front_card_info *card_info,
pcm->info_flags = 0;
/* we want to handle all PCM operations in non-atomic context */
pcm->nonatomic = true;
-   strncpy(pcm->name, "Virtual card PCM", sizeof(pcm->name));
+   strscpy(pcm->name, "Virtual card PCM", sizeof(pcm->name));
 
if (instance_cfg->num_streams_pb)
snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK,
@@ -835,9 +835,9 @@ int xen_snd_front_alsa_init(struct xen_snd_front_info *front_info)
goto fail;
}
 
-   strncpy(card->driver, XENSND_DRIVER_NAME, sizeof(card->driver));
-   strncpy(card->shortname, cfg->name_short, sizeof(card->shortname));
-   strncpy(card->longname, cfg->name_long, sizeof(card->longname));
+   strscpy(card->driver, XENSND_DRIVER_NAME, sizeof(card->driver));
+   strscpy(card->shortname, cfg->name_short, sizeof(card->shortname));
+   strscpy(card->longname, cfg->name_long, sizeof(card->longname));
 
ret = snd_card_register(card);
if (ret < 0)

---
base-commit: 57012c57536f8814dec92e74197ee96c3498d24e
change-id: 20230727-sound-xen-398c9927b3d8

Best regards,
--
Justin Stitt 




[xen-unstable-smoke test] 182050: tolerable all pass - PUSHED

2023-07-27 Thread osstest service owner
flight 182050 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182050/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  15d327e3d6924aabf991246353068eca07654f0e
baseline version:
 xen  3e55f826f4dc0dfd8a67525181c68189243539cc

Last test of basis   182049  2023-07-27 22:04:04 Z0 days
Testing same since   182050  2023-07-28 01:02:02 Z0 days1 attempts


People who touched revisions under test:
  Bertrand Marquis 
  Federico Serafini 
  Julien Grall 
  Luca Fancellu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   3e55f826f4..15d327e3d6  15d327e3d6924aabf991246353068eca07654f0e -> smoke



RE: [PATCH v3 3/4] iommu/vtd: rename io_apic_read_remap_rte() local variable

2023-07-27 Thread Tian, Kevin
> From: Roger Pau Monne 
> Sent: Wednesday, July 26, 2023 8:55 PM
> 
> Preparatory change to unify the IO-APIC pin variable name between
> io_apic_read_remap_rte() and amd_iommu_ioapic_update_ire(), so that
> the local variable can be made a function parameter with the same name
> across vendors.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Kevin Tian 


[xen-unstable-smoke test] 182049: tolerable all pass - PUSHED

2023-07-27 Thread osstest service owner
flight 182049 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182049/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  3e55f826f4dc0dfd8a67525181c68189243539cc
baseline version:
 xen  0400946d532ec41bc5d0bedd3e9ef036308ce623

Last test of basis   182043  2023-07-27 13:00:27 Z0 days
Testing same since   182049  2023-07-27 22:04:04 Z0 days1 attempts


People who touched revisions under test:
  Juergen Gross 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   0400946d53..3e55f826f4  3e55f826f4dc0dfd8a67525181c68189243539cc -> smoke



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-27 Thread Volodymyr Babchuk

Hi Roger,

Roger Pau Monné  writes:

> On Thu, Jul 27, 2023 at 12:56:54AM +, Volodymyr Babchuk wrote:
>> Hi Roger,
>> 
>> Roger Pau Monné  writes:
>> 
>> > On Wed, Jul 26, 2023 at 01:17:58AM +, Volodymyr Babchuk wrote:
>> >> 
>> >> Hi Roger,
>> >> 
>> >> Roger Pau Monné  writes:
>> >> 
>> >> > On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
>> >> >> From: Oleksandr Andrushchenko 
>> >> >> @@ -498,6 +537,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
>> >> >> unsigned int size,
>> >> >>  ASSERT(data_offset < size);
>> >> >>  }
>> >> >>  spin_unlock(&pdev->vpci->lock);
>> >> >> +unlock_locks(d);
>> >> >
>> >> > There's one issue here, some handlers will call pcidevs_lock(), which
>> >> > will result in a lock order inversion, as in the previous patch we
>> >> > agreed that the locking order was pcidevs_lock first, d->pci_lock
>> >> > after.
>> >> >
>> >> > For example the MSI control_write() handler will call
>> >> > vpci_msi_arch_enable() which takes the pcidevs lock.  I think I will
>> >> > have to look into using a dedicated lock for MSI related handling, as
>> >> > that's the only place where I think we have this pattern of taking the
>> >> > pcidevs_lock after the d->pci_lock.
>> >> 
>> >> I'll mention this in the commit message. Is there something else that I
>> >> should do right now?
>> >
>> > Well, I don't think we want to commit this as-is with a known lock
>> > inversion.
>> >
>> > The functions that require the pcidevs lock are:
>> >
>> > pt_irq_{create,destroy}_bind()
>> > unmap_domain_pirq()
>> >
>> > AFAICT those functions require the lock in order to assert that the
>> > underlying device doesn't go away, as they do also use d->event_lock
>> > in order to get exclusive access to the data fields.  Please double
>> > check that I'm not mistaken.
>> 
>> You are right, all three functions do not access any PCI state
>> directly. However...
>> 
>> > If that's accurate you will have to check the call tree that spawns
>> > from those functions in order to modify the asserts to check for
>> > either the pcidevs or the per-domain pci_list lock being taken.
>> 
>> ... I checked calls for PT_IRQ_TYPE_MSI case, there is only call that
>> bothers me: hvm_pi_update_irte(), which calls IO-MMU code via
>> vmx_pi_update_irte():
>> 
>> amd_iommu_msi_msg_update_ire() or msi_msg_write_remap_rte().
>
> That path is only for VT-d, so strictly speaking you only need to worry
> about msi_msg_write_remap_rte().
>
> msi_msg_write_remap_rte() does take the IOMMU intremap lock.
>
> There are also existing callers of iommu_update_ire_from_msi() that
> call the functions without the pcidevs locked.  See
> hpet_msi_set_affinity() for example.

Thank you for clarifying this.

I have found another call path which causes trouble:
__pci_enable_msi[x] is called from pci_enable_msi() via vMSI, via
physdev_map_irq and also directly from the ns16550 driver.

__pci_enable_msi[x] accesses pdev fields, mostly pdev->msix or
pdev->msi_list, so it looks like we need pcidevs_lock(), as pdev fields
are not protected by d->pci_lock...

-- 
WBR, Volodymyr

Re: [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Damien Le Moal
On 7/28/23 07:59, Dave Chinner wrote:
> On Thu, Jul 27, 2023 at 07:20:46PM +0900, Damien Le Moal wrote:
>> On 7/27/23 17:55, Qi Zheng wrote:
>   goto err;
>   }
>   +    zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
> +    zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
> +    zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
> +    zmd->mblk_shrinker->private_data = zmd;
> +
> +    shrinker_register(zmd->mblk_shrinker);

 I fail to see how this new shrinker API is better... Why isn't there a
 shrinker_alloc_and_register() function ? That would avoid adding all this 
 code
 all over the place as the new API call would be very similar to the current
 shrinker_register() call with static allocation.
>>>
>>> In some registration scenarios, memory needs to be allocated in advance.
>>> So we continue to use the previous prealloc/register_prepared()
>>> algorithm. The shrinker_alloc_and_register() is just a helper function
>>> that combines the two, and this increases the number of APIs that
>>> shrinker exposes to the outside, so I choose not to add this helper.
>>
>> And that results in more code in many places instead of less code + a simple
>> inline helper in the shrinker header file...
> 
> It's not just a "simple helper" - it's a function that has to take 6
> or 7 parameters with a return value that must be checked and
> handled.
> 
> This was done in the first versions of the patch set - the amount of
> code in each caller does not go down and, IMO, was much harder to
> read and determine "this is obviously correct" than what we have
> now.
> 
>> So not adding that super simple
>> helper is not exactly the best choice in my opinion.
> 
> Each to their own - I much prefer the existing style/API over having
> to go look up a helper function every time I want to check some
> random shrinker has been set up correctly

OK. All fair points.


-- 
Damien Le Moal
Western Digital Research




Re: [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Dave Chinner
On Thu, Jul 27, 2023 at 07:20:46PM +0900, Damien Le Moal wrote:
> On 7/27/23 17:55, Qi Zheng wrote:
> >>>   goto err;
> >>>   }
> >>>   +    zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
> >>> +    zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
> >>> +    zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
> >>> +    zmd->mblk_shrinker->private_data = zmd;
> >>> +
> >>> +    shrinker_register(zmd->mblk_shrinker);
> >>
> >> I fail to see how this new shrinker API is better... Why isn't there a
> >> shrinker_alloc_and_register() function ? That would avoid adding all this 
> >> code
> >> all over the place as the new API call would be very similar to the current
> >> shrinker_register() call with static allocation.
> > 
> > In some registration scenarios, memory needs to be allocated in advance.
> > So we continue to use the previous prealloc/register_prepared()
> > algorithm. The shrinker_alloc_and_register() is just a helper function
> > that combines the two, and this increases the number of APIs that
> > shrinker exposes to the outside, so I choose not to add this helper.
> 
> And that results in more code in many places instead of less code + a simple
> inline helper in the shrinker header file...

It's not just a "simple helper" - it's a function that has to take 6
or 7 parameters with a return value that must be checked and
handled.

This was done in the first versions of the patch set - the amount of
code in each caller does not go down and, IMO, was much harder to
read and determine "this is obviously correct" than what we have
now.

> So not adding that super simple
> helper is not exactly the best choice in my opinion.

Each to their own - I much prefer the existing style/API over having
to go look up a helper function every time I want to check some
random shrinker has been set up correctly.

-Dave.
-- 
Dave Chinner
da...@fromorbit.com



Re: [XEN PATCH v3] xen/spinlock: mechanically rename parameter name 'debug'

2023-07-27 Thread Julien Grall

Hi,

On 27/07/2023 20:35, Stefano Stabellini wrote:

On Thu, 27 Jul 2023, Jan Beulich wrote:

On 26.07.2023 23:49, Stefano Stabellini wrote:

On Wed, 26 Jul 2023, Jan Beulich wrote:

On 26.07.2023 08:42, Nicola Vetrini wrote:

On 26/07/23 08:34, Jan Beulich wrote:

On 25.07.2023 22:45, Nicola Vetrini wrote:

Rule 5.3 has the following headline:
"An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope"

To avoid any confusion resulting from the parameter 'debug'
hiding the homonymous function declared at
'xen/arch/x86/include/asm/processor.h:428'
the rename of parameters s/debug/lkdbg/ is performed.

Signed-off-by: Nicola Vetrini 
---
Changes in v2:
- s/dbg/lkdbg/
Changes in v3:
- Added missing renames for consistency


Hmm, you asked whether to send v3, but then you didn't wait for an
answer. So to repeat what I said there: I'd prefer if we could first
settle whether to rename the conflicting x86 symbol.



Stefano replied asking for a v3 [1] before I had a chance to read your
message this morning.


Right, sorry, I spotted his reply only after seeing the v3.


For what is worth I prefer the current implementation compared to
renaming debug()


I don't. My replacement name suggestions were only "just in case"; I
don't really like them.


Understood :-)

How would you like to proceed?

1. we commit this patch as is
2. we wait for a third opinion from another maintainer
3. we find a new name for the variable
4. we change debug() instead
IMO, the name debug() is quite generic and it is not obvious that the 
function is a trap handler. So I think renaming debug() is the right way 
to go.


Cheers,

--
Julien Grall



Re: [PATCH v2] xen/arm32: head: Widen the use of the temporary mapping

2023-07-27 Thread Julien Grall

Hi,

On 24/07/2023 18:15, Luca Fancellu wrote:




On 24 Jul 2023, at 11:14, Julien Grall  wrote:

From: Julien Grall 

At the moment, the temporary mapping is only used when the virtual
runtime region of Xen is clashing with the physical region.

In follow-up patches, we will rework how secondary CPU bring-up works
and it will be convenient to use the fixmap area for accessing
the root page-table (it is per-cpu).

Rework the code to use temporary mapping when the Xen physical address
is not overlapping with the temporary mapping.

This also has the advantage to simplify the logic to identity map
Xen.

Signed-off-by: Julien Grall 


Hi Julien,

Seems good to me. I’ve also tested on QEMU and FVP, creating/destroying guests,
with no issues so far.

Reviewed-by: Luca Fancellu 
Tested-by: Luca Fancellu 


Thanks. I have committed with Henry's tag. But I also realized that I 
forgot to strip the changelog :/.


Cheers,

--
Julien Grall



Re: [XEN PATCH v3] device_tree: address violations of MISRA C:2012 Rules 8.2 and 8.3

2023-07-27 Thread Julien Grall

Hi,

On 25/07/2023 20:23, Stefano Stabellini wrote:

On Tue, 25 Jul 2023, Federico Serafini wrote:

Give a name to unnamed parameters thus addressing violations of
MISRA C:2012 Rule 8.2 ("Function types shall be in prototype form with
named parameters").
Keep consistency between parameter names and types used in function
declarations and the ones used in the corresponding function
definitions, thus addressing violations of MISRA C:2012 Rule 8.3
("All declarations of an object or function shall use the same names
and type qualifiers").

No functional changes.

Signed-off-by: Federico Serafini 


Reviewed-by: Stefano Stabellini 


I have committed the patch.

Cheers,

--
Julien Grall



Re: [PATCH v3] xen/arm: Move TEE mediators in a kconfig submenu

2023-07-27 Thread Julien Grall

Hi,

On 25/07/2023 16:45, Julien Grall wrote:

Hi Bertrand,

On 25/07/2023 16:42, Bertrand Marquis wrote:

Rework TEE mediators to put them under a submenu in Kconfig.
The submenu is only visible if UNSUPPORTED is activated as all currently
existing mediators are UNSUPPORTED.

While there, rework the configuration a bit so that the OP-TEE and FF-A
mediators are selecting the generic TEE interface instead of depending
on it.
Make the TEE option hidden as it is of no interest for anyone to select
it without one of the mediators, so having them select it instead should
be enough.

Signed-off-by: Bertrand Marquis 


Reviewed-by: Julien Grall 


I have committed it.

Cheers,

--
Julien Grall



[libvirt test] 182030: tolerable all pass - PUSHED

2023-07-27 Thread osstest service owner
flight 182030 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182030/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182021
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182021
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182021
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  8e958c1644185eeaece0a724d9ff8674c4edce38
baseline version:
 libvirt  ea4c67f56769b292fdb03dc3e626bde22111c79d

Last test of basis   182021  2023-07-26 10:25:25 Z1 days
Testing same since   182030  2023-07-27 04:18:55 Z0 days1 attempts


People who touched revisions under test:
  Fedora Weblate Translation 
  Jiri Denemark 
  Weblate 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Re: [PATCH v3 17/25] tools/xenstore: rework struct xs_tdb_record_hdr

2023-07-27 Thread Julien Grall

Hi Juergen,

On 24/07/2023 12:02, Juergen Gross wrote:

Struct xs_tdb_record_hdr is used for nodes stored in the data base.
When working on a node, struct node is being used, which is including
the same information as struct xs_tdb_record_hdr, but in a different
format. Rework struct xs_tdb_record_hdr in order to prepare including
it in struct node.

Do the following modifications:

- move its definition to xenstored_core.h, as the reason to put it into
   utils.h are no longer existing

- rename it to struct node_hdr, as the "tdb" in its name has only
   historical reasons

- replace the empty permission array at the end with a comment about
   the layout of data in the data base (concatenation of header,
   permissions, node contents, and children list)

- use narrower types for num_perms and datalen, as those are naturally
   limited to XENSTORE_PAYLOAD_MAX (childlen is different here, as it is
   in theory basically unlimited)


The assumption is that XENSTORE_PAYLOAD_MAX will never change and will
always apply to all connections. I am aware of at least one downstream
use where this is not the case.


I am happy to use narrower types, but I would at least like some checks 
in Xenstore to ensure the limits are not reached.




Signed-off-by: Juergen Gross 
---
V2:
- new patch
---
  tools/xenstore/utils.h |  9 ---
  tools/xenstore/xenstored_core.c| 35 +++---
  tools/xenstore/xenstored_core.h| 20 ++-
  tools/xenstore/xenstored_transaction.c |  6 ++---
  4 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/tools/xenstore/utils.h b/tools/xenstore/utils.h
index 028ecb9d7a..405d662ea2 100644
--- a/tools/xenstore/utils.h
+++ b/tools/xenstore/utils.h
@@ -9,15 +9,6 @@
  
  #include "xenstore_lib.h"
  
-/* Header of the node record in tdb. */

-struct xs_tdb_record_hdr {
-   uint64_t generation;
-   uint32_t num_perms;
-   uint32_t datalen;
-   uint32_t childlen;
-   struct xs_permissions perms[0];
-};
-
  /* Is A == B ? */
  #define streq(a,b) (strcmp((a),(b)) == 0)
  
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c

index 1f5f118f1c..86b7c9bf36 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -555,9 +555,9 @@ static void initialize_fds(int *p_sock_pollfd_idx, int 
*ptimeout)
}
  }
  
-const struct xs_tdb_record_hdr *db_fetch(const char *db_name, size_t *size)

+const struct node_hdr *db_fetch(const char *db_name, size_t *size)
  {
-   struct xs_tdb_record_hdr *hdr;
+   struct node_hdr *hdr;
  
  	hdr = hashtable_search(nodes, db_name);

if (!hdr) {
@@ -565,7 +565,7 @@ const struct xs_tdb_record_hdr *db_fetch(const char 
*db_name, size_t *size)
return NULL;
}
  
-	*size = sizeof(*hdr) + hdr->num_perms * sizeof(hdr->perms[0]) +

+   *size = sizeof(*hdr) + hdr->num_perms * sizeof(struct xs_permissions) +
hdr->datalen + hdr->childlen;
  
  	trace_tdb("read %s size %zu\n", db_name, *size + strlen(db_name));

@@ -573,10 +573,15 @@ const struct xs_tdb_record_hdr *db_fetch(const char 
*db_name, size_t *size)
return hdr;
  }
  
+static struct xs_permissions *perms_from_node_hdr(const struct node_hdr *hdr)


I don't much like casting away the const here. Looking at the code, 
most of the users seem not to modify the value. So how about returning 
const here and open-coding or casting away the const in the caller (with 
a proper explanation)?



+{
+   return (struct xs_permissions *)(hdr + 1);
+}
+
  static void get_acc_data(const char *name, struct node_account_data *acc)
  {
size_t size;
-   const struct xs_tdb_record_hdr *hdr;
+   const struct node_hdr *hdr;
  
  	if (acc->memory < 0) {

hdr = db_fetch(name, &size);
@@ -585,7 +590,7 @@ static void get_acc_data(const char *name, struct 
node_account_data *acc)
acc->memory = 0;
} else {
acc->memory = size;
-   acc->domid = hdr->perms[0].id;
+   acc->domid = perms_from_node_hdr(hdr)->id;
}
}
  }
@@ -606,7 +611,7 @@ int db_write(struct connection *conn, const char *db_name, 
const void *data,
 size_t size, struct node_account_data *acc,
 enum write_node_mode mode, bool no_quota_check)
  {
-   const struct xs_tdb_record_hdr *hdr = data;
+   const struct node_hdr *hdr = data;
struct node_account_data old_acc = {};
unsigned int old_domid, new_domid;
size_t name_len = strlen(db_name);
@@ -620,7 +625,7 @@ int db_write(struct connection *conn, const char *db_name, 
const void *data,
  
get_acc_data(db_name, &old_acc);

old_domid = get_acc_domid(conn, db_name, old_acc.domid);
-   new_domid = get_acc_domid(conn, db_name, hdr->perms[0].id);
+   new_domid = get_acc_domid(conn, db_name, 

Re: [XEN PATCH] xen: change parameter name in monitor_domctl() declaration

2023-07-27 Thread Luca Fancellu



> On 27 Jul 2023, at 16:35, Federico Serafini  
> wrote:
> 
> Change parameter name in monitor_domctl() declaration for
> consistency with the corresponding definition.
> This addresses a violation of MISRA C:2012 Rule 8.3: "All declarations
> of an object or function shall use the same names and type qualifiers".
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Luca Fancellu 




Re: [PATCH v3 16/25] tools/xenstore: move copying of node data out of db_fetch()

2023-07-27 Thread Julien Grall

Hi Juergen,

On 24/07/2023 12:02, Juergen Gross wrote:

Today the node data is copied in db_fetch() on each data base read in
order to avoid accidental data base modifications when working on a
node.

read_node() is the only caller of db_fetch() which isn't freeing the
returned data area immediately after using it. The other callers don't
modify the returned data, so they don't need the data to be copied.

Move copying of the data into read_node(), resulting in a speedup of
the other callers due to no memory allocation and no copying being
needed anymore.

This allows to let db_fetch() return a pointer to const data.

As db_fetch() can't return any error other than ENOENT now, error
handling for the callers can be simplified.

Signed-off-by: Juergen Gross 
---
V2:
- new patch
V3:
- modify return type of db_fetch() to return a pointer to const
   (Julien Grall)
- drop stale comment (Julien Grall)
- fix transaction handling
---
  tools/xenstore/xenstored_core.c| 45 +++---
  tools/xenstore/xenstored_core.h|  2 +-
  tools/xenstore/xenstored_transaction.c | 23 +
  3 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 9f88914149..1f5f118f1c 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -555,10 +555,9 @@ static void initialize_fds(int *p_sock_pollfd_idx, int 
*ptimeout)
}
  }
  
-struct xs_tdb_record_hdr *db_fetch(const char *db_name, size_t *size)

+const struct xs_tdb_record_hdr *db_fetch(const char *db_name, size_t *size)
  {
-   const struct xs_tdb_record_hdr *hdr;
-   struct xs_tdb_record_hdr *p;
+   struct xs_tdb_record_hdr *hdr;


Can't 'hdr' stay const?

  
  	hdr = hashtable_search(nodes, db_name);

if (!hdr) {


[...]


@@ -388,14 +385,26 @@ static int finalize_transaction(struct connection *conn,
if (i->ta_node) {
hdr = db_fetch(i->trans_name, &size);
if (hdr) {
+   /*
+* Delete transaction entry and write it as
+* no-TA entry. As we only hold a reference
+* to the data, increment its ref count, then
+* delete it from the DB. Now we own it and can
+* drop the const attribute for changing the


It is not great, but I understand this is another necessary evil. So I 
am OK with this cast-away of const. It is also well documented.



+* generation count.
+*/
enum write_node_mode mode;
+   struct xs_tdb_record_hdr *own;
  
-hdr->generation = ++generation;

+   talloc_increase_ref_count(hdr);
+   db_delete(conn, i->trans_name, NULL);
+
+   own = (struct xs_tdb_record_hdr *)hdr;
+   own->generation = ++generation;
mode = (i->generation == NO_GENERATION)
   ? NODE_CREATE : NODE_MODIFY;
-   *is_corrupt |= db_write(conn, i->node, hdr,
+   *is_corrupt |= db_write(conn, i->node, own,
size, NULL, mode, true);
-   db_delete(conn, i->trans_name, NULL);
} else {
*is_corrupt = true;
}


Cheers,

--
Julien Grall



Re: [PATCH v3 14/25] tools/xenstore: change talloc_free() to take a const pointer

2023-07-27 Thread Julien Grall

Hi Juergen,

On 24/07/2023 12:02, Juergen Gross wrote:

With talloc_free() and related functions not taking a ponter to const


typo: s/ponter/pointer/


it is tedious to use the const attribute for talloc()-ed memory in
many cases.

Change the related prototypes to use "const void *" instead of
"void *".

Signed-off-by: Juergen Gross 
---
V3:
- new patch
---
  tools/xenstore/talloc.c | 8 
  tools/xenstore/talloc.h | 4 ++--
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/xenstore/talloc.c b/tools/xenstore/talloc.c
index 23c3a23b19..4f08dbec59 100644
--- a/tools/xenstore/talloc.c
+++ b/tools/xenstore/talloc.c
@@ -319,7 +319,7 @@ static int talloc_unreference(const void *context, const void *ptr)
remove a specific parent context from a pointer. This is a more
controlled varient of talloc_free()
  */
-int talloc_unlink(const void *context, void *ptr)
+int talloc_unlink(const void *context, const void *ptr)
  {
struct talloc_chunk *tc_p, *new_p;
void *new_parent;
@@ -499,7 +499,7 @@ void *talloc_init(const char *fmt, ...)
should probably not be used in new code. It's in here to keep the talloc
code consistent across Samba 3 and 4.
  */
-static void talloc_free_children(void *ptr)
+static void talloc_free_children(const void *ptr)
  {
struct talloc_chunk *tc;
  
@@ -539,7 +539,7 @@ static void talloc_free_children(void *ptr)

 will not be freed if the ref_count is > 1 or the destructor (if
 any) returns non-zero
  */
-int talloc_free(void *ptr)
+int talloc_free(const void *ptr)
  {
int saved_errno = errno;
struct talloc_chunk *tc;
@@ -571,7 +571,7 @@ int talloc_free(void *ptr)
goto err;
}
tc->destructor = (talloc_destructor_t)-1;
-   if (d(ptr) == -1) {
+   if (d((void *)ptr) == -1) {


AFAICT, you can't propagate the const because the destructor may need to 
modify the content. I guess this is a necessary evil here but it would 
be good to document it.


Cheers,

--
Julien Grall



Re: [PATCH v3 11/25] tools/xenstore: drop use of tdb

2023-07-27 Thread Julien Grall

Hi Juergen,

On 24/07/2023 12:02, Juergen Gross wrote:

Today all Xenstore nodes are stored in a TDB data base. This data base
has several disadvantages:

- It is using a fixed sized hash table, resulting in high memory
   overhead for small installations with only very few VMs, and a rather
   large performance hit for systems with lots of VMs due to many
   collisions.
   The hash table size today is 7919 entries. This means that e.g. in
   case of a simple desktop use case with 2 or 3 VMs probably far less
   than 10% of the entries will be used (assuming roughly 100 nodes per
   VM). OTOH a setup on a large server with 500 VMs would result in
   heavy conflicts in the hash list with 5-10 nodes per hash table entry.

- TDB is using a single large memory area for storing the nodes. It
   only ever increases this area and will never shrink it afterwards.
   This will result in more memory usage than necessary after a peak of
   Xenstore usage.

- Xenstore is only single-threaded, while TDB is designed to be fit
   for multi-threaded use cases, resulting in much higher code
   complexity than needed.

- Special use cases of Xenstore are not possible to implement with TDB
   in an effective way, while an implementation of a data base tailored
   for Xenstore could simplify some handling (e.g. transactions) a lot.

So drop using TDB and store the nodes directly in memory making them
easily accessible. Use a hash-based lookup mechanism for fast lookup
of nodes by their full path.

For now only replace TDB keeping the current access functions.

Signed-off-by: Juergen Gross 


Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [PATCH v3 00/25] tools/xenstore: drop TDB

2023-07-27 Thread Julien Grall

Hi Juergen,

On 24/07/2023 12:02, Juergen Gross wrote:

Juergen Gross (25):
   tools/xenstore: explicitly specify create or modify for tdb_store()
   tools/xenstore: replace key in struct node with data base name
   tools/xenstore: let transaction_prepend() return the name for access
   tools/xenstore: rename do_tdb_delete() and change parameter type
   tools/xenstore: rename do_tdb_write() and change parameter type
   tools/xenstore: switch get_acc_data() to use name instead of key
   tools/xenstore: add wrapper for tdb_fetch()


I have committed up to this patch.

Cheers,

--
Julien Grall



Re: [PATCH v3 10/25] tools/xenstore: add hashtable_replace() function

2023-07-27 Thread Julien Grall

Hi Juergen,

On 24/07/2023 12:02, Juergen Gross wrote:

For an effective way to replace a hashtable entry add a new function
hashtable_replace().

This is in preparation to replace TDB with a more simple data storage.

Signed-off-by: Juergen Gross 
---
V3:
- fix commit message (Julien Grall)
- move unrelated change to previous patch (Julien Grall)
- make value parameter const
---
  tools/xenstore/hashtable.c | 19 +++
  tools/xenstore/hashtable.h | 16 
  2 files changed, 35 insertions(+)

diff --git a/tools/xenstore/hashtable.c b/tools/xenstore/hashtable.c
index 0409725060..f85b5a71f1 100644
--- a/tools/xenstore/hashtable.c
+++ b/tools/xenstore/hashtable.c
@@ -205,6 +205,25 @@ void *hashtable_search(const struct hashtable *h, const void *k)
  return e ? e->v : NULL;
  }
  
+int hashtable_replace(struct hashtable *h, const void *k, const void *v)


With the const dropped for 'v' and ...


+{
+struct entry *e;
+
+e = hashtable_search_entry(h, k);
+if (!e)
+return ENOENT;
+
+if (h->flags & HASHTABLE_FREE_VALUE)
+{
+talloc_free(e->v);
+talloc_steal(e, v);
+}
+
+e->v = (void *)v;


... cast:

Acked-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [XEN PATCH] xen/notifier: address violations of MISRA C:2012 Rule 8.3

2023-07-27 Thread Stefano Stabellini
On Thu, 27 Jul 2023, Federico Serafini wrote:
> Change parameter names in function declarations to be consistent with
> the corresponding definitions. This addresses violations of MISRA C:2012
> Rule 8.3: "All declarations of an object or function shall use the same
> names and type qualifiers".
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 


> ---
>  xen/include/xen/notifier.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/include/xen/notifier.h b/xen/include/xen/notifier.h
> index 3d6017d4f7..51453c1552 100644
> --- a/xen/include/xen/notifier.h
> +++ b/xen/include/xen/notifier.h
> @@ -37,9 +37,9 @@ struct notifier_head {
>  
>  
>  void notifier_chain_register(
> -struct notifier_head *nh, struct notifier_block *nb);
> +struct notifier_head *nh, struct notifier_block *n);
>  void notifier_chain_unregister(
> -struct notifier_head *nh, struct notifier_block *nb);
> +struct notifier_head *nh, struct notifier_block *n);
>  
>  int notifier_call_chain(
>  struct notifier_head *nh, unsigned long val, void *v,
> -- 
> 2.34.1
> 



Re: [XEN PATCH 2/3] xen/arm: irq: address violations of MISRA C: 2012 Rules 8.2 and 8.3

2023-07-27 Thread Stefano Stabellini
On Thu, 27 Jul 2023, Federico Serafini wrote:
> On 27/07/23 13:27, Jan Beulich wrote:
> > On 27.07.2023 13:02, Federico Serafini wrote:
> > > 
> > > I have ready a patch for violations of rules 8.2 and 8.3 in
> > > xen/include/xen/iommu.h.
> > > I am talking about this, in this IRQ thread, because I think the
> > > following two options also apply for an eventual v2 patch for the IRQ
> > > module, until a decision about rule 8.2 and function pointers is taken:
> > > 
> > > 1) Split patches and submit only the changes *not* involving function
> > >  pointers.
> > > 2) In the meantime that you make a decision,
> > >  I submit patches thus addressing the existing violations.
> > > 
> > > I personally prefer the second one, but please let me know what you
> > > think.
> > 
> > It's not entirely clear to me what 2 means, as I wouldn't expect you
> > intend to deal with "violations" which we may decide aren't any in
> > our world.
> > 
> > Jan
> 
> In point 2) I would like to act as if the rule 8.2 had been approved without
> any deviation, I think this will lead to a smaller number of patches and a
> smaller amount of text attached to each modified function.
> If you are concerned about inconsistency between the resulting commit
> messages and your future decision then we can go for option 1).

Basically I think we need to go with 1), which is also what Jan wrote.

Sorry about that, I know it is not great, we should have settled this
already.



Re: [XEN PATCH v3] xen/spinlock: mechanically rename parameter name 'debug'

2023-07-27 Thread Stefano Stabellini
On Thu, 27 Jul 2023, Jan Beulich wrote:
> On 26.07.2023 23:49, Stefano Stabellini wrote:
> > On Wed, 26 Jul 2023, Jan Beulich wrote:
> >> On 26.07.2023 08:42, Nicola Vetrini wrote:
> >>> On 26/07/23 08:34, Jan Beulich wrote:
>  On 25.07.2023 22:45, Nicola Vetrini wrote:
> > Rule 5.3 has the following headline:
> > "An identifier declared in an inner scope shall not hide an
> > identifier declared in an outer scope"
> >
> > To avoid any confusion resulting from the parameter 'debug'
> > hiding the homonymous function declared at
> > 'xen/arch/x86/include/asm/processor.h:428'
> > the rename of parameters s/debug/lkdbg/ is performed.
> >
> > Signed-off-by: Nicola Vetrini 
> > ---
> > Changes in v2:
> > - s/dbg/lkdbg/
> > Changes in v3:
> > - Added missing renames for consistency
> 
>  Hmm, you asked whether to send v3, but then you didn't wait for an
>  answer. So to repeat what I said there: I'd prefer if we could first
>  settle whether to rename the conflicting x86 symbol.
> 
> >>>
> >>> Stefano replied asking for a v3 [1] before I had a chance to read your 
> >>> message this morning.
> >>
> >> Right, sorry, I spotted his reply only after seeing the v3.
> > 
> > For what is worth I prefer the current implementation compared to
> > renaming debug()
> 
> I don't. My replacement name suggestions were only "just in case"; I
> don't really like them.

Understood :-)

How would you like to proceed?

1. we commit this patch as is
2. we wait for a third opinion from another maintainer
3. we find a new name for the variable
4. we change debug() instead

?



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Daniel P. Smith




On 7/27/23 11:15, George Dunlap wrote:



On Thu, Jul 27, 2023 at 3:42 PM Jan Beulich wrote:


On 27.07.2023 15:26, Daniel P. Smith wrote:
 > Let's bring this back to the actual implementation instead of the
 > theoretical. Your position is that Xen's paddr_t is desired
because it
 > can store larger values than that of size_t. Now if you look in Xen
 > proper (main 64bit code on x86), paddr_t is a typedef for a 64bit
 > unsigned integer. And if you look at size_t, it is also a typedef
to a
 > 64bit unsigned integer, they are literally a couple of lines
apart in
 > types.h. Thus they are the same size and can only represent the same
 > maximum size.

What about 32-bit Arm, or any other 32-bit architecture that we might
see Xen ported to, with wider than 32-bit physical address space?


To be more concrete here:

Suppose that you had a machine with 32-bit virtual address spaces (i.e., 
going up to 4GiB), and 36-bit physical address spaces (i.e., going up to 
64GiB).  And suppose you had a system with 16GiB of physical ram.  And 
you wanted to use Hyperlaunch to create VMs using some sort of memory 
image that was 5GiB (presumably of some kind of static data, not, say, a 
kernel or initramfs).  You wouldn't be able to do it if the "size" 
parameter of the boot modules was limited to 4GiB (without some kind of 
hack where you string multiple boot modules together).


The reality of your scenario here is that it would never work on x86. 
Regardless of which entry point you come in through, 32bit BIOS entry, 
UEFI, or PVH, all of these expect the boot material to be passed in 
using the MB1 or MB2 protocol. Under those protocols, a module 
definition has start and end addresses which are u32. While the start 
address could be okay, the end address would be well beyond 4GiB, which 
at best would overflow the address, but more likely be rejected by the 
bootloader.


I guess part of the question is whether we think that's an important use 
case; on the whole if you're populating 5GiB of RAM, it seems like it 
would be better to have the VM load it itself from disk.


The point of hyperlaunch is to enable advanced use cases, and domain 
resume on boot is one that I think someone might find useful.


I do see the logic behind wanting to avoid "paddr_t" for a size; I'm 
sure Jan that you would nack any patch that used "size_t" as a memory 
address (instead of, say, uintptr_t).  In that case, "psize_t" is the 
obvious solution.


I would be amenable to declaring a psize_t that lived in all the same 
places paddr_t is defined, to signal them as a pair of sorts. I would 
have to see if adding paddr_t and this new psize_t to the 32bit defs.h 
would result in making the structures 64bit clean in the x86 32bit 
code. If so, then it would remove the need for a separate declaration 
of the structures there.


v/r,
dps






Re: [PATCH] x86/HVM: drop dead assignments from hvmemul_rep_{movs,stos}()

2023-07-27 Thread Stefano Stabellini
On Thu, 27 Jul 2023, Jan Beulich wrote:
> In the latter case the variable altogether is then unused and hence gets
> dropped, eliminating a Misra Rule 5.3 violation. I'm afraid I mistakenly
> introduced both assignments in 57a57465daaf ("x86/HVM: use available
> linear->phys translations in REP MOVS/STOS handling"), likely as a
> result of some re-work on the patch.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Stefano Stabellini 


> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -1861,7 +1861,6 @@ static int cf_check hvmemul_rep_movs(
>  return rc;
>  }
>  
> -bytes = PAGE_SIZE - (daddr & ~PAGE_MASK);
>  if ( hvio->mmio_access.write_access &&
>   (hvio->mmio_gla == (daddr & PAGE_MASK)) &&
>   /* See comment above. */
> @@ -1988,7 +1987,7 @@ static int cf_check hvmemul_rep_stos(
>  container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
>  struct vcpu *curr = current;
>  struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io;
> -unsigned long addr, bytes;
> +unsigned long addr;
>  paddr_t gpa;
>  p2m_type_t p2mt;
>  bool_t df = !!(ctxt->regs->eflags & X86_EFLAGS_DF);
> @@ -1998,7 +1997,6 @@ static int cf_check hvmemul_rep_stos(
>  if ( rc != X86EMUL_OKAY )
>  return rc;
>  
> -bytes = PAGE_SIZE - (addr & ~PAGE_MASK);
>  if ( hvio->mmio_access.write_access &&
>   (hvio->mmio_gla == (addr & PAGE_MASK)) &&
>   /* See respective comment in MOVS processing. */
> 



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Stefano Stabellini
On Thu, 27 Jul 2023, George Dunlap wrote:
> On Thu, Jul 27, 2023 at 3:42 PM Jan Beulich  wrote:
>   On 27.07.2023 15:26, Daniel P. Smith wrote:
>   > Let's bring this back to the actual implementation instead of the
>   > theoretical. Your position is that Xen's paddr_t is desired because it
>   > can store larger values than that of size_t. Now if you look in Xen
>   > proper (main 64bit code on x86), paddr_t is a typedef for a 64bit
>   > unsigned integer. And if you look at size_t, it is also a typedef to a
>   > 64bit unsigned integer, they are literally a couple of lines apart in
>   > types.h. Thus they are the same size and can only represent the same
>   > maximum size.
> 
>   What about 32-bit Arm, or any other 32-bit architecture that we might
>   see Xen ported to, with wider than 32-bit physical address space?
> 
> 
> To be more concrete here:
> 
> Suppose that you had a machine with 32-bit virtual address spaces (i.e., 
> going up to 4GiB), and 36-bit physical address spaces (i.e., going
> up to 64GiB).  And suppose you had a system with 16GiB of physical ram.  And 
> you wanted to use Hyperlaunch to create VMs using some sort of
> memory image that was 5GiB (presumably of some kind of static data, not, say, 
> a kernel or initramfs).  You wouldn't be able to do it if the
> "size" parameter of the boot modules was limited to 4GiB (without some kind 
> of hack where you string multiple boot modules together).

Yes exactly.

I would like this code to be common across ARM and x86. On arm32 size_t
wouldn't work, with size_t as it is defined today. One option is to use
paddr_t on all arches, or at least on arm32. Another option is to change
the definition of size_t on arm32.

My suggestion it to move the existing equivalent dom0less interface
defined here:
xen/arch/arm/include/asm/setup.c:struct bootmodule
to common code and base this work on top of it. struct bootmodule uses
paddr_t for both start and size to solve the arm32 issue today. 

Re: [PATCH RFC] x86/boot: Update construct_dom0() to take a const char *cmdline

2023-07-27 Thread Daniel P. Smith

On 7/19/23 09:18, Andrew Cooper wrote:

With hvm_copy_to_guest_*() able to use const sources, update construct_dom0()
and friends to pass a const cmdline pointer.  Nothing in these paths have a
reason to be modifying the command line passed in.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Daniel Smith 
CC: Christopher Clark 

Slightly RFC.

I'm confused as to why image is const, but the initrd isn't.

Also, I suspect this will interfere with the Hyperlauch work, and I'd be happy
to leave it alone if all of this is being fixed differently anyway.


This is overall a good change and honestly I don't see this having any 
significant impact on HL. And if it does, it would be better to fix HL 
than block this positive change.



This is necessary to make the -Wwrite-strings bodge compile, but I'm hoping
that a less-bad solution to the cmdline literals problem would avoid the need
to propagate const through this callpath.
---
  xen/arch/x86/dom0_build.c | 2 +-
  xen/arch/x86/hvm/dom0_build.c | 4 ++--
  xen/arch/x86/include/asm/dom0_build.h | 4 ++--
  xen/arch/x86/include/asm/setup.h  | 2 +-
  xen/arch/x86/pv/dom0_build.c  | 2 +-
  5 files changed, 7 insertions(+), 7 deletions(-)



Reviewed-by: Daniel P. Smith 



[PATCH v4 1/1] ns16550: add support for polling mode when device tree is used

2023-07-27 Thread Oleksii Kurochko
RISC-V doesn't support interrupts for the time being, so it would be nice to
have polling mode.

To force polling mode, parsing of a new 'poll' property under "com=" was added.

If there is no interrupt property in the UART node, then polling will be used.
According to 4.2.2 National Semiconductor 16450/16550 Compatible UART
Requirements from The Devicetree Specification v0.4
( https://www.devicetree.org/specifications/ ) the interrupt property is
optional.

Also, it is possible that interrupt '0' can be used for some architectures as
a legal UART interrupt number (according to dts files in the Linux kernel), so
the check of the return value of platform_get_irq() was updated for the case when
device tree is used for UART initialization.
For example:
https://github.com/torvalds/linux/blob/master/arch/powerpc/boot/dts/ebony.dts#L197

Signed-off-by: Oleksii Kurochko 
---
Changes in V4:
 - fix code style for ns16550_irq().
 - partially revert changes in pci_uart_config().
 - add check of "poll" property under "com=" instead of "console=".
 - remove setting of polling mode in parsing of msi in parse_positional().
 - update the comment above opt_com1 about the serial port configuration string.
---
 xen/drivers/char/ns16550.c | 64 +-
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 2aed6ec707..83170fd6b8 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -58,7 +58,11 @@ static struct ns16550 {
 struct timer timer;
 struct timer resume_timer;
 unsigned int timeout_ms;
-bool_t intr_works;
+enum {
+intr_off,
+intr_on,
+polling,
+} intr_works;
 bool_t dw_usr_bsy;
 #ifdef NS16550_PCI
 /* PCI card parameters. */
@@ -112,6 +116,19 @@ struct ns16550_config_param {
 static void enable_exar_enhanced_bits(const struct ns16550 *uart);
 #endif
 
+/*
+ * Configure serial port with a string:
+ *   <baud>[/<base-baud>][,DPS[,<io-base>[,<irq>|msi|poll[,<port-bdf>[,<bridge-bdf>]]]]].
+ * The tail of the string can be omitted if platform defaults are sufficient.
+ * If the baud rate is pre-configured, perhaps by a bootloader, then 'auto'
+ * can be specified in place of a numeric baud rate. Polled mode is specified
+ * by poll property.
+ */
+static char __initdata opt_com1[128] = "";
+static char __initdata opt_com2[128] = "";
+string_param("com1", opt_com1);
+string_param("com2", opt_com2);
+
 static void cf_check ns16550_delayed_resume(void *data);
 
 static u8 ns_read_reg(const struct ns16550 *uart, unsigned int reg)
@@ -181,7 +198,7 @@ static void cf_check ns16550_interrupt(
 struct serial_port *port = dev_id;
 struct ns16550 *uart = port->uart;
 
-uart->intr_works = 1;
+uart->intr_works = intr_on;
 
 while ( !(ns_read_reg(uart, UART_IIR) & UART_IIR_NOINT) )
 {
@@ -212,7 +229,7 @@ static void cf_check __ns16550_poll(struct cpu_user_regs *regs)
 struct serial_port *port = this_cpu(poll_port);
 struct ns16550 *uart = port->uart;
 
-if ( uart->intr_works )
+if ( uart->intr_works == intr_on )
 return; /* Interrupts work - no more polling */
 
 while ( ns_read_reg(uart, UART_LSR) & UART_LSR_DR )
@@ -305,7 +322,8 @@ static void ns16550_setup_preirq(struct ns16550 *uart)
 unsigned char lcr;
 unsigned int  divisor;
 
-uart->intr_works = 0;
+if ( uart->intr_works != polling )
+uart->intr_works = intr_off;
 
 pci_serial_early_init(uart);
 
@@ -394,7 +412,7 @@ static void __init cf_check ns16550_init_irq(struct serial_port *port)
 
 static void ns16550_setup_postirq(struct ns16550 *uart)
 {
-if ( uart->irq > 0 )
+if ( uart->intr_works != polling )
 {
 /* Master interrupt enable; also keep DTR/RTS asserted. */
 ns_write_reg(uart,
@@ -473,6 +491,7 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port)
 if ( rc )
 {
 uart->irq = 0;
+uart->intr_works = polling;
 if ( msi_desc )
 msi_free_irq(msi_desc);
 else
@@ -488,7 +507,7 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port)
 }
 #endif
 
-if ( uart->irq > 0 )
+if ( uart->intr_works != polling )
 {
 uart->irqaction.handler = ns16550_interrupt;
 uart->irqaction.name= "ns16550";
@@ -595,7 +614,8 @@ static void __init cf_check ns16550_endboot(struct serial_port *port)
 static int __init cf_check ns16550_irq(struct serial_port *port)
 {
 struct ns16550 *uart = port->uart;
-return ((uart->irq > 0) ? uart->irq : -1);
+
+return ((uart->intr_works != polling) ? uart->irq : -1);
 }
 
 static void cf_check ns16550_start_tx(struct serial_port *port)
@@ -654,6 +674,9 @@ static void ns16550_init_common(struct ns16550 *uart)
 
 /* Default lsr_mask = UART_LSR_THRE */
 uart->lsr_mask  = UART_LSR_THRE;
+
+if ( strstr(opt_com1, 

[PATCH v6 2/2] xl: Add escape character argument to xl console

2023-07-27 Thread Peter Hoyes
From: Peter Hoyes 

Add -e argument to xl console and pass to new escape_character argument
of libxl_console_exec.

Introduce a new API version to support this new argument and advertise
the new functionality in libxl.h

In libxl_console_exec, there are currently two call sites to execl,
which uses varargs, in order to support optionally passing
'start-notify-fd' to the console client. In order to support passing
the 'escape' argument optionally too, refactor to instead have a single
call site to execv, which has the same behavior but takes an array of
arguments.

If -e is not specified, --escape is not passed to the console client and
the existing value (^]) is used as a default.

Update the xl docs.

Signed-off-by: Peter Hoyes 
---
Changes in v6:
- Fix the new compatiblity functions to use last presented version
- Fix the arguments of the existing compatibility functions

Changes in v5:
- Add this changelog
- Fix comment style in libxl_console_exec

Changes in v4:
- Document xl console -e argument in xl.1.pod.in
- Add changes for libxl API version compatibility

Changes in v3:
- Re-add the Reviewed-By tag accidentally removed in v2

Changes in v2:
- Drop the tags intended only for internal use at Arm

 docs/man/xl.1.pod.in |  8 +-
 tools/include/libxl.h| 43 
 tools/libs/light/libxl_console.c | 30 --
 tools/xl/xl_cmdtable.c   |  3 ++-
 tools/xl/xl_console.c| 10 +---
 tools/xl/xl_vmcontrol.c  |  2 +-
 6 files changed, 77 insertions(+), 19 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 101e14241d..9ba22a8fa2 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -222,7 +222,8 @@ Attach to the console of a domain specified by I<domain-id>.  If you've set up
 your domains to have a traditional login console this will look much like a
 normal text login screen.
 
-Use the key combination Ctrl+] to detach from the domain console.
+Use the escape character key combination (default Ctrl+]) to detach from the
+domain console.
 
 B
 
@@ -239,6 +240,11 @@ emulated serial for HVM guests and PV console for PV guests.
 
 Connect to console number I<NUM>. Console numbers start from 0.
 
+=item I<-e escapechar>
+
+Customize the escape sequence used to detach from the domain console to
+I<escapechar>. If not specified, the value "^]" is used.
+
 =back
 
 =item B [I] I
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index cac641a7eb..de29f11bc9 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -81,6 +81,15 @@
  */
 #define LIBXL_HAVE_CONSOLE_NOTIFY_FD 1
 
+/* LIBXL_HAVE_CONSOLE_ESCAPE_CHARACTER
+ *
+ * If this is defined, libxl_console_exec and
+ * libxl_primary_console_exec take an escape_character parameter. That
+ * parameter will be used to modify the escape sequence used to exit the
+ * console.
+ */
+#define LIBXL_HAVE_CONSOLE_ESCAPE_CHARACTER 1
+
 /* LIBXL_HAVE_CONST_COPY_AND_LENGTH_FUNCTIONS
  *
  * If this is defined, the copy functions have constified src parameter and the
@@ -790,7 +799,8 @@ typedef struct libxl__ctx libxl_ctx;
 #if LIBXL_API_VERSION != 0x040200 && LIBXL_API_VERSION != 0x040300 && \
 LIBXL_API_VERSION != 0x040400 && LIBXL_API_VERSION != 0x040500 && \
 LIBXL_API_VERSION != 0x040700 && LIBXL_API_VERSION != 0x040800 && \
-LIBXL_API_VERSION != 0x041300 && LIBXL_API_VERSION != 0x041400
+LIBXL_API_VERSION != 0x041300 && LIBXL_API_VERSION != 0x041400 && \
+LIBXL_API_VERSION != 0x041800
 #error Unknown LIBXL_API_VERSION
 #endif
 #endif
@@ -1958,7 +1968,8 @@ int libxl_vncviewer_exec(libxl_ctx *ctx, uint32_t domid, int autopass);
  * the caller that it has connected to the guest console.
  */
 int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
-   libxl_console_type type, int notify_fd);
+   libxl_console_type type, int notify_fd,
+   char* escape_character);
 /* libxl_primary_console_exec finds the domid and console number
  * corresponding to the primary console of the given vm, then calls
  * libxl_console_exec with the right arguments (domid might be different
@@ -1968,9 +1979,12 @@ int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num,
  * guests using pygrub.
  * If notify_fd is not -1, xenconsole will write 0x00 to it to nofity
  * the caller that it has connected to the guest console.
+ * If escape_character is not NULL, the provided value is used to exit
+ * the guest console.
  */
 int libxl_primary_console_exec(libxl_ctx *ctx, uint32_t domid_vm,
-   int notify_fd);
+   int notify_fd,
+   char* escape_character);
 
 #if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x040800
 
@@ -1978,17 +1992,36 @@ static inline int libxl_console_exec_0x040700(libxl_ctx *ctx,
   uint32_t domid, int cons_num,

[PATCH v6 1/2] tools/console: Add escape argument to configure escape character

2023-07-27 Thread Peter Hoyes
From: Peter Hoyes 

Dom0 may be accessed via telnet, meaning the default escape character
(which is the same as telnet's) cannot be directly used to exit the
console. It would be helpful to make the escape character customizable
in such use cases.

Add --escape argument to console tool for this purpose.

Add argument to getopt options, parse and validate the escape character
and pass value to console_loop.

If --escape is not specified, it falls back to the existing behavior
using DEFAULT_ESCAPE_CHARACTER.

Signed-off-by: Peter Hoyes 
Reviewed-by: Anthony PERARD 
Reviewed-by: Hongda Deng 
---
Changes in v5:
- Add this changelog

Changes in v4:
- Improve validation of the escape_character optarg

Changes in v3:
- Re-add the Reviewed-By tag accidentally removed in v2

Changes in v2:
- Drop the tags intended only for internal use at Arm

 tools/console/client/main.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/console/client/main.c b/tools/console/client/main.c
index 6775006488..d2dcc3ddca 100644
--- a/tools/console/client/main.c
+++ b/tools/console/client/main.c
@@ -42,7 +42,7 @@
 #include 
 #include "xenctrl.h"
 
-#define ESCAPE_CHARACTER 0x1d
+#define DEFAULT_ESCAPE_CHARACTER 0x1d
 
 static volatile sig_atomic_t received_signal = 0;
 static char lockfile[sizeof (XEN_LOCK_DIR "/xenconsole.") + 8] = { 0 };
@@ -77,6 +77,7 @@ static void usage(const char *program) {
   "  -n, --num N  use console number N\n"
  "  --type TYPE  console type. must be 'pv', 'serial' or 'vuart'\n"
   "  --start-notify-fd N file descriptor used to notify parent\n"
+  "  --escape E   escape sequence to exit console\n"
   , program);
 }
 
@@ -174,7 +175,7 @@ static void restore_term(int fd, struct termios *old)
 }
 
 static int console_loop(int fd, struct xs_handle *xs, char *pty_path,
-   bool interactive)
+   bool interactive, char escape_character)
 {
int ret, xs_fd = xs_fileno(xs), max_fd = -1;
 
@@ -215,7 +216,7 @@ static int console_loop(int fd, struct xs_handle *xs, char *pty_path,
char msg[60];
 
len = read(STDIN_FILENO, msg, sizeof(msg));
-   if (len == 1 && msg[0] == ESCAPE_CHARACTER) {
+   if (len == 1 && msg[0] == escape_character) {
return 0;
} 
 
@@ -335,6 +336,7 @@ int main(int argc, char **argv)
{ "help",0, 0, 'h' },
{ "start-notify-fd", 1, 0, 's' },
{ "interactive", 0, 0, 'i' },
+   { "escape",  1, 0, 'e' },
{ 0 },
 
};
@@ -345,6 +347,7 @@ int main(int argc, char **argv)
console_type type = CONSOLE_INVAL;
bool interactive = 0;
const char *console_names = "serial, pv, vuart";
+   char escape_character = DEFAULT_ESCAPE_CHARACTER;
 
while((ch = getopt_long(argc, argv, sopt, lopt, _ind)) != -1) {
switch(ch) {
@@ -375,6 +378,16 @@ int main(int argc, char **argv)
case 'i':
interactive = 1;
break;
+   case 'e':
+   if (optarg[0] == '^' && optarg[1] && optarg[2] == '\0')
+   escape_character = optarg[1] & 0x1f;
+   else if (optarg[0] && optarg[1] == '\0')
+   escape_character = optarg[0];
+   else {
+   fprintf(stderr, "Invalid escape argument\n");
+   exit(EINVAL);
+   }
+   break;
default:
fprintf(stderr, "Invalid argument\n");
fprintf(stderr, "Try `%s --help' for more information.\n", 
@@ -493,7 +506,7 @@ int main(int argc, char **argv)
close(start_notify_fd);
}
 
-   console_loop(spty, xs, path, interactive);
+   console_loop(spty, xs, path, interactive, escape_character);
 
free(path);
free(dom_path);
-- 
2.34.1




[PATCH v6 0/2] Add escape character argument to Xen console

2023-07-27 Thread Peter Hoyes
From: Peter Hoyes 

Dom0 may be accessed via telnet, meaning the default escape character
(which is the same as telnet's) cannot be directly used to exit the
console. It would be helpful to make the escape character customizable
in such use cases, falling back to the existing value if not set.

Make the necessary changes to the console client, libxl and the xl
console sub-command to support this.

Peter Hoyes (2):
  tools/console: Add escape argument to configure escape character
  xl: Add escape character argument to xl console

 docs/man/xl.1.pod.in |  8 +-
 tools/console/client/main.c  | 21 +---
 tools/include/libxl.h| 43 
 tools/libs/light/libxl_console.c | 30 --
 tools/xl/xl_cmdtable.c   |  3 ++-
 tools/xl/xl_console.c| 10 +---
 tools/xl/xl_vmcontrol.c  |  2 +-
 7 files changed, 94 insertions(+), 23 deletions(-)

-- 
2.34.1




[xen-unstable-smoke test] 182043: tolerable all pass - PUSHED

2023-07-27 Thread osstest service owner
flight 182043 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182043/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  0400946d532ec41bc5d0bedd3e9ef036308ce623
baseline version:
 xen  baa6ea7003868d1a339d06b17fd32d41b851d571

Last test of basis   182034  2023-07-27 08:02:11 Z0 days
Testing same since   182043  2023-07-27 13:00:27 Z0 days1 attempts


People who touched revisions under test:
  Shawn Anastasio 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   baa6ea7003..0400946d53  0400946d532ec41bc5d0bedd3e9ef036308ce623 -> smoke



[XEN PATCH] xen/notifier: address violations of MISRA C:2012 Rule 8.3

2023-07-27 Thread Federico Serafini
Change parameter names in function declarations to be consistent with
the corresponding definitions. This addresses violations of MISRA C:2012
Rule 8.3: "All declarations of an object or function shall use the same
names and type qualifiers".

No functional changes.

Signed-off-by: Federico Serafini 
---
 xen/include/xen/notifier.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/notifier.h b/xen/include/xen/notifier.h
index 3d6017d4f7..51453c1552 100644
--- a/xen/include/xen/notifier.h
+++ b/xen/include/xen/notifier.h
@@ -37,9 +37,9 @@ struct notifier_head {
 
 
 void notifier_chain_register(
-struct notifier_head *nh, struct notifier_block *nb);
+struct notifier_head *nh, struct notifier_block *n);
 void notifier_chain_unregister(
-struct notifier_head *nh, struct notifier_block *nb);
+struct notifier_head *nh, struct notifier_block *n);
 
 int notifier_call_chain(
 struct notifier_head *nh, unsigned long val, void *v,
-- 
2.34.1




Re: [XEN PATCH 4/4] x86: avoid shadowing to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Jan Beulich
On 27.07.2023 17:58, Nicola Vetrini wrote:
> 
> 
> On 27/07/23 17:41, Jan Beulich wrote:
>> On 27.07.2023 12:48, Nicola Vetrini wrote:
>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>>> @@ -1483,7 +1483,7 @@ x86_emulate(
>>>   {
>>>   enum x86_segment seg;
>>>   struct segment_register cs, sreg;
>>> -struct cpuid_leaf cpuid_leaf;
>>> +struct cpuid_leaf res;
>>
>> This is too generic a name for a variable with a scope of several
>> thousand lines. Perhaps just "leaf"?
> 
> It can also be defined inside the switch clause, since it has no other 
> purpose than to store a result.

That would be more code churn, though.

>>> @@ -8408,8 +8408,6 @@ x86_emulate(
>>>   generate_exception(X86_EXC_MF);
>>>   if ( stub_exn.info.fields.trapnr == X86_EXC_XM )
>>>   {
>>> -unsigned long cr4;
>>> -
>>>   if ( !ops->read_cr || ops->read_cr(4, &cr4, ctxt) != X86EMUL_OKAY )
>>>   cr4 = X86_CR4_OSXMMEXCPT;
>>>   generate_exception(cr4 & X86_CR4_OSXMMEXCPT ? X86_EXC_XM : X86_EXC_UD);
>>
>> This change looks okay to me, but I'd like to strongly encourage
>> you to split both changes. They're of different nature, and for
>> the latter it may even be worthwhile pointing out when exactly
>> this duplication of variables was introduced (it clearly would
>> better have been avoided).
>>
> 
> I did it this way because they are the only violations of R5.3 left in 
> this file (among those not subject to deviation). By splitting you mean 
> two patches in this series or a separate patch just for this change?

Separate or within a series doesn't matter. Just preferably not in
the same patch. (And btw, if you split larger patches more, some of
your changes may also go in more quickly. Yet of course this shouldn't
get too fine grained.)

Jan



Re: [XEN PATCH 4/4] x86: avoid shadowing to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Nicola Vetrini




On 27/07/23 17:41, Jan Beulich wrote:

On 27.07.2023 12:48, Nicola Vetrini wrote:

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1483,7 +1483,7 @@ x86_emulate(
  {
  enum x86_segment seg;
  struct segment_register cs, sreg;
-struct cpuid_leaf cpuid_leaf;
+struct cpuid_leaf res;


This is too generic a name for a variable with a scope of several
thousand lines. Perhaps just "leaf"?


It can also be defined inside the switch clause, since it has no other 
purpose than to store a result.





@@ -8408,8 +8408,6 @@ x86_emulate(
  generate_exception(X86_EXC_MF);
  if ( stub_exn.info.fields.trapnr == X86_EXC_XM )
  {
-unsigned long cr4;
-
  if ( !ops->read_cr || ops->read_cr(4, &cr4, ctxt) != X86EMUL_OKAY )
  cr4 = X86_CR4_OSXMMEXCPT;
  generate_exception(cr4 & X86_CR4_OSXMMEXCPT ? X86_EXC_XM : X86_EXC_UD);


This change looks okay to me, but I'd like to strongly encourage
you to split both changes. They're of different nature, and for
the latter it may even be worthwhile pointing out when exactly
this duplication of variables was introduced (it clearly would
better have been avoided).



I did it this way because they are the only violations of R5.3 left in 
this file (among those not subject to deviation). By splitting, do you 
mean two patches in this series or a separate patch just for this change?


--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [XEN PATCH] xen: change names and type qualifiers in copy_domain_page() declaration

2023-07-27 Thread Jan Beulich
On 27.07.2023 17:04, Federico Serafini wrote:
> Change names and type qualifiers of parameters in copy_domain_page()
> declaration to keep consistency with the corresponding definition.
> This addresses violations of MISRA C:2012 Rule 8.3: "All declarations
> of an object or function shall use the same names and type qualifiers".
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





Re: [XEN PATCH 4/4] x86: avoid shadowing to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Jan Beulich
On 27.07.2023 12:48, Nicola Vetrini wrote:
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -1483,7 +1483,7 @@ x86_emulate(
>  {
>  enum x86_segment seg;
>  struct segment_register cs, sreg;
> -struct cpuid_leaf cpuid_leaf;
> +struct cpuid_leaf res;

This is too generic a name for a variable with a scope of several
thousand lines. Perhaps just "leaf"?

> @@ -8408,8 +8408,6 @@ x86_emulate(
>  generate_exception(X86_EXC_MF);
>  if ( stub_exn.info.fields.trapnr == X86_EXC_XM )
>  {
> -unsigned long cr4;
> -
>  if ( !ops->read_cr || ops->read_cr(4, &cr4, ctxt) != X86EMUL_OKAY )
>  cr4 = X86_CR4_OSXMMEXCPT;
>  generate_exception(cr4 & X86_CR4_OSXMMEXCPT ? X86_EXC_XM : X86_EXC_UD);

This change looks okay to me, but I'd like to strongly encourage
you to split both changes. They're of different nature, and for
the latter it may even be worthwhile pointing out when exactly
this duplication of variables was introduced (it clearly would
better have been avoided).

Jan



[XEN PATCH] xen: change parameter name in monitor_domctl() declaration

2023-07-27 Thread Federico Serafini
Change parameter name in monitor_domctl() declaration for
consistency with the corresponding definition.
This addresses a violation of MISRA C:2012 Rule 8.3: "All declarations
of an object or function shall use the same names and type qualifiers".

No functional changes.

Signed-off-by: Federico Serafini 
---
 xen/include/xen/monitor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/xen/monitor.h b/xen/include/xen/monitor.h
index 6b17a93071..713d54f7c1 100644
--- a/xen/include/xen/monitor.h
+++ b/xen/include/xen/monitor.h
@@ -27,7 +27,7 @@
 struct domain;
 struct xen_domctl_monitor_op;
 
-int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *op);
+int monitor_domctl(struct domain *d, struct xen_domctl_monitor_op *mop);
 void monitor_guest_request(void);
 
 int monitor_traps(struct vcpu *v, bool sync, vm_event_request_t *req);
-- 
2.34.1




Re: [XEN PATCH 2/4] x86/emulate: move a variable declaration to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Nicola Vetrini




On 27/07/23 17:31, Jan Beulich wrote:

On 27.07.2023 17:22, Nicola Vetrini wrote:



On 27/07/23 17:06, Jan Beulich wrote:

On 27.07.2023 12:48, Nicola Vetrini wrote:

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2024,15 +2024,15 @@ static int cf_check hvmemul_rep_stos(
   
   switch ( p2mt )

   {
-unsigned long bytes;
   char *buf;
   
   default:

   /* Allocate temporary buffer. */
   for ( ; ; )
   {
-bytes = *reps * bytes_per_rep;
-buf = xmalloc_bytes(bytes);
+unsigned long bytes_tmp;
+bytes_tmp = *reps * bytes_per_rep;
+buf = xmalloc_bytes(bytes_tmp);
   if ( buf || *reps <= 1 )
   break;
   *reps >>= 1;


This wants dealing with differently - the outer scope variable is unused
(only written to) afaics. Eliminating it will, aiui, address another
violation at the same time. And then the same in hvmemul_rep_movs(), just
that there the variable itself needs to survive. I guess I'll make a
patch ...


Wouldn't this code at line ~2068 be possibly affected by writing to
bytes, if the outer variable is used?


Which outer variable? I'm suggesting to drop that (see the patch that
I've sent already).

Jan


/* Adjust address for reverse store. */
if ( df )
gpa -= bytes - bytes_per_rep;

rc = hvm_copy_to_guest_phys(gpa, buf, bytes, curr);

You're right about the other violation (R2.1)





I see, sorry for the noise.

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [XEN PATCH 3/4] x86/mm: mechanically rename variable to avoid shadowing

2023-07-27 Thread Jan Beulich
On 27.07.2023 12:48, Nicola Vetrini wrote:
> The rename s/p2mt/t/ is done to avoid shadowing the same declaration
> in the enclosing scope.
> 
> Signed-off-by: Nicola Vetrini 
> ---
> Is the semantics of the function altered if the inner declaration
> is removed entirely?

No, that's what should be done. It's an output only for get_gfn().

Jan



Re: [XEN PATCH 2/4] x86/emulate: move a variable declaration to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Jan Beulich
On 27.07.2023 17:22, Nicola Vetrini wrote:
> 
> 
> On 27/07/23 17:06, Jan Beulich wrote:
>> On 27.07.2023 12:48, Nicola Vetrini wrote:
>>> --- a/xen/arch/x86/hvm/emulate.c
>>> +++ b/xen/arch/x86/hvm/emulate.c
>>> @@ -2024,15 +2024,15 @@ static int cf_check hvmemul_rep_stos(
>>>   
>>>   switch ( p2mt )
>>>   {
>>> -unsigned long bytes;
>>>   char *buf;
>>>   
>>>   default:
>>>   /* Allocate temporary buffer. */
>>>   for ( ; ; )
>>>   {
>>> -bytes = *reps * bytes_per_rep;
>>> -buf = xmalloc_bytes(bytes);
>>> +unsigned long bytes_tmp;
>>> +bytes_tmp = *reps * bytes_per_rep;
>>> +buf = xmalloc_bytes(bytes_tmp);
>>>   if ( buf || *reps <= 1 )
>>>   break;
>>>   *reps >>= 1;
>>
>> This wants dealing with differently - the outer scope variable is unused
>> (only written to) afaics. Eliminating it will, aiui, address another
>> violation at the same time. And then the same in hvmemul_rep_movs(), just
>> that there the variable itself needs to survive. I guess I'll make a
>> patch ...
> 
> Wouldn't this code at line ~2068 be possibly affected by writing to 
> bytes, if the outer variable is used?

Which outer variable? I'm suggesting to drop that (see the patch that
I've sent already).

Jan

> /* Adjust address for reverse store. */
> if ( df )
>gpa -= bytes - bytes_per_rep;
> 
> rc = hvm_copy_to_guest_phys(gpa, buf, bytes, curr);
> 
> You're right about the other violation (R2.1)
> 




Re: [PATCH] x86/HVM: drop dead assignments from hvmemul_rep_{movs,stos}()

2023-07-27 Thread Andrew Cooper
On 27/07/2023 4:25 pm, Jan Beulich wrote:
> In the latter case the variable altogether is then unused and hence gets
> dropped, eliminating a Misra Rule 5.3 violation. I'm afraid I mistakenly
> introduced both assignments in 57a57465daaf ("x86/HVM: use available
> linear->phys translations in REP MOVS/STOS handling"), likely as a
> result of some re-work on the patch.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 



[PATCH] x86/HVM: drop dead assignments from hvmemul_rep_{movs,stos}()

2023-07-27 Thread Jan Beulich
In the latter case the variable altogether is then unused and hence gets
dropped, eliminating a Misra Rule 5.3 violation. I'm afraid I mistakenly
introduced both assignments in 57a57465daaf ("x86/HVM: use available
linear->phys translations in REP MOVS/STOS handling"), likely as a
result of some re-work on the patch.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1861,7 +1861,6 @@ static int cf_check hvmemul_rep_movs(
 return rc;
 }
 
-bytes = PAGE_SIZE - (daddr & ~PAGE_MASK);
 if ( hvio->mmio_access.write_access &&
  (hvio->mmio_gla == (daddr & PAGE_MASK)) &&
  /* See comment above. */
@@ -1988,7 +1987,7 @@ static int cf_check hvmemul_rep_stos(
 container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
 struct vcpu *curr = current;
struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io;
-unsigned long addr, bytes;
+unsigned long addr;
 paddr_t gpa;
 p2m_type_t p2mt;
 bool_t df = !!(ctxt->regs->eflags & X86_EFLAGS_DF);
@@ -1998,7 +1997,6 @@ static int cf_check hvmemul_rep_stos(
 if ( rc != X86EMUL_OKAY )
 return rc;
 
-bytes = PAGE_SIZE - (addr & ~PAGE_MASK);
 if ( hvio->mmio_access.write_access &&
  (hvio->mmio_gla == (addr & PAGE_MASK)) &&
  /* See respective comment in MOVS processing. */



Re: [XEN PATCH 2/4] x86/emulate: move a variable declaration to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Nicola Vetrini




On 27/07/23 17:06, Jan Beulich wrote:

On 27.07.2023 12:48, Nicola Vetrini wrote:

--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2024,15 +2024,15 @@ static int cf_check hvmemul_rep_stos(
  
  switch ( p2mt )

  {
-unsigned long bytes;
  char *buf;
  
  default:

  /* Allocate temporary buffer. */
  for ( ; ; )
  {
-bytes = *reps * bytes_per_rep;
-buf = xmalloc_bytes(bytes);
+unsigned long bytes_tmp;
+bytes_tmp = *reps * bytes_per_rep;
+buf = xmalloc_bytes(bytes_tmp);
  if ( buf || *reps <= 1 )
  break;
  *reps >>= 1;


This wants dealing with differently - the outer scope variable is unused
(only written to) afaics. Eliminating it will, aiui, address another
violation at the same time. And then the same in hvmemul_rep_movs(), just
that there the variable itself needs to survive. I guess I'll make a
patch ...

Jan


Wouldn't this code at line ~2068 be possibly affected by writing to 
bytes, if the outer variable is used?


/* Adjust address for reverse store. */
if ( df )
  gpa -= bytes - bytes_per_rep;

rc = hvm_copy_to_guest_phys(gpa, buf, bytes, curr);

You're right about the other violation (R2.1)

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [XEN PATCH 1/4] x86: mechanically rename to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Nicola Vetrini




On 27/07/23 17:00, Andrew Cooper wrote:

On 27/07/2023 3:50 pm, Jan Beulich wrote:

On 27.07.2023 12:47, Nicola Vetrini wrote:

Rule 5.3 has the following headline:
"An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope"

The renames done by this patch avoid shadowing from happening.
They are as follows:
- s/str/s/ in 'lapic_disable'
- s/str/level/ in '(apic|mce)_set_verbosity'
- s/str/state_str/ in 'mwait_idle_probe'
- s/str/memmap_name/ in 'init_e820'

I'm sorry to say that, but I'm not willing to go and figure out where
that "str" is that there's supposedly a collision with. Please can you
state such right here, ...


- s/i/j/ in 'mce_action' (the shadowing here is due to macro
   'x86_mcinfo_lookup' that defines 'i' as a loop counter)

... much like you do in this case?


In fairness to Nicola, that was given.


Signed-off-by: Nicola Vetrini 
---
Function 'str' in 'xen/arch/x86/include/asm/desc.h'
causes the shadowing.


which is the wrapper for the STR instruction.

It's used in a single assertion, and I'd be happy getting rid of it
entirely.  Alternatively, it could be renamed to read_tr() (or
read_tr_sel() ?) if we want to keep the assertion.

We're not renaming every other use of 'str' to mean string just for this...

~Andrew


Seems reasonable to remove it, though there aren't that many instances 
of shadowing on 'str'.


--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread George Dunlap
On Thu, Jul 27, 2023 at 3:42 PM Jan Beulich  wrote:

> On 27.07.2023 15:26, Daniel P. Smith wrote:
> > Let's bring this back to the actual implementation instead of the
> > theoretical. Your position is that Xen's paddr_t is desired because it
> > can store larger values than that of size_t. Now if you look in Xen
> > proper (main 64bit code on x86), paddr_t is a typedef for a 64bit
> > unsigned integer. And if you look at size_t, it is also a typedef to a
> > 64bit unsigned integer, they are literally a couple of lines apart in
> > types.h. Thus they are the same size and can only represent the same
> > maximum size.
>
> What about 32-bit Arm, or any other 32-bit architecture that we might
> see Xen ported to, with wider than 32-bit physical address space?
>

To be more concrete here:

Suppose that you had a machine with 32-bit virtual address spaces (i.e.,
going up to 4GiB), and 36-bit physical address spaces (i.e., going up to
64GiB).  And suppose you had a system with 16GiB of physical ram.  And you
wanted to use Hyperlaunch to create VMs using some sort of memory image
that was 5GiB (presumably of some kind of static data, not, say, a kernel
or initramfs).  You wouldn't be able to do it if the "size" parameter of
the boot modules was limited to 4GiB (without some kind of hack where you
string multiple boot modules together).

I guess part of the question is whether we think that's an important use
case; on the whole if you're populating 5GiB of RAM, it seems like it would
be better to have the VM load it itself from disk.

I do see the logic behind wanting to avoid "paddr_t" for a size; I'm sure
Jan that you would nack any patch that used "size_t" as a memory address
(instead of, say, uintptr_t).  In that case, "psize_t" is the obvious
solution.

 -George


Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-27 Thread Volodymyr Babchuk

Jan,

Jan Beulich  writes:

> On 27.07.2023 12:31, Volodymyr Babchuk wrote:
>> 
>> Hi Jan
>> 
>> Jan Beulich  writes:
>> 
>>> On 27.07.2023 02:56, Volodymyr Babchuk wrote:
 Hi Roger,

 Roger Pau Monné  writes:

> On Wed, Jul 26, 2023 at 01:17:58AM +, Volodymyr Babchuk wrote:
>>
>> Hi Roger,
>>
>> Roger Pau Monné  writes:
>>
>>> On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
 From: Oleksandr Andrushchenko 
@@ -498,6 +537,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
 ASSERT(data_offset < size);
 spin_unlock(&pdev->vpci->lock);
  spin_unlock(>vpci->lock);
 +unlock_locks(d);
>>>
>>> There's one issue here, some handlers will call pcidevs_lock(), which
>>> will result in a lock order inversion, as in the previous patch we
>>> agreed that the locking order was pcidevs_lock first, d->pci_lock
>>> after.
>>>
>>> For example the MSI control_write() handler will call
>>> vpci_msi_arch_enable() which takes the pcidevs lock.  I think I will
>>> have to look into using a dedicated lock for MSI related handling, as
>>> that's the only place where I think we have this pattern of taking the
>>> pcidevs_lock after the d->pci_lock.
>>
>> I'll mention this in the commit message. Is there something else that I
>> should do right now?
>
> Well, I don't think we want to commit this as-is with a known lock
> inversion.
>
> The functions that require the pcidevs lock are:
>
> pt_irq_{create,destroy}_bind()
> unmap_domain_pirq()
>
> AFAICT those functions require the lock in order to assert that the
> underlying device doesn't go away, as they do also use d->event_lock
> in order to get exclusive access to the data fields.  Please double
> check that I'm not mistaken.

 You are right, all three functions do not access any PCI state
 directly. However...

> If that's accurate you will have to check the call tree that spawns
> from those functions in order to modify the asserts to check for
> either the pcidevs or the per-domain pci_list lock being taken.

 ... I checked calls for the PT_IRQ_TYPE_MSI case; there is only one call
 that bothers me: hvm_pi_update_irte(), which calls IO-MMU code via
 vmx_pi_update_irte():

 amd_iommu_msi_msg_update_ire() or msi_msg_write_remap_rte().

 Both functions read basic pdev fields like sbfd or type. I see no
 problem there, as values of those fields are not supposed to be changed.
>>>
>>> But whether fields are basic or will never change doesn't matter when
>>> the pdev struct itself suddenly disappears.
>> 
>> This is not a problem, as it is expected that d->pci_lock is being held,
>> so the pdev structure will not disappear. I am trying to answer another
>> question: is d->pci_lock enough, or is pcidevs_lock also required?
>
> To answer such questions, may I ask that you first firmly write down
> (and submit) what each of the locks guards?

I can do this for a newly introduced lock. So domain->pci_lock guards:

1. domain->pcidevs_list. This means that PCI devices can't be added to
or removed from a domain, when the lock is taken in read mode. As a
byproduct, any pdev assigned to a domain can't be deleted because we
need to deassign it first. To modify domain->pcidevs_list we need to
hold both d->pci_lock in write mode and pcidevs_lock.

2. Presence of pdev->vpci struct for any pdev assigned to a domain. The
structure itself is locked by pdev->vpci->lock. But to add/remove
pdev->vpci itself we need to hold d->pci_lock in the write mode.

As for pcidevs_lock, AFAIK, there are no strictly written rules about
what exactly is protected by this lock.

-- 
WBR, Volodymyr

Re: [PATCH v7 08/15] xenpm: Change get-cpufreq-para output for hwp

2023-07-27 Thread Anthony PERARD
On Thu, Jul 27, 2023 at 12:00:54PM +0100, Anthony PERARD wrote:
> On Wed, Jul 26, 2023 at 01:09:38PM -0400, Jason Andryuk wrote:
> > diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
> > index 1c474c3b59..21c93386de 100644
> > --- a/tools/misc/xenpm.c
> > +++ b/tools/misc/xenpm.c
> > @@ -711,6 +711,7 @@ void start_gather_func(int argc, char *argv[])
> >  /* print out parameters about cpu frequency */
> >  static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para 
> > *p_cpufreq)
> >  {
> > +bool hwp = strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) == 0;
> >  int i;
> >  
> >  printf("cpu id   : %d\n", cpuid);
> > @@ -720,49 +721,57 @@ static void print_cpufreq_para(int cpuid, struct 
> > xc_get_cpufreq_para *p_cpufreq)
> >  printf("scaling_driver   : %s\n", p_cpufreq->scaling_driver);
> >  
> > +if ( !hwp )
> 
> This test kind of feels wrong. Should we test instead the thing we want
> to print? Maybe declaring another bool, something like "bool
> scaling_governor = !hwp" just below the declaration of "bool hwp"?
> Otherwise, if there's another technology that comes along that isn't
> "hwp" and not something that can be printed with this, there's going to
> be some kind of hidden bug (even if probably easy to spot during
> development).

We agreed in patch 11 that this test would be temporary, and changed in
patch 11, so:

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH v7 11/15] xenpm: Print HWP/CPPC parameters

2023-07-27 Thread Anthony PERARD
On Thu, Jul 27, 2023 at 09:54:17AM -0400, Jason Andryuk wrote:
> On Thu, Jul 27, 2023 at 7:32 AM Anthony PERARD
>  wrote:
> >
> > On Wed, Jul 26, 2023 at 01:09:41PM -0400, Jason Andryuk wrote:
> > > @@ -772,6 +812,32 @@ static void print_cpufreq_para(int cpuid, struct 
> > > xc_get_cpufreq_para *p_cpufreq)
> > > p_cpufreq->u.s.scaling_min_freq,
> > > p_cpufreq->u.s.scaling_cur_freq);
> > >  }
> > > +else
> > > +{
> >
> > I feel like this could be confusing. In this function, we have both:
> > if ( hwp ) { this; } else { that; }
> > and
> > if ( !hwp ) { that; } else { this; }
> >
> > If we could have the condition in the same order, or use the same
> > condition for both "true" blocks, that would be nice.
> 
> Makes sense.  From your comment on patch 8, I was thinking of
> switching to a bool scaling_governor and using that.  But now I think
> hwp is better since it's the specific governor - not the default one.
> In light of that, I would just re-organize everything to be "if ( hwp
> )".
> 
> With that, patch 8 is more of a transitional step, and I would leave the
> "if ( !hwp )".  Then here in patch 11 the HWP code would be added
> first inside "if ( hwp )".  Does that sound good?

Yes, that sounds fine.

> > > +const xc_cppc_para_t *cppc = _cpufreq->u.cppc_para;
> > > +
> > > +printf("cppc variables   :\n");
> > > +printf("  hardware limits: lowest [%u] lowest nonlinear 
> > > [%u]\n",
> > > +   cppc->lowest, cppc->lowest_nonlinear);
> >
> > All these %u should be %"PRIu32", right? Even if the rest of the
> > function is bogus... and even if it'll probably be a while before %PRIu32
> > is different from %u.
> 
> Yes, you are correct.  In patch 8 "xenpm: Change get-cpufreq-para
> output for hwp", some existing %u-s are moved and a few more added.
> Do you want all of those converted to %PRIu32?

For the newly added %u, yes, that would be nice.

As for the existing one, maybe as a separated patch, if you wish. At the
moment, patch 8 is easy to review with "--ignore-space-change".

Cheers,

-- 
Anthony PERARD



Re: [XEN PATCH 2/4] x86/emulate: move a variable declaration to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Jan Beulich
On 27.07.2023 12:48, Nicola Vetrini wrote:
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2024,15 +2024,15 @@ static int cf_check hvmemul_rep_stos(
>  
>  switch ( p2mt )
>  {
> -unsigned long bytes;
>  char *buf;
>  
>  default:
>  /* Allocate temporary buffer. */
>  for ( ; ; )
>  {
> -bytes = *reps * bytes_per_rep;
> -buf = xmalloc_bytes(bytes);
> +unsigned long bytes_tmp;
> +bytes_tmp = *reps * bytes_per_rep;
> +buf = xmalloc_bytes(bytes_tmp);
>  if ( buf || *reps <= 1 )
>  break;
>  *reps >>= 1;

This wants dealing with differently - the outer scope variable is unused
(only written to) afaics. Eliminating it will, aiui, address another
violation at the same time. And then the same in hvmemul_rep_movs(), just
that there the variable itself needs to survive. I guess I'll make a
patch ...

Jan



Re: [PATCH v7 04/15] xen/sysctl: Nest cpufreq scaling options

2023-07-27 Thread Jason Andryuk
On Thu, Jul 27, 2023 at 6:27 AM Anthony PERARD
 wrote:
>
> On Wed, Jul 26, 2023 at 01:09:34PM -0400, Jason Andryuk wrote:
> > Add a union and struct so that most of the scaling variables of struct
> > xen_get_cpufreq_para are within in a binary-compatible layout.  This
> > allows cppc_para to live in the larger union and use uint32_ts - struct
> > xen_cppc_para will be 10 uint32_t's.
> >
> > The new scaling struct is 3 * uint32_t + 16 bytes CPUFREQ_NAME_LEN + 4 *
> > uint32_t for xen_ondemand = 11 uint32_t.  That means the old size is
> > retained, int32_t turbo_enabled doesn't move and it's binary compatible.
> >
> > The out-of-context memcpy() in xc_get_cpufreq_para() now handles the
> > copying of the fields removed there.
> >
> > Signed-off-by: Jason Andryuk 
> > Reviewed-by: Jan Beulich 
> > ---
> > NOTE: Jan would like a toolside review / ack because:
> > Nevertheless I continue to be uncertain about all of this: Parts of
> > the struct can apparently go out of sync with the sysctl struct, but
> > other parts have to remain in sync without there being an
> > appropriate build-time check (checking merely sizes clearly isn't
> > enough). Therefore I'd really like to have a toolstack side review /
> > ack here as well.
>
> I wish we could just use "struct xen_get_cpufreq_para" instead of
> having to make a copy to replace the XEN_GUEST_HANDLE_*() macro.
>
> Next I guess we could try to have something like the compat layer in xen
> is doing, with plenty of CHECK_FIELD_ and other CHECK_* macro, but that
> would be a lot of work. (xen/include/xen/compat.h and how it's used in
> xen/include/compat/xlat.h)

I can add a set of BUILD_BUG_ON()s checking the offsets of the two
structs' members.  I think that would work and we don't need the
complication of the compat code.  The build of the library just deals
with a single bit-width and needs to be consistent.  The hypervisor
needs to deal with receiving differing pointer sizes and layouts, but
the userspace library just uses whatever it was compiled for.  The
preprocessor expands XEN_GUEST_HANDLE_64(uint32) -> "typedef struct {
uint32_t *p; } __guest_handle_uint32", so it's just going to be a
single pointer in size, which will match between the two.

Does that sound right, or am I missing something?

> Unless you feel like adding more build check, I guess the patch is good
> enough like that:
> Acked-by: Anthony PERARD 

If I am incorrect about the above... I'll just leave it as-is.

Thanks,
Jason



[XEN PATCH] xen: change names and type qualifiers in copy_domain_page() declaration

2023-07-27 Thread Federico Serafini
Change names and type qualifiers of parameters in copy_domain_page()
declaration to keep consistency with the corresponding definition.
This addresses violations of MISRA C:2012 Rule 8.3: "All declarations
of an object or function shall use the same names and type qualifiers".

No functional changes.

Signed-off-by: Federico Serafini 
---
 xen/include/xen/domain_page.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/xen/domain_page.h b/xen/include/xen/domain_page.h
index 149b217b96..c4007eac09 100644
--- a/xen/include/xen/domain_page.h
+++ b/xen/include/xen/domain_page.h
@@ -15,7 +15,7 @@
  * Clear a given page frame, or copy between two of them.
  */
 void clear_domain_page(mfn_t mfn);
-void copy_domain_page(mfn_t dst, const mfn_t src);
+void copy_domain_page(mfn_t dest, mfn_t source);
 
 #ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
 
-- 
2.34.1




Re: [XEN PATCH 1/4] x86: mechanically rename to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Andrew Cooper
On 27/07/2023 3:50 pm, Jan Beulich wrote:
> On 27.07.2023 12:47, Nicola Vetrini wrote:
>> Rule 5.3 has the following headline:
>> "An identifier declared in an inner scope shall not hide an
>> identifier declared in an outer scope"
>>
>> The renames done by this patch avoid shadowing from happening.
>> They are as follows:
>> - s/str/s/ in 'lapic_disable'
>> - s/str/level/ in '(apic|mce)_set_verbosity'
>> - s/str/state_str/ in 'mwait_idle_probe'
>> - s/str/memmap_name/ in 'init_e820'
> I'm sorry to say that, but I'm not willing to go and figure out where
> that "str" is that there's supposedly a collision with. Please can you
> state such right here, ...
>
>> - s/i/j/ in 'mce_action' (the shadowing here is due to macro
>>   'x86_mcinfo_lookup' that defines 'i' as a loop counter)
> ... much like you do in this case?

In fairness to Nicola, that was given.

> Signed-off-by: Nicola Vetrini 
> ---
> Function 'str' in 'xen/arch/x86/include/asm/desc.h'
> causes the shadowing.

which is the wrapper for the STR instruction.

It's used in a single assertion, and I'd be happy getting rid of it
entirely.  Alternatively, it could be renamed to read_tr() (or
read_tr_sel() ?) if we want to keep the assertion.

We're not renaming every other use of 'str' to mean string just for this...

~Andrew



[XEN PATCH v3 2/2] Config.mk: evaluate XEN_COMPILE_ARCH and XEN_OS immediately

2023-07-27 Thread Anthony PERARD
With GNU make 4.4, the number of executions of the commands present in
these $(shell ) constructs increased greatly. This is probably because,
as of make 4.4, exported variables are also added to the environment of
$(shell ) constructs.

So, to avoid having these commands run more often than necessary, we
replace ?= by an equivalent construct with immediate expansion.
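The difference can be sketched as follows (illustrative fragment, not part of the patch; the variable name matches Config.mk, the comments are an interpretation):

```make
# "?=" keeps the variable recursively expanded, so under make 4.4 an
# exported value may re-run its $(shell ...) for every $(shell ...)
# invocation that inherits the environment:
XEN_OS ?= $(shell uname -s)

# The replacement runs the command at most once, while still honouring
# a value supplied via the environment or the command line:
ifeq ($(origin XEN_OS), undefined)
XEN_OS := $(shell uname -s)
endif
```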

Reported-by: Jason Andryuk 
Signed-off-by: Anthony PERARD 
---

Notes:
v3:
- replace evaluation on first use construct by immediate expansion which
  isn't likely to break and is clearer.

 Config.mk | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/Config.mk b/Config.mk
index c529b1ba19..dc3afaa47e 100644
--- a/Config.mk
+++ b/Config.mk
@@ -19,13 +19,17 @@ or   = $(if $(strip $(1)),$(1),$(if $(strip $(2)),$(2),$(if $(strip $(3)),$(3),$(4))))
 
 -include $(XEN_ROOT)/.config
 
-XEN_COMPILE_ARCH?= $(shell uname -m | sed -e s/i.86/x86_32/ \
+ifeq ($(origin XEN_COMPILE_ARCH), undefined)
+XEN_COMPILE_ARCH:= $(shell uname -m | sed -e s/i.86/x86_32/ \
  -e s/i86pc/x86_32/ -e s/amd64/x86_64/ \
  -e s/armv7.*/arm32/ -e s/armv8.*/arm64/ \
  -e s/aarch64/arm64/)
+endif
 
 XEN_TARGET_ARCH ?= $(XEN_COMPILE_ARCH)
-XEN_OS  ?= $(shell uname -s)
+ifeq ($(origin XEN_OS), undefined)
+XEN_OS  := $(shell uname -s)
+endif
 
 CONFIG_$(XEN_OS) := y
 
-- 
Anthony PERARD




[XEN PATCH v3 1/2] build: evaluate XEN_BUILD_* and XEN_DOMAIN immediately

2023-07-27 Thread Anthony PERARD
With GNU make 4.4, the number of executions of the commands present in
these $(shell ) constructs increased greatly. This is probably because,
as of make 4.4, exported variables are also added to the environment of
$(shell ) constructs.

Also, `make -d` shows a lot of these:
Makefile:15: not recursively expanding XEN_BUILD_DATE to export to shell function
Makefile:16: not recursively expanding XEN_BUILD_TIME to export to shell function
Makefile:17: not recursively expanding XEN_BUILD_HOST to export to shell function
Makefile:14: not recursively expanding XEN_DOMAIN to export to shell function

So, to avoid having these commands run more often than necessary, we
replace ?= by an equivalent construct with immediate expansion.

Reported-by: Jason Andryuk 
Signed-off-by: Anthony PERARD 
---

Notes:
v3:
- replace evaluation on first use construct by immediate expansion which
  isn't likely to break and is clearer.

 xen/Makefile | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xen/Makefile b/xen/Makefile
index e8aa663781..c1738dbbde 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -11,10 +11,18 @@ export XEN_FULLVERSION   = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
 -include xen-version
 
 export XEN_WHOAMI  ?= $(USER)
-export XEN_DOMAIN  ?= $(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))
-export XEN_BUILD_DATE  ?= $(shell LC_ALL=C date)
-export XEN_BUILD_TIME  ?= $(shell LC_ALL=C date +%T)
-export XEN_BUILD_HOST  ?= $(shell hostname)
+ifeq ($(origin XEN_DOMAIN), undefined)
+export XEN_DOMAIN  := $(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))
+endif
+ifeq ($(origin XEN_BUILD_DATE), undefined)
+export XEN_BUILD_DATE  := $(shell LC_ALL=C date)
+endif
+ifeq ($(origin XEN_BUILD_TIME), undefined)
+export XEN_BUILD_TIME  := $(shell LC_ALL=C date +%T)
+endif
+ifeq ($(origin XEN_BUILD_HOST), undefined)
+export XEN_BUILD_HOST  := $(shell hostname)
+endif
 
 # Best effort attempt to find a python interpreter, defaulting to Python 3 if
 # available.  Fall back to just `python` if `which` is nowhere to be found.
-- 
Anthony PERARD




[XEN PATCH v3 0/2] build: reduce number of $(shell) execution on make 4.4

2023-07-27 Thread Anthony PERARD
Patch series available in this git branch:
https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git 
br.build-exported-shell-command-value-v3

v3:
- replace evaluation on first use construct by immediate expansion.

v2:
- new patches removing TARGET_SUBARCH and TARGET_ARCH.
- style change in first patch

With GNU make 4.4, the number of executions of the commands present in
$(shell ) within exported variables increased greatly. This is probably
because, as of make 4.4, exported variables are also added to the
environment of $(shell ) constructs.

From the announcement:

https://lists.gnu.org/archive/html/info-gnu/2022-10/msg8.html
> * WARNING: Backward-incompatibility!
>   Previously makefile variables marked as export were not exported to 
commands
>   started by the $(shell ...) function.  Now, all exported variables are
>   exported to $(shell ...).  If this leads to recursion during expansion, 
then
>   for backward-compatibility the value from the original environment is 
used.
>   To detect this change search for 'shell-export' in the .FEATURES 
variable.

Also, there's a new paragraph in the GNU make manual; it's a warning about
exporting all variables, but it might still be relevant to the newly
observed behavior:

https://www.gnu.org/software/make/manual/make.html#Variables_002fRecursion
> Adding a variable’s value to the environment requires it to be expanded. 
If
> expanding a variable has side-effects (such as the info or eval or similar
> functions) then these side-effects will be seen every time a command is
> invoked.

The issue was reported on IRC by jandryuk.

So, I've located a few obvious candidates to fix; maybe there are more
$(shell) uses to change?

Anthony PERARD (2):
  build: evaluate XEN_BUILD_* and XEN_DOMAIN immediately
  Config.mk: evaluate XEN_COMPILE_ARCH and XEN_OS immediately

 Config.mk|  8 ++--
 xen/Makefile | 16 
 2 files changed, 18 insertions(+), 6 deletions(-)

-- 
Anthony PERARD




Re: [XEN PATCH 1/4] x86: mechanically rename to address MISRA C:2012 Rule 5.3

2023-07-27 Thread Jan Beulich
On 27.07.2023 12:47, Nicola Vetrini wrote:
> Rule 5.3 has the following headline:
> "An identifier declared in an inner scope shall not hide an
> identifier declared in an outer scope"
> 
> The renames done by this patch avoid shadowing from happening.
> They are as follows:
> - s/str/s/ in 'lapic_disable'
> - s/str/level/ in '(apic|mce)_set_verbosity'
> - s/str/state_str/ in 'mwait_idle_probe'
> - s/str/memmap_name/ in 'init_e820'

I'm sorry to say that, but I'm not willing to go and figure out where
that "str" is that there's supposedly a collision with. Please can you
state such right here, ...

> - s/i/j/ in 'mce_action' (the shadowing here is due to macro
>   'x86_mcinfo_lookup' that defines 'i' as a loop counter)

... much like you do in this case?

> - s/desc/descriptor/ in '_hvm_load_entry'
> - s/socket_info/sock_info/ in 'do_write_psr_msrs'

(I didn't look at any of these in any detail, partly because again
I hope for additional context before doing so.)

> - s/debug_stack_lines/dbg_stack_lines/ in 'compat_show_guest_stack'

This wants doing differently: The two functions originally lived in
different source files, so passing the static variable as argument
was preferred over making the variable non-static. When the function
was moved, that aspect was overlooked. The function argument simply
wants dropping.

Jan



Re: [PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-27 Thread Oleksandr Tyshchenko


On 27.07.23 17:33, Jan Beulich wrote:

Hello Jan

> On 26.07.2023 17:13, Oleksandr Tyshchenko wrote:
>> On 26.07.23 17:50, Jan Beulich wrote:
>>> On 26.07.2023 16:14, Oleksandr Tyshchenko wrote:
 From: Oleksandr Tyshchenko 

 Without it being present it won't be possible to use some
 libxl__device_type's callbacks for virtio devices as the common code
 can only invoke these callbacks (by dereferencing a pointer) for valid
 libxl__device_type's elements when iterating over device_type_tbl[].

 Signed-off-by: Oleksandr Tyshchenko 
 ---
tools/libs/light/libxl_create.c | 1 +
1 file changed, 1 insertion(+)

 diff --git a/tools/libs/light/libxl_create.c 
 b/tools/libs/light/libxl_create.c
 index 393c535579..c91059d713 100644
 --- a/tools/libs/light/libxl_create.c
 +++ b/tools/libs/light/libxl_create.c
 @@ -1887,6 +1887,7 @@ const libxl__device_type *device_type_tbl[] = {
     &libxl__dtdev_devtype,
     &libxl__vdispl_devtype,
     &libxl__vsnd_devtype,
+    &libxl__virtio_devtype,
     NULL
 };
>>>
>>>   From description and nature of the change this looks like a Fixes:
>>> tag would be warranted.
>>
>> Looks like, yes. Thanks.
>>
>> I guess, this should point to the commit that introduced
>> libxl__virtio_devtype
>>
>> Fixes: 43ba5202e2ee ('libxl: add support for generic virtio device')
> 
> In light of Anthony's feedback I'm now thinking that no Fixes: tag
> should be here, as is being clarified by the addition to the
> description 

I was about to send V2 with the addition + Fixes tag and noticed your reply.

Basically, I agree not to append a Fixes tag: nothing is broken within
the current code base in that regard, and the addition to the description
clarifies the state and describes what may break and how.

I should have mentioned that from the very beginning.


> (which I guess can be folded in while committing).

It would be really good.




> 
> Jan

Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-27 Thread Roger Pau Monné
On Thu, Jul 27, 2023 at 02:56:18PM +0200, Jan Beulich wrote:
> On 27.07.2023 14:42, Roger Pau Monné wrote:
> > There are also existing callers of iommu_update_ire_from_msi() that
> > call the functions without the pcidevs locked.  See
> > hpet_msi_set_affinity() for example.
> 
> Ftaod first and foremost because there's no pdev in that case.

Likewise for (mostly?) the rest of the callers, as callers of the
.set_affinity hw_irq_controller hook don't have a PCI device at
hand.

Thanks, Roger.



[ovmf test] 182037: all pass - PUSHED

2023-07-27 Thread osstest service owner
flight 182037 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182037/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf c7a7f09c1dfa8896ccf94be2d8c7c8b420257fec
baseline version:
 ovmf 25a6745fe886e88fe175a50dcab4562c65b7cea3

Last test of basis   182018  2023-07-26 01:10:45 Z1 days
Testing same since   182037  2023-07-27 11:40:46 Z0 days1 attempts


People who touched revisions under test:
  Yuanhao Xie 
  YuanhaoXie 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   25a6745fe8..c7a7f09c1d  c7a7f09c1dfa8896ccf94be2d8c7c8b420257fec -> 
xen-tested-master



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Jan Beulich
On 27.07.2023 15:26, Daniel P. Smith wrote:
> Let's bring this back to the actual implementation instead of the 
> theoretical. Your position is that Xen's paddr_t is desired because it 
> can store larger values than size_t can. Now if you look in Xen 
> proper (main 64bit code on x86), paddr_t is a typedef for a 64bit 
> unsigned integer. And if you look at size_t, it is also a typedef to a 
> 64bit unsigned integer, they are literally a couple of lines apart in 
> types.h. Thus they are the same size and can only represent the same 
> maximum size.

What about 32-bit Arm, or any other 32-bit architecture that we might
see Xen ported to, with wider than 32-bit physical address space?

> The only area of issue for x86 is during the short bit of 
> code that runs in 32bit mode during startup. In this series, we address 
> this by using a set of macros in the 32bit code to provide 64bit clean 
> definition of the structures. This approach is acceptable because as far 
> as I am aware, x86 is the only platform where the hypervisor has to 
> transition from one bit size to another, e.g. Arm just starts in 64bit 
> mode when on a 64bit device.
> 
> At the end of the day, size_t is the same size as paddr_t for the end 
> execution environments, and I would hazard a guess that, should x86 
> suddenly find itself with a 128-bit mode (which would likely drive 
> paddr_t to 128 bits), size_t would follow.

Likely. Yet when we still had ix86 (x86-32) support, paddr_t also
wasn't the same as size_t. Even on x86-64 it's possible we'd see
physical address width go beyond 64 bits before virtual address width
would, just as happened for ix86 (which initially only allowed
for 32-bit physical addresses, until PSE36 arrived).

In your implementation you want to cover the general case, not a subset
of special ones.

Jan



Re: [PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-27 Thread Oleksandr Tyshchenko


On 27.07.23 16:45, Anthony PERARD wrote:


Hello Anthony

> On Thu, Jul 27, 2023 at 10:38:03AM +, Oleksandr Tyshchenko wrote:
>>
>>
>> On 27.07.23 12:50, Anthony PERARD wrote:
>>
>> Hello Anthony
>>
>>> On Wed, Jul 26, 2023 at 05:14:59PM +0300, Oleksandr Tyshchenko wrote:
 From: Oleksandr Tyshchenko 

 Without it being present it won't be possible to use some
 libxl__device_type's callbacks for virtio devices as the common code
 can only invoke these callbacks (by dereferencing a pointer) for valid
 libxl__device_type's elements when iterating over device_type_tbl[].
>>>
>>> Did you notice an issue with it being missing from device_type_tbl[]?
>>> Because to me it looks like all the functions that are using
>>> device_type_tbl will just skip over virtio devtype.
>>>
>>> domcreate_attach_devices:
>>>   skip virtio because ".skip_attach = 1"
>>>
>>> libxl__need_xenpv_qemu:
>>>   skip virtio because "dm_needed" is NULL
>>>
>>> retrieve_domain_configuration_end:
>>>   skip because "compare" is "libxl_device_virtio_compare" which is NULL
>>>
>>> libxl__update_domain_configuration:
>>>   skip because "update_config" is NULL.
>>>
>>> So, I think the patch is fine, adding virtio to the device_type_tbl
>>> array is good for completeness, but the patch description may be
>>> misleading.
>>>
>>> Did I miss something?
>>
>> No, you didn't.
>>
>> Just to be clear, there is no issue within the *current* code base; I am
>> experimenting with using device-model bits, so I implemented
>> libxl__device_virtio_dm_needed() locally and noticed that it didn't get
>> called at all, the reason was in absence of libxl__virtio_devtype in the
>> said array.
>>
>> Do you agree with the following addition to the commit description?
>>
>> "Please note, there is no issue within the current code base, as virtio
>> devices don't use callbacks that depend on libxl__virtio_devtype
>> presence in device_type_tbl[]. The issue will appear as soon as we start
>> using these callbacks (for example, dm_needed)."
> 
> Yes, that would be fine. With that addition:
> Acked-by: Anthony PERARD 


Thanks for the clarification and A-b.

> 
> Thanks,
> 

Re: [PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-27 Thread Jan Beulich
On 26.07.2023 17:13, Oleksandr Tyshchenko wrote:
> On 26.07.23 17:50, Jan Beulich wrote:
>> On 26.07.2023 16:14, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko 
>>>
>>> Without it being present it won't be possible to use some
>>> libxl__device_type's callbacks for virtio devices as the common code
>>> can only invoke these callbacks (by dereferencing a pointer) for valid
>>> libxl__device_type's elements when iterating over device_type_tbl[].
>>>
>>> Signed-off-by: Oleksandr Tyshchenko 
>>> ---
>>>   tools/libs/light/libxl_create.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/tools/libs/light/libxl_create.c 
>>> b/tools/libs/light/libxl_create.c
>>> index 393c535579..c91059d713 100644
>>> --- a/tools/libs/light/libxl_create.c
>>> +++ b/tools/libs/light/libxl_create.c
>>> @@ -1887,6 +1887,7 @@ const libxl__device_type *device_type_tbl[] = {
>>>      &libxl__dtdev_devtype,
>>>      &libxl__vdispl_devtype,
>>>      &libxl__vsnd_devtype,
>>> +    &libxl__virtio_devtype,
>>>      NULL
>>>   };
>>
>>  From description and nature of the change this looks like a Fixes:
>> tag would be warranted.
> 
> Looks like, yes. Thanks.
> 
> I guess, this should point to the commit that introduced 
> libxl__virtio_devtype
> 
> Fixes: 43ba5202e2ee ('libxl: add support for generic virtio device')

In light of Anthony's feedback I'm now thinking that no Fixes: tag
should be here, as is being clarified by the addition to the
description (which I guess can be folded in while committing).

Jan



Re: [PATCH v3 4/4] x86/iommu: pass full IO-APIC RTE for remapping table update

2023-07-27 Thread Jan Beulich
On 27.07.2023 16:19, Roger Pau Monné wrote:
> On Thu, Jul 27, 2023 at 02:39:06PM +0200, Jan Beulich wrote:
>> On 26.07.2023 14:55, Roger Pau Monne wrote:
>>> @@ -439,36 +430,45 @@ unsigned int cf_check io_apic_read_remap_rte(
>>>  }
>>>  
>>>  void cf_check io_apic_write_remap_rte(
>>> -unsigned int apic, unsigned int reg, unsigned int value)
>>> +unsigned int apic, unsigned int pin, uint64_t rte)
>>>  {
>>> -unsigned int pin = (reg - 0x10) / 2;
>>> +struct IO_xAPIC_route_entry new_rte = { .raw = rte };
>>>  struct IO_xAPIC_route_entry old_rte = { };
>>> -struct IO_APIC_route_remap_entry *remap_rte;
>>> -unsigned int rte_upper = (reg & 1) ? 1 : 0;
>>>  struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
>>> -int saved_mask;
>>> -
>>> -old_rte = __ioapic_read_entry(apic, pin, true);
>>> +bool masked = true;
>>> +int rc;
>>>  
-remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
>>> -
>>> -/* mask the interrupt while we change the intremap table */
>>> -saved_mask = remap_rte->mask;
>>> -remap_rte->mask = 1;
-__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
>>> -remap_rte->mask = saved_mask;
>>> -
-if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
-   &old_rte, rte_upper, value) )
>>> +if ( !cpu_has_cx16 )
>>>  {
>>> -__io_apic_write(apic, reg, value);
>>> +   /*
>>> +* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
>>> +* avoid interrupts seeing an inconsistent IRTE entry.
>>> +*/
>>> +old_rte = __ioapic_read_entry(apic, pin, true);
>>> +if ( !old_rte.mask )
>>> +{
>>> +masked = false;
>>> +old_rte.mask = 1;
>>> +__ioapic_write_entry(apic, pin, true, old_rte);
>>> +}
>>> +}
>>>  
>>> -/* Recover the original value of 'mask' bit */
>>> -if ( rte_upper )
-__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
+rc = ioapic_rte_to_remap_entry(iommu, apic, pin, &old_rte, new_rte);
>>> +if ( rc )
>>> +{
>>> +if ( !masked )
>>> +{
>>> +/* Recover the original value of 'mask' bit */
>>> +old_rte.mask = 0;
>>> +__ioapic_write_entry(apic, pin, true, old_rte);
>>> +}
>>> +dprintk(XENLOG_ERR VTDPREFIX,
>>> +"failed to update IRTE for IO-APIC#%u pin %u: %d\n",
>>> +apic, pin, rc);
>>
>> Afaics you don't alter the error behavior of ioapic_rte_to_remap_entry(),
>> and in the sole (pre-existing) case of an error a debug log message is
>> already being issued. Why the addition?
> 
> I think I was trying to mimic the behavior of
> amd_iommu_ioapic_update_ire(), which does print the errors as opposed
> to doing so in update_intremap_entry_from_ioapic().
> 
> I've now removed it, and adjusted the code to match the rest of your
> comments.  Will post v4 of 4/4 only as a reply to v3, I don't think
> there's a need to resend the rest unless you prefer it that way.

Just this patch is going to be fine (maybe even as v3.1).

Jan



[ANNOUNCE] Call for agenda items for 3 August Community Call @ 1500 UTC

2023-07-27 Thread George Dunlap
Hi all,

The proposed agenda is in
https://cryptpad.fr/pad/#/2/pad/edit/Rk5kDLwcc7CrqFm6O7xq2Jci/ and you can
edit to add items.  Alternatively, you can reply to this mail directly.

Agenda items appreciated a few days before the call: please put your name
besides items if you edit the document.

Note the following administrative conventions for the call:
* Unless agreed otherwise in the previous meeting, the call is on the 1st
Thursday of each month at 1600 British Time (either GMT or BST)
* I usually send out a meeting reminder a few days before with a
provisional agenda

* To allow time to switch between meetings, we'll plan on starting the
agenda at 16:05 sharp.  Aim to join by 16:03 if possible to allocate time
to sort out technical difficulties 

* If you want to be CC'ed please add or remove yourself from the
sign-up-sheet at
https://cryptpad.fr/pad/#/2/pad/edit/D9vGzihPxxAOe6RFPz0sRCf+/

Best Regards
George


== Dial-in Information ==
## Meeting time
16:00 - 17:00 British time
Further International meeting times:


https://www.timeanddate.com/worldclock/meetingdetails.html?year=2023=8=3=15=0=0=1234=37=224=179


## Dial in details
Web: https://meet.jit.si/XenProjectCommunityCall

Dial-in info and pin can be found here:

https://meet.jit.si/static/dialInInfo.html?room=XenProjectCommunityCall


Re: [PATCH v3 4/4] x86/iommu: pass full IO-APIC RTE for remapping table update

2023-07-27 Thread Roger Pau Monné
On Thu, Jul 27, 2023 at 02:39:06PM +0200, Jan Beulich wrote:
> On 26.07.2023 14:55, Roger Pau Monne wrote:
> > @@ -439,36 +430,45 @@ unsigned int cf_check io_apic_read_remap_rte(
> >  }
> >  
> >  void cf_check io_apic_write_remap_rte(
> > -unsigned int apic, unsigned int reg, unsigned int value)
> > +unsigned int apic, unsigned int pin, uint64_t rte)
> >  {
> > -unsigned int pin = (reg - 0x10) / 2;
> > +struct IO_xAPIC_route_entry new_rte = { .raw = rte };
> >  struct IO_xAPIC_route_entry old_rte = { };
> > -struct IO_APIC_route_remap_entry *remap_rte;
> > -unsigned int rte_upper = (reg & 1) ? 1 : 0;
> >  struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
> > -int saved_mask;
> > -
> > -old_rte = __ioapic_read_entry(apic, pin, true);
> > +bool masked = true;
> > +int rc;
> >  
> > -remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
> > -
> > -/* mask the interrupt while we change the intremap table */
> > -saved_mask = remap_rte->mask;
> > -remap_rte->mask = 1;
> > -__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
> > -remap_rte->mask = saved_mask;
> > -
> > -if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
> > -   &old_rte, rte_upper, value) )
> > +if ( !cpu_has_cx16 )
> >  {
> > -__io_apic_write(apic, reg, value);
> > +   /*
> > +* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
> > +* avoid interrupts seeing an inconsistent IRTE entry.
> > +*/
> > +old_rte = __ioapic_read_entry(apic, pin, true);
> > +if ( !old_rte.mask )
> > +{
> > +masked = false;
> > +old_rte.mask = 1;
> > +__ioapic_write_entry(apic, pin, true, old_rte);
> > +}
> > +}
> >  
> > -/* Recover the original value of 'mask' bit */
> > -if ( rte_upper )
> > -__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
> > +rc = ioapic_rte_to_remap_entry(iommu, apic, pin, &old_rte, new_rte);
> > +if ( rc )
> > +{
> > +if ( !masked )
> > +{
> > +/* Recover the original value of 'mask' bit */
> > +old_rte.mask = 0;
> > +__ioapic_write_entry(apic, pin, true, old_rte);
> > +}
> > +dprintk(XENLOG_ERR VTDPREFIX,
> > +"failed to update IRTE for IO-APIC#%u pin %u: %d\n",
> > +apic, pin, rc);
> 
> Afaics you don't alter the error behavior of ioapic_rte_to_remap_entry(),
> and in the sole (pre-existing) case of an error a debug log message is
> already being issued. Why the addition?

I think I was trying to mimic the behavior of
amd_iommu_ioapic_update_ire(), which does print the errors as opposed
to doing so in update_intremap_entry_from_ioapic().

I've now removed it, and adjusted the code to match the rest of your
comments.  Will post v4 of 4/4 only as a reply to v3, I don't think
there's a need to resend the rest unless you prefer it that way.

Thanks, Roger.



Re: [PATCH v7 11/15] xenpm: Print HWP/CPPC parameters

2023-07-27 Thread Jason Andryuk
On Thu, Jul 27, 2023 at 7:32 AM Anthony PERARD
 wrote:
>
> On Wed, Jul 26, 2023 at 01:09:41PM -0400, Jason Andryuk wrote:
> > @@ -772,6 +812,32 @@ static void print_cpufreq_para(int cpuid, struct 
> > xc_get_cpufreq_para *p_cpufreq)
> > p_cpufreq->u.s.scaling_min_freq,
> > p_cpufreq->u.s.scaling_cur_freq);
> >  }
> > +else
> > +{
>
> I feel like this could be confusing. In this function, we have both:
> if ( hwp ) { this; } else { that; }
> and
> if ( !hwp ) { that; } else { this; }
>
> If we could have the condition in the same order, or use the same
> condition for both "true" blocks, that would be nice.

Makes sense.  From your comment on patch 8, I was thinking of
switching to a bool scaling_governor and using that.  But now I think
hwp is better since it's the specific governor - not the default one.
In light of that, I would just re-organize everything to be "if ( hwp
)".

With that, patch 8 is more of a transitional step, and I would leave the
"if ( !hwp )".  Then here in patch 11 the HWP code would be added
first inside "if ( hwp )".  Does that sound good?

> > +const xc_cppc_para_t *cppc = &p_cpufreq->u.cppc_para;
> > +
> > +printf("cppc variables   :\n");
> > +printf("  hardware limits: lowest [%u] lowest nonlinear 
> > [%u]\n",
> > +   cppc->lowest, cppc->lowest_nonlinear);
>
> All these %u should be %"PRIu32", right? Even if the rest of the
> function is bogus... and even if it's probably be a while before %PRIu32
> is different from %u.

Yes, you are correct.  In patch 8 "xenpm: Change get-cpufreq-para
output for hwp", some existing %u-s are moved and a few more added.
Do you want all of those converted to %PRIu32?

Thanks,
Jason



Re: [PATCH] libxl: Add missing libxl__virtio_devtype to device_type_tbl array

2023-07-27 Thread Anthony PERARD
On Thu, Jul 27, 2023 at 10:38:03AM +, Oleksandr Tyshchenko wrote:
> 
> 
> On 27.07.23 12:50, Anthony PERARD wrote:
> 
> Hello Anthony
> 
> > On Wed, Jul 26, 2023 at 05:14:59PM +0300, Oleksandr Tyshchenko wrote:
> >> From: Oleksandr Tyshchenko 
> >>
> >> Without it being present it won't be possible to use some
> >> libxl__device_type's callbacks for virtio devices as the common code
> >> can only invoke these callbacks (by dereferencing a pointer) for valid
> >> libxl__device_type's elements when iterating over device_type_tbl[].
> > 
> > Did you notice an issue with it being missing from device_type_tbl[]?
> > Because to me it looks like all the functions that are using
> > device_type_tbl will just skip over virtio devtype.
> > 
> > domcreate_attach_devices:
> >  skip virtio because ".skip_attach = 1"
> > 
> > libxl__need_xenpv_qemu:
> >  skip virtio because "dm_needed" is NULL
> > 
> > retrieve_domain_configuration_end:
> >  skip because "compare" is "libxl_device_virtio_compare" which is NULL
> > 
> > libxl__update_domain_configuration:
> >  skip because "update_config" is NULL.
> > 
> > So, I think the patch is fine, adding virtio to the device_type_tbl
> > array is good for completeness, but the patch description may be
> > misleading.
> > 
> > Did I miss something?
> 
> No, you didn't.
> 
> Just to be clear, there is no issue within the *current* code base; I am 
> experimenting with using device-model bits, so I implemented 
> libxl__device_virtio_dm_needed() locally and noticed that it didn't get 
> called at all, the reason was in absence of libxl__virtio_devtype in the 
> said array.
> 
> Do you agree with the following addition to the commit description?
> 
> "Please note, there is no issue within the current code base, as virtio 
> devices don't use callbacks that depend on libxl__virtio_devtype 
> presence in device_type_tbl[]. The issue will appear as soon as we start
> using these callbacks (for example, dm_needed)."

Yes, that would be fine. With that addition:
Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH v7 14/15] xenpm: Add set-cpufreq-cppc subcommand

2023-07-27 Thread Anthony PERARD
On Wed, Jul 26, 2023 at 01:09:44PM -0400, Jason Andryuk wrote:
> set-cpufreq-cppc allows setting the Hardware P-State (HWP) parameters.
> 
> It can be run on all or just a single cpu.  There are presets of
> balance, powersave & performance.  Those can be further tweaked by
> param:val arguments as explained in the usage description.
> 
> Parameter names are just checked to the first 3 characters to shorten
> typing.
> 
> Some options are hardware dependent, and ranges can be found in
> get-cpufreq-para.
> 
> Signed-off-by: Jason Andryuk 
> Acked-by: Jan Beulich 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



[PATCH v5 2/2] xen/riscv: introduce identity mapping

2023-07-27 Thread Oleksii Kurochko
The way the switch to virtual addresses was implemented in
commit e66003e7be ("xen/riscv: introduce setup_initial_pages")
isn't safe enough:
* enable_mmu() depends on hooking all exceptions
  and pagefault.
* Any exception other than a pagefault, or failing to take a pagefault,
  causes it to malfunction, which means the boot will fail
  depending on where Xen was loaded into memory.

Instead of the originally proposed way of switching to virtual addresses,
it was decided to use an identity mapping of the area that contains the
code needed to switch from the identity mapping; after the switch to
virtual addresses, the identity mapping is removed from the page tables
in the following way: search for the top-most page table entry and
remove it.

Fixes: e66003e7be ("xen/riscv: introduce setup_initial_pages")
Signed-off-by: Oleksii Kurochko 
Suggested-by: Andrew Cooper 
---
Changes in V5:
 - update the algo of identity mapping removing.
 - introduce IDENT_AREA_SIZE.
 - introduce turn_on_mmu() function to enable and switch from 1:1 mapping.
 - fix typo in PGTBL_INITIAL_COUNT define.
 - update the comment above PGTBL_INITIAL_COUNT.
 - update the commit message.
---
Changes in V4:
 - remove definition of ARRAY_SIZE and ROUNDUP as  was introduced 
where these macros are located now.
 - update definition of PGTBL_INITIAL_COUNT
 - update the commit message
 - update the algo of identity mapping removing
---
Changes in V3:
 - remove unrelated to the patch changes ( SPDX tags in config.h ).
 - update definition of PGTBL_INITIAL_COUNT taking into account identity 
mapping.
 - refactor remove_identity_mapping() function.
 - add explanatory comments in xen.lds.S and mm.c.
 - update commit message.
 - move save/restore of a0/a1 registers to [PATCH v2 2/3] xen/riscv: introduce
   function for physical offset calculation.
---
Changes in V2:
  - update definition of PGTBL_INITIAL_COUNT and the comment above.
  - code style fixes.
  - 1:1 mapping for entire Xen.
  - remove id_addrs array becase entire Xen is mapped.
  - reverse condition for cycle inside remove_identity_mapping().
  - fix page table walk in remove_identity_mapping().
  - update the commit message.
  - add Suggested-by: Andrew Cooper 
  - save hart_id and dtb_addr before call MMU related C functions.
  - use phys_offset variable instead of doing calculations to get phys offset
in head.S file. ( it can be easily done as entire Xen is 1:1 mapped )
  - declare enable_mmu() as __init.
---
 xen/arch/riscv/include/asm/config.h |  2 +
 xen/arch/riscv/include/asm/mm.h     |  5 +-
 xen/arch/riscv/mm.c                 | 90 +++--
 xen/arch/riscv/riscv64/head.S       | 32 ++
 xen/arch/riscv/setup.c              | 14 +
 xen/arch/riscv/xen.lds.S            | 11 
 6 files changed, 99 insertions(+), 55 deletions(-)

diff --git a/xen/arch/riscv/include/asm/config.h b/xen/arch/riscv/include/asm/config.h
index fa90ae0898..f0544c6a20 100644
--- a/xen/arch/riscv/include/asm/config.h
+++ b/xen/arch/riscv/include/asm/config.h
@@ -95,6 +95,8 @@
 #define RV_STAGE1_MODE SATP_MODE_SV32
 #endif
 
+#define IDENT_AREA_SIZE 64
+
 #endif /* __RISCV_CONFIG_H__ */
 /*
  * Local variables:
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 7b94cbadd7..07c7a0abba 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -13,8 +13,11 @@ extern unsigned char cpu0_boot_stack[];
 void setup_initial_pagetables(void);
 
 void enable_mmu(void);
-void cont_after_mmu_is_enabled(void);
+
+void remove_identity_mapping(void);
 
 unsigned long calc_phys_offset(void);
 
+void turn_on_mmu(unsigned long ra);
+
 #endif /* _ASM_RISCV_MM_H */
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 1df39ddf1b..d19fdb7878 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -35,8 +36,11 @@ static unsigned long __ro_after_init phys_offset;
  *
  * One more page table might be needed in case the Xen load address
  * isn't 2 MB aligned.
+ *
+ * CONFIG_PAGING_LEVELS page tables are needed for the identity mapping,
+ * except that the root page table is shared with the initial mapping
  */
-#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1)
+#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) * 2 + 1)
 
 pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
 stage1_pgtbl_root[PAGETABLE_ENTRIES];
@@ -75,6 +79,7 @@ static void __init setup_initial_mapping(struct mmu_desc *mmu_desc,
 unsigned int index;
 pte_t *pgtbl;
 unsigned long page_addr;
+bool is_identity_mapping = map_start == pa_start;
 
 if ( (unsigned long)_start % XEN_PT_LEVEL_SIZE(0) )
 {
@@ -108,16 +113,18 @@ static void __init setup_initial_mapping(struct mmu_desc *mmu_desc,
 {
 unsigned long paddr = (page_addr - map_start) + pa_start;
 unsigned int permissions = PTE_LEAF_DEFAULT;
+

[PATCH v5 0/2] xen/riscv: introduce identity mapping

2023-07-27 Thread Oleksii Kurochko
The patch series introduces things necessary to implement identity mapping:
  1. Make identity mapping for the entire Xen.
  2. Enable MMU.
  3. Jump to the virtual address world.
  4. Remove identity mapping.

This patch series also introduces the calculation of the physical offset
before the MMU is enabled: access to the phys_off variable is PC-relative,
so it would be computed incorrectly after the MMU is enabled and, in the
case when linker address != load address, would cause an MMU fault.

The reason for this patch series can be found here:
https://lore.kernel.org/xen-devel/4e336121-fc0c-b007-bf7b-430352563...@citrix.com/
---
Changes in V5:
- update the algo of identity mapping removal.
- introduce IDENT_AREA_SIZE.
- introduce turn_on_mmu() function to enable and switch from 1:1 
mapping.
- fix typo in PGTBL_INITIAL_COUNT define.
- update the comment above PGTBL_INITIAL_COUNT.
- update prototype of calc_phys_offset(). now it returns phys_offset.
- declare phys_offset as static.
- save returned value of calc_phys_offset to register s2.
---
Changes in V4:
  - drop patch  [PATCH v3 1/3] xen/riscv: add SPDX tag to config.h as it was
merged to staging
  - remove definition of ARRAY_SIZE and ROUNDUP as  was 
introduced where these macros are located now.
- update definition of PGTBL_INITIAL_COUNT
- update the commit message for patch 'xen/riscv: introduce identity 
mapping'
- update the comments in head.S
  - update the algo of identity mapping removal
---
Changes in V3:
 - Update the patch series message.
 - The following patches were merged to staging so dropped from the patch series:
   * xen/riscv: add .sbss section to .bss
   * xen/riscv: introduce reset_stack() function
   * xen/riscv: move extern of cpu0_boot_stack to header
   * xen/riscv: add SPDX tags
 - move save/restore of a0/a1 registers from patch 4 to patch 2 ( numbers are
   from the previous patch series version )
 - add SPDX tag in config.h
 - update definition of PGTBL_INITIAL_COUNT taking into account identity 
mapping.
 - refactor remove_identity_mapping() function.
 - add explanatory comments in xen.lds.S and mm.c.
---
Changes in V2:
 - update the patch series message.
 - drop patches from the previous version of the patch series:
   * xen/riscv: add __ASSEMBLY__ guards". ( merged )
   * xen/riscv: make sure that identity mapping isn't bigger than page size
 ( entire Xen is 1:1 mapped so there is no need for the checks from the 
patch )
 - add .sbss.* and put it before .bss*.
 - move out reset_stack() to .text section.
 - add '__ro_after_init' for phys_offset variable.
 - add '__init' for calc_phys_offset().
 - declaring variable phys_off as non static as it will be used in head.S.
 - update definition of PGTBL_INITIAL_COUNT and the comment above.
 - code style fixes.
 - remove id_addrs array because entire Xen is mapped.
 - reverse condition for cycle inside remove_identity_mapping().
 - fix page table walk in remove_identity_mapping().
 - save hart_id and dtb_addr before call MMU related C functions
 - use phys_offset variable instead of doing calculations to get phys offset
   in head.S file. ( it can be easily done as entire Xen is 1:1 mapped now )
 - declare enable_mmu() as __init.
 - Update SPDX tags.
 - Add Review-By/Suggested-By for some patches.
 - code style fixes.

Oleksii Kurochko (2):
  xen/riscv: introduce function for physical offset calculation
  xen/riscv: introduce identity mapping

 xen/arch/riscv/include/asm/config.h |   2 +
 xen/arch/riscv/include/asm/mm.h     |   7 +-
 xen/arch/riscv/mm.c                 | 107 
 xen/arch/riscv/riscv64/head.S       |  46 
 xen/arch/riscv/setup.c              |  14 +---
 xen/arch/riscv/xen.lds.S            |  11 +++
 6 files changed, 130 insertions(+), 57 deletions(-)

-- 
2.41.0




[PATCH v5 1/2] xen/riscv: introduce function for physical offset calculation

2023-07-27 Thread Oleksii Kurochko
The function calculates and saves the physical offset before the MMU is
enabled, because access to start() is PC-relative and, in case of
linker_addr != load_addr, computing the offset later would leave an
incorrect value in phys_offset.

Signed-off-by: Oleksii Kurochko 
---
Changes in V5:
 - update prototype of calc_phys_offset(). now it returns phys_offset.
 - declare phys_offset as static.
 - save returned value of calc_phys_offset to register s2.
---
Changes in V4:
 - update the comment messages in head.S related to save/restore of a0/a1 regs.
---
Changes in V3:
 - save/restore of a0/a1 registers before C first function call.
---
Changes in V2:
  - add __ro_after_init for phys_offset variable.
  - remove double blank lines.
  - add __init for calc_phys_offset().
  - update the commit message.
  - declaring variable phys_off as non static as it will be used in head.S.
---
 xen/arch/riscv/include/asm/mm.h |  2 ++
 xen/arch/riscv/mm.c             | 19 ---
 xen/arch/riscv/riscv64/head.S   | 14 ++
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 5e3ac5cde3..7b94cbadd7 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -15,4 +15,6 @@ void setup_initial_pagetables(void);
 void enable_mmu(void);
 void cont_after_mmu_is_enabled(void);
 
+unsigned long calc_phys_offset(void);
+
 #endif /* _ASM_RISCV_MM_H */
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index fddb3cd0bd..1df39ddf1b 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+#include 
 #include 
 #include 
 #include 
@@ -19,9 +20,10 @@ struct mmu_desc {
 pte_t *pgtbl_base;
 };
 
-#define PHYS_OFFSET ((unsigned long)_start - XEN_VIRT_START)
-#define LOAD_TO_LINK(addr) ((addr) - PHYS_OFFSET)
-#define LINK_TO_LOAD(addr) ((addr) + PHYS_OFFSET)
+static unsigned long __ro_after_init phys_offset;
+
+#define LOAD_TO_LINK(addr) ((unsigned long)(addr) - phys_offset)
+#define LINK_TO_LOAD(addr) ((unsigned long)(addr) + phys_offset)
 
 /*
 * It is expected that Xen won't be more than 2 MB.
@@ -273,3 +275,14 @@ void __init noreturn noinline enable_mmu()
 switch_stack_and_jump((unsigned long)cpu0_boot_stack + STACK_SIZE,
   cont_after_mmu_is_enabled);
 }
+
+/*
+ * calc_phys_offset() should be used before MMU is enabled because access to
+ * start() is PC-relative and, in the case when load_addr != linker_addr,
+ * phys_offset would otherwise have an incorrect value
+ */
+unsigned long __init calc_phys_offset(void)
+{
+phys_offset = (unsigned long)start - XEN_VIRT_START;
+return phys_offset;
+}
diff --git a/xen/arch/riscv/riscv64/head.S b/xen/arch/riscv/riscv64/head.S
index 2c0304646a..ae194bb099 100644
--- a/xen/arch/riscv/riscv64/head.S
+++ b/xen/arch/riscv/riscv64/head.S
@@ -29,6 +29,20 @@ ENTRY(start)
 
 jal reset_stack
 
+/*
+ * save hart_id ( bootcpu_id ) and dtb_base, as the a0 and a1 registers
+ * can be clobbered by C code
+ */
+mv  s0, a0
+mv  s1, a1
+
+jal calc_phys_offset
+mv  s2, a0
+
+/* restore hart_id ( bootcpu_id ) and dtb address */
+mv  a0, s0
+mv  a1, s1
+
 tailstart_xen
 
 .section .text, "ax", %progbits
-- 
2.41.0




Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Daniel P. Smith




On 7/27/23 08:54, Jan Beulich wrote:

On 27.07.2023 14:48, Daniel P. Smith wrote:

On 7/27/23 07:58, Jan Beulich wrote:

On 27.07.2023 13:46, Daniel P. Smith wrote:

On 7/21/23 02:14, Jan Beulich wrote:

On 21.07.2023 00:12, Christopher Clark wrote:

On Thu, Jul 13, 2023 at 11:51 PM Christopher Clark <
christopher.w.cl...@gmail.com> wrote:

On Sat, Jul 8, 2023 at 11:47 AM Stefano Stabellini 
wrote:

On Sat, 1 Jul 2023, Christopher Clark wrote:

To convert the x86 boot logic from multiboot to boot module structures,
change the bootstrap map function to accept a boot module parameter.

To allow incremental change from multiboot to boot modules across all
x86 setup logic, provide a temporary inline wrapper that still accepts a
multiboot module parameter and use it where necessary. The wrapper is
placed in a new arch/x86 header  to avoid putting a static
inline function into an existing header that has no such functions
already. This new header will be expanded with additional functions in
subsequent patches in this series.

No functional change intended.

Signed-off-by: Christopher Clark 
Signed-off-by: Daniel P. Smith 



[...]


diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
index b72ae31a66..eb93cc3439 100644
--- a/xen/include/xen/bootinfo.h
+++ b/xen/include/xen/bootinfo.h
@@ -10,6 +10,9 @@
#endif

struct boot_module {
+paddr_t start;
+size_t size;


I think size should be paddr_t (instead of size_t) to make sure it is
the right size on both 64-bit and 32-bit architectures that support
64-bit addresses.



Thanks, that explanation does make sense - ack.



I've come back to reconsider this as it doesn't seem right to me to store a
non-address value (which this will always be) in a type explicitly defined
to hold an address: addresses may have architectural alignment requirements
whereas a size value is just a number of bytes so will not. The point of a
size_t value is that size_t is defined to be large enough to hold the size
of any valid object in memory, so I think this was right as-is.


"Any object in memory" implies virtual addresses (or more generally addresses
which can be used for accessing objects). This isn't the case when considering
physical addresses - there may be far more memory in a system than can be made
accessible all in one go.


That is not my understanding of it, but I could be wrong. My
understanding based on all the debates I have read online around this
topic is that the intent in the spec is that size_t has to be able to
hold a value that represents the largest object the CPU can manipulate
with general purpose operations. From which I understand to mean as
large as the largest register a CPU instruction may use for a size
argument to a general purpose instruction. On x86_64, that is a 64bit
register, as I don't believe the SSE/AVX registers are counted even
though they are used by compiler/libc implementations to optimize some
memory operations.


I can't see how this relates to my earlier remark.


Perhaps I misunderstood what your point was then. I thought you were
taking the position that size_t could not be used to represent the
largest object in memory addressable by a single CPU operation.


No. I was trying to clarify that we're talking about physical addresses
here. Which you still seem to have trouble with, ...


No, I perfectly understand what you are saying and am not having 
difficulties with.



   From what I have seen for Xen, this is currently reflected in the x86
code base, as size_t is 32bits for the early 32bit code and 64bits for
Xen proper.

That aside, another objection I have to the use of paddr_t is that it is
type abuse. Types are meant to convey context to the intended use of the
variable and enable the ability to enforce proper usage of the variable,
otherwise we might as well just use u64/uint64_t and be done. The
field's purpose is to convey a size of an object,


You use "object" here again, when in physical address space (with paging
enabled) this isn't an appropriate term.


Because that is the language used in the C spec to refer to instances in
memory,

"Object: region of data storage in the execution environment, the
contents of which can represent values"

ISO/IEC 9899:1999(E) - 3.14:
https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf



With the following two interpretations of the spec for size_t to mean
(any emphasis being mine),


"size_t is an unsigned integer type used to represent the size of any
**object** (including arrays) in the particular implementation."

Wikipedia - size_t: https://en.wikipedia.org/wiki/C_data_types#stddef.h


"size_t can store the maximum size of a theoretically possible
**object** of any type (including array)."

CPP Ref - size_t: (https://en.cppreference.com/w/c/types/size_t)


... according to all of this and ...


and labeling it a type
that is intended for physical address objects violates both intents
behind declaring a type, it asserts an invalid 

[linux-linus test] 182025: regressions - FAIL

2023-07-27 Thread osstest service owner
flight 182025 linux-linus real [real]
flight 182036 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/182025/
http://logs.test-lab.xenproject.org/osstest/logs/182036/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-libvirt-raw 13 guest-start  fail REGR. vs. 180278
 test-arm64-arm64-xl-vhd  13 guest-start  fail REGR. vs. 180278
 test-amd64-amd64-xl-credit2 22 guest-start/debian.repeat fail in 182036 REGR. 
vs. 180278

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-credit2 20 guest-localmigrate/x10 fail pass in 
182036-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt   16 saverestore-support-check fail blocked in 180278
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail blocked in 
180278
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 
180278
 test-armhf-armhf-xl-multivcpu  8 xen-boot fail like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 180278
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180278
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 linux                18b44bc5a67275641fb26f2c54ba7eef80ac5950
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z  101 days
Failing since        180281  2023-04-17 06:24:36 Z  101 days  190 attempts
Testing same since   182025  2023-07-26 16:01:05 Z0 days1 attempts


3821 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 

Re: [XEN PATCH 2/3] xen/arm: irq: address violations of MISRA C: 2012 Rules 8.2 and 8.3

2023-07-27 Thread Federico Serafini

On 27/07/23 13:27, Jan Beulich wrote:

On 27.07.2023 13:02, Federico Serafini wrote:


I have ready a patch for violations of rules 8.2 and 8.3 in
xen/include/xen/iommu.h.
I am talking about this, in this IRQ thread, because I think the
following two options also apply for an eventual v2 patch for the IRQ
module, until a decision about rule 8.2 and function pointers is taken:

1) Split patches and submit only the changes *not* involving function
 pointers.
2) In the meantime that you make a decision,
 I submit patches thus addressing the existing violations.

I personally prefer the second one, but please let me know what you
think.


It's not entirely clear to me what 2 means, as I wouldn't expect you
intend to deal with "violations" which we may decide aren't any in
our world.

Jan


In point 2) I would like to act as if rule 8.2 had been approved
without any deviation; I think this will lead to a smaller number of
patches and a smaller amount of text attached to each modified function.

If you are concerned about inconsistency between the resulting commit
messages and your future decision then we can go for option 1).

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-27 Thread Jan Beulich
On 27.07.2023 14:42, Roger Pau Monné wrote:
> There are also existing callers of iommu_update_ire_from_msi() that
> call the functions without the pcidevs locked.  See
> hpet_msi_set_affinity() for example.

Ftaod first and foremost because there's no pdev in that case.

Jan



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Jan Beulich
On 27.07.2023 14:48, Daniel P. Smith wrote:
> On 7/27/23 07:58, Jan Beulich wrote:
>> On 27.07.2023 13:46, Daniel P. Smith wrote:
>>> On 7/21/23 02:14, Jan Beulich wrote:
 On 21.07.2023 00:12, Christopher Clark wrote:
> On Thu, Jul 13, 2023 at 11:51 PM Christopher Clark <
> christopher.w.cl...@gmail.com> wrote:
>> On Sat, Jul 8, 2023 at 11:47 AM Stefano Stabellini 
>> 
>> wrote:
>>> On Sat, 1 Jul 2023, Christopher Clark wrote:
 To convert the x86 boot logic from multiboot to boot module structures,
 change the bootstrap map function to accept a boot module parameter.

 To allow incremental change from multiboot to boot modules across all
 x86 setup logic, provide a temporary inline wrapper that still accepts 
 a
 multiboot module parameter and use it where necessary. The wrapper is
 placed in a new arch/x86 header  to avoid putting a static
 inline function into an existing header that has no such functions
 already. This new header will be expanded with additional functions in
 subsequent patches in this series.

 No functional change intended.

 Signed-off-by: Christopher Clark 
 Signed-off-by: Daniel P. Smith 

>>>
>>> [...]
>>>
 diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
 index b72ae31a66..eb93cc3439 100644
 --- a/xen/include/xen/bootinfo.h
 +++ b/xen/include/xen/bootinfo.h
 @@ -10,6 +10,9 @@
#endif

struct boot_module {
 +paddr_t start;
 +size_t size;
>>>
>>> I think size should be paddr_t (instead of size_t) to make sure it is
>>> the right size on both 64-bit and 32-bit architectures that support
>>> 64-bit addresses.
>>>
>>
>> Thanks, that explanation does make sense - ack.
>>
>
> I've come back to reconsider this as it doesn't seem right to me to store 
> a
> non-address value (which this will always be) in a type explicitly defined
> to hold an address: addresses may have architectural alignment 
> requirements
> whereas a size value is just a number of bytes so will not. The point of a
> size_t value is that size_t is defined to be large enough to hold the size
> of any valid object in memory, so I think this was right as-is.

 "Any object in memory" implies virtual addresses (or more generally 
 addresses
 which can be used for accessing objects). This isn't the case when 
 considering
 physical addresses - there may be far more memory in a system than can be 
 made
 accessible all in one go.
>>>
>>> That is not my understanding of it, but I could be wrong. My
>>> understanding based on all the debates I have read online around this
>>> topic is that the intent in the spec is that size_t has to be able to
>>> hold a value that represents the largest object the CPU can manipulate
>>> with general purpose operations. From which I understand to mean as
>>> large as the largest register a CPU instruction may use for a size
>>> argument to a general purpose instruction. On x86_64, that is a 64bit
>>> register, as I don't believe the SSE/AVX registers are counted even
>>> though they are used by compiler/libc implementations to optimize some
>>> memory operations.
>>
>> I can't see how this relates to my earlier remark.
> 
> Perhaps I misunderstood what your point was then. I thought you were 
> taking the position that size_t could not be used to represent the 
> largest object in memory addressable by a single CPU operation.

No. I was trying to clarify that we're talking about physical addresses
here. Which you still seem to have trouble with, ...

>>>   From what I have seen for Xen, this is currently reflected in the x86
>>> code base, as size_t is 32bits for the early 32bit code and 64bits for
>>> Xen proper.
>>>
>>> That aside, another objection I have to the use of paddr_t is that it is
>>> type abuse. Types are meant to convey context to the intended use of the
>>> variable and enable the ability to enforce proper usage of the variable,
>>> otherwise we might as well just use u64/uint64_t and be done. The
>>> field's purpose is to convey a size of an object,
>>
>> You use "object" here again, when in physical address space (with paging
>> enabled) this isn't an appropriate term.
> 
> Because that is the language used in the C spec to refer to instances in 
> memory,
> 
> "Object: region of data storage in the execution environment, the 
> contents of which can represent values"
> 
> ISO/IEC 9899:1999(E) - 3.14: 
> https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf
> 
> 
> 
> With the following two interpretations of the spec for size_t to mean 
> (any emphasis being mine),
> 
> 
> "size_t is an unsigned integer type used to represent the size of any 
> **object** (including arrays) in 

Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Daniel P. Smith




On 7/27/23 07:58, Jan Beulich wrote:

On 27.07.2023 13:46, Daniel P. Smith wrote:



On 7/21/23 02:14, Jan Beulich wrote:

On 21.07.2023 00:12, Christopher Clark wrote:

On Thu, Jul 13, 2023 at 11:51 PM Christopher Clark <
christopher.w.cl...@gmail.com> wrote:




On Sat, Jul 8, 2023 at 11:47 AM Stefano Stabellini 
wrote:


On Sat, 1 Jul 2023, Christopher Clark wrote:

To convert the x86 boot logic from multiboot to boot module structures,
change the bootstrap map function to accept a boot module parameter.

To allow incremental change from multiboot to boot modules across all
x86 setup logic, provide a temporary inline wrapper that still accepts a
multiboot module parameter and use it where necessary. The wrapper is
placed in a new arch/x86 header  to avoid putting a static
inline function into an existing header that has no such functions
already. This new header will be expanded with additional functions in
subsequent patches in this series.

No functional change intended.

Signed-off-by: Christopher Clark 
Signed-off-by: Daniel P. Smith 



[...]


diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
index b72ae31a66..eb93cc3439 100644
--- a/xen/include/xen/bootinfo.h
+++ b/xen/include/xen/bootinfo.h
@@ -10,6 +10,9 @@
   #endif

   struct boot_module {
+paddr_t start;
+size_t size;


I think size should be paddr_t (instead of size_t) to make sure it is
the right size on both 64-bit and 32-bit architectures that support
64-bit addresses.



Thanks, that explanation does make sense - ack.



I've come back to reconsider this as it doesn't seem right to me to store a
non-address value (which this will always be) in a type explicitly defined
to hold an address: addresses may have architectural alignment requirements
whereas a size value is just a number of bytes so will not. The point of a
size_t value is that size_t is defined to be large enough to hold the size
of any valid object in memory, so I think this was right as-is.


"Any object in memory" implies virtual addresses (or more generally addresses
which can be used for accessing objects). This isn't the case when considering
physical addresses - there may be far more memory in a system than can be made
accessible all in one go.


That is not my understanding of it, but I could be wrong. My
understanding based on all the debates I have read online around this
topic is that the intent in the spec is that size_t has to be able to
hold a value that represents the largest object the CPU can manipulate
with general purpose operations. From which I understand to mean as
large as the largest register a CPU instruction may use for a size
argument to a general purpose instruction. On x86_64, that is a 64bit
register, as I don't believe the SSE/AVX registers are counted even
though they are used by compiler/libc implementations to optimize some
memory operations.


I can't see how this relates to my earlier remark.


Perhaps I misunderstood what your point was then. I thought you were 
taking the position that size_t could not be used to represent the 
largest object in memory addressable by a single CPU operation.



  From what I have seen for Xen, this is currently reflected in the x86
code base, as size_t is 32bits for the early 32bit code and 64bits for
Xen proper.

That aside, another objection I have to the use of paddr_t is that it is
type abuse. Types are meant to convey context to the intended use of the
variable and enable the ability to enforce proper usage of the variable,
otherwise we might as well just use u64/uint64_t and be done. The
field's purpose is to convey a size of an object,


You use "object" here again, when in physical address space (with paging
enabled) this isn't an appropriate term.


Because that is the language used in the C spec to refer to instances in 
memory,


"Object: region of data storage in the execution environment, the 
contents of which can represent values"


ISO/IEC 9899:1999(E) - 3.14: 
https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf




With the following two interpretations of the spec for size_t to mean 
(any emphasis being mine),



"size_t is an unsigned integer type used to represent the size of any 
**object** (including arrays) in the particular implementation."


Wikipedia - size_t: https://en.wikipedia.org/wiki/C_data_types#stddef.h


"size_t can store the maximum size of a theoretically possible 
**object** of any type (including array)."


CPP Ref - size_t: (https://en.cppreference.com/w/c/types/size_t)



and labeling it a type
that is intended for physical address objects violates both intents
behind declaring a type, it asserts an invalid context and enables
violations of type checking.


It is type abuse to a certain extent, yes, but what do you do? We could
invent psize_t, but that would (afaics) always match paddr_t. uint64_t
otoh may be too larger for 32-bit platforms which only know a 32-bit
wide physical address space.


Why invent a new type? 

[xen-unstable-smoke test] 182034: tolerable all pass - PUSHED

2023-07-27 Thread osstest service owner
flight 182034 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/182034/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  baa6ea7003868d1a339d06b17fd32d41b851d571
baseline version:
 xen  f126d7eeba33a1de04f6d9f6f64855637d4eadb9

Last test of basis   182028  2023-07-26 23:00:27 Z0 days
Testing same since   182034  2023-07-27 08:02:11 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Juergen Gross 
  Peter Hoyes 
  Rahul Singh 
  Roger Pau Monné 
  Shawn Anastasio 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   f126d7eeba..baa6ea7003  baa6ea7003868d1a339d06b17fd32d41b851d571 -> smoke



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-27 Thread Roger Pau Monné
On Thu, Jul 27, 2023 at 12:56:54AM +, Volodymyr Babchuk wrote:
> Hi Roger,
> 
> Roger Pau Monné  writes:
> 
> > On Wed, Jul 26, 2023 at 01:17:58AM +, Volodymyr Babchuk wrote:
> >> 
> >> Hi Roger,
> >> 
> >> Roger Pau Monné  writes:
> >> 
> >> > On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
> >> >> From: Oleksandr Andrushchenko 
> >> >> @@ -498,6 +537,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
> >> >> unsigned int size,
> >> >>  ASSERT(data_offset < size);
> >> >>  }
> >> >>  spin_unlock(&pdev->vpci->lock);
> >> >> +unlock_locks(d);
> >> >
> >> > There's one issue here, some handlers will call pcidevs_lock(), which
> >> > will result in a lock order inversion, as in the previous patch we
> >> > agreed that the locking order was pcidevs_lock first, d->pci_lock
> >> > after.
> >> >
> >> > For example the MSI control_write() handler will call
> >> > vpci_msi_arch_enable() which takes the pcidevs lock.  I think I will
> >> > have to look into using a dedicated lock for MSI related handling, as
> >> > that's the only place where I think we have this pattern of taking the
> >> > pcidevs_lock after the d->pci_lock.
> >> 
> >> I'll mention this in the commit message. Is there something else that I
> >> should do right now?
> >
> > Well, I don't think we want to commit this as-is with a known lock
> > inversion.
> >
> > The functions that require the pcidevs lock are:
> >
> > pt_irq_{create,destroy}_bind()
> > unmap_domain_pirq()
> >
> > AFAICT those functions require the lock in order to assert that the
> > underlying device doesn't go away, as they do also use d->event_lock
> > in order to get exclusive access to the data fields.  Please double
> > check that I'm not mistaken.
> 
> You are right, all three functions do not access any PCI state
> directly. However...
> 
> > If that's accurate you will have to check the call tree that spawns
> > from those functions in order to modify the asserts to check for
> > either the pcidevs or the per-domain pci_list lock being taken.
> 
> ... I checked calls for the PT_IRQ_TYPE_MSI case; there is only one call that
> bothers me: hvm_pi_update_irte(), which calls IO-MMU code via
> vmx_pi_update_irte():
> 
> amd_iommu_msi_msg_update_ire() or msi_msg_write_remap_rte().

That path is only for VT-d, so strictly speaking you only need to worry
about msi_msg_write_remap_rte().

msi_msg_write_remap_rte() does take the IOMMU intremap lock.

There are also existing callers of iommu_update_ire_from_msi() that
call the functions without the pcidevs locked.  See
hpet_msi_set_affinity() for example.

Thanks, Roger.



Re: [PATCH v3 4/4] x86/iommu: pass full IO-APIC RTE for remapping table update

2023-07-27 Thread Jan Beulich
On 26.07.2023 14:55, Roger Pau Monne wrote:
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -92,7 +92,7 @@ int cf_check intel_iommu_get_reserved_device_memory(
>  unsigned int cf_check io_apic_read_remap_rte(
>  unsigned int apic, unsigned int reg);
>  void cf_check io_apic_write_remap_rte(
> -unsigned int apic, unsigned int reg, unsigned int value);
> +unsigned int apic, unsigned int ioapic_pin, uint64_t rte);

Forgot to rename the middle parameter to "pin"?

> @@ -364,48 +363,40 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu 
> *iommu,
>  
>  new_ire = *iremap_entry;
>  
> -if ( rte_upper )
> -{
> -if ( x2apic_enabled )
> -new_ire.remap.dst = value;
> -else
> -new_ire.remap.dst = (value >> 24) << 8;
> -}
> +if ( x2apic_enabled )
> +new_ire.remap.dst = new_rte.dest.dest32;
>  else
> -{
> -*(((u32 *)&new_rte) + 0) = value;
> -new_ire.remap.fpd = 0;
> -new_ire.remap.dm = new_rte.dest_mode;
> -new_ire.remap.tm = new_rte.trigger;
> -new_ire.remap.dlm = new_rte.delivery_mode;
> -/* Hardware require RH = 1 for LPR delivery mode */
> -new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> -new_ire.remap.avail = 0;
> -new_ire.remap.res_1 = 0;
> -new_ire.remap.vector = new_rte.vector;
> -new_ire.remap.res_2 = 0;
> -
> -set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
> -new_ire.remap.res_3 = 0;
> -new_ire.remap.res_4 = 0;
> -new_ire.remap.p = 1; /* finally, set present bit */
> -
> -/* now construct new ioapic rte entry */
> -remap_rte->vector = new_rte.vector;
> -remap_rte->delivery_mode = 0;/* has to be 0 for remap format */
> -remap_rte->index_15 = (index >> 15) & 0x1;
> -remap_rte->index_0_14 = index & 0x7fff;
> -
> -remap_rte->delivery_status = new_rte.delivery_status;
> -remap_rte->polarity = new_rte.polarity;
> -remap_rte->irr = new_rte.irr;
> -remap_rte->trigger = new_rte.trigger;
> -remap_rte->mask = new_rte.mask;
> -remap_rte->reserved = 0;
> -remap_rte->format = 1;/* indicate remap format */
> -}
> -
> -update_irte(iommu, iremap_entry, &new_ire, !init);
> +new_ire.remap.dst = (new_rte.dest.dest32 >> 24) << 8;

I realize this was this way before, but I wonder if we couldn't use
GET_xAPIC_ID() here to reduce the open-coding at least a bit.

> @@ -439,36 +430,45 @@ unsigned int cf_check io_apic_read_remap_rte(
>  }
>  
>  void cf_check io_apic_write_remap_rte(
> -unsigned int apic, unsigned int reg, unsigned int value)
> +unsigned int apic, unsigned int pin, uint64_t rte)
>  {
> -unsigned int pin = (reg - 0x10) / 2;
> +struct IO_xAPIC_route_entry new_rte = { .raw = rte };
>  struct IO_xAPIC_route_entry old_rte = { };
> -struct IO_APIC_route_remap_entry *remap_rte;
> -unsigned int rte_upper = (reg & 1) ? 1 : 0;
>  struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
> -int saved_mask;
> -
> -old_rte = __ioapic_read_entry(apic, pin, true);
> +bool masked = true;
> +int rc;
>  
> -remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
> -
> -/* mask the interrupt while we change the intremap table */
> -saved_mask = remap_rte->mask;
> -remap_rte->mask = 1;
> -__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
> -remap_rte->mask = saved_mask;
> -
> -if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
> -   &old_rte, rte_upper, value) )
> +if ( !cpu_has_cx16 )
>  {
> -__io_apic_write(apic, reg, value);
> +   /*
> +* Cannot atomically update the IRTE entry: mask the IO-APIC pin to
> +* avoid interrupts seeing an inconsistent IRTE entry.
> +*/
> +old_rte = __ioapic_read_entry(apic, pin, true);
> +if ( !old_rte.mask )
> +{
> +masked = false;
> +old_rte.mask = 1;
> +__ioapic_write_entry(apic, pin, true, old_rte);
> +}
> +}
>  
> -/* Recover the original value of 'mask' bit */
> -if ( rte_upper )
> -__io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
> +rc = ioapic_rte_to_remap_entry(iommu, apic, pin, &old_rte, new_rte);
> +if ( rc )
> +{
> +if ( !masked )
> +{
> +/* Recover the original value of 'mask' bit */
> +old_rte.mask = 0;
> +__ioapic_write_entry(apic, pin, true, old_rte);
> +}
> +dprintk(XENLOG_ERR VTDPREFIX,
> +"failed to update IRTE for IO-APIC#%u pin %u: %d\n",
> +apic, pin, rc);

Afaics you don't alter the error behavior of ioapic_rte_to_remap_entry(),
and in the sole (pre-existing) case of an error a debug log message is
already being issued. Why the addition?

Jan



[xen-unstable test] 182027: regressions - trouble: fail/pass/starved

2023-07-27 Thread osstest service owner
flight 182027 xen-unstable real [real]
flight 182035 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/182027/
http://logs.test-lab.xenproject.org/osstest/logs/182035/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 20 guest-start.2  fail in 182035 REGR. vs. 182012

Tests which are failing intermittently (not blocking):
 test-amd64-i386-freebsd10-amd64  7 xen-install  fail pass in 182035-retest
 test-amd64-i386-freebsd10-i386  7 xen-install   fail pass in 182035-retest
 test-amd64-i386-pair11 xen-install/dst_host fail pass in 182035-retest
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail pass in 
182035-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 182012
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 182012
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 182012
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 182012
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 182012
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 182012
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 182012
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 182012
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 182012
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 182012
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 182012
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 182012
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw  3 hosts-allocate   starved  n/a
 test-arm64-arm64-libvirt-xsm  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-vhd   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-credit2   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-credit1   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-thunderx  3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl   3 hosts-allocate   starved  n/a
 test-arm64-arm64-xl-xsm   3 hosts-allocate   starved  n/a

version targeted for testing:
 xen  3d2d4ea026df73c37a7df7e216443cbf652ff892
baseline version:
 xen  0b1171be87698bc7d14760383c0770aeb6e41dd4

Last test of basis   182012  2023-07-25 19:27:43 Z1 days
Testing same since   182027  2023-07-26 21:35:59 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony PERARD 
  Federico Serafini 
  Jan Beulich 
  Leo Yan 
  Nicola Vetrini 

Re: [PATCH v3 2/4] x86/ioapic: RTE modifications must use ioapic_write_entry

2023-07-27 Thread Jan Beulich
On 26.07.2023 14:55, Roger Pau Monne wrote:
> Do not allow to write to RTE registers using io_apic_write and instead
> require changes to RTE to be performed using ioapic_write_entry.
> 
> This is in preparation for passing the full contents of the RTE to the
> IOMMU interrupt remapping handlers, so remapping entries for IO-APIC
> RTEs can be updated atomically when possible.
> 
> While immediately this commit might expand the number of MMIO accesses
> in order to update an IO-APIC RTE, further changes will benefit from
> getting the full RTE value passed to the IOMMU handlers, as the logic
> is greatly simplified when the IOMMU handlers can get the complete RTE
> value in one go.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 





[PATCH v6 7/9] swiotlb: determine potential physical address limit

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

The value returned by default_swiotlb_limit() should be constant, because
it is used to decide whether DMA can be used. To allow allocating memory
pools on the fly, use the maximum possible physical address rather than the
highest address used by the default pool.

For swiotlb_init_remap(), this is either an arch-specific limit used by
memblock_alloc_low(), or the highest directly mapped physical address if
the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the
highest address is determined by the GFP flags.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c| 14 ++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 66867d2188ba..9825fa14abe4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -109,6 +109,7 @@ struct io_tlb_pool {
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
+ * @phys_limit:Maximum allowed physical address.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -123,6 +124,7 @@ struct io_tlb_mem {
bool for_alloc;
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
+   u64 phys_limit;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6e985f65b9f5..ca3aa03f37ba 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,10 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (flags & SWIOTLB_ANY)
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
+   else
+   io_tlb_default_mem.phys_limit = ARCH_LOW_ADDRESS_LIMIT;
 #endif
 
if (!default_nareas)
@@ -409,6 +413,12 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
+   else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
+   else
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
 #endif
 
if (!default_nareas)
@@ -1398,7 +1408,11 @@ phys_addr_t default_swiotlb_base(void)
  */
 phys_addr_t default_swiotlb_limit(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   return io_tlb_default_mem.phys_limit;
+#else
return io_tlb_default_mem.defpool.end - 1;
+#endif
 }
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.25.1




[PATCH v6 6/9] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Try to allocate a transient memory pool if no suitable slots can be found
and the respective SWIOTLB is allowed to grow. The transient pool is just
big enough for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, then it happens only after a RCU quiescent state.

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   6 +
 include/linux/dma-mapping.h |   2 +
 include/linux/swiotlb.h |  29 +++-
 kernel/dma/direct.c |   2 +-
 kernel/dma/swiotlb.c| 316 +++-
 5 files changed, 345 insertions(+), 10 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index d9754a68ba95..5fd89c9d005c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -626,6 +626,8 @@ struct device_physical_location {
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
+ * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
+ * @dma_io_tlb_lock:   Protects changes to the list of active pools.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -731,6 +733,10 @@ struct device {
 #endif
 #ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
+#endif
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head dma_io_tlb_pools;
+   spinlock_t dma_io_tlb_lock;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index e13050eb9777..f0ccca16a0ac 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -418,6 +418,8 @@ static inline void dma_sync_sgtable_for_device(struct 
device *dev,
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
+
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
 {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 57be2a0a9fbf..66867d2188ba 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -80,6 +80,9 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @area_nslabs: Number of slots in each area.
  * @areas: Array of memory area descriptors.
  * @slots: Array of slot descriptors.
+ * @node:  Member of the IO TLB memory pool list.
+ * @rcu:   RCU head for swiotlb_dyn_free().
+ * @transient:  %true if transient memory pool.
  */
 struct io_tlb_pool {
phys_addr_t start;
@@ -91,6 +94,11 @@ struct io_tlb_pool {
unsigned int area_nslabs;
struct io_tlb_area *areas;
struct io_tlb_slot *slots;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head node;
+   struct rcu_head rcu;
+   bool transient;
+#endif
 };
 
 /**
@@ -122,6 +130,20 @@ struct io_tlb_mem {
 #endif
 };
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr);
+
+#else
+
+static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
+   phys_addr_t paddr)
+{
   return &dev->dma_io_tlb_mem->defpool;
+}
+
+#endif
+
 /**
  * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
  * @dev:Device which has mapped the buffer.
@@ -137,7 +159,12 

Re: [XEN PATCH v4 4/4] xen/x86: address violations of MISRA C:2012 Rule 7.2

2023-07-27 Thread Jan Beulich
On 26.07.2023 13:03, Simone Ballarin wrote:
> @@ -70,15 +70,15 @@ static const uint8_t sr_mask[8] = {
>  };
>  
>  static const uint8_t gr_mask[9] = {
> -(uint8_t)~0xf0, /* 0x00 */
> -(uint8_t)~0xf0, /* 0x01 */
> -(uint8_t)~0xf0, /* 0x02 */
> -(uint8_t)~0xe0, /* 0x03 */
> -(uint8_t)~0xfc, /* 0x04 */
> -(uint8_t)~0x84, /* 0x05 */
> -(uint8_t)~0xf0, /* 0x06 */
> -(uint8_t)~0xf0, /* 0x07 */
> -(uint8_t)~0x00, /* 0x08 */
> +(uint8_t)~0xf0,
> +(uint8_t)~0xf0,
> +(uint8_t)~0xf0,
> +(uint8_t)~0xe0,
> +(uint8_t)~0xfc,
> +(uint8_t)~0x84,
> +(uint8_t)~0xf0,
> +(uint8_t)~0xf0,
> +(uint8_t)~0x00,
>  };

Hmm, this stray change is _still_ there.

> --- a/xen/arch/x86/include/asm/hvm/trace.h
> +++ b/xen/arch/x86/include/asm/hvm/trace.h
> @@ -58,7 +58,7 @@
>  #define DO_TRC_HVM_VLAPIC   DEFAULT_HVM_MISC
>  
>  
> -#define TRC_PAR_LONG(par) ((par)&0x),((par)>>32)
> +#define TRC_PAR_LONG(par) ((uint32_t)par), ((par) >> 32)

You've lost parentheses around "par".

> @@ -93,7 +93,7 @@
>  HVMTRACE_ND(evt, 0, 0)
>  
>  #define HVMTRACE_LONG_1D(evt, d1)  \
> -   HVMTRACE_2D(evt ## 64, (d1) & 0x, (d1) >> 32)
> +   HVMTRACE_2D(evt ## 64, (uint32_t)d1, (d1) >> 32)

Same for "d1" here.

Both of these are, btw., indications that you should have dropped Stefano's
R-b.

> --- a/xen/arch/x86/include/asm/msr-index.h
> +++ b/xen/arch/x86/include/asm/msr-index.h
> @@ -30,7 +30,7 @@
>  
>  #define MSR_INTEL_CORE_THREAD_COUNT 0x0035
>  #define  MSR_CTC_THREAD_MASK0x
> -#define  MSR_CTC_CORE_MASK  0x
> +#define  MSR_CTC_CORE_MASK  0xU
>  
>  #define MSR_SPEC_CTRL   0x0048
>  #define  SPEC_CTRL_IBRS (_AC(1, ULL) <<  0)
> @@ -168,7 +168,7 @@
>  #define MSR_UARCH_MISC_CTRL 0x1b01
>  #define  UARCH_CTRL_DOITM   (_AC(1, ULL) <<  0)
>  
> -#define MSR_EFER0xc080 /* Extended Feature 
> Enable Register */
> +#define MSR_EFER_AC(0xc080, U)  /* Extended 
> Feature Enable Register */

I understand you use _AC() here because the constant is used in assembly
code. But I don't understand why you use it only here, and not throughout
at least the "modern" portion of the file. I wonder what other x86
maintainers think about this.

As a minor aspect, I also don't really see why you insert a 2nd blank
ahead of the comment.

>  #define  EFER_SCE   (_AC(1, ULL) <<  0) /* SYSCALL 
> Enable */
>  #define  EFER_LME   (_AC(1, ULL) <<  8) /* Long Mode 
> Enable */
>  #define  EFER_LMA   (_AC(1, ULL) << 10) /* Long Mode 
> Active */
> @@ -181,35 +181,35 @@
>  (EFER_SCE | EFER_LME | EFER_LMA | EFER_NXE | EFER_SVME | EFER_FFXSE | \
>   EFER_AIBRSE)
>  
> -#define MSR_STAR0xc081 /* legacy mode 
> SYSCALL target */
> -#define MSR_LSTAR   0xc082 /* long mode SYSCALL 
> target */
> -#define MSR_CSTAR   0xc083 /* compat mode 
> SYSCALL target */
> -#define MSR_SYSCALL_MASK0xc084 /* EFLAGS mask for 
> syscall */
> -#define MSR_FS_BASE 0xc100 /* 64bit FS base */
> -#define MSR_GS_BASE 0xc101 /* 64bit GS base */
> -#define MSR_SHADOW_GS_BASE  0xc102 /* SwapGS GS shadow */
> -#define MSR_TSC_AUX 0xc103 /* Auxiliary TSC */
> +#define MSR_STAR0xc081U /* legacy mode 
> SYSCALL target */
> +#define MSR_LSTAR   0xc082U /* long mode SYSCALL 
> target */
> +#define MSR_CSTAR   0xc083U /* compat mode 
> SYSCALL target */
> +#define MSR_SYSCALL_MASK0xc084U /* EFLAGS mask for 
> syscall */
> +#define MSR_FS_BASE 0xc100U /* 64bit FS base */
> +#define MSR_GS_BASE 0xc101U /* 64bit GS base */
> +#define MSR_SHADOW_GS_BASE  0xc102U /* SwapGS GS shadow 
> */
> +#define MSR_TSC_AUX 0xc103U /* Auxiliary TSC */
>  
> -#define MSR_K8_SYSCFG   0xc0010010
> +#define MSR_K8_SYSCFG   0xc0010010U
>  #define  SYSCFG_MTRR_FIX_DRAM_EN(_AC(1, ULL) << 18)
>  #define  SYSCFG_MTRR_FIX_DRAM_MOD_EN(_AC(1, ULL) << 19)
>  #define  SYSCFG_MTRR_VAR_DRAM_EN(_AC(1, ULL) << 20)
>  #define  SYSCFG_MTRR_TOM2_EN(_AC(1, ULL) << 21)
>  #define  SYSCFG_TOM2_FORCE_WB   (_AC(1, ULL) << 22)
>  
> -#define MSR_K8_IORR_BASE0   0xc0010016
> -#define MSR_K8_IORR_MASK0   0xc0010017
> -#define MSR_K8_IORR_BASE1   

[PATCH v6 9/9] swiotlb: search the software IO TLB only if the device makes use of it

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Skip searching the software IO TLB if a device has never used it, making
sure these devices are not affected by the introduction of multiple IO TLB
memory pools.

An additional memory barrier is required to ensure that the new value of the
flag is visible to other CPUs after mapping a new bounce buffer. For
efficiency, the flag check should be inlined, and then the memory barrier
must be moved to is_swiotlb_buffer(). However, it can replace the existing
barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer()
first to verify that the buffer address belongs to the software IO TLB.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |  2 ++
 include/linux/swiotlb.h |  7 ++-
 kernel/dma/swiotlb.c| 14 ++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 5fd89c9d005c..6fc808d22bfd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -628,6 +628,7 @@ struct device_physical_location {
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
  * @dma_io_tlb_lock:   Protects changes to the list of active pools.
+ * @dma_uses_io_tlb: %true if device has used the software IO TLB.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -737,6 +738,7 @@ struct device {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
struct list_head dma_io_tlb_pools;
spinlock_t dma_io_tlb_lock;
+   bool dma_uses_io_tlb;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8371c92a0271..b4536626f8ff 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -172,8 +172,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
if (!mem)
return false;
 
-   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC))
+   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
+   /* Pairs with smp_wmb() in swiotlb_find_slots() and
+* swiotlb_dyn_alloc(), which modify the RCU lists.
+*/
+   smp_rmb();
return swiotlb_find_pool(dev, paddr);
+   }
return paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1560a3e484b9..1fe64573d828 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -730,7 +730,7 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
 
add_mem_pool(mem, pool);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
+   /* Pairs with smp_rmb() in is_swiotlb_buffer(). */
smp_wmb();
 }
 
@@ -764,11 +764,6 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, 
phys_addr_t paddr)
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
 
-   /* Pairs with smp_wmb() in swiotlb_find_slots() and
-* swiotlb_dyn_alloc(), which modify the RCU lists.
-*/
-   smp_rmb();
-
rcu_read_lock();
	list_for_each_entry_rcu(pool, &dev->dma_io_tlb_pools, node) {
if (paddr >= pool->start && paddr < pool->end)
@@ -813,6 +808,7 @@ void swiotlb_dev_init(struct device *dev)
 #ifdef CONFIG_SWIOTLB_DYNAMIC
	INIT_LIST_HEAD(&dev->dma_io_tlb_pools);
	spin_lock_init(&dev->dma_io_tlb_lock);
+   dev->dma_uses_io_tlb = false;
 #endif
 }
 
@@ -1157,9 +1153,11 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
	list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
	spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
-   smp_wmb();
 found:
+   dev->dma_uses_io_tlb = true;
+   /* Pairs with smp_rmb() in is_swiotlb_buffer() */
+   smp_wmb();
+
*retpool = pool;
return index;
 }
-- 
2.25.1




[PATCH v6 8/9] swiotlb: allocate a new memory pool when existing pools are full

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

When swiotlb_find_slots() cannot find suitable slots, schedule the
allocation of a new memory pool. It is not possible to allocate the pool
immediately, because this code may run in interrupt context, which is not
suitable for large memory allocations. This means that the memory pool will
be available too late for the currently requested mapping, but the stress
on the software IO TLB allocator is likely to continue, and subsequent
allocations will benefit from the additional pool eventually.

Keep all memory pools for an allocator in an RCU list to avoid locking on
the read side. For modifications, add a new spinlock to struct io_tlb_mem.

The spinlock also protects updates to the total number of slabs (nslabs in
struct io_tlb_mem), but not reads of the value. Readers may therefore
encounter a stale value, but this is not an issue:

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero
  value. This is ensured by the existence of the default memory pool,
  allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel
  messages).

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |   8 +++
 kernel/dma/swiotlb.c| 148 +---
 2 files changed, 131 insertions(+), 25 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 9825fa14abe4..8371c92a0271 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct page;
@@ -104,12 +105,16 @@ struct io_tlb_pool {
 /**
  * struct io_tlb_mem - Software IO TLB allocator
  * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @pool:  IO TLB memory pool descriptor (if not dynamic).
  * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
  * @phys_limit:Maximum allowed physical address.
+ * @lock:  Lock to synchronize changes to the list.
+ * @pools: List of IO TLB memory pool descriptors (if dynamic).
+ * @dyn_alloc: Dynamic IO TLB pool allocation work.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -125,6 +130,9 @@ struct io_tlb_mem {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
u64 phys_limit;
+   spinlock_t lock;
+   struct list_head pools;
+   struct work_struct dyn_alloc;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ca3aa03f37ba..1560a3e484b9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -79,8 +79,23 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+static void swiotlb_dyn_alloc(struct work_struct *work);
+
+static struct io_tlb_mem io_tlb_default_mem = {
+   .lock = __SPIN_LOCK_UNLOCKED(io_tlb_default_mem.lock),
+   .pools = LIST_HEAD_INIT(io_tlb_default_mem.pools),
+   .dyn_alloc = __WORK_INITIALIZER(io_tlb_default_mem.dyn_alloc,
+   swiotlb_dyn_alloc),
+};
+
+#else  /* !CONFIG_SWIOTLB_DYNAMIC */
+
 static struct io_tlb_mem io_tlb_default_mem;
 
+#endif /* CONFIG_SWIOTLB_DYNAMIC */
+
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
 
@@ -278,6 +293,23 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool 
*mem, phys_addr_t start,
return;
 }
 
+/**
+ * add_mem_pool() - add a memory pool to the allocator
+ * @mem:   Software IO TLB allocator.
+ * @pool:  Memory pool to be added.
+ */
+static void add_mem_pool(struct io_tlb_mem *mem, struct io_tlb_pool *pool)
+{
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   spin_lock(&mem->lock);
+   list_add_rcu(&pool->node, &mem->pools);
+   mem->nslabs += pool->nslabs;
+   spin_unlock(&mem->lock);
+#else
+   mem->nslabs = pool->nslabs;
+#endif
+}
+
 static void __init *swiotlb_memblock_alloc(unsigned long nslabs,
unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs))
@@ -375,7 +407,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 
swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false,
 default_nareas);
-   io_tlb_default_mem.nslabs = nslabs;
+   add_mem_pool(&io_tlb_default_mem, mem);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -474,7 +506,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
swiotlb_init_io_tlb_pool(mem, 

[PATCH v6 5/9] swiotlb: add a flag whether SWIOTLB is allowed to grow

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Add a config option (CONFIG_SWIOTLB_DYNAMIC) to enable or disable dynamic
allocation of additional bounce buffers.

If this option is set, mark the default SWIOTLB as able to grow and
restricted DMA pools as unable.

However, if the address of the default memory pool is explicitly queried,
make the default SWIOTLB also unable to grow. This is currently used to set
up PCI BAR movable regions on some Octeon MIPS boards which may not be able
to use a SWIOTLB pool elsewhere in physical memory. See octeon_pci_setup()
for more details.

If a remap function is specified, it must be also called on any dynamically
allocated pools, but there are some issues:

- The remap function may block, so it should not be called from an atomic
  context.
- There is no corresponding unremap() function if the memory pool is
  freed.
- The only in-tree implementation (xen_swiotlb_fixup) requires that the
  number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE.

Keep it simple for now and disable growing the SWIOTLB if a remap function
was specified.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/Kconfig  | 13 +
 kernel/dma/swiotlb.c| 13 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 247f0ab8795a..57be2a0a9fbf 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -100,6 +100,7 @@ struct io_tlb_pool {
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @can_grow:  %true if more pools can be allocated dynamically.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -112,6 +113,9 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   bool can_grow;
+#endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 11d077003205..68c61fdf2b44 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -90,6 +90,19 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config SWIOTLB_DYNAMIC
+   bool "Dynamic allocation of DMA bounce buffers"
+   default n
+   depends on SWIOTLB
+   help
+ This enables dynamic resizing of the software IO TLB. The kernel
+ starts with one memory pool at boot and it will allocate additional
+ pools as needed. To reduce run-time kernel memory requirements, you
+ may have to specify a smaller size of the initial pool using
+ "swiotlb=" on the kernel command line.
+
+ If unsure, say N.
+
 config DMA_BOUNCE_UNALIGNED_KMALLOC
bool
depends on SWIOTLB
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 616113760b60..346857581b75 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -330,6 +330,11 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -400,6 +405,11 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -1074,6 +1084,9 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
  */
 phys_addr_t default_swiotlb_base(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   io_tlb_default_mem.can_grow = false;
+#endif
return io_tlb_default_mem.defpool.start;
 }
 
-- 
2.25.1




[PATCH v6 4/9] swiotlb: separate memory pool data from other allocator data

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Carve out memory pool specific fields from struct io_tlb_mem. The original
struct now contains shared data for the whole allocator, while the new
struct io_tlb_pool contains data that is specific to one memory pool of
(potentially) many.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   2 +-
 include/linux/swiotlb.h |  45 +++
 kernel/dma/swiotlb.c| 175 +---
 3 files changed, 140 insertions(+), 82 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index bbaeabd04b0d..d9754a68ba95 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -625,7 +625,7 @@ struct device_physical_location {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
- * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
+ * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 31625ae507ea..247f0ab8795a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 
 /**
- * struct io_tlb_mem - IO TLB Memory Pool Descriptor
- *
+ * struct io_tlb_pool - IO TLB memory pool descriptor
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -73,15 +72,34 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
  * may be remapped in the memory encrypted case and store virtual
  * address for bounce buffer operation.
- * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. For default swiotlb, this is command line adjustable via
- * setup_io_tlb_npages.
+ * @nslabs:The number of IO TLB slots between @start and @end. For the
+ * default swiotlb, this can be adjusted with a boot parameter,
+ * see setup_io_tlb_npages().
+ * @late_alloc:%true if allocated using the page allocator.
+ * @nareas:Number of areas in the pool.
+ * @area_nslabs: Number of slots in each area.
+ * @areas: Array of memory area descriptors.
+ * @slots: Array of slot descriptors.
+ */
+struct io_tlb_pool {
+   phys_addr_t start;
+   phys_addr_t end;
+   void *vaddr;
+   unsigned long nslabs;
+   bool late_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
+   struct io_tlb_slot *slots;
+};
+
+/**
+ * struct io_tlb_mem - Software IO TLB allocator
+ * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
- * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @nareas:  The area number in the pool.
- * @area_nslabs: The slot number in the area.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -89,18 +107,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
phys,
  * in debugfs.
  */
 struct io_tlb_mem {
-   phys_addr_t start;
-   phys_addr_t end;
-   void *vaddr;
+   struct io_tlb_pool defpool;
unsigned long nslabs;
struct dentry *debugfs;
-   bool late_alloc;
bool force_bounce;
bool for_alloc;
-   unsigned int nareas;
-   unsigned int area_nslabs;
-   struct io_tlb_area *areas;
-   struct io_tlb_slot *slots;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
@@ -122,7 +133,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
-   return mem && paddr >= mem->start && paddr < mem->end;
+   return mem && paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
 static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ef5d5e41a17f..616113760b60 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -209,7 +209,7 @@ void __init swiotlb_adjust_size(unsigned long size)
 
 void swiotlb_print_info(void)
 {
-   struct io_tlb_mem *mem = &io_tlb_default_mem;
+   struct io_tlb_pool *mem = 

[PATCH v6 3/9] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Add some kernel-doc comments and move the existing documentation of struct
io_tlb_slot to its correct location. The latter was forgotten in commit
942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").

Use the opportunity to give swiotlb_do_find_slots() a more descriptive name
and make it clear how it differs from swiotlb_find_slots().

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h | 15 +++---
 kernel/dma/swiotlb.c| 61 +
 2 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2d453b3e7771..31625ae507ea 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -76,10 +76,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
- * @list:  The free list describing the number of free entries available
- * from each index.
- * @orig_addr: The original address corresponding to a mapped entry.
- * @alloc_size:Size of the allocated buffer.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -111,6 +107,17 @@ struct io_tlb_mem {
 #endif
 };
 
+/**
+ * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
+ * @dev:Device which has mapped the buffer.
+ * @paddr:  Physical address within the DMA buffer.
+ *
+ * Check if @paddr points into a bounce buffer.
+ *
+ * Return:
+ * * %true if @paddr points into a bounce buffer
+ * * %false otherwise
+ */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0b173303e088..ef5d5e41a17f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,13 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+/**
+ * struct io_tlb_slot - IO TLB slot descriptor
+ * @orig_addr: The original address corresponding to a mapped entry.
+ * @alloc_size:Size of the allocated buffer.
+ * @list:  The free list describing the number of free entries available
+ * from each index.
+ */
 struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -635,11 +642,22 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int 
nslots)
 }
 #endif /* CONFIG_DEBUG_FS */
 
-/*
- * Find a suitable number of IO TLB entries size that will fit this request and
- * allocate a buffer from that IO TLB pool.
+/**
+ * swiotlb_area_find_slots() - search for slots in one IO TLB memory area
+ * @dev:   Device which maps the buffer.
+ * @area_index:Index of the IO TLB memory area to be searched.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Find a suitable sequence of IO TLB entries for the request and allocate
+ * a buffer from the given IO TLB memory area.
+ * This function takes care of locking.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
  */
-static int swiotlb_do_find_slots(struct device *dev, int area_index,
+static int swiotlb_area_find_slots(struct device *dev, int area_index,
phys_addr_t orig_addr, size_t alloc_size,
unsigned int alloc_align_mask)
 {
@@ -734,6 +752,19 @@ static int swiotlb_do_find_slots(struct device *dev, int 
area_index,
return slot_index;
 }
 
+/**
+ * swiotlb_find_slots() - search for slots in the whole swiotlb
+ * @dev:   Device which maps the buffer.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Search through the whole software IO TLB to find a sequence of slots that
+ * match the allocation constraints.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
+ */
 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size, unsigned int alloc_align_mask)
 {
@@ -742,8 +773,8 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
int i = start, index;
 
do {
-   index = swiotlb_do_find_slots(dev, i, orig_addr, alloc_size,
- alloc_align_mask);
+   index = swiotlb_area_find_slots(dev, i, orig_addr, alloc_size,
+   alloc_align_mask);
if (index >= 0)

[PATCH v6 2/9] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

SWIOTLB implementation details should not be exposed to the rest of the
kernel. This makes it possible to change the implementation without
modifying non-swiotlb code.

To avoid breaking existing users, provide helper functions for the few
required fields.

As a bonus, using a helper function to initialize struct device makes it
possible to get rid of an #ifdef in driver core.

Signed-off-by: Petr Tesarik 
---
 arch/mips/pci/pci-octeon.c |  2 +-
 drivers/base/core.c|  4 +---
 drivers/xen/swiotlb-xen.c  |  2 +-
 include/linux/swiotlb.h| 25 +++-
 kernel/dma/swiotlb.c   | 39 +-
 mm/slab_common.c   |  5 ++---
 6 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
index e457a18cbdc5..d19d9d456309 100644
--- a/arch/mips/pci/pci-octeon.c
+++ b/arch/mips/pci/pci-octeon.c
@@ -664,7 +664,7 @@ static int __init octeon_pci_setup(void)
 
/* BAR1 movable regions contiguous to cover the swiotlb */
octeon_bar1_pci_phys =
-   io_tlb_default_mem.start & ~((1ull << 22) - 1);
+   default_swiotlb_base() & ~((1ull << 22) - 1);
 
for (index = 0; index < 32; index++) {
union cvmx_pci_bar1_indexx bar1_index;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3dff5037943e..46d1d78c5beb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3108,9 +3108,7 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = &io_tlb_default_mem;
-#endif
+   swiotlb_dev_init(dev);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..946bd56f0ac5 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -381,7 +381,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct 
scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, default_swiotlb_limit()) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4e52cd5e0bdc..2d453b3e7771 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -110,7 +110,6 @@ struct io_tlb_mem {
atomic_long_t used_hiwater;
 #endif
 };
-extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
@@ -128,13 +127,22 @@ static inline bool is_swiotlb_force_bounce(struct device 
*dev)
 
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
+void swiotlb_dev_init(struct device *dev);
 size_t swiotlb_max_mapping_size(struct device *dev);
+bool is_swiotlb_allocated(void);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+phys_addr_t default_swiotlb_base(void);
+phys_addr_t default_swiotlb_limit(void);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
 }
+
+static inline void swiotlb_dev_init(struct device *dev)
+{
+}
+
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
@@ -151,6 +159,11 @@ static inline size_t swiotlb_max_mapping_size(struct 
device *dev)
return SIZE_MAX;
 }
 
+static inline bool is_swiotlb_allocated(void)
+{
+   return false;
+}
+
 static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
@@ -159,6 +172,16 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline phys_addr_t default_swiotlb_base(void)
+{
+   return 0;
+}
+
+static inline phys_addr_t default_swiotlb_limit(void)
+{
+   return 0;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 66fc8ec9ae45..0b173303e088 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -71,7 +71,7 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
-struct io_tlb_mem io_tlb_default_mem;
+static struct io_tlb_mem io_tlb_default_mem;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
@@ -489,6 +489,15 @@ void __init swiotlb_exit(void)
memset(mem, 0, sizeof(*mem));
 }
 
+/**
+ * swiotlb_dev_init() - initialize swiotlb fields in  device
+ * @dev:   Device to be initialized.
+ */
+void swiotlb_dev_init(struct device *dev)
+{
+   dev->dma_io_tlb_mem = &io_tlb_default_mem;
+}
+
 /*
  * Return the offset into a iotlb slot 

[PATCH v6 1/9] swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

If swiotlb is allocated, immediately return 0, so callers do not have to
check io_tlb_default_mem.nslabs explicitly.

Signed-off-by: Petr Tesarik 
---
 arch/arm/xen/mm.c | 10 --
 arch/x86/kernel/pci-dma.c | 12 ++--
 kernel/dma/swiotlb.c  |  3 +++
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 3d826c0b5fee..882cd70c7a2f 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -125,12 +125,10 @@ static int __init xen_mm_init(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   rc = swiotlb_init_late(swiotlb_size_or_default(),
-  xen_swiotlb_gfp(), NULL);
-   if (rc < 0)
-   return rc;
-   }
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  xen_swiotlb_gfp(), NULL);
+   if (rc < 0)
+   return rc;
 
cflush.op = 0;
cflush.a.dev_bus_addr = 0;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de6be0a3965e..08988b0a1c91 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -86,16 +86,16 @@ static void __init pci_xen_swiotlb_init(void)
 
 int pci_xen_swiotlb_init_late(void)
 {
+   int rc;
+
	if (dma_ops == &xen_swiotlb_dma_ops)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   int rc = swiotlb_init_late(swiotlb_size_or_default(),
-  GFP_KERNEL, xen_swiotlb_fixup);
-   if (rc < 0)
-   return rc;
-   }
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  GFP_KERNEL, xen_swiotlb_fixup);
+   if (rc < 0)
+   return rc;
 
/* XXX: this switches the dma ops under live devices! */
	dma_ops = &xen_swiotlb_dma_ops;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2b83e3ad9dca..66fc8ec9ae45 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -384,6 +384,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
bool retried = false;
int rc = 0;
 
+   if (io_tlb_default_mem.nslabs)
+   return 0;
+
if (swiotlb_force_disable)
return 0;
 
-- 
2.25.1




[PATCH v6 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Motivation
==

The software IO TLB was designed with these assumptions:

1) It would not be used much. Small systems (little RAM) don't need it, and
   big systems (lots of RAM) would have modern DMA controllers and an IOMMU
   chip to handle legacy devices.
2) A small fixed memory area (64 MiB by default) is sufficient to
   handle the few cases which require a bounce buffer.
3) 64 MiB is little enough that it has no impact on the rest of the
   system.
4) Bounce buffers require large contiguous chunks of low memory. Such
   memory is precious and can be allocated only early at boot.

It turns out they are not always true:

1) Embedded systems may have more than 4GiB RAM but no IOMMU and legacy
   32-bit peripheral busses and/or DMA controllers.
2) CoCo VMs use bounce buffers for all I/O but may need substantially more
   than 64 MiB.
3) Embedded developers put as many features as possible into the available
   memory. A few dozen "missing" megabytes may limit what features can be
   implemented.
4) If CMA is available, it can allocate large contiguous chunks even after
   the system has run for some time.

Goals
=

The goal of this work is to start with a small software IO TLB at boot and
expand it later when/if needed.

Design
==

This version of the patch series retains the current slot allocation
algorithm with multiple areas to reduce lock contention, but additional
slots can be added when necessary.

These alternatives have been considered:

- Allocate and free buffers as needed using direct DMA API. This works
  quite well, except in CoCo VMs where each allocation/free requires
  decrypting/encrypting memory, which is a very expensive operation.

- Allocate a very large software IO TLB at boot, but allow to migrate pages
  to/from it (like CMA does). For systems with CMA, this would mean two big
  allocations at boot. Finding the balance between CMA, SWIOTLB and the rest
  of available RAM can be challenging. More importantly, there is no clear
  available RAM can be challenging. More importantly, there is no clear
  benefit compared to allocating SWIOTLB memory pools from the CMA.

Implementation Constraints
==

These constraints have been taken into account:

1) Minimize impact on devices which do not benefit from the change.
2) Minimize the number of memory decryption/encryption operations.
3) Avoid contention on a lock or atomic variable to preserve parallel
   scalability.

Additionally, the software IO TLB code is also used to implement restricted
DMA pools. These pools are restricted to a pre-defined physical memory
region and must not use any other memory. In other words, dynamic
allocation of memory pools must be disabled for restricted DMA pools.

Data Structures
===

The existing struct io_tlb_mem is the central type for a SWIOTLB allocator,
but it now contains multiple memory pools::

  io_tlb_mem
  +-+   io_tlb_pool
  | SWIOTLB |   +---+   +---+   +---+
  |allocator|-->|default|-->|dynamic|-->|dynamic|-->...
  | |   |memory |   |memory |   |memory |
  +-+   | pool  |   | pool  |   | pool  |
+---+   +---+   +---+

The allocator structure contains global state (such as flags and counters)
and structures needed to schedule new allocations. Each memory pool
contains the actual buffer slots and metadata. The first memory pool in the
list is the default memory pool allocated statically at early boot.

New memory pools are allocated from a kernel worker thread. That's because
bounce buffers are allocated when mapping a DMA buffer, which may happen in
interrupt context where large atomic allocations would probably fail.
Allocation from process context is much more likely to succeed, especially
if it can use CMA.

Nonetheless, the onset of a load spike may fill up the SWIOTLB before the
worker has a chance to run. In that case, try to allocate a small transient
memory pool to accommodate the request. If memory is encrypted and the
device cannot do DMA to encrypted memory, this buffer is allocated from the
coherent atomic DMA memory pool. Reducing the size of SWIOTLB may therefore
require increasing the size of the coherent pool with the "coherent_pool"
command-line parameter.

Performance
===

All testing compared a vanilla v6.4-rc6 kernel with a fully patched
kernel. The kernel was booted with "swiotlb=force" to allow stress-testing
the software IO TLB on a high-performance device that would otherwise not
need it. CONFIG_DEBUG_FS was set to 'y' to match the configuration of
popular distribution kernels; it is understood that parallel workloads
suffer from contention on the recently added debugfs atomic counters.

These benchmarks were run:

- small: single-threaded I/O of 4 KiB blocks,
- big: single-threaded I/O of 64 KiB blocks,
- 4way: 4-way parallel I/O of 4 KiB blocks.

In all tested cases, the default 64 MiB SWIOTLB would be sufficient (but
wasteful). The "default" pair of columns shows performance impact when
booted 

Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Jan Beulich
On 27.07.2023 13:46, Daniel P. Smith wrote:
> 
> 
> On 7/21/23 02:14, Jan Beulich wrote:
>> On 21.07.2023 00:12, Christopher Clark wrote:
>>> On Thu, Jul 13, 2023 at 11:51 PM Christopher Clark <
>>> christopher.w.cl...@gmail.com> wrote:
>>>


 On Sat, Jul 8, 2023 at 11:47 AM Stefano Stabellini 
 wrote:

> On Sat, 1 Jul 2023, Christopher Clark wrote:
>> To convert the x86 boot logic from multiboot to boot module structures,
>> change the bootstrap map function to accept a boot module parameter.
>>
>> To allow incremental change from multiboot to boot modules across all
>> x86 setup logic, provide a temporary inline wrapper that still accepts a
>> multiboot module parameter and use it where necessary. The wrapper is
>> placed in a new arch/x86 header  to avoid putting a static
>> inline function into an existing header that has no such functions
>> already. This new header will be expanded with additional functions in
>> subsequent patches in this series.
>>
>> No functional change intended.
>>
>> Signed-off-by: Christopher Clark 
>> Signed-off-by: Daniel P. Smith 
>>
>
> [...]
>
>> diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
>> index b72ae31a66..eb93cc3439 100644
>> --- a/xen/include/xen/bootinfo.h
>> +++ b/xen/include/xen/bootinfo.h
>> @@ -10,6 +10,9 @@
>>   #endif
>>
>>   struct boot_module {
>> +paddr_t start;
>> +size_t size;
>
> I think size should be paddr_t (instead of size_t) to make sure it is
> the right size on both 64-bit and 32-bit architectures that support
> 64-bit addresses.
>

 Thanks, that explanation does make sense - ack.

>>>
>>> I've come back to reconsider this as it doesn't seem right to me to store a
>>> non-address value (which this will always be) in a type explicitly defined
>>> to hold an address: addresses may have architectural alignment requirements
>>> whereas a size value is just a number of bytes so will not. The point of a
>>> size_t value is that size_t is defined to be large enough to hold the size
>>> of any valid object in memory, so I think this was right as-is.
>>
>> "Any object in memory" implies virtual addresses (or more generally addresses
>> which can be used for accessing objects). This isn't the case when 
>> considering
>> physical addresses - there may be far more memory in a system than can be 
>> made
>> accessible all in one go.
> 
> That is not my understanding of it, but I could be wrong. My 
> understanding based on all the debates I have read online around this 
> topic is that the intent in the spec is that size_t has to be able to 
> hold a value that represents the largest object the CPU can manipulate 
> with general purpose operations. From which I understand to mean as 
> large as the largest register a CPU instruction may use for a size 
> argument to a general purpose instruction. On x86_64, that is a 64bit 
> register, as I don't believe the SSE/AVX registers are counted even 
> though the are used by compiler/libc implementations to optimize some 
> memory operations.

I can't see how this relates to my earlier remark.

>  From what I have seen for Xen, this is currently reflected in the x86 
> code base, as size_t is 32bits for the early 32bit code and 64bits for 
> Xen proper.
> 
> That aside, another objection I have to the use of paddr_t is that it is 
> type abuse. Types are meant to convey context to the intended use of the 
> variable and enable the ability to enforce proper usage of the variable, 
> otherwise we might as well just use u64/uint64_t and be done. The 
> field's purpose is to convey the size of an object,

You use "object" here again, when in physical address space (with paging
enabled) this isn't an appropriate term.

> and labeling it a type 
> that is intended for physical address objects violates both intents 
> behind declaring a type, it asserts an invalid context and enables 
> violations of type checking.

It is type abuse to a certain extent, yes, but what do you do? We could
invent psize_t, but that would (afaics) always match paddr_t. uint64_t
otoh may be too large for 32-bit platforms which only know a 32-bit
wide physical address space.

Jan



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Daniel P. Smith




On 7/21/23 18:08, Stefano Stabellini wrote:

On Fri, 21 Jul 2023, Jan Beulich wrote:

On 21.07.2023 00:12, Christopher Clark wrote:

On Thu, Jul 13, 2023 at 11:51 PM Christopher Clark <
christopher.w.cl...@gmail.com> wrote:




On Sat, Jul 8, 2023 at 11:47 AM Stefano Stabellini 
wrote:


On Sat, 1 Jul 2023, Christopher Clark wrote:

To convert the x86 boot logic from multiboot to boot module structures,
change the bootstrap map function to accept a boot module parameter.

To allow incremental change from multiboot to boot modules across all
x86 setup logic, provide a temporary inline wrapper that still accepts a
multiboot module parameter and use it where necessary. The wrapper is
placed in a new arch/x86 header  to avoid putting a static
inline function into an existing header that has no such functions
already. This new header will be expanded with additional functions in
subsequent patches in this series.

No functional change intended.

Signed-off-by: Christopher Clark 
Signed-off-by: Daniel P. Smith 



[...]


diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
index b72ae31a66..eb93cc3439 100644
--- a/xen/include/xen/bootinfo.h
+++ b/xen/include/xen/bootinfo.h
@@ -10,6 +10,9 @@
  #endif

  struct boot_module {
+    paddr_t start;
+    size_t size;


I think size should be paddr_t (instead of size_t) to make sure it is
the right size on both 64-bit and 32-bit architectures that support
64-bit addresses.



Thanks, that explanation does make sense - ack.



I've come back to reconsider this as it doesn't seem right to me to store a
non-address value (which this will always be) in a type explicitly defined
to hold an address: addresses may have architectural alignment requirements
whereas a size value is just a number of bytes so will not. The point of a
size_t value is that size_t is defined to be large enough to hold the size
of any valid object in memory, so I think this was right as-is.


"Any object in memory" implies virtual addresses (or more generally addresses
which can be used for accessing objects). This isn't the case when considering
physical addresses - there may be far more memory in a system than can be made
accessible all in one go.


Right. And I think size_t is defined as 32-bit in Xen which is a
problem.


In the x86 32-bit early boot code it is 32 bits, but in Xen proper it is 
64 bits. That is why the 32-bit HL code used a second set of structures, 
with macros to ensure that fields whose types are not 64 bits in 32-bit 
mode were given 64-bit values.


v/r,
dps



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-27 Thread Daniel P. Smith




On 7/21/23 02:14, Jan Beulich wrote:

On 21.07.2023 00:12, Christopher Clark wrote:

On Thu, Jul 13, 2023 at 11:51 PM Christopher Clark <
christopher.w.cl...@gmail.com> wrote:




On Sat, Jul 8, 2023 at 11:47 AM Stefano Stabellini 
wrote:


On Sat, 1 Jul 2023, Christopher Clark wrote:

To convert the x86 boot logic from multiboot to boot module structures,
change the bootstrap map function to accept a boot module parameter.

To allow incremental change from multiboot to boot modules across all
x86 setup logic, provide a temporary inline wrapper that still accepts a
multiboot module parameter and use it where necessary. The wrapper is
placed in a new arch/x86 header  to avoid putting a static
inline function into an existing header that has no such functions
already. This new header will be expanded with additional functions in
subsequent patches in this series.

No functional change intended.

Signed-off-by: Christopher Clark 
Signed-off-by: Daniel P. Smith 



[...]


diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
index b72ae31a66..eb93cc3439 100644
--- a/xen/include/xen/bootinfo.h
+++ b/xen/include/xen/bootinfo.h
@@ -10,6 +10,9 @@
  #endif

  struct boot_module {
+    paddr_t start;
+    size_t size;


I think size should be paddr_t (instead of size_t) to make sure it is
the right size on both 64-bit and 32-bit architectures that support
64-bit addresses.



Thanks, that explanation does make sense - ack.



I've come back to reconsider this as it doesn't seem right to me to store a
non-address value (which this will always be) in a type explicitly defined
to hold an address: addresses may have architectural alignment requirements
whereas a size value is just a number of bytes so will not. The point of a
size_t value is that size_t is defined to be large enough to hold the size
of any valid object in memory, so I think this was right as-is.


"Any object in memory" implies virtual addresses (or more generally addresses
which can be used for accessing objects). This isn't the case when considering
physical addresses - there may be far more memory in a system than can be made
accessible all in one go.


That is not my understanding of it, but I could be wrong. My 
understanding based on all the debates I have read online around this 
topic is that the intent in the spec is that size_t has to be able to 
hold a value that represents the largest object the CPU can manipulate 
with general purpose operations. From which I understand to mean as 
large as the largest register a CPU instruction may use for a size 
argument to a general purpose instruction. On x86_64, that is a 64bit 
register, as I don't believe the SSE/AVX registers are counted even 
though they are used by compiler/libc implementations to optimize some 
memory operations.


From what I have seen for Xen, this is currently reflected in the x86 
code base, as size_t is 32bits for the early 32bit code and 64bits for 
Xen proper.


That aside, another objection I have to the use of paddr_t is that it is 
type abuse. Types are meant to convey context about the intended use of 
a variable and to enable enforcement of its proper usage; otherwise we 
might as well just use u64/uint64_t and be done. The field's purpose is 
to convey the size of an object, and labelling it with a type intended 
for physical addresses violates both intents behind declaring a type: it 
asserts an invalid context and it enables violations of type checking.


V/r,
Daniel P. Smith
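
The type-checking point above can be illustrated with a common C idiom
(a sketch only; the wrapped types below are invented for illustration,
Xen's real paddr_t is a plain scalar): wrapping each scalar in a
single-member struct makes accidentally passing a size where an address
is expected a compile error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapped types; distinct struct types are incompatible
 * even though both hold a uint64_t, so the compiler enforces context. */
typedef struct { uint64_t v; } wrapped_paddr;
typedef struct { uint64_t v; } wrapped_psize;

/* Address + size yields an address; passing two sizes, or two
 * addresses, would now fail to compile - exactly the enforcement the
 * thread argues types exist to provide. */
static inline wrapped_paddr paddr_add(wrapped_paddr a, wrapped_psize s)
{
    return (wrapped_paddr){ a.v + s.v };
}
```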



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-27 Thread Jan Beulich
On 27.07.2023 12:31, Volodymyr Babchuk wrote:
> 
> Hi Jan
> 
> Jan Beulich  writes:
> 
>> On 27.07.2023 02:56, Volodymyr Babchuk wrote:
>>> Hi Roger,
>>>
>>> Roger Pau Monné  writes:
>>>
 On Wed, Jul 26, 2023 at 01:17:58AM +, Volodymyr Babchuk wrote:
>
> Hi Roger,
>
> Roger Pau Monné  writes:
>
>> On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
>>> From: Oleksandr Andrushchenko 
>>> @@ -498,6 +537,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
>>> unsigned int size,
>>>  ASSERT(data_offset < size);
>>>  }
>>>  spin_unlock(&pdev->vpci->lock);
>>> +unlock_locks(d);
>>
>> There's one issue here, some handlers will call pcidevs_lock(), which
>> will result in a lock order inversion, as in the previous patch we
>> agreed that the locking order was pcidevs_lock first, d->pci_lock
>> after.
>>
>> For example the MSI control_write() handler will call
>> vpci_msi_arch_enable() which takes the pcidevs lock.  I think I will
>> have to look into using a dedicated lock for MSI related handling, as
>> that's the only place where I think we have this pattern of taking the
>> pcidevs_lock after the d->pci_lock.
>
> I'll mention this in the commit message. Is there something else that I
> should do right now?

 Well, I don't think we want to commit this as-is with a known lock
 inversion.

 The functions that require the pcidevs lock are:

 pt_irq_{create,destroy}_bind()
 unmap_domain_pirq()

 AFAICT those functions require the lock in order to assert that the
 underlying device doesn't go away, as they do also use d->event_lock
 in order to get exclusive access to the data fields.  Please double
 check that I'm not mistaken.
>>>
>>> You are right, all three functions do not access any PCI state
>>> directly. However...
>>>
 If that's accurate you will have to check the call tree that spawns
 from those functions in order to modify the asserts to check for
 either the pcidevs or the per-domain pci_list lock being taken.
>>>
>>> ... I checked the calls for the PT_IRQ_TYPE_MSI case; there is only
>>> one call that bothers me: hvm_pi_update_irte(), which calls IO-MMU
>>> code via vmx_pi_update_irte():
>>>
>>> amd_iommu_msi_msg_update_ire() or msi_msg_write_remap_rte().
>>>
>>> Both functions read basic pdev fields like sbdf or type. I see no
>>> problem there, as values of those fields are not supposed to change.
>>
>> But whether fields are basic or will never change doesn't matter when
>> the pdev struct itself suddenly disappears.
> 
> This is not a problem, as it is expected that d->pci_lock is held, so
> the pdev structure will not disappear. I am trying to answer another
> question: is d->pci_lock enough, or is pcidevs_lock also required?

To answer such questions, may I ask that you first firmly write down
(and submit) what each of the locks guards?

Jan
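
The ordering rule under discussion (pcidevs lock first, d->pci_lock
after) can be modelled with a toy checker; the lock names mirror the
thread, but the tracking code is invented purely for illustration (the
real Xen locks are spinlocks/rwlocks, and a real inversion deadlocks
rather than returning an error).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two locks discussed in the thread. */
enum lock_id { PCIDEVS_LOCK, D_PCI_LOCK };

static bool held[2];

/* Enforce the agreed order: pcidevs lock before d->pci_lock.
 * Returns false on an ordering violation instead of deadlocking. */
static bool lock(enum lock_id id)
{
    if ( id == PCIDEVS_LOCK && held[D_PCI_LOCK] )
        return false;   /* taking pcidevs lock after d->pci_lock */
    held[id] = true;
    return true;
}

static void unlock(enum lock_id id)
{
    held[id] = false;
}
```

With this model, the MSI path Roger describes (vpci_msi_arch_enable()
taking the pcidevs lock while d->pci_lock is already held) shows up as
the one sequence the checker rejects.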



Re: [PATCH v2] vpci: add permission checks to map_range()

2023-07-27 Thread Jan Beulich
On 27.07.2023 13:07, Daniel P. Smith wrote:
> 
> 
> On 7/27/23 03:56, Jan Beulich wrote:
>> On 26.07.2023 16:01, Roger Pau Monne wrote:
>>> Just like it's done for the XEN_DOMCTL_memory_mapping hypercall, add
>>> the permissions checks to vPCI map_range(), which is used to map the
>>> BARs into the domain p2m.
>>>
>>> Adding those checks requires that for x86 PVH hardware domain builder
>>> the permissions are set before initializing the IOMMU, or else
>>> attempts to initialize vPCI done as part of IOMMU device setup will
>>> fail due to missing permissions to create the BAR mappings.
>>>
>>> While moving the call to dom0_setup_permissions() convert the panic()
>>> used for error handling to a printk, the caller will already panic if
>>> required.
>>>
>>> Fixes: 9c244fdef7e7 ('vpci: add header handlers')
>>> Signed-off-by: Roger Pau Monné 
>>
>> I've committed this, but despite the Fixes: tag I'm not sure this
>> wants backporting. Thoughts?
> 
>  From a cursory review thus far, since this introduced a new XSM hook 
> site, shouldn't this have at least had an Rb by an XSM 
> reviewer/maintainer?

Probably, but already back then I said this model isn't going to work
flawlessly.

> I would have replied sooner, but have been on holiday for last two weeks.

I guess there was no way for us to know without you sending a note to
private@ (which, I will admit, you may not even have been aware of).

Jan



Re: [PATCH v7 11/15] xenpm: Print HWP/CPPC parameters

2023-07-27 Thread Anthony PERARD
On Wed, Jul 26, 2023 at 01:09:41PM -0400, Jason Andryuk wrote:
> @@ -772,6 +812,32 @@ static void print_cpufreq_para(int cpuid, struct 
> xc_get_cpufreq_para *p_cpufreq)
> p_cpufreq->u.s.scaling_min_freq,
> p_cpufreq->u.s.scaling_cur_freq);
>  }
> +else
> +{

I feel like this could be confusing. In this function, we have both:
if ( hwp ) { this; } else { that; }
and
if ( !hwp ) { that; } else { this; }

If we could have the condition in the same order, or use the same
condition for both "true" blocks, that would be nice.


> +const xc_cppc_para_t *cppc = &p_cpufreq->u.cppc_para;
> +
> +printf("cppc variables   :\n");
> +printf("  hardware limits: lowest [%u] lowest nonlinear [%u]\n",
> +   cppc->lowest, cppc->lowest_nonlinear);

All these %u should be %"PRIu32", right? Even if the rest of the
function is bogus... and even if it'll probably be a while before
%PRIu32 is different from %u.

Thanks,

-- 
Anthony PERARD
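
For reference, a minimal sketch of the %"PRIu32" usage suggested above
(the trimmed-down struct and function below are stand-ins for
illustration; the real xc_cppc_para_t lives in libxc):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the relevant fields of xc_cppc_para_t. */
typedef struct {
    uint32_t lowest;
    uint32_t lowest_nonlinear;
} cppc_para_demo;

static int print_limits(char *buf, size_t n, const cppc_para_demo *cppc)
{
    /* PRIu32 expands to the correct conversion for uint32_t on any ABI,
     * whereas a bare %u assumes unsigned int happens to be 32 bits. */
    return snprintf(buf, n,
                    "  hardware limits: lowest [%" PRIu32 "] "
                    "lowest nonlinear [%" PRIu32 "]",
                    cppc->lowest, cppc->lowest_nonlinear);
}
```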


