Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Oleksandr Andrushchenko



On 31.08.21 11:35, Jan Beulich wrote:

On 31.08.2021 10:14, Oleksandr Andrushchenko wrote:

On 31.08.21 11:05, Jan Beulich wrote:

On 31.08.2021 09:56, Oleksandr Andrushchenko wrote:

On 31.08.21 10:47, Jan Beulich wrote:

On 31.08.2021 09:06, Oleksandr Andrushchenko wrote:

On 31.08.21 09:51, Jan Beulich wrote:

On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:

On 30.08.21 16:04, Jan Beulich wrote:

@@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
   * Check for overlaps with other BARs. Note that only BARs that are
   * currently mapped (enabled) are checked for overlaps.
   */
-for_each_pdev ( pdev->domain, tmp )
+for ( d = pdev->domain; ; d = dom_xen ) {//todo

I am not quite sure this will be correct for the cases where pdev->domain != dom0,
e.g. in the series for PCI passthrough for Arm this can be any guest. For such
cases we'll force running the loop for dom_xen, which I am not sure is desirable.

It is surely not desirable, but it also doesn't happen - see the
is_hardware_domain() check further down (keeping context below).

Right

Another question is why such a hidden device has its pdev->domain not set
correctly, so that we need to work around it?

Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
("PCI: don't allow guest assignment of devices used by Xen")
introducing that temporary override. To permit limited
visibility to Dom0, these devices still need setting up in the
IOMMU for Dom0. Consequently BAR overlap detection also needs
to take these into account (i.e. the goal here is not just to
prevent triggering the ASSERT() in question).

So, why don't we set pdev->domain = dom_xen for such devices and call
modify_bars or something from pci_hide_device for instance (I didn't get too
much into implementation details though)? If pci_hide_device already handles
such exceptions, so it should also take care of the correct BAR overlaps etc.

How would it? It runs long before Dom0 gets created, let alone when
Dom0 may make adjustments to the BAR arrangement.

So, why don't we call "yet another hide function" while creating Dom0 for exactly
that reason, e.g. BAR overlap handling? E.g. make it a 2-stage hide for special
devices such as the console etc.

This might be an option, but is imo going to result not only in more
code churn, but also in redundant code. After all what modify_bars()
needs is the union of BARs from Dom0's and DomXEN's devices.
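
The union Jan describes could be sketched as below. This is a much-simplified, self-contained illustration, not Xen's actual code: struct bar, struct pdev, struct domain and the helper names here are hypothetical stand-ins for Xen's real structures, kept only to show the shape of the quoted "for ( d = pdev->domain; ; d = dom_xen )" loop, which scans the device's own domain first and then DomXEN, where hidden devices live.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-ins for Xen's structures. */
struct bar { unsigned long start, end; int enabled; };
struct pdev { struct bar bar; struct pdev *next; };
struct domain { struct pdev *pdev_list; };

static int bars_overlap(unsigned long s, unsigned long e, const struct bar *b)
{
    /* Only currently mapped (enabled) BARs count, per the quoted comment. */
    return b->enabled && s <= b->end && b->start <= e;
}

/*
 * Walk the union of two domains' device lists: first the device's own
 * domain, then DomXEN, so hidden devices take part in overlap detection.
 */
static int any_overlap(struct domain *own, struct domain *dom_xen,
                       unsigned long s, unsigned long e)
{
    for ( struct domain *d = own; ; d = dom_xen )
    {
        for ( struct pdev *p = d->pdev_list; p; p = p->next )
            if ( bars_overlap(s, e, &p->bar) )
                return 1;
        if ( d == dom_xen )
            break;
    }
    return 0;
}
```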

To me DomXEN here is yet another workaround, as strictly speaking the vpci code
didn't need and doesn't(?) need it at the moment. Yes, at least on Arm.
So, I do understand why you want it there, but this then does need a very
good description of what is happening and why...


The temporary overriding of pdev->domain is because other IOMMU code
takes the domain to act upon from that field.

So, you mean pdev->domain in that case is pointing to what?

Did you look at the function I've pointed you at? DomXEN there gets
temporarily overridden to Dom0.

This looks like yet another workaround to me which is not cute.
So, the overall solution is spread over multiple subsystems, each
introducing something which is hard to follow

If you have any better suggestions, I'm all ears. Or feel free to send
patches.


Unfortunately I don't have any. But could you please at least document a bit
more what is happening here with DomXEN: either in the commit message or
in the code itself, so it is easier to understand why vpci has this code at
all...

Thank you,

Oleksandr



Jan





[linux-linus test] 164673: regressions - FAIL

2021-08-31 Thread osstest service owner
flight 164673 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164673/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-xsm7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-examine   6 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl7 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-i386-xl-raw7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1  13 debian-fixup fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx 13 debian-fixup fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm 13 debian-fixup fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2  13 debian-fixup fail REGR. vs. 152332
 test-arm64-arm64-xl  13 debian-fixup fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm  13 debian-fixup fail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 152332
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 152332
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 152332
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 152332
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check 

[xen-unstable test] 164667: regressions - FAIL

2021-08-31 Thread osstest service owner
flight 164667 xen-unstable real [real]
flight 164685 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/164667/
http://logs.test-lab.xenproject.org/osstest/logs/164685/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-amd  8 xen-boot  fail REGR. vs. 164477
 test-amd64-amd64-dom0pvh-xl-intel  8 xen-bootfail REGR. vs. 164477

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-examine  4 memdisk-try-append  fail pass in 164685-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 164477
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 164477
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 164477
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 164477
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 164477
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 164477
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 164477
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 164477
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 164477
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 164477
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 164477
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 xen  daaf007eb3467f900a2e20fadbc4c6f3bfcaa356
baseline version:
 xen  a931e8e64af07bd333a31f3b71a3f8f3e7910857

Last test of basis   164477  2021-08-25 09:09:48 Z    6 days

[ovmf test] 164674: all pass - PUSHED

2021-08-31 Thread osstest service owner
flight 164674 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164674/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf cae735f61328d64e2e8991036707b9e78c0f5f63
baseline version:
 ovmf 77d5fa80246e8784f89e109ff9dadfeb7089ff85

Last test of basis   164630  2021-08-30 01:10:45 Z    2 days
Testing same since   164674  2021-08-31 02:56:52 Z    1 days    1 attempts


People who touched revisions under test:
  Cheng Zhou 
  Grzegorz Bernacki 
  Hao A Wu 
  Marvin Häuser 
  zhoucheng 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   77d5fa8024..cae735f613  cae735f61328d64e2e8991036707b9e78c0f5f63 -> xen-tested-master



[libvirt test] 164677: regressions - FAIL

2021-08-31 Thread osstest service owner
flight 164677 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164677/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  dea67b1de03dd0544b8d05322f4300abd44221bd
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  417 days
Failing since        151818  2020-07-11 04:18:52 Z  416 days  407 attempts
Testing same since   164677  2021-08-31 06:51:17 Z    0 days    1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  Dmytro Linkin 
  Eiichi Tsukata 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Ian Wienand 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jinsheng Zhang 
  Jiri Denemark 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Richard W.M. Jones 
  Ricky Tigg 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  Shi Lei 
  simmon 
  Simon Chopin 
  Simon Gaiser 
  Simon Rowe 
  Stefan Bader 
  Stefan Berger 
  Stefan Berger 
  Stefan Hajnoczi 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Tomáš Janoušek 
  Tuguoyi 
  Ville Skyttä 
  Vinayak Kale 
  Wang Xin 
  WangJian 
  Weblate 
  Wei Liu 
  Wei Liu 
  William Douglas 
  Yalei Li <274268...@qq.com>
  Yalei Li 
  Yang Fei 
  Yang Hang 
  Yanqiu Zhang 
  Yaroslav Kargin 
  Yi Li 
  Yi Wang 
  Yuri Chornoivan 
  Zbigniew Jędrzejewski-Szmek 
  zhangjl02 
  Zheng Chuan 
  zhenwei pi 
  Zhenyu Ye 
  Zhenyu Zheng 
  Zhenzhong Duan 

jobs:
 build-amd64-xsm  pass   

[xen-unstable-smoke test] 164684: tolerable all pass - PUSHED

2021-08-31 Thread osstest service owner
flight 164684 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164684/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  96607a8e680e7f965ca868d11f8b0636317d2618
baseline version:
 xen  6b4f6a31ace125d658a581e8d10809e4fccdc272

Last test of basis   164682  2021-08-31 16:02:54 Z    0 days
Testing same since   164684  2021-08-31 20:01:34 Z    0 days    1 attempts


People who touched revisions under test:
  Fabrice Fontaine 
  Ian Jackson 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   6b4f6a31ac..96607a8e68  96607a8e680e7f965ca868d11f8b0636317d2618 -> smoke



Re: Disable IOMMU in Dom0

2021-08-31 Thread Stefano Stabellini
On Tue, 31 Aug 2021, Roman Skakun wrote:
> Hi, Stefano!
> 
> I have seen your suggestion of disabling xen-swiotlb for devices which are
> controlled by an IOMMU in Dom0:
> https://patchwork.kernel.org/project/xen-devel/patch/alpine.DEB.2.21.2102161333090.3234@sstabellini-ThinkPad-T480s/
> 
> I was thinking of creating a common implementation, because I have an issue
> where a device controlled by the IOMMU ends up using the xen-swiotlb DMA ops
> and a bounce buffer as a result:
> https://lore.kernel.org/xen-devel/060b5741-922c-115c-7e8c-97d8aa5f4...@xen.org/T/
> 
> Do you have any future plans to finish implementation for upstream?

Hi Roman,

The feature is already upstream in Linux as f5079a9a2, and the new
feature flags are XENFEAT_direct_mapped and XENFEAT_not_direct_mapped.

If you have a setup where Dom0 is not 1:1 mapped (which is not currently
possible with upstream Xen but it is possible with cache coloring) and
uses the IOMMU to make device DMA work like regular DomUs, then passing
XENFEAT_not_direct_mapped to Linux would make it work. Without
XENFEAT_not_direct_mapped, Linux would try to use swiotlb-xen which has
code that relies on Linux being 1:1 mapped to work properly.
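
The decision Stefano describes can be sketched as below. This is a hypothetical, self-contained illustration, not the actual Linux code: `may_use_swiotlb_xen()` and the feature-bitmap stub are invented here for clarity; the real flag definitions from commit f5079a9a2 live in Linux's `include/xen/interface/features.h` and the real check sits in the swiotlb-xen initialization path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Linux's xen_feature() machinery. */
enum { XENFEAT_direct_mapped, XENFEAT_not_direct_mapped };
static unsigned long features;
static bool xen_feature(int f) { return features & (1UL << f); }

/*
 * swiotlb-xen relies on the domain being 1:1 (direct) mapped.  The two
 * new flags let Xen state this explicitly; absent both, fall back to
 * the historical assumption that only Dom0 is direct-mapped.
 */
static bool may_use_swiotlb_xen(bool is_initial_domain)
{
    if (xen_feature(XENFEAT_not_direct_mapped))
        return false;       /* not 1:1 mapped: must not use swiotlb-xen */
    if (xen_feature(XENFEAT_direct_mapped))
        return true;        /* explicitly 1:1 mapped */
    return is_initial_domain;
}
```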

Is that the same problem that you have, or is dom0 1:1 mapped in your
case? If dom0 is 1:1 mapped then swiotlb-xen should work regardless of
whether the IOMMU is enabled or disabled.

I hope this helps.

Cheers,

Stefano



RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map

2021-08-31 Thread Stefano Stabellini
On Tue, 31 Aug 2021, Wei Chen wrote:
> Hi Stefano,
> 
> > -Original Message-
> > From: Stefano Stabellini 
> > Sent: 31 August 2021 8:48
> > To: Wei Chen 
> > Cc: xen-devel@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org;
> > jbeul...@suse.com; Bertrand Marquis 
> > Subject: Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse
> > device tree NUMA distance map
> > 
> > On Wed, 11 Aug 2021, Wei Chen wrote:
> > > A NUMA aware device tree will provide a "distance-map" node to
> > > describe distance between any two nodes. This patch introduce a
> > > new helper to parse this distance map.
> > >
> > > Signed-off-by: Wei Chen 
> > > ---
> > >  xen/arch/arm/numa_device_tree.c | 67 +
> > >  1 file changed, 67 insertions(+)
> > >
> > > diff --git a/xen/arch/arm/numa_device_tree.c
> > b/xen/arch/arm/numa_device_tree.c
> > > index bbe081dcd1..6e0d1d3d9f 100644
> > > --- a/xen/arch/arm/numa_device_tree.c
> > > +++ b/xen/arch/arm/numa_device_tree.c
> > > @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt,
> > int node,
> > >
> > >  return 0;
> > >  }
> > > +
> > > +/* Parse NUMA distance map v1 */
> > > +int __init
> > > +device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
> > > +{
> > > +const struct fdt_property *prop;
> > > +const __be32 *matrix;
> > > +int entry_count, len, i;
> > > +
> > > +printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> > > +
> > > +prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> > > +if ( !prop )
> > > +{
> > > +printk(XENLOG_WARNING
> > > +   "NUMA: No distance-matrix property in distance-map\n");
> > > +
> > > +return -EINVAL;
> > > +}
> > > +
> > > +if ( len % sizeof(uint32_t) != 0 )
> > > +{
> > > +printk(XENLOG_WARNING
> > > +   "distance-matrix in node is not a multiple of u32\n");
> > > +return -EINVAL;
> > > +}
> > > +
> > > +entry_count = len / sizeof(uint32_t);
> > > +if ( entry_count <= 0 )
> > > +{
> > > +printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> > > +
> > > +return -EINVAL;
> > > +}
> > > +
> > > +matrix = (const __be32 *)prop->data;
> > > +for ( i = 0; i + 2 < entry_count; i += 3 )
> > > +{
> > > +uint32_t from, to, distance;
> > > +
> > > +from = dt_read_number(matrix, 1);
> > > +matrix++;
> > > +to = dt_read_number(matrix, 1);
> > > +matrix++;
> > > +distance = dt_read_number(matrix, 1);
> > > +matrix++;
> > > +
> > > +if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> > > +(from != to && distance <= NUMA_LOCAL_DISTANCE) )
> > > +{
> > > +printk(XENLOG_WARNING
> > > +printk(XENLOG_WARNING
> > > +   "Invalid nodes' distance from node#%d to node#%d = %d\n",
> > > +   from, to, distance);
> > > +return -EINVAL;
> > > +}
> > > +
> > > +printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d = %d\n",
> > > +   from, to, distance);
> > > +numa_set_distance(from, to, distance);
> > > +
> > > +/* Set default distance of node B->A same as A->B */
> > > +if (to > from)
> > > + numa_set_distance(to, from, distance);
> > 
> > I am a bit unsure about this last 2 lines: why calling numa_set_distance
> > in the opposite direction only when to > from? Wouldn't it be OK to
> > always do both:
> > 
> > numa_set_distance(from, to, distance);
> > numa_set_distance(to, from, distance);
> > 
> > ?
> > 
> I borrowed this code from Linux, but here is my understanding:
> 
> First, I read some notes in Documentation/devicetree/bindings/numa.txt
> 1. Each entry represents distance from first node to second node.
> The distances are equal in either direction.
> 2. distance-matrix should have entries in lexicographical ascending
> order of nodes.
> 
> Here is an example of distance-map node in DTB:
> Sample#1, full list:
>   distance-map {
>compatible = "numa-distance-map-v1";
>distance-matrix = <0 0  10>,
>  <0 1  20>,
>  <0 2  40>,
>  <0 3  20>,
>  <1 0  20>,
>  <1 1  10>,
>  <1 2  20>,
>  <1 3  40>,
>  <2 0  40>,
>  <2 1  20>,
>  <2 2  10>,
>  <2 3  20>,
>  <3 0  20>,
>  <3 1  40>,
>  <3 2  20>,
>  
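
The alternative Stefano raises, unconditionally setting both directions, can be sketched as below. This is a hypothetical, self-contained illustration, not the patch's actual code: the array-backed `numa_set_distance()` stub and `record_map_entry()` are invented here; the point is only that for a full distance-matrix (as in sample #1) the later <to, from> entry overwrites with the same value, so always setting both directions is harmless, while for a sparse map it still yields a symmetric matrix.

```c
#include <assert.h>

#define MAX_NUMNODES 4

static unsigned int node_distance[MAX_NUMNODES][MAX_NUMNODES];

/* Stand-in for Xen's numa_set_distance(). */
static void numa_set_distance(unsigned int from, unsigned int to,
                              unsigned int distance)
{
    node_distance[from][to] = distance;
}

/*
 * Record both directions unconditionally, rather than only when
 * to > from, relying on the binding's statement that distances are
 * equal in either direction.
 */
static void record_map_entry(unsigned int from, unsigned int to,
                             unsigned int distance)
{
    numa_set_distance(from, to, distance);
    numa_set_distance(to, from, distance);
}
```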

Re: [PATCH 4/4] x86/PV: properly set shadow allocation for Dom0

2021-08-31 Thread Tim Deegan
At 15:03 +0200 on 30 Aug (1630335824), Jan Beulich wrote:
> Leaving shadow setup just to the L1TF tasklet means running Dom0 on a
> minimally acceptable shadow memory pool, rather than what normally
> would be used (also, for example, for PVH). Populate the pool before
> triggering the tasklet, on a best effort basis (again like done for
> PVH).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



[xen-4.13-testing test] 164644: regressions - FAIL

2021-08-31 Thread osstest service owner
flight 164644 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164644/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-amd  8 xen-boot  fail REGR. vs. 163761
 test-amd64-amd64-dom0pvh-xl-intel  8 xen-bootfail REGR. vs. 163761

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10 fail in 164577 pass in 164644
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail in 164577 pass in 164644
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 164577

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 163761
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 163761
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 163761
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 163761
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 163761
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 163761
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 163761
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 163761
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 163761
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 163761
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 163761
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  dd29f4f4961d5c99660874c7ff090bd3c2ef6e5b
baseline version:
 xen  

Disable IOMMU in Dom0

2021-08-31 Thread Roman Skakun
Hi, Stefano!

I have seen your suggestion of disabling xen-swiotlb for devices which are
controlled by an IOMMU in Dom0:
https://patchwork.kernel.org/project/xen-devel/patch/alpine.DEB.2.21.2102161333090.3234@sstabellini-ThinkPad-T480s/

I was thinking of creating a common implementation, because I have an issue
where a device controlled by the IOMMU ends up using the xen-swiotlb DMA ops
and a bounce buffer as a result:
https://lore.kernel.org/xen-devel/060b5741-922c-115c-7e8c-97d8aa5f4...@xen.org/T/

Do you have any future plans to finish implementation for upstream?

Cheers!


[xen-unstable-smoke test] 164682: tolerable all pass - PUSHED

2021-08-31 Thread osstest service owner
flight 164682 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164682/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  6b4f6a31ace125d658a581e8d10809e4fccdc272
baseline version:
 xen  daaf007eb3467f900a2e20fadbc4c6f3bfcaa356

Last test of basis   164649  2021-08-30 14:00:34 Z1 days
Testing same since   164682  2021-08-31 16:02:54 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   daaf007eb3..6b4f6a31ac  6b4f6a31ace125d658a581e8d10809e4fccdc272 -> smoke



Re: [XEN RFC PATCH 25/40] xen/arm: unified entry to parse all NUMA data from device tree

2021-08-31 Thread Julien Grall

Hi Stefano,

On 31/08/2021 01:54, Stefano Stabellini wrote:

On Wed, 11 Aug 2021, Wei Chen wrote:

In this API, we scan the whole device tree to parse the CPU node ids, memory
node ids and the distance-map. Though early_scan_node has a handler to
process memory nodes, if we wanted to parse the memory node ids in that
handler, we would have to embed the NUMA parsing code there, and we would
still need to scan the whole device tree to find the CPU NUMA ids and the
distance-map. So we include memory NUMA id parsing in this API too. Another
benefit is that we get a unique entry point for parsing device tree NUMA data.

Signed-off-by: Wei Chen 
---
  xen/arch/arm/numa_device_tree.c | 31 ---
  xen/include/asm-arm/numa.h  |  1 +
  2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index 6e0d1d3d9f..27ffb72f7b 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -131,7 +131,8 @@ save_memblk:
  }
  
  /* Parse CPU NUMA node info */

-int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
+static int __init
+device_tree_parse_numa_cpu_node(const void *fdt, int node)
  {
  uint32_t nid;
  
@@ -147,7 +148,7 @@ int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)

  }
  
  /* Parse memory node NUMA info */

-int __init
+static int __init
  device_tree_parse_numa_memory_node(const void *fdt, int node,
  const char *name, uint32_t addr_cells, uint32_t size_cells)
  {
@@ -202,7 +203,7 @@ device_tree_parse_numa_memory_node(const void *fdt, int node,
  }
  
  /* Parse NUMA distance map v1 */

-int __init
+static int __init
  device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
  {
  const struct fdt_property *prop;
@@ -267,3 +268,27 @@ device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
  
  return 0;

  }
+
+static int __init fdt_scan_numa_nodes(const void *fdt,
+                                      int node, const char *uname, int depth,
+                                      u32 address_cells, u32 size_cells,
+                                      void *data)
+{
+    int ret = 0;
+
+    if ( fdt_node_check_type(fdt, node, "cpu") == 0 )
+        ret = device_tree_parse_numa_cpu_node(fdt, node);
+    else if ( fdt_node_check_type(fdt, node, "memory") == 0 )
+        ret = device_tree_parse_numa_memory_node(fdt, node, uname,
+                                                 address_cells, size_cells);
+    else if ( fdt_node_check_compatible(fdt, node,
+                                        "numa-distance-map-v1") == 0 )
+        ret = device_tree_parse_numa_distance_map_v1(fdt, node);
+
+    return ret;
+}


Julien, do you have an opinion on whether it might be worth reusing the
existing early_scan_node function for this, to avoid another full FDT
scan (i.e. another call to device_tree_for_each_node)?


I don't like the full FDT scan and actually drafted an e-mail in 
reply-to [1] suggesting to parse all the NUMA information from 
early_scan_node().


However, we don't know whether ACPI or DT will be used at the time 
early_scan_node() is called. So we would need to revert any such change 
afterwards, which can make the code a little more awkward.


So I decided to drop my e-mail because I prefer the full DT scan for 
now. We can look at optimizing later if this becomes a pain point.


Cheers,

[1] 
https://lore.kernel.org/xen-devel/db9pr08mb6857604b3d4b690f2b8832bd9e...@db9pr08mb6857.eurprd08.prod.outlook.com/


--
Julien Grall



RE: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-08-31 Thread Michael Kelley
From: Christoph Hellwig  Sent: Monday, August 30, 2021 5:01 AM
> 
> Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
> previous version and tried to come up with a better version.  This
> mostly untested branch:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap
> 
> gets us there for swiotlb and the channel infrastructure.  I've started
> looking at the network driver and didn't get anywhere due to other work.
> 
> As far as I can tell the network driver does gigantic multi-megabyte
> vmalloc allocation for the send and receive buffers, which are then
> passed to the hardware, but always copied to/from when interacting
> with the networking stack.  Did I see that right?  Are these big
> buffers actually required unlike the normal buffer management schemes
> in other Linux network drivers?
> 
> If so I suspect the best way to allocate them is by not using vmalloc
> but just discontiguous pages, and then use kmap_local_pfn where the
> PFN includes the share_gpa offset when actually copying from/to the
> skbs.

As a quick overview, I think there are four places where the
shared_gpa_boundary must be applied to adjust the guest physical
address that is used.  Each requires mapping a corresponding
virtual address range.  Here are the four places:

1)  The so-called "monitor pages" that are a core communication
mechanism between the guest and Hyper-V.  These are two single
pages, and the mapping is handled by calling memremap() for
each of the two pages.  See Patch 7 of Tianyu's series.

2)  The VMbus channel ring buffers.  You have proposed using
your new  vmap_phys_range() helper, but I don't think that works
here.  More details below.

3)  The network driver send and receive buffers.  vmap_phys_range()
should work here.

4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
should work here as well.

Case #2 above does unusual mapping.  The ring buffer consists of a ring
buffer header page, followed by one or more pages that are the actual
ring buffer.  The pages making up the actual ring buffer are mapped
twice in succession.  For example, if the ring buffer has 4 pages
(one header page and three ring buffer pages), the contiguous
virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
The duplicate contiguous mapping allows the code that is reading
or writing the actual ring buffer to not be concerned about wrap-around
because writing off the end of the ring buffer is automatically
wrapped-around by the mapping.  The amount of data read or
written in one batch never exceeds the size of the ring buffer, and
after a batch is read or written, the read or write indices are adjusted
to put them back into the range of the first mapping of the actual
ring buffer pages.  So there's method to the madness, and the
technique works pretty well.  But this kind of mapping is not
amenable to using vmap_phys_range().

Michael





Re: [PATCH] libs/light: fix tv_sec fprintf format

2021-08-31 Thread Ian Jackson
Fabrice Fontaine writes ("Re: [PATCH] libs/light: fix tv_sec fprintf format"):
> On Mon, 30 Aug 2021 at 10:31, Juergen Gross wrote:
> >
> > On 28.08.21 11:07, Fabrice Fontaine wrote:
> > > Don't assume tv_sec is an unsigned long; it is 64 bits on 32-bit NetBSD.
> > > Use %jd and cast to (intmax_t) instead.
> > >
> > > Signed-off-by: Fabrice Fontaine 
> > > ---
> > >   tools/libs/light/libxl_domain.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/libs/light/libxl_domain.c 
> > > b/tools/libs/light/libxl_domain.c
> > > index c00c36c928..51a6127552 100644
> > > --- a/tools/libs/light/libxl_domain.c
> > > +++ b/tools/libs/light/libxl_domain.c
> > > @@ -1444,7 +1444,7 @@ static int libxl__mark_domid_recent(libxl__gc *gc, 
> > > uint32_t domid)
> > >   }
> > >   }
> > >
> > > -r = fprintf(nf, "%lu %u\n", ctxt.ts.tv_sec, domid);
> > > +r = fprintf(nf, "%jd %u\n", (intmax_t)ctxt.ts.tv_sec, domid);
> >
> > Any reason not to keep the unsigned attribute?
> Yes, for consistency, I applied the same fix that was merged seven
> months ago to libs/light/libxl_create.c:
> https://gitlab.com/xen-project/xen/-/commit/a8ac01aa3e3ea5e6a9a1620aa8fa7e9da3458120

Thanks,

Reviewed-by: Ian Jackson 

and pushed.

Ian.



Re: [PATCH v3 7/7] xen/arm: Sanitize CTR_EL0 and emulate it if needed

2021-08-31 Thread Bertrand Marquis
Hi Julien,

> On 31 Aug 2021, at 15:47, Julien Grall  wrote:
> 
> 
> 
> On 31/08/2021 14:17, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertrand,
> 
>>> On 27 Aug 2021, at 16:05, Julien Grall  wrote:
>>> 
>>> Hi Bertrand,
>>> 
>>> On 25/08/2021 14:18, Bertrand Marquis wrote:
 Sanitize CTR_EL0 value between cores.
 In most cases different values will taint Xen but if different
 i-cache policies are found, we choose the one which will be compatible
 between all cores in terms of invalidation/data cache flushing strategy.
>>> 
>>> I understand that all the CPUs in Xen needs to agree on the cache flush 
>>> strategy. However...
>>> 
 In this case we need to activate the TID2 bit in HCR to emulate the
 TCR_EL0 register for guests. This patch is not activating TID2 bit all
 the time to limit the overhead when possible.
>>> 
>>> as we discussed in an earlier version, a vCPU is unlikely (at least in the 
>>> short/medium term) to be able to move across pCPUs of different type. So the vCPU 
>>> would be pinned to a set of pCPUs. IOW, the guest would have to be 
>>> big.LITTLE aware and therefore would be able to make its own strategy 
>>> decision.
>>> 
>>> So I think we should be able to get away from trappings the registers.
>> I do agree that we should be able to get away from that in the long term once
>> we have cpupools properly set but right now this is the only way to have
>> something useable (I will not say right).
>> I will work on finding a way to setup properly cpupools (or something else as
>> we discussed earlier) but in the short term I think this is the best we can 
>> do.
> 
> My concern is you are making it look like Xen will be able to deal nicely with 
> big.LITTLE when in fact there are a lot more potential issues caused by allowing a vCPU 
> to move across pCPUs of different type (the errata is one example).

I agree and this is why Xen is tainted.

> 
>> Another solution would be to discard this patch from the series for now, until
>> I have worked out a proper solution for this case.
>> Should we discard or merge, or do you have another idea?
> Please correct me if I am wrong, at the moment, it doesn't look like this 
> patch will be part of the longer plan. If so, then I think it should be 
> parked for now.

Not sure; it depends on what the final solution would be, but this is highly 
possible, I agree.

> 
> This would also have the advantage of avoiding spending too much time on 
> resolving the emulation issue I mentioned in my previous answer.
> 
> No need to resend a new version of this series yet. You can wait until the 
> rest of the series get more feedback.

Ok, I will wait for feedback, and the next series will not include this patch anymore.

> 
> [...]
> 
>> If we get interrupted, someone could program CSSELR differently and the next 
>> read
>> will not be reflecting what the guest actually wants to do
> 
> AFAICT, CSSELR is preserved during the context switch of vCPU. So that 
> someone would have to be Xen, right?
> 
> If so, what you describe would also be an issue even if we didn't trap the 
> register. Therefore, if Xen would ever use CSSELR, then that code would need 
> to save the value, use the register and then restore the value with 
> preemption disabled.

I could just remove the comment, I added it as information, but if you think it 
is misleading no problem.

Anyway as we will park this for now no need to discuss that further.

Cheers
Bertrand

> 
>> The code is not preemptible right now so this cannot be an issue but I added 
>> the
>>  comment more as a warning.
>> This is not something from the documentation, this is because value written
>> in CSSELR is defining what is read from CCSIDR
>>> 
 +WRITE_SYSREG(v->arch.csselr, CSSELR_EL1);
 +set_user_reg(regs, regidx, READ_SYSREG(CCSIDR_EL1));
 +}
> 
> Cheers,
> 
> -- 
> Julien Grall




Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

2021-08-31 Thread Jan Beulich
On 31.08.2021 17:25, Andrew Cooper wrote:
> On 31/08/2021 14:26, Jan Beulich wrote:
>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>> On 30/08/2021 14:02, Jan Beulich wrote:
 Further permit "access" to differ in the "executable" attribute. While
 ideally only ROM regions would get mapped with X set, getting there is
 quite a bit of work. Therefore, as a temporary measure, permit X to
 vary. For Dom0 the more permissive of the types will be used, while for
 DomU it'll be the more restrictive one.
>>> Split behaviour between dom0 and domU based on types alone cannot
>>> possibly be correct.
>> True, but what do you do.
>>
>>> DomU's need to execute ROMs too, and this looks like it will malfunction if
>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>
>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>> just take the latest access.
>> And how do we know that that's what is going to work?
> 
> Because it's the pre-existing behaviour.

Valid point. But for the DomU case there simply has not been any
pre-existing behavior. Hence my desire to be restrictive initially
there.

>>  We should
>> strictly accumulate for Dom0. And what we do for DomU is moot for
>> the moment, until PCI passthrough becomes a thing for PVH. Hence
>> I've opted to be restrictive there - I'd rather see things break
>> (and getting adjusted) when this future work actually gets carried
>> out, than leave things permissive for no-one to notice that it's
>> too permissive, leading to an XSA.
> 
> Restricting execute permissions is something unique to virt.  It doesn't
> exist in a non-virtualised system, as I and D side reads are
> indistinguishable outside of the core.
> 
> Furthermore, it is inexpressible on some systems/configurations.
> 
> Introspection is the only technology which should be restricting execute
> permissions in the p2m, and only when it takes responsibility for
> dealing with the fallout.

IOW, are you saying that the calls to set_identity_p2m_entry()
(pre-dating XSA-378) were wrong to use p2m_access_rw? Because that's
what's getting in the way here.

Plus, as a side note, then we don't even have e.g. IOMMUF_executable.

Jan




Re: [PATCH 3/4] x86/PVH: improve Dom0 memory size calculation

2021-08-31 Thread Jan Beulich
On 31.08.2021 16:07, Andrew Cooper wrote:
> On 30/08/2021 14:03, Jan Beulich wrote:
>> Assuming that the accounting for IOMMU page tables will also take care
>> of the P2M needs was wrong: dom0_paging_pages() can determine a far
>> higher value, high enough for the system to run out of memory while
>> setting up Dom0. Hence in the case of shared page tables the larger of
>> the two values needs to be used (without shared page tables the sum of
>> both continues to be applicable).
> 
> I'm afraid that I can't follow this at all.
> 
> What causes the system to run out of RAM while constructing dom0?

Without any "dom0_mem=" we set aside 128Mb. In my .config that was in
use I default to "dom0_mem=-255". Yet this was still far from enough to
cover the gap between the original IOMMU page table accounting and the
value returned from dom0_paging_pages(). Since this is what also gets
used to actually populate Dom0's P2M pool, with the P2M pool populated
there wasn't enough RAM anymore to reach the Dom0 size which
dom0_compute_nr_pages() did establish.

Putting it in a simplified formula, what we did so far when sharing
page tables was

RAM = total - IOMMU

but what we need is

RAM = total - max(IOMMU, P2M)

In the non-shared case we already correctly did

RAM = total - (IOMMU + P2M)

Jan




Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

2021-08-31 Thread Andrew Cooper
On 31/08/2021 14:26, Jan Beulich wrote:
> On 31.08.2021 15:16, Andrew Cooper wrote:
>> On 30/08/2021 14:02, Jan Beulich wrote:
>>> Further permit "access" to differ in the "executable" attribute. While
>>> ideally only ROM regions would get mapped with X set, getting there is
>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>> vary. For Dom0 the more permissive of the types will be used, while for
>>> DomU it'll be the more restrictive one.
>> Split behaviour between dom0 and domU based on types alone cannot
>> possibly be correct.
> True, but what do you do.
>
>> DomU's need to execute ROMs too, and this looks like it will malfunction if
>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>
>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>> just take the latest access.
> And how do we know that that's what is going to work?

Because it's the pre-existing behaviour.

>  We should
> strictly accumulate for Dom0. And what we do for DomU is moot for
> the moment, until PCI passthrough becomes a thing for PVH. Hence
> I've opted to be restrictive there - I'd rather see things break
> (and getting adjusted) when this future work actually gets carried
> out, than leave things permissive for no-one to notice that it's
> too permissive, leading to an XSA.

Restricting execute permissions is something unique to virt.  It doesn't
exist in a non-virtualised system, as I and D side reads are
indistinguishable outside of the core.

Furthermore, it is inexpressible on some systems/configurations.

Introspection is the only technology which should be restricting execute
permissions in the p2m, and only when it takes responsibility for
dealing with the fallout.

~Andrew




Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-08-31 Thread Tianyu Lan

Hi Christoph:

On 8/30/2021 8:00 PM, Christoph Hellwig wrote:

Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
previous version and tried to come up with a better version.  This
mostly untested branch:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap


No problem. Thank you very much for your suggestion patches and they are 
very helpful.





gets us there for swiotlb and the channel infrastructure.  I've started
looking at the network driver and didn't get anywhere due to other work.

As far as I can tell the network driver does gigantic multi-megabyte
vmalloc allocation for the send and receive buffers, which are then
passed to the hardware, but always copied to/from when interacting
with the networking stack.  Did I see that right?  Are these big
buffers actually required unlike the normal buffer management schemes
in other Linux network drivers?



For send packets, netvsc tries to batch packets in the send buffer if 
possible. It passes the original skb pages directly to the
hypervisor when the send buffer is not large enough or the packet length 
is larger than the section size. Such packets are sent via 
vmbus_sendpacket_pagebuffer() in the end. Please see netvsc_send() for 
details. The following code checks whether the packet can be 
copied into the send buffer. If not, the packet will be sent with the 
original skb pages.


        /* batch packets in send buffer if possible */
        msdp = &nvchan->msd;
        if (msdp->pkt)
                msd_len = msdp->pkt->total_data_buflen;

        try_batch =  msd_len > 0 && msdp->count < net_device->max_pkt;
        if (try_batch && msd_len + pktlen + net_device->pkt_align <
            net_device->send_section_size) {
                section_index = msdp->pkt->send_buf_index;

        } else if (try_batch && msd_len + packet->rmsg_size <
                   net_device->send_section_size) {
                section_index = msdp->pkt->send_buf_index;
                packet->cp_partial = true;

        } else if (pktlen + net_device->pkt_align <
                   net_device->send_section_size) {
                section_index = netvsc_get_next_send_section(net_device);
                if (unlikely(section_index == NETVSC_INVALID_INDEX)) {
                        ++ndev_ctx->eth_stats.tx_send_full;
                } else {
                        move_pkt_msd(&msd_send, &msd_skb, msdp);
                        msd_len = 0;
                }
        }



For receive packets, the data is always copied from the receive buffer.



If so I suspect the best way to allocate them is by not using vmalloc
but just discontiguous pages, and then use kmap_local_pfn where the
PFN includes the share_gpa offset when actually copying from/to the
skbs.

When netvsc needs to copy packet data into the send buffer, it needs to 
calculate the position from section_index and send_section_size.
Please see netvsc_copy_to_send_buf() for details. So a contiguous virtual 
address range for the send buffer is necessary to copy data and batch packets.




Re: [PATCH v3 7/7] xen/arm: Sanitize CTR_EL0 and emulate it if needed

2021-08-31 Thread Julien Grall




On 31/08/2021 14:17, Bertrand Marquis wrote:

Hi Julien,


Hi Bertrand,




On 27 Aug 2021, at 16:05, Julien Grall  wrote:

Hi Bertrand,

On 25/08/2021 14:18, Bertrand Marquis wrote:

Sanitize CTR_EL0 value between cores.
In most cases different values will taint Xen but if different
i-cache policies are found, we choose the one which will be compatible
between all cores in terms of invalidation/data cache flushing strategy.


I understand that all the CPUs in Xen needs to agree on the cache flush 
strategy. However...


In this case we need to activate the TID2 bit in HCR to emulate the
TCR_EL0 register for guests. This patch is not activating TID2 bit all
the time to limit the overhead when possible.


as we discussed in an earlier version, a vCPU is unlikely (at least in the 
short/medium term) to be able to move across pCPUs of different type. So the vCPU would 
be pinned to a set of pCPUs. IOW, the guest would have to be big.LITTLE aware 
and therefore would be able to make its own strategy decision.

So I think we should be able to get away from trappings the registers.


I do agree that we should be able to get away from that in the long term once
we have cpupools properly set but right now this is the only way to have
something useable (I will not say right).
I will work on finding a way to setup properly cpupools (or something else as
we discussed earlier) but in the short term I think this is the best we can do.


My concern is you are making it look like Xen will be able to deal nicely 
with big.LITTLE when in fact there are a lot more potential issues caused by 
allowing a vCPU to move across pCPUs of different type (the errata is one 
example).




Another solution would be to discard this patch from the series for now, until
I have worked out a proper solution for this case.

Should we discard or merge, or do you have another idea?
Please correct me if I am wrong, at the moment, it doesn't look like 
this patch will be part of the longer plan. If so, then I think it 
should be parked for now.


This would also have the advantage of avoiding spending too much time on 
resolving the emulation issue I mentioned in my previous answer.


No need to resend a new version of this series yet. You can wait until 
the rest of the series get more feedback.


[...]


If we get interrupted, someone could program CSSELR differently and the next 
read
will not be reflecting what the guest actually wants to do


AFAICT, CSSELR is preserved during the context switch of vCPU. So that 
someone would have to be Xen, right?


If so, what you describe would also be an issue even if we didn't trap 
the register. Therefore, if Xen would ever use CSSELR, then that code 
would need to save the value, use the register and then restore the 
value with preemption disabled.




The code is not preemptible right now so this cannot be an issue but I added the
  comment more as a warning.

This is not something from the documentation, this is because value written
in CSSELR is defining what is read from CCSIDR




+WRITE_SYSREG(v->arch.csselr, CSSELR_EL1);
+set_user_reg(regs, regidx, READ_SYSREG(CCSIDR_EL1));
+}


Cheers,

--
Julien Grall



Re: [PATCH 4/4] x86/PV: properly set shadow allocation for Dom0

2021-08-31 Thread Jan Beulich
On 31.08.2021 15:47, Andrew Cooper wrote:
> On 30/08/2021 14:03, Jan Beulich wrote:
>> @@ -933,7 +934,17 @@ int __init dom0_construct_pv(struct doma
>>  #ifdef CONFIG_SHADOW_PAGING
>>  if ( opt_dom0_shadow )
>>  {
>> +bool preempted;
>> +
>>  printk("Switching dom0 to using shadow paging\n");
>> +
>> +do {
>> +preempted = false;
>> +shadow_set_allocation(d, dom0_paging_pages(d, nr_pages),
>> +  &preempted);
>> +process_pending_softirqs();
>> +} while ( preempted );
> 
> This isn't correct.  The shadow pool is needed even without
> opt_dom0_shadow, because some downstreams have elected not to retain
> upstream's security vulnerability in default setting of opt_pv_l1tf_hwdom.

Are you suggesting to set up a (perhaps large) shadow pool just in
case we need to enable shadow mode on Dom0? And all of this memory
to then remain unused in the majority of cases?

Plus even if so, I'd view this as a 2nd, independent step, largely
orthogonal to the handling of "dom0=shadow". If somebody really
wanted that, I think this should be driven by an explicit setting
of the shadow pool size, indicating the admin is willing to waste
the memory.

I'm further puzzled by "not to retain upstream's security
vulnerability" - are you saying upstream is vulnerable in some way,
while perhaps you (XenServer) are not? In general I don't think I
view downstream decisions as a driving factor for what upstream
does, when the result is deliberately different behavior from
upstream.

> Also, dom0_paging_pages() isn't a trivial calculation, so should be
> called once and cached.

Sure, can do that. You did notice though that all I did is take
PVH's similar code?

Jan




Re: [PATCH v3 2/2] tools/xenstore: set open file descriptor limit for xenstored

2021-08-31 Thread Ian Jackson
Andrew Cooper writes ("Re: [PATCH v3 2/2] tools/xenstore: set open file 
descriptor limit for xenstored"):
> xenstored is TCB.  It needs a large number of FDs, and can be trusted
> with unlimited.

I basically agree with this.

> Also, like xenconsoled, we can calculate an upper bound, which is
> derived from the ABI limit of 32k domids.

IMO the default should support at least this much.

However, I don't think you are right, because xenstored accepts
(possibly arbitrarily many) local socket connections.

> All you're haggling over is the error semantics in the case of:
> 1) the upper bound calculation is wrong, or
> 2) there is an fd leak
> 
> Personally, I think a fixed calculation is right, so fd leaks can be
> spotted more obviously.
> 
> An admin knob is not helpful - higher than the upper bound is just
> wasteful, while lower will cause malfunctions.

I don't agree.  Firstly, on a technical level, your statement is true
only if the admin does not know they will be running a much smaller
number of domains.  Secondly, we have had several people saying they
want this to be configurable.  I think if several people say they want
something to be configurable, we should respect that, even if we think
the use cases for it are marginal.  If there are hazards in bad
settings of such a knob, that can be dealt with in the docs.

Julien's point about not having the limit set by xenstored itself is
very well taken.

ISTM that the following scheme is in the intersection of everyone's
requirements:

 * The limit will be adjusted/imposed in the startup script.
 * An /etc/{default,sysconfig} parameter will be provided to
   adjust the setting.
 * The default should be `unlimited` since we cannot calculate
   a safe upper bound for all configurations.
 * Systems like Citrix Hypervisor (XenServer) which can calculate
   a safe upper bound can do so, and adjust the default, enabling
   them to spot fd leaks.

Ian.



Re: [PATCH v3 2/2] tools/xenstore: set open file descriptor limit for xenstored

2021-08-31 Thread Julien Grall

Hi Andrew,

On 31/08/2021 13:37, Andrew Cooper wrote:

On 31/08/2021 13:30, Julien Grall wrote:

Hi Juergen,

On 31/08/2021 13:11, Juergen Gross wrote:

On 30.07.21 19:14, Julien Grall wrote:

Hi Ian,

On 30/07/2021 14:35, Ian Jackson wrote:

Juergen Gross writes ("[PATCH v3 2/2] tools/xenstore: set open file
descriptor limit for xenstored"):

Add a configuration item for the maximum number of domains xenstored
should support and set the limit of open file descriptors
accordingly.

For HVM domains there are up to 5 socket connections per domain (2 by
the xl daemon process, and 3 by qemu). So set the ulimit for
xenstored
to 5 * XENSTORED_MAX_DOMAINS + 100 (the "+ 100" is for some headroom,
like logging, event channel device, etc.).

...

+## Type: integer
+## Default: 32768
+#
+# Select maximum number of domains supported by xenstored.
+# Only evaluated if XENSTORETYPE is "daemon".
+#XENSTORED_MAX_N_DOMAINS=32768
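
The arithmetic behind the proposed limit (5 socket connections per HVM
domain plus 100 of headroom) could be applied from a startup script along
these lines. This is only a sketch: the config variable name mirrors the
snippet above, but the exact hook point in the init script is an assumption:

```shell
# Hypothetical init-script fragment: derive the fd limit from the
# configured maximum number of domains (5 sockets per HVM domain,
# plus 100 spare fds for logging, event channels, etc.).
XENSTORED_MAX_N_DOMAINS=${XENSTORED_MAX_N_DOMAINS:-32768}
FD_LIMIT=$((5 * XENSTORED_MAX_N_DOMAINS + 100))
echo "$FD_LIMIT"
ulimit -n "$FD_LIMIT"
```

With the default of 32768 domains this yields a limit of 163940 open file
descriptors.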


I approve of doing something about the fd limit.  I have some qualms
about the documentation.

The documentation doesn't say what happens if this limit is exceeded.
Also the default of 32768 suggests that we actually support that many
domains.  I don't think we do...

I didn't find anything in SUPPORT.md about how many guests we support
but I wouldn't want this setting here to imply full support for 32768
domains.

If you don't want to tackle this can of works, maybe add this:

    # This just controls some resource limits for xenstored; if the
    # limit is exceeded, xenstored will stop being able to function
    # properly for additional guests.  The default value is so large
    # that it won't be exceeded in a supported configuration, but
    # should not be taken to mean that the whole Xen system is
    # guaranteed to work properly with that many guests.

Julien, did you ask for this to be made configurable?  Having written
the text above, I wonder if it wouldn't just be better to
unconditionally set it to "unlimited" rather than offering a footgun
dressed up as a tuneable...


So in v1 (see [1]), Juergen wanted to raise the limit. I assumed
this meant that the default limit (configured by the system) may not
be enough.

I felt this was wrong to impose a higher limit on everyone when an
admin may know the maximum number of domains.

By "unlimited", do you mean calling "ulimit" (or whatever is
used for configuring FDs) with unlimited?

If so, I would be OK with that. My main ask was to move raising
the limit outside of Xenstored because:
   1) This is easier for an admin to tweak (in particular the OOM)
   2) It feels wrong to me that the daemon chooses its own limits
   3) An admin can enforce it


Coming back to this series, I'm puzzled now.

Julien, you didn't want me to raise the limit to a specific number
covering the maximum possible number of domains, because you thought
this might result in xenstored hogging huge numbers of file descriptors
in case of a bug. Why is unlimited better then? This will make the
possible number even larger.


I don't think I suggested the unlimited number is better... My main
objection to your original approach is that you set an arbitrary limit
in Xenstored (which may not apply at all) and don't offer the
admin a way to tweak it.

If the limit is set outside of Xenstored, then it becomes much easier
for someone to just tweak the init script. I don't have a strong
opinion on whether the default limit should be "unlimited" or a fixed
number.


xenstored is TCB.  It needs a large number of FDs, and can be trusted
with unlimited.

Also, like xenconsoled, we can calculate an upper bound, which is
derived from the ABI limit of 32k domids.

All you're haggling over is the error semantics in the case of:
1) the upper bound calculation is wrong, or
2) there is an fd leak

Personally, I think a fixed calculation is right, so fd leaks can be
spotted more obviously.
An admin knob is not helpful - higher than the upper bound is just
wasteful, while lower will cause malfunctions.


A lot of users will never run more than a handful (maybe hundreds?) 
of domains... So setting the limit to allow 32K domains sounds 
really wasteful and wouldn't really help to spot FD leaks.


I don't really see the problem to allow an admin to say "I will never 
run more than N domains". So the limit can be lower and would still 
function normally. Do you mind explaining why you think otherwise?


Cheers,

--
Julien Grall



Re: [PATCH 3/4] x86/PVH: improve Dom0 memory size calculation

2021-08-31 Thread Andrew Cooper
On 30/08/2021 14:03, Jan Beulich wrote:
> Assuming that the accounting for IOMMU page tables will also take care
> of the P2M needs was wrong: dom0_paging_pages() can determine a far
> higher value, high enough for the system to run out of memory while
> setting up Dom0. Hence in the case of shared page tables the larger of
> the two values needs to be used (without shared page tables the sum of
> both continues to be applicable).

I'm afraid that I can't follow this at all.

What causes the system to run out of RAM while constructing dom0?

>
> While there also account for two further aspects in the PV case: With
> "iommu=dom0-passthrough" no IOMMU page tables would get allocated, so
> none need accounting for. And if shadow mode is to be enabled, setting
> aside a suitable amount for the P2M pool to get populated is also
> necessary (i.e. similar to the non-shared-page-tables case of PVH).
>
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -318,7 +318,7 @@ unsigned long __init dom0_compute_nr_pag
>  struct domain *d, struct elf_dom_parms *parms, unsigned long initrd_len)
>  {
>  nodeid_t node;
> -unsigned long avail = 0, nr_pages, min_pages, max_pages;
> +unsigned long avail = 0, nr_pages, min_pages, max_pages, iommu_pages = 0;
>  bool need_paging;
>  
>  /* The ordering of operands is to work around a clang5 issue. */
> @@ -337,18 +337,23 @@ unsigned long __init dom0_compute_nr_pag
>  avail -= d->max_vcpus - 1;
>  
>  /* Reserve memory for iommu_dom0_init() (rough estimate). */
> -if ( is_iommu_enabled(d) )
> +if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
>  {
>  unsigned int s;
>  
>  for ( s = 9; s < BITS_PER_LONG; s += 9 )
> -avail -= max_pdx >> s;
> +iommu_pages += max_pdx >> s;
> +
> +avail -= iommu_pages;
>  }
>  
> -need_paging = is_hvm_domain(d) &&
> -(!iommu_use_hap_pt(d) || !paging_mode_hap(d));
> +need_paging = is_hvm_domain(d)
> +  ? !iommu_use_hap_pt(d) || !paging_mode_hap(d)
> +  : opt_dom0_shadow;

As per patch 4, this condition needs adjusting.

~Andrew

>  for ( ; ; need_paging = false )
>  {
> +unsigned long paging_pages;
> +
>  nr_pages = get_memsize(&dom0_size, avail);
>  min_pages = get_memsize(&dom0_min_size, avail);
>  max_pages = get_memsize(&dom0_max_size, avail);
> @@ -377,11 +382,20 @@ unsigned long __init dom0_compute_nr_pag
>  nr_pages = min(nr_pages, max_pages);
>  nr_pages = min(nr_pages, avail);
>  
> -if ( !need_paging )
> -break;
> +paging_pages = paging_mode_enabled(d) || need_paging
> +   ? dom0_paging_pages(d, nr_pages) : 0;
>  
>  /* Reserve memory for shadow or HAP. */
> -avail -= dom0_paging_pages(d, nr_pages);
> +if ( !need_paging )
> +{
> +if ( paging_pages <= iommu_pages )
> +break;
> +
> +avail -= paging_pages - iommu_pages;
> +}
> +else
> +avail -= paging_pages;
> +iommu_pages = paging_pages;
>  }
>  
>  if ( is_pv_domain(d) &&
>





Re: [XEN PATCH] Config.mk: update OVMF to edk2-stable202108

2021-08-31 Thread Ian Jackson
Anthony PERARD writes ("[XEN PATCH] Config.mk: update OVMF to 
edk2-stable202108"):
> Update to the latest stable tag.

Thanks.  I am OK with this but I think we should hold off committing
it until the XSA fallout has been sorted.

Ian.



Re: [XEN PATCH] Config.mk: update OVMF to edk2-stable202108

2021-08-31 Thread Anthony PERARD
On Tue, Aug 31, 2021 at 03:01:54PM +0200, Jan Beulich wrote:
> On 31.08.2021 14:36, Anthony PERARD wrote:
> > Update to the latest stable tag.
> 
> Largely out of curiosity - if that's truly a tag, ...

Well, it's a git tag, but it's not signed by upstream :-(.

Upstream started to do a release for OVMF every 3 months somewhat
recently. They will stop taking new features and only take fixes for a
couple of weeks, then tag the tree.

> > --- a/Config.mk
> > +++ b/Config.mk
> > @@ -244,7 +244,7 @@ QEMU_TRADITIONAL_URL ?= 
> > git://xenbits.xen.org/qemu-xen-traditional.git
> >  SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
> >  MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
> >  endif
> > -OVMF_UPSTREAM_REVISION ?= b37cfdd2807181aed2fee1e17bd7ec1190db266a
> > +OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5
> 
> ... why not refer to it here and instead spell out a hash?

Since the tag isn't signed, it's probably better to keep using a hash.

-- 
Anthony PERARD



Re: [PATCH 4/4] x86/PV: properly set shadow allocation for Dom0

2021-08-31 Thread Andrew Cooper
On 30/08/2021 14:03, Jan Beulich wrote:
> @@ -933,7 +934,17 @@ int __init dom0_construct_pv(struct doma
>  #ifdef CONFIG_SHADOW_PAGING
>  if ( opt_dom0_shadow )
>  {
> +bool preempted;
> +
>  printk("Switching dom0 to using shadow paging\n");
> +
> +do {
> +preempted = false;
> +shadow_set_allocation(d, dom0_paging_pages(d, nr_pages),
> +  &preempted);
> +process_pending_softirqs();
> +} while ( preempted );

This isn't correct.  The shadow pool is needed even without
opt_dom0_shadow, because some downstreams have elected not to retain
upstream's security vulnerability in default setting of opt_pv_l1tf_hwdom.

Also, dom0_paging_pages() isn't a trivial calculation, so should be
called once and cached.

~Andrew




RE: [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line

2021-08-31 Thread Wei Chen
Hi Stefano,

> -Original Message-
> From: Stefano Stabellini 
> Sent: 2021年8月31日 9:53
> To: Wei Chen 
> Cc: xen-devel@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org;
> jbeul...@suse.com; Bertrand Marquis 
> Subject: Re: [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to
> support NUMA switch in command line
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > Xen x86 has created a command line parameter "numa" as a NUMA switch for
> > users to turn NUMA on/off. As device tree based NUMA has been enabled
> > for Arm, this parameter can be reused by Arm. So in this patch, we move
> > this parameter to common.
> >
> > Signed-off-by: Wei Chen 
> > ---
> >  xen/arch/x86/numa.c| 34 --
> >  xen/common/numa.c  | 35 ++-
> >  xen/include/xen/numa.h |  1 -
> >  3 files changed, 34 insertions(+), 36 deletions(-)
> >
> > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> > index 8b43be4aa7..380d8ed6fd 100644
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -11,7 +11,6 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -19,9 +18,6 @@
> >  #include 
> >  #include 
> >
> > -static int numa_setup(const char *s);
> > -custom_param("numa", numa_setup);
> > -
> >  #ifndef Dprintk
> >  #define Dprintk(x...)
> >  #endif
> > @@ -50,35 +46,6 @@ void numa_set_node(int cpu, nodeid_t node)
> >  cpu_to_node[cpu] = node;
> >  }
> >
> > -/* [numa=off] */
> > -static __init int numa_setup(const char *opt)
> > -{
> > -if ( !strncmp(opt,"off",3) )
> > -numa_off = true;
> > -else if ( !strncmp(opt,"on",2) )
> > -numa_off = false;
> > -#ifdef CONFIG_NUMA_EMU
> > -else if ( !strncmp(opt, "fake=", 5) )
> > -{
> > -numa_off = false;
> > -numa_fake = simple_strtoul(opt+5,NULL,0);
> > -if ( numa_fake >= MAX_NUMNODES )
> > -numa_fake = MAX_NUMNODES;
> > -}
> > -#endif
> > -#ifdef CONFIG_ACPI_NUMA
> > -else if ( !strncmp(opt,"noacpi",6) )
> > -{
> > -numa_off = false;
> > -acpi_numa = -1;
> > -}
> > -#endif
> > -else
> > -return -EINVAL;
> > -
> > -return 0;
> > -}
> > -
> >  /*
> >   * Setup early cpu_to_node.
> >   *
> > @@ -287,4 +254,3 @@ static __init int register_numa_trigger(void)
> >  return 0;
> >  }
> >  __initcall(register_numa_trigger);
> > -
> 
> spurious change

got it.


RE: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init

2021-08-31 Thread Wei Chen
Hi Stefano,

> -Original Message-
> From: Stefano Stabellini 
> Sent: 2021年8月31日 9:51
> To: Wei Chen 
> Cc: xen-devel@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org;
> jbeul...@suse.com; Bertrand Marquis 
> Subject: Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA
> in system init
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > Everything is ready, so we can remove the fake NUMA node and
> > depend on the device tree to create the NUMA system.
> >
> > Signed-off-by: Wei Chen 
> > ---
> >  xen/arch/arm/numa.c| 45 ++
> >  xen/include/asm-arm/numa.h |  7 --
> >  2 files changed, 26 insertions(+), 26 deletions(-)
> >
> > diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> > index 2a18c97470..3b04220e60 100644
> > --- a/xen/arch/arm/numa.c
> > +++ b/xen/arch/arm/numa.c
> > @@ -18,6 +18,7 @@
> >   *
> >   */
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -83,28 +84,34 @@ void __init numa_init(bool acpi_off)
> >  paddr_t ram_size = 0;
> >  paddr_t ram_end = 0;
> >
> > -printk(XENLOG_WARNING
> > -"NUMA has not been supported yet, NUMA off!\n");
> > -/* Arm NUMA has not been implemented until this patch */
> > -numa_off = true;
> > +/* NUMA has been turned off through Xen parameters */
> > +if ( numa_off )
> > +goto mem_init;
> >
> > -/*
> > - * Set all cpu_to_node mapping to 0, this will make cpu_to_node
> > - * function return 0 as previous fake cpu_to_node API.
> > - */
> > -for ( idx = 0; idx < NR_CPUS; idx++ )
> > -cpu_to_node[idx] = 0;
> > -
> > -/*
> > - * Make node_to_cpumask, node_spanned_pages and node_start_pfn
> > - * return as previous fake APIs.
> > - */
> > -for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
> > -node_to_cpumask[idx] = cpu_online_map;
> > -node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
> > -node_start_pfn(idx) = (mfn_x(first_valid_mfn));
> > +/* Initialize NUMA from device tree when system is not ACPI booted
> */
> > +if ( acpi_off )
> > +{
> > +#ifdef CONFIG_DEVICE_TREE_NUMA
> > +int ret = numa_device_tree_init(device_tree_flattened);
> > +if ( !ret )
> > +goto mem_init;
> > +printk(XENLOG_WARNING
> > +   "Init NUMA from device tree failed, ret=%d\n", ret);
> > +#else
> > +printk(XENLOG_WARNING
> > +   "CONFIG_DEVICE_TREE_NUMA is not set, NUMA off!\n");
> 
> I don't think we want to see this warning every time at boot when
> CONFIG_DEVICE_TREE_NUMA is off. I'd set it to XENLOG_DEBUG or remove it.
> 

OK

> Also given that we have many stub functions in
> xen/include/asm-arm/numa.h already, maybe we could also have a stub
> function for numa_device_tree_init so that we won't need an #ifdef
> CONFIG_DEVICE_TREE_NUMA here.
> 

Yes, it's a good idea. I will do it.

> 
> > +#endif
> > +numa_off = true;
> > +}
> > +else
> > +{
> > +/* We don't support NUMA for ACPI boot currently */
> > +printk(XENLOG_WARNING
> > +   "ACPI NUMA has not been supported yet, NUMA off!\n");
> > +numa_off = true;
> >  }
> >
> > +mem_init:
> >  /*
> >   * Find the minimal and maximum address of RAM, NUMA will
> >   * build a memory to node mapping table for the whole range.
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index a3982a94b6..425eb9aede 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -30,13 +30,6 @@ extern int numa_device_tree_init(const void *fdt);
> >  extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t
> distance);
> >  extern void arch_numa_init_failed_fallback(void);
> >
> > -/*
> > - * Temporary for fake NUMA node, when CPU, memory and distance
> > - * matrix will be read from DTB or ACPI SRAT. The following
> > - * symbols will be removed.
> > - */
> > -extern mfn_t first_valid_mfn;
> > -
> >  #else
> >
> >  /* Fake one node for now. See also node_online_map. */
> > --
> > 2.25.1
> >


RE: [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common

2021-08-31 Thread Wei Chen
Hi Stefano,

> -Original Message-
> From: Stefano Stabellini 
> Sent: 2021年8月31日 9:27
> To: Wei Chen 
> Cc: xen-devel@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org;
> jbeul...@suse.com; Bertrand Marquis 
> Subject: Re: [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to
> common
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > After the preparations in the previous patches, numa_scan_nodes can be
> > used by Arm and x86. So we move this function from x86 to common.
> > As nodes_cover_memory will not be used across files, we restore its
> > static attribute in this patch.
> >
> > Signed-off-by: Wei Chen 
> > ---
> >  xen/arch/x86/srat.c| 52 
> >  xen/common/numa.c  | 54 +-
> >  xen/include/asm-x86/acpi.h |  3 ---
> >  xen/include/xen/numa.h |  3 ++-
> >  4 files changed, 55 insertions(+), 57 deletions(-)
> >
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index c979939fdd..c9f019c307 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -361,58 +361,6 @@ void __init srat_parse_regions(u64 addr)
> > pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
> >  }
> >
> > -/* Use the information discovered above to actually set up the nodes.
> */
> > -int __init numa_scan_nodes(u64 start, u64 end)
> > -{
> > -   int i;
> > -   nodemask_t all_nodes_parsed;
> > -
> > -   /* First clean up the node list */
> > -   for (i = 0; i < MAX_NUMNODES; i++)
> > -   cutoff_node(i, start, end);
> > -
> > -#ifdef CONFIG_ACPI_NUMA
> > -   if (acpi_numa <= 0)
> > -   return -1;
> > -#endif
> > -
> > -   if (!nodes_cover_memory()) {
> > -   bad_srat();
> > -   return -1;
> > -   }
> > -
> > -   memnode_shift = compute_hash_shift(node_memblk_range,
> num_node_memblks,
> > -   memblk_nodeid);
> > -
> > -   if (memnode_shift < 0) {
> > -   printk(KERN_ERR
> > -"SRAT: No NUMA node hash function found. Contact
> maintainer\n");
> > -   bad_srat();
> > -   return -1;
> > -   }
> > -
> > -   nodes_or(all_nodes_parsed, memory_nodes_parsed,
> processor_nodes_parsed);
> > -
> > -   /* Finally register nodes */
> > -   for_each_node_mask(i, all_nodes_parsed)
> > -   {
> > -   u64 size = nodes[i].end - nodes[i].start;
> > -   if ( size == 0 )
> > -   printk(KERN_WARNING "SRAT: Node %u has no memory. "
> > -  "BIOS Bug or mis-configured hardware?\n", i);
> > -
> > -   setup_node_bootmem(i, nodes[i].start, nodes[i].end);
> > -   }
> > -   for (i = 0; i < nr_cpu_ids; i++) {
> > -   if (cpu_to_node[i] == NUMA_NO_NODE)
> > -   continue;
> > -   if (!nodemask_test(cpu_to_node[i], &all_nodes_parsed))
> > -   numa_set_node(i, NUMA_NO_NODE);
> > -   }
> > -   numa_init_array();
> > -   return 0;
> > -}
> > -
> >  static unsigned node_to_pxm(nodeid_t n)
> >  {
> > unsigned i;
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 4152bbe83b..8ca13e27d1 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -195,7 +195,7 @@ void __init cutoff_node(int i, u64 start, u64 end)
> >
> >  /* Sanity check to catch more bad SRATs (they are amazingly common).
> > Make sure the PXMs cover all memory. */
> > -int __init nodes_cover_memory(void)
> > +static int __init nodes_cover_memory(void)
> >  {
> > int i;
> > uint32_t nr_banks = arch_meminfo_get_nr_bank();
> > @@ -271,6 +271,58 @@ void __init numa_init_array(void)
> >  }
> >  }
> >
> > +/* Use the information discovered above to actually set up the nodes.
> */
> > +int __init numa_scan_nodes(u64 start, u64 end)
> > +{
> > +   int i;
> > +   nodemask_t all_nodes_parsed;
> > +
> > +   /* First clean up the node list */
> > +   for (i = 0; i < MAX_NUMNODES; i++)
> > +   cutoff_node(i, start, end);
> > +
> > +#ifdef CONFIG_ACPI_NUMA
> > +   if (acpi_numa <= 0)
> > +   return -1;
> > +#endif
> > +
> > +   if (!nodes_cover_memory()) {
> > +   bad_srat();
> > +   return -1;
> > +   }
> > +
> > +   memnode_shift = compute_hash_shift(node_memblk_range,
> num_node_memblks,
> > +   memblk_nodeid);
> > +
> > +   if (memnode_shift < 0) {
> > +   printk(KERN_ERR
> > +"SRAT: No NUMA node hash function found. Contact
> maintainer\n");
> > +   bad_srat();
> > +   return -1;
> > +   }
> > +
> > +   nodes_or(all_nodes_parsed, memory_nodes_parsed,
> processor_nodes_parsed);
> > +
> > +   /* Finally register nodes */
> > +   for_each_node_mask(i, all_nodes_parsed)
> > +   {
> > +   u64 size = nodes[i].end - nodes[i].start;
> > +   if ( size == 0 )
> > +   printk(KERN_WARNING "SRAT: Node %u has no memory. "
> > +  "BIOS Bug or mis-configured hardware?\n", i);
> 
> Not all archs have a BIOS so I'd say 

RE: [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory to common

2021-08-31 Thread Wei Chen
Hi Stefano,

> -Original Message-
> From: Stefano Stabellini 
> Sent: 2021年8月31日 9:17
> To: Wei Chen 
> Cc: xen-devel@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org;
> jbeul...@suse.com; Bertrand Marquis 
> Subject: Re: [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory to
> common
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > Not only ACPI NUMA, but also Arm device tree based NUMA
> > will use nodes_cover_memory to do a sanity check. So we move
> > this function from arch/x86 to common.
> >
> > Signed-off-by: Wei Chen 
> > ---
> >  xen/arch/x86/srat.c| 40 
> >  xen/common/numa.c  | 40 
> >  xen/include/xen/numa.h |  1 +
> >  3 files changed, 41 insertions(+), 40 deletions(-)
> >
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index dd3aa30843..dcebc7adec 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -308,46 +308,6 @@ acpi_numa_memory_affinity_init(const struct
> acpi_srat_mem_affinity *ma)
> > num_node_memblks++;
> >  }
> >
> > -/* Sanity check to catch more bad SRATs (they are amazingly common).
> > -   Make sure the PXMs cover all memory. */
> > -static int __init nodes_cover_memory(void)
> > -{
> > -   int i;
> > -   uint32_t nr_banks = arch_meminfo_get_nr_bank();
> > -
> > -   for (i = 0; i < nr_banks; i++) {
> > -   int j, found;
> > -   unsigned long long start, end;
> > -
> > -   if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> > -   continue;
> > -   }
> > -
> > -   do {
> > -   found = 0;
> > -   for_each_node_mask(j, memory_nodes_parsed)
> > -   if (start < nodes[j].end
> > -   && end > nodes[j].start) {
> > -   if (start >= nodes[j].start) {
> > -   start = nodes[j].end;
> > -   found = 1;
> > -   }
> > -   if (end <= nodes[j].end) {
> > -   end = nodes[j].start;
> > -   found = 1;
> > -   }
> > -   }
> > -   } while (found && start < end);
> > -
> > -   if (start < end) {
> > -   printk(KERN_ERR "SRAT: No PXM for e820 range: "
> > -   "%016Lx - %016Lx\n", start, end);
> > -   return 0;
> > -   }
> > -   }
> > -   return 1;
> > -}
> > -
> >  void __init acpi_numa_arch_fixup(void) {}
> >
> >  static uint64_t __initdata srat_region_mask;
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 79ab250543..74960885a6 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -193,6 +193,46 @@ void __init cutoff_node(int i, u64 start, u64 end)
> > }
> >  }
> >
> > +/* Sanity check to catch more bad SRATs (they are amazingly common).
> > +   Make sure the PXMs cover all memory. */
> > +int __init nodes_cover_memory(void)
> > +{
> > +   int i;
> > +   uint32_t nr_banks = arch_meminfo_get_nr_bank();
> > +
> > +   for (i = 0; i < nr_banks; i++) {
> > +   int j, found;
> > +   unsigned long long start, end;
> > +
> > +   if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> > +   continue;
> > +   }
> > +
> > +   do {
> > +   found = 0;
> > +   for_each_node_mask(j, memory_nodes_parsed)
> > +   if (start < nodes[j].end
> > +   && end > nodes[j].start) {
> > +   if (start >= nodes[j].start) {
> > +   start = nodes[j].end;
> > +   found = 1;
> > +   }
> > +   if (end <= nodes[j].end) {
> > +   end = nodes[j].start;
> > +   found = 1;
> > +   }
> > +   }
> > +   } while (found && start < end);
> > +
> > +   if (start < end) {
> > +   printk(KERN_ERR "SRAT: No PXM for e820 range: "
> > +   "%016Lx - %016Lx\n", start, end);
> 
> I don't know if you are already doing this in a later patch but the
> message shouldn't say e820 as it doesn't exist on all architecture.
> Maybe "for address range" or "for memory range" would suffice.
> 
> Normally we don't do change together with code movement but in this case
> I think it would be OK.

OK, I will do it in next version.


Re: [PATCH 1/4] x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

2021-08-31 Thread Jan Beulich
On 31.08.2021 15:27, Andrew Cooper wrote:
> On 31/08/2021 14:19, Jan Beulich wrote:
> @@ -1095,6 +1101,17 @@ static int __init pvh_setup_acpi(struct
>  nr_pages = PFN_UP((d->arch.e820[i].addr & ~PAGE_MASK) +
>d->arch.e820[i].size);
>  
> +/* Memory below 1MB has been dealt with by pvh_populate_p2m(). */
> +if ( pfn < PFN_DOWN(MB(1)) )
> +{
> +if ( pfn + nr_pages <= PFN_DOWN(MB(1)) )
> +continue;
> +
> +/* This shouldn't happen, but is easy to deal with. */
 I'm not sure this comment is helpful.

 Under PVH, there is nothing special about the 1M boundary, and we can
 reasonably have something else here or crossing the boundary.
>>> As long as we have this special treatment of the low Mb, the boundary
>>> is meaningful imo. I'd see the comment go away when the handling in
>>> general gets streamlined.
>> I should have added: And as long as Dom0's E820 map gets cloned from
>> the host's, which will necessarily consider the 1Mb boundary special.
> 
> Not when you're booting virtualised in the first place.

You mean when Xen itself runs in PVH (not HVM) mode, and then in turn
has a PVH Dom0? And if the underlying Xen has not in turn cloned
the E820 from the host's? That's surely too exotic a case to warrant
removing this comment. If you insist, I can mention that case as a
possible exception.

Jan




Re: [PATCH 1/4] x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

2021-08-31 Thread Andrew Cooper
On 31/08/2021 14:19, Jan Beulich wrote:
 @@ -1095,6 +1101,17 @@ static int __init pvh_setup_acpi(struct
  nr_pages = PFN_UP((d->arch.e820[i].addr & ~PAGE_MASK) +
d->arch.e820[i].size);
  
 +/* Memory below 1MB has been dealt with by pvh_populate_p2m(). */
 +if ( pfn < PFN_DOWN(MB(1)) )
 +{
 +if ( pfn + nr_pages <= PFN_DOWN(MB(1)) )
 +continue;
 +
 +/* This shouldn't happen, but is easy to deal with. */
>>> I'm not sure this comment is helpful.
>>>
>>> Under PVH, there is nothing special about the 1M boundary, and we can
>>> reasonably have something else here or crossing the boundary.
>> As long as we have this special treatment of the low Mb, the boundary
>> is meaningful imo. I'd see the comment go away when the handling in
>> general gets streamlined.
> I should have added: And as long as Dom0's E820 map gets cloned from
> the host's, which will necessarily consider the 1Mb boundary special.

Not when you're booting virtualised in the first place.

~Andrew



Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

2021-08-31 Thread Jan Beulich
On 31.08.2021 15:16, Andrew Cooper wrote:
> On 30/08/2021 14:02, Jan Beulich wrote:
>> Further permit "access" to differ in the "executable" attribute. While
>> ideally only ROM regions would get mapped with X set, getting there is
>> quite a bit of work. Therefore, as a temporary measure, permit X to
>> vary. For Dom0 the more permissive of the types will be used, while for
>> DomU it'll be the more restrictive one.
> 
> Split behaviour between dom0 and domU based on types alone cannot
> possibly be correct.

True, but what do you do.

> DomU's need to execute ROMs too, and this looks like will malfunction if
> a ROM ends up in the region that HVMLoader relocated RAM from.
> 
> As this is a temporary bodge emergency bugfix, don't try to be clever -
> just take the latest access.

And how do we know that that's what is going to work? We should
strictly accumulate for Dom0. And what we do for DomU is moot for
the moment, until PCI passthrough becomes a thing for PVH. Hence
I've opted to be restrictive there - I'd rather see things break
(and getting adjusted) when this future work actually gets carried
out, than leave things permissive for no-one to notice that it's
too permissive, leading to an XSA.

>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d
>>  if ( p2m_is_special(ot) )
>>  {
>>  /* Don't permit unmapping grant/foreign/direct-MMIO this way. */
>> -domain_crash(d);
>>  p2m_unlock(p2m);
>> -
>> +printk(XENLOG_G_ERR
>> +   "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not 
>> permitted\n",
> 
> type and access need to be rendered in hex, or you need to use 0x
> prefixes to distinguish the two bases.

Will use %#lx then.

> Also, use commas rather than colons.  Visually, this is ambiguous with
> PCI BDFs, and commas match tuple notation in most programming languages
> which is the construct you're trying to represent.
> 
> Same below.

Sure, will do.

>> @@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do
>>  }
>>  if ( p2m_is_special(ot) )
>>  {
>> -gfn_unlock(p2m, gfn, order);
>> -domain_crash(d);
>> -return -EPERM;
>> +bool done = false, bad = true;
>> +
>> +/* Special-case (almost) identical mappings. */
>> +if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot )
>> +{
>> +/*
>> + * For MMIO allow X to differ in the requests (to cover for
>> + * set_identity_p2m_entry() and set_mmio_p2m_entry() differing 
>> in
>> + * the way they specify "access"). For the hardware domain put 
>> (or
>> + * leave) in place the more permissive of the two possibilities,
>> + * while for DomU-s go with the more restrictive variant.
> 
> This comment needs to identify clearly that it is a temporary bodge
> intended to be removed.

Okay.

Jan




Re: [PATCH 1/4] x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

2021-08-31 Thread Jan Beulich
On 31.08.2021 15:14, Jan Beulich wrote:
> On 31.08.2021 15:02, Andrew Cooper wrote:
>> On 30/08/2021 14:02, Jan Beulich wrote:
>>> One of the changes comprising the fixes for XSA-378 disallows replacing
>>> MMIO mappings by unintended (for this purpose) code paths.
>>
>> I'd drop the brackets.  All it does is confuse the sentence.
>>
>>>  This means we
>>> need to be more careful about the mappings put in place in this range -
>>> mappings should be created exactly once:
>>> - iommu_hwdom_init() comes first; it should avoid the first Mb,
>>> - pvh_populate_p2m() should insert identity mappings only into ranges
>>>   not populated as RAM,
>>> - pvh_setup_acpi() should again avoid the first Mb, which was already
>>>   dealt with at that point.
>>
>> This really is a mess.  It also seems very fragile.
> 
> So it seems to me.
> 
>> Why is iommu_hwdom_init() separate in the first place?  It only does
>> mappings up to 4G in the first place, and with this change, it is now
>> [1M-4G)
> 
> I guess we'll want to wait for Roger to return to shed some light on
> this.
> 
>>> @@ -1095,6 +1101,17 @@ static int __init pvh_setup_acpi(struct
>>>  nr_pages = PFN_UP((d->arch.e820[i].addr & ~PAGE_MASK) +
>>>d->arch.e820[i].size);
>>>  
>>> +/* Memory below 1MB has been dealt with by pvh_populate_p2m(). */
>>> +if ( pfn < PFN_DOWN(MB(1)) )
>>> +{
>>> +if ( pfn + nr_pages <= PFN_DOWN(MB(1)) )
>>> +continue;
>>> +
>>> +/* This shouldn't happen, but is easy to deal with. */
>>
>> I'm not sure this comment is helpful.
>>
>> Under PVH, there is nothing special about the 1M boundary, and we can
>> reasonably have something else here or crossing the boundary.
> 
> As long as we have this special treatment of the low Mb, the boundary
> is meaningful imo. I'd see the comment go away when the handling in
> general gets streamlined.

I should have added: And as long as Dom0's E820 map gets cloned from
the host's, which will necessarily consider the 1Mb boundary special.

Jan




Re: [PATCH v3 7/7] xen/arm: Sanitize CTR_EL0 and emulate it if needed

2021-08-31 Thread Bertrand Marquis
Hi Julien,

> On 27 Aug 2021, at 16:05, Julien Grall  wrote:
> 
> Hi Bertrand,
> 
> On 25/08/2021 14:18, Bertrand Marquis wrote:
>> Sanitize CTR_EL0 value between cores.
>> In most cases different values will taint Xen but if different
>> i-cache policies are found, we choose the one which will be compatible
>> between all cores in terms of invalidation/data cache flushing strategy.
> 
> I understand that all the CPUs in Xen needs to agree on the cache flush 
> strategy. However...
> 
>> In this case we need to activate the TID2 bit in HCR to emulate the
>> TCR_EL0 register for guests. This patch is not activating TID2 bit all
>> the time to limit the overhead when possible.
> 
> as we discussed in an earlier version, a vCPU is unlikely (at least in the 
> short/medium term) to be able to move across pCPUs of different types. So the vCPU 
> would be pinned to a set of pCPUs. IOW, the guest would have to be big.LITTLE 
> aware and therefore would be able to do its own strategy decision.
> 
> So I think we should be able to get away from trappings the registers.

I do agree that we should be able to get away from that in the long term once
we have cpupools properly set but right now this is the only way to have
something useable (I will not say right).
I will work on finding a way to setup properly cpupools (or something else as
we discussed earlier) but in the short term I think this is the best we can do.

Another solution would be to discard this patch from the series for now until
I have worked out a proper solution for this case.

Should we discard or merge, or do you have another idea?


> 
>> When TID2 is activate we also need to emulate the CCSIDR, CSSELR and
>> CLIDR registers which is done here for both 32 and 64bit versions of the
>> registers.
>> Add CTR register field definitions using Linux value and define names
>> and use the opportunity to rename CTR_L1Ip to use an upper case name
>> instead. The patch is also defining ICACHE_POLICY_xxx instead of only
>> having CTR_L1IP_xxx. Code using those defines is also updated by this
>> patch (arm32 setup).
>> On most big/LITTLE platforms this patch will activate TID2 and emulate
>> VIPT type of i-cache for all cores (as most LITTLE cores are VIPT where
>> big ones are PIPT). This is the case for example on Juno boards.
>> On platforms with only the same type of cores, this patch should not
>> modify the behaviour.
>> Signed-off-by: Bertrand Marquis 
>> ---
>> Changes in v3: none
>> Change in v2: Patch introduced in v2
>> ---
>>  xen/arch/arm/arm64/cpufeature.c  | 19 +++---
>>  xen/arch/arm/arm64/vsysreg.c | 40 
>>  xen/arch/arm/cpufeature.c|  2 ++
>>  xen/arch/arm/domain.c|  8 ++
>>  xen/arch/arm/setup.c |  2 +-
>>  xen/arch/arm/vcpreg.c| 45 
>>  xen/include/asm-arm/arm64/hsr.h  |  6 +
>>  xen/include/asm-arm/cpufeature.h | 11 
>>  xen/include/asm-arm/processor.h  | 18 ++---
>>  9 files changed, 143 insertions(+), 8 deletions(-)
>> diff --git a/xen/arch/arm/arm64/cpufeature.c 
>> b/xen/arch/arm/arm64/cpufeature.c
>> index b1936ef1d6..334d590ba0 100644
>> --- a/xen/arch/arm/arm64/cpufeature.c
>> +++ b/xen/arch/arm/arm64/cpufeature.c
>> @@ -275,9 +275,6 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
>>  ARM64_FTR_END,
>>  };
>>  -#if 0
>> -/* TODO: use this to sanitize the cache line size among cores */
>> -
>>  static const struct arm64_ftr_bits ftr_ctr[] = {
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1), /* RES1 */
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DIC_SHIFT, 
>> 1, 1),
>> @@ -294,7 +291,6 @@ static const struct arm64_ftr_bits ftr_ctr[] = {
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 
>> CTR_IMINLINE_SHIFT, 4, 0),
>>  ARM64_FTR_END,
>>  };
>> -#endif
>>static const struct arm64_ftr_bits ftr_id_mmfr0[] = {
>>  S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 
>> ID_MMFR0_INNERSHR_SHIFT, 4, 0xf),
>> @@ -510,6 +506,12 @@ static s64 arm64_ftr_safe_value(const struct 
>> arm64_ftr_bits *ftrp, s64 new,
>>   * End of imported linux structures and code
>>   */
>>  +/*
>> + * This is set to true if we have different type of i-caches on cores
>> + * and used to activate TID2 bit to emulate CTR_EL0 register for guests
>> + */
>> +bool mismatch_cache_type = false;
> 
> If we are still planning to trap and emulate the registers, then this needs 
> to be an HW capability (see cpus_set_cap()).

Ok

> 
>> +
>>  static void sanitize_reg(u64 *cur_reg, u64 new_reg, const char *reg_name,
>>  const struct arm64_ftr_bits 
>> *ftrp)
>>  {
>> @@ -600,6 +602,15 @@ void update_system_features(const struct cpuinfo_arm 
>> *new)
>>   */
>>  SANITIZE_REG(dczid, 0, dczid);
>>  +   SANITIZE_REG(ctr, 0, ctr);
>> +
>> +/*
>> + * If CTR is different among cores, set mismatch_cache_type to 

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

2021-08-31 Thread Andrew Cooper
On 30/08/2021 14:02, Jan Beulich wrote:
> One of the changes comprising the fixes for XSA-378 disallows replacing
> MMIO mappings by unintended (for this purpose) code paths.

Drop the brackets.

> At least in
> the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
> this is too strict. Generally short-circuit requests establishing the
> same kind of mapping that's already in place.
>
> Further permit "access" to differ in the "executable" attribute. While
> ideally only ROM regions would get mapped with X set, getting there is
> quite a bit of work. Therefore, as a temporary measure, permit X to
> vary. For Dom0 the more permissive of the types will be used, while for
> DomU it'll be the more restrictive one.

Splitting behaviour between dom0 and domU based on type alone cannot
possibly be correct.

DomUs need to execute ROMs too, and it looks like this will malfunction if
a ROM ends up in the region that HVMLoader relocated RAM from.

As this is a temporary bodge emergency bugfix, don't try to be clever -
just take the latest access.

> While there, also add a log message to the other domain_crash()
> invocation that did prevent PVH Dom0 from coming up after the XSA-378
> changes.
>
> Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping 
> entries")
> Signed-off-by: Jan Beulich 
>
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d
>  if ( p2m_is_special(ot) )
>  {
>  /* Don't permit unmapping grant/foreign/direct-MMIO this way. */
> -domain_crash(d);
>  p2m_unlock(p2m);
> -
> +printk(XENLOG_G_ERR
> +   "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not permitted\n",

type and access need to be rendered in hex, or you need to use 0x
prefixes to distinguish the two bases.

Also, use commas rather than colons.  Visually, this is ambiguous with
PCI BDFs, and commas match the tuple notation of most programming languages,
which is the construct you're trying to represent.

Same below.

> +   d, gfn_x(gfn) + i,
> +   mfn_x(omfn), ot, a,
> +   mfn_x(mfn) + i, t, p2m->default_access);
> +domain_crash(d);
>  return -EPERM;
>  }
>  else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) )
> @@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do
>  }
>  if ( p2m_is_special(ot) )
>  {
> -gfn_unlock(p2m, gfn, order);
> -domain_crash(d);
> -return -EPERM;
> +bool done = false, bad = true;
> +
> +/* Special-case (almost) identical mappings. */
> +if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot )
> +{
> +/*
> + * For MMIO allow X to differ in the requests (to cover for
> + * set_identity_p2m_entry() and set_mmio_p2m_entry() differing in
> + * the way they specify "access"). For the hardware domain put 
> (or
> + * leave) in place the more permissive of the two possibilities,
> + * while for DomU-s go with the more restrictive variant.

This comment needs to identify clearly that it is a temporary bodge
intended to be removed.

~Andrew

> + */
> +if ( gfn_p2mt == p2m_mmio_direct &&
> + access <= p2m_access_rwx &&
> + (access ^ a) == p2m_access_x )
> +{
> +if ( is_hardware_domain(d) )
> +access |= p2m_access_x;
> +else
> +access &= ~p2m_access_x;
> +bad = access == p2m_access_n;
> +}
> +
> +if ( access == a )
> +done = true;
> +}
> +
> +if ( done )
> +{
> +gfn_unlock(p2m, gfn, order);
> +return 0;
> +}
> +
> +if ( bad )
> +{
> +gfn_unlock(p2m, gfn, order);
> +printk(XENLOG_G_ERR
> +   "%pd: GFN %lx (%lx:%u:%u:%u) -> (%lx:%u:%u:%u) not 
> permitted\n",
> +   d, gfn_l,
> +   mfn_x(omfn), cur_order, ot, a,
> +   mfn_x(mfn), order, gfn_p2mt, access);
> +domain_crash(d);
> +return -EPERM;
> +}
>  }
>  else if ( p2m_is_ram(ot) )
>  {
>





Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

2021-08-31 Thread Jan Beulich
On 30.08.2021 15:02, Jan Beulich wrote:
> One of the changes comprising the fixes for XSA-378 disallows replacing
> MMIO mappings by unintended (for this purpose) code paths. At least in
> the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
> this is too strict. Generally short-circuit requests establishing the
> same kind of mapping that's already in place.
> 
> Further permit "access" to differ in the "executable" attribute. While
> ideally only ROM regions would get mapped with X set, getting there is
> quite a bit of work. Therefore, as a temporary measure, permit X to
> vary. For Dom0 the more permissive of the types will be used, while for
> DomU it'll be the more restrictive one.
> 
> While there, also add a log message to the other domain_crash()
> invocation that did prevent PVH Dom0 from coming up after the XSA-378
> changes.
> 
> Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping 
> entries")
> Signed-off-by: Jan Beulich 

Btw, I had meant to have this post-commit-message remark here:

TBD: This could be generalized to all of R, W, and X. Dealing with just X
 is merely the minimum I found is immediately necessary.

Jan




Re: [PATCH 1/4] x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

2021-08-31 Thread Jan Beulich
On 31.08.2021 15:02, Andrew Cooper wrote:
> On 30/08/2021 14:02, Jan Beulich wrote:
>> One of the changes comprising the fixes for XSA-378 disallows replacing
>> MMIO mappings by unintended (for this purpose) code paths.
> 
> I'd drop the brackets.  All it does is confuse the sentence.
> 
>>  This means we
>> need to be more careful about the mappings put in place in this range -
>> mappings should be created exactly once:
>> - iommu_hwdom_init() comes first; it should avoid the first Mb,
>> - pvh_populate_p2m() should insert identity mappings only into ranges
>>   not populated as RAM,
>> - pvh_setup_acpi() should again avoid the first Mb, which was already
>>   dealt with at that point.
> 
> This really is a mess.  It also seems very fragile.

So it seems to me.

> Why is iommu_hwdom_init() separate in the first place?  It only does
> mappings up to 4G, and with this change, it is now [1M-4G).

I guess we'll want to wait for Roger to return to shed some light on
this.

>> @@ -1095,6 +1101,17 @@ static int __init pvh_setup_acpi(struct
>>  nr_pages = PFN_UP((d->arch.e820[i].addr & ~PAGE_MASK) +
>>d->arch.e820[i].size);
>>  
>> +/* Memory below 1MB has been dealt with by pvh_populate_p2m(). */
>> +if ( pfn < PFN_DOWN(MB(1)) )
>> +{
>> +if ( pfn + nr_pages <= PFN_DOWN(MB(1)) )
>> +continue;
>> +
>> +/* This shouldn't happen, but is easy to deal with. */
> 
> I'm not sure this comment is helpful.
> 
> Under PVH, there is nothing special about the 1M boundary, and we can
> reasonably have something else here or crossing the boundary.

As long as we have this special treatment of the low Mb, the boundary
is meaningful imo. I'd see the comment go away when the handling in
general gets streamlined.

> Preferably with this removed, Acked-by: Andrew Cooper
> , but only because this is an emergency fix.

Thanks. I see. You'll like patch 2 even less; at least I do. And I'm
not really certain that change is enough to cover all possible
systems.

> I really don't think it is an improvement to the logic.

Yet I suppose you also have no immediate suggestions towards doing
better? Of course right here a full rework is out of scope. But if
there were smaller bits that - if adjusted - would make you feel
better about the change as a whole, I'd be happy to consider making
adjustments.

Jan




Re: [PATCH 1/4] x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

2021-08-31 Thread Andrew Cooper
On 30/08/2021 14:02, Jan Beulich wrote:
> One of the changes comprising the fixes for XSA-378 disallows replacing
> MMIO mappings by unintended (for this purpose) code paths.

I'd drop the brackets.  All it does is confuse the sentence.

>  This means we
> need to be more careful about the mappings put in place in this range -
> mappings should be created exactly once:
> - iommu_hwdom_init() comes first; it should avoid the first Mb,
> - pvh_populate_p2m() should insert identity mappings only into ranges
>   not populated as RAM,
> - pvh_setup_acpi() should again avoid the first Mb, which was already
>   dealt with at that point.

This really is a mess.  It also seems very fragile.

Why is iommu_hwdom_init() separate in the first place?  It only does
mappings up to 4G, and with this change, it is now [1M-4G).

All IOMMU modifications should be as a side effect of regular p2m/guest
physmap operations.  I suppose this is another breakage from the PV side
of things.

> @@ -1095,6 +1101,17 @@ static int __init pvh_setup_acpi(struct
>  nr_pages = PFN_UP((d->arch.e820[i].addr & ~PAGE_MASK) +
>d->arch.e820[i].size);
>  
> +/* Memory below 1MB has been dealt with by pvh_populate_p2m(). */
> +if ( pfn < PFN_DOWN(MB(1)) )
> +{
> +if ( pfn + nr_pages <= PFN_DOWN(MB(1)) )
> +continue;
> +
> +/* This shouldn't happen, but is easy to deal with. */

I'm not sure this comment is helpful.

Under PVH, there is nothing special about the 1M boundary, and we can
reasonably have something else here or crossing the boundary.


Preferably with this removed, Acked-by: Andrew Cooper
, but only because this is an emergency fix. 
I really don't think it is an improvement to the logic.

~Andrew




Re: [XEN PATCH] Config.mk: update OVMF to edk2-stable202108

2021-08-31 Thread Jan Beulich
On 31.08.2021 14:36, Anthony PERARD wrote:
> Update to the latest stable tag.

Largely out of curiosity - if that's truly a tag, ...

> --- a/Config.mk
> +++ b/Config.mk
> @@ -244,7 +244,7 @@ QEMU_TRADITIONAL_URL ?= 
> git://xenbits.xen.org/qemu-xen-traditional.git
>  SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
>  MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
>  endif
> -OVMF_UPSTREAM_REVISION ?= b37cfdd2807181aed2fee1e17bd7ec1190db266a
> +OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5

... why not refer to it here and instead spell out a hash?

Jan




[xen-4.15-testing test] 164636: regressions - FAIL

2021-08-31 Thread osstest service owner
flight 164636 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164636/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-amd  8 xen-boot  fail REGR. vs. 163759
 test-amd64-amd64-dom0pvh-xl-intel  8 xen-bootfail REGR. vs. 163759

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-libvirt 20 guest-start/debian.repeat fail in 164564 pass in 
164636
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10 fail pass in 164564
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat  fail pass in 164564

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 163759
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 163759
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 163759
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 163759
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 163759
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 163759
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 163759
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 163759
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 163759
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 163759
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 163759
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  91bb9e9b0c0e2af926ab08958f3d65f07a105cb6
baseline version:
 xen  

Re: [PATCH v1 01/14] xen/pci: Refactor MSI code that implements MSI functionality within XEN

2021-08-31 Thread Jan Beulich
On 31.08.2021 14:31, Rahul Singh wrote:
>> On 24 Aug 2021, at 4:53 pm, Jan Beulich  wrote:
>> On 19.08.2021 14:02, Rahul Singh wrote:
>>> --- /dev/null
>>> +++ b/xen/drivers/passthrough/msi.c
>>> @@ -0,0 +1,96 @@
>>> +/*
>>> + * Copyright (C) 2008,  Netronome Systems, Inc.
>>
>> While generally copying copyright statements when splitting source
>> files is probably wanted (or even necessary) I doubt this is
>> suitable here: None of the MSI code that you move was contributed
>> by them afaict.
> 
Let me remove the "Copyright (C) 2008, Netronome Systems, Inc." line.
Can you please help me with what copyright I should add for the next patch?

None? If you look around, you will find that far from all source files
have such a line (or multiple of them).

>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License 
>>> for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along 
>>> with
>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>
>> You surely mean xen/msi.h here: Headers like this one should always
>> be included by the producer, no matter that it builds fine without.
>> Else you risk declarations and definitions to go out of sync.
> Ok. Let me include "xen/msi.h" here and move the other required includes to
> "xen/msi.h".

Why move stuff? xen/msi.h is fine to include asm/msi.h. It's just that
including asm/msi.h here is not enough.

Jan




Re: [PATCH v3 2/2] tools/xenstore: set open file descriptor limit for xenstored

2021-08-31 Thread Andrew Cooper
On 31/08/2021 13:30, Julien Grall wrote:
> Hi Juergen,
>
> On 31/08/2021 13:11, Juergen Gross wrote:
>> On 30.07.21 19:14, Julien Grall wrote:
>>> Hi Ian,
>>>
>>> On 30/07/2021 14:35, Ian Jackson wrote:
 Juergen Gross writes ("[PATCH v3 2/2] tools/xenstore: set open file
 descriptor limit for xenstored"):
> Add a configuration item for the maximum number of domains xenstored
> should support and set the limit of open file descriptors
> accordingly.
>
> For HVM domains there are up to 5 socket connections per domain (2 by
> the xl daemon process, and 3 by qemu). So set the ulimit for
> xenstored
> to 5 * XENSTORED_MAX_DOMAINS + 100 (the "+ 100" is for some headroom,
> like logging, event channel device, etc.).
 ...
> +## Type: integer
> +## Default: 32768
> +#
> +# Select maximum number of domains supported by xenstored.
> +# Only evaluated if XENSTORETYPE is "daemon".
> +#XENSTORED_MAX_N_DOMAINS=32768

 I approve of doing something about the fd limit.  I have some qualms
 about the documentation.

 The documentation doesn't say what happens if this limit is exceeded.
 Also the default of 32768 suggests that we actually support that many
 domains.  I don't think we do...

 I didn't find anything in SUPPORT.md about how many guests we support
 but I wouldn't want this setting here to imply full support for 32768
 domains.

 If you don't want to tackle this can of works, maybe add this:

    # This just controls some resource limits for xenstored; if the
    # limit is exceeded, xenstored will stop being able to function
    # properly for additional guests.  The default value is so large
    # that it won't be exceeded in a supported configuration, but
    # should not be taken to mean that the whole Xen system is
    # guaranteed to work properly with that many guests.

 Julien, did you ask for this to be made configurable ?  Having written
 the text above, I wonder if it wouldn't just be better to
 unconditionally set it to "unlimited" rather than offering a footgun
 dressed up as a tuneable...
>>>
>>> So in v1 (see [1]), Juergen wanted to raise the limit. I assumed
>>> this meant that the default limit (configured by the system) may
>>> not be enough.
>>>
>>> I felt it was wrong to impose a higher limit on everyone when an
>>> admin may know the maximum number of domains.
>>>
>>> By "unlimited", do you mean the calling "ulimit" (or whatever is
>>> used for configuring FDs) with unlimited?
>>>
>>> If so, I would be OK with that. My main aim was to move raising
>>> the limit outside Xenstored because:
>>>   2) It feels wrong to me that the daemon chose the limits
>>>   3) An admin can enforce it
>>
>> Coming back to this series, I'm puzzled now.
>>
>> Julien, you didn't want me to raise the limit to a specific number
>> covering the maximum possible number of domains, because you thought
>> this might result in xenstored hogging huge numbers of file descriptors
>> in case of a bug. Why is unlimited better then? This will make the
>> possible number even larger.
>
> I don't think I suggested the unlimited number is better... My main
> objection to your original approach is that you set an arbitrary limit
> in Xenstored (which may not apply at all) and don't offer the admin a
> way to tweak it.
>
> If the limit is set outside of Xenstored, then it becomes much easier
> for someone to just tweak the init script. I don't have a strong
> opinion on whether the default limit should be "unlimited" or a fixed
> number.

xenstored is TCB.  It needs a large number of FDs, and can be trusted
with unlimited.

Also, like xenconsoled, we can calculate an upper bound, which is
derived from the ABI limit of 32k domids.

All you're haggling over is the error semantics in the case of:
1) the upper bound calculation is wrong, or
2) there is an fd leak

Personally, I think a fixed calculation is right, so fd leaks can be
spotted more obviously.

An admin knob is not helpful - higher than the upper bound is just
wasteful, while lower will cause malfunctions.

~Andrew
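For reference, the "5 connections per domain plus headroom" sizing from the patch description works out like this. A shell sketch only; the variable names follow the patch, and the ulimit invocation at the end is illustrative rather than the patch's actual init-script code:

```shell
#!/bin/sh
# Sizing rule from the patch description: up to 5 socket connections per
# HVM domain (2 from the xl daemon, 3 from qemu), plus ~100 headroom for
# logging, the event channel device, etc.
XENSTORED_MAX_N_DOMAINS=${XENSTORED_MAX_N_DOMAINS:-32768}
XS_FD_LIMIT=$(( 5 * XENSTORED_MAX_N_DOMAINS + 100 ))

echo "$XS_FD_LIMIT"   # 163940 for the default of 32768

# An init script would then raise the soft fd limit before starting
# xenstored, e.g.:
# ulimit -n "$XS_FD_LIMIT"
```

Note that 163940 is well below the typical kernel fs.nr_open ceiling, which is one argument for a fixed calculation over "unlimited".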




[XEN PATCH] Config.mk: update OVMF to edk2-stable202108

2021-08-31 Thread Anthony PERARD
Update to the latest stable tag.

Signed-off-by: Anthony PERARD 
---
 Config.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index 4d723eec1d0d..82b0c7c22743 100644
--- a/Config.mk
+++ b/Config.mk
@@ -244,7 +244,7 @@ QEMU_TRADITIONAL_URL ?= 
git://xenbits.xen.org/qemu-xen-traditional.git
 SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
 endif
-OVMF_UPSTREAM_REVISION ?= b37cfdd2807181aed2fee1e17bd7ec1190db266a
+OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5
 QEMU_UPSTREAM_REVISION ?= master
 MINIOS_UPSTREAM_REVISION ?= 051b87bb9c19609976fb038f386920e1ce5454c5
 
-- 
Anthony PERARD




Re: [PATCH v1 01/14] xen/pci: Refactor MSI code that implements MSI functionality within XEN

2021-08-31 Thread Rahul Singh
Hi Jan,

> On 24 Aug 2021, at 4:53 pm, Jan Beulich  wrote:
> 
> On 19.08.2021 14:02, Rahul Singh wrote:
>> --- /dev/null
>> +++ b/xen/drivers/passthrough/msi.c
>> @@ -0,0 +1,96 @@
>> +/*
>> + * Copyright (C) 2008,  Netronome Systems, Inc.
> 
> While generally copying copyright statements when splitting source
> files is probably wanted (or even necessary) I doubt this is
> suitable here: None of the MSI code that you move was contributed
> by them afaict.

Let me remove the "Copyright (C) 2008, Netronome Systems, Inc." line.
Can you please help me with what copyright I should add for the next patch?
> 
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along 
>> with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
> 
> You surely mean xen/msi.h here: Headers like this one should always
> be included by the producer, no matter that it builds fine without.
> Else you risk declarations and definitions to go out of sync.
Ok. Let me include "xen/msi.h" here and move the other required includes to 
"xen/msi.h".
> 
>> +#include 
>> +
>> +int pdev_msix_assign(struct domain *d, struct pci_dev *pdev)
>> +{
>> +int rc;
>> +
>> +if ( pdev->msix )
>> +{
>> +rc = pci_reset_msix_state(pdev);
>> +if ( rc )
>> +return rc;
>> +msixtbl_init(d);
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +int pdev_msi_init(struct pci_dev *pdev)
>> +{
>> +unsigned int pos;
>> +
>> +INIT_LIST_HEAD(&pdev->msi_list);
>> +
>> +pos = pci_find_cap_offset(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> +  PCI_FUNC(pdev->devfn), PCI_CAP_ID_MSI);
>> +if ( pos )
>> +{
>> +uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
>> +
>> +pdev->msi_maxvec = multi_msi_capable(ctrl);
>> +}
>> +
>> +pos = pci_find_cap_offset(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> +  PCI_FUNC(pdev->devfn), PCI_CAP_ID_MSIX);
>> +if ( pos )
>> +{
>> +struct arch_msix *msix = xzalloc(struct arch_msix);
>> +uint16_t ctrl;
>> +
>> +if ( !msix )
>> +return -ENOMEM;
>> +
>> +spin_lock_init(&msix->table_lock);
>> +
>> +ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
>> +msix->nr_entries = msix_table_size(ctrl);
>> +
>> +pdev->msix = msix;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +void pdev_msi_deinit(struct pci_dev *pdev)
>> +{
>> +XFREE(pdev->msix);
>> +}
>> +
>> +void pdev_dump_msi(const struct pci_dev *pdev)
>> +{
>> +const struct msi_desc *msi;
>> +
>> +printk("- MSIs < ");
>> +list_for_each_entry ( msi, &pdev->msi_list, list )
>> +printk("%d ", msi->irq);
>> +printk(">");
> 
> While not an exact equivalent of the original code then, could I
> talk you into adding an early list_empty() check, suppressing any
> output from this function if that one returned "true”?
Ok.
> 
>> @@ -1271,18 +1249,16 @@ bool_t pcie_aer_get_firmware_first(const struct 
>> pci_dev *pdev)
>> static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
>> {
>> struct pci_dev *pdev;
>> -struct msi_desc *msi;
>> 
>> printk(" segment %04x \n", pseg->nr);
>> 
>> list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>> {
>> -printk("%pp - %pd - node %-3d - MSIs < ",
>> +printk("%pp - %pd - node %-3d ",
> 
> Together with the request above the trailin blank here also wants to
> become a leading blank in pdev_dump_msi()
Ok.
>> --- /dev/null
>> +++ b/xen/include/xen/msi.h
>> @@ -0,0 +1,56 @@
>> +/*
>> + * Copyright (C) 2008,  Netronome Systems, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along 
>> with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __XEN_MSI_H_
>> +#define __XEN_MSI_H_
>> +
>> +#ifdef CONFIG_HAS_PCI_MSI
>> +
>> +#include 
>> +
>> +int 

Re: [PATCH v3 2/2] tools/xenstore: set open file descriptor limit for xenstored

2021-08-31 Thread Julien Grall

Hi Juergen,

On 31/08/2021 13:11, Juergen Gross wrote:

On 30.07.21 19:14, Julien Grall wrote:

Hi Ian,

On 30/07/2021 14:35, Ian Jackson wrote:
Juergen Gross writes ("[PATCH v3 2/2] tools/xenstore: set open file 
descriptor limit for xenstored"):

Add a configuration item for the maximum number of domains xenstored
should support and set the limit of open file descriptors accordingly.

For HVM domains there are up to 5 socket connections per domain (2 by
the xl daemon process, and 3 by qemu). So set the ulimit for xenstored
to 5 * XENSTORED_MAX_DOMAINS + 100 (the "+ 100" is for some headroom,
like logging, event channel device, etc.).

...

+## Type: integer
+## Default: 32768
+#
+# Select maximum number of domains supported by xenstored.
+# Only evaluated if XENSTORETYPE is "daemon".
+#XENSTORED_MAX_N_DOMAINS=32768


I approve of doing something about the fd limit.  I have some qualms
about the documentation.

The documentation doesn't say what happens if this limit is exceeded.
Also the default of 32768 suggests that we actually support that many
domains.  I don't think we do...

I didn't find anything in SUPPORT.md about how many guests we support
but I wouldn't want this setting here to imply full support for 32768
domains.

If you don't want to tackle this can of works, maybe add this:

   # This just controls some resource limits for xenstored; if the
   # limit is exceeded, xenstored will stop being able to function
   # properly for additional guests.  The default value is so large
   # that it won't be exceeded in a supported configuration, but
   # should not be taken to mean that the whole Xen system is
   # guaranteed to work properly with that many guests.

Julien, did you ask for this to be made configurable ?  Having written
the text above, I wonder if it wouldn't just be better to
unconditionally set it to "unlimited" rather than offering a footgun
dressed up as a tuneable...


So in v1 (see [1]), Juergen wanted to raise the limit. I assumed this 
meant that the default limit (configured by the system) may not be 
enough.


I felt it was wrong to impose a higher limit on everyone when an 
admin may know the maximum number of domains.


By "unlimited", do you mean the calling "ulimit" (or whatever is used 
for configuring FDs) with unlimited?


If so, I would be OK with that. My main aim was to move raising 
the limit outside Xenstored because:

  1) This is easier for an admin to tweak it (in particular the OOM)
  2) It feels wrong to me that the daemon chose the limits
  3) An admin can enforce it


Coming back to this series, I'm puzzled now.

Julien, you didn't want me to raise the limit to a specific number
covering the maximum possible number of domains, because you thought
this might result in xenstored hogging huge numbers of file descriptors
in case of a bug. Why is unlimited better then? This will make the
possible number even larger.


I don't think I suggested the unlimited number is better... My main 
objection to your original approach is that you set an arbitrary limit 
in Xenstored (which may not apply at all) and don't offer the admin a 
way to tweak it.


If the limit is set outside of Xenstored, then it becomes much easier 
for someone to just tweak the init script. I don't have a strong opinion 
on whether the default limit should be "unlimited" or a fixed number.


Cheers,

--
Julien Grall



Re: [PATCH v3 1/2] tools/xenstore: set oom score for xenstore daemon on Linux

2021-08-31 Thread Juergen Gross

On 30.07.21 15:26, Ian Jackson wrote:

Juergen Gross writes ("[PATCH v3 1/2] tools/xenstore: set oom score for xenstore 
daemon on Linux"):

Xenstored is absolutely mandatory for a Xen host and it can't be
restarted, so being killed by OOM-killer in case of memory shortage is
to be avoided.

Set /proc/$pid/oom_score_adj (if available) per default to -500 (this
translates to 50% of dom0 memory size) in order to allow xenstored to
use large amounts of memory without being killed.

...

+## Type: integer
+## Default: 50
+#
+# Percentage of dom0 memory size the xenstore daemon can use before the
+# OOM killer is allowed to kill it.
+#XENSTORED_OOM_MEM_THRESHOLD=50
+
  ## Type: string
  ## Default: @LIBEXEC@/boot/xenstore-stubdom.gz


Thanks for working on this.  I approve of the principle.

I have one question about detail:


}
+   [ -n "$XENSTORED_OOM_MEM_THRESHOLD" ] || XENSTORED_OOM_MEM_THRESHOLD=50
+   XS_OOM_SCORE=-$(($XENSTORED_OOM_MEM_THRESHOLD * 10))
+
+   rm -f @XEN_RUN_DIR@/xenstored.pid

...

+   XS_PID=`cat @XEN_RUN_DIR@/xenstored.pid`
+   echo $XS_OOM_SCORE >/proc/$XS_PID/oom_score_adj


The effect of all this is that the value specified in
XENSTORED_OOM_MEM_THRESHOLD is transformed before being echoed into
/proc, by being multiplied by -10.


Yes.


Of course an alternative would be to ask the user to specify the
tuneable directly but given its rather more obscure semantics I think
it is reasonable to have this done by the script.


Correct. Otherwise the user would need to know about the oom_score_adj
ABI.
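
The arithmetic under discussion is easy to sanity-check. Below is a minimal
Python sketch (illustrative only; the real transformation is the two shell
lines in the patch above, and the function name here is invented) of how
XENSTORED_OOM_MEM_THRESHOLD relates to the value written to
/proc/$XS_PID/oom_score_adj:

```python
def oom_score_adj(threshold_pct):
    """Map XENSTORED_OOM_MEM_THRESHOLD (a percentage of dom0 memory)
    to the value the init script echoes into /proc/$XS_PID/oom_score_adj.

    Linux clamps oom_score_adj to [-1000, 1000], and the OOM killer's
    badness score is roughly rss / total_mem * 1000 + oom_score_adj,
    so an adjustment of -10 * pct discounts about pct% of total memory
    from the process's score.
    """
    if not 0 <= threshold_pct <= 100:
        raise ValueError("threshold must be between 0 and 100")
    return -10 * threshold_pct

print(oom_score_adj(50))  # the default threshold of 50% yields -500
```

So the documented default of 50 corresponds to the -500 mentioned in the
commit message.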


But maybe we could add something to the doc comment ?

Eg
   # (The specified value is multiplied by -10 and echoed into
   # /proc/PID/oom_score_adj.)

?


Why? This is an internal implementation detail. I don't see why the
user needs to know how this is accomplished. What is unclear with the
XENSTORED_OOM_MEM_THRESHOLD semantics as described?

There is no other parameter with an explanation of how its semantics 
are accomplished.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v3 2/2] tools/xenstore: set open file descriptor limit for xenstored

2021-08-31 Thread Juergen Gross

On 30.07.21 19:14, Julien Grall wrote:

Hi Ian,

On 30/07/2021 14:35, Ian Jackson wrote:
Juergen Gross writes ("[PATCH v3 2/2] tools/xenstore: set open file 
descriptor limit for xenstored"):

Add a configuration item for the maximum number of domains xenstored
should support and set the limit of open file descriptors accordingly.

For HVM domains there are up to 5 socket connections per domain (2 by
the xl daemon process, and 3 by qemu). So set the ulimit for xenstored
to 5 * XENSTORED_MAX_DOMAINS + 100 (the "+ 100" is for some headroom,
like logging, event channel device, etc.).
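
The sizing rule from the commit message can be sketched as follows (an
illustrative Python rendering; the actual limit is set via ulimit in the
init script, and the helper name here is made up):

```python
def xenstored_fd_limit(max_domains, per_domain=5, headroom=100):
    """Open-file-descriptor limit for xenstored: up to 5 socket
    connections per HVM domain (2 from the xl daemon process, 3 from
    qemu), plus fixed headroom for logging, the event channel device,
    and similar.
    """
    return per_domain * max_domains + headroom

print(xenstored_fd_limit(32768))  # default XENSTORED_MAX_N_DOMAINS -> 163940
```

With the default of 32768 domains this asks for 163940 descriptors, which
is well above typical system-wide soft limits.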

...

+## Type: integer
+## Default: 32768
+#
+# Select maximum number of domains supported by xenstored.
+# Only evaluated if XENSTORETYPE is "daemon".
+#XENSTORED_MAX_N_DOMAINS=32768


I approve of doing something about the fd limit.  I have some qualms
about the documentation.

The documentation doesn't say what happens if this limit is exceeded.
Also the default of 32768 suggests that we actually support that many
domains.  I don't think we do...

I didn't find anything in SUPPORT.md about how many guests we support
but I wouldn't want this setting here to imply full support for 32768
domains.

If you don't want to tackle this can of worms, maybe add this:

   # This just controls some resource limits for xenstored; if the
   # limit is exceeded, xenstored will stop being able to function
   # properly for additional guests.  The default value is so large
   # that it won't be exceeded in a supported configuration, but
   # should not be taken to mean that the whole Xen system is
   # guaranteed to work properly with that many guests.

Julien, did you ask for this to be made configurable ?  Having written
the text above, I wonder if it wouldn't just be better to
unconditionally set it to "unlimited" rather than offering a footgun
dressed up as a tuneable...


So in v1 (see [1]), Juergen wanted to raise the limit. I assumed this 
meant that the default limit (configured by the system) may not be enough.


I felt this was wrong to impose a higher limit on everyone when an 
admin may know the maximum number of domains.


By "unlimited", do you mean calling "ulimit" (or whatever is used 
for configuring FDs) with unlimited?


If so, I would be OK with that. My main aim was to move the raising of the 
limit outside Xenstored because:

  1) This is easier for an admin to tweak it (in particular the OOM)
  2) It feels wrong to me that the daemon chose the limits
  3) An admin can enforce it


Coming back to this series, I'm puzzled now.

Julien, you didn't want me to raise the limit to a specific number
covering the maximum possible number of domains, because you thought
this might result in xenstored hogging huge numbers of file descriptors
in case of a bug. Why is unlimited better then? This will make the
possible number even larger.

I'd really like to know how to come to an acceptable end result soon.
Right now the max number of domains supported by xenstored is just
limited by an arbitrary system wide limit.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[xen-4.14-testing test] 164678: trouble: pass/preparing/queued/running

2021-08-31 Thread osstest service owner
flight 164678 xen-4.14-testing running [real]
http://logs.test-lab.xenproject.org/osstest/logs/164678/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-pvshim queued
 test-amd64-i386-xl-qemut-debianhvm-amd64 queued
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  queued
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm queued
 test-amd64-i386-xl-qemut-win7-amd64  queued
 test-amd64-i386-xl-qemut-ws16-amd64  queued
 test-amd64-i386-xl-qemuu-debianhvm-amd64 queued
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow  queued
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  queued
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict queued
 test-amd64-i386-xl-qemuu-ovmf-amd64  queued
 test-amd64-i386-xl-raw   queued
 test-amd64-i386-xl-shadow queued
 test-amd64-i386-xl-xsm   queued
 test-armhf-armhf-libvirt  queued
 test-armhf-armhf-libvirt-raw  queued
 test-armhf-armhf-xl   queued
 test-armhf-armhf-xl-arndale   queued
 test-armhf-armhf-xl-credit1   queued
 test-armhf-armhf-xl-credit2   queued
 test-armhf-armhf-xl-cubietruck  queued
 test-armhf-armhf-xl-multivcpu  queued
 test-armhf-armhf-xl-rtds  queued
 build-armhf-libvirt   queued
 build-i386-libvirt   queued
 test-amd64-i386-xl-qemuu-win7-amd64  queued
 test-amd64-i386-xl-qemuu-ws16-amd64  queued
 test-amd64-coresched-i386-xl  queued
 test-amd64-i386-freebsd10-amd64  queued
 test-amd64-i386-freebsd10-i386  queued
 test-amd64-i386-libvirt   queued
 test-amd64-i386-libvirt-pair  queued
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   queued
 test-amd64-i386-libvirt-xsm   queued
 test-amd64-i386-livepatch queued
 test-amd64-i386-migrupgrade   queued
 test-amd64-i386-pair  queued
 test-amd64-i386-qemut-rhel6hvm-amd  queued
 test-amd64-i386-qemut-rhel6hvm-intel  queued
 test-amd64-i386-qemuu-rhel6hvm-amd  queued
 test-amd64-i386-qemuu-rhel6hvm-intel  queued
 test-amd64-i386-xl   queued
 test-armhf-armhf-xl-vhd   queued
 test-amd64-amd64-dom0pvh-xl-amd  3 hosts-allocate   running
 test-amd64-amd64-dom0pvh-xl-intel  3 hosts-allocate   running
 test-amd64-amd64-libvirt  3 hosts-allocate   running
 build-i386   2 hosts-allocate   running
 build-i386-xsm   2 hosts-allocate   running
 test-xtf-amd64-amd64-3   3 hosts-allocate   running
 build-i386-prev   2 hosts-allocate   running
 test-arm64-arm64-xl-xsm   3 hosts-allocate   running
 test-amd64-amd64-xl-qemuu-ws16-amd64  3 hosts-allocate   running
 test-amd64-amd64-libvirt-vhd  3 hosts-allocate   running
 test-amd64-amd64-libvirt-xsm  3 hosts-allocate   running
 test-xtf-amd64-amd64-2   3 hosts-allocate   running
 test-xtf-amd64-amd64-5   3 hosts-allocate   running
 test-amd64-coresched-amd64-xl  3 hosts-allocate   running
 test-amd64-amd64-xl-xsm   3 hosts-allocate   running
 test-amd64-amd64-xl-rtds  3 hosts-allocate   running
 test-amd64-amd64-xl-qemuu-win7-amd64  3 hosts-allocate   running
 test-amd64-amd64-xl-qemuu-ovmf-amd64  3 hosts-allocate   running
 test-amd64-amd64-xl   3 hosts-allocate   running
 test-amd64-amd64-qemuu-nested-intel  3 hosts-allocate   running
 test-amd64-amd64-pygrub   3 hosts-allocate   running
 test-xtf-amd64-amd64-1   3 hosts-allocate   running
 test-amd64-amd64-xl-shadow   3 hosts-allocate   running
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 3 hosts-allocate running
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  3 hosts-allocate running
 test-amd64-amd64-xl-qemut-win7-amd64  3 hosts-allocate   running
 test-amd64-amd64-xl-pvshim   3 hosts-allocate   running
 test-amd64-amd64-xl-pvhv2-amd  3 hosts-allocate   running
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  3 hosts-allocate   running
 test-amd64-amd64-livepatch   3 hosts-allocate   running
 test-amd64-amd64-libvirt-pair  4 hosts-allocate   running
 

4.13..4.15 testing suspended pending XSA-378 fixes

2021-08-31 Thread Ian Jackson
I had the following irc conversation with Jan.  I wanted to make a record
(in particular, to refer to from the stop files in osstest):

09:36  Diziet: To give osstest some relief, and to also have fixes 
 for the XSA-378 fallout have a faster path to an eventual push 
 (provided they get reviewed / agreed upon), could you please
09:37  - put the OVMF backport also on 4.11 (which ought to allow 
 that branch to get a push)
09:37  - suspend testing on 4.13-4.15 for the time being?
10:46 <@Diziet> jbeulich: I pushed the ovmf fix.  Let me look at the current 
flights
10:50 <@Diziet> Your point about the XSA fallout fixes is that you would like 
to have osstest testing putative fixes promptly ?
11:01  Yes. With all the other branches also getting continuously 
 tested, the latency for unstable flights to complete is quite 
 high.
11:02  And an initial set of fixes has been posted.
11:27 <@Diziet> Done.  I've cancelled one not-yet-started but doomed 4.14 
flight too



RE: [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory with E820 map

2021-08-31 Thread Wei Chen
Hi Stefano,

> -Original Message-
> From: Stefano Stabellini 
> Sent: 31 August 2021 9:08
> To: Wei Chen 
> Cc: xen-devel@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org;
> jbeul...@suse.com; Bertrand Marquis 
> Subject: Re: [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory
> with E820 map
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > We will reuse nodes_cover_memory for Arm to check its bootmem
> > info. So we introduce two arch helpers to get memory map's
> > entry number and specified entry's range:
> > arch_get_memory_bank_number
> > arch_get_memory_bank_range
> >
> > Depending on the above two helpers, we make nodes_cover_memory
> > architecture-independent.
> 
> You might want to note that the only change from an x86 perspective is
> the additional checks:
> 
>   !start || !end
> 

Thanks, I will add it.

> 
> > Signed-off-by: Wei Chen 
> > ---
> >  xen/arch/x86/numa.c| 18 ++
> >  xen/arch/x86/srat.c|  8 +++-
> >  xen/include/xen/numa.h |  4 
> >  3 files changed, 25 insertions(+), 5 deletions(-)
> >
> > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> > index 6908738305..8b43be4aa7 100644
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -128,6 +128,24 @@ unsigned int __init arch_get_dma_bitsize(void)
> >   + PAGE_SHIFT, 32);
> >  }
> >
> > +uint32_t __init arch_meminfo_get_nr_bank(void)
> > +{
> > +   return e820.nr_map;
> > +}
> > +
> > +int __init arch_meminfo_get_ram_bank_range(int bank,
> > +   unsigned long long *start, unsigned long long *end)
> > +{
> > +   if (e820.map[bank].type != E820_RAM || !start || !end) {
> > +   return -1;
> > +   }
> > +
> > +   *start = e820.map[bank].addr;
> > +   *end = e820.map[bank].addr + e820.map[bank].size;
> > +
> > +   return 0;
> > +}
> > +
> >  static void dump_numa(unsigned char key)
> >  {
> >  s_time_t now = NOW();
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index 6d68b8a614..2298353846 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -316,18 +316,16 @@ acpi_numa_memory_affinity_init(const struct
> acpi_srat_mem_affinity *ma)
> >  static int __init nodes_cover_memory(void)
> >  {
> > int i;
> > +   uint32_t nr_banks = arch_meminfo_get_nr_bank();
> >
> > -   for (i = 0; i < e820.nr_map; i++) {
> > +   for (i = 0; i < nr_banks; i++) {
> > int j, found;
> > unsigned long long start, end;
> >
> > -   if (e820.map[i].type != E820_RAM) {
> > +   if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> > continue;
> > }
> >
> > -   start = e820.map[i].addr;
> > -   end = e820.map[i].addr + e820.map[i].size;
> > -
> > do {
> > found = 0;
> > for_each_node_mask(j, memory_nodes_parsed)
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index 0475823b13..6d18059bcd 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -89,6 +89,10 @@ static inline void clear_node_cpumask(int cpu)
> > cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> >  }
> >
> > +extern uint32_t arch_meminfo_get_nr_bank(void);
> > +extern int arch_meminfo_get_ram_bank_range(int bank,
> > +unsigned long long *start, unsigned long long *end);
> > +
> >  #endif /* CONFIG_NUMA */
> >
> >  #endif /* _XEN_NUMA_H */
> > --
> > 2.25.1
> >


RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map

2021-08-31 Thread Wei Chen
Hi Stefano,

> -Original Message-
> From: Stefano Stabellini 
> Sent: 31 August 2021 8:48
> To: Wei Chen 
> Cc: xen-devel@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org;
> jbeul...@suse.com; Bertrand Marquis 
> Subject: Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse
> device tree NUMA distance map
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > A NUMA aware device tree will provide a "distance-map" node to
> > describe distance between any two nodes. This patch introduce a
> > new helper to parse this distance map.
> >
> > Signed-off-by: Wei Chen 
> > ---
> >  xen/arch/arm/numa_device_tree.c | 67 +
> >  1 file changed, 67 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index bbe081dcd1..6e0d1d3d9f 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt,
> int node,
> >
> >  return 0;
> >  }
> > +
> > +/* Parse NUMA distance map v1 */
> > +int __init
> > +device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
> > +{
> > +const struct fdt_property *prop;
> > +const __be32 *matrix;
> > +int entry_count, len, i;
> > +
> > +printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> > +
> > +prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> > +if ( !prop )
> > +{
> > +printk(XENLOG_WARNING
> > +   "NUMA: No distance-matrix property in distance-map\n");
> > +
> > +return -EINVAL;
> > +}
> > +
> > +if ( len % sizeof(uint32_t) != 0 )
> > +{
> > +printk(XENLOG_WARNING
> > +   "distance-matrix in node is not a multiple of u32\n");
> > +return -EINVAL;
> > +}
> > +
> > +entry_count = len / sizeof(uint32_t);
> > +if ( entry_count <= 0 )
> > +{
> > +printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> > +
> > +return -EINVAL;
> > +}
> > +
> > +matrix = (const __be32 *)prop->data;
> > +for ( i = 0; i + 2 < entry_count; i += 3 )
> > +{
> > +uint32_t from, to, distance;
> > +
> > +from = dt_read_number(matrix, 1);
> > +matrix++;
> > +to = dt_read_number(matrix, 1);
> > +matrix++;
> > +distance = dt_read_number(matrix, 1);
> > +matrix++;
> > +
> > +if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> > +(from != to && distance <= NUMA_LOCAL_DISTANCE) )
> > +{
> > +printk(XENLOG_WARNING
> > +   "Invalid nodes' distance from node#%d to node#%d
> = %d\n",
> > +   from, to, distance);
> > +return -EINVAL;
> > +}
> > +
> > +printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d
> = %d\n",
> > +   from, to, distance);
> > +numa_set_distance(from, to, distance);
> > +
> > +/* Set default distance of node B->A same as A->B */
> > +if (to > from)
> > + numa_set_distance(to, from, distance);
> 
> I am a bit unsure about this last 2 lines: why calling numa_set_distance
> in the opposite direction only when to > from? Wouldn't it be OK to
> always do both:
> 
> numa_set_distance(from, to, distance);
> numa_set_distance(to, from, distance);
> 
> ?
> 
I borrowed this code from Linux, but here is my understanding:

First, I read some notes in Documentation/devicetree/bindings/numa.txt
1. Each entry represents distance from first node to second node.
The distances are equal in either direction.
2. distance-matrix should have entries in lexicographical ascending
order of nodes.

Here is an example of distance-map node in DTB:
Sample#1, full list:
distance-map {
 compatible = "numa-distance-map-v1";
 distance-matrix = <0 0  10>,
   <0 1  20>,
   <0 2  40>,
   <0 3  20>,
   <1 0  20>,
   <1 1  10>,
   <1 2  20>,
   <1 3  40>,
   <2 0  40>,
   <2 1  20>,
   <2 2  10>,
   <2 3  20>,
   <3 0  20>,
   <3 1  40>,
   <3 2  20>,
   <3 3  10>;
};

Calling numa_set_distance for the reverse direction only when "to > from"
prevents Xen from calling numa_set_distance(0, 1, 20) again when it's
setting the distance for <1 0 20>. numa_set_distance(1, 0, 20) will still
be called twice, though.
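
The mirroring rule can be illustrated with a small Python sketch (the real
parser is the C patch quoted above; the helper name and dict representation
here are invented for illustration):

```python
NUMA_LOCAL_DISTANCE = 10  # matches the check in the patch above

def parse_distance_map(matrix):
    """Apply a flat list of (from, to, distance) entries the way the
    patch does: each entry is applied as given, and when to > from the
    reverse direction is also set as a default.  A map listing only the
    upper triangle therefore still yields a symmetric table, while a
    full map (entries in lexicographical ascending order, per the
    binding) simply overwrites the mirrored defaults with the explicit
    reverse entries.
    """
    dist = {}
    for frm, to, d in matrix:
        if (frm == to and d != NUMA_LOCAL_DISTANCE) or \
           (frm != to and d <= NUMA_LOCAL_DISTANCE):
            raise ValueError(f"invalid distance {d} from {frm} to {to}")
        dist[(frm, to)] = d
        if to > frm:
            dist[(to, frm)] = d  # default for the reverse direction
    return dist
```

With this, a map containing only <0 0 10>, <0 1 20>, <1 1 10> produces the
same table as the full four-entry map.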

Normally, distance-map node will be 

Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments

2021-08-31 Thread Jan Beulich
On 30.08.2021 15:01, Jan Beulich wrote:
> The code building PVH Dom0 made use of sequences of P2M changes
> which are disallowed as of XSA-378. First of all population of the
> first Mb of memory needs to be redone. Then, largely as a
> workaround, checking introduced by XSA-378 needs to be slightly
> relaxed.
> 
> Note that with these adjustments I get Dom0 to start booting on my
> development system, but the Dom0 kernel then gets stuck. Since it
> was the first time for me to try PVH Dom0 in this context (see
> below for why I was hesitant), I cannot tell yet whether this is
> due further fallout from the XSA, or some further unrelated
> problem. Dom0's BSP is in VPF_blocked state while all APs are
> still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce
> any output, so it's non-trivial to check whether (like PV likes to
> do) Dom0 has panic()ed without leaving any (visible) output.

Correction: I did mean '0' here, producing merely

(XEN) '0' pressed -> dumping Dom0's registers
(XEN) *** Dumping Dom0 vcpu#0 state: ***
(XEN) *** Dumping Dom0 vcpu#1 state: ***
(XEN) *** Dumping Dom0 vcpu#2 state: ***
(XEN) *** Dumping Dom0 vcpu#3 state: ***

'd' output supports the "system is idle" that was also visible from
'q' output.

Jan




Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Jan Beulich
On 31.08.2021 10:14, Oleksandr Andrushchenko wrote:
> On 31.08.21 11:05, Jan Beulich wrote:
>> On 31.08.2021 09:56, Oleksandr Andrushchenko wrote:
>>> On 31.08.21 10:47, Jan Beulich wrote:
 On 31.08.2021 09:06, Oleksandr Andrushchenko wrote:
> On 31.08.21 09:51, Jan Beulich wrote:
>> On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:
>>> On 30.08.21 16:04, Jan Beulich wrote:
 @@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
   * Check for overlaps with other BARs. Note that only BARs 
 that are
   * currently mapped (enabled) are checked for overlaps.
   */
 -for_each_pdev ( pdev->domain, tmp )
 +for ( d = pdev->domain; ; d = dom_xen ) {//todo
>>> I am not quite sure this will be correct for the cases where 
>>> pdev->domain != dom0,
>>> e.g. in the series for PCI passthrough for Arm this can be any guest. 
>>> For such cases
>>> we'll force running the loop for dom_xen which I am not sure is 
>>> desirable.
>> It is surely not desirable, but it also doesn't happen - see the
>> is_hardware_domain() check further down (keeping context below).
> Right
>>> Another question is why such a hidden device has its pdev->domain not 
>>> set correctly,
>>> so we need to work this around?
>> Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
>> ("PCI: don't allow guest assignment of devices used by Xen")
>> introducing that temporary override. To permit limited
>> visibility to Dom0, these devices still need setting up in the
>> IOMMU for Dom0. Consequently BAR overlap detection also needs
>> to take these into account (i.e. the goal here is not just to
>> prevent triggering the ASSERT() in question).
> So, why don't we set pdev->domain = dom_xen for such devices and call
> modify_bars or something from pci_hide_device for instance (I didn't get 
> too
> much into implementation details though)? If pci_hide_device already 
> handles
> such exceptions, so it should also take care of the correct BAR overlaps 
> etc.
 How would it? It runs long before Dom0 gets created, let alone when
 Dom0 may make adjustments to the BAR arrangement.
>>> So, why don't we call "yet another hide function" while creating Dom0 for 
>>> that
>>> exactly reason, e.g. BAR overlap handling? E.g. make it 2-stage hide for 
>>> special
>>> devices such as console etc.
>> This might be an option, but is imo going to result not only in more
>> code churn, but also in redundant code. After all what modify_bars()
>> needs is the union of BARs from Dom0's and DomXEN's devices.
> 
> To me DomXEN here is yet another workaround as strictly speaking
> vpci code didn't need and doesn't(?) need it at the moment. Yes, at least on 
> Arm.
> So, I do understand why you want it there, but this then does need a very
> good description of what is happening and why...
> 
>>
 The temporary overriding of pdev->domain is because other IOMMU code
 takes the domain to act upon from that field.
>>> So, you mean pdev->domain in that case is pointing to what?
>> Did you look at the function I've pointed you at? DomXEN there gets
>> temporarily overridden to Dom0.
> 
> This looks like yet another workaround to me which is not cute.
> So, the overall solution is spread over multiple subsystems, each
> introducing something which is hard to follow

If you have any better suggestions, I'm all ears. Or feel free to send
patches.

Jan




[xen-4.14-testing test] 164631: regressions - FAIL

2021-08-31 Thread osstest service owner
flight 164631 xen-4.14-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164631/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-amd  8 xen-boot  fail REGR. vs. 163750
 test-amd64-amd64-dom0pvh-xl-intel  8 xen-bootfail REGR. vs. 163750

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 
164507 pass in 164631
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat  fail pass in 164507
 test-amd64-i386-xl   20 guest-localmigrate/x10 fail pass in 164547
 test-amd64-i386-xl-qemuu-ws16-amd64 12 windows-install fail pass in 164547

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop   fail in 164547 like 163750
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 163750
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 163750
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 163750
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 163750
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 163750
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 163750
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 163750
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 163750
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 163750
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 163750
 test-amd64-i386-xl-pvshim   14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass

version targeted for testing:
 xen  

Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Oleksandr Andrushchenko



On 31.08.21 11:05, Jan Beulich wrote:

On 31.08.2021 09:56, Oleksandr Andrushchenko wrote:

On 31.08.21 10:47, Jan Beulich wrote:

On 31.08.2021 09:06, Oleksandr Andrushchenko wrote:

On 31.08.21 09:51, Jan Beulich wrote:

On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:

On 30.08.21 16:04, Jan Beulich wrote:

@@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
  * Check for overlaps with other BARs. Note that only BARs that are
  * currently mapped (enabled) are checked for overlaps.
  */
-for_each_pdev ( pdev->domain, tmp )
+for ( d = pdev->domain; ; d = dom_xen ) {//todo

I am not quite sure this will be correct for the cases where pdev->domain != 
dom0,
e.g. in the series for PCI passthrough for Arm this can be any guest. For such 
cases
we'll force running the loop for dom_xen which I am not sure is desirable.

It is surely not desirable, but it also doesn't happen - see the
is_hardware_domain() check further down (keeping context below).

Right

Another question is why such a hidden device has its pdev->domain not set 
correctly,
so we need to work this around?

Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
("PCI: don't allow guest assignment of devices used by Xen")
introducing that temporary override. To permit limited
visibility to Dom0, these devices still need setting up in the
IOMMU for Dom0. Consequently BAR overlap detection also needs
to take these into account (i.e. the goal here is not just to
prevent triggering the ASSERT() in question).

So, why don't we set pdev->domain = dom_xen for such devices and call
modify_bars or something from pci_hide_device for instance (I didn't get too
much into implementation details though)? If pci_hide_device already handles
such exceptions, so it should also take care of the correct BAR overlaps etc.

How would it? It runs long before Dom0 gets created, let alone when
Dom0 may make adjustments to the BAR arrangement.

So, why don't we call "yet another hide function" while creating Dom0 for that
exactly reason, e.g. BAR overlap handling? E.g. make it 2-stage hide for special
devices such as console etc.

This might be an option, but is imo going to result not only in more
code churn, but also in redundant code. After all what modify_bars()
needs is the union of BARs from Dom0's and DomXEN's devices.


To me DomXEN here is yet another workaround as strictly speaking

vpci code didn't need and doesn't(?) need it at the moment. Yes, at least on 
Arm.

So, I do understand why you want it there, but this then does need a very

good description of what is happening and why...




The temporary overriding of pdev->domain is because other IOMMU code
takes the domain to act upon from that field.

So, you mean pdev->domain in that case is pointing to what?

Did you look at the function I've pointed you at? DomXEN there gets
temporarily overridden to Dom0.


This looks like yet another workaround to me which is not cute.

So, the overall solution is spread over multiple subsystems, each

introducing something which is hard to follow




   This could have been
solved without override, but then much heavier code churn would have
resulted.


Otherwise it looks like we put some unrelated logic into vpci which is for 
hiding
the devices (on x86).

Hiding devices is in no way x86-specific.

I mean that the use-case you have, e.g. a *PCI* console you want to hide,
is definitely not something used on Arm at least.

Not yet, that is? Why would - in the long run - somebody not want to
put in a PCI serial card in a system that supports PCI and has no
(available) other serial port? And if you have looked at the commit
I did point you at

I did

, you will also have found that it's more than just
the serial device that we hide.

Serial was just an example from the list


Jan





Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Jan Beulich
On 31.08.2021 09:56, Oleksandr Andrushchenko wrote:
> On 31.08.21 10:47, Jan Beulich wrote:
>> On 31.08.2021 09:06, Oleksandr Andrushchenko wrote:
>>> On 31.08.21 09:51, Jan Beulich wrote:
 On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:
> On 30.08.21 16:04, Jan Beulich wrote:
>> @@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
>>  * Check for overlaps with other BARs. Note that only BARs that 
>> are
>>  * currently mapped (enabled) are checked for overlaps.
>>  */
>> -for_each_pdev ( pdev->domain, tmp )
>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo
> I am not quite sure this will be correct for the cases where pdev->domain 
> != dom0,
> e.g. in the series for PCI passthrough for Arm this can be any guest. For 
> such cases
> we'll force running the loop for dom_xen which I am not sure is desirable.
 It is surely not desirable, but it also doesn't happen - see the
 is_hardware_domain() check further down (keeping context below).
>>> Right
> Another question is why such a hidden device has its pdev->domain not set 
> correctly,
> so we need to work this around?
 Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
 ("PCI: don't allow guest assignment of devices used by Xen")
 introducing that temporary override. To permit limited
 visibility to Dom0, these devices still need setting up in the
 IOMMU for Dom0. Consequently BAR overlap detection also needs
 to take these into account (i.e. the goal here is not just to
 prevent triggering the ASSERT() in question).
>>> So, why don't we set pdev->domain = dom_xen for such devices and call
>>> modify_bars or something from pci_hide_device for instance (I didn't get too
>>> much into implementation details though)? If pci_hide_device already handles
>>> such exceptions, then it should also take care of the correct BAR overlaps
>>> etc.
>> How would it? It runs long before Dom0 gets created, let alone when
>> Dom0 may make adjustments to the BAR arrangement.
> 
> So, why don't we call "yet another hide function" while creating Dom0 for that
> exact reason, e.g. BAR overlap handling? E.g. make it a two-stage hide for
> special devices such as the console etc.

This might be an option, but is imo going to result not only in more
code churn, but also in redundant code. After all what modify_bars()
needs is the union of BARs from Dom0's and DomXEN's devices.
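That union can be sketched with toy structures (the names below — toy_rangeset, toy_dev, union_enabled_bars — are illustrative stand-ins, not Xen's actual rangeset or pci_dev types): fold the enabled BARs of both the hardware domain's and DomXEN's device lists into a single set of ranges, which is what the overlap check then operates on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for Xen's rangeset: a fixed array of [start, end] ranges. */
struct range { uint64_t start, end; };
struct toy_rangeset { struct range r[16]; size_t n; };

static void toy_rangeset_add(struct toy_rangeset *rs, uint64_t s, uint64_t e)
{
    assert(rs->n < 16);
    rs->r[rs->n].start = s;
    rs->r[rs->n].end = e;
    rs->n++;
}

/* Toy device: up to 6 BARs, each either enabled (mapped) or not. */
struct toy_dev { struct range bars[6]; bool enabled[6]; size_t nr; };

/*
 * The union described above: gather the currently enabled BARs of *both*
 * device lists into one rangeset, instead of scanning only dom0's list
 * (which would miss devices hidden under DomXEN).
 */
static void union_enabled_bars(struct toy_rangeset *rs,
                               const struct toy_dev *dom0, size_t n0,
                               const struct toy_dev *domxen, size_t nx)
{
    const struct toy_dev *lists[2] = { dom0, domxen };
    size_t counts[2] = { n0, nx };

    for ( size_t l = 0; l < 2; l++ )
        for ( size_t i = 0; i < counts[l]; i++ )
            for ( size_t j = 0; j < lists[l][i].nr; j++ )
                if ( lists[l][i].enabled[j] )
                    toy_rangeset_add(rs, lists[l][i].bars[j].start,
                                     lists[l][i].bars[j].end);
}
```

A disabled BAR on a hidden device is skipped, mirroring the "only BARs that are currently mapped (enabled) are checked for overlaps" comment in modify_bars().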

>> The temporary overriding of pdev->domain is because other IOMMU code
>> takes the domain to act upon from that field.
> 
> So, you mean pdev->domain in that case is pointing to what?

Did you look at the function I've pointed you at? DomXEN there gets
temporarily overridden to Dom0.

>>   This could have been
>> solved without override, but then much heavier code churn would have
>> resulted.
>>
>>> Otherwise it looks like we put some unrelated logic into vpci which is for 
>>> hiding
>>> the devices (on x86).
>> Hiding devices is in no way x86-specific.
> 
> I mean that the use-case you have, e.g. a *PCI* console you want to hide,
> is definitely not something used on Arm at least.

Not yet, that is? Why would - in the long run - somebody not want to
put in a PCI serial card in a system that supports PCI and has no
(available) other serial port? And if you have looked at the commit
I did point you at, you will also have found that it's more than just
the serial device that we hide.

Jan




Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Oleksandr Andrushchenko



On 31.08.21 10:47, Jan Beulich wrote:

On 31.08.2021 09:06, Oleksandr Andrushchenko wrote:

On 31.08.21 09:51, Jan Beulich wrote:

On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:

Hello, Jan!

On 30.08.21 16:04, Jan Beulich wrote:

Hidden devices (e.g. an add-in PCI serial card used for Xen's serial
console) are associated with DomXEN, not Dom0. This means that while
looking for overlapping BARs such devices cannot be found on Dom0's
list of devices; DomXEN's list also needs to be scanned.

Signed-off-by: Jan Beulich 
---
RFC: Patch intentionally mis-formatted, as the necessary re-indentation
would make the diff difficult to read. At this point I'd merely
like to gather input towards possible better approaches to solve
the issue (not the least because quite possibly there are further
places needing changing).

--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -206,6 +206,7 @@ static int modify_bars(const struct pci_
struct vpci_header *header = &pdev->vpci->header;
struct rangeset *mem = rangeset_new(NULL, NULL, 0);
struct pci_dev *tmp, *dev = NULL;
+const struct domain *d;
const struct vpci_msix *msix = pdev->vpci->msix;
unsigned int i;
int rc;
@@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
 * Check for overlaps with other BARs. Note that only BARs that are
 * currently mapped (enabled) are checked for overlaps.
 */
-for_each_pdev ( pdev->domain, tmp )
+for ( d = pdev->domain; ; d = dom_xen ) {//todo

I am not quite sure this will be correct for the cases where pdev->domain != 
dom0,
e.g. in the series for PCI passthrough for Arm this can be any guest. For such 
cases
we'll force running the loop for dom_xen which I am not sure is desirable.

It is surely not desirable, but it also doesn't happen - see the
is_hardware_domain() check further down (keeping context below).

Right

Another question is why such a hidden device has its pdev->domain not set 
correctly,
so we need to work this around?

Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
("PCI: don't allow guest assignment of devices used by Xen")
introducing that temporary override. To permit limited
visibility to Dom0, these devices still need setting up in the
IOMMU for Dom0. Consequently BAR overlap detection also needs
to take these into account (i.e. the goal here is not just to
prevent triggering the ASSERT() in question).

So, why don't we set pdev->domain = dom_xen for such devices and call
modify_bars or something from pci_hide_device for instance (I didn't get too
much into implementation details though)? If pci_hide_device already handles
such exceptions, then it should also take care of the correct BAR overlaps etc.

How would it? It runs long before Dom0 gets created, let alone when
Dom0 may make adjustments to the BAR arrangement.


So, why don't we call "yet another hide function" while creating Dom0 for that

exact reason, e.g. BAR overlap handling? E.g. make it a two-stage hide for special

devices such as console etc.



The temporary overriding of pdev->domain is because other IOMMU code
takes the domain to act upon from that field.


So, you mean pdev->domain in that case is pointing to what?



  This could have been
solved without override, but then much heavier code churn would have
resulted.


Otherwise it looks like we put some unrelated logic into vpci which is for 
hiding
the devices (on x86).

Hiding devices is in no way x86-specific.


I mean that the use-case you have, e.g. a *PCI* console you want to hide,

is definitely not something used on Arm at least.



Jan





Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Jan Beulich
On 31.08.2021 09:06, Oleksandr Andrushchenko wrote:
> 
> On 31.08.21 09:51, Jan Beulich wrote:
>> On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:
>>> Hello, Jan!
>>>
>>> On 30.08.21 16:04, Jan Beulich wrote:
 Hidden devices (e.g. an add-in PCI serial card used for Xen's serial
 console) are associated with DomXEN, not Dom0. This means that while
 looking for overlapping BARs such devices cannot be found on Dom0's
 list of devices; DomXEN's list also needs to be scanned.

 Signed-off-by: Jan Beulich 
 ---
 RFC: Patch intentionally mis-formatted, as the necessary re-indentation
would make the diff difficult to read. At this point I'd merely
like to gather input towards possible better approaches to solve
the issue (not the least because quite possibly there are further
places needing changing).

 --- a/xen/drivers/vpci/header.c
 +++ b/xen/drivers/vpci/header.c
 @@ -206,6 +206,7 @@ static int modify_bars(const struct pci_
struct vpci_header *header = &pdev->vpci->header;
struct rangeset *mem = rangeset_new(NULL, NULL, 0);
struct pci_dev *tmp, *dev = NULL;
 +const struct domain *d;
const struct vpci_msix *msix = pdev->vpci->msix;
unsigned int i;
int rc;
 @@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
 * Check for overlaps with other BARs. Note that only BARs that are
 * currently mapped (enabled) are checked for overlaps.
 */
 -for_each_pdev ( pdev->domain, tmp )
 +for ( d = pdev->domain; ; d = dom_xen ) {//todo
>>> I am not quite sure this will be correct for the cases where pdev->domain 
>>> != dom0,
>>> e.g. in the series for PCI passthrough for Arm this can be any guest. For 
>>> such cases
>>> we'll force running the loop for dom_xen which I am not sure is desirable.
>> It is surely not desirable, but it also doesn't happen - see the
>> is_hardware_domain() check further down (keeping context below).
> Right
>>
>>> Another question is why such a hidden device has its pdev->domain not set 
>>> correctly,
>>> so we need to work this around?
>> Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
>> ("PCI: don't allow guest assignment of devices used by Xen")
>> introducing that temporary override. To permit limited
>> visibility to Dom0, these devices still need setting up in the
>> IOMMU for Dom0. Consequently BAR overlap detection also needs
>> to take these into account (i.e. the goal here is not just to
>> prevent triggering the ASSERT() in question).
> 
> So, why don't we set pdev->domain = dom_xen for such devices and call
> modify_bars or something from pci_hide_device for instance (I didn't get too
> much into implementation details though)? If pci_hide_device already handles
> such exceptions, then it should also take care of the correct BAR overlaps etc.

How would it? It runs long before Dom0 gets created, let alone when
Dom0 may make adjustments to the BAR arrangement.

The temporary overriding of pdev->domain is because other IOMMU code
takes the domain to act upon from that field. This could have been
solved without override, but then much heavier code churn would have
resulted.

> Otherwise it looks like we put some unrelated logic into vpci which is for 
> hiding
> the devices (on x86).

Hiding devices is in no way x86-specific.

Jan




[PATCH] Config: use Mini-OS master for xen-unstable

2021-08-31 Thread Juergen Gross
In order to get Mini-OS master branch tested by OSStest, use it for
xen-unstable.

Signed-off-by: Juergen Gross 
---
There are some Mini-OS patches pending, of which at least the Mini-OS
netfront fix should be committed as it will be required for qemu-stubdom
to work.
---
 Config.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index 4d723eec1d..d4e8ced104 100644
--- a/Config.mk
+++ b/Config.mk
@@ -246,7 +246,7 @@ MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
 endif
 OVMF_UPSTREAM_REVISION ?= b37cfdd2807181aed2fee1e17bd7ec1190db266a
 QEMU_UPSTREAM_REVISION ?= master
-MINIOS_UPSTREAM_REVISION ?= 051b87bb9c19609976fb038f386920e1ce5454c5
+MINIOS_UPSTREAM_REVISION ?= master
 
 SEABIOS_UPSTREAM_REVISION ?= rel-1.14.0
 
-- 
2.26.2




Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Oleksandr Andrushchenko



On 31.08.21 09:51, Jan Beulich wrote:

On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:

Hello, Jan!

On 30.08.21 16:04, Jan Beulich wrote:

Hidden devices (e.g. an add-in PCI serial card used for Xen's serial
console) are associated with DomXEN, not Dom0. This means that while
looking for overlapping BARs such devices cannot be found on Dom0's
list of devices; DomXEN's list also needs to be scanned.

Signed-off-by: Jan Beulich 
---
RFC: Patch intentionally mis-formatted, as the necessary re-indentation
   would make the diff difficult to read. At this point I'd merely
   like to gather input towards possible better approaches to solve
   the issue (not the least because quite possibly there are further
   places needing changing).

--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -206,6 +206,7 @@ static int modify_bars(const struct pci_
struct vpci_header *header = &pdev->vpci->header;
   struct rangeset *mem = rangeset_new(NULL, NULL, 0);
   struct pci_dev *tmp, *dev = NULL;
+const struct domain *d;
   const struct vpci_msix *msix = pdev->vpci->msix;
   unsigned int i;
   int rc;
@@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
* Check for overlaps with other BARs. Note that only BARs that are
* currently mapped (enabled) are checked for overlaps.
*/
-for_each_pdev ( pdev->domain, tmp )
+for ( d = pdev->domain; ; d = dom_xen ) {//todo

I am not quite sure this will be correct for the cases where pdev->domain != 
dom0,
e.g. in the series for PCI passthrough for Arm this can be any guest. For such 
cases
we'll force running the loop for dom_xen which I am not sure is desirable.

It is surely not desirable, but it also doesn't happen - see the
is_hardware_domain() check further down (keeping context below).

Right



Another question is why such a hidden device has its pdev->domain not set 
correctly,
so we need to work this around?

Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
("PCI: don't allow guest assignment of devices used by Xen")
introducing that temporary override. To permit limited
visibility to Dom0, these devices still need setting up in the
IOMMU for Dom0. Consequently BAR overlap detection also needs
to take these into account (i.e. the goal here is not just to
prevent triggering the ASSERT() in question).


So, why don't we set pdev->domain = dom_xen for such devices and call

modify_bars or something from pci_hide_device for instance (I didn't get too

much into implementation details though)? If pci_hide_device already handles

such exceptions, then it should also take care of the correct BAR overlaps etc.

Otherwise it looks like we put some unrelated logic into vpci which is for 
hiding

the devices (on x86).

Thank you,

Oleksandr



Jan


@@ -308,6 +311,7 @@ static int modify_bars(const struct pci_
   }
   }
   }
+if ( !is_hardware_domain(d) ) break; }//todo
   
   ASSERT(dev);
   







Re: [PATCH RFC] vPCI: account for hidden devices in modify_bars()

2021-08-31 Thread Jan Beulich
On 31.08.2021 07:35, Oleksandr Andrushchenko wrote:
> Hello, Jan!
> 
> On 30.08.21 16:04, Jan Beulich wrote:
>> Hidden devices (e.g. an add-in PCI serial card used for Xen's serial
>> console) are associated with DomXEN, not Dom0. This means that while
>> looking for overlapping BARs such devices cannot be found on Dom0's
>> list of devices; DomXEN's list also needs to be scanned.
>>
>> Signed-off-by: Jan Beulich 
>> ---
>> RFC: Patch intentionally mis-formatted, as the necessary re-indentation
>>   would make the diff difficult to read. At this point I'd merely
>>   like to gather input towards possible better approaches to solve
>>   the issue (not the least because quite possibly there are further
>>   places needing changing).
>>
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -206,6 +206,7 @@ static int modify_bars(const struct pci_
>>   struct vpci_header *header = &pdev->vpci->header;
>>   struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>>   struct pci_dev *tmp, *dev = NULL;
>> +const struct domain *d;
>>   const struct vpci_msix *msix = pdev->vpci->msix;
>>   unsigned int i;
>>   int rc;
>> @@ -265,7 +266,8 @@ static int modify_bars(const struct pci_
>>* Check for overlaps with other BARs. Note that only BARs that are
>>* currently mapped (enabled) are checked for overlaps.
>>*/
>> -for_each_pdev ( pdev->domain, tmp )
>> +for ( d = pdev->domain; ; d = dom_xen ) {//todo
> 
> I am not quite sure this will be correct for the cases where pdev->domain != 
> dom0,
> e.g. in the series for PCI passthrough for Arm this can be any guest. For 
> such cases
> we'll force running the loop for dom_xen which I am not sure is desirable.

It is surely not desirable, but it also doesn't happen - see the
is_hardware_domain() check further down (keeping context below).

> Another question is why such a hidden device has its pdev->domain not set 
> correctly,
> so we need to work this around?

Please see _setup_hwdom_pci_devices() and commit e46ea4d44dc0
("PCI: don't allow guest assignment of devices used by Xen")
introducing that temporary override. To permit limited
visibility to Dom0, these devices still need setting up in the
IOMMU for Dom0. Consequently BAR overlap detection also needs
to take these into account (i.e. the goal here is not just to
prevent triggering the ASSERT() in question).

Jan

>> @@ -308,6 +311,7 @@ static int modify_bars(const struct pci_
>>   }
>>   }
>>   }
>> +if ( !is_hardware_domain(d) ) break; }//todo
>>   
>>   ASSERT(dev);
>>   
>>
>>
> 
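The control flow of the two hunks quoted above — iterate the owning domain's device list, then fall through to DomXEN's list only when the caller is the hardware domain — can be modeled with a small self-contained sketch. All types and names here (bar_range, sim_device, sim_domain, bar_overlaps) are simplified stand-ins for illustration, not Xen's actual structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bar_range { uint64_t start, end; bool enabled; };

struct sim_device { struct bar_range bars[6]; size_t nr_bars; };

struct sim_domain {
    bool is_hardware_domain;
    struct sim_device *devs;
    size_t nr_devs;
};

static bool ranges_overlap(const struct bar_range *a, const struct bar_range *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/*
 * Check @bar against all enabled BARs of @d's devices and, when @d is the
 * hardware domain, also against @dom_xen's (hidden) devices: the same
 * two-iteration pattern as the RFC patch. @dom_xen must not itself be
 * flagged as hardware domain, or the loop would not terminate.
 */
static bool bar_overlaps(const struct bar_range *bar,
                         const struct sim_domain *d,
                         const struct sim_domain *dom_xen)
{
    for ( ; ; d = dom_xen )
    {
        for ( size_t i = 0; i < d->nr_devs; i++ )
            for ( size_t j = 0; j < d->devs[i].nr_bars; j++ )
            {
                const struct bar_range *tmp = &d->devs[i].bars[j];

                if ( tmp->enabled && ranges_overlap(bar, tmp) )
                    return true;
            }

        /* Non-hardware domains (guests, and DomXEN on the second pass)
         * stop here, matching the is_hardware_domain() check. */
        if ( !d->is_hardware_domain )
            break;
    }

    return false;
}
```

This also illustrates the point from the thread about Arm passthrough: for a guest (non-hardware) domain the loop runs exactly once and never touches DomXEN's list.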




[libvirt test] 164641: regressions - FAIL

2021-08-31 Thread osstest service owner
flight 164641 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/164641/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  50809fcc86fcbf9e0bb0bc1154780c7556f43031
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  417 days
Failing since151818  2020-07-11 04:18:52 Z  416 days  406 attempts
Testing same since   164539  2021-08-28 11:46:04 Z2 days3 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  Dmytro Linkin 
  Eiichi Tsukata 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Ian Wienand 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jinsheng Zhang 
  Jiri Denemark 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Richard W.M. Jones 
  Ricky Tigg 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 
  Shi Lei 
  simmon 
  Simon Chopin 
  Simon Gaiser 
  Simon Rowe 
  Stefan Bader 
  Stefan Berger 
  Stefan Berger 
  Stefan Hajnoczi 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Tomáš Janoušek 
  Tuguoyi 
  Ville Skyttä 
  Vinayak Kale 
  Wang Xin 
  WangJian 
  Weblate 
  Wei Liu 
  Wei Liu 
  William Douglas 
  Yalei Li <274268...@qq.com>
  Yalei Li 
  Yang Fei 
  Yang Hang 
  Yanqiu Zhang 
  Yaroslav Kargin 
  Yi Li 
  Yi Wang 
  Yuri Chornoivan 
  Zbigniew Jędrzejewski-Szmek 
  zhangjl02 
  Zheng Chuan 
  zhenwei pi 
  Zhenyu Ye 
  Zhenyu Zheng 
  Zhenzhong Duan 

jobs:
 build-amd64-xsm  pass   

Re: Enabling hypervisor agnosticism for VirtIO backends

2021-08-31 Thread AKASHI Takahiro
Wei,

On Thu, Aug 26, 2021 at 12:10:19PM +, Wei Chen wrote:
> Hi Akashi,
> 
> > -Original Message-
> > From: AKASHI Takahiro 
> > Sent: 26 August 2021 17:41
> > To: Wei Chen 
> > Cc: Oleksandr Tyshchenko ; Stefano Stabellini
> > ; Alex Benn??e ; Kaly Xin
> > ; Stratos Mailing List ;
> > virtio-...@lists.oasis-open.org; Arnd Bergmann ;
> > Viresh Kumar ; Stefano Stabellini
> > ; stefa...@redhat.com; Jan Kiszka
> > ; Carl van Schaik ;
> > prat...@quicinc.com; Srivatsa Vaddagiri ; Jean-
> > Philippe Brucker ; Mathieu Poirier
> > ; Oleksandr Tyshchenko
> > ; Bertrand Marquis
> > ; Artem Mygaiev ; Julien
> > Grall ; Juergen Gross ; Paul Durrant
> > ; Xen Devel 
> > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> >
> > Hi Wei,
> >
> > On Fri, Aug 20, 2021 at 03:41:50PM +0900, AKASHI Takahiro wrote:
> > > On Wed, Aug 18, 2021 at 08:35:51AM +, Wei Chen wrote:
> > > > Hi Akashi,
> > > >
> > > > > -Original Message-
> > > > > From: AKASHI Takahiro 
> > > > > Sent: 18 August 2021 13:39
> > > > > To: Wei Chen 
> > > > > Cc: Oleksandr Tyshchenko ; Stefano Stabellini
> > > > > ; Alex Benn??e ;
> > Stratos
> > > > > Mailing List ; virtio-
> > dev@lists.oasis-
> > > > > open.org; Arnd Bergmann ; Viresh Kumar
> > > > > ; Stefano Stabellini
> > > > > ; stefa...@redhat.com; Jan Kiszka
> > > > > ; Carl van Schaik
> > ;
> > > > > prat...@quicinc.com; Srivatsa Vaddagiri ;
> > Jean-
> > > > > Philippe Brucker ; Mathieu Poirier
> > > > > ; Oleksandr Tyshchenko
> > > > > ; Bertrand Marquis
> > > > > ; Artem Mygaiev ;
> > Julien
> > > > > Grall ; Juergen Gross ; Paul
> > Durrant
> > > > > ; Xen Devel 
> > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > >
> > > > > On Tue, Aug 17, 2021 at 08:39:09AM +, Wei Chen wrote:
> > > > > > Hi Akashi,
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: AKASHI Takahiro 
> > > > > > > Sent: 17 August 2021 16:08
> > > > > > > To: Wei Chen 
> > > > > > > Cc: Oleksandr Tyshchenko ; Stefano
> > Stabellini
> > > > > > > ; Alex Benn??e ;
> > > > > Stratos
> > > > > > > Mailing List ; virtio-
> > > > > dev@lists.oasis-
> > > > > > > open.org; Arnd Bergmann ; Viresh Kumar
> > > > > > > ; Stefano Stabellini
> > > > > > > ; stefa...@redhat.com; Jan Kiszka
> > > > > > > ; Carl van Schaik
> > ;
> > > > > > > prat...@quicinc.com; Srivatsa Vaddagiri ;
> > Jean-
> > > > > > > Philippe Brucker ; Mathieu Poirier
> > > > > > > ; Oleksandr Tyshchenko
> > > > > > > ; Bertrand Marquis
> > > > > > > ; Artem Mygaiev
> > ;
> > > > > Julien
> > > > > > > Grall ; Juergen Gross ; Paul
> > Durrant
> > > > > > > ; Xen Devel 
> > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > > > >
> > > > > > > Hi Wei, Oleksandr,
> > > > > > >
> > > > > > > On Mon, Aug 16, 2021 at 10:04:03AM +, Wei Chen wrote:
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > Thanks for Stefano to link my kvmtool for Xen proposal here.
> > > > > > > > This proposal is still discussing in Xen and KVM communities.
> > > > > > > > The main work is to decouple the kvmtool from KVM and make
> > > > > > > > other hypervisors can reuse the virtual device implementations.
> > > > > > > >
> > > > > > > > In this case, we need to introduce an intermediate hypervisor
> > > > > > > > layer for VMM abstraction, Which is, I think it's very close
> > > > > > > > to stratos' virtio hypervisor agnosticism work.
> > > > > > >
> > > > > > > # My proposal[1] comes from my own idea and doesn't always
> > represent
> > > > > > > # Linaro's view on this subject nor reflect Alex's concerns.
> > > > > Nevertheless,
> > > > > > >
> > > > > > > Your idea and my proposal seem to share the same background.
> > > > > > > Both have the similar goal and currently start with, at first,
> > Xen
> > > > > > > and are based on kvm-tool. (Actually, my work is derived from
> > > > > > > EPAM's virtio-disk, which is also based on kvm-tool.)
> > > > > > >
> > > > > > > In particular, the abstraction of hypervisor interfaces has a
> > same
> > > > > > > set of interfaces (for your "struct vmm_impl" and my "RPC
> > interfaces").
> > > > > > > This is not co-incident as we both share the same origin as I
> > said
> > > > > above.
> > > > > > > And so we will also share the same issues. One of them is a way
> > of
> > > > > > > "sharing/mapping FE's memory". There is some trade-off between
> > > > > > > the portability and the performance impact.
> > > > > > > So we can discuss the topic here in this ML, too.
> > > > > > > (See Alex's original email, too).
> > > > > > >
> > > > > > Yes, I agree.
> > > > > >
> > > > > > > On the other hand, my approach aims to create a "single-binary"
> > > > > solution
> > > > > > > in which the same binary of BE vm could run on any hypervisors.
> > > > > > > Somehow similar to your "proposal-#2" in [2], but in my solution,
> > all
> > > > > > > the hypervisor-specific code would be put into another entity
> > (VM),
> > > > > > > named