[Xen-devel] [xen-unstable test] 131483: regressions - FAIL
flight 131483 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/131483/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 131423 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 131423 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 131423 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 131423 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 131423 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 131423 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 131423 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 131423 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 131423 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 131423 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass version targeted for testing: xen 7183e86a29c3fe15078eb0b8c11d3e556c22effa baseline version: xen 3fd3fda9c26fc3c4f77250f795ed7ff9d38e2ec6 Last test of basis 131423 2018-12-18 02:18:18 Z4 days Failing since131444 2018-12-19 13:39:27 Z2 days2 attempts Testing same since 131483 2018-12-21 04:26:02 Z1 days1 attempts People who touched revisions under test: Andrew Cooper Andrii Anisov Brian Woods Edgar E. Iglesias Ian Jackson Jan Beulich Julien Grall Matthew Daley Razvan Cojocaru Stefano Stabellini Stefano Stabellini jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-prev
[Xen-devel] [freebsd-master test] 131488: all pass - PUSHED
flight 131488 freebsd-master real [real] http://logs.test-lab.xenproject.org/osstest/logs/131488/ Perfect :-) All tests in this flight passed as required version targeted for testing: freebsd de7f82e0773386e0171af4a2fdcb2141fd1354cc baseline version: freebsd 204d7bf6cddd87478f9c1a6bb55f482d87cf2eaa Last test of basis 131440 2018-12-19 09:19:49 Z2 days Testing same since 131488 2018-12-21 09:20:53 Z0 days1 attempts People who touched revisions under test: bcran bde cem dab emaste imp markj mckusick mjg mw np rmacklem tuexen yuripv jobs: build-amd64-freebsd-againpass build-amd64-freebsd pass build-amd64-xen-freebsd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/freebsd.git 204d7bf6cdd..de7f82e0773 de7f82e0773386e0171af4a2fdcb2141fd1354cc -> tested/master ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [linux-3.18 test] 131479: regressions - FAIL
flight 131479 linux-3.18 real [real] http://logs.test-lab.xenproject.org/osstest/logs/131479/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-examine 8 reboot fail REGR. vs. 128858 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-rumprun-i386 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-multivcpu 7 xen-bootfail REGR. vs. 128858 test-amd64-i386-qemuu-rhel6hvm-intel 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 128858 test-amd64-i386-xl7 xen-boot fail REGR. vs. 128858 test-amd64-i386-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 128858 test-amd64-amd64-xl-xsm 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-rumprun-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 128858 test-amd64-amd64-xl-pvshim7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-i386-pvgrub 7 xen-boot fail REGR. vs. 128858 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host fail REGR. vs. 128858 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host fail REGR. vs. 128858 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 128858 test-amd64-i386-freebsd10-amd64 7 xen-boot fail REGR. vs. 128858 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-boot fail REGR. vs. 128858 Tests which are failing intermittently (not blocking): test-armhf-armhf-xl-cubietruck 16 guest-start/debian.repeat fail in 131420 pass in 131479 test-armhf-armhf-xl-credit1 6 xen-install fail in 131442 pass in 131479 test-amd64-amd64-qemuu-nested-intel 7 xen-bootfail pass in 131420 test-amd64-i386-freebsd10-i386 7 xen-boot fail pass in 131420 test-armhf-armhf-xl-multivcpu 16 guest-start/debian.repeat fail pass in 131442 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-arndale 16 guest-start/debian.repeatfail like 128807 test-armhf-armhf-libvirt-raw 10 debian-di-installfail like 128841 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 128858 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 128858 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 128858 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 128858 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 128858 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 16 guest-localmigrate/x10 fail like 128858 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-amd64-xl-pvhv2-intel 12 guest-start fail never pass test-amd64-amd64-xl-pvhv2-amd 12 guest-start fail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14
[Xen-devel] [ovmf test] 131490: regressions - FAIL
flight 131490 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/131490/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-i3866 xen-buildfail REGR. vs. 129475 build-i386-xsm6 xen-buildfail REGR. vs. 129475 build-amd64-xsm 6 xen-buildfail REGR. vs. 129475 build-amd64 6 xen-buildfail REGR. vs. 129475 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a18f784cfdbe17855ec4376e80db927e1a81aaca baseline version: ovmf 5ae3184d8c59f7bbb84bad482df6b8020ba58188 Last test of basis 129475 2018-11-05 21:13:11 Z 46 days Failing since129526 2018-11-06 20:49:26 Z 45 days 163 attempts Testing same since 131490 2018-12-21 10:01:12 Z0 days1 attempts People who touched revisions under test: Achin Gupta Ard Biesheuvel Bob Feng bob.c.f...@intel.com BobCF Chasel Chiu Chasel, Chiu Chen A Chen Dandan Bi David Wei Derek Lin Eric Dong Feng, Bob C Fu Siyuan Gary Lin Hao Wu Jaben Carsey Jeff Brasen Jian J Wang Jiaxin Wu Jiewen Yao Laszlo Ersek Leif Lindholm Liming Gao Liu Yu Marc Zyngier Marcin Wojtas Mike Maslenkin Ming Huang Pedroa Liu Ruiyu Ni shenglei Shenglei Zhang Star Zeng Sughosh Ganu Sumit Garg Sun, Zailiang Thomas Abraham Ting Ye Tomasz Michalec Vijayenthiran Subramaniam Vladimir Olovyannikov Wang BinX A Wu Jiaxin Ye Ting Yonghong Zhu yuchenlin Zailiang Sun Zhang, Chao B Zhao, ZhiqiangX Zhiju.Fan zhijufan ZhiqiangX Zhao zwei4 jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 4306 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [linux-linus test] 131475: regressions - FAIL
flight 131475 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/131475/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-rumprun-amd64 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-qcow2 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-multivcpu 7 xen-bootfail REGR. vs. 125898 test-amd64-amd64-pygrub 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-libvirt-vhd 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-qemuu-win7-amd64 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-examine 8 reboot fail REGR. vs. 125898 test-amd64-amd64-xl-pvhv2-intel 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host fail REGR. vs. 125898 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host fail REGR. vs. 125898 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 125898 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 125898 test-amd64-i386-xl-qemut-win10-i386 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl-shadow 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-rumprun-i386 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl7 xen-boot fail REGR. vs. 125898 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-xl-qemut-debianhvm-amd64 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 125898 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 125898 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 125898 test-amd64-i386-libvirt 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 125898 test-amd64-i386-libvirt-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 125898 test-amd64-amd64-libvirt-xsm 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-examine 8 reboot fail REGR. vs. 125898 test-amd64-i386-qemut-rhel6hvm-amd 12 guest-start/redhat.repeat fail REGR. vs. 125898 test-armhf-armhf-xl 7 xen-boot fail REGR. vs. 125898 test-amd64-amd64-qemuu-nested-intel 17 debian-hvm-install/l1/l2 fail REGR. vs. 125898 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 7 xen-boot fail REGR. vs. 125898 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-credit1 7 xen-bootfail baseline untested test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 125898 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 125898 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 125898 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 125898 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 125898 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 125898 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 125898 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass
[Xen-devel] [xen-unstable-smoke test] 131502: tolerable all pass - PUSHED
flight 131502 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/131502/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 9d357cbaf74f0c1dc85a16498dac6d819743ce38 baseline version: xen d3ce6380c1f71f9575115750e4ed707911529345 Last test of basis 131500 2018-12-21 18:00:50 Z0 days Testing same since 131502 2018-12-21 22:01:26 Z0 days1 attempts People who touched revisions under test: George Dunlap Ian Jackson Wei Liu jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git d3ce6380c1..9d357cbaf7 9d357cbaf74f0c1dc85a16498dac6d819743ce38 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 13/25] argo: implement the register op
On Fri, Dec 21, 2018 at 12:53 AM Jan Beulich wrote: > > >>> On 21.12.18 at 09:16, wrote: > > On Thu, Dec 20, 2018 at 11:28 PM Jan Beulich wrote: > >> > >> >>> On 21.12.18 at 02:25, wrote: > >> > On Thu, Dec 20, 2018 at 12:29 AM Jan Beulich wrote: > >> >> > >> >> >>> On 20.12.18 at 06:29, wrote: > >> >> > On Wed, Dec 12, 2018 at 1:48 AM Jan Beulich wrote: > >> >> >> > >> >> >> > +static int > >> >> >> > +argo_find_ring_mfns(struct domain *d, struct argo_ring_info > >> >> >> > *ring_info, > >> >> >> > +uint32_t npage, > >> >> >> > XEN_GUEST_HANDLE_PARAM(argo_pfn_t) pfn_hnd, > >> >> >> > +uint32_t len) > >> >> >> > +{ > >> >> >> > +int i; > >> >> >> > +int ret = 0; > >> >> >> > + > >> >> >> > +if ( (npage << PAGE_SHIFT) < len ) > >> >> >> > +return -EINVAL; > >> >> >> > + > >> >> >> > +if ( ring_info->mfns ) > >> >> >> > +{ > >> >> >> > +/* > >> >> >> > + * Ring already existed. Check if it's the same ring, > >> >> >> > + * i.e. same number of pages and all translated gpfns > >> >> >> > still > >> >> >> > + * translating to the same mfns > >> >> >> > + */ > >> >> >> > >> >> >> This comment makes me wonder whether the translations are > >> >> >> permitted to change at other times. If so I'm not sure what > >> >> >> value verification here has. If not, this probably would want to > >> >> >> be debugging-only code. > >> >> > > >> >> > My understanding is that the gfn->mfn translation is not necessarily > >> >> > stable > >> >> > across entry and exit from host power state S4, suspend to disk. > > Note this ^^^ (and see below). > > >> Now I'm afraid there's some confusion here: Originally you've > >> said "host". > >> > >> >> How would that be? It's not stable across guest migration (or > >> >> its non-live save/restore equivalent), > >> > > >> > Right, that's clear. > >> > > >> >> but how would things change across S3? > >> > > >> > I don't think that they do change in that case. > >> > > >> > From studying the code involved above, a related item: the guest runs > >> > the same > >> > suspend and resume kernel code before entering into/exiting from either > >> > guest > >> > S3 or S4, so the guest kernel resume code needs to re-register the > >> > rings, to > >> > cover the case where it is coming up in an environment where they were > >> > dropped > >> > - so that's what it does. > >> > > >> > This relates to the code section above: if guest entry to S3 is aborted > >> > at the > >> > final step (eg. error or platform refuses, eg. maybe a physical device > >> > interaction with passthrough) then the hypervisor has not torn down the > >> > rings, > >> > the guest remains running within the same domain, and the guest resume > >> > logic > >> > runs, which runs through re-registration for all its rings. The check in > >> > the > >> > logic above allows the existing ring mappings within the hypervisor to be > >> > preserved. > >> > >> Yet now you suddenly talk about guest S3. > > > > Well, the context is that you did just ask about S3, without > > specifying host or guest. > > I'm sorry to be picky, but no, I don't think I did. You did expicitly > say "host", making me further think only about that case. OK, apologies for the confusing direction of the reply. It was not intended to be so. > > That logic aims to make ring registration idempotent, to avoid the > > teardown of established mappings of the ring pages in the case where > > doing so isn't needed. > > You treat complexity in the kernel for complexity in the hypervisor. (s/treat/trade/ ?) OK, that is a fair concern, yes. > I'm not sure this is appropriate, as I can't judge how much more > difficult it would be for the guest to look after itself. But let's look > at both cases again: > - For guest S3, afaik, the domain doesn't change, and hence > memory assignment remains the same. No re-registration > necessary then afaict. > - For guest S4, aiui, the domain gets destroyed and a new one > built upon resume. Re-registration would be needed, but due > to the domain re-construction no leftovers ought to exist in > Xen. I agree. > Hence to me it would seem more natural to have the guest deal > with the situation, and have no extra logic for this in Xen. You > want the guest to re-register anyway, yet simply avoiding to > do so in the S3 case ought to be a single (or very few) > conditional(s), i.e. not a whole lot of complexity. OK. That looks doable. thanks. Christopher ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 5/5] pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
When a connection is closing we receive on pvcalls_sk_state_change notification. Instead of setting the connection as closed immediately (-ENOTCONN), let's read one more time from it: pvcalls_conn_back_read will set the connection as closed when necessary. That way, we avoid races between pvcalls_sk_state_change and pvcalls_back_ioworker. Signed-off-by: Stefano Stabellini --- drivers/xen/pvcalls-back.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index 2e5d845..71b6287 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -160,9 +160,10 @@ static void pvcalls_conn_back_read(void *opaque) /* write the data, then modify the indexes */ virt_wmb(); - if (ret < 0) + if (ret < 0) { + atomic_set(>read, 0); intf->in_error = ret; - else + } else intf->in_prod = prod + ret; /* update the indexes, then notify the other end */ virt_wmb(); @@ -288,7 +289,7 @@ static void pvcalls_sk_state_change(struct sock *sock) return; intf = map->ring; - intf->in_error = -ENOTCONN; + atomic_inc(>read); notify_remote_via_irq(map->irq); } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 4/5] pvcalls-front: don't return error when the ring is full
When the ring is full, size == array_size. It is not an error condition, so simply return 0 instead of an error. Signed-off-by: Stefano Stabellini --- drivers/xen/pvcalls-front.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 0158858..1a893a1 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -475,8 +475,10 @@ static int __write_ring(struct pvcalls_data_intf *intf, virt_mb(); size = pvcalls_queued(prod, cons, array_size); - if (size >= array_size) + if (size > array_size) return -EINVAL; + if (size == array_size) + return 0; if (len > array_size - size) len = array_size - size; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 3/5] pvcalls-front: properly allocate sk
Don't use kzalloc: it ends up leaving sk->sk_prot not properly initialized. Use sk_alloc instead and define our own trivial struct proto. Signed-off-by: Stefano Stabellini --- drivers/xen/pvcalls-front.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 4f3d664..0158858 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -31,6 +31,12 @@ #define PVCALLS_NR_RSP_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE) #define PVCALLS_FRONT_MAX_SPIN 5000 +static struct proto pvcalls_proto = { + .name = "PVCalls", + .owner = THIS_MODULE, + .obj_size = sizeof(struct sock), +}; + struct pvcalls_bedata { struct xen_pvcalls_front_ring ring; grant_ref_t ref; @@ -837,7 +843,7 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags) received: map2->sock = newsock; - newsock->sk = kzalloc(sizeof(*newsock->sk), GFP_KERNEL); + newsock->sk = sk_alloc(sock_net(sock->sk), PF_INET, GFP_KERNEL, _proto, false); if (!newsock->sk) { bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID; map->passive.inflight_req_id = PVCALLS_INVALID_ID; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 0/5] pvcalls fixes
Hi all, This series is a collection of small pvcalls fixes. Cheers, Stefano Stefano Stabellini (5): pvcalls-front: read all data before closing the connection pvcalls-front: don't try to free unallocated rings pvcalls-front: properly allocate sk pvcalls-front: don't return error when the ring is full pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read drivers/xen/pvcalls-back.c | 7 --- drivers/xen/pvcalls-front.c | 20 +--- 2 files changed, 17 insertions(+), 10 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/5] pvcalls-front: read all data before closing the connection
When a connection is closing in_error is set to ENOTCONN. There could still be outstanding data on the ring left by the backend. Before closing the connection on the frontend side, drain the ring. Signed-off-by: Stefano Stabellini --- drivers/xen/pvcalls-front.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 77224d8..e5d95aa 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -560,15 +560,13 @@ static int __read_ring(struct pvcalls_data_intf *intf, error = intf->in_error; /* get pointers before reading from the ring */ virt_rmb(); - if (error < 0) - return error; size = pvcalls_queued(prod, cons, array_size); masked_prod = pvcalls_mask(prod, array_size); masked_cons = pvcalls_mask(cons, array_size); if (size == 0) - return 0; + return error ?: size; if (len > size) len = size; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 13/25] argo: implement the register op
On Thu, Dec 20, 2018 at 4:52 AM Roger Pau Monné wrote: > > On Wed, Dec 19, 2018 at 09:41:59PM -0800, Christopher Clark wrote: > > On Wed, Dec 12, 2018 at 8:48 AM Roger Pau Monné > > wrote: > > > > > > On Fri, Nov 30, 2018 at 05:32:52PM -0800, Christopher Clark wrote: > > > > +static inline uint16_t > > > > +argo_hash_fn(const struct argo_ring_id *id) > > > > > > No need for the argo_ prefix for static functions, this is already an > > > argo specific file. > > > > Although the compiler could live without the prefix, I'm finding it helpful > > to > > very easily determine that functions being used are not defined elsewhere > > within Xen; so I've left the prefix as is for version two of this series. > > Why do you care whether they are defined elsewhere in Xen? The scope > of static functions is limited to the translation unit anyway. ok, I'll remove the prefixes - you're right that I shouldn't care whether they are defined elsewhere in Xen, and Jan's points about the string table expansion and serial line bandwidth are true - I had not considered those. Would adding a note to describe this reasoning to the CODING_STYLE document be welcome? > > > > +#else > > > > +*mfn = p2m_lookup(d, _gfn(pfn), ); > > > > +#endif > > > > + > > > > +if ( !mfn_valid(*mfn) ) > > > > +ret = -EINVAL; > > > > +#ifdef CONFIG_X86 > > > > +else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) ) > > > > +ret = -EAGAIN; > > > > +#endif > > > > +else if ( (p2mt != p2m_ram_rw) || > > > > + !get_page_and_type(mfn_to_page(*mfn), d, > > > > PGT_writable_page) ) > > > > +ret = -EINVAL; > > > > + > > > > +#ifdef CONFIG_X86 > > > > +put_gfn(d, pfn); > > > > > > If you do this put_gfn here, by the time you check that the gfn -> mfn > > > matches your expectations the guest might have somehow changed the gfn > > > -> mfn mapping already (for example by ballooning down memory?) > > > > If the guest does that, I think it only harms itself. If for some reason > > a memory access is denied, then the op would just fail. I don't think > > there's a more serious consequence to be worried about. > > Then I wonder why you need such check in any case if the code can > handle such cases, the more than the check itself is racy. OK, so at the root of the question here is: does it matter what the p2m type of the memory is at these points: 1) when the gfn is translated to mfn, at the time of ring registration 2) when the hypervisor writes into guest memory: - where the tx_ptr index is initialized in the register op - where ringbuf data is written in sendv - where ring description data is written in notify or is having PGT_writable_page type and ownership by the domain sufficient? For 1), I think there's some use in saying no to a guest that has supplied a region that appears misconfigured. For 2), input would be appreciated. It currently works under the assumption that a p2m type check is unnecessary, which is why the put_gfn is where it is. For further background context, here's my understanding of this section: When the guest invokes the hypercall operation to register a ring, it identifies the memory that it owns and wants the hypervisor to use by supplying an array of gfns (or in v2, addresses which are shifted to extract their gfns). The hypervisor translates from gfns to mfns, using the translation that exists at that time, and then refers to that memory internally by mfn from there on out. This find_ring_mfn function is where the gfn->mfn translation happens. (The variable name does needs renaming from pfn, as you noted - thanks.) To do the translation from gfn to mfn, (on x86) it's using get_gfn_unshare. That's doing three things: * returns the mfn, if there is one. * returns the p2m type of that memory. * acquires a reference to that gfn, which needs to be dropped at some point. The p2m type type check on the gfn in find_ring_mfn at that time is possibly conservative, rejecting more types than perhaps it needs to, but the type that it accepts (p2m_ram_rw) is sane. It is a validation of the p2m type at that instant, intended to detect if the guest has supplied memory to the ring register op that does not make sense for it to use as a ring, as indicated by the current p2m type, and if so, fail early, or indicate that a retry later is needed. Then the get_page_and_type call is where the memory identified by the mfn that was just obtained, gets locked to PGT_writable_page type, and ownership fixed to its current owner domain, by adding to its reference count. Then the gfn reference count is dropped with the put_gfn call. This means that the guest can elect to change the p2m type afterwards, if it wants; (any change needs to be consistent with its domain ownership and PGT_writable_page type though -- not sure if that constrains possible types). That memory can have guest-supplied data written into it, either by the domain owning the page itself, or in
[Xen-devel] [PATCH 2/5] pvcalls-front: don't try to free unallocated rings
inflight_req_id is 0 when initialized. If inflight_req_id is 0, there is no accept_map to free. Fix the check in pvcalls_front_release. Signed-off-by: Stefano Stabellini --- drivers/xen/pvcalls-front.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index e5d95aa..4f3d664 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -1030,8 +1030,8 @@ int pvcalls_front_release(struct socket *sock) spin_lock(>socket_lock); list_del(>list); spin_unlock(>socket_lock); - if (READ_ONCE(map->passive.inflight_req_id) != - PVCALLS_INVALID_ID) { + if (READ_ONCE(map->passive.inflight_req_id) != PVCALLS_INVALID_ID && + READ_ONCE(map->passive.inflight_req_id) != 0) { pvcalls_front_free_map(bedata, map->passive.accept_map); } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 131500: tolerable all pass - PUSHED
flight 131500 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/131500/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen d3ce6380c1f71f9575115750e4ed707911529345 baseline version: xen af80e1d9c79e3bcb392775f311d20eec54b3389b Last test of basis 131491 2018-12-21 11:01:16 Z0 days Testing same since 131500 2018-12-21 18:00:50 Z0 days1 attempts People who touched revisions under test: Benjamin Sanda Julien Grall Stefano Stabellini Wei Liu jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git af80e1d9c7..d3ce6380c1 d3ce6380c1f71f9575115750e4ed707911529345 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [RFC v2 3/4] pvh: Add x86/HVM direct boot ABI header file
From: Liam Merwick The x86/HVM direct boot ABI permits Qemu to be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement. https://xenbits.xen.org/docs/unstable/misc/pvh.html This commit adds the header file that defines the start_info struct that needs to be populated in order to use this ABI. The canonical version of start_info.h is in the Xen codebase. (like QEMU, the Linux kernel uses a copy as well). Signed-off-by: Liam Merwick Reviewed-by: Konrad Rzeszutek Wilk --- include/hw/xen/start_info.h | 146 1 file changed, 146 insertions(+) create mode 100644 include/hw/xen/start_info.h diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h new file mode 100644 index ..348779eb10cd --- /dev/null +++ b/include/hw/xen/start_info.h @@ -0,0 +1,146 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2016, Citrix Systems, Inc. + */ + +#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ +#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ + +/* + * Start of day structure passed to PVH guests and to HVM guests in %ebx. + * + * NOTE: nothing will be loaded at physical address 0, so a 0 value in any + * of the address fields should be treated as not present. + * + * 0 ++ + *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE + *|| ("xEn3" with the 0x80 bit of the "E" set). + * 4 ++ + *| version| Version of this structure. Current version is 1. New + *|| versions are guaranteed to be backwards-compatible. + * 8 ++ + *| flags | SIF_xxx flags. + * 12 ++ + *| nr_modules | Number of modules passed to the kernel. + * 16 ++ + *| modlist_paddr | Physical address of an array of modules + *|| (layout of the structure below). + * 24 ++ + *| cmdline_paddr | Physical address of the command line, + *|| a zero-terminated ASCII string. + * 32 ++ + *| rsdp_paddr | Physical address of the RSDP ACPI data structure. + * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + *|| Zero if there is no memory map being provided. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ + * + * The layout of each entry in the module structure is the following: + * + * 0 ++ + *| paddr | Physical address of the module. + * 8 ++ + *| size | Size of the module in bytes. + * 16 ++ + *| cmdline_paddr | Physical address of the command line, + *|| a zero-terminated ASCII string. + * 24 ++ + *| reserved | + * 32 ++ + * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. E820_TYPE_xxx, for example. + * 20 +| + *| reserved | + * 24 ++ + * + * The address and sizes are always a 64bit little endian unsigned integer. + * + * NB: Xen on x86 will always try to place all the data below the 4GiB + * boundary. + * + * Version numbers of the hvm_start_info structure have evolved
[Xen-devel] [RFC v2 4/4] pvh: Boot uncompressed kernel using direct boot ABI
These changes (along with corresponding Linux kernel and qboot changes) enable a guest to be booted using the x86/HVM direct boot ABI. This commit adds a load_elfboot() routine to pass the size and location of the kernel entry point to qboot (which will fill in the start_info struct information needed to to boot the guest). Having loaded the ELF binary, load_linux() will run qboot which continues the boot. The address for the kernel entry point is read from an ELF Note in the uncompressed kernel binary by a helper routine passed to load_elf(). Co-developed-by: George Kennedy Signed-off-by: George Kennedy Signed-off-by: Liam Merwick --- hw/i386/pc.c | 136 +- include/elf.h | 10 + 2 files changed, 145 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 115bc2825ce4..6d44a14da44d 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -54,6 +54,7 @@ #include "sysemu/qtest.h" #include "kvm_i386.h" #include "hw/xen/xen.h" +#include "hw/xen/start_info.h" #include "ui/qemu-spice.h" #include "exec/memory.h" #include "exec/address-spaces.h" @@ -109,6 +110,9 @@ static struct e820_entry *e820_table; static unsigned e820_entries; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; +/* Physical Address of PVH entry point read from kernel ELF NOTE */ +static size_t pvh_start_addr; + void gsi_handler(void *opaque, int n, int level) { GSIState *s = opaque; @@ -834,6 +838,109 @@ struct setup_data { uint8_t data[0]; } __attribute__((packed)); + +/* + * The entry point into the kernel for PVH boot is different from + * the native entry point. The PVH entry is defined by the x86/HVM + * direct boot ABI and is available in an ELFNOTE in the kernel binary. + * + * This function is passed to load_elf() when it is called from + * load_elfboot() which then additionally checks for an ELF Note of + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to + * parse the PVH entry address from the ELF Note. + * + * Due to trickery in elf_opts.h, load_elf() is actually available as + * load_elf32() or load_elf64() and this routine needs to be able + * to deal with being called as 32 or 64 bit. + * + * The address of the PVH entry point is saved to the 'pvh_start_addr' + * global variable. (although the entry point is 32-bit, the kernel + * binary can be either 32-bit or 64-bit). + */ +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64) +{ +size_t *elf_note_data_addr; + +/* Check if ELF Note header passed in is valid */ +if (arg1 == NULL) { +return 0; +} + +if (is64) { +struct elf64_note *nhdr64 = (struct elf64_note *)arg1; +uint64_t nhdr_size64 = sizeof(struct elf64_note); +uint64_t phdr_align = *(uint64_t *)arg2; +uint64_t nhdr_namesz = nhdr64->n_namesz; + +elf_note_data_addr = +((void *)nhdr64) + nhdr_size64 + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align); +} else { +struct elf32_note *nhdr32 = (struct elf32_note *)arg1; +uint32_t nhdr_size32 = sizeof(struct elf32_note); +uint32_t phdr_align = *(uint32_t *)arg2; +uint32_t nhdr_namesz = nhdr32->n_namesz; + +elf_note_data_addr = +((void *)nhdr32) + nhdr_size32 + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align); +} + +pvh_start_addr = *elf_note_data_addr; + +return pvh_start_addr; +} + +static bool load_elfboot(const char *kernel_filename, + int kernel_file_size, + uint8_t *header, + size_t pvh_xen_start_addr, + FWCfgState *fw_cfg) +{ +uint32_t flags = 0; +uint32_t mh_load_addr = 0; +uint32_t elf_kernel_size = 0; +uint64_t elf_entry; +uint64_t elf_low, elf_high; +int kernel_size; + +if (ldl_p(header) != 0x464c457f) { +return false; /* no elfboot */ +} + +bool elf_is64 = header[EI_CLASS] == ELFCLASS64; +flags = elf_is64 ? +((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags; + +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */ +error_report("elfboot unsupported flags = %x", flags); +exit(1); +} + +uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY; +kernel_size = load_elf(kernel_filename, read_pvh_start_addr, + NULL, _note_type, _entry, + _low, _high, 0, I386_ELF_MACHINE, + 0, 0); + +if (kernel_size < 0) { +error_report("Error while loading elf kernel"); +exit(1); +} +mh_load_addr = elf_low; +elf_kernel_size = elf_high - elf_low; + +if (pvh_start_addr == 0) { +error_report("Error loading uncompressed kernel without PVH ELF Note"); +exit(1); +} +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr); +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr); +
[Xen-devel] [RFC v2 2/4] elf-ops.h: Add get_elf_note_type()
Introduce a routine which, given a pointer to a range of ELF Notes, searches through them looking for a note matching the type specified and returns a pointer to the matching ELF note. Signed-off-by: Liam Merwick --- include/hw/elf_ops.h | 50 ++ 1 file changed, 50 insertions(+) diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h index 37d20a3800c1..ffbdfbe9c2d8 100644 --- a/include/hw/elf_ops.h +++ b/include/hw/elf_ops.h @@ -265,6 +265,49 @@ fail: return ret; } +/* Given 'nhdr', a pointer to a range of ELF Notes, search through them + * for a note matching type 'elf_note_type' and return a pointer to + * the matching ELF note. + */ +static struct elf_note *glue(get_elf_note_type, SZ)(struct elf_note *nhdr, +elf_word note_size, +elf_word phdr_align, +elf_word elf_note_type) +{ +elf_word nhdr_size = sizeof(struct elf_note); +elf_word elf_note_entry_offset = 0; +elf_word note_type; +elf_word nhdr_namesz; +elf_word nhdr_descsz; + +if (nhdr == NULL) { +return NULL; +} + +note_type = nhdr->n_type; +while (note_type != elf_note_type) { +nhdr_namesz = nhdr->n_namesz; +nhdr_descsz = nhdr->n_descsz; + +elf_note_entry_offset = nhdr_size + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align) + +QEMU_ALIGN_UP(nhdr_descsz, phdr_align); + +/* If the offset calculated in this iteration exceeds the +* supplied size, we are done and no matching note was found. +*/ +if (elf_note_entry_offset > note_size) { +return NULL; +} + +/* skip to the next ELF Note entry */ +nhdr = (void *)nhdr + elf_note_entry_offset; +note_type = nhdr->n_type; +} + +return nhdr; +} + static int glue(load_elf, SZ)(const char *name, int fd, uint64_t (*elf_note_fn)(void *, void *, bool), uint64_t (*translate_fn)(void *, uint64_t), @@ -512,6 +555,13 @@ static int glue(load_elf, SZ)(const char *name, int fd, } } + /* Search the ELF notes to find one with a type matching the +* value passed in via 'translate_opaque' +*/ +nhdr = (struct elf_note *)data; + assert(translate_opaque != NULL); +nhdr = glue(get_elf_note_type, SZ)(nhdr, file_size, ph->p_align, + *(uint64_t *)translate_opaque); if (nhdr != NULL) { bool is64 = sizeof(struct elf_note) == sizeof(struct elf64_note); -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
On 11/12/2018 14:57, Liam Merwick wrote: On 11/12/2018 14:01, Stefan Hajnoczi wrote: On Wed, Dec 05, 2018 at 10:37:24PM +, Liam Merwick wrote: From: Liam Merwick The x86/HVM direct boot ABI permits Qemu to be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. https://xenbits.xen.org/docs/unstable/misc/pvh.html This commit adds the header file that defines the start_info struct that needs to be populated in order to use this ABI. Signed-off-by: Maran Wilson Signed-off-by: Liam Merwick Reviewed-by: Konrad Rzeszutek Wilk --- include/hw/xen/start_info.h | 146 1 file changed, 146 insertions(+) create mode 100644 include/hw/xen/start_info.h Does it make sense to bring in Linux include/xen/interface/hvm/start_info.h via QEMU's include/standard-headers/? QEMU has a script in scripts/update-linux-header.sh for syncing Linux headers into include/standard-headers/. This makes it easy to keep Linux header files up-to-date. We basically treat files in include/standard-headers/ as auto-generated. If you define start_info.h yourself without using include/standard-headers/, then it won't be synced with Linux. That does seem better. I will make that change. When attempting to implement this, I found the canonical copy of this header file is actually in Xen and the Linux copy is kept in sync with that. Also, 'make headers_install' doesn't install those Xen headers. Instead I updated the commit comment to mention the canonical copy location. This file isn't expected to change much so I think keeping it in sync in future shouldn't be onerous. Regards, Liam ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [RFC v2 0/4] QEMU changes to do PVH boot
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, QEMU should be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html Details on the Linux changes (v9 staged for 4.21): https://lkml.org/lkml/2018/12/14/1330 qboot pull request: https://github.com/bonzini/qboot/pull/17 This patch series provides QEMU support to read the ELF header of an uncompressed kernel binary and get the 32-bit PVH kernel entry point from an ELF Note. In load_linux() a call is made to load_elfboot() so see if the header matches that of an uncompressed kernel binary (ELF) and if so, loads the binary and determines the kernel entry address from an ELF Note in the binary. Then qboot does futher initialisation of the guest (e820, etc.) and jumps to the kernel entry address and boots the guest. changes v1 -> v2 - Based on feedback from Stefan Hajnoczi - The reading of the PVH entry point is now done in a single pass during elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2 and considerably reworked. - Patch1 adds a new optional function pointer to parse the ELF note type (the type is passed in via the existing translate_opaque arg - the function already had 11 args so I didn't want to add more than one new arg). - Patch2 adds a function to elf_ops.h to find an ELF note matching a specific type - Patch3 just has a line added to the commit message to state that the Xen repo is the canonical location - Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1 just minor load_elfboot() changes and the addition of a read_pvh_start_addr() helper function for load_elf() Usіng the method/scripts documented by the NEMU team at https://github.com/intel/nemu/wiki/Measuring-Boot-Latency https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html below are some timings measured (vmlinux and bzImage from the same build) Time to get to kernel start is almost halved (95ṁs -> 48ms) QEMU + qboot + vmlinux (PVH + 4.20-rc4) qemu_init_end: 41.550521 fw_start: 41.667139 (+0.116618) fw_do_boot: 47.448495 (+5.781356) linux_startup_64: 47.720785 (+0.27229) linux_start_kernel: 48.399541 (+0.678756) linux_start_user: 296.952056 (+248.552515) QEMU + qboot + bzImage: qemu_init_end: 29.209276 fw_start: 29.317342 (+0.108066) linux_start_boot: 36.679362 (+7.36202) linux_startup_64: 94.531349 (+57.851987) linux_start_kernel: 94.900913 (+0.369564) linux_start_user: 401.060971 (+306.160058) QEMU + bzImage: qemu_init_end: 30.424430 linux_startup_64: 893.770334 (+863.345904) linux_start_kernel: 894.17049 (+0.400156) linux_start_user: 1208.679768 (+314.509278) Liam Merwick (4): elf: Add optional function ptr to load_elf() to parse ELF notes elf-ops.h: Add get_elf_note_type() pvh: Add x86/HVM direct boot ABI header file pvh: Boot uncompressed kernel using direct boot ABI hw/alpha/dp264.c | 4 +- hw/arm/armv7m.c| 3 +- hw/arm/boot.c | 2 +- hw/core/generic-loader.c | 2 +- hw/core/loader.c | 24 --- hw/cris/boot.c | 3 +- hw/hppa/machine.c | 6 +- hw/i386/multiboot.c| 2 +- hw/i386/pc.c | 131 +++- hw/lm32/lm32_boards.c | 6 +- hw/lm32/milkymist.c| 3 +- hw/m68k/an5206.c | 2 +- hw/m68k/mcf5208.c | 2 +- hw/microblaze/boot.c | 7 +- hw/mips/mips_fulong2e.c| 5 +- hw/mips/mips_malta.c | 5 +- hw/mips/mips_mipssim.c | 5 +- hw/mips/mips_r4k.c | 5 +- hw/moxie/moxiesim.c| 2 +- hw/nios2/boot.c| 7 +- hw/openrisc/openrisc_sim.c | 2 +- hw/pci-host/prep.c | 2 +- hw/ppc/e500.c | 3 +- hw/ppc/mac_newworld.c | 5 +- hw/ppc/mac_oldworld.c | 5 +- hw/ppc/ppc440_bamboo.c | 2 +- hw/ppc/sam460ex.c | 3 +- hw/ppc/spapr.c | 7 +- hw/ppc/virtex_ml507.c | 2 +- hw/riscv/sifive_e.c| 2 +- hw/riscv/sifive_u.c| 2 +- hw/riscv/spike.c | 2 +- hw/riscv/virt.c| 2 +- hw/s390x/ipl.c | 9 ++- hw/sparc/leon3.c | 3 +- hw/sparc/sun4m.c | 6 +- hw/sparc64/sun4u.c | 4 +- hw/tricore/tricore_testboard.c | 2 +- hw/xtensa/sim.c| 12 ++-- hw/xtensa/xtfpga.c | 2 +- include/elf.h | 10 +++ include/hw/elf_ops.h | 72 include/hw/loader.h|
Re: [Xen-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
Thanks Stefan for the review - comments inline. On 11/12/2018 14:17, Stefan Hajnoczi wrote: On Wed, Dec 05, 2018 at 10:37:25PM +, Liam Merwick wrote: From: Liam Merwick Add support to read the PVH Entry address from an ELF note in the uncompressed kernel binary (as defined by the x86/HVM direct boot ABI). This 32-bit entry point will be used by QEMU to load the kernel in the guest and jump into the kernel entry point. For now, a call to this function is added in pc_memory_init() to read the address - a future patch will use the entry point. Signed-off-by: Liam Merwick --- hw/i386/pc.c | 272 +- include/elf.h | 10 +++ 2 files changed, 281 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index f095725dbab2..056aa46d99b9 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -109,6 +109,9 @@ static struct e820_entry *e820_table; static unsigned e820_entries; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; +/* Physical Address of PVH entry point read from kernel ELF NOTE */ +static size_t pvh_start_addr; + void gsi_handler(void *opaque, int n, int level) { GSIState *s = opaque; @@ -834,6 +837,267 @@ struct setup_data { uint8_t data[0]; } __attribute__((packed)); +/* + * Search through the ELF Notes for an entry with the given + * ELF Note type + */ +static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64, +size_t elf_note_type) Generic ELF code. Can you put it in hw/core/loader.c? I've added a modified/slimmed down version to include/hw/elf_ops.h (which now handles 32 and 64 bit as you mention below). I've put this in a separate commit. +{ +void *nhdr = NULL; +size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr); +size_t elf_note_entry_sz = 0; +size_t phdr_off; +size_t phdr_align; +size_t phdr_memsz; +size_t nhdr_namesz; +size_t nhdr_descsz; +size_t note_type; The macro tricks used by hw/core/loader.c are nasty, but I think they get the types right. Here the Elf64 on 32-bit host case is definitely broken due to using size_t. Perhaps 64-on-32 isn't supported, but getting the types right is worth discussing. + +phdr_off = elf_is64 ? +((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset; +phdr_align = elf_is64 ? +((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align; +phdr_memsz = elf_is64 ? +((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz; + +nhdr = ehdr + phdr_off; The ELF file is untrusted. All inputs must be validated. phdr_off could be an bogus/malicious value. Most of the parsing of the ELF binary goes away due to moving to parse during elf_load() - more info below. +note_type = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type; +nhdr_namesz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz; +nhdr_descsz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz; + +while (note_type != elf_note_type) { +elf_note_entry_sz = nhdr_size + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align) + +QEMU_ALIGN_UP(nhdr_descsz, phdr_align); + +/* + * Verify that we haven't exceeded the end of the ELF Note section. + * If we have, then there is no note of the given type present + * in the ELF Notes. + */ +if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) { +error_report("Note type (0x%lx) not found in ELF Note section", +elf_note_type); +return NULL; +} + +/* skip to the next ELF Note entry */ +nhdr += elf_note_entry_sz; +note_type = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type; +nhdr_namesz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz; +nhdr_descsz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz; +} + +return nhdr; +} + +/* + * The entry point into the kernel for PVH boot is different from + * the native entry point. The PVH entry is defined by the x86/HVM + * direct boot ABI and is available in an ELFNOTE in the kernel binary. + * This function reads the ELF headers of the binary specified on the + * command line by -kernel (path contained in 'filename') and discovers + * the PVH entry address from the appropriate ELF Note. + * + * The address of the PVH entry point is saved to the 'pvh_start_addr' + * global variable. The ELF class of the binary is returned via 'elfclass' + * (although the entry point is 32-bit, the kernel binary can be either + * 32-bit or 64-bit). + */ +static bool read_pvh_start_addr_elf_note(const char *filename, +unsigned char *elfclass) +{ Can this be integrated into ELF loading?
[Xen-devel] [RFC v2 1/4] elf: Add optional function ptr to load_elf() to parse ELF notes
This patch adds an optional function pointer, 'elf_note_fn', to load_elf() which causes load_elf() to additionally parse any ELF program headers of type PT_NOTE and check to see if the ELF Note is of the type specified by the 'translate_opaque' arg. If a matching ELF Note is found then the specfied function pointer is called to process the ELF note. Passing a NULL function pointer results in ELF Notes being skipped. The first consumer of this functionality is the PVHboot support which needs to read the XEN_ELFNOTE_PHYS32_ENTRY ELF Note while loading the uncompressed kernel binary in order to discover the boot entry address for the x86/HVM direct boot ABI. Signed-off-by: Liam Merwick --- hw/alpha/dp264.c | 4 ++-- hw/arm/armv7m.c| 3 ++- hw/arm/boot.c | 2 +- hw/core/generic-loader.c | 2 +- hw/core/loader.c | 24 hw/cris/boot.c | 3 ++- hw/hppa/machine.c | 6 +++--- hw/i386/multiboot.c| 2 +- hw/lm32/lm32_boards.c | 6 -- hw/lm32/milkymist.c| 3 ++- hw/m68k/an5206.c | 2 +- hw/m68k/mcf5208.c | 2 +- hw/microblaze/boot.c | 7 --- hw/mips/mips_fulong2e.c| 5 +++-- hw/mips/mips_malta.c | 5 +++-- hw/mips/mips_mipssim.c | 5 +++-- hw/mips/mips_r4k.c | 5 +++-- hw/moxie/moxiesim.c| 2 +- hw/nios2/boot.c| 7 --- hw/openrisc/openrisc_sim.c | 2 +- hw/pci-host/prep.c | 2 +- hw/ppc/e500.c | 3 ++- hw/ppc/mac_newworld.c | 5 +++-- hw/ppc/mac_oldworld.c | 5 +++-- hw/ppc/ppc440_bamboo.c | 2 +- hw/ppc/sam460ex.c | 3 ++- hw/ppc/spapr.c | 7 --- hw/ppc/virtex_ml507.c | 2 +- hw/riscv/sifive_e.c| 2 +- hw/riscv/sifive_u.c| 2 +- hw/riscv/spike.c | 2 +- hw/riscv/virt.c| 2 +- hw/s390x/ipl.c | 9 ++--- hw/sparc/leon3.c | 3 ++- hw/sparc/sun4m.c | 6 -- hw/sparc64/sun4u.c | 4 ++-- hw/tricore/tricore_testboard.c | 2 +- hw/xtensa/sim.c| 12 hw/xtensa/xtfpga.c | 2 +- include/hw/elf_ops.h | 23 +++ include/hw/loader.h| 9 - 41 files changed, 134 insertions(+), 70 deletions(-) diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c index dd62f2a4050c..0347eb897c8a 100644 --- a/hw/alpha/dp264.c +++ b/hw/alpha/dp264.c @@ -114,7 +114,7 @@ static void clipper_init(MachineState *machine) error_report("no palcode provided"); exit(1); } -size = load_elf(palcode_filename, cpu_alpha_superpage_to_phys, +size = load_elf(palcode_filename, NULL, cpu_alpha_superpage_to_phys, NULL, _entry, _low, _high, 0, EM_ALPHA, 0, 0); if (size < 0) { @@ -133,7 +133,7 @@ static void clipper_init(MachineState *machine) if (kernel_filename) { uint64_t param_offset; -size = load_elf(kernel_filename, cpu_alpha_superpage_to_phys, +size = load_elf(kernel_filename, NULL, cpu_alpha_superpage_to_phys, NULL, _entry, _low, _high, 0, EM_ALPHA, 0, 0); if (size < 0) { diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c index 4bf9131b81e4..a4d528537eb4 100644 --- a/hw/arm/armv7m.c +++ b/hw/arm/armv7m.c @@ -298,7 +298,8 @@ void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size) as = cpu_get_address_space(cs, asidx); if (kernel_filename) { -image_size = load_elf_as(kernel_filename, NULL, NULL, , , +image_size = load_elf_as(kernel_filename, NULL, NULL, NULL, + , , NULL, big_endian, EM_ARM, 1, 0, as); if (image_size < 0) { image_size = load_image_targphys_as(kernel_filename, 0, diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 94fce128028c..2b59379be6af 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -884,7 +884,7 @@ static int64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry, } } -ret = load_elf_as(info->kernel_filename, NULL, NULL, +ret = load_elf_as(info->kernel_filename, NULL, NULL, NULL, pentry, lowaddr, highaddr, big_endian, elf_machine, 1, data_swab, as); if (ret <= 0) { diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c index fbae05fb3b64..3695dd439cd0 100644 --- a/hw/core/generic-loader.c +++ b/hw/core/generic-loader.c @@ -136,7 +136,7 @@ static void generic_loader_realize(DeviceState *dev, Error **errp) AddressSpace *as = s->cpu ? s->cpu->as : NULL; if (!s->force_raw) { -size = load_elf_as(s->file, NULL, NULL, , NULL,
[Xen-devel] [PATCH for-4.12 V13 1/5] x86/p2m: allocate logdirty_ranges for altp2ms
For now, only do allocation/deallocation; keeping them in sync will be done in subsequent patches. Logdirty synchronization will only be done for active altp2ms; so allocate logdirty rangesets (copying the host logdirty rangeset) when an altp2m is activated, and free it when deactivated. Signed-off-by: Razvan Cojocaru Tested-by: Tamas K Lengyel Reviewed-by: George Dunlap --- CC: George Dunlap CC: Jan Beulich CC: Andrew Cooper CC: Wei Liu CC: "Roger Pau Monné" --- Changes since V12: - Added George's Reviewed-by. --- xen/arch/x86/mm/p2m.c | 46 +++--- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index fea4497..96a6d3e 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2265,6 +2265,40 @@ void p2m_flush_altp2m(struct domain *d) altp2m_list_unlock(d); } +static int p2m_activate_altp2m(struct domain *d, unsigned int idx) +{ +struct p2m_domain *hostp2m, *p2m; +int rc; + +ASSERT(idx < MAX_ALTP2M); + +p2m = d->arch.altp2m_p2m[idx]; +hostp2m = p2m_get_hostp2m(d); + +p2m_lock(p2m); + +rc = p2m_init_logdirty(p2m); + +if ( rc ) +goto out; + +/* The following is really just a rangeset copy. */ +rc = rangeset_merge(p2m->logdirty_ranges, hostp2m->logdirty_ranges); + +if ( rc ) +{ +p2m_free_logdirty(p2m); +goto out; +} + +p2m_init_altp2m_ept(d, idx); + + out: +p2m_unlock(p2m); + +return rc; +} + int p2m_init_altp2m_by_id(struct domain *d, unsigned int idx) { int rc = -EINVAL; @@ -2275,10 +2309,7 @@ int p2m_init_altp2m_by_id(struct domain *d, unsigned int idx) altp2m_list_lock(d); if ( d->arch.altp2m_eptp[idx] == mfn_x(INVALID_MFN) ) -{ -p2m_init_altp2m_ept(d, idx); -rc = 0; -} +rc = p2m_activate_altp2m(d, idx); altp2m_list_unlock(d); return rc; @@ -2296,9 +2327,10 @@ int p2m_init_next_altp2m(struct domain *d, uint16_t *idx) if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) ) continue; -p2m_init_altp2m_ept(d, i); -*idx = i; -rc = 0; +rc = p2m_activate_altp2m(d, i); + +if ( !rc ) +*idx = i; break; } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V13 0/5] Fix VGA logdirty related display freezes with altp2m
This series aims to prevent the display from freezing when enabling altp2m and switching to a new view (and assorted problems when resizing the display). The series introduces p2m_{init,free}_logdirty(), allocates a new logdirty rangeset for each new altp2m, and propagates (under lock) changes to all p2ms. [PATCH for-4.12 V13 1/5] x86/p2m: allocate logdirty_ranges for altp2ms [PATCH for-4.12 V13 2/5] x86/p2m: refactor p2m_reset_altp2m() [PATCH for-4.12 V13 3/5] x86/altp2m: fix display frozen when switching to a new view early [PATCH for-4.12 V13 4/5] p2m: Always use hostp2m when clipping rangesets [PATCH for-4.12 V13 5/5] p2m: change_type_range: Only invalidate mapped gfns Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V13 4/5] p2m: Always use hostp2m when clipping rangesets
The logdirty rangesets of the altp2ms need to be kept in sync with the hostp2m. This means when iterating through the altp2ms, we need to use the host p2m to clip the rangeset, not the indiviual altp2m's value. This change also: - Documents that the end is non-inclusive - Calculates an "inclusive" value for the end once, rather than open-coding the modification, and (worse) back-modifying updates so that the calculation ends up correct - Clarifies the logic deciding whether to call change_entry_type_global() or change_entry_type_range() - Handles the case where start >= hostp2m->max_mapped_pfn Signed-off-by: George Dunlap Signed-off-by: Razvan Cojocaru Tested-by: Tamas K Lengyel --- CC: George Dunlap CC: Jan Beulich CC: Andrew Cooper CC: Wei Liu CC: "Roger Pau Monné" --- Changes since V12: - Corrected the clipping warning placement. --- xen/arch/x86/mm/p2m.c | 48 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index d145850..2af50af 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -1002,30 +1002,46 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn_l, return rc; } -/* Modify the p2m type of a range of gfns from ot to nt. */ +/* Modify the p2m type of [start, end_exclusive) from ot to nt. */ static void change_type_range(struct p2m_domain *p2m, - unsigned long start, unsigned long end, + unsigned long start, unsigned long end_exclusive, p2m_type_t ot, p2m_type_t nt) { -unsigned long gfn = start; struct domain *d = p2m->domain; +const unsigned long host_max_pfn = p2m_get_hostp2m(d)->max_mapped_pfn; +unsigned long end = end_exclusive - 1; int rc = 0; -if ( unlikely(end > p2m->max_mapped_pfn) ) +/* + * Always clip the rangeset down to the host p2m. This is probably not + * the right behavior. This should be revisited later, but for now post a + * warning. + */ +if ( unlikely(end > host_max_pfn) ) { -if ( !gfn ) -{ -p2m->change_entry_type_global(p2m, ot, nt); -gfn = end; -} -end = p2m->max_mapped_pfn + 1; +printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to max_mapped_pfn\n", + d->domain_id); +end = host_max_pfn; } -if ( gfn < end ) -rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1); + +/* If the requested range is out of scope, return doing nothing. */ +if ( start > end ) +return; + +/* + * If all valid gfns are in the invalidation range, just do a + * global type change. Otherwise, invalidate only the range we + * need. + */ +if ( !start && end >= p2m->max_mapped_pfn ) +p2m->change_entry_type_global(p2m, ot, nt); +else +rc = p2m->change_entry_type_range(p2m, ot, nt, start, end); + if ( rc ) { -printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from %d to %d\n", - rc, d->domain_id, start, end - 1, ot, nt); +printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx) from %d to %d\n", + rc, d->domain_id, start, end_exclusive, ot, nt); domain_crash(d); } @@ -1033,11 +1049,11 @@ static void change_type_range(struct p2m_domain *p2m, { case p2m_ram_rw: if ( ot == p2m_ram_logdirty ) -rc = rangeset_remove_range(p2m->logdirty_ranges, start, end - 1); +rc = rangeset_remove_range(p2m->logdirty_ranges, start, end); break; case p2m_ram_logdirty: if ( ot == p2m_ram_rw ) -rc = rangeset_add_range(p2m->logdirty_ranges, start, end - 1); +rc = rangeset_add_range(p2m->logdirty_ranges, start, end); break; default: break; -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V13 3/5] x86/altp2m: fix display frozen when switching to a new view early
When an new altp2m view is created very early on guest boot, the display will freeze (although the guest will run normally). This may also happen on resizing the display. The reason is the way Xen currently (mis)handles logdirty VGA: it intentionally misconfigures VGA pages so that they will fault. The problem is that it only does this in the host p2m. Once we switch to a new altp2m, the misconfigured entries will no longer fault, so the display will not be updated. This patch: * updates ept_handle_misconfig() to use the active altp2m instead of the hostp2m; * modifies p2m_change_entry_type_global(), p2m_memory_type_changed(), p2m_change_type_range() and p2m_finish_type_change() to propagate their changes to all valid altp2ms. With the introduction of altp2m fields in p2m_memory_type_changed() the whole function has been put under CONFIG_HVM. Suggested-by: George Dunlap Signed-off-by: Razvan Cojocaru Tested-by: Tamas K Lengyel Acked-by: George Dunlap --- CC: Jun Nakajima CC: Kevin Tian CC: George Dunlap CC: Jan Beulich CC: Andrew Cooper CC: Wei Liu CC: "Roger Pau Monné" --- Changes since V12: - None. --- xen/arch/x86/mm/p2m-ept.c | 9 ++- xen/arch/x86/mm/p2m-pt.c | 8 +++ xen/arch/x86/mm/p2m.c | 169 ++ xen/include/asm-x86/p2m.h | 6 +- 4 files changed, 158 insertions(+), 34 deletions(-) diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 6e4e375..00fb82d 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -657,6 +657,9 @@ bool_t ept_handle_misconfig(uint64_t gpa) bool_t spurious; int rc; +if ( altp2m_active(curr->domain) ) +p2m = p2m_get_altp2m(curr); + p2m_lock(p2m); spurious = curr->arch.hvm.vmx.ept_spurious_misconfig; @@ -1416,9 +1419,13 @@ void p2m_init_altp2m_ept(struct domain *d, unsigned int i) struct p2m_domain *hostp2m = p2m_get_hostp2m(d); struct ept_data *ept; +p2m->default_access = hostp2m->default_access; +p2m->domain = hostp2m->domain; + +p2m->global_logdirty = hostp2m->global_logdirty; p2m->ept.ad = hostp2m->ept.ad; p2m->min_remapped_gfn = gfn_x(INVALID_GFN); -p2m->max_remapped_gfn = 0; +p2m->max_mapped_pfn = p2m->max_remapped_gfn = 0; ept = >ept; ept->mfn = pagetable_get_pfn(p2m_get_pagetable(p2m)); d->arch.altp2m_eptp[i] = ept->eptp; diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index 17a6b61..b5c19df 100644 --- a/xen/arch/x86/mm/p2m-pt.c +++ b/xen/arch/x86/mm/p2m-pt.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -464,6 +465,13 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa) struct p2m_domain *p2m = p2m_get_hostp2m(current->domain); int rc; +/* + * Should altp2m ever be enabled for NPT / shadow use, this code + * should be updated to make use of the active altp2m, like + * ept_handle_misconfig(). + */ +ASSERT(!altp2m_active(current->domain)); + p2m_lock(p2m); rc = do_recalc(p2m, PFN_DOWN(gpa)); p2m_unlock(p2m); diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 7c6aae7..d145850 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -277,7 +277,6 @@ int p2m_init(struct domain *d) int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start, unsigned long end) { -ASSERT(p2m_is_hostp2m(p2m)); if ( p2m->global_logdirty || rangeset_contains_range(p2m->logdirty_ranges, start, end) ) return 1; @@ -286,31 +285,79 @@ int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start, return 0; } +static void change_entry_type_global(struct p2m_domain *p2m, + p2m_type_t ot, p2m_type_t nt) +{ +p2m->change_entry_type_global(p2m, ot, nt); +p2m->global_logdirty = (nt == p2m_ram_logdirty); +} + void p2m_change_entry_type_global(struct domain *d, p2m_type_t ot, p2m_type_t nt) { -struct p2m_domain *p2m = p2m_get_hostp2m(d); +struct p2m_domain *hostp2m = p2m_get_hostp2m(d); ASSERT(ot != nt); ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt)); -p2m_lock(p2m); -p2m->change_entry_type_global(p2m, ot, nt); -p2m->global_logdirty = (nt == p2m_ram_logdirty); -p2m_unlock(p2m); +p2m_lock(hostp2m); + +change_entry_type_global(hostp2m, ot, nt); + +#ifdef CONFIG_HVM +if ( unlikely(altp2m_active(d)) ) +{ +unsigned int i; + +for ( i = 0; i < MAX_ALTP2M; i++ ) +if ( d->arch.altp2m_eptp[i] != mfn_x(INVALID_MFN) ) +{ +struct p2m_domain *altp2m = d->arch.altp2m_p2m[i]; + +p2m_lock(altp2m); +change_entry_type_global(altp2m, ot, nt); +p2m_unlock(altp2m); +} +} +#endif + +p2m_unlock(hostp2m); +} + +#ifdef CONFIG_HVM +/*
[Xen-devel] [PATCH for-4.12 V13 2/5] x86/p2m: refactor p2m_reset_altp2m()
Refactor p2m_reset_altp2m() so that it can be used to remove redundant codepaths, fixing the locking while we're at it. The previous code now replaced by p2m_reset_altp2m(d, i, ALTP2M_DEACTIVATE) calls did not set p2m->min_remapped_gfn and p2m->max_remapped_gfn because in those cases the altp2m idx was disabled; so before getting used again, p2m_init_altp2m_ept() would get called, which resets them. Always setting them in p2m_reset_altp2m(), while redundant, is preferable to an extra conditional. Signed-off-by: Razvan Cojocaru Tested-by: Tamas K Lengyel Reviewed-by: George Dunlap --- CC: George Dunlap CC: Jan Beulich CC: Andrew Cooper CC: Wei Liu CC: "Roger Pau Monné" --- Changes since V12: - Added George's Reviewed-by. --- xen/arch/x86/mm/p2m.c | 57 ++- 1 file changed, 34 insertions(+), 23 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 96a6d3e..7c6aae7 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -2247,6 +2247,36 @@ bool_t p2m_altp2m_lazy_copy(struct vcpu *v, paddr_t gpa, return 1; } +enum altp2m_reset_type { +ALTP2M_RESET, +ALTP2M_DEACTIVATE +}; + +static void p2m_reset_altp2m(struct domain *d, unsigned int idx, + enum altp2m_reset_type reset_type) +{ +struct p2m_domain *p2m; + +ASSERT(idx < MAX_ALTP2M); +p2m = d->arch.altp2m_p2m[idx]; + +p2m_lock(p2m); + +p2m_flush_table_locked(p2m); + +if ( reset_type == ALTP2M_DEACTIVATE ) +p2m_free_logdirty(p2m); + +/* Uninit and reinit ept to force TLB shootdown */ +ept_p2m_uninit(p2m); +ept_p2m_init(p2m); + +p2m->min_remapped_gfn = gfn_x(INVALID_GFN); +p2m->max_remapped_gfn = 0; + +p2m_unlock(p2m); +} + void p2m_flush_altp2m(struct domain *d) { unsigned int i; @@ -2255,10 +2285,7 @@ void p2m_flush_altp2m(struct domain *d) for ( i = 0; i < MAX_ALTP2M; i++ ) { -p2m_flush_table(d->arch.altp2m_p2m[i]); -/* Uninit and reinit ept to force TLB shootdown */ -ept_p2m_uninit(d->arch.altp2m_p2m[i]); -ept_p2m_init(d->arch.altp2m_p2m[i]); +p2m_reset_altp2m(d, i, ALTP2M_DEACTIVATE); d->arch.altp2m_eptp[i] = mfn_x(INVALID_MFN); } @@ -2357,10 +2384,7 @@ int p2m_destroy_altp2m_by_id(struct domain *d, unsigned int idx) if ( !_atomic_read(p2m->active_vcpus) ) { -p2m_flush_table(d->arch.altp2m_p2m[idx]); -/* Uninit and reinit ept to force TLB shootdown */ -ept_p2m_uninit(d->arch.altp2m_p2m[idx]); -ept_p2m_init(d->arch.altp2m_p2m[idx]); +p2m_reset_altp2m(d, idx, ALTP2M_DEACTIVATE); d->arch.altp2m_eptp[idx] = mfn_x(INVALID_MFN); rc = 0; } @@ -2485,16 +2509,6 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned int idx, return rc; } -static void p2m_reset_altp2m(struct p2m_domain *p2m) -{ -p2m_flush_table(p2m); -/* Uninit and reinit ept to force TLB shootdown */ -ept_p2m_uninit(p2m); -ept_p2m_init(p2m); -p2m->min_remapped_gfn = gfn_x(INVALID_GFN); -p2m->max_remapped_gfn = 0; -} - int p2m_altp2m_propagate_change(struct domain *d, gfn_t gfn, mfn_t mfn, unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma) @@ -2528,7 +2542,7 @@ int p2m_altp2m_propagate_change(struct domain *d, gfn_t gfn, { if ( !reset_count++ ) { -p2m_reset_altp2m(p2m); +p2m_reset_altp2m(d, i, ALTP2M_RESET); last_reset_idx = i; } else @@ -2542,10 +2556,7 @@ int p2m_altp2m_propagate_change(struct domain *d, gfn_t gfn, d->arch.altp2m_eptp[i] == mfn_x(INVALID_MFN) ) continue; -p2m = d->arch.altp2m_p2m[i]; -p2m_lock(p2m); -p2m_reset_altp2m(p2m); -p2m_unlock(p2m); +p2m_reset_altp2m(d, i, ALTP2M_RESET); } ret = 0; -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 V13 5/5] p2m: change_type_range: Only invalidate mapped gfns
change_type_range() invalidates gfn ranges to lazily change the type of a range of gfns, and also modifies the logdirty rangesets of that p2m. At the moment, it clips both down by the hostp2m. While this will result in correct behavior, it's not entirely efficient, since invalidated entries outside that range will, on fault, simply be modified back to "empty" before faulting normally again. Separate out the calculation of the two ranges. Keep using the hostp2m's max_mapped_pfn to clip the logdirty ranges, but use the current p2m's max_mapped_pfn to further clip the invalidation range for alternate p2ms. Signed-off-by: George Dunlap Signed-off-by: Razvan Cojocaru Reviewed-by: Jan Beulich Tested-by: Tamas K Lengyel Acked-by: George Dunlap --- CC: George Dunlap CC: Jan Beulich CC: Andrew Cooper CC: Wei Liu CC: "Roger Pau Monné" --- Changes since V12: - Updated to apply on top of the modified patch 4. --- xen/arch/x86/mm/p2m.c | 58 +-- 1 file changed, 42 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 2af50af..6380bc0 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -1007,42 +1007,68 @@ static void change_type_range(struct p2m_domain *p2m, unsigned long start, unsigned long end_exclusive, p2m_type_t ot, p2m_type_t nt) { +unsigned long invalidate_start, invalidate_end; struct domain *d = p2m->domain; const unsigned long host_max_pfn = p2m_get_hostp2m(d)->max_mapped_pfn; unsigned long end = end_exclusive - 1; +const unsigned long max_pfn = p2m->max_mapped_pfn; int rc = 0; /* - * Always clip the rangeset down to the host p2m. This is probably not - * the right behavior. This should be revisited later, but for now post a - * warning. + * If we have an altp2m, the logdirty rangeset range needs to + * match that of the hostp2m, but for efficiency, we want to clip + * down the the invalidation range according to the mapped values + * in the altp2m. Keep track of and clip the ranges separately. + */ +invalidate_start = start; +invalidate_end = end; + +/* + * Clip down to the host p2m. This is probably not the right behavior. + * This should be revisited later, but for now post a warning. */ if ( unlikely(end > host_max_pfn) ) { printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to max_mapped_pfn\n", d->domain_id); -end = host_max_pfn; +end = invalidate_end = host_max_pfn; } /* If the requested range is out of scope, return doing nothing. */ if ( start > end ) return; +if ( p2m_is_altp2m(p2m) ) +invalidate_end = min(invalidate_end, max_pfn); + /* - * If all valid gfns are in the invalidation range, just do a - * global type change. Otherwise, invalidate only the range we - * need. + * If the p2m is empty, or the range is outside the currently + * mapped range, no need to do the invalidation; just update the + * rangeset. */ -if ( !start && end >= p2m->max_mapped_pfn ) -p2m->change_entry_type_global(p2m, ot, nt); -else -rc = p2m->change_entry_type_range(p2m, ot, nt, start, end); - -if ( rc ) +if ( invalidate_start < invalidate_end ) { -printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx) from %d to %d\n", - rc, d->domain_id, start, end_exclusive, ot, nt); -domain_crash(d); +/* + * If all valid gfns are in the invalidation range, just do a + * global type change. Otherwise, invalidate only the range + * we need. + * + * NB that invalidate_end can't logically be >max_pfn at this + * point. If this changes, the == will need to be changed to + * >=. + */ +ASSERT(invalidate_end <= max_pfn); +if ( !invalidate_start && invalidate_end == max_pfn) +p2m->change_entry_type_global(p2m, ot, nt); +else +rc = p2m->change_entry_type_range(p2m, ot, nt, + invalidate_start, invalidate_end); +if ( rc ) +{ +printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from %d to %d\n", + rc, d->domain_id, invalidate_start, invalidate_end, ot, nt); +domain_crash(d); +} } switch ( nt ) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] 4.10.1 Xen crash and reboot
Hello, And again today: (XEN) [ Xen-4.10.3-pre x86_64 debug=n Not tainted ] (XEN) CPU:4 (XEN) RIP:e008:[] guest_4.o#sh_page_fault__guest_4+0x70b/0x2060 (XEN) RFLAGS: 00010203 CONTEXT: hypervisor (d61v1) (XEN) rax: 00c422641dd0 rbx: 832005c49000 rcx: 81c0e060 (XEN) rdx: rsi: 832005c49000 rdi: 00c422641dd0 (XEN) rbp: 81c0e0601880 rsp: 83207e607c38 r8: 0310 (XEN) r9: r10: r11: (XEN) r12: 83207e607ef8 r13: 00f9cea7 r14: (XEN) r15: 830079592000 cr0: 80050033 cr4: 00372660 (XEN) cr3: 001ffab1a001 cr2: 81c0e0601880 (XEN) fsb: 7f89c67fc700 gsb: gss: 88007f30 (XEN) ds: es: fs: gs: ss: cs: e008 (XEN) Xen code around (guest_4.o#sh_page_fault__guest_4+0x70b/0x2060): (XEN) 49 c1 e8 1e 4a 8d 2c c1 <48> 8b 4d 00 f6 c1 01 0f 84 f8 06 00 00 48 c1 e1 (XEN) Xen stack trace from rsp=83207e607c38: (XEN)830f68748208 00c422641dd0 832005c49600 00f9cea7 (XEN)832005c49660 832005c49000 83207e607d70 83207e607d20 (XEN)0c422641 0090 82d0805802c0 000205c49000 (XEN)0008 0880 0898 82d0805802c0 (XEN)01fd58a1 01ffab1a 0208 0041 (XEN)800f9cea7825 01ff82d0 000d 82d0 (XEN)832005c49000 0001000d 83207e607fff 83207e607d20 (XEN)00a1 00c422641dd0 000f86569067 000f86544067 (XEN)000f68748067 800f9cea7925 00f87171 00f86569 (XEN)00f86544 00f68748 0005 (XEN)82e03ff56340 832005c49000 00057ff0 (XEN)83207e607e18 830079592000 832005c49000 83207e607ef8 (XEN)00c422641dd0 82d08034e4b0 82d080349e20 (XEN) 83207e607fff 830079592000 82d08034e5ae (XEN)82d080354913 82d080354907 82d080354913 82d080354907 (XEN)82d080354913 82d080354907 82d080354913 82d080354907 (XEN)82d080354913 82d080354907 82d080354913 82d080354907 (XEN)82d080354913 83207e607ef8 830079592000 00c422641dd0 (XEN)832005c49000 0004 82d0802a1842 (XEN)82d080354913 82d080354907 82d080354913 82d080354907 (XEN) Xen call trace: (XEN)[] guest_4.o#sh_page_fault__guest_4+0x70b/0x2060 (XEN)[] do_iret+0/0x1a0 (XEN)[] toggle_guest_pt+0x30/0x160 (XEN)[] do_iret+0xfe/0x1a0 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] do_page_fault+0x1a2/0x4e0 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] handle_exception+0x8f/0xf9 (XEN)[] handle_exception+0x9b/0xf9 (XEN)[] x86_64/entry.S#handle_exception_saved+0x68/0x94 (XEN) (XEN) Pagetable walk from 81c0e0601880: (XEN) L4[0x103] = 801ffab1a063 (XEN) L3[0x103] = 801ffab1a063 (XEN) L2[0x103] = 801ffab1a063 (XEN) L1[0x001] = (XEN) (XEN) (XEN) Panic on CPU 4: (XEN) FATAL PAGE FAULT (XEN) [error_code=] (XEN) Faulting linear address: 81c0e0601880 (XEN) (XEN) (XEN) Reboot in five seconds... (XEN) Resetting with ACPI MEMORY or I/O RESET_REG. Host has now rebooted into a hypervisor with pcid=0 command line. I note that: (XEN) RFLAGS: 00010203 CONTEXT: hypervisor (d61v1) and the previous incident (below): (XEN) RFLAGS: 00010246 CONTEXT: hypervisor (d31v1) These are the same guest. Is it worth me moving this guest to a test host without pcid=0 to see if it crashes it, meanwhile keeping production hosts with pcid=0? And then putting pcid=0 on the test host to see if it survives longer? This will take quite a long time to gain confidence of, since the incidents are about 2 weeks apart each time. Thanks, Andy On Mon, Dec 10, 2018 at 03:58:41PM +, Andy Smith wrote: > Hi, > > Up front information: > > Today one of my Xen hosts crashed with this logging on the serial: > > (XEN) [ Xen-4.10.1 x86_64 debug=n Not
[Xen-devel] [PATCH v2] gic-vgic: skip irqs locking in gic_restore_pending_irqs()
From: Andrii Anisov This function is called under IRQs disabled already, so drop additional flags save and restore. Signed-off-by: Andrii Anisov --- This is a half of an RFC patch [1] which relies on the already existing code. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03293.html Changes in v2: Added an ASSERT() to verify the function should be called with interrupts disabled. --- xen/arch/arm/gic-vgic.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 48922f5..f9098c8 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -279,11 +279,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) int lr = 0; struct pending_irq *p, *t, *p_r; struct list_head *inflight_r; -unsigned long flags; unsigned int nr_lrs = gic_get_nr_lrs(); int lrs = nr_lrs; -spin_lock_irqsave(>arch.vgic.lock, flags); +ASSERT(!local_irq_is_enabled()); + +spin_lock(>arch.vgic.lock); if ( list_empty(>arch.vgic.lr_pending) ) goto out; @@ -327,7 +328,7 @@ found: } out: -spin_unlock_irqrestore(>arch.vgic.lock, flags); +spin_unlock(>arch.vgic.lock); } void gic_clear_pending_irqs(struct vcpu *v) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 3/3] x86/mm-locks: apply a bias to lock levels for control domain
> On Dec 21, 2018, at 9:41 AM, Roger Pau Monne wrote: > > paging_log_dirty_op function takes mm locks from a subject domain and > then attempts to perform copy to operations against the caller domain > in order to copy the result of the hypercall into the caller provided > buffer. > > This works fine when the caller is a non-paging domain, but triggers a > lock order panic when the caller is a paging domain due to the fact > that at the point where the copy to operation is performed the subject > domain paging lock is locked, and the copy operation requires > locking the caller p2m lock which has a lower level. > > Fix this limitation by adding a bias to the level of control domain mm > locks, so that the lower control domain mm lock always has a level > greater than the higher unprivileged domain lock level. This allows > locking the subject domain mm locks and then locking the control > domain mm locks, while keeping the same lock ordering and the changes > mostly confined to mm-locks.h. > > Note that so far only this flow (locking a subject domain locks and > then the control domain ones) has been identified, but not all > possible code paths have been inspected. Hence this solution attempts > to be a non-intrusive fix for the problem at hand, without discarding > further changes in the future if other valid code paths are found that > require more complex lock level ordering. > > Signed-off-by: Roger Pau Monné Reviewed-by: George Dunlap …but given the nature of the change, I’d like to see it pass at least one ad-hoc osstest run before it gets checked in. (That probably means you’ll have to chase someone else to do that.) -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V12 4/5] p2m: Always use hostp2m when clipping rangesets
On 12/21/18 8:24 PM, George Dunlap wrote: > >> On Dec 20, 2018, at 5:38 PM, Razvan Cojocaru >> wrote: >> >> The logdirty rangesets of the altp2ms need to be kept in sync with the >> hostp2m. This means when iterating through the altp2ms, we need to >> use the host p2m to clip the rangeset, not the indiviual altp2m's >> value. >> >> This change also: >> >> - Documents that the end is non-inclusive >> >> - Calculates an "inclusive" value for the end once, rather than >> open-coding the modification, and (worse) back-modifying updates so >> that the calculation ends up correct >> >> - Clarifies the logic deciding whether to call >> change_entry_type_global() or change_entry_type_range() >> >> - Handles the case where start >= hostp2m->max_mapped_pfn >> >> Signed-off-by: George Dunlap >> Signed-off-by: Razvan Cojocaru >> Tested-by: Tamas K Lengyel >> >> --- >> CC: George Dunlap >> CC: Jan Beulich >> CC: Andrew Cooper >> CC: Wei Liu >> CC: "Roger Pau Monné" >> >> --- >> Changes since V11: >> - Added end_exclusive, to avoid modifying function arguments. >> - Moved the if ( start >= host_max_pfn ) warning below the code >> doing the clipping. > > I’m afraid you misunderstood what I meant. > > Why would we want to give a warning only if start >= host_max_pfn? And if > the *start* was above host_max_pfn, why would we tell people we were > “clipping” the rangeset, when in fact we drop it entirely in that case? > > Rather, I meant this: > > — > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c > index f552fd5199..2af50af2bd 100644 > --- a/xen/arch/x86/mm/p2m.c > +++ b/xen/arch/x86/mm/p2m.c > @@ -1018,11 +1018,11 @@ static void change_type_range(struct p2m_domain *p2m, > * warning. > */ > if ( unlikely(end > host_max_pfn) ) > -end = host_max_pfn; > - > -if ( start >= host_max_pfn ) > +{ > printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to > max_mapped_pfn\n", > d->domain_id); > +end = host_max_pfn; > +} > > /* If the requested range is out of scope, return doing nothing. */ > if ( start > end ) > — > > Everything else looks good. Apologies, I'll resend. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v4 11/11] dm_depriv: Mark `UID cleanup` as completed
Signed-off-by: George Dunlap Acked-by: Ian Jackson --- CC: Ian Jackson CC: Wei Liu --- docs/designs/qemu-deprivilege.md | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/designs/qemu-deprivilege.md b/docs/designs/qemu-deprivilege.md index f7444a434d..81a5f5c05d 100644 --- a/docs/designs/qemu-deprivilege.md +++ b/docs/designs/qemu-deprivilege.md @@ -128,26 +128,6 @@ are specified; this does not apply to QEMU running as a Xen DM. '''Tested''': Not tested -# Restrictions / improvements still to do - -This lists potential restrictions still to do. It is meant to be -listed in order of ease of implementation, with low-hanging fruit -first. - -### Further RLIMITs - -RLIMIT_AS limits the total amount of memory; but this includes the -virtual memory which QEMU uses as a mapcache. xen-mapcache.c already -fiddles with this; it would be straightforward to make it *set* the -rlimit to what it thinks a sensible limit is. - -RLIMIT_NPROC limits total number of processes or threads. QEMU uses -threads for some devices, so this would require some thought. - -Other things that would take some cleverness / changes to QEMU to -utilize due to ordering constrants: - - RLIMIT_NOFILES (after all necessary files are opened) - ### libxl UID cleanup '''Description''': Domain IDs are reused, and thus restricted UIDs are @@ -223,6 +203,26 @@ Since this will kill all other `reaper_uid` processes as well, we must either allocate a separate `reaper_uid` per domain, or use locking to ensure that only one killing process is active at a time. +# Restrictions / improvements still to do + +This lists potential restrictions still to do. It is meant to be +listed in order of ease of implementation, with low-hanging fruit +first. + +### Further RLIMITs + +RLIMIT_AS limits the total amount of memory; but this includes the +virtual memory which QEMU uses as a mapcache. xen-mapcache.c already +fiddles with this; it would be straightforward to make it *set* the +rlimit to what it thinks a sensible limit is. + +RLIMIT_NPROC limits total number of processes or threads. QEMU uses +threads for some devices, so this would require some thought. + +Other things that would take some cleverness / changes to QEMU to +utilize due to ordering constrants: + - RLIMIT_NOFILES (after all necessary files are opened) + ## libxl: Treat QMP connection as untrusted '''Description''': Currently libxl talks with QEMU via QMP; but its -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] gic-vgic: skip irqs locking in gic_restore_pending_irqs()
Hi, Sorry for the formatting. Please address my comments from the previous version. Cheers, On Fri, 21 Dec 2018, 18:18 Andrii Anisov, wrote: > From: Andrii Anisov > > This function is called under IRQs disabled already, so drop additional > flags save and restore. > > Signed-off-by: Andrii Anisov > > --- > > This is a half of an RFC patch [1] which relies on the already > existing code. > > [1] > https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03293.html > > --- > xen/arch/arm/gic-vgic.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c > index 48922f5..34179c0 100644 > --- a/xen/arch/arm/gic-vgic.c > +++ b/xen/arch/arm/gic-vgic.c > @@ -279,11 +279,10 @@ static void gic_restore_pending_irqs(struct vcpu *v) > int lr = 0; > struct pending_irq *p, *t, *p_r; > struct list_head *inflight_r; > -unsigned long flags; > unsigned int nr_lrs = gic_get_nr_lrs(); > int lrs = nr_lrs; > > -spin_lock_irqsave(>arch.vgic.lock, flags); > +spin_lock(>arch.vgic.lock); > > if ( list_empty(>arch.vgic.lr_pending) ) > goto out; > @@ -327,7 +326,7 @@ found: > } > > out: > -spin_unlock_irqrestore(>arch.vgic.lock, flags); > +spin_unlock(>arch.vgic.lock); > } > > void gic_clear_pending_irqs(struct vcpu *v) > -- > 2.7.4 > > > ___ > Xen-devel mailing list > Xen-devel@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 08/11] libxl: Kill QEMU by uid when possible
George Dunlap writes ("[PATCH v4 08/11] libxl: Kill QEMU by uid when possible"): > The privcmd fd that a dm_restrict'ed QEMU has gives it permission to > one specific domain ID. This domain ID will probably eventually be > used again. It is therefore necessary to make absolutely sure that a > rogue QEMU process cannot hang around after its domain has exited. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V12 4/5] p2m: Always use hostp2m when clipping rangesets
> On Dec 20, 2018, at 5:38 PM, Razvan Cojocaru > wrote: > > The logdirty rangesets of the altp2ms need to be kept in sync with the > hostp2m. This means when iterating through the altp2ms, we need to > use the host p2m to clip the rangeset, not the indiviual altp2m's > value. > > This change also: > > - Documents that the end is non-inclusive > > - Calculates an "inclusive" value for the end once, rather than > open-coding the modification, and (worse) back-modifying updates so > that the calculation ends up correct > > - Clarifies the logic deciding whether to call > change_entry_type_global() or change_entry_type_range() > > - Handles the case where start >= hostp2m->max_mapped_pfn > > Signed-off-by: George Dunlap > Signed-off-by: Razvan Cojocaru > Tested-by: Tamas K Lengyel > > --- > CC: George Dunlap > CC: Jan Beulich > CC: Andrew Cooper > CC: Wei Liu > CC: "Roger Pau Monné" > > --- > Changes since V11: > - Added end_exclusive, to avoid modifying function arguments. > - Moved the if ( start >= host_max_pfn ) warning below the code > doing the clipping. I’m afraid you misunderstood what I meant. Why would we want to give a warning only if start >= host_max_pfn? And if the *start* was above host_max_pfn, why would we tell people we were “clipping” the rangeset, when in fact we drop it entirely in that case? Rather, I meant this: — diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index f552fd5199..2af50af2bd 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -1018,11 +1018,11 @@ static void change_type_range(struct p2m_domain *p2m, * warning. */ if ( unlikely(end > host_max_pfn) ) -end = host_max_pfn; - -if ( start >= host_max_pfn ) +{ printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to max_mapped_pfn\n", d->domain_id); +end = host_max_pfn; +} /* If the requested range is out of scope, return doing nothing. */ if ( start > end ) — Everything else looks good. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v4 10/11] libxl: Introduce specific username to be used as a reaper
Untrusted device models must be killed by uid rather than by pid for safety. To do this reliably, we need another uid, not used for any other purpose, from which to make the kill system call. When using xen-qemuuser-range-base, we can repurpose xen-qemuuser-range-base itself as a UID from which to kill other devicemodel uids (since domain ID 0 should never have a device model associated with it). However, we'd like people to be able to use the device_model_user feature without also defining xen-qemuuser-range-base (which requires the ability to 'reserve' 32k+ user IDs). To that end, introduce the xen-qemuuser-reaper id. When killing by UID, first look for and use that ID if available; then fall back to xen-qemuuser-range-base. Document the new call in docs/features/qemu-deprivilege.pandoc. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v3: - New in this version CC: Ian Jackson CC: Wei Liu CC: Anthony Perard --- docs/features/qemu-deprivilege.pandoc | 9 ++ tools/libxl/libxl_dm.c| 40 +-- tools/libxl/libxl_internal.h | 1 + 3 files changed, 35 insertions(+), 15 deletions(-) diff --git a/docs/features/qemu-deprivilege.pandoc b/docs/features/qemu-deprivilege.pandoc index ce21a60ef7..eb05981a83 100644 --- a/docs/features/qemu-deprivilege.pandoc +++ b/docs/features/qemu-deprivilege.pandoc @@ -77,12 +77,21 @@ And then in your config file, the following line: device_model_user="xen-qemuuser-c6-01" +If you use this method, you should also allocate one "reaper" user to +be used for killing device models: + +adduser --system --no-create-home --group xen-qemuuser-reaper + NOTE: It is important when using `device_model_user` that EACH VM HAVE A SEPARATE UID, and that none of these UIDs map to root. xl will throw an error a uid maps to zero, but not if multiple VMs have the same uid. Multiple VMs with the same device model uid will cause problems. +It is also important that `xen-qemuuser-reaper` not have any processes +associated with it, as they will be destroyed when deprivileged qemu +processes are destroyed. + ## Domain config changes The core domain config change is to add the following line to the diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 922fa70f11..f5322e3f45 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -268,24 +268,37 @@ static int libxl__get_reaper_uid(libxl__gc *gc, uid_t *reaper_uid) struct passwd *user_base, user_pwbuf; int rc; +rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_REAPER, + _pwbuf, _base); +/* + * Either there was an error, or we found a suitable user; stop + * looking + */ +if (rc || user_base) +goto out; + rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); -if (rc) -return rc; +if (rc || user_base) +goto out; -if (!user_base) { -LOG(WARN, "Couldn't find uid for reaper process"); -return ERROR_INVAL; -} - -if (user_base->pw_uid == 0) { -LOG(ERROR, "UID for reaper process maps to root!"); -return ERROR_INVAL; +LOG(WARN, "Couldn't find uid for reaper process"); +rc = ERROR_INVAL; + + out: +/* First check to see if the discovered user maps to root */ +if (!rc) { +if (user_base->pw_uid == 0) { +LOG(ERROR, "UID for reaper process maps to root!"); +rc = ERROR_INVAL; +} } -*reaper_uid = user_base->pw_uid; +/* If everything is OK, set reaper_uid as appropriate */ +if (!rc) +*reaper_uid = user_base->pw_uid; -return 0; +return rc; } const char *libxl__domain_device_model(libxl__gc *gc, @@ -2908,9 +2921,6 @@ static int get_reaper_lock_and_uid(libxl__destroy_devicemodel_state *ddms, /* * Get reaper_uid. If we can't find such a uid, return an error. - * - * FIXME: This means that domain destruction will fail if - * device_model_user is set but QEMU_USER_RANGE_BASE doesn't exist. */ return libxl__get_reaper_uid(gc, reaper_uid); } diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 8323c7924d..67f3f6ac76 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -4419,6 +4419,7 @@ _hidden int libxl__read_sysfs_file_contents(libxl__gc *gc, #define LIBXL_QEMU_USER_PREFIX "xen-qemuuser" #define LIBXL_QEMU_USER_SHARED LIBXL_QEMU_USER_PREFIX"-shared" #define LIBXL_QEMU_USER_RANGE_BASE LIBXL_QEMU_USER_PREFIX"-range-base" +#define LIBXL_QEMU_USER_REAPER LIBXL_QEMU_USER_PREFIX"-reaper" static inline bool libxl__acpi_defbool_val(const libxl_domain_build_info *b_info) { -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v4 02/11] libxl: Get rid of support for QEMU_USER_BASE (xen-qemuuser-domidNN)
QEMU_USER_BASE allows a user to specify the UID to use when running the devicemodel for a specific domain number. Unfortunately, this is not really practical: It requires nearly 32,000 entries in /etc/passwd. QEMU_USER_RANGE_BASE is much more practical. Remove support for QEMU_USER_BASE. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- NB that I've chosen not to update the xl.cfg man page at this time; it needs a lot of other updates as well, which would be easier to do all at once at the end. CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 16 tools/libxl/libxl_internal.h | 1 - 2 files changed, 4 insertions(+), 13 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index bbcbc94b6c..6024d4b7b8 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -138,13 +138,6 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, return 0; } -user = GCSPRINTF("%s%d", LIBXL_QEMU_USER_BASE, guest_domid); -ret = userlookup_helper_getpwnam(gc, user, _pwbuf, 0); -if (ret < 0) -return ret; -if (ret > 0) -goto end_search; - ret = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); if (ret < 0) @@ -174,15 +167,14 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, if (ret < 0) return ret; if (ret > 0) { -LOGD(WARN, guest_domid, "Could not find user %s%d, falling back to %s", - LIBXL_QEMU_USER_BASE, guest_domid, LIBXL_QEMU_USER_SHARED); +LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s", + LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED); goto end_search; } LOGD(ERROR, guest_domid, - "Could not find user %s%d or %s or range base pseudo-user %s, cannot restrict", - LIBXL_QEMU_USER_BASE, guest_domid, LIBXL_QEMU_USER_SHARED, - LIBXL_QEMU_USER_RANGE_BASE); + "Could not find user %s or range base pseudo-user %s, cannot restrict", + LIBXL_QEMU_USER_SHARED, LIBXL_QEMU_USER_RANGE_BASE); return ERROR_INVAL; end_search: diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index c4a43bd0b7..b147f3803c 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -4387,7 +4387,6 @@ _hidden int libxl__read_sysfs_file_contents(libxl__gc *gc, int *datalen_r); #define LIBXL_QEMU_USER_PREFIX "xen-qemuuser" -#define LIBXL_QEMU_USER_BASE LIBXL_QEMU_USER_PREFIX"-domid" #define LIBXL_QEMU_USER_SHARED LIBXL_QEMU_USER_PREFIX"-shared" #define LIBXL_QEMU_USER_RANGE_BASE LIBXL_QEMU_USER_PREFIX"-range-base" -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v4 03/11] libxl: Clean up userlookup_helper_getpw* helper
Bring conventions more in line with libxl__xs_read_checked(): - If found, return 0 and set pointer to non-NULL - If not found, return 0 and set pointer to NULL - On error, return libxl-style error number. Update documentation to match. Use CODING_STYLE compliant `r` rather than `ret`. On error, log the error code before returning instead of discarding it. Now that it only returns 0 or errno, update caller error checks to be `if (ret)` rather than `if (ret < 0)`. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v3: - Used more idiomatic `if (ret)` rather than `if (ret < 0)` CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 35 --- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 6024d4b7b8..67204b94c2 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -72,7 +72,13 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name) * userlookup_helper_getpwuid(libxl__gc*, uid_t uid, * struct passwd **pwd_r); * - * returns 1 if the user was found, 0 if it was not, -1 on error + * If the user is found, return 0 and set *pwd_r to the appropriat + * value. + * + * If the user is not found but there are no errors, return 0 + * and set *pwd_r to NULL. + * + * On error, return a libxl-style error code. */ #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \ static int userlookup_helper_##NAME(libxl__gc *gc, \ @@ -83,7 +89,7 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name) struct STRUCTNAME *resultp = NULL; \ char *buf = NULL; \ long buf_size; \ -int ret;\ +int r; \ \ buf_size = sysconf(SYSCONF);\ if (buf_size < 0) { \ @@ -95,17 +101,16 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name) \ while (1) { \ buf = libxl__realloc(gc, buf, buf_size);\ -ret = NAME##_r(spec, resultbuf, buf, buf_size, ); \ -if (ret == ERANGE) {\ +r = NAME##_r(spec, resultbuf, buf, buf_size, ); \ +if (r == ERANGE) { \ buf_size += 128;\ continue; \ } \ -if (ret != 0) \ +if (r != 0) { \ +LOGEV(ERROR, r, "Looking up username/uid with " #NAME); \ return ERROR_FAIL; \ -if (resultp != NULL) { \ -if (out) *out = resultp;\ -return 1; \ } \ +*out = resultp; \ return 0; \ } \ } @@ -140,16 +145,16 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, ret = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); -if (ret < 0) +if (ret) return ret; -if (ret > 0) { +if (user_base) { struct passwd *user_clash, user_clash_pwbuf; uid_t intended_uid = user_base->pw_uid + guest_domid; ret = userlookup_helper_getpwuid(gc, intended_uid, _clash_pwbuf, _clash); -if (ret < 0) +if (ret) return ret; -if (ret > 0) { +if (user_clash) { LOGD(ERROR, guest_domid, "wanted to use uid %ld (%s + %d) but that is user %s !", (long)intended_uid, LIBXL_QEMU_USER_RANGE_BASE, @@ -163,10 +168,10 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, } user = LIBXL_QEMU_USER_SHARED; -ret = userlookup_helper_getpwnam(gc, user, _pwbuf, 0); -if (ret < 0) +ret = userlookup_helper_getpwnam(gc, user, _pwbuf,
[Xen-devel] [PATCH v4 05/11] libxl: Do root checks once in libxl__domain_get_device_model_uid
At the moment, we check for equivalence to literal "root" before deciding whether to add the `runas` command-line option to QEMU. This is unsatisfactory for several reasons. First, just because the string doesn't match "root" doesn't mean the final uid won't end up being zero; in particular, the range_base calculations may end up producing "0:NNN", which would be root in any case. Secondly, it's almost certainly a configuration error if the resulting uid ends up to be zero; rather than silently do what was specified but probably not intended, throw an error. To fix this, check for root once in libxl__domain_get_device_model_uid. If the result is root, return an error; if appropriate, set `runas`. After that, assume that the presence of state->dm_runas implies that a `runas` argument should be constructed. One side effect of this is to check whether device_model_user exists before passing it to qemu, resulting in better error reporting. While we're here: - Refactor the function to use the "goto out" idiom - Use 'rc' rather than 'ret', in line with CODING_STYLE Signed-off-by: George Dunlap Acked-by: Ian Jackson --- Let the record show that I personally think the `goto out` pattern here would be improved by having a "root check" label. v3: - Update to use more idiomatic `if (rc)` - Make "inputs" to goto more clear - Initialize intended_uid to -1 sentinel - Document expectations when jumping to 'out' - Don't return directly after non-trivial initial checks - Always explicitly set rc to 0 when jumping to out, even if we just checked that it was zero a few lines earlier - Work around one-goto limitation by having multiple 'rc' checks. - Return generic ERROR_INVAL rather than in accurate ERROR_DEVICE_EXISTS v2: - Refactor to use `out` rather than multiple labels - Only check for root once - Use 'out' rather than direct returns for errors (only use direct returns for early `succeed-without-setting-runas` paths) - Use `rc` rather than `ret` to more closely align with CODING_STYLE - Fill out comments about the cases we're handling - Return ERROR_DEVICE_EXISTS rather than ERROR_FAIL if there's another username that maps to our calculated uid - Report an error if the specified device_model_user doesn't exist CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 107 - 1 file changed, 85 insertions(+), 22 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 67204b94c2..d73bbb6b06 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -126,65 +126,128 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, const libxl_domain_build_info *b_info = >guest_config->b_info; struct passwd *user_base, user_pwbuf; -int ret; +int rc; char *user; +uid_t intended_uid = -1; /* Only qemu-upstream can run as a different uid */ if (b_info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) return 0; +/* + * From this point onward, all paths should go through the `out` + * label. The invariants should be: + * - rc may be 0, or an error code. + * - if rc is an error code, user and intended_uid are ignored. + * - if rc is 0, user may be set or not set. + * - if user is set, then intended_uid must be set to a UID matching + * the username `user`. This will be checked for root (0). + */ + +/* + * If device_model_user is present, set `-runas` even if + * dm_restrict isn't in use + */ user = b_info->device_model_user; -if (user) -goto end_search; +if (user) { +rc = userlookup_helper_getpwnam(gc, user, _pwbuf, _base); +if (rc) +goto out; + +if (!user_base) { +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", + user); +rc = ERROR_INVAL; +goto out; +} + +intended_uid = user_base->pw_uid; +rc = 0; +goto out; +} +/* + * If dm_restrict isn't set, and we don't have a specified user, don't + * bother setting a `-runas` parameter. + */ if (!libxl_defbool_val(b_info->dm_restrict)) { LOGD(DEBUG, guest_domid, "dm_restrict disabled, starting QEMU as root"); -return 0; +user = NULL; /* Should already be null, but just in case */ +goto out; } -ret = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, +/* + * dm_restrict is set, but device_model_user isn't set; look for + * QEMU_USER_BASE_RANGE + */ +rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); -if (ret) -return ret; +if (rc) +goto out; if (user_base) { struct passwd *user_clash, user_clash_pwbuf; -uid_t intended_uid = user_base->pw_uid + guest_domid; -ret =
[Xen-devel] [PATCH v4 07/11] libxl: Make killing of device model asynchronous
Or at least, give it an asynchronous interface so that we can make it actually asynchronous in subsequent patches. Create state structures and callback function signatures. Add the state structure to libxl__destroy_domid_state. Break libxl__destroy_domid down into two functions. No functional change intended. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v2: - Note that libxl__devicemodel_destroy_cb may be called reentrantly NB that I retain the comment before libxl__destroy_device_model, in spite of the fact that it looks "pointless", to separate it logically from the previous prototype. CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 11 +++--- tools/libxl/libxl_domain.c | 40 tools/libxl/libxl_internal.h | 20 -- 3 files changed, 58 insertions(+), 13 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 450433452d..ca59df33fe 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2696,19 +2696,24 @@ out: return rc; } -int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid) +void libxl__destroy_device_model(libxl__egc *egc, + libxl__destroy_devicemodel_state *ddms) { +STATE_AO_GC(ddms->ao); int rc; +int domid = ddms->domid; char *path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, ""); + if (!xs_rm(CTX->xsh, XBT_NULL, path)) LOGD(ERROR, domid, "xs_rm failed for %s", path); + /* We should try to destroy the device model anyway. */ rc = kill_device_model(gc, GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); - + libxl__qmp_cleanup(gc, domid); -return rc; +ddms->callback(egc, ddms, rc); } /* Return 0 if no dm needed, 1 if needed and <0 if error. */ diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c index d46b97dedf..0ce1ba1327 100644 --- a/tools/libxl/libxl_domain.c +++ b/tools/libxl/libxl_domain.c @@ -1008,6 +1008,10 @@ static void destroy_finish_check(libxl__egc *egc, } /* Callbacks for libxl__destroy_domid */ +static void dm_destroy_cb(libxl__egc *egc, + libxl__destroy_devicemodel_state *ddms, + int rc); + static void devices_destroy_cb(libxl__egc *egc, libxl__devices_remove_state *drs, int rc); @@ -1066,16 +1070,18 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis) if (rc < 0) { LOGEVD(ERROR, rc, domid, "xc_domain_pause failed"); } + if (dm_present) { -if (libxl__destroy_device_model(gc, domid) < 0) -LOGD(ERROR, domid, "libxl__destroy_device_model failed"); +dis->ddms.ao = ao; +dis->ddms.domid = domid; +dis->ddms.callback = dm_destroy_cb; + +libxl__destroy_device_model(egc, >ddms); +return; +} else { +dm_destroy_cb(egc, >ddms, 0); +return; } -dis->drs.ao = ao; -dis->drs.domid = domid; -dis->drs.callback = devices_destroy_cb; -dis->drs.force = 1; -libxl__devices_destroy(egc, >drs); -return; out: assert(rc); @@ -1083,6 +1089,24 @@ out: return; } +static void dm_destroy_cb(libxl__egc *egc, + libxl__destroy_devicemodel_state *ddms, + int rc) +{ +libxl__destroy_domid_state *dis = CONTAINER_OF(ddms, *dis, ddms); +STATE_AO_GC(dis->ao); +uint32_t domid = dis->domid; + +if (rc < 0) +LOGD(ERROR, domid, "libxl__destroy_device_model failed"); + +dis->drs.ao = ao; +dis->drs.domid = domid; +dis->drs.callback = devices_destroy_cb; +dis->drs.force = 1; +libxl__devices_destroy(egc, >drs); +} + static void devices_destroy_cb(libxl__egc *egc, libxl__devices_remove_state *drs, int rc) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index b147f3803c..f9e0bf6578 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1705,8 +1705,6 @@ _hidden int libxl__wait_for_device_model_deprecated(libxl__gc *gc, void *userdata), void *check_callback_userdata); -_hidden int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid); - _hidden const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *g_cfg); _hidden char *libxl__abs_path(libxl__gc *gc, const char *s, const char *path); @@ -3672,6 +3670,7 @@ extern const struct libxl_device_type *device_type_tbl[]; typedef struct libxl__domain_destroy_state libxl__domain_destroy_state; typedef struct libxl__destroy_domid_state libxl__destroy_domid_state; +typedef struct libxl__destroy_devicemodel_state libxl__destroy_devicemodel_state; typedef struct libxl__devices_remove_state
[Xen-devel] [PATCH v4 04/11] dm_depriv: Describe expected usage of device_model_user parameter
A number of subsequent patches rely on as-yet undefined behavior for what the `device_model_user` parameter does. Rather than implement it incorrectly (or randomly), or remove the feature, describe an expected usage for the feature. Further patches will make decisions based on this expected usage. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v3: - Fix a minor typo v2: - Remove stale comment about device_model_user not being ready RFC: As we'll see in a later patch, this implementation is still incomplete: we need a `reaper` uid from which to kill uids. CC: Ian Jackson CC: Wei Liu CC: Anthony Perard --- docs/features/qemu-deprivilege.pandoc | 17 + tools/libxl/libxl_types.idl | 1 - 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/docs/features/qemu-deprivilege.pandoc b/docs/features/qemu-deprivilege.pandoc index f941525189..ce21a60ef7 100644 --- a/docs/features/qemu-deprivilege.pandoc +++ b/docs/features/qemu-deprivilege.pandoc @@ -66,6 +66,23 @@ this, create a user named `xen-qemuuser-shared`; for example: adduser --no-create-home --system xen-qemuuser-shared +A final way to set up a separate process for qemus is to allocate one +UID per VM, and set the UID in the domain config file with the +`device_model_user` argument. For example, suppose you have a VM +named `c6-01`. You might do the following: + +adduser --system --no-create-home --group xen-qemuuser-c6-01 + +And then in your config file, the following line: + +device_model_user="xen-qemuuser-c6-01" + +NOTE: It is important when using `device_model_user` that EACH VM HAVE +A SEPARATE UID, and that none of these UIDs map to root. xl will +throw an error a uid maps to zero, but not if multiple VMs have the +same uid. Multiple VMs with the same device model uid will cause +problems. + ## Domain config changes The core domain config change is to add the following line to the diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 51cf06a3a2..141c46e42a 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -495,7 +495,6 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("device_model", string), ("device_model_ssidref", uint32), ("device_model_ssid_label", string), -# device_model_user is not ready for use yet ("device_model_user", string), # extra parameters pass directly to qemu, NULL terminated -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v4 09/11] libxl: Kill QEMU with "reaper" ruid
Using kill(-1) to killing an untrusted dm process with the real uid equal to the dm_uid isn't guaranteed to succeed: the process in question may be able to kill the reaper process after the setresuid() and before the kill(). Instead, set the real uid to the QEMU user for domain 0 (QEMU_USER_RANGE_BASE + 0). The reaper process will still be able to kill the dm process, but not vice versa. This, in turn, requires locking to make sure that only one reaper process is using that uid at a time; otherwise one reaper process may kill the other reaper process. Create a lockfile in RUNDIR/dm-reaper-lock, and grab the lock before executing kill. In the event that we can't get the lock for some reason, go ahead with the kill using dm_uid for both real and effective UIDs. This isn't guaranteed to work, but it's no worse than not trying to kill the process at all. NB that this effectively requires admins using device_model_user to also define xen_qemuuser_range_base; this will be addressed in subsequent patches. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v4: - Add a comment to the pot of get_reaper_lock_and_uid() mentioning that it must be called from a subprocess. v3: - Have libxl__get_reaper_uid return an error if a suitable uid is not found. - Explicitly set reaper_uid to dm_uid if get_reaper_lock_and_uid() returns an error, rather than relying on reaper_uid being unchanged on failure. - Open the lockfile 0644 rather than 0666. - Remove bogus comment. v2: - Port over previous changes - libxl__get_reaper_uid() won't set errno, use LOG rather than LOGE. - Accumulate error and return for all failures - Move flock() outside of the condition. Also fix EINTR check (check errno rather than return value). - Add a comment explaining why we return an error even if the kill() succeeds - Move locking to a separate function to minimize gotos - Refactor libxl__get_reaper_id to take a pointer for reaper_uid; return only success/failure. Also return EINVAL if reaper_uid would resolve to 0. - Handle "reaper_uid not found" specially; note issue with device_model_user feature - Assert that final reaper_uid != 0 CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 122 + 1 file changed, 112 insertions(+), 10 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index f2a21cf744..922fa70f11 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -259,6 +259,35 @@ out: return rc; } +/* + * Look up "reaper UID". If present and non-root, returns 0 and sets + * reaper_uid. Otherwise returns libxl-style error. + */ +static int libxl__get_reaper_uid(libxl__gc *gc, uid_t *reaper_uid) +{ +struct passwd *user_base, user_pwbuf; +int rc; + +rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, + _pwbuf, _base); +if (rc) +return rc; + +if (!user_base) { +LOG(WARN, "Couldn't find uid for reaper process"); +return ERROR_INVAL; +} + +if (user_base->pw_uid == 0) { +LOG(ERROR, "UID for reaper process maps to root!"); +return ERROR_INVAL; +} + +*reaper_uid = user_base->pw_uid; + +return 0; +} + const char *libxl__domain_device_model(libxl__gc *gc, const libxl_domain_build_info *info) { @@ -2834,37 +2863,110 @@ out: return; } +/* + * Note that this attempts to grab a file lock, so must be called from + * a sub-process. + */ +static int get_reaper_lock_and_uid(libxl__destroy_devicemodel_state *ddms, + uid_t *reaper_uid) +{ +STATE_AO_GC(ddms->ao); +int domid = ddms->domid; +int r; +const char * lockfile; +int fd; + +/* Try to lock the "reaper uid" */ +lockfile = GCSPRINTF("%s/dm-reaper-lock", libxl__run_dir_path()); + +/* + * NB that since we've just forked, we can't have any + * threads; so we don't need the libxl__carefd + * infrastructure here. + */ +fd = open(lockfile, O_RDWR|O_CREAT, 0644); +if (fd < 0) { +LOGED(ERROR, domid, + "unexpected error while trying to open lockfile %s, errno=%d", + lockfile, errno); +return ERROR_FAIL; +} + +/* Try to lock the file, retrying on EINTR */ +for (;;) { +r = flock(fd, LOCK_EX); +if (!r) +break; +if (errno != EINTR) { +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */ +LOGED(ERROR, domid, + "unexpected error while trying to lock %s, fd=%d, errno=%d", + lockfile, fd, errno); +return ERROR_FAIL; +} +} + +/* + * Get reaper_uid. If we can't find such a uid, return an error. + * + * FIXME: This means that domain destruction will fail if + * device_model_user is set but QEMU_USER_RANGE_BASE doesn't exist. + */ +
[Xen-devel] [PATCH v4 06/11] libxl: Move qmp cleanup into devicemodel destroy function
Removing the qmp connection is logically part of the device model destruction; having the caller destroy it is a mild layering violation. Move libxl__qmp_cleanup() into libxl__destroy_device_model(). This will make it easier when we make devicemodel destruction asynchronous. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 9 +++-- tools/libxl/libxl_domain.c | 2 -- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index d73bbb6b06..450433452d 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2698,12 +2698,17 @@ out: int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid) { +int rc; char *path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, ""); if (!xs_rm(CTX->xsh, XBT_NULL, path)) LOGD(ERROR, domid, "xs_rm failed for %s", path); /* We should try to destroy the device model anyway. */ -return kill_device_model(gc, -GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); +rc = kill_device_model(gc, + GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); + +libxl__qmp_cleanup(gc, domid); + +return rc; } /* Return 0 if no dm needed, 1 if needed and <0 if error. */ diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c index 3377bba994..d46b97dedf 100644 --- a/tools/libxl/libxl_domain.c +++ b/tools/libxl/libxl_domain.c @@ -1069,8 +1069,6 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis) if (dm_present) { if (libxl__destroy_device_model(gc, domid) < 0) LOGD(ERROR, domid, "libxl__destroy_device_model failed"); - -libxl__qmp_cleanup(gc, domid); } dis->drs.ao = ao; dis->drs.domid = domid; -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v4 08/11] libxl: Kill QEMU by uid when possible
The privcmd fd that a dm_restrict'ed QEMU has gives it permission to one specific domain ID. This domain ID will probably eventually be used again. It is therefore necessary to make absolutely sure that a rogue QEMU process cannot hang around after its domain has exited. Killing QEMU by pid is insufficient in this situation, because QEMU may be able to fork() to escape killing. It is surprisingly tricky to kill a process which can call fork() without races; the only reliable way is to use kill(-1) to kill all processes with a given uid. We can use this method only when we're sure that there's only one QEMU instance per uid. Add a dm_uid into the domain_build_state struct, and set it in libxl__domain_get_device_model_uid() when it's safe to kill by UID. Store this in xenstore next to device-model-pid. On domain destroy, check to see if device-model-uid is present in xenstore. If so, fork off a reaper process, setuid to that uid, and do kill(-9) to kill all uids of that type. Otherwise, carry on destroying by pid. While we're here, make libxl__destroy_device_model() consistently: 1. Return an error when anything fails 2. But continue to do as much clean-up as possible NOTE that this is not yet completely safe: with ruid == dm_uid, the device model may be able to kill(-9) the 'reaper' process before the reaper process can kill it. Further patches will address this. Signed-off-by: George Dunlap --- v4: - Add missing goto out - Check that WEXITSTATUS() <= 125 before using it to set rc - Remove stray blank line v3: - rename dm_uid to dm_kill_uid to indicate that it's only used when it's suitable to be used for kiling. - rename xenstore node to device-model-kill-uid for the same reason - Add a comment explaining what dm_runas and dm_kill_uid are for. - Minor tweak to comment for clarification - Add an `rc = 0;` before an `out:` label - Make the accumulator macro generic and put in libxl-internal.h - Fix up child error code return - Account for undefined behavior on failure of libxl__xs_read_checked. v2: - Rebase on top of previous "goto out" refactoring - Rather than introducing a `uid` string, Introduce a boolean, "kill_by_uid"; and do the GCSPRINTF() once if that is set. - Fix typo "starting" - Always call kill_device_model_uid_cb(); only call libxl__qmp_cleanup() from there - Refactor libxl__destroy_device_model() to follow "goto out on error" pattern - Retain and report errors even when we continue trying to clean up - Report errors removing DM xenstore directory (except -ENOENT) - Report errors reading device-model-uid - Put "kill by uid" child logic in a separate function - Refactor "kill by uid" to follow "goto out on error" pattern - Change "kill by uid" to return libxl-style error, rather than errno - Document the intention of when to return errors - Assert that dm_uid != 0 - Log what the reaper process setresuid'd to CC: Ian Jackson CC: Wei Liu CC: Anthony Perard --- tools/libxl/libxl_dm.c | 214 +-- tools/libxl/libxl_internal.h | 16 ++- 2 files changed, 220 insertions(+), 10 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index ca59df33fe..f2a21cf744 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -129,6 +129,7 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, int rc; char *user; uid_t intended_uid = -1; +bool kill_by_uid; /* Only qemu-upstream can run as a different uid */ if (b_info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) @@ -141,7 +142,8 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, * - if rc is an error code, user and intended_uid are ignored. * - if rc is 0, user may be set or not set. * - if user is set, then intended_uid must be set to a UID matching - * the username `user`. This will be checked for root (0). + * the username `user`, and kill_by_uid must be set to the appropriate + * value. intended_uid will be checked for root (0). */ /* @@ -162,6 +164,7 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, } intended_uid = user_base->pw_uid; +kill_by_uid = true; rc = 0; goto out; } @@ -205,12 +208,15 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, LOGD(DEBUG, guest_domid, "using uid %ld", (long)intended_uid); user = GCSPRINTF("%ld:%ld", (long)intended_uid, (long)user_base->pw_gid); +kill_by_uid = true; rc = 0; goto out; } /* - * We couldn't find QEMU_USER_BASE_RANGE; look for QEMU_USER_SHARED + * We couldn't find QEMU_USER_BASE_RANGE; look for + * QEMU_USER_SHARED. NB for QEMU_USER_SHARED, all QEMU will run + * as the same UID, we can't kill by uid; therefore don't set uid. */ user = LIBXL_QEMU_USER_SHARED; rc = userlookup_helper_getpwnam(gc, user,
[Xen-devel] [PATCH] gic-vgic: skip irqs locking in gic_restore_pending_irqs()
From: Andrii Anisov This function is called under IRQs disabled already, so drop additional flags save and restore. Signed-off-by: Andrii Anisov --- This is a half of an RFC patch [1] which relies on the already existing code. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03293.html --- xen/arch/arm/gic-vgic.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c index 48922f5..34179c0 100644 --- a/xen/arch/arm/gic-vgic.c +++ b/xen/arch/arm/gic-vgic.c @@ -279,11 +279,10 @@ static void gic_restore_pending_irqs(struct vcpu *v) int lr = 0; struct pending_irq *p, *t, *p_r; struct list_head *inflight_r; -unsigned long flags; unsigned int nr_lrs = gic_get_nr_lrs(); int lrs = nr_lrs; -spin_lock_irqsave(>arch.vgic.lock, flags); +spin_lock(>arch.vgic.lock); if ( list_empty(>arch.vgic.lr_pending) ) goto out; @@ -327,7 +326,7 @@ found: } out: -spin_unlock_irqrestore(>arch.vgic.lock, flags); +spin_unlock(>arch.vgic.lock); } void gic_clear_pending_irqs(struct vcpu *v) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 11/11] dm_depriv: Mark `UID cleanup` as completed
George Dunlap writes ("[PATCH v3 11/11] dm_depriv: Mark `UID cleanup` as completed"): > Signed-off-by: George Dunlap Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V12 2/5] x86/p2m: refactor p2m_reset_altp2m()
> On Dec 20, 2018, at 5:38 PM, Razvan Cojocaru > wrote: > > Refactor p2m_reset_altp2m() so that it can be used to remove > redundant codepaths, fixing the locking while we're at it. > > The previous code now replaced by p2m_reset_altp2m(d, i, > ALTP2M_DEACTIVATE) calls did not set p2m->min_remapped_gfn > and p2m->max_remapped_gfn because in those cases the altp2m > idx was disabled; so before getting used again, > p2m_init_altp2m_ept() would get called, which resets them. > Always setting them in p2m_reset_altp2m(), while redundant, > is preferable to an extra conditional. > > Signed-off-by: Razvan Cojocaru > Tested-by: Tamas K Lengyel Reviewed-by: George Dunlap ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 10/11] libxl: Introduce specific username to be used as a reaper
George Dunlap writes ("[PATCH v3 10/11] libxl: Introduce specific username to be used as a reaper"): > Untrusted device models must be killed by uid rather than by pid for > safety. To do this reliably, we need another uid, not used for any > other purpose, from which to make the kill system call. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Live migrate with Linux >= 4.13 domU causes kernel time jumps and TCP connection stalls.
Hi, We've been tracking down a live migration bug during the last three days here at work, and here's what we found so far. 1. Xen version and dom0 linux kernel version don't matter. 2. DomU kernel is >= Linux 4.13. When using live migrate to another dom0, this often happens: [ 37.511305] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 37.513316] OOM killer disabled. [ 37.513323] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 37.514837] suspending xenstore... [ 37.515142] xen:grant_table: Grant tables using version 1 layout [18446744002.593711] OOM killer enabled. [18446744002.593726] Restarting tasks ... done. [18446744002.604527] Setting capacity to 6291456 As a side effect, all open TCP connections stall, because the timestamp counters of packets sent to the outside world are affected: https://syrinx.knorrie.org/~knorrie/tmp/tcp-stall.png "The problem seems to occur after the domU is resumed. The first packet (#90) has wrong timestamp value (far from the past), marked red in the image. Green is the normal sequence of timestamps from the server (domU), acknowledged by the client. Once client receives the packet from the past, it attempts re-sending everything from the start. As the timestamp never reaches normal value, the client goes crazy thinking that the server has not received anything, keeping the loop on. But they just exist in different ages." --- >8 --- Ad 1. We reproduced this on different kinds of HP dl360 g7/8/9 gear, both with Xen 4.11 / Linux 4.19.9 dom0 kernel and with Xen 4.4 / Linux 3.16 as dom0 kernel. Ad 2. This was narrowed down by just grabbing old debian kernel images from https://snapshot.debian.org/binary/?cat=l and trying them. OK linux-image-4.12.0-2-amd64_4.12.13-1_amd64.deb FAIL linux-image-4.13.0-rc5-amd64_4.13~rc5-1~exp1_amd64.deb FAIL linux-image-4.13.0-trunk-amd64_4.13.1-1~exp1_amd64.deb FAIL linux-image-4.13.0-1-amd64_4.13.4-1_amd64.deb FAIL linux-image-4.13.0-1-amd64_4.13.13-1_amd64.deb FAIL linux-image-4.14.0-3-amd64_4.14.17-1_amd64.deb FAIL linux-image-4.15.0-3-amd64_4.15.17-1_amd64.deb FAIL linux-image-4.16.0-2-amd64_4.16.16-2_amd64.deb FAIL ... everything up to 4.19.9 here So, there seems to be a change introduced in 4.13 that makes this behaviour appear. We didn't start compiling old kernels yet to be able to bisect it further. --- >8 --- For the rest of the info, I'm focussing on a test environment for reproduction, which is 4x identical HP DL360G7, named sirius, gamma, omega and flopsy. It's running the 4.11 packages from Debian, rebuilt for Stretch: 4.11.1~pre.20180911.5acdd26fdc+dfsg-5~bpo9+1 https://salsa.debian.org/xen-team/debian-xen/commits/stretch-backports Dom0 kernel is 4.19.9 from Debian, rebuilt for Stretch: https://salsa.debian.org/knorrie-guest/linux/commits/debian/4.19.9-1_mxbp9+1 xen_commandline : placeholder dom0_max_vcpus=1-4 dom0_mem=4G,max:4G com2=115200,8n1 console=com2,vga noreboot xpti=no-dom0,domu smt=off vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5675 @ 3.07GHz stepping: 2 microcode : 0x1f cpu MHz : 3066.727 --- >8 --- There are some interesting additional patterns: 1. consistent success / failure paths. After rebooting all 4 physical servers, starting a domU with 4.19 kernel and then live migrating it, it might first time fail, or it might succeed. However, from the first time it fails, the specific direction of movement keeps showing the failure every single time this combination is used. Same goes for successful live migrate. E.g.: sirius -> flopsy OK sirius -> gamma OK flopsy -> gamma OK flopsy -> sirius OK gamma -> flopsy FAIL gamma -> sirius FAIL omega -> flopsy FAIL After rebooting all of the servers again, and restarting the whole test procedure, the combinations and results change, but are again consistent as soon as we start live migrating and seeing results. 2. TCP connections only hang when opened while "timestamp value in dmesg is low", followed with a "time is 18 gazillion" situation. When opening a TCP connection to the domU while it's at 18 gazillion seconds uptime, the TCP connection keeps working all the time after subsequent live migrations, even when it jumps up and down, following the OK and FAIL paths. 3. Since this is related to time and clocks, the last thing today we tried was, instead of using default settings, put "clocksource=tsc tsc=stable:socket" on the xen command line and "clocksource=tsc" on the domU linux kernel line. What we observed after doing this, is that the failure happens less often, but still happens. Everything else applies. --- >8 --- Additional question: It's 2018, should we have these "clocksource=tsc tsc=stable:socket" on Xen and "clocksource=tsc" anyways now, for Xen 4.11 and Linux 4.19 domUs? All our hardware has 'TscInvariant = true'. Related:
Re: [Xen-devel] [PATCH v3 09/11] libxl: Kill QEMU with "reaper" ruid
George Dunlap writes ("[PATCH v3 09/11] libxl: Kill QEMU with "reaper" ruid"): > Using kill(-1) to killing an untrusted dm process with the real uid > equal to the dm_uid isn't guaranteed to succeed: the process in > question may be able to kill the reaper process after the setresuid() > and before the kill(). ... > +/* > + * Look up "reaper UID". If present and non-root, returns 0 and sets > + * reaper_uid. Otherwise returns libxl-style error. > + */ Might be worth mentioning that this function expects to be called in a subprocess, for a variety of reasons including because it takes a lock. Nevertheless, Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 08/11] libxl: Kill QEMU by uid when possible
George Dunlap writes ("[PATCH v3 08/11] libxl: Kill QEMU by uid when possible"): > The privcmd fd that a dm_restrict'ed QEMU has gives it permission to > one specific domain ID. This domain ID will probably eventually be > used again. It is therefore necessary to make absolutely sure that a > rogue QEMU process cannot hang around after its domain has exited. ... > +static int kill_device_model_uid_child(libxl__destroy_devicemodel_state > *ddms, > +/* > + * And kill everyone but me. > + * > + * NB that it's not clear from either POSIX or the Linux man page > + * that ESRCH would be returned with a pid value of -1, but it > + * doesn't hurt to check. > + */ > +r = kill(-1, 9); > +if (r && errno != ESRCH) { > +LOGED(ERROR, domid, "kill(-1,9)"); > +rc = ERROR_FAIL; > +} Missing `goto out', there. > + > +rc = 0; > + > +out: > +return rc; > +} > +static void kill_device_model_uid_cb(libxl__egc *egc, > +libxl__ev_child *destroyer, > +pid_t pid, int status) > +{ > +libxl__destroy_devicemodel_state *ddms = CONTAINER_OF(destroyer, *ddms, > destroyer); > +STATE_AO_GC(ddms->ao); > + > +if (status) { > +int rc = ERROR_FAIL; > + > +if (WIFEXITED(status)) > +rc = -WEXITSTATUS(status); But WEXITSTATUS might be something weird. See my mail about possible system-invented exit statuses. I suggest you tolerate only the status values you intend to generate, 1..125. (And 0 for success.) > +/* > + * A macro to help retain the first failure in "do as much as you can" > + * situations. Note the hard-coded use of the variable name `rc`. > + */ > +#define ACCUMULATE_RC(rc_acc) ((rc_acc) = (rc_acc) ?: rc) > + Good, although you have trailing whitespace in the new blank line. Everything else LGTM. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V12 1/5] x86/p2m: allocate logdirty_ranges for altp2ms
> On Dec 20, 2018, at 5:38 PM, Razvan Cojocaru > wrote: > > For now, only do allocation/deallocation; keeping them in sync > will be done in subsequent patches. > > Logdirty synchronization will only be done for active altp2ms; > so allocate logdirty rangesets (copying the host logdirty > rangeset) when an altp2m is activated, and free it when > deactivated. > > Signed-off-by: Razvan Cojocaru > Tested-by: Tamas K Lengyel Reviewed-by: George Dunlap ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 7/8] xenalyze: Build for Both ARM and x86 Platforms
On Fri, 21 Dec 2018, Julien Grall wrote: > Hi, > > On 21/12/2018 16:26, Julien Grall wrote: > > From: Benjamin Sanda > > > > Modified to provide building of the xenalyze binary for both ARM and > > x86 platforms. The xenalyze binary is now built as part of the BIN > > list for both platforms. > > > > Signed-off-by: Benjamin Sanda > > Signed-off-by: Julien Grall > > Acked-by: Wei Liu > > Acked-by: Stefano Stabellini > > I made a typo in the address e-mail. Stefano, could you fix it on commit? Yes, fixed on commit. First 7 patches have been committed. > > --- > > Changes in v3: > > - Add Stefano's acked-by > > > > Changes in v2: > > - Add Wei's acked-by > > --- > > tools/xentrace/Makefile | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/tools/xentrace/Makefile b/tools/xentrace/Makefile > > index 0bad942bdf..9fb7fc96e7 100644 > > --- a/tools/xentrace/Makefile > > +++ b/tools/xentrace/Makefile > > @@ -9,8 +9,7 @@ LDLIBS += $(LDLIBS_libxenevtchn) > > LDLIBS += $(LDLIBS_libxenctrl) > > LDLIBS += $(ARGP_LDFLAGS) > > -BIN-$(CONFIG_X86) = xenalyze > > -BIN = $(BIN-y) > > +BIN = xenalyze > > SBIN = xentrace xentrace_setsize > > LIBBIN = xenctx > > SCRIPTS = xentrace_format > > > > -- > Julien Grall > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 05/11] libxl: Do root checks once in libxl__domain_get_device_model_uid
George Dunlap writes ("[PATCH v3 05/11] libxl: Do root checks once in libxl__domain_get_device_model_uid"): > At the moment, we check for equivalence to literal "root" before > deciding whether to add the `runas` command-line option to QEMU. This > is unsatisfactory for several reasons. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] gic: drop interrupts enabling on interrupts processing
From: Andrii Anisov This reduces the number of context switches in case we have coming guest interrupts from different sources at a high rate. What is likely for multimedia use-cases. Having irqs unlocked here makes us go through trap path again in case we have a new guest interrupt arrived (even with the same priority, after `desc->handler->end(desc)` in `do_IRQ()`), what is just a processor cycles wasting. We will catch them all in the `gic_interrupt() function loop anyway. And the guest irqs arrival prioritization is meaningless here, it is only effective at guest's level. Signed-off-by: Andrii Anisov --- This is the patch [1] from RFC series, with extended commit message. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03286.html --- xen/arch/arm/gic.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 6cc7dec..9f5cd95 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -386,10 +386,8 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq) if ( likely(irq >= 16 && irq < 1020) ) { -local_irq_enable(); isb(); do_IRQ(regs, irq, is_fiq); -local_irq_disable(); } else if ( is_lpi(irq) ) { -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 4/8] xen/arm: Make get_page_from_gfn working with DOMID_XEN
On 21.12.18 19:27, Julien Grall wrote: Is it a wildcard tag? :) No, just a wrong copy-paste. Must be: Reviewed-by: Andrii Anisov ;) -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 3/8] xen/arm: Add support for read-only foreign mappings
Reviewed-by: Andrii Anisov -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 3/8] xen/arm: Add support for read-only foreign mappings
Hi Andrii, On 21/12/2018 17:27, Andrii Anisov wrote: On 21.12.18 18:26, Julien Grall wrote: Currently, foreign mappings can only be read-write. A follow-up patch will extend foreign mapping for Xen backend memory (via XEN_DOMID), some of that memory should only be read accessible for the mapping domain. Introduce a new p2m_type to cater read-only foreign mappings. For now, the decision between the two foreign mapping type is based on the type of the guest page mapped. Signed-off-by: Julien Grall Andrii Anisov Another wildcard tag? :) Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 4/8] xen/arm: Make get_page_from_gfn working with DOMID_XEN
Hi Andrii, On 21/12/2018 17:26, Andrii Anisov wrote: On 21.12.18 18:26, Julien Grall wrote: DOMID_XEN is used to share pages beloging to the hypervisor (e.g trace buffers). Unlike other domains, DOMID_XEN is a non-auto translated domain and therefore does not have a P2M. This patch adds a special case for DOMID_XEN in get_page_from_gfn. We may want to provide "non-auto translated helpers" in the future if we see more case. Signed-off-by: Julien Grall Andrii Anisov Is it a wildcard tag? :) Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 3/8] xen/arm: Add support for read-only foreign mappings
On 21.12.18 18:26, Julien Grall wrote: Currently, foreign mappings can only be read-write. A follow-up patch will extend foreign mapping for Xen backend memory (via XEN_DOMID), some of that memory should only be read accessible for the mapping domain. Introduce a new p2m_type to cater read-only foreign mappings. For now, the decision between the two foreign mapping type is based on the type of the guest page mapped. Signed-off-by: Julien Grall Andrii Anisov -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 4/8] xen/arm: Make get_page_from_gfn working with DOMID_XEN
On 21.12.18 18:26, Julien Grall wrote: DOMID_XEN is used to share pages beloging to the hypervisor (e.g trace buffers). Unlike other domains, DOMID_XEN is a non-auto translated domain and therefore does not have a P2M. This patch adds a special case for DOMID_XEN in get_page_from_gfn. We may want to provide "non-auto translated helpers" in the future if we see more case. Signed-off-by: Julien Grall Andrii Anisov -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 0/2] misc safety certification fixes
On Fri, 21 Dec 2018, Julien Grall wrote: > Hi, > > On 21/12/2018 09:27, Jan Beulich wrote: > > > > > On 20.12.18 at 18:26, wrote: > > > On 11/12/18 11:06 PM, Stefano Stabellini wrote: > > > Discussing with Stefano today, he is aiming to get this series for Xen > > > 4.12. I will be away until the x86/common code freeze. > > > > > > I agree with him that I will waive my ack if it gets reviewed by any > > > committers. > > > > Well, discussion on patch 2 was abandoned rather than finished > > afaict, which means Stefano either lost interest or is meaning to > > submit v5 with the comments addressed. > > He is planning to send a new version while I am on holidays. My request > applies for any new version of this providing it gets reviewed by any > committers. That's right: I haven't had the bandwidth to continue the discussion, but I intend to resume work on it on the 1st week of Jan. The idea is that even if Julien will be on holidays, if I manage to make enough progress and get your reviewed-by the series could go in anyway before the freeze, if that's OK for you. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 01/11] libxl: Move dm user determination logic into a helper function
To reliably kill an untrusted devicemodel, we need to know not only its pid, but its uid. In preparation for this, move the userid determination logic into a helper function. Create a new field, `dm_runas`, in libxl__domain_build_state to store the value during domain creation. This change also removes unnecessary duplication of the argument construction code. While here, clean up some minor CODING_STYLE infractions (space between * and variable name). No functional change intended. While here, delete some trailing whitespace. Signed-off-by: George Dunlap Acked-by: Wei Liu Acked-by: Ian Jackson --- v2: - Remove unnecessary space between * and dm_runas - Additional code clean-up - Delete trailing whitespace CC: Ian Jackson CC: Wei Liu CC: Anthony Perard --- tools/libxl/libxl_dm.c | 260 +++ tools/libxl/libxl_internal.h | 22 +-- 2 files changed, 151 insertions(+), 131 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 5698fe8af3..bbcbc94b6c 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -65,6 +65,131 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name) return logfile_w; } +/* + * userlookup_helper_getpwnam(libxl__gc*, const char *user, + * struct passwd **pwd_r); + * + * userlookup_helper_getpwuid(libxl__gc*, uid_t uid, + * struct passwd **pwd_r); + * + * returns 1 if the user was found, 0 if it was not, -1 on error + */ +#define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \ +static int userlookup_helper_##NAME(libxl__gc *gc, \ +SPEC_TYPE spec, \ +struct STRUCTNAME *resultbuf, \ +struct STRUCTNAME **out)\ +{ \ +struct STRUCTNAME *resultp = NULL; \ +char *buf = NULL; \ +long buf_size; \ +int ret;\ +\ +buf_size = sysconf(SYSCONF);\ +if (buf_size < 0) { \ +buf_size = 2048;\ +LOG(DEBUG, \ +"sysconf failed, setting the initial buffer size to %ld", \ +buf_size); \ +} \ +\ +while (1) { \ +buf = libxl__realloc(gc, buf, buf_size);\ +ret = NAME##_r(spec, resultbuf, buf, buf_size, ); \ +if (ret == ERANGE) {\ +buf_size += 128;\ +continue; \ +} \ +if (ret != 0) \ +return ERROR_FAIL; \ +if (resultp != NULL) { \ +if (out) *out = resultp;\ +return 1; \ +} \ +return 0; \ +} \ +} + +DEFINE_USERLOOKUP_HELPER(getpwnam, const char*, passwd, _SC_GETPW_R_SIZE_MAX); +DEFINE_USERLOOKUP_HELPER(getpwuid, uid_t, passwd, _SC_GETPW_R_SIZE_MAX); + +static int libxl__domain_get_device_model_uid(libxl__gc *gc, + libxl__dm_spawn_state *dmss) +{ +int guest_domid = dmss->guest_domid; +libxl__domain_build_state *const state = dmss->build_state; +const libxl_domain_build_info *b_info = >guest_config->b_info; + +struct passwd *user_base, user_pwbuf; +int ret; +char *user; + +/* Only qemu-upstream can run as a different uid */ +if (b_info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) +return 0; + +user = b_info->device_model_user; +if (user) +goto end_search; + +if (!libxl_defbool_val(b_info->dm_restrict)) { +LOGD(DEBUG, guest_domid, + "dm_restrict disabled,
[Xen-devel] [PATCH v3 10/11] libxl: Introduce specific username to be used as a reaper
Untrusted device models must be killed by uid rather than by pid for safety. To do this reliably, we need another uid, not used for any other purpose, from which to make the kill system call. When using xen-qemuuser-range-base, we can repurpose xen-qemuuser-range-base itself as a UID from which to kill other devicemodel uids (since domain ID 0 should never have a device model associated with it). However, we'd like people to be able to use the device_model_user feature without also defining xen-qemuuser-range-base (which requires the ability to 'reserve' 32k+ user IDs). To that end, introduce the xen-qemuuser-reaper id. When killing by UID, first look for and use that ID if available; then fall back to xen-qemuuser-range-base. Document the new call in docs/features/qemu-deprivilege.pandoc. Signed-off-by: George Dunlap --- v3: - New in this version CC: Ian Jackson CC: Wei Liu CC: Anthony Perard --- docs/features/qemu-deprivilege.pandoc | 9 ++ tools/libxl/libxl_dm.c| 40 +-- tools/libxl/libxl_internal.h | 1 + 3 files changed, 35 insertions(+), 15 deletions(-) diff --git a/docs/features/qemu-deprivilege.pandoc b/docs/features/qemu-deprivilege.pandoc index ce21a60ef7..eb05981a83 100644 --- a/docs/features/qemu-deprivilege.pandoc +++ b/docs/features/qemu-deprivilege.pandoc @@ -77,12 +77,21 @@ And then in your config file, the following line: device_model_user="xen-qemuuser-c6-01" +If you use this method, you should also allocate one "reaper" user to +be used for killing device models: + +adduser --system --no-create-home --group xen-qemuuser-reaper + NOTE: It is important when using `device_model_user` that EACH VM HAVE A SEPARATE UID, and that none of these UIDs map to root. xl will throw an error a uid maps to zero, but not if multiple VMs have the same uid. Multiple VMs with the same device model uid will cause problems. +It is also important that `xen-qemuuser-reaper` not have any processes +associated with it, as they will be destroyed when deprivileged qemu +processes are destroyed. + ## Domain config changes The core domain config change is to add the following line to the diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index f7c4e5eb3b..731d7f3c2c 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -268,24 +268,37 @@ static int libxl__get_reaper_uid(libxl__gc *gc, uid_t *reaper_uid) struct passwd *user_base, user_pwbuf; int rc; +rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_REAPER, + _pwbuf, _base); +/* + * Either there was an error, or we found a suitable user; stop + * looking + */ +if (rc || user_base) +goto out; + rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); -if (rc) -return rc; +if (rc || user_base) +goto out; -if (!user_base) { -LOG(WARN, "Couldn't find uid for reaper process"); -return ERROR_INVAL; -} - -if (user_base->pw_uid == 0) { -LOG(ERROR, "UID for reaper process maps to root!"); -return ERROR_INVAL; +LOG(WARN, "Couldn't find uid for reaper process"); +rc = ERROR_INVAL; + + out: +/* First check to see if the discovered user maps to root */ +if (!rc) { +if (user_base->pw_uid == 0) { +LOG(ERROR, "UID for reaper process maps to root!"); +rc = ERROR_INVAL; +} } -*reaper_uid = user_base->pw_uid; +/* If everything is OK, set reaper_uid as appropriate */ +if (!rc) +*reaper_uid = user_base->pw_uid; -return 0; +return rc; } const char *libxl__domain_device_model(libxl__gc *gc, @@ -2904,9 +2917,6 @@ static int get_reaper_lock_and_uid(libxl__destroy_devicemodel_state *ddms, /* * Get reaper_uid. If we can't find such a uid, return an error. - * - * FIXME: This means that domain destruction will fail if - * device_model_user is set but QEMU_USER_RANGE_BASE doesn't exist. */ return libxl__get_reaper_uid(gc, reaper_uid); } diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index e3619daf3a..f588e63599 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -4419,6 +4419,7 @@ _hidden int libxl__read_sysfs_file_contents(libxl__gc *gc, #define LIBXL_QEMU_USER_PREFIX "xen-qemuuser" #define LIBXL_QEMU_USER_SHARED LIBXL_QEMU_USER_PREFIX"-shared" #define LIBXL_QEMU_USER_RANGE_BASE LIBXL_QEMU_USER_PREFIX"-range-base" +#define LIBXL_QEMU_USER_REAPER LIBXL_QEMU_USER_PREFIX"-reaper" static inline bool libxl__acpi_defbool_val(const libxl_domain_build_info *b_info) { -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 03/11] libxl: Clean up userlookup_helper_getpw* helper
Bring conventions more in line with libxl__xs_read_checked(): - If found, return 0 and set pointer to non-NULL - If not found, return 0 and set pointer to NULL - On error, return libxl-style error number. Update documentation to match. Use CODING_STYLE compliant `r` rather than `ret`. On error, log the error code before returning instead of discarding it. Now that it only returns 0 or errno, update caller error checks to be `if (ret)` rather than `if (ret < 0)`. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v3: - Used more idiomatic `if (ret)` rather than `if (ret < 0)` CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 35 --- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 6024d4b7b8..67204b94c2 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -72,7 +72,13 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name) * userlookup_helper_getpwuid(libxl__gc*, uid_t uid, * struct passwd **pwd_r); * - * returns 1 if the user was found, 0 if it was not, -1 on error + * If the user is found, return 0 and set *pwd_r to the appropriat + * value. + * + * If the user is not found but there are no errors, return 0 + * and set *pwd_r to NULL. + * + * On error, return a libxl-style error code. */ #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \ static int userlookup_helper_##NAME(libxl__gc *gc, \ @@ -83,7 +89,7 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name) struct STRUCTNAME *resultp = NULL; \ char *buf = NULL; \ long buf_size; \ -int ret;\ +int r; \ \ buf_size = sysconf(SYSCONF);\ if (buf_size < 0) { \ @@ -95,17 +101,16 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name) \ while (1) { \ buf = libxl__realloc(gc, buf, buf_size);\ -ret = NAME##_r(spec, resultbuf, buf, buf_size, ); \ -if (ret == ERANGE) {\ +r = NAME##_r(spec, resultbuf, buf, buf_size, ); \ +if (r == ERANGE) { \ buf_size += 128;\ continue; \ } \ -if (ret != 0) \ +if (r != 0) { \ +LOGEV(ERROR, r, "Looking up username/uid with " #NAME); \ return ERROR_FAIL; \ -if (resultp != NULL) { \ -if (out) *out = resultp;\ -return 1; \ } \ +*out = resultp; \ return 0; \ } \ } @@ -140,16 +145,16 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, ret = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); -if (ret < 0) +if (ret) return ret; -if (ret > 0) { +if (user_base) { struct passwd *user_clash, user_clash_pwbuf; uid_t intended_uid = user_base->pw_uid + guest_domid; ret = userlookup_helper_getpwuid(gc, intended_uid, _clash_pwbuf, _clash); -if (ret < 0) +if (ret) return ret; -if (ret > 0) { +if (user_clash) { LOGD(ERROR, guest_domid, "wanted to use uid %ld (%s + %d) but that is user %s !", (long)intended_uid, LIBXL_QEMU_USER_RANGE_BASE, @@ -163,10 +168,10 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, } user = LIBXL_QEMU_USER_SHARED; -ret = userlookup_helper_getpwnam(gc, user, _pwbuf, 0); -if (ret < 0) +ret = userlookup_helper_getpwnam(gc, user, _pwbuf,
[Xen-devel] [PATCH v3 05/11] libxl: Do root checks once in libxl__domain_get_device_model_uid
At the moment, we check for equivalence to literal "root" before deciding whether to add the `runas` command-line option to QEMU. This is unsatisfactory for several reasons. First, just because the string doesn't match "root" doesn't mean the final uid won't end up being zero; in particular, the range_base calculations may end up producing "0:NNN", which would be root in any case. Secondly, it's almost certainly a configuration error if the resulting uid ends up to be zero; rather than silently do what was specified but probably not intended, throw an error. To fix this, check for root once in libxl__domain_get_device_model_uid. If the result is root, return an error; if appropriate, set `runas`. After that, assume that the presence of state->dm_runas implies that a `runas` argument should be constructed. One side effect of this is to check whether device_model_user exists before passing it to qemu, resulting in better error reporting. While we're here: - Refactor the function to use the "goto out" idiom - Use 'rc' rather than 'ret', in line with CODING_STYLE Signed-off-by: George Dunlap --- Let the record show that I personally think the `goto out` pattern here would be improved by having a "root check" label. v3: - Update to use more idiomatic `if (rc)` - Make "inputs" to goto more clear - Initialize intended_uid to -1 sentinel - Document expectations when jumping to 'out' - Don't return directly after non-trivial initial checks - Always explicitly set rc to 0 when jumping to out, even if we just checked that it was zero a few lines earlier - Work around one-goto limitation by having multiple 'rc' checks. - Return generic ERROR_INVAL rather than in accurate ERROR_DEVICE_EXISTS v2: - Refactor to use `out` rather than multiple labels - Only check for root once - Use 'out' rather than direct returns for errors (only use direct returns for early `succeed-without-setting-runas` paths) - Use `rc` rather than `ret` to more closely align with CODING_STYLE - Fill out comments about the cases we're handling - Return ERROR_DEVICE_EXISTS rather than ERROR_FAIL if there's another username that maps to our calculated uid - Report an error if the specified device_model_user doesn't exist CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 107 - 1 file changed, 85 insertions(+), 22 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 67204b94c2..d73bbb6b06 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -126,65 +126,128 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, const libxl_domain_build_info *b_info = >guest_config->b_info; struct passwd *user_base, user_pwbuf; -int ret; +int rc; char *user; +uid_t intended_uid = -1; /* Only qemu-upstream can run as a different uid */ if (b_info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) return 0; +/* + * From this point onward, all paths should go through the `out` + * label. The invariants should be: + * - rc may be 0, or an error code. + * - if rc is an error code, user and intended_uid are ignored. + * - if rc is 0, user may be set or not set. + * - if user is set, then intended_uid must be set to a UID matching + * the username `user`. This will be checked for root (0). + */ + +/* + * If device_model_user is present, set `-runas` even if + * dm_restrict isn't in use + */ user = b_info->device_model_user; -if (user) -goto end_search; +if (user) { +rc = userlookup_helper_getpwnam(gc, user, _pwbuf, _base); +if (rc) +goto out; + +if (!user_base) { +LOGD(ERROR, guest_domid, "Couldn't find device_model_user %s", + user); +rc = ERROR_INVAL; +goto out; +} + +intended_uid = user_base->pw_uid; +rc = 0; +goto out; +} +/* + * If dm_restrict isn't set, and we don't have a specified user, don't + * bother setting a `-runas` parameter. + */ if (!libxl_defbool_val(b_info->dm_restrict)) { LOGD(DEBUG, guest_domid, "dm_restrict disabled, starting QEMU as root"); -return 0; +user = NULL; /* Should already be null, but just in case */ +goto out; } -ret = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, +/* + * dm_restrict is set, but device_model_user isn't set; look for + * QEMU_USER_BASE_RANGE + */ +rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); -if (ret) -return ret; +if (rc) +goto out; if (user_base) { struct passwd *user_clash, user_clash_pwbuf; -uid_t intended_uid = user_base->pw_uid + guest_domid; -ret = userlookup_helper_getpwuid(gc, intended_uid, + +
[Xen-devel] [PATCH v3 11/11] dm_depriv: Mark `UID cleanup` as completed
Signed-off-by: George Dunlap --- CC: Ian Jackson CC: Wei Liu --- docs/designs/qemu-deprivilege.md | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/designs/qemu-deprivilege.md b/docs/designs/qemu-deprivilege.md index f7444a434d..81a5f5c05d 100644 --- a/docs/designs/qemu-deprivilege.md +++ b/docs/designs/qemu-deprivilege.md @@ -128,26 +128,6 @@ are specified; this does not apply to QEMU running as a Xen DM. '''Tested''': Not tested -# Restrictions / improvements still to do - -This lists potential restrictions still to do. It is meant to be -listed in order of ease of implementation, with low-hanging fruit -first. - -### Further RLIMITs - -RLIMIT_AS limits the total amount of memory; but this includes the -virtual memory which QEMU uses as a mapcache. xen-mapcache.c already -fiddles with this; it would be straightforward to make it *set* the -rlimit to what it thinks a sensible limit is. - -RLIMIT_NPROC limits total number of processes or threads. QEMU uses -threads for some devices, so this would require some thought. - -Other things that would take some cleverness / changes to QEMU to -utilize due to ordering constrants: - - RLIMIT_NOFILES (after all necessary files are opened) - ### libxl UID cleanup '''Description''': Domain IDs are reused, and thus restricted UIDs are @@ -223,6 +203,26 @@ Since this will kill all other `reaper_uid` processes as well, we must either allocate a separate `reaper_uid` per domain, or use locking to ensure that only one killing process is active at a time. +# Restrictions / improvements still to do + +This lists potential restrictions still to do. It is meant to be +listed in order of ease of implementation, with low-hanging fruit +first. + +### Further RLIMITs + +RLIMIT_AS limits the total amount of memory; but this includes the +virtual memory which QEMU uses as a mapcache. xen-mapcache.c already +fiddles with this; it would be straightforward to make it *set* the +rlimit to what it thinks a sensible limit is. + +RLIMIT_NPROC limits total number of processes or threads. QEMU uses +threads for some devices, so this would require some thought. + +Other things that would take some cleverness / changes to QEMU to +utilize due to ordering constrants: + - RLIMIT_NOFILES (after all necessary files are opened) + ## libxl: Treat QMP connection as untrusted '''Description''': Currently libxl talks with QEMU via QMP; but its -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 06/11] libxl: Move qmp cleanup into devicemodel destroy function
Removing the qmp connection is logically part of the device model destruction; having the caller destroy it is a mild layering violation. Move libxl__qmp_cleanup() into libxl__destroy_device_model(). This will make it easier when we make devicemodel destruction asynchronous. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 9 +++-- tools/libxl/libxl_domain.c | 2 -- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index d73bbb6b06..450433452d 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2698,12 +2698,17 @@ out: int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid) { +int rc; char *path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, ""); if (!xs_rm(CTX->xsh, XBT_NULL, path)) LOGD(ERROR, domid, "xs_rm failed for %s", path); /* We should try to destroy the device model anyway. */ -return kill_device_model(gc, -GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); +rc = kill_device_model(gc, + GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); + +libxl__qmp_cleanup(gc, domid); + +return rc; } /* Return 0 if no dm needed, 1 if needed and <0 if error. */ diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c index 3377bba994..d46b97dedf 100644 --- a/tools/libxl/libxl_domain.c +++ b/tools/libxl/libxl_domain.c @@ -1069,8 +1069,6 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis) if (dm_present) { if (libxl__destroy_device_model(gc, domid) < 0) LOGD(ERROR, domid, "libxl__destroy_device_model failed"); - -libxl__qmp_cleanup(gc, domid); } dis->drs.ao = ao; dis->drs.domid = domid; -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 08/11] libxl: Kill QEMU by uid when possible
The privcmd fd that a dm_restrict'ed QEMU has gives it permission to one specific domain ID. This domain ID will probably eventually be used again. It is therefore necessary to make absolutely sure that a rogue QEMU process cannot hang around after its domain has exited. Killing QEMU by pid is insufficient in this situation, because QEMU may be able to fork() to escape killing. It is surprisingly tricky to kill a process which can call fork() without races; the only reliable way is to use kill(-1) to kill all processes with a given uid. We can use this method only when we're sure that there's only one QEMU instance per uid. Add a dm_uid into the domain_build_state struct, and set it in libxl__domain_get_device_model_uid() when it's safe to kill by UID. Store this in xenstore next to device-model-pid. On domain destroy, check to see if device-model-uid is present in xenstore. If so, fork off a reaper process, setuid to that uid, and do kill(-9) to kill all uids of that type. Otherwise, carry on destroying by pid. While we're here, make libxl__destroy_device_model() consistently: 1. Return an error when anything fails 2. But continue to do as much clean-up as possible NOTE that this is not yet completely safe: with ruid == dm_uid, the device model may be able to kill(-9) the 'reaper' process before the reaper process can kill it. Further patches will address this. Signed-off-by: George Dunlap --- v3: - rename dm_uid to dm_kill_uid to indicate that it's only used when it's suitable to be used for kiling. - rename xenstore node to device-model-kill-uid for the same reason - Add a comment explaining what dm_runas and dm_kill_uid are for. - Minor tweak to comment for clarification - Add an `rc = 0;` before an `out:` label - Make the accumulator macro generic and put in libxl-internal.h - Fix up child error code return - Account for undefined behavior on failure of libxl__xs_read_checked. v2: - Rebase on top of previous "goto out" refactoring - Rather than introducing a `uid` string, Introduce a boolean, "kill_by_uid"; and do the GCSPRINTF() once if that is set. - Fix typo "starting" - Always call kill_device_model_uid_cb(); only call libxl__qmp_cleanup() from there - Refactor libxl__destroy_device_model() to follow "goto out on error" pattern - Retain and report errors even when we continue trying to clean up - Report errors removing DM xenstore directory (except -ENOENT) - Report errors reading device-model-uid - Put "kill by uid" child logic in a separate function - Refactor "kill by uid" to follow "goto out on error" pattern - Change "kill by uid" to return libxl-style error, rather than errno - Document the intention of when to return errors - Assert that dm_uid != 0 - Log what the reaper process setresuid'd to CC: Ian Jackson CC: Wei Liu CC: Anthony Perard --- tools/libxl/libxl_dm.c | 213 +-- tools/libxl/libxl_internal.h | 16 ++- 2 files changed, 219 insertions(+), 10 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index ca59df33fe..d265f24864 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -129,6 +129,7 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, int rc; char *user; uid_t intended_uid = -1; +bool kill_by_uid; /* Only qemu-upstream can run as a different uid */ if (b_info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) @@ -141,7 +142,8 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, * - if rc is an error code, user and intended_uid are ignored. * - if rc is 0, user may be set or not set. * - if user is set, then intended_uid must be set to a UID matching - * the username `user`. This will be checked for root (0). + * the username `user`, and kill_by_uid must be set to the appropriate + * value. intended_uid will be checked for root (0). */ /* @@ -162,6 +164,7 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, } intended_uid = user_base->pw_uid; +kill_by_uid = true; rc = 0; goto out; } @@ -205,12 +208,15 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, LOGD(DEBUG, guest_domid, "using uid %ld", (long)intended_uid); user = GCSPRINTF("%ld:%ld", (long)intended_uid, (long)user_base->pw_gid); +kill_by_uid = true; rc = 0; goto out; } /* - * We couldn't find QEMU_USER_BASE_RANGE; look for QEMU_USER_SHARED + * We couldn't find QEMU_USER_BASE_RANGE; look for + * QEMU_USER_SHARED. NB for QEMU_USER_SHARED, all QEMU will run + * as the same UID, we can't kill by uid; therefore don't set uid. */ user = LIBXL_QEMU_USER_SHARED; rc = userlookup_helper_getpwnam(gc, user, _pwbuf, _base); @@ -220,6 +226,7 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc,
[Xen-devel] [PATCH v3 09/11] libxl: Kill QEMU with "reaper" ruid
Using kill(-1) to killing an untrusted dm process with the real uid equal to the dm_uid isn't guaranteed to succeed: the process in question may be able to kill the reaper process after the setresuid() and before the kill(). Instead, set the real uid to the QEMU user for domain 0 (QEMU_USER_RANGE_BASE + 0). The reaper process will still be able to kill the dm process, but not vice versa. This, in turn, requires locking to make sure that only one reaper process is using that uid at a time; otherwise one reaper process may kill the other reaper process. Create a lockfile in RUNDIR/dm-reaper-lock, and grab the lock before executing kill. In the event that we can't get the lock for some reason, go ahead with the kill using dm_uid for both real and effective UIDs. This isn't guaranteed to work, but it's no worse than not trying to kill the process at all. NB that this effectively requires admins using device_model_user to also define xen_qemuuser_range_base; this will be addressed in subsequent patches. Signed-off-by: George Dunlap --- v3: - Have libxl__get_reaper_uid return an error if a suitable uid is not found. - Explicitly set reaper_uid to dm_uid if get_reaper_lock_and_uid() returns an error, rather than relying on reaper_uid being unchanged on failure. - Open the lockfile 0644 rather than 0666. - Remove bogus comment. v2: - Port over previous changes - libxl__get_reaper_uid() won't set errno, use LOG rather than LOGE. - Accumulate error and return for all failures - Move flock() outside of the condition. Also fix EINTR check (check errno rather than return value). - Add a comment explaining why we return an error even if the kill() succeeds - Move locking to a separate function to minimize gotos - Refactor libxl__get_reaper_id to take a pointer for reaper_uid; return only success/failure. Also return EINVAL if reaper_uid would resolve to 0. - Handle "reaper_uid not found" specially; note issue with device_model_user feature - Assert that final reaper_uid != 0 CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 118 + 1 file changed, 108 insertions(+), 10 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index d265f24864..f7c4e5eb3b 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -259,6 +259,35 @@ out: return rc; } +/* + * Look up "reaper UID". If present and non-root, returns 0 and sets + * reaper_uid. Otherwise returns libxl-style error. + */ +static int libxl__get_reaper_uid(libxl__gc *gc, uid_t *reaper_uid) +{ +struct passwd *user_base, user_pwbuf; +int rc; + +rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, + _pwbuf, _base); +if (rc) +return rc; + +if (!user_base) { +LOG(WARN, "Couldn't find uid for reaper process"); +return ERROR_INVAL; +} + +if (user_base->pw_uid == 0) { +LOG(ERROR, "UID for reaper process maps to root!"); +return ERROR_INVAL; +} + +*reaper_uid = user_base->pw_uid; + +return 0; +} + const char *libxl__domain_device_model(libxl__gc *gc, const libxl_domain_build_info *info) { @@ -2834,37 +2863,106 @@ out: return; } +static int get_reaper_lock_and_uid(libxl__destroy_devicemodel_state *ddms, + uid_t *reaper_uid) +{ +STATE_AO_GC(ddms->ao); +int domid = ddms->domid; +int r; +const char * lockfile; +int fd; + +/* Try to lock the "reaper uid" */ +lockfile = GCSPRINTF("%s/dm-reaper-lock", libxl__run_dir_path()); + +/* + * NB that since we've just forked, we can't have any + * threads; so we don't need the libxl__carefd + * infrastructure here. + */ +fd = open(lockfile, O_RDWR|O_CREAT, 0644); +if (fd < 0) { +LOGED(ERROR, domid, + "unexpected error while trying to open lockfile %s, errno=%d", + lockfile, errno); +return ERROR_FAIL; +} + +/* Try to lock the file, retrying on EINTR */ +for (;;) { +r = flock(fd, LOCK_EX); +if (!r) +break; +if (errno != EINTR) { +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */ +LOGED(ERROR, domid, + "unexpected error while trying to lock %s, fd=%d, errno=%d", + lockfile, fd, errno); +return ERROR_FAIL; +} +} + +/* + * Get reaper_uid. If we can't find such a uid, return an error. + * + * FIXME: This means that domain destruction will fail if + * device_model_user is set but QEMU_USER_RANGE_BASE doesn't exist. + */ +return libxl__get_reaper_uid(gc, reaper_uid); +} + + /* * Destroy all processes of the given uid by setresuid to the * specified uid and kill(-1). NB this MUST BE CALLED FROM A SEPARATE - * PROCESS from the normal libxl process. Returns
[Xen-devel] [PATCH v3 02/11] libxl: Get rid of support for QEMU_USER_BASE (xen-qemuuser-domidNN)
QEMU_USER_BASE allows a user to specify the UID to use when running the devicemodel for a specific domain number. Unfortunately, this is not really practical: It requires nearly 32,000 entries in /etc/passwd. QEMU_USER_RANGE_BASE is much more practical. Remove support for QEMU_USER_BASE. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- NB that I've chosen not to update the xl.cfg man page at this time; it needs a lot of other updates as well, which would be easier to do all at once at the end. CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 16 tools/libxl/libxl_internal.h | 1 - 2 files changed, 4 insertions(+), 13 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index bbcbc94b6c..6024d4b7b8 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -138,13 +138,6 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, return 0; } -user = GCSPRINTF("%s%d", LIBXL_QEMU_USER_BASE, guest_domid); -ret = userlookup_helper_getpwnam(gc, user, _pwbuf, 0); -if (ret < 0) -return ret; -if (ret > 0) -goto end_search; - ret = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE, _pwbuf, _base); if (ret < 0) @@ -174,15 +167,14 @@ static int libxl__domain_get_device_model_uid(libxl__gc *gc, if (ret < 0) return ret; if (ret > 0) { -LOGD(WARN, guest_domid, "Could not find user %s%d, falling back to %s", - LIBXL_QEMU_USER_BASE, guest_domid, LIBXL_QEMU_USER_SHARED); +LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s", + LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED); goto end_search; } LOGD(ERROR, guest_domid, - "Could not find user %s%d or %s or range base pseudo-user %s, cannot restrict", - LIBXL_QEMU_USER_BASE, guest_domid, LIBXL_QEMU_USER_SHARED, - LIBXL_QEMU_USER_RANGE_BASE); + "Could not find user %s or range base pseudo-user %s, cannot restrict", + LIBXL_QEMU_USER_SHARED, LIBXL_QEMU_USER_RANGE_BASE); return ERROR_INVAL; end_search: diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index c4a43bd0b7..b147f3803c 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -4387,7 +4387,6 @@ _hidden int libxl__read_sysfs_file_contents(libxl__gc *gc, int *datalen_r); #define LIBXL_QEMU_USER_PREFIX "xen-qemuuser" -#define LIBXL_QEMU_USER_BASE LIBXL_QEMU_USER_PREFIX"-domid" #define LIBXL_QEMU_USER_SHARED LIBXL_QEMU_USER_PREFIX"-shared" #define LIBXL_QEMU_USER_RANGE_BASE LIBXL_QEMU_USER_PREFIX"-range-base" -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 07/11] libxl: Make killing of device model asynchronous
Or at least, give it an asynchronous interface so that we can make it actually asynchronous in subsequent patches. Create state structures and callback function signatures. Add the state structure to libxl__destroy_domid_state. Break libxl__destroy_domid down into two functions. No functional change intended. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v2: - Note that libxl__devicemodel_destroy_cb may be called reentrantly NB that I retain the comment before libxl__destroy_device_model, in spite of the fact that it looks "pointless", to separate it logically from the previous prototype. CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_dm.c | 11 +++--- tools/libxl/libxl_domain.c | 40 tools/libxl/libxl_internal.h | 20 -- 3 files changed, 58 insertions(+), 13 deletions(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 450433452d..ca59df33fe 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2696,19 +2696,24 @@ out: return rc; } -int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid) +void libxl__destroy_device_model(libxl__egc *egc, + libxl__destroy_devicemodel_state *ddms) { +STATE_AO_GC(ddms->ao); int rc; +int domid = ddms->domid; char *path = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid, ""); + if (!xs_rm(CTX->xsh, XBT_NULL, path)) LOGD(ERROR, domid, "xs_rm failed for %s", path); + /* We should try to destroy the device model anyway. */ rc = kill_device_model(gc, GCSPRINTF("/local/domain/%d/image/device-model-pid", domid)); - + libxl__qmp_cleanup(gc, domid); -return rc; +ddms->callback(egc, ddms, rc); } /* Return 0 if no dm needed, 1 if needed and <0 if error. */ diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c index d46b97dedf..0ce1ba1327 100644 --- a/tools/libxl/libxl_domain.c +++ b/tools/libxl/libxl_domain.c @@ -1008,6 +1008,10 @@ static void destroy_finish_check(libxl__egc *egc, } /* Callbacks for libxl__destroy_domid */ +static void dm_destroy_cb(libxl__egc *egc, + libxl__destroy_devicemodel_state *ddms, + int rc); + static void devices_destroy_cb(libxl__egc *egc, libxl__devices_remove_state *drs, int rc); @@ -1066,16 +1070,18 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis) if (rc < 0) { LOGEVD(ERROR, rc, domid, "xc_domain_pause failed"); } + if (dm_present) { -if (libxl__destroy_device_model(gc, domid) < 0) -LOGD(ERROR, domid, "libxl__destroy_device_model failed"); +dis->ddms.ao = ao; +dis->ddms.domid = domid; +dis->ddms.callback = dm_destroy_cb; + +libxl__destroy_device_model(egc, >ddms); +return; +} else { +dm_destroy_cb(egc, >ddms, 0); +return; } -dis->drs.ao = ao; -dis->drs.domid = domid; -dis->drs.callback = devices_destroy_cb; -dis->drs.force = 1; -libxl__devices_destroy(egc, >drs); -return; out: assert(rc); @@ -1083,6 +1089,24 @@ out: return; } +static void dm_destroy_cb(libxl__egc *egc, + libxl__destroy_devicemodel_state *ddms, + int rc) +{ +libxl__destroy_domid_state *dis = CONTAINER_OF(ddms, *dis, ddms); +STATE_AO_GC(dis->ao); +uint32_t domid = dis->domid; + +if (rc < 0) +LOGD(ERROR, domid, "libxl__destroy_device_model failed"); + +dis->drs.ao = ao; +dis->drs.domid = domid; +dis->drs.callback = devices_destroy_cb; +dis->drs.force = 1; +libxl__devices_destroy(egc, >drs); +} + static void devices_destroy_cb(libxl__egc *egc, libxl__devices_remove_state *drs, int rc) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index b147f3803c..f9e0bf6578 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1705,8 +1705,6 @@ _hidden int libxl__wait_for_device_model_deprecated(libxl__gc *gc, void *userdata), void *check_callback_userdata); -_hidden int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid); - _hidden const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *g_cfg); _hidden char *libxl__abs_path(libxl__gc *gc, const char *s, const char *path); @@ -3672,6 +3670,7 @@ extern const struct libxl_device_type *device_type_tbl[]; typedef struct libxl__domain_destroy_state libxl__domain_destroy_state; typedef struct libxl__destroy_domid_state libxl__destroy_domid_state; +typedef struct libxl__destroy_devicemodel_state libxl__destroy_devicemodel_state; typedef struct libxl__devices_remove_state
[Xen-devel] [PATCH v3 04/11] dm_depriv: Describe expected usage of device_model_user parameter
A number of subsequent patches rely on as-yet undefined behavior for what the `device_model_user` parameter does. Rather than implement it incorrectly (or randomly), or remove the feature, describe an expected usage for the feature. Further patches will make decisions based on this expected usage. Signed-off-by: George Dunlap Acked-by: Ian Jackson --- v3: - Fix a minor typo v2: - Remove stale comment about device_model_user not being ready RFC: As we'll see in a later patch, this implementation is still incomplete: we need a `reaper` uid from which to kill uids. CC: Ian Jackson CC: Wei Liu CC: Anthony Perard --- docs/features/qemu-deprivilege.pandoc | 17 + tools/libxl/libxl_types.idl | 1 - 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/docs/features/qemu-deprivilege.pandoc b/docs/features/qemu-deprivilege.pandoc index f941525189..ce21a60ef7 100644 --- a/docs/features/qemu-deprivilege.pandoc +++ b/docs/features/qemu-deprivilege.pandoc @@ -66,6 +66,23 @@ this, create a user named `xen-qemuuser-shared`; for example: adduser --no-create-home --system xen-qemuuser-shared +A final way to set up a separate process for qemus is to allocate one +UID per VM, and set the UID in the domain config file with the +`device_model_user` argument. For example, suppose you have a VM +named `c6-01`. You might do the following: + +adduser --system --no-create-home --group xen-qemuuser-c6-01 + +And then in your config file, the following line: + +device_model_user="xen-qemuuser-c6-01" + +NOTE: It is important when using `device_model_user` that EACH VM HAVE +A SEPARATE UID, and that none of these UIDs map to root. xl will +throw an error a uid maps to zero, but not if multiple VMs have the +same uid. Multiple VMs with the same device model uid will cause +problems. + ## Domain config changes The core domain config change is to add the following line to the diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 51cf06a3a2..141c46e42a 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -495,7 +495,6 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("device_model", string), ("device_model_ssidref", uint32), ("device_model_ssid_label", string), -# device_model_user is not ready for use yet ("device_model_user", string), # extra parameters pass directly to qemu, NULL terminated -- 2.19.2 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v2 8/8] xen: Switch parameter in get_page_from_gfn to use typesafe gfn
On 21/12/2018 16:21, Julien Grall wrote: >>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h >>> index 3304921991..1efbc071c5 100644 >>> --- a/xen/include/asm-x86/p2m.h >>> +++ b/xen/include/asm-x86/p2m.h >>> @@ -491,18 +491,21 @@ struct page_info *p2m_get_page_from_gfn(struct >>> p2m_domain *p2m, gfn_t gfn, >>> p2m_query_t q); >>> static inline struct page_info *get_page_from_gfn( >>> - struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q) >>> + struct domain *d, gfn_t gfn, p2m_type_t *t, p2m_query_t q) >>> { >>> struct page_info *page; >>> + mfn_t mfn; >>> if ( paging_mode_translate(d) ) >>> - return p2m_get_page_from_gfn(p2m_get_hostp2m(d), _gfn(gfn), >>> t, NULL, q); >>> + return p2m_get_page_from_gfn(p2m_get_hostp2m(d), gfn, t, >>> NULL, q); >>> /* Non-translated guests see 1-1 RAM / MMIO mappings >>> everywhere */ >>> if ( t ) >>> *t = likely(d != dom_io) ? p2m_ram_rw : p2m_mmio_direct; >>> - page = mfn_to_page(_mfn(gfn)); >>> - return mfn_valid(_mfn(gfn)) && get_page(page, d) ? page : NULL; >>> + >>> + mfn = _mfn(gfn_x(gfn)); >>> + page = mfn_to_page(mfn); >>> + return mfn_valid(mfn) && get_page(page, d) ? page : NULL; >> >> This unfortunately propagates some bad behaviour, because it is not safe >> to use mfn_to_page(mfn); before mfn_valid(mfn) succeeds. (In practice >> it works because mfn_to_page() is just pointer arithmetic.) >> >> Pleas can you express this as: >> >> return (mfn_valid(mfn) && >> (page = mfn_to_page(mfn), get_page(page, d))) ? page : NULL; >> >> which at least gets the order of operations in the correct order from >> C's point of view. >> >> Alternatively, and perhaps easier to follow: >> >> if ( !mfn_valid(mfn) ) >> return NULL; >> >> page = mfn_to_page(mfn); >> >> return get_page(page, d) ? page : NULL; > > I am happy to fix that. However, shouldn't this be handled in a > separate patch? After all, the code is not worst than it currently is. I don't think its worthy of a separate patch. You're touching the code anyway, so might as well do it all in one go. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v3 8/8] xen: Switch parameter in get_page_from_gfn to use typesafe gfn
On 21/12/2018 16:26, Julien Grall wrote: No functional change intended. Only reasonable clean-ups are done in this patch. The rest will use _gfn for the time being. Signed-off-by: Julien Grall Acked-by: Jan Beulich Please ignore this patch. I need to rebase it on Andrew's patch. Cheers, --- Changes in v3: - Add Jan's acked-by Changes in v2: - Remove >> PAGE_SHIFT in svm code - Fix typo in the e-mail address - Small NITs --- xen/arch/arm/guestcopy.c | 2 +- xen/arch/arm/mm.c| 2 +- xen/arch/x86/cpu/vpmu.c | 2 +- xen/arch/x86/domain.c| 12 ++-- xen/arch/x86/domctl.c| 6 +++--- xen/arch/x86/hvm/dm.c| 2 +- xen/arch/x86/hvm/domain.c| 2 +- xen/arch/x86/hvm/hvm.c | 9 + xen/arch/x86/hvm/svm/svm.c | 8 xen/arch/x86/hvm/viridian/time.c | 8 xen/arch/x86/hvm/viridian/viridian.c | 16 xen/arch/x86/hvm/vmx/vmx.c | 4 ++-- xen/arch/x86/hvm/vmx/vvmx.c | 12 ++-- xen/arch/x86/mm.c| 24 ++-- xen/arch/x86/mm/p2m.c| 2 +- xen/arch/x86/mm/shadow/hvm.c | 6 +++--- xen/arch/x86/physdev.c | 3 ++- xen/arch/x86/pv/descriptor-tables.c | 4 ++-- xen/arch/x86/pv/emul-priv-op.c | 6 +++--- xen/arch/x86/pv/mm.c | 2 +- xen/arch/x86/traps.c | 11 ++- xen/common/domain.c | 2 +- xen/common/event_fifo.c | 12 ++-- xen/common/memory.c | 4 ++-- xen/common/tmem_xen.c| 2 +- xen/include/asm-arm/p2m.h| 6 +++--- xen/include/asm-x86/p2m.h| 11 +++ 27 files changed, 95 insertions(+), 85 deletions(-) diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c index 7a0f3e9d5f..55892062bb 100644 --- a/xen/arch/arm/guestcopy.c +++ b/xen/arch/arm/guestcopy.c @@ -37,7 +37,7 @@ static struct page_info *translate_get_page(copy_info_t info, uint64_t addr, return get_page_from_gva(info.gva.v, addr, write ? GV2M_WRITE : GV2M_READ); -page = get_page_from_gfn(info.gpa.d, paddr_to_pfn(addr), , P2M_ALLOC); +page = get_page_from_gfn(info.gpa.d, gaddr_to_gfn(addr), , P2M_ALLOC); if ( !page ) return NULL; diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 01ae20..340a1d1548 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1252,7 +1252,7 @@ int xenmem_add_to_physmap_one( /* Take reference to the foreign domain page. * Reference will be released in XENMEM_remove_from_physmap */ -page = get_page_from_gfn(od, idx, , P2M_ALLOC); +page = get_page_from_gfn(od, _gfn(idx), , P2M_ALLOC); if ( !page ) { put_pg_owner(od); diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c index 8a4f753eae..4d8f153031 100644 --- a/xen/arch/x86/cpu/vpmu.c +++ b/xen/arch/x86/cpu/vpmu.c @@ -607,7 +607,7 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params) struct vcpu *v; struct vpmu_struct *vpmu; struct page_info *page; -uint64_t gfn = params->val; +gfn_t gfn = _gfn(params->val); if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) ) return -EINVAL; diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 32dc4253ff..b462a8513b 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -827,7 +827,7 @@ int arch_set_info_guest( unsigned long flags; bool compat; #ifdef CONFIG_PV -unsigned long cr3_gfn; +gfn_t cr3_gfn; struct page_info *cr3_page; unsigned long cr4; int rc = 0; @@ -1091,9 +1091,9 @@ int arch_set_info_guest( set_bit(_VPF_in_reset, >pause_flags); if ( !compat ) -cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]); +cr3_gfn = _gfn(xen_cr3_to_pfn(c.nat->ctrlreg[3])); else -cr3_gfn = compat_cr3_to_pfn(c.cmp->ctrlreg[3]); +cr3_gfn = _gfn(compat_cr3_to_pfn(c.cmp->ctrlreg[3])); cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); if ( !cr3_page ) @@ -1122,7 +1122,7 @@ int arch_set_info_guest( case 0: if ( !compat && !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) -fill_ro_mpt(_mfn(cr3_gfn)); +fill_ro_mpt(_mfn(gfn_x(cr3_gfn))); break; default: if ( cr3_page == current->arch.old_guest_table ) @@ -1137,7 +1137,7 @@ int arch_set_info_guest( v->arch.guest_table = pagetable_from_page(cr3_page); if ( c.nat->ctrlreg[1] ) { -cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]); +cr3_gfn =
Re: [Xen-devel] [PATCH for-4.12 v3 7/8] xenalyze: Build for Both ARM and x86 Platforms
Hi, On 21/12/2018 16:26, Julien Grall wrote: From: Benjamin Sanda Modified to provide building of the xenalyze binary for both ARM and x86 platforms. The xenalyze binary is now built as part of the BIN list for both platforms. Signed-off-by: Benjamin Sanda Signed-off-by: Julien Grall Acked-by: Wei Liu Acked-by: Stefano Stabellini I made a typo in the address e-mail. Stefano, could you fix it on commit? Cheers, --- Changes in v3: - Add Stefano's acked-by Changes in v2: - Add Wei's acked-by --- tools/xentrace/Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/xentrace/Makefile b/tools/xentrace/Makefile index 0bad942bdf..9fb7fc96e7 100644 --- a/tools/xentrace/Makefile +++ b/tools/xentrace/Makefile @@ -9,8 +9,7 @@ LDLIBS += $(LDLIBS_libxenevtchn) LDLIBS += $(LDLIBS_libxenctrl) LDLIBS += $(ARGP_LDFLAGS) -BIN-$(CONFIG_X86) = xenalyze -BIN = $(BIN-y) +BIN = xenalyze SBIN = xentrace xentrace_setsize LIBBIN = xenctx SCRIPTS = xentrace_format -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 5/8] xen/arm: Allow a privileged domain to map foreign page from DOMID_XEN
For auto-translated domain, the only way to map a page to itself is the using the foreign map API. The current code does not allow mapping page from special page (such as DOMID_XEN). As xentrace buffers are shared using DOMID_XEN, it is not possible to use tracing for Arm. This could be solved by using the helper get_pg_owner(). This helper will be able to get a reference on DOMID_XEN and therefore allow mapping for privileged domain. This patch replace the call to rcu_lock_domain_by_any_id() with get_pg_owner(). For consistency, all the call to rcu_unlock_domain are replaced by put_pg_owner(). Signed-off-by: Julien grall Reviewed-by: Andrii Anisov Reviewed-by: Stefano Stabellini --- Changes in v3: - Add Stefano's reviewed-by - Fix typoes Changes in v2: - Add Andrii's reviewed-by --- xen/arch/arm/mm.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 3bf11eec4f..01ae20 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1233,20 +1233,20 @@ int xenmem_add_to_physmap_one( struct domain *od; p2m_type_t p2mt; -od = rcu_lock_domain_by_any_id(extra.foreign_domid); +od = get_pg_owner(extra.foreign_domid); if ( od == NULL ) return -ESRCH; if ( od == d ) { -rcu_unlock_domain(od); +put_pg_owner(od); return -EINVAL; } rc = xsm_map_gmfn_foreign(XSM_TARGET, d, od); if ( rc ) { -rcu_unlock_domain(od); +put_pg_owner(od); return rc; } @@ -1255,7 +1255,7 @@ int xenmem_add_to_physmap_one( page = get_page_from_gfn(od, idx, , P2M_ALLOC); if ( !page ) { -rcu_unlock_domain(od); +put_pg_owner(od); return -EINVAL; } @@ -1264,13 +1264,13 @@ int xenmem_add_to_physmap_one( else { put_page(page); -rcu_unlock_domain(od); +put_pg_owner(od); return -EINVAL; } mfn = page_to_mfn(page); -rcu_unlock_domain(od); +put_pg_owner(od); break; } case XENMAPSPACE_dev_mmio: -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 6/8] xen/arm: Initialize trace buffer
From: Benjamin Sanda Now that we allow a privileged domain to map tracing buffer, initialize them so a user can effectively trace Xen. Signed-off-by: Benjamin Sanda [julien: rework commit message] Signed-off-by: Julien Grall Reviewed-by: Andrii Anisov Acked-by: Stefano Stabellini --- Changes in v3: - Add Stefano's acked-by Changes in v2: - Add Andrii's reviewed-by --- xen/arch/arm/setup.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c index fb923cdf67..444857a967 100644 --- a/xen/arch/arm/setup.c +++ b/xen/arch/arm/setup.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -899,6 +900,8 @@ void __init start_xen(unsigned long boot_phys_offset, heap_init_late(); +init_trace_bufs(); + init_constructors(); console_endboot(); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 3/8] xen/arm: Add support for read-only foreign mappings
Currently, foreign mappings can only be read-write. A follow-up patch will extend foreign mapping for Xen backend memory (via XEN_DOMID), some of that memory should only be read accessible for the mapping domain. Introduce a new p2m_type to cater read-only foreign mappings. For now, the decision between the two foreign mapping type is based on the type of the guest page mapped. Signed-off-by: Julien Grall --- Cc: Andrii Anisov Changes in v3: - Remove Andrii's reviewed-by - Move out the XEN_DOMID code in a separate patch - Make the new addition future-proof Changes in v2: - Add Andrii's reviewed-by --- xen/arch/arm/mm.c | 5 +++-- xen/arch/arm/p2m.c| 1 + xen/include/asm-arm/p2m.h | 9 +++-- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 7193d83b44..3bf11eec4f 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1259,7 +1259,9 @@ int xenmem_add_to_physmap_one( return -EINVAL; } -if ( !p2m_is_ram(p2mt) ) +if ( p2m_is_ram(p2mt) ) +t = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : p2m_map_foreign_ro; +else { put_page(page); rcu_unlock_domain(od); @@ -1267,7 +1269,6 @@ int xenmem_add_to_physmap_one( } mfn = page_to_mfn(page); -t = p2m_map_foreign_rw; rcu_unlock_domain(od); break; diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 89279fb590..1e7c91e39a 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -478,6 +478,7 @@ static void p2m_set_permission(lpae_t *e, p2m_type_t t, p2m_access_t a) break; case p2m_iommu_map_ro: +case p2m_map_foreign_ro: case p2m_grant_map_ro: case p2m_invalid: e->p2m.xn = 1; diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index a1aef7b793..a03a033a05 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -116,6 +116,7 @@ typedef enum { p2m_mmio_direct_nc, /* Read/write mapping of genuine MMIO area non-cacheable */ p2m_mmio_direct_c, /* Read/write mapping of genuine MMIO area cacheable */ p2m_map_foreign_rw, /* Read/write RAM pages from foreign domain */ +p2m_map_foreign_ro, /* Read-only RAM pages from foreign domain */ p2m_grant_map_rw, /* Read/write grant mapping */ p2m_grant_map_ro, /* Read-only grant mapping */ /* The types below are only used to decide the page attribute in the P2M */ @@ -135,12 +136,16 @@ typedef enum { #define P2M_GRANT_TYPES (p2m_to_mask(p2m_grant_map_rw) | \ p2m_to_mask(p2m_grant_map_ro)) +/* Foreign mappings types */ +#define P2M_FOREIGN_TYPES (p2m_to_mask(p2m_map_foreign_rw) | \ + p2m_to_mask(p2m_map_foreign_ro)) + /* Useful predicates */ #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES) -#define p2m_is_foreign(_t) (p2m_to_mask(_t) & p2m_to_mask(p2m_map_foreign_rw)) +#define p2m_is_foreign(_t) (p2m_to_mask(_t) & P2M_FOREIGN_TYPES) #define p2m_is_any_ram(_t) (p2m_to_mask(_t) & \ (P2M_RAM_TYPES | P2M_GRANT_TYPES | \ - p2m_to_mask(p2m_map_foreign_rw))) + P2M_FOREIGN_TYPES)) /* All common type definitions should live ahead of this inclusion. */ #ifdef _XEN_P2M_COMMON_H -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 8/8] xen: Switch parameter in get_page_from_gfn to use typesafe gfn
No functional change intended. Only reasonable clean-ups are done in this patch. The rest will use _gfn for the time being. Signed-off-by: Julien Grall Acked-by: Jan Beulich --- Changes in v3: - Add Jan's acked-by Changes in v2: - Remove >> PAGE_SHIFT in svm code - Fix typo in the e-mail address - Small NITs --- xen/arch/arm/guestcopy.c | 2 +- xen/arch/arm/mm.c| 2 +- xen/arch/x86/cpu/vpmu.c | 2 +- xen/arch/x86/domain.c| 12 ++-- xen/arch/x86/domctl.c| 6 +++--- xen/arch/x86/hvm/dm.c| 2 +- xen/arch/x86/hvm/domain.c| 2 +- xen/arch/x86/hvm/hvm.c | 9 + xen/arch/x86/hvm/svm/svm.c | 8 xen/arch/x86/hvm/viridian/time.c | 8 xen/arch/x86/hvm/viridian/viridian.c | 16 xen/arch/x86/hvm/vmx/vmx.c | 4 ++-- xen/arch/x86/hvm/vmx/vvmx.c | 12 ++-- xen/arch/x86/mm.c| 24 ++-- xen/arch/x86/mm/p2m.c| 2 +- xen/arch/x86/mm/shadow/hvm.c | 6 +++--- xen/arch/x86/physdev.c | 3 ++- xen/arch/x86/pv/descriptor-tables.c | 4 ++-- xen/arch/x86/pv/emul-priv-op.c | 6 +++--- xen/arch/x86/pv/mm.c | 2 +- xen/arch/x86/traps.c | 11 ++- xen/common/domain.c | 2 +- xen/common/event_fifo.c | 12 ++-- xen/common/memory.c | 4 ++-- xen/common/tmem_xen.c| 2 +- xen/include/asm-arm/p2m.h| 6 +++--- xen/include/asm-x86/p2m.h| 11 +++ 27 files changed, 95 insertions(+), 85 deletions(-) diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c index 7a0f3e9d5f..55892062bb 100644 --- a/xen/arch/arm/guestcopy.c +++ b/xen/arch/arm/guestcopy.c @@ -37,7 +37,7 @@ static struct page_info *translate_get_page(copy_info_t info, uint64_t addr, return get_page_from_gva(info.gva.v, addr, write ? GV2M_WRITE : GV2M_READ); -page = get_page_from_gfn(info.gpa.d, paddr_to_pfn(addr), , P2M_ALLOC); +page = get_page_from_gfn(info.gpa.d, gaddr_to_gfn(addr), , P2M_ALLOC); if ( !page ) return NULL; diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 01ae20..340a1d1548 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1252,7 +1252,7 @@ int xenmem_add_to_physmap_one( /* Take reference to the foreign domain page. * Reference will be released in XENMEM_remove_from_physmap */ -page = get_page_from_gfn(od, idx, , P2M_ALLOC); +page = get_page_from_gfn(od, _gfn(idx), , P2M_ALLOC); if ( !page ) { put_pg_owner(od); diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c index 8a4f753eae..4d8f153031 100644 --- a/xen/arch/x86/cpu/vpmu.c +++ b/xen/arch/x86/cpu/vpmu.c @@ -607,7 +607,7 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params) struct vcpu *v; struct vpmu_struct *vpmu; struct page_info *page; -uint64_t gfn = params->val; +gfn_t gfn = _gfn(params->val); if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) ) return -EINVAL; diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 32dc4253ff..b462a8513b 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -827,7 +827,7 @@ int arch_set_info_guest( unsigned long flags; bool compat; #ifdef CONFIG_PV -unsigned long cr3_gfn; +gfn_t cr3_gfn; struct page_info *cr3_page; unsigned long cr4; int rc = 0; @@ -1091,9 +1091,9 @@ int arch_set_info_guest( set_bit(_VPF_in_reset, >pause_flags); if ( !compat ) -cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]); +cr3_gfn = _gfn(xen_cr3_to_pfn(c.nat->ctrlreg[3])); else -cr3_gfn = compat_cr3_to_pfn(c.cmp->ctrlreg[3]); +cr3_gfn = _gfn(compat_cr3_to_pfn(c.cmp->ctrlreg[3])); cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); if ( !cr3_page ) @@ -1122,7 +1122,7 @@ int arch_set_info_guest( case 0: if ( !compat && !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) -fill_ro_mpt(_mfn(cr3_gfn)); +fill_ro_mpt(_mfn(gfn_x(cr3_gfn))); break; default: if ( cr3_page == current->arch.old_guest_table ) @@ -1137,7 +1137,7 @@ int arch_set_info_guest( v->arch.guest_table = pagetable_from_page(cr3_page); if ( c.nat->ctrlreg[1] ) { -cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]); +cr3_gfn = _gfn(xen_cr3_to_pfn(c.nat->ctrlreg[1])); cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); if ( !cr3_page ) @@ -1162,7 +1162,7 @@ int arch_set_info_guest(
[Xen-devel] [PATCH for-4.12 v3 2/8] xen/arm: Rename p2m_map_foreign to p2m_map_foreign_rw
A follow-up patch will introduce another type of foreign mapping. Rename the type to make clear it is only used for read-write mapping. No functional changes intended. Signed-off-by: Julien Grall Reviewed-by: Andrii Anisov Acked-by: Stefano Stabellini --- Changes in v3 - Add Stefano's acked-by Changes in v2: - Add Andrii's reviewed-by --- xen/arch/arm/mm.c | 2 +- xen/arch/arm/p2m.c| 2 +- xen/include/asm-arm/p2m.h | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index d96a6655ee..7193d83b44 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1267,7 +1267,7 @@ int xenmem_add_to_physmap_one( } mfn = page_to_mfn(page); -t = p2m_map_foreign; +t = p2m_map_foreign_rw; rcu_unlock_domain(od); break; diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 7ae5b29699..89279fb590 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -468,7 +468,7 @@ static void p2m_set_permission(lpae_t *e, p2m_type_t t, p2m_access_t a) break; case p2m_iommu_map_rw: -case p2m_map_foreign: +case p2m_map_foreign_rw: case p2m_grant_map_rw: case p2m_mmio_direct_dev: case p2m_mmio_direct_nc: diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index 4db8e8709d..a1aef7b793 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -115,7 +115,7 @@ typedef enum { p2m_mmio_direct_dev,/* Read/write mapping of genuine Device MMIO area */ p2m_mmio_direct_nc, /* Read/write mapping of genuine MMIO area non-cacheable */ p2m_mmio_direct_c, /* Read/write mapping of genuine MMIO area cacheable */ -p2m_map_foreign,/* Ram pages from foreign domain */ +p2m_map_foreign_rw, /* Read/write RAM pages from foreign domain */ p2m_grant_map_rw, /* Read/write grant mapping */ p2m_grant_map_ro, /* Read-only grant mapping */ /* The types below are only used to decide the page attribute in the P2M */ @@ -137,10 +137,10 @@ typedef enum { /* Useful predicates */ #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES) -#define p2m_is_foreign(_t) (p2m_to_mask(_t) & p2m_to_mask(p2m_map_foreign)) +#define p2m_is_foreign(_t) (p2m_to_mask(_t) & p2m_to_mask(p2m_map_foreign_rw)) #define p2m_is_any_ram(_t) (p2m_to_mask(_t) & \ (P2M_RAM_TYPES | P2M_GRANT_TYPES | \ - p2m_to_mask(p2m_map_foreign))) + p2m_to_mask(p2m_map_foreign_rw))) /* All common type definitions should live ahead of this inclusion. */ #ifdef _XEN_P2M_COMMON_H -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 7/8] xenalyze: Build for Both ARM and x86 Platforms
From: Benjamin Sanda Modified to provide building of the xenalyze binary for both ARM and x86 platforms. The xenalyze binary is now built as part of the BIN list for both platforms. Signed-off-by: Benjamin Sanda Signed-off-by: Julien Grall Acked-by: Wei Liu Acked-by: Stefano Stabellini --- Changes in v3: - Add Stefano's acked-by Changes in v2: - Add Wei's acked-by --- tools/xentrace/Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/xentrace/Makefile b/tools/xentrace/Makefile index 0bad942bdf..9fb7fc96e7 100644 --- a/tools/xentrace/Makefile +++ b/tools/xentrace/Makefile @@ -9,8 +9,7 @@ LDLIBS += $(LDLIBS_libxenevtchn) LDLIBS += $(LDLIBS_libxenctrl) LDLIBS += $(ARGP_LDFLAGS) -BIN-$(CONFIG_X86) = xenalyze -BIN = $(BIN-y) +BIN = xenalyze SBIN = xentrace xentrace_setsize LIBBIN = xenctx SCRIPTS = xentrace_format -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 1/8] xen/arm: p2m: Introduce p2m_get_page_from_gfn
In a follow-up patch, we will need to handle get_page_from_gfn differently for DOMID_XEN. To keep the code simple move the current content in a new separate helper p2m_get_page_from_gfn. Note the new helper is not anymore a static inline function as the helper is quite complex. Finally, take the opportunity to use typesafe gfn as the change is minor. Signed-off-by: Julien Grall Reviewed-by: Andrii Anisov Reviewed-by: Stefano Stabellini --- Changes in v3: - Fix typeos - Add Stefano's reviewed-by - Fix coding style Changes in v2: - Add Andrii's reviewed-by --- xen/arch/arm/p2m.c| 33 + xen/include/asm-arm/p2m.h | 33 - 2 files changed, 37 insertions(+), 29 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 2b5e43f50a..7ae5b29699 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -406,6 +406,39 @@ mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t) return mfn; } +struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn, +p2m_type_t *t) +{ +struct page_info *page; +p2m_type_t p2mt; +mfn_t mfn = p2m_lookup(d, gfn, ); + +if ( t ) +*t = p2mt; + +if ( !p2m_is_any_ram(p2mt) ) +return NULL; + +if ( !mfn_valid(mfn) ) +return NULL; + +page = mfn_to_page(mfn); + +/* + * get_page won't work on foreign mapping because the page doesn't + * belong to the current domain. + */ +if ( p2m_is_foreign(p2mt) ) +{ +struct domain *fdom = page_get_owner_and_reference(page); +ASSERT(fdom != NULL); +ASSERT(fdom != d); +return page; +} + +return get_page(page, d) ? page : NULL; +} + int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn, unsigned int order) diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index 01cd3ee4b5..4db8e8709d 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -289,38 +289,13 @@ typedef unsigned int p2m_query_t; #define P2M_ALLOC(1u<<0) /* Populate PoD and paged-out entries */ #define P2M_UNSHARE (1u<<1) /* Break CoW sharing */ +struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn, +p2m_type_t *t); + static inline struct page_info *get_page_from_gfn( struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q) { -struct page_info *page; -p2m_type_t p2mt; -mfn_t mfn = p2m_lookup(d, _gfn(gfn), ); - -if (t) -*t = p2mt; - -if ( !p2m_is_any_ram(p2mt) ) -return NULL; - -if ( !mfn_valid(mfn) ) -return NULL; -page = mfn_to_page(mfn); - -/* - * get_page won't work on foreign mapping because the page doesn't - * belong to the current domain. - */ -if ( p2m_is_foreign(p2mt) ) -{ -struct domain *fdom = page_get_owner_and_reference(page); -ASSERT(fdom != NULL); -ASSERT(fdom != d); -return page; -} - -if ( !get_page(page, d) ) -return NULL; -return page; +return p2m_get_page_from_gfn(d, _gfn(gfn), t); } int get_page_type(struct page_info *page, unsigned long type); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 0/8] xen/arm: Add xentrace support
Hi all, This patch series is a rework of the series sent by Benjamin Sanda in April 2016 [1]. It finally adds support for xentrace/xenanalyze on Arm. Cheers, [1] https://lists.xenproject.org/archives/html/xen-devel/2016-04/msg00464.html *** BLURB HERE *** Benjamin Sanda (2): xen/arm: Initialize trace buffer xenalyze: Build for Both ARM and x86 Platforms Julien Grall (6): xen/arm: p2m: Introduce p2m_get_page_from_gfn xen/arm: Rename p2m_map_foreign to p2m_map_foreign_rw xen/arm: Add support for read-only foreign mappings xen/arm: Make get_page_from_gfn working with DOMID_XEN xen/arm: Allow a privileged domain to map foreign page from DOMID_XEN xen: Switch parameter in get_page_from_gfn to use typesafe gfn tools/xentrace/Makefile | 3 +- xen/arch/arm/guestcopy.c | 2 +- xen/arch/arm/mm.c| 19 ++-- xen/arch/arm/p2m.c | 36 ++- xen/arch/arm/setup.c | 3 ++ xen/arch/x86/cpu/vpmu.c | 2 +- xen/arch/x86/domain.c| 12 xen/arch/x86/domctl.c| 6 ++-- xen/arch/x86/hvm/dm.c| 2 +- xen/arch/x86/hvm/domain.c| 2 +- xen/arch/x86/hvm/hvm.c | 9 +++--- xen/arch/x86/hvm/svm/svm.c | 8 ++--- xen/arch/x86/hvm/viridian/time.c | 8 ++--- xen/arch/x86/hvm/viridian/viridian.c | 16 +- xen/arch/x86/hvm/vmx/vmx.c | 4 +-- xen/arch/x86/hvm/vmx/vvmx.c | 12 xen/arch/x86/mm.c| 24 --- xen/arch/x86/mm/p2m.c| 2 +- xen/arch/x86/mm/shadow/hvm.c | 6 ++-- xen/arch/x86/physdev.c | 3 +- xen/arch/x86/pv/descriptor-tables.c | 4 +-- xen/arch/x86/pv/emul-priv-op.c | 6 ++-- xen/arch/x86/pv/mm.c | 2 +- xen/arch/x86/traps.c | 11 +++ xen/common/domain.c | 2 +- xen/common/event_fifo.c | 12 xen/common/memory.c | 4 +-- xen/common/tmem_xen.c| 2 +- xen/include/asm-arm/p2m.h| 57 +--- xen/include/asm-x86/p2m.h| 11 --- 30 files changed, 174 insertions(+), 116 deletions(-) -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.12 v3 4/8] xen/arm: Make get_page_from_gfn working with DOMID_XEN
DOMID_XEN is used to share pages beloging to the hypervisor (e.g trace buffers). Unlike other domains, DOMID_XEN is a non-auto translated domain and therefore does not have a P2M. This patch adds a special case for DOMID_XEN in get_page_from_gfn. We may want to provide "non-auto translated helpers" in the future if we see more case. Signed-off-by: Julien Grall --- Changes in v3: - Split from "xen/arm: Add support for read-only foreign mappings" - Use likely rather than unlikely - Fix typoes --- xen/include/asm-arm/p2m.h | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index a03a033a05..041dea827c 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -300,7 +300,38 @@ struct page_info *p2m_get_page_from_gfn(struct domain *d, gfn_t gfn, static inline struct page_info *get_page_from_gfn( struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q) { -return p2m_get_page_from_gfn(d, _gfn(gfn), t); +mfn_t mfn; +p2m_type_t _t; +struct page_info *page; + +/* + * Special case for DOMID_XEN as it is the only domain so far that is + * not auto-translated. + */ +if ( likely(d != dom_xen) ) +return p2m_get_page_from_gfn(d, _gfn(gfn), t); + +if ( !t ) +t = &_t; + +*t = p2m_invalid; + +/* + * DOMID_XEN sees 1-1 RAM. The p2m_type is based on the type of the + * page. + */ +mfn = _mfn(gfn); +page = mfn_to_page(mfn); + +if ( !mfn_valid(mfn) || !get_page(page, d) ) +return NULL; + +if ( page->u.inuse.type_info & PGT_writable_page ) +*t = p2m_ram_rw; +else +*t = p2m_ram_ro; + +return page; } int get_page_type(struct page_info *page, unsigned long type); -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 03/15] x86/cpu/vpmu: Add Hygon Dhyana support for vPMU
On 2018/12/21 21:34, Boris Ostrovsky wrote: > On 12/21/18 5:02 AM, Pu Wen wrote: >> On 2018/12/20 22:25, Boris Ostrovsky wrote: >> ... diff --git a/xen/arch/x86/cpu/vpmu_amd.c b/xen/arch/x86/cpu/vpmu_amd.c index 5efc39b..e9f0a5c 100644 --- a/xen/arch/x86/cpu/vpmu_amd.c +++ b/xen/arch/x86/cpu/vpmu_amd.c @@ -554,6 +554,8 @@ int __init amd_vpmu_init(void) case 0x12: case 0x14: case 0x16: +case 0x17: +case 0x18: >>> >>> This also enables VPMU support for Zen which goes beyond what the >>> commit message claims to do. >> Sorry for the not clear commit message. Will add modification description >> in the commit message and make the changes complete. >> >> On the other hand, since current Xen vPMU still not support Zen. so in >> this patch we enable 0x17 support. If this modification is not preferred, >> will remove AMD Xen 0x17 support in next version. > > Enabling 0x17 should be fine, I just thought commit message should be > explicit about that. OK, will explicit describe the enabling of 0x17 in the commit message in next version patch set. Thanks for the suggestion. >>> Also, why are you choosing to use legacy MSRs (and you did the same in >>> Linux)? Doesn't Zen (which you are saying is similar to Hygon) support >>> c001_020X bank? >> In Linux, the Xen PMU driver use the default branch cases, which also use >> the legacy MSRs way. So we choose to follow legacy MSRs here in Dhyana >> cases. >> >> Since both of Zen and Dhyana support C001_020X MSRs. If use the C001_020X >> is preferred, we will try to modify the related codes and create a patch. > > > I don't have a Zen box available right now but from what I can see 0x17 > counters are compatible with 0x15 so I think switching to C001_020X > should work. And looks like you are using those in Linux (non-Xen part) too. >> Also the Linux Xen PMU driver may need to be updated to use these MSRs. > > Yes, although Linux part is used only by PV guests. Yes, I have tested the MSRs of the 0x15 ones by booting a Dom0 PV guest. It works even when the Linux Xen PMU driver use the legacy MSRs. I'll test the PV guest by using C001_020X in the Linux Xen part tomorrow. Thx. -- Regards, Pu Wen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 v2 8/8] xen: Switch parameter in get_page_from_gfn to use typesafe gfn
Hi Andrew, On 21/12/2018 14:14, Andrew Cooper wrote: On 20/12/2018 19:23, Julien Grall wrote: diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 32dc4253ff..b462a8513b 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -827,7 +827,7 @@ int arch_set_info_guest( unsigned long flags; bool compat; #ifdef CONFIG_PV -unsigned long cr3_gfn; +gfn_t cr3_gfn; I've sent an alternative patch which this patch should be rebased over, at which point all modifications to arch_set_info_guest() should hopefully disappear. The rest of the series should be merged by end of today (Code freeze for Xen Arm). So I will resend this patch separately after my holidays. diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c index 5d5a746a25..73d2da8441 100644 --- a/xen/arch/x86/hvm/domain.c +++ b/xen/arch/x86/hvm/domain.c @@ -297,7 +297,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx) { /* Shadow-mode CR3 change. Check PDBR and update refcounts. */ struct page_info *page = get_page_from_gfn(v->domain, - v->arch.hvm.guest_cr[3] >> PAGE_SHIFT, + gaddr_to_gfn(v->arch.hvm.guest_cr[3]), NULL, P2M_ALLOC); Can you re-indent while modifying this please? Sure. diff --git a/xen/arch/x86/hvm/viridian/time.c b/xen/arch/x86/hvm/viridian/time.c index 840a82b457..a718434456 100644 --- a/xen/arch/x86/hvm/viridian/time.c +++ b/xen/arch/x86/hvm/viridian/time.c @@ -38,16 +38,16 @@ static void dump_reference_tsc(const struct domain *d) static void update_reference_tsc(struct domain *d, bool initialize) { -unsigned long gmfn = d->arch.hvm.viridian.reference_tsc.fields.pfn; -struct page_info *page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); +gfn_t gfn = _gfn(d->arch.hvm.viridian.reference_tsc.fields.pfn); +struct page_info *page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC); HV_REFERENCE_TSC_PAGE *p; if ( !page || !get_page_type(page, PGT_writable_page) ) { if ( page ) put_page(page); -gdprintk(XENLOG_WARNING, "Bad GMFN %#"PRI_gfn" (MFN %#"PRI_mfn")\n", - gmfn, mfn_x(page ? page_to_mfn(page) : INVALID_MFN)); +gdprintk(XENLOG_WARNING, "Bad GFN %#"PRI_gfn" (MFN %#"PRI_mfn")\n", The canonical format for gfns and mfns are just %"PRI_*, without the # Do you mind fixing this seeing as you're changing the string anyway? Sure. diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 3304921991..1efbc071c5 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -491,18 +491,21 @@ struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn, p2m_query_t q); static inline struct page_info *get_page_from_gfn( -struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q) +struct domain *d, gfn_t gfn, p2m_type_t *t, p2m_query_t q) { struct page_info *page; +mfn_t mfn; if ( paging_mode_translate(d) ) -return p2m_get_page_from_gfn(p2m_get_hostp2m(d), _gfn(gfn), t, NULL, q); +return p2m_get_page_from_gfn(p2m_get_hostp2m(d), gfn, t, NULL, q); /* Non-translated guests see 1-1 RAM / MMIO mappings everywhere */ if ( t ) *t = likely(d != dom_io) ? p2m_ram_rw : p2m_mmio_direct; -page = mfn_to_page(_mfn(gfn)); -return mfn_valid(_mfn(gfn)) && get_page(page, d) ? page : NULL; + +mfn = _mfn(gfn_x(gfn)); +page = mfn_to_page(mfn); +return mfn_valid(mfn) && get_page(page, d) ? page : NULL; This unfortunately propagates some bad behaviour, because it is not safe to use mfn_to_page(mfn); before mfn_valid(mfn) succeeds. (In practice it works because mfn_to_page() is just pointer arithmetic.) Pleas can you express this as: return (mfn_valid(mfn) && (page = mfn_to_page(mfn), get_page(page, d))) ? page : NULL; which at least gets the order of operations in the correct order from C's point of view. Alternatively, and perhaps easier to follow: if ( !mfn_valid(mfn) ) return NULL; page = mfn_to_page(mfn); return get_page(page, d) ? page : NULL; I am happy to fix that. However, shouldn't this be handled in a separate patch? After all, the code is not worst than it currently is. Cheers, ~Andrew -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 14/14] libxl: Re-implement domain_suspend_device_model using libxl__ev_qmp
Anthony PERARD writes ("[PATCH v7 14/14] libxl: Re-implement domain_suspend_device_model using libxl__ev_qmp"): > The re-implementation is done because we want to be able to send the > file description that QEMU can use to save its state. When QEMU is > restricted, it would not be able to write to a path. > > This replace both libxl__qmp_stop() and libxl__qmp_save(). > > qmp_qemu_check_version() was only used by libxl__qmp_save(), so it is > replace by a version using libxl__ev_qmp instead. > > Coding style fixed in libxl__domain_suspend_device_model() for the > return value. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 00/14] libxl: Enable save/restore/migration of a restricted QEMU + libxl__ev_qmp_*
Anthony PERARD writes ("[PATCH v7 00/14] libxl: Enable save/restore/migration of a restricted QEMU + libxl__ev_qmp_*"): > Patch series available in this git branch: > https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git > br.libxl-ev-qmp-v Thanks. Sorry for being so slow to review this. It is very nearly ready and I hope it will make 4.12. Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 13/14] libxl: Change libxl__domain_suspend_device_model() to be async
Anthony PERARD writes ("[PATCH v7 13/14] libxl: Change libxl__domain_suspend_device_model() to be async"): > This create an extra step for the two call sites of the function. > > libxl__domain_suspend_device_model() in this patch gets an extra error > variable (there is ret and rc), but ret goes away in the next patch. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/2] xen/dom0: Improve documentation for dom0= and dom0-iommu=
On Fri, Dec 21, 2018 at 01:13:25PM +, Andrew Cooper wrote: > On 21/12/2018 12:08, Roger Pau Monné wrote: > > On Thu, Dec 20, 2018 at 11:40:51PM +, Andrew Cooper wrote: > >> Update to the latest metadata style, and expand each of the clauses with > >> more > >> information, including applicable CONFIG_* options. > >> > >> Drop the redundant comment beside parse_dom0_param(), to avoid it getting > >> out > >> of sync with the main documentation. > >> > >> Signed-off-by: Andrew Cooper > > Thanks! A couple of fixes below, because the original text is actually > > wrong... > > TBH, that is my default assumption every time I do work like this :) In this case it's my fault :(, because I changed the code and forgot about the docs. > > > >> --- > >> CC: Jan Beulich > >> CC: Wei Liu > >> CC: Roger Pau Monné > >> CC: Stefano Stabellini > >> CC: Julien Grall > >> > >> Please double check for correctness. The text matches my > >> understanding/reading of the code, but some of it is rather subtle going. > >> > >> It occurs to me that: > >> > >> * The choice of dom0 boot mode should in part be derived from the > >> available > >>CONFIG_* options, and ELF notes advertised in the dom0 kernel. > > This is indeed doable, but would require parsing the dom0 kernel > > before building the domain. > > I don't see anything wrong with parsing the ELF headers ahead of > building the domain. From the overall boot time, its just an > order-of-operations issue. Oh yes, I didn't mean my comment to sound like criticism. I agree there should be no issues in parsing the ELF earlier, or if there are issues they should be fixed. > > > >> * AMD probably needs to gain an `ivmd=` to mirror `rmrr=` on the Intel > >> side, > >>because we know there are other errors in the IVRS table. > > Yes, albeit using rmrr is quite cumbersome because it's mostly a > > trial-and-error process until there are no more iommu faults (unless > > you can get the correct rmrr command for your hardware somewhere). > > > >> * Neither of map-{inclusive,reserved} should be active by default, even on > >>Intel hardware, and we should (wherever possible) have quirks like we > >> have > >>for all other firmware screwups. Requiring the user to diagnose/work > >>around firmware problems like this is quite rude. > > That would indeed be nice, but I think there are too many vendor > > firmware versions to be able to correctly identify such quirks, the > > more that vendors don't even list missing RMRR as erratum. > > I don't agree. We already have quirks based on DMI (at the moment, > mainly for reboot overrides), and the vast majority of the offending > cases are the BMC shared mailbox, which will be in a fixed per-platform > location. IIRC I've only found a single box that worked without map-reserved, and that's my NUC which has firmware from Intel. And even in that case the USB ports weren't fully working. I guess such quirks could be applied based on the chipset version then if Xen realizes the firmware is either wrong or missing obvious RMRR regions? > I don't expect we'll ever find and fix all quirks, but where we do find > suitable ones, we should put them into the boot code. Sadly I agree. What I'm worried about is turning the default map-{inclusive/reserved} to off, that's likely to make dom0 unable to boot on a huge amount of hardware. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 12/14] libxl_qmp: Store advertised QEMU version in libxl__ev_qmp
Anthony PERARD writes ("[PATCH v7 12/14] libxl_qmp: Store advertised QEMU version in libxl__ev_qmp"): > This will be used in a later patch. Acked-by: Ian Jackson > +o = libxl__json_map_get("QMP", resp, JSON_MAP); > +o = libxl__json_map_get("version", o, JSON_MAP); > +o = libxl__json_map_get("qemu", o, JSON_MAP); > +#define GRAB_VERSION(level) do { \ > +ev->qemu_version.level = libxl__json_object_get_integer( \ > +libxl__json_map_get(#level, o, JSON_INTEGER)); \ > +} while (0) > +GRAB_VERSION(major); > +GRAB_VERSION(minor); > +GRAB_VERSION(micro); Earlier I wrote: I would prefer the indentation to be such that the statement inside the macro is indented like the ones outside. Ie like this: +#define GRAB_VERSION(level) do { \ +ev->qemu_version.level = libxl__json_object_get_integer( \ +libxl__json_map_get(#level, o, JSON_INTEGER)); \ +} while (0) +GRAB_VERSION(major); But up to you. My ack stands either way. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 11/14] libxl: QEMU startup sync based on QMP
Anthony PERARD writes ("[PATCH v7 11/14] libxl: QEMU startup sync based on QMP"): > This is only activated when dm_restrict=1, as explained in a previous > patch "libxl_dm: Pre-open QMP socket for QEMU" > > Signed-off-by: Anthony PERARD > Reviewed-by: Roger Pau Monné Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 09/14] libxl_dm: Pre-open QMP socket for QEMU
Anthony PERARD writes ("[PATCH v7 09/14] libxl_dm: Pre-open QMP socket for QEMU"): > This patch moves the creation of the QMP unix socket from QEMU to libxl. > But libxl doesn't rely on this yet. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 08/14] libxl: Add init/dispose of for libxl__domain_build_state
Anthony PERARD writes ("[PATCH v7 08/14] libxl: Add init/dispose of for libxl__domain_build_state"): > These two new functions libxl__domain_build_state_{init,dispose} should > be called every time a new libxl__domain_build_state comes to existance. > > There seems to be two of them, one with the domain creation machinery, > and one in the stub_dm_spawn. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 06/14] libxl_qmp: Implementation of libxl__ev_qmp_*
Ian Jackson writes ("Re: [PATCH v7 06/14] libxl_qmp: Implementation of libxl__ev_qmp_*"): > I have only fairly minor comments now. The biggest one remaining is > about the use of EGC_GC which I think probably wants to become > STATE_AO_GC throughout. See below... I realise that I should state explicitly that libxl__ev_qmp_callback must still get an egc. The egc should be passed through all the layers, but its gc should generally not be used. Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 07/14] libxl_exec: Add libxl__spawn_initiate_failure
Anthony PERARD writes ("[PATCH v7 07/14] libxl_exec: Add libxl__spawn_initiate_failure"): > This function can be used by user of libxl__spawn_* when they setup a > notification other than xenstore. The parent can already report success > via libxl__spawn_initiate_detach(), this new function can be used for > failure instead of waiting for the timeout. Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] xen/dom0: Add a dom0-iommu=none option
On Fri, Dec 21, 2018 at 01:01:22PM +, Andrew Cooper wrote: > On 21/12/2018 12:44, Roger Pau Monné wrote: > > On Thu, Dec 20, 2018 at 11:40:52PM +, Andrew Cooper wrote: > >> For development purposes, it is very convenient to boot Xen as a PVH guest, > >> with an XTF PV or PVH "dom0". The edit-compile-go cycle is a matter of > >> seconds, and you can resonably insert printk() debugging in places which > >> which > >> would be completely infeasible when booting fully-fledged guests. > >> > >> However, the PVH dom0 path insists on having a working IOMMU, which doesn't > >> exist when virtualised as a PVH guest, and isn't necessary for XTF anyway. > >> > >> Introduce a developer mode to skip the IOMMU requirement. > > This looks very similar to the current 'passthrough' option, maybe it > > would be enough to allow PVH dom0 to use the passthrough option > > provided a warning is added to > > arch_iommu_check_autotranslated_hwdom? > > I considered that, but "dom0-iommu=passthrough" isn't an accurate > description of what is going on. Frankly, its not correct for PV either. And what about turning dom0-iommu into a boolean option itself, so that you can do "dom0-iommu=false"? I think that's more similar to other Xen command line options that have a boolean value and/or a list of sub-options. > TBH, dom0-iommu=none is better for both. How about I introduce that as > the new option, and leave passthrough as a legacy alias? That sounds good, I would like to avoid (if possible) the proliferation of new iommu_hwdom_* variables, because we already have a bunch and the functionality introduced by the 'none' option looks very similar to what 'passthrough' aims to achieve. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12 6/8] xen/arm: Implement workaround for Cortex-A76 erratum 1165522
On 28.11.18 18:49, Julien Grall wrote: Early version of Cortex-A76 can end-up with corrupt TLBs if they speculate an AT instruction while the S1/S2 system registers are in an inconsistent state. This can happen during guest context switch and when invalidating the TLBs for other than the current VMID. The workaround implemented in Xen will: - Use an empty stage-2 with a reserved VMID while context switching between 2 guests - Use an empty stage-2 with the VMID where TLBs need to be flushed Signed-off-by: Julien Grall Reviewed-by: Andrii Anisov -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel