Re: [Xen-devel] [PATCH v5 6/6] Qemu-Xen-vTPM: Add a parameter indicating whether the command that was a selftest
On 04/12/2015 04:50 PM, Stefan Berger wrote: On 04/10/2015 02:59 AM, Quan Xu wrote: and whether it completed successfully. Move tpm_passthrough_is_selftest() into tpm_util.c and rename it to tpm_util_is_selftest(). Signed-off-by: Quan Xu quan...@intel.com --- hw/tpm/Makefile.objs | 2 +- hw/tpm/tpm_passthrough.c | 13 +-- hw/tpm/tpm_util.c| 50 hw/tpm/tpm_xenstubdoms.c | 36 +++-- include/sysemu/tpm_backend_int.h | 1 + 5 files changed, 82 insertions(+), 20 deletions(-) create mode 100644 hw/tpm/tpm_util.c +#include dirent.h + +#include qemu-common.h +#include qapi/error.h +#include qemu/sockets.h +#include sysemu/tpm_backend.h +#include tpm_int.h +#include hw/hw.h +#include hw/i386/pc.h +#include sysemu/tpm_backend_int.h +#include tpm_tis.h Can you reduce the includes to its minimum? + +bool tpm_util_is_selftest(const uint8_t *in, uint32_t in_len) +{ +struct tpm_req_hdr *hdr = (struct tpm_req_hdr *)in; + +if (in_len = sizeof(*hdr)) { +return (be32_to_cpu(hdr-ordinal) == TPM_ORD_ContinueSelfTest); +} + +return false; +} diff --git a/hw/tpm/tpm_xenstubdoms.c b/hw/tpm/tpm_xenstubdoms.c index 3d046fc..992f2ae 100644 --- a/hw/tpm/tpm_xenstubdoms.c +++ b/hw/tpm/tpm_xenstubdoms.c @@ -65,11 +65,17 @@ typedef struct TPMXenstubdomsState TPMXenstubdomsState; /* Functions */ static void tpm_xenstubdoms_cancel_cmd(TPMBackend *tb); -static int tpm_xenstubdoms_unix_transfer(const TPMLocality *locty_data) +static int tpm_xenstubdoms_unix_transfer(const TPMLocality *locty_data, + bool *selftest_done) { size_t rlen; struct XenDevice *xendev; int ret; +bool is_selftest; +const struct tpm_resp_hdr *hdr; + +is_selftest = tpm_util_is_selftest(locty_data-w_buffer.buffer, + locty_data-w_buffer.size); xendev = xen_find_xendev(vtpm, xen_domid, xenstore_dev); if (xendev == NULL) { @@ -81,12 +87,26 @@ static int tpm_xenstubdoms_unix_transfer(const TPMLocality *locty_data) locty_data-r_buffer.size, locty_data-w_offset); if (ret 0) { xen_be_printf(xendev, 0, Can not send vtpm command.\n); -return -1; +goto err_exit; } -vtpm_recv(xendev, locty_data-r_buffer.buffer, locty_data-r_buffer.size, - rlen); -return 0; +ret = vtpm_recv(xendev, locty_data-r_buffer.buffer, +locty_data-r_buffer.size, rlen); +if (ret 0) { +xen_be_printf(xendev, 0, vtpm reception command error.\n); +goto err_exit; +} + +if (is_selftest (ret = sizeof(struct tpm_resp_hdr))) { +hdr = (struct tpm_resp_hdr *)locty_data-r_buffer.buffer; +*selftest_done = (be32_to_cpu(hdr-errcode) == 0); +} + +err_exit: +if (ret 0) { +xen_be_printf(xendev, 0, vtpm command error.\n); +} +return ret; } static void tpm_xenstubdoms_worker_thread(gpointer data, @@ -94,13 +114,15 @@ static void tpm_xenstubdoms_worker_thread(gpointer data, { TPMXenstubdomsThreadParams *thr_parms = user_data; TPMBackendCmd cmd = (TPMBackendCmd)data; +bool selftest_done = false; switch (cmd) { case TPM_BACKEND_CMD_PROCESS_CMD: - tpm_xenstubdoms_unix_transfer(thr_parms-tpm_state-locty_data); + tpm_xenstubdoms_unix_transfer(thr_parms-tpm_state-locty_data, + selftest_done); thr_parms-recv_data_callback(thr_parms-tpm_state, thr_parms-tpm_state-locty_number, - false); + selftest_done); break; case TPM_BACKEND_CMD_INIT: case TPM_BACKEND_CMD_END: diff --git a/include/sysemu/tpm_backend_int.h b/include/sysemu/tpm_backend_int.h index 05d94d0..e18acab 100644 --- a/include/sysemu/tpm_backend_int.h +++ b/include/sysemu/tpm_backend_int.h @@ -34,6 +34,7 @@ void tpm_backend_thread_create(TPMBackendThread *tbt, void tpm_backend_thread_end(TPMBackendThread *tbt); void tpm_backend_thread_tpm_reset(TPMBackendThread *tbt, GFunc func, gpointer user_data); +bool tpm_util_is_selftest(const uint8_t *in, uint32_t in_len); Put this into tpm_util.h. typedef enum TPMBackendCmd { TPM_BACKEND_CMD_INIT = 1, This patch per se looks good, but some of your modifications will not compile until this patch is applied, due to the missing selftest_done parameter. So you should merge this patch into the 3rd patch in this series. Actually, please put the part that splits out the tpm_util_is_selftest function into tpm_util.c,.h and the changes to the Makefile into a patch before your current patch 3. Stefan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [rumpuserxen test] 50399: regressions - FAIL
flight 50399 rumpuserxen real [real] http://logs.test-lab.xenproject.org/osstest/logs/50399/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 33866 build-amd64-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a version targeted for testing: rumpuserxen 80294081448af74270ca2dd2ba758f0ed308f46c baseline version: rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad People who touched revisions under test: Antti Kantee po...@iki.fi Ian Jackson ian.jack...@eu.citrix.com Martin Lucina mar...@lucina.net Wei Liu l...@liuw.name jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-i386-rumpuserxen-i386 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/osstest/pub/logs images: /home/osstest/pub/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 527 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Monday, April 13, 2015 8:13 PM To: Wu, Feng Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: RE: [RFC v1 00/15] Add VT-d Posted-Interrupts support On 01.04.15 at 15:21, feng...@intel.com wrote: Hi Jan, Any more comments about this series? Thanks a lot! I think it makes little sense to wait for my review before posting v2 with issues addressed which others have pointed out. I just returned from vacation, and would have time to look at v1 next week the earliest (with the week after being occupied by travel). Thanks a lot, Jan! Any comments are appreciated! Thanks, Feng Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC][PATCH 10/13] tools: extend XENMEM_set_memory_map
On 2015/4/13 19:02, Wei Liu wrote: On Mon, Apr 13, 2015 at 10:09:51AM +0800, Chen, Tiejun wrote: [...] Hardcoded value? Yes. Actually, we intend to use this to present that lowmem entry, tools/firmware/hvmloader/e820.c: /* Low RAM goes here. Reserve space for special pages. */ ... e820[nr].addr = 0x10; I don't like the idea of having two hardcoded values in different Just one place since based on our logic, hvmloader doesn't have this setting anymore and actually it really grab that info from here. Please refer to patch #13. locations. Please put this value into a header file and reference it here and in hvmloader. Anyway, I'd like to define this here directly since no one consumes this again. diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 5134b33..af747e6 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -787,6 +787,7 @@ out: return rc; } +#define GUEST_LOW_MEM_START_DEFAULT 0x10 static int libxl__domain_construct_memmap(libxl__gc *gc, libxl_domain_config *d_config, uint32_t domid, @@ -812,8 +813,8 @@ static int libxl__domain_construct_memmap(libxl__gc *gc, e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries); /* Low memory */ -e820[nr].addr = 0x10; -e820[nr].size = args-lowmem_size - 0x10; +e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT; +e820[nr].size = args-lowmem_size - GUEST_LOW_MEM_START_DEFAULT; e820[nr].type = E820_RAM; nr++; Is this fine to you? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI
Hi Julien, From: Julien Grall julien.grall@gmail.com Sent: Monday, April 13, 2015 3:49 PM To: Jaggi, Manish; xen Devel; Stefano Stabellini; Julien Grall; Ian Campbell; Kumar, Vijaya; prasun.kap...@cavium.com Subject: Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI Hi Manish, On 13/04/15 08:37, Manish Jaggi wrote: Xen currently does not have PCI support for ARM builds. This patch set makes the code compilable for ARM PCI and adds places-holder code which would be replaced with PCI pass-through support patch series. May I ask why you did send directly all the code to support PCI on ARM? Without the rest it's hard to tell whether these patches make sense or not. [Manish] As I have mentioned these two patches are only for refactoring msi specific functions to x86 files and providing the constructs that make PCI ARM code compilable. The code for PCI ARM is very basic in the patch and PCI passthrough code patches will follow in next series. Please let me know which code does not makes sense and I will modify it. Re-factor MSI Handling - There is a some x86 specific code which is found in common code: xen/drivers/passthrough/pci.c which needs to be re factored. MSI/X are configured and handled by dom0 or domU code on ARM64 and is not required to be part of common code. However there are functions which are used as part of common code and calls to these functions cannot be easily re factored like pci_cleanup_msi. On x86, the hypervisor is taking care of MSI (enabling/disabling in the config space) as long as doing sanity check on MSI used by a given domain. How do you plan to handle it on ARM? IHMO, the MSI code is very useful and would also help GIC MSI implement (ITS + V2M). [manish] As we had discussed in Linaro Connect and over the mailing list MSIs (LPIs in ARMv8) are handled directly on ARM by the guest and hypervisor is only doing injection of the LPIs using GIVv3 emulation. FWIW, I've pointed out the same issue on the ITS series a couple of weeks ago. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code
Hi Julien, From: xen-devel-boun...@lists.xen.org xen-devel-boun...@lists.xen.org on behalf of Julien Grall julien.gr...@citrix.com Sent: Monday, April 13, 2015 4:12 PM To: Jaggi, Manish; Xen Devel; Stefano Stabellini; Julien Grall; Ian Campbell; Kumar, Vijaya; prasun.kap...@cavium.com Subject: Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code Hi Manish, On 13/04/15 08:48, Manish Jaggi wrote: Add ARM PCI Support --- a) Place holder functions are added for pci_conf_read/write calls. b) Macros dev_is_pci, pci_to_dev are implemented in drivers/passthrough/pci/arm code Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com --- xen/arch/arm/Makefile|1 + xen/arch/arm/pci.c | 60 +++ xen/drivers/passthrough/arm/Makefile |1 + xen/drivers/passthrough/arm/pci.c| 88 ++ xen/drivers/passthrough/arm/smmu.c |1 - xen/drivers/passthrough/pci.c|9 ++-- xen/include/asm-arm/device.h | 33 + xen/include/asm-arm/domain.h |3 ++ xen/include/asm-arm/pci.h|7 +-- 9 files changed, 186 insertions(+), 17 deletions(-) [...] diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 004aba9..68c292b 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1252,9 +1252,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn) /* Prevent device assign if mem paging or mem sharing have been * enabled for this domain */ if ( unlikely(!need_iommu(d) -(d-arch.hvm_domain.mem_sharing_enabled || - d-mem_event-paging.ring_page || - p2m_get_hostp2m(d)-global_logdirty)) ) +(d-mem_event-paging.ring_page +#ifdef CONFIG_X86 + || d-arch.hvm_domain.mem_sharing_enabled || + p2m_get_hostp2m(d)-global_logdirty +#endif +)) ) Please avoid #ifdef CONFIG_* and introduce an arch macro. [Manish] ok return -EXDEV; if ( !spin_trylock(pcidevs_lock) ) diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h index a72f7c9..009ff0a 100644 --- a/xen/include/asm-arm/device.h +++ b/xen/include/asm-arm/device.h @@ -6,6 +6,17 @@ enum device_type { DEV_DT, +DEV_ENUMERATED, +}; + +enum device_class +{ +DEVICE_SERIAL, +DEVICE_IOMMU, +DEVICE_GIC, +DEVICE_PCI, +/* Use for error */ +DEVICE_UNKNOWN, }; Why? It will be very unlikely that we have to create a structure device for the IOMMU, GIC and SERIAL. It would make more sense to add a DEV_PCI directly to device_type. [manish] ok. struct dev_archdata { @@ -16,28 +27,30 @@ struct dev_archdata { struct device { enum device_type type; +enum device_class class; #ifdef HAS_DEVICE_TREE struct dt_device_node *of_node; /* Used by drivers imported from Linux */ #endif struct dev_archdata archdata; +#ifdef HAS_PCI +void *pci_dev; This field is not necessary. I've added one for the DT for keeping compatibility with Linux. [Manish] pci_dev is not in struct device currently. Didn't get you I have added as It is required for to_pci_dev call. +#endif [..] +int dev_is_pci(device_t *dev); This could easily be a macro. [manish] ok struct device_desc { /* Device name */ diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 9e0419e..41ae847 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -42,6 +42,8 @@ struct vtimer { uint64_t cval; }; +#define has_arch_pdevs(d)(!list_empty((d)-arch.pdev_list)) + struct arch_domain { #ifdef CONFIG_ARM_64 @@ -56,6 +58,7 @@ struct arch_domain xen_pfn_t *grant_table_gpfn; struct io_handler io_handlers; +struct list_head pdev_list; /* Continuable domain_relinquish_resources(). */ enum { RELMEM_not_started, diff --git a/xen/include/asm-arm/pci.h b/xen/include/asm-arm/pci.h index de13359..b8ec882 100644 --- a/xen/include/asm-arm/pci.h +++ b/xen/include/asm-arm/pci.h @@ -1,7 +1,8 @@ -#ifndef __X86_PCI_H__ -#define __X86_PCI_H__ +#ifndef __ARM_PCI_H__ +#define __ARM_PCI_H__ struct arch_pci_dev { +void *dev; void * is error-prone. Why can't you use the use the real structure? [manish]Will change it. I believe dev_archdata structure has also a void * (in asm-arm/device.h). So all void * are error prone in xen ? Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [libvirt test] 50401: regressions - FAIL
flight 50401 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/50401/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 5 libvirt-build fail REGR. vs. 50368 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 11 guest-start fail never pass test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass version targeted for testing: libvirt b487bb810ec95df862e7e80468c8e861ed80b0cb baseline version: libvirt 225aa80246d5e4a9e3a16ebd4c482525045da3db People who touched revisions under test: Cédric Bosdonnat cbosdon...@suse.com Dawid Zamirski dzamir...@datto.com Dmitry Guryanov dgurya...@parallels.com Erik Skultety eskul...@redhat.com John Ferlan jfer...@redhat.com Ján Tomko jto...@redhat.com Lubomir Rintel lkund...@v3.sk Luyao Huang lhu...@redhat.com Maxim Nestratov mnestra...@parallels.com Michael Chapman m...@very.puzzling.org Michal Privoznik mpriv...@redhat.com Peter Krempa pkre...@redhat.com jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt fail build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-xsm fail test-armhf-armhf-libvirt-xsm blocked test-amd64-i386-libvirt-xsm fail test-amd64-amd64-libvirt pass test-armhf-armhf-libvirt blocked test-amd64-i386-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/osstest/pub/logs images: /home/osstest/pub/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 848 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory
On 04/13/15 12:20, Wei Liu wrote: On Mon, Apr 13, 2015 at 12:09:13PM -0400, Don Slutz wrote: If QEMU has called on xc_domain_setmaxmem to add more memory for option ROMs, domain restore needs to also increase the memory. Signed-off-by: Don Slutz dsl...@verizon.com --- To see the hvmloader loader issue: ... +/* Leave some slack so that hvmloader does not complain about lack of + * memory at boot time (Could not allocate order=0 extent). + * Once hvmloader is modified to cope with that situation without + * printing warning messages, QEMU_SPARE_PAGES can be removed. Yes, please properly modify hvmloader to do this instead of trying to workaround this over and over again. At some point I might get to this. But didn't Stefano make some changes to libxl too to handle the extra ram needed? Not that I have seen. + */ +#define QEMU_SPARE_PAGES 16 + This is still arbitrary and prone to breakage. Yes, but turns out not to be the important part. struct restore_ctx { unsigned long max_mfn; /* max mfn of the current host machine */ unsigned long hvirt_start; /* virtual starting address of the hypervisor */ @@ -209,12 +216,44 @@ static int uncanonicalize_pagetable( if (!ctx-hvm ctx-superpages) rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns); else +{ ... +} This is not maintainable. Trying to do smart things inside libxc without libxl knowing it is prone to breakage. Some how I forgot to include that this is a bug in master (4.6-unstable). Much more info in my reply to Andrew Cooper (short form below). [root@dcs-xen-52 don]# xl cre -p /home/don/e1000x8.xfg Parsing config from /home/don/e1000x8.xfg got a tsc mode string: native_paravirt [root@dcs-xen-52 don]# xl save e1000x8 e1000x8.save Saving to e1000x8.save new xl format (info 0x1/0x0/3506) xc: Saving memory: iter 0 (last sent 0 skipped 0): 1044481/1044481 100% [root@dcs-xen-52 don]# xl restore e1000x8.save Loading new save file e1000x8.save (new xl fmt info 0x1/0x0/3506) Savefile contains xl domain config in JSON format Parsing config from saved xc: error: Failed to allocate memory for batch.!: Internal error libxl: error: libxl_create.c:1057:libxl__xc_domain_restore_done: restoring domain: Cannot allocate memory libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build domain: -3 libxl: error: libxl.c:1576:libxl__destroy_domid: non-existant domain 2 libxl: error: libxl.c:1540:domain_destroy_callback: unable to destroy guest with domid 2 libxl: error: libxl_create.c:1490:domcreate_destruction_cb: unable to destroy domain 2 following failed creation [root@dcs-xen-52 don]# -Don Slutz Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/hvm: Fix the unknown nested vmexit reason 80000021 bug
This would be easier to read as if ( cpu_has_vmx_vnmi (idtv_info INTR_INFO_INTR_TYPE_MASK) == (X86_EVENTTYPE_NMI8)) ) I was going to say something similar, but I think in the past Jan has said that Liang's original is more in line with the coding style. No, my complaint here wouldn't be about coding style, but about the hard- coded 8 - it's not been that long ago that I replaced may of them, and I'd really like to see it replaced here too. Liang - can you please submit an incremental change (as the original one got committed already)? There should be several examples in VMX code on how the 8 can be avoided. Jan Of course, with pleasure. Liang ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory
This patch also fix a triple fault when guests running under COLO mode. (XEN) d0v1 Over-allocation for domain 1: 524545 524544 (XEN) memory.c:155:d0v1 Could not allocate order=0 extent: id=1 memflags=0 (181 of 235) (XEN) d1v1 Triple fault - invoking HVM shutdown action 1 On 04/14/2015 12:09 AM, Don Slutz wrote: If QEMU has called on xc_domain_setmaxmem to add more memory for option ROMs, domain restore needs to also increase the memory. Signed-off-by: Don Slutz dsl...@verizon.com --- To see the hvmloader loader issue: xl cre -p e1000x8.xfg xl save e1000x8 e1000x8.save xl restore e1000x8.save With e1000x8.xfg: --- builder = hvm device_model_args_hvm = [ -monitor, pty, -boot, menu=on, ] device_model_version = qemu-xen disk = [ /dev/etherd/e500.1,,xvda, /dev/etherd/e500.2,,xvdb, /dev/etherd/e500.3,,xvde, /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD1.iso,,hdc,cdrom, /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD2.iso,,hdd,cdrom, ] maxmem = 8192 memory = 8192 name = e1000x8 serial = pty tsc_mode = native_paravirt uuid = e5000105-3d83-c962-07ae-2bc46c3644a0 videoram = 16 vif = [ model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:a0, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:aa, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:b4, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:be, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:c8, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:d2, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:dc, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:e6, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:f0, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:fa, ] viridian = 0 xen_platform_pci = 1 mmio_hole = 2048 vcpus = 1 maxvcpus = 6 on_poweroff = preserve on_reboot = preserve on_watchdog = preserve on_crash = preserve --- tools/libxc/xc_domain_restore.c | 75 +++-- 1 file changed, 73 insertions(+), 2 deletions(-) diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c index 2ab9f46..28b4fa6 100644 --- a/tools/libxc/xc_domain_restore.c +++ b/tools/libxc/xc_domain_restore.c @@ -47,6 +47,13 @@ #include xen/hvm/ioreq.h #include xen/hvm/params.h +/* Leave some slack so that hvmloader does not complain about lack of + * memory at boot time (Could not allocate order=0 extent). + * Once hvmloader is modified to cope with that situation without + * printing warning messages, QEMU_SPARE_PAGES can be removed. + */ +#define QEMU_SPARE_PAGES 16 + struct restore_ctx { unsigned long max_mfn; /* max mfn of the current host machine */ unsigned long hvirt_start; /* virtual starting address of the hypervisor */ @@ -209,12 +216,44 @@ static int uncanonicalize_pagetable( if (!ctx-hvm ctx-superpages) rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns); else +{ +xc_domaininfo_t info; +unsigned long free_pages; + +if ((xc_domain_getinfolist(xch, dom, 1, info) != 1) || +(info.domain != dom)) { +ERROR(Failed xc_domain_getinfolist for batch (uncanonicalize_pagetable)\n); +errno = ENOMEM; +return 0; +} +free_pages = info.max_pages - info.tot_pages; +if (free_pages QEMU_SPARE_PAGES) { +free_pages -= QEMU_SPARE_PAGES; +} else { +free_pages = 0; +} +if (free_pages nr_mfns) +{ +DPRINTF(Adjust memory for batch (uncanonicalize_pagetable): free_pages=%lu nr_mfns=%d max_pages=%lu tot_pages=%lu max_mfn=%lu\n, +free_pages, nr_mfns, (unsigned long)info.max_pages, +(unsigned long)info.tot_pages, ctx-max_mfn); +if (xc_domain_setmaxmem(xch, dom, +((info.max_pages + nr_mfns - free_pages) + (XC_PAGE_SHIFT - 10))) 0) +{ +ERROR(Failed xc_domain_setmaxmem for batch (uncanonicalize_pagetable)\n); +errno = ENOMEM; +return 0; +} +} rc = xc_domain_populate_physmap_exact(xch, dom, nr_mfns, 0, 0, ctx-p2m_batch); +} if (rc) { -ERROR(Failed to allocate memory for batch.!\n); +ERROR(Failed to allocate memory for batch. rc=%d nr_mfns=%d!\n, + rc, nr_mfns); errno = ENOMEM; return 0; } @@ -1241,12 +1280,44 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx, if (!ctx-hvm ctx-superpages) rc =
Re: [Xen-devel] [RFC PATCH COLO v5 01/29] Add readme
Wei, Thanks for the review and sorry for the late reply, was debugging some triple fault bug. However, it is now fixed, and COLO running much more stable now. For the readme, it is kind of outdated, we will update it and also address your comments, then we'll put it onto wiki page. The intree readme will be some simple desciption and links to wiki pages. On 04/09/2015 02:11 AM, Wei Liu wrote: On Wed, Apr 01, 2015 at 02:41:37PM +0800, Yang Hongyang wrote: From: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- docs/README.colo | 92 1 file changed, 92 insertions(+) create mode 100644 docs/README.colo diff --git a/docs/README.colo b/docs/README.colo new file mode 100644 index 000..60f487d --- /dev/null +++ b/docs/README.colo @@ -0,0 +1,92 @@ +COLO provides fault tolerance for virtual machines by sending continuous +checkpoints to a backup, which will activate if the target VM fails. It +only supports HVM guest(without pv extensions). ^ PV + +Requriements: +1. Hardware requriements + There is at least one directly connected nic to forward the nic from client + to secondary vm. The directly connected nic must not be used by any other + purpose. If your guest has more than one nic, you should have directly + connected nic for each guest nic. If you don't have enouth directly connected + nic, you can use vlan. +2. Dom0 requirements + - Support dom0 + - kernel module: +sch_ingress +cls_basic +cls_tcindex +cls_u32 +act_mirred + - libnl-tools = 3.0. This package provides the command nl-qdisc-list, and + colo need this command. ^ COLO + - If your host os has OEM-released xen tools, please uninstall it first. ^OS ^ Xen (and please fix other occurrences of wrong capitalisations as well) OK This is a very broad statement and it is not very helpful from both developers and users' point of view. Can you elaborate on what functionalities that COLO needs to have exclusive access to? + - You can load the module which is not provided by OEM. What does this mean? Means you may need to compile the module yourself, will make it clear. +3. Guest requirements + Only HVM guest(without pv extensions) is supported now. If you want to + use OEM released guest os, please use SUSE. REDHAT and Ubuntu is not + supported now because I don't find any way to disable pv extensions. + If you want to use REDHAT or Ubuntu, you need to build the newest + kernel which has the parameter xen_nopv. + FWIW, does xen_platform_pci=0 in your xl.cfg work for RH and Ubuntu guests? It works only if you compile the newer kernel by yourself which support this option. +Network link topology + Please refer to: http://wiki.qemu.org/Features/COLO#Network_link_topology + +The steps to setup COLO environment: +You need to recompile your host kernel because colo-proxy module need cooperate +with linux kernel. +Please refer to: http://wiki.qemu.org/Features/COLO#Test_environment_prepare +1. Build and install xen +2. Apply the patch for qemu xen, and rebuild xen tools: +- cd tools/qemu-xen-dir +- use git am to apply the patch: + https://raw.githubusercontent.com/wencongyang/colo-files/master/patch_for_qemu/*.patch +- make tools make install-tools +Note: You must use qemu-xen. qemu-xen-traditional is not supported. Note that you will eventually need to upstream your changes to QEMU. Sure, we have already posted the block patches to QEMU. It is under review now. http://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00399.html +3. Install COLO proxy module: +3.1 Download COLO proxy, compile and install it: +https://github.com/gao-feng/colo-proxy.git +3.2 Download iptables patch, it is based on v1.4.21 compile and install it: + https://github.com/gao-feng/colo-proxy/blob/master/colo-patch-for-kernel.patch +4. Install the guest +4.1 Add xen_platform_pci=0 into the guest configfile +4.2 If you use suse, please select physical machine +4.3 copy the disk image to the secondary host +5. Update your guest config file for COLO: +5.1 disk +disk = [ + 'format=raw,devtype=disk,access=w,vdev=hda,backendtype=qdisk,colo,colo-params=192.168.3.1:9000:exportname=qdisk1,active-disk=/mnt/ramfs/active_disk.img,hidden-disk=/mnt/ramfs/hidden_disk.img,target=/root/images/colo-hvm.img' ] It's unclear which parts are updated compared to the original config, i.e. can you list the additional bits to enable COLO? Presumably it's only those options starting with colo? mostly, will update the doc to make it clear. +5.2 nic +vif = [ 'mac=00:16:4f:00:00:11, bridge=br0, model=e1000, forwarddev=eth0, forwardbr=br1' ] +Note: +a. The
[Xen-devel] [linux-3.16 test] 50396: regressions - FAIL
flight 50396 linux-3.16 real [real] http://logs.test-lab.xenproject.org/osstest/logs/50396/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-credit2 17 guest-localmigrate/x10fail REGR. vs. 34167 Tests which are failing intermittently (not blocking): test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail pass in 50289 test-amd64-amd64-xl-qemuu-win7-amd64 9 windows-install fail pass in 50360 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-xsm 7 capture-logs(7) broken in 50360 blocked in 34167 test-armhf-armhf-libvirt-xsm 7 capture-logs(7) broken in 50360 blocked in 34167 test-armhf-armhf-libvirt 12 capture-logs(12) broken in 50360 blocked in 34167 test-armhf-armhf-xl-cubietruck 18 capture-logs(18) broken in 50360 blocked in 34167 test-armhf-armhf-xl 18 capture-logs(18) broken in 50360 blocked in 34167 test-armhf-armhf-xl-sedf-pin 18 capture-logs(18) broken in 50360 blocked in 34167 test-armhf-armhf-xl-sedf 18 capture-logs(18) broken in 50360 blocked in 34167 test-armhf-armhf-xl-multivcpu 18 capture-logs(18) broken in 50360 blocked in 34167 test-armhf-armhf-xl-credit2 18 capture-logs(18) broken in 50360 blocked in 34167 test-armhf-armhf-xl-arndale 18 capture-logs(18) broken in 50360 blocked in 34167 test-armhf-armhf-libvirt 9 guest-start fail in 50289 like 34167 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stopfail in 50289 never pass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass test-amd64-i386-xl-xsm 11 guest-start fail never pass test-amd64-amd64-xl-sedf 17 guest-localmigrate/x10 fail never pass test-amd64-amd64-xl-xsm 11 guest-start fail never pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass test-amd64-amd64-xl-multivcpu 17 guest-localmigrate/x10 fail never pass test-amd64-i386-libvirt-xsm 11 guest-start fail never pass test-armhf-armhf-xl-xsm 6 xen-boot fail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-freebsd10-i386 9 freebsd-install fail never pass test-armhf-armhf-libvirt-xsm 6 xen-boot fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-i386-freebsd10-amd64 9 freebsd-install fail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-winxpsp3 16 guest-stopfail never pass test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 16 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf-pin 12 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-amd64-i386-xl-winxpsp3 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-sedf-pin 17 guest-localmigrate/x10 fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 16 guest-stopfail never pass version targeted for testing: linux0596eeda8839481a3a2d64b07116581c0fc5fa0a baseline version:
Re: [Xen-devel] [PATCH 4/4] hvmloader: add knob for fixed VGABIOS date string
On 02.04.15 at 11:45, ian.campb...@citrix.com wrote: On Wed, 2015-04-01 at 13:28 +, Olaf Hering wrote: To allow reproducible builds of hvmloader introduce a make variable VGABIOS_REL_DATE=dd Mon to provide a fixed date string. Without this change the hvmloader binary changes with every rebuild. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com It's not clear if vgabios should be maintained by us tools folks or if it should have been taken under the hvmloader umbrella and maintained by the hypervisor folks. I'm not sure either - hvmloader is really code we grew and hence maintaining it together with its hypervisor side makes sense. For the various firmware blobs, otoh, I think they match more closely with the qemu/tools pattern. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code
Add ARM PCI Support --- a) Place holder functions are added for pci_conf_read/write calls. b) Macros dev_is_pci, pci_to_dev are implemented in drivers/passthrough/pci/arm code Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com --- xen/arch/arm/Makefile|1 + xen/arch/arm/pci.c | 60 +++ xen/drivers/passthrough/arm/Makefile |1 + xen/drivers/passthrough/arm/pci.c| 88 ++ xen/drivers/passthrough/arm/smmu.c |1 - xen/drivers/passthrough/pci.c|9 ++-- xen/include/asm-arm/device.h | 33 + xen/include/asm-arm/domain.h |3 ++ xen/include/asm-arm/pci.h|7 +-- 9 files changed, 186 insertions(+), 17 deletions(-) diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 935999e..aaf5d88 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -39,6 +39,7 @@ obj-y += device.o obj-y += decode.o obj-y += processor.o obj-y += smc.o +obj-$(HAS_PCI) += pci.o #obj-bin-y += o diff --git a/xen/arch/arm/pci.c b/xen/arch/arm/pci.c new file mode 100644 index 000..9438f41 --- /dev/null +++ b/xen/arch/arm/pci.c @@ -0,0 +1,60 @@ +#include xen/pci.h + +void _pci_conf_write( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint32_t data, int bytes) +{ +} + +uint32_t _pci_conf_read( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, int bytes) +{ +return 0; +} + +uint8_t pci_conf_read8( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 1); +} + +uint16_t pci_conf_read16( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 2); +} + +uint32_t pci_conf_read32( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 4); +} + +void pci_conf_write8( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint8_t data) +{ + _pci_conf_write(seg, bus, dev, func, reg, data, 1); +} + +void pci_conf_write16( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint16_t data) +{ + _pci_conf_write(seg, bus, dev, func, reg, data, 2); +} + +void pci_conf_write32( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint32_t data) +{ + _pci_conf_write(seg, bus, dev, func, reg, data, 4); +} + +void arch_pci_ro_device(int seg, int bdf) +{ +} diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile index f4cd26e..1a41549 100644 --- a/xen/drivers/passthrough/arm/Makefile +++ b/xen/drivers/passthrough/arm/Makefile @@ -1,2 +1,3 @@ obj-y += iommu.o obj-y += smmu.o +obj-$(HAS_PCI) += pci.o diff --git a/xen/drivers/passthrough/arm/pci.c b/xen/drivers/passthrough/arm/pci.c new file mode 100644 index 000..07a5a78 --- /dev/null +++ b/xen/drivers/passthrough/arm/pci.c @@ -0,0 +1,88 @@ +/* + * PCI helper functions for arm for passthrough support. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) 2015 Cavium Networks + * + * Author: Manish Jaggi manish.ja...@cavium.com + */ +#include xen/pci.h +#include asm/device.h +#include xen/sched.h + +int _dump_pci_devices(struct pci_seg *pseg, void *arg) +{ +struct pci_dev *pdev; + +printk( segment %04x \n, pseg-nr); + +list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list ) +{ +printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs , + pseg-nr, pdev-bus, + PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn), + pdev-domain ? pdev-domain-domain_id : -1, + (pdev-node != NUMA_NO_NODE) ? pdev-node : -1); +printk(\n); +} + +return 0; +} + +struct device* pci_to_dev(struct pci_dev *pci) +{ +if(!pci-arch.dev) +{ +device_t *dev; +dev = xzalloc (struct device); +pci-arch.dev = dev; +dev-class = DEVICE_PCI; +dev-type = DEV_ENUMERATED; +} +return pci-arch.dev; +} + +struct pci_dev* to_pci_dev(struct device *dev) +{ +if(dev-class == DEVICE_PCI) +return
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. In any case we should make it clear somewhere who is supposed to write to the command register (and other PCI reigsters) at any given time, otherwise it would be very easy for a new kernel update to break the hypervisor and we wouldn't even know why it happened. We should, but at this point in time this is going to be rather problematic. Such a separation of responsibilities should have been done before all the pass-through code got written. @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_ } break; case PCI_CAP_ID_MSIX: -{ -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET; -writel(flag, entry-mask_base + offset); -readl(entry-mask_base + offset); -break; -} +if ( likely(memory_decoded(pdev)) ) +{ +writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +break; +} +if ( flag ) +{ +u16 control; +domid_t domid = pdev-domain-domain_id; + +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos)); +if ( control PCI_MSIX_FLAGS_MASKALL ) +break; +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), + control | PCI_MSIX_FLAGS_MASKALL); How is that going to interact with Linux (Dom0) writing to the command register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for devices assigned to HVM guests. Could this cause any conflicts? I don't see the relation to the comment register here - all quoted code only deals with the MSI-X control register. One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but as a matter of fact QEMU has done that for years and we cannot break that behaviour without introducing regressions. In fact as it stands QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM guests, not Xen. Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed through devices to running domains? I think that might be a good enough separation of responsibilities between Xen and QEMU. No, the bits has to be under hypervisor control (just like any other hardware bits controlling interrupts). Dealing with this is the subject of patches I have ready, but didn't post yet. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 7/7] Add test job for nest test case
1. This patch adds creation of the nested test job, when job creation procedure is invoked. 2. Set nested L1's vif model as e1000 by make-flight. Signed-off-by: longtao.pang longtaox.p...@intel.com --- Changes in v8: 1. Use 'nestedl1' and 'nestedl2' as L1 and L2's clearer name in nested test job. 2. Setting Debian_ISO as 'debian-7.2.0-amd64-CD-1.iso' in make-flight. --- make-flight | 20 1 file changed, 20 insertions(+) diff --git a/make-flight b/make-flight index cc8ecdb..40ca808 100755 --- a/make-flight +++ b/make-flight @@ -204,6 +204,25 @@ do_hvm_win7_x64_tests () { all_hostflags=$most_hostflags,hvm } +do_hvm_debian_nested_tests () { + if [ $xenarch != amd64 -a $dom0arch != amd64 ]; then +return + fi + + job_create_test test-$xenarch$kern-$dom0arch-nested test-nested xl \ +$xenarch $dom0arch \ +nestedl1_image=debian-7.2.0-amd64-CD-1.iso \ +nestedl2_image=debian-7.2.0-amd64-CD-1.iso \ +bios=seabios \ +kernbuildjob=build-amd64-pvops \ +kernkind=pvops \ +nestedl1_vifmodel='e1000' \ +nested_disk='15000' \ +nestedl1_memory='3072' \ +device_model_version=qemu-xen \ +all_hostflags=$most_hostflags,hvm +} + do_hvm_debian_test_one () { testname=$1 bios=$2 @@ -430,6 +449,7 @@ test_matrix_do_one () { done fi + do_hvm_debian_nested_tests #do_passthrough_tests } -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 4/7] Changes on test step of Debian hvm guest install
1. Increase disk size to accommodate to nested test requirement. 2. Since 'Debain-xxx-.iso' image will be stored in rootfs of L1 guest, therefore needs more disk capacity, increase root partition size in preseed generation. 3. In L1 installation context, assign more memory to it; Since it acts as a nested hypervisor anyway. 4. Comment out CDROM entry in sources.list to make HTTP URL entry available for L1 hvm guest. 5. Enable nestedhvm feature in ExtraConfig for nested job. Signed-off-by: longtao.pang longtaox.p...@intel.com --- Changes in v8: 1. Update a conventional way to comment out CDROM entry in sources.list. 2. Enable nestedhvm feature only for the nested job. 3. Set nested disk and memroy size to be driven from runvars in 'ts-debian-hvm-install'. --- ts-debian-hvm-install |9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/ts-debian-hvm-install b/ts-debian-hvm-install index cfd5144..13009d0 100755 --- a/ts-debian-hvm-install +++ b/ts-debian-hvm-install @@ -36,7 +36,7 @@ our $ho= selecthost($whhost); # guest memory size will be set based on host free memory, see below our $ram_mb; -our $disk_mb= 1; +our $disk_mb= $r{'nested_disk'} ? $r{'nested_disk'} : 1; our $guesthost= $gn.guest.osstest; our $gho; @@ -60,7 +60,7 @@ d-i partman-auto/expert_recipe string \\ use_filesystem{ } filesystem{ vfat } \\ mountpoint{ /boot/efi } \\ . \\ -5000 50 5000 ext4 \\ +1 50 1 ext4 \\ method{ format } format{ } \\ use_filesystem{ } filesystem{ ext4 } \\ mountpoint{ / } \\ @@ -75,6 +75,7 @@ d-i preseed/late_command string \\ in-target mkdir -p /boot/efi/EFI/boot; \\ in-target cp /boot/efi/EFI/debian/grubx64.efi /boot/efi/EFI/boot/bootx64.efi ;\\ in-target mkdir -p /root/.ssh; \\ +in-target sed -i 's/^deb *cdrom/#/g' /etc/apt/sources.list; \\ in-target sh -c echo -e '$authkeys' /root/.ssh/authorized_keys; END return $preseed_file; @@ -149,9 +150,11 @@ sub prep () { target_putfilecontents_root_stash($ho, 10, preseed(), $preseed_file_path); +my $nestedhvm_feature = $gn eq 'nestedl1' ? 'nestedhvm=1' : ''; more_prepareguest_hvm($ho,$gho, $ram_mb, $disk_mb, OnReboot = 'preserve', Bios = $r{bios}, + ExtraConfig = $nestedhvm_feature, PostImageHook = sub { my $cmds = iso_copy_content_from_image($gho, $newiso); $cmds .= prepare_initrd($initrddir,$newiso,$preseed_file_path); @@ -174,7 +177,7 @@ my $ram_lots = 5000; if ($host_freemem_mb $ram_lots * 2 + $ram_minslop) { $ram_mb = $ram_lots; } else { -$ram_mb = 768; +$ram_mb = $r{${gn}_memory} ? $r{${gn}_memory} : 768; } logm(Host has $host_freemem_mb MB free memory, setting guest memory size to $ram_mb MB); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 3/7] Edit some APIs in TestSupport.pm for nested test
1. Designate vif model by make-flight. 2. In L2 installation context, its host (L1) IP address is not queried from DNS, but after running ts-nested-setup + host + nested, L1 IP is stored in runvar. Signed-off-by: longtao.pang longtaox.p...@intel.com --- Changes in v8: Remove the unnecessary symbol of ';' in 'TestSupport.pm'. --- Osstest/TestSupport.pm |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm index 1bde67d..4cdacfc 100644 --- a/Osstest/TestSupport.pm +++ b/Osstest/TestSupport.pm @@ -818,6 +818,10 @@ sub selecthost ($) { logm(Host $name is in HostGroup $ho-{Properties}{HostGroup}); } +if ( $r{${name}_ip} ) { +$setprop-(IpAddr, $r{${name}_ip}); +} + # Next, we use the config file's general properites as defaults foreach my $k (keys %c) { next unless $k =~ m/^HostProp_([A-Z].*)$/; @@ -1548,11 +1552,13 @@ sub prepareguest_part_xencfg ($) { my $oncrash= $xopts-{OnCrash} || 'preserve'; my $vcpus= guest_var($gho, 'vcpus', $xopts-{DefVcpus} || 2); my $xoptcfg= $xopts-{ExtraConfig}; +my $vif= guest_var($gho, 'vifmodel',''); +my $vifmodel= $vif ? model=$vif : ''; $xoptcfg='' unless defined $xoptcfg; my $xencfg= END; name= '$gho-{Name}' memory = ${ram_mb} -vif = [ 'type=ioemu,mac=$gho-{Ether}' ] +vif = [ 'type=ioemu,${vifmodel},mac=$gho-{Ether}' ] # on_poweroff = '$onpoweroff' on_reboot = '$onreboot' -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 6/7] Compose the main recipe of nested test job
Signed-off-by: longtao.pang longtaox.p...@intel.com --- Changes in v8: Change the patch order from 6 to 5. --- sg-run-job | 11 +++ 1 file changed, 11 insertions(+) diff --git a/sg-run-job b/sg-run-job index eae159d..2ca5ebf 100755 --- a/sg-run-job +++ b/sg-run-job @@ -299,6 +299,17 @@ proc run-job/test-pair {} { #run-ts . remus-failover ts-remus-check src_host dst_host + debian } +proc need-hosts/test-nested {} {return host} +proc run-job/test-nested {} { +run-ts . = ts-debian-hvm-install + host + nestedl1 +run-ts . = ts-nested-setup + host + nestedl1 +run-ts . = ts-xen-install nested_l1 +run-ts . = ts-host-reboot nested_l1 +run-ts . = ts-debian-hvm-install nested_l1 nestedl2 +run-ts . = ts-guest-stop nested_l1 nestedl2 +run-ts . = ts-guest-destroy host nestedl1 +} + proc test-guest-migr {g} { if {[catch { run-ts . = ts-migrate-support-check + host $g }]} return -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 5/7] Add new script to customize nested test configuration
1. In this script, make some appropriate runvars which selecthost would recognise. 2. Prepare the configurations for installing L2 guest VM. 3. Create a lv disk in L0 and hot-attach it to L1, need to restart L1 to make the block disk to be recognized by L1 system, then using this disk to create a VG that used for installing L2. Signed-off-by: longtao.pang longtaox.p...@intel.com --- Changes in v8: 1. Replace '$nested_host' by '$l1-{Guest}'. --- ts-nested-setup | 52 1 file changed, 52 insertions(+) create mode 100755 ts-nested-setup diff --git a/ts-nested-setup b/ts-nested-setup new file mode 100755 index 000..41d5e80 --- /dev/null +++ b/ts-nested-setup @@ -0,0 +1,52 @@ +#!/usr/bin/perl -w +# This is part of osstest, an automated testing framework for Xen. +# Copyright (C) 2015 Intel Inc. +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. + +use strict qw(vars); +use DBI; +use Osstest; +use Osstest::Debian; +use Osstest::TestSupport; + +tsreadconfig(); +our ($l0,$l1) = ts_get_host_guest(@ARGV); + +guest_check_ip($l1); +target_cmd_root($l1, mkdir -p /home/osstest/.ssh cp /root/.ssh/authorized_keys /home/osstest/.ssh/); +my $url = $c{WebspaceUrl}.$c{WebspaceCommon}.$l0-{Name}._.'overlay.tar'; +target_cmd_root($l1, END); +wget -O overlay.tar $url +tar -xf overlay.tar -C / +rm overlay.tar -f +update-rc.d osstest-confirm-booted start 99 2 . +END + +store_runvar('nested_l1',$l1-{Guest}); +store_runvar($l1-{Guest}_ip,$l1-{Ip}); + +my $l2_disk_mb = 2; +my $l2_lv_name = 'nestedl2-disk'; +my $vgname = $l0-{Name}; +$vgname .= .$c{TestHostDomain} if ($l0-{Suite} =~ m/lenny/); +target_cmd_root($l0, END); +lvremove -f /dev/${vgname}/${l2_lv_name} ||: +lvcreate -L ${l2_disk_mb}M -n $l2_lv_name $vgname +dd if=/dev/zero of=/dev/${vgname}/${l2_lv_name} count=10 +xl block-attach $l1-{Name} /dev/${vgname}/${l2_lv_name},raw,sdb,rw +END +target_install_packages_norec($l1, qw(lvm2 rsync genisoimage)); +target_reboot($l1); +target_cmd_root($l1, pvcreate /dev/xvdb vgcreate ${l2_lv_name}_vg /dev/xvdb); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 2/7] Changes to support '/boot' leading paths of kernel, xen, in grub
Support situations of grub that have vmlinuz and other things starting with path of '/boot' rather than '/'. Signed-off-by: longtao.pang longtaox.p...@intel.com --- Osstest/Debian.pm |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm index 378af54..1a57cdd 100644 --- a/Osstest/Debian.pm +++ b/Osstest/Debian.pm @@ -449,21 +449,21 @@ sub setupboot_grub2 () { if (m/^submenu\s+[\'\](.*)[\'\].*\{\s*$/) { $submenu={ StartLine =$.}; } -if (m/^\s*multiboot\s*\/(xen\-[0-9][-+.0-9a-z]*\S+)/) { +if (m/^\s*multiboot\s*(?:\/boot)?\/(xen\S+)/) { die unless $entry; $entry-{Hv}= $1; } -if (m/^\s*multiboot\s*\/(vmlinu[xz]-(\S+))/) { +if (m/^\s*multiboot\s*(?:\/boot)?\/(vmlinu[xz]-(\S+))/) { die unless $entry; $entry-{KernOnly}= $1; $entry-{KernVer}= $2; } -if (m/^\s*module\s*\/(vmlinu[xz]-(\S+))/) { +if (m/^\s*module\s*(?:\/boot)?\/(vmlinu[xz]-(\S+))/) { die unless $entry; $entry-{KernDom0}= $1; $entry-{KernVer}= $2; } -if (m/^\s*module\s*\/(initrd\S+)/) { +if (m/^\s*module\s*(?:\/boot)?\/(initrd\S+)/) { $entry-{Initrd}= $1; } if (m/^\s*module\s*\/(xenpolicy\S+)/) { -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 1/7] parsing grub which has 'submenu' primitive
From a hvm kernel build from Linux stable Kernel tree, the auto generated grub2 menu will have 'submenu' primitive, upon the 'menuentry' items. Xen boot entries will be grouped into a submenu. This patch adds capability to support such grub formats. Signed-off-by: longtao.pang longtaox.p...@intel.com --- Changes in v8: Separate unrelated codes to the use of submenu. --- Osstest/Debian.pm | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/Osstest/Debian.pm b/Osstest/Debian.pm index 69530fb..378af54 100644 --- a/Osstest/Debian.pm +++ b/Osstest/Debian.pm @@ -404,10 +404,18 @@ sub setupboot_grub2 () { my $count= 0; my $entry; +my $submenu; while ($f) { next if m/^\s*\#/ || !m/\S/; if (m/^\s*\}\s*$/) { -die unless $entry; +die unless $entry || $submenu; +if(!defined $entry defined $submenu){ +logm(Met end of a submenu starting from . +$submenu-{StartLine}. . +Our want kern is $want_kernver); +$submenu=undef; +next; +} my (@missing) = grep { !defined $entry-{$_} } (defined $xenhopt @@ -438,6 +446,9 @@ sub setupboot_grub2 () { $entry= { Title = $1, StartLine = $., Number = $count }; $count++; } +if (m/^submenu\s+[\'\](.*)[\'\].*\{\s*$/) { +$submenu={ StartLine =$.}; +} if (m/^\s*multiboot\s*\/(xen\-[0-9][-+.0-9a-z]*\S+)/) { die unless $entry; $entry-{Hv}= $1; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI
Xen currently does not have PCI support for ARM builds. This patch set makes the code compilable for ARM PCI and adds places-holder code which would be replaced with PCI pass-through support patch series. Re-factor MSI Handling - There is a some x86 specific code which is found in common code: xen/drivers/passthrough/pci.c which needs to be re factored. MSI/X are configured and handled by dom0 or domU code on ARM64 and is not required to be part of common code. However there are functions which are used as part of common code and calls to these functions cannot be easily re factored like pci_cleanup_msi. xen/drivers/passthrough/arch/pci.c files handle these functions. Add ARM PCI Support --- a) Place holder functions are added for pci_conf_read/write calls. b) Macros dev_is_pci, pci_to_dev are implemented in drivers/passthrough/pci/arm code Manish Jaggi (2): xen/x86: Patch re-factors MSI/X config code from drivers/passthrough/pci.c to x86 specific xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code xen/arch/arm/Makefile|1 + xen/arch/arm/pci.c | 60 ++ xen/drivers/passthrough/arm/Makefile |1 + xen/drivers/passthrough/arm/pci.c| 88 ++ xen/drivers/passthrough/arm/smmu.c |1 - xen/drivers/passthrough/pci.c| 111 +++- xen/drivers/passthrough/x86/Makefile |1 + xen/drivers/passthrough/x86/pci.c| 115 ++ xen/include/asm-arm/device.h | 33 +++--- xen/include/asm-arm/domain.h |3 + xen/include/asm-arm/pci.h|7 ++- xen/include/asm-x86/msi.h|1 - xen/include/xen/pci.h| 20 +- 13 files changed, 323 insertions(+), 119 deletions(-) create mode 100644 xen/arch/arm/pci.c create mode 100644 xen/drivers/passthrough/arm/pci.c create mode 100644 xen/drivers/passthrough/x86/pci.c -- 1.7.9.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] preparing for 4.5.1
On 02.04.15 at 12:01, ian.campb...@citrix.com wrote: On Thu, 2015-03-26 at 17:07 +, Jan Beulich wrote: having been released mid January, it is time to get ready for 4.5.1-rc1. Please reply with backport requests which you consider necessary but still missing. Please also be patient with them being pulled in - I'll be gone the coming two weeks, and I'd hope to get an RC1 put together soon thereafter. On my list for ARM I have this patch, which also touches x86. What do you think? If it fixes a real problem in 4.5 (rather than just a theoretical one, exposed only in 4.6), then I'm fine with this (and assume since it's mostly for ARM you'll take care of doing and applying the backport). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Outstanding fixes for Linux 4.1?
Hi David, On 04/03/2015 03:22 AM, Konrad Rzeszutek Wilk wrote: On Thu, Apr 02, 2015 at 05:21:50PM +0100, David Vrabel wrote: All, Are there any outstanding patches that should be added to Linux 4.1 (the merge window for which will be opening shortly)? Bob's two patches to the deferred grant mechanism. I've already posted them to xen-devel several days ago. [PATCH 1/2] xen/blkback: safely unmap purge persistent grants http://lists.xenproject.org/archives/html/xen-devel/2015-04/msg00392.html [PATCH v2 2/2] xen/grant: introduce func gnttab_unmap_refs_sync() http://lists.xenproject.org/archives/html/xen-devel/2015-04/msg00393.html Should I rebase them to your latest tree and repost? Thanks, -Bob Also the XSA-120 addendum. Will send that out on Friday along with some other potential fixes. So far I have queued: Dan Carpenter (1): xen/mce: fix up xen_late_init_mcelog() error handling David Vrabel (2): xen: unify foreign GFN map/unmap for auto-xlated physmap guests xen/privcmd: improve performance of MMAPBATCH_V2 Jan Beulich (1): xen-pciback: also support disabling of bus-mastering and memory-write-inva Juergen Gross (6): xen: build infrastructure for generating hypercall depending symbols xen: synchronize include/xen/interface/xen.h with xen xen: use generated hypervisor symbols in arch/x86/xen/trace.c xen: use generated hypercall symbols in arch/x86/xen/xen-head.S xen: scsiback: add LUN of restored domain xen: support suspend/resume in pvscsi frontend Konrad Rzeszutek Wilk (3): xen/pciback: Don't print scary messages when unsupported by hypervisor. x86/xen: Provide a Xen PV APIC driver to support 255 VCPUs x86/xen/apic: WARN with details. Takashi Iwai (2): xen: pcpu: Use static attribute groups for sysfs entry xen: balloon: Use static attribute groups for sysfs entries Tao Chen (1): xen-scsiback: define a pr_fmt macro with xen-pvscsi David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [OSSTEST Nested PATCH v8 0/7] Introduction of netsted HVM test job
This patch set adds nested HVM test case for osstest. In this test case, a Xen hypervisor (L1) runs on top of another Xen hypervisor (L0). Upon L1 hypervisor, we will then create a nested guest (L2), and test if the Linux guest can then be installed and run well. About nested Xen virtualization, refer to http://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xen. Test steps 0. To run osstest in standalone mode, write a config file in '~/.xen-osstest/config', and then create a standalone.config file to define 'TREE_LINUX', 'REVISION_LINUX' which will be used for nested test. The directory path of 'Debian Images' could be defined in '~/.xen-osstest/config'. 1. run 'build-amd64' job and then 'build-amd64-pvops', to prepare xen installation tarball and hvm guest kernel. 2. run 'test-amd64-amd6-nested' job, it does following: a. invoke test step of 'ts-debain-hvm-install' to install a normal HVM guest b. invoke test step of 'ts-nested-setup' to make some appropriate runvars which selecthost() would recognise and prepare the configurations for installing L2 guest VM. c. invoke test step of 'ts-xen-install' to install xen on the normal guest, alter it into a L1 hypervisor d. invoke test step of 'ts-debain-hvm-install' again, but take the L1 hypervisor as host, install the L2 guest on it e. invoke test step of 'ts-guest-stop', stop L2 guest. f. invoke test step of 'ts-guest-destroy' to destroy L1 guest. This patch set reuse 'ts-debian-hvm-install' for both L1 installation and L2 installation, define 'nestedl1' and 'nestedl2' as L1 and L2's hostname, define 'nested_l1 as L1's host identity. It also reuses 'ts-xen-install' with L1's hostname 'nestedl1' input parameter to differentiate from L0 Xen installation. This patch series has been tested on test machines of amd64 arch, Debian-7.2.0-amd64 as guests OS, with hvm domain0 of Linux kernel 3.18.5, in standalone mode. Also, we use linux-stable tree as domain0 kernel source. longtao.pang (7): parsing grub which has 'submenu' primitive Changes to support '/boot' leading paths of kernel, xen, in grub Edit some APIs in TestSupport.pm for nested test Changes on test step of Debian hvm guest install Add new script to customize nested test configuration Compose the main recipe of nested test job Add test job for nest test case Osstest/Debian.pm | 21 - Osstest/TestSupport.pm |8 +++- make-flight| 20 sg-run-job | 11 +++ ts-debian-hvm-install |9 ++--- ts-nested-setup| 52 6 files changed, 112 insertions(+), 9 deletions(-) create mode 100755 ts-nested-setup ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/2] xen/x86: Patch re-factors MSI/X config code from, drivers/passthrough/pci.c to x86 specific
This patch re-factors msi specific code to x86 specific files from xen/drivers/passthrough/pci.c. Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com --- xen/drivers/passthrough/pci.c| 102 +- xen/drivers/passthrough/x86/Makefile |1 + xen/drivers/passthrough/x86/pci.c| 115 ++ xen/include/asm-x86/msi.h|1 - xen/include/xen/pci.h| 20 +- 5 files changed, 137 insertions(+), 102 deletions(-) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index ecd061e..004aba9 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -23,7 +23,6 @@ #include xen/iommu.h #include xen/irq.h #include asm/hvm/iommu.h -#include asm/hvm/irq.h #include xen/delay.h #include xen/keyhandler.h #include xen/event.h @@ -33,21 +32,6 @@ #include xen/softirq.h #include xen/tasklet.h #include xsm/xsm.h -#include asm/msi.h - -struct pci_seg { -struct list_head alldevs_list; -u16 nr; -unsigned long *ro_map; -/* bus2bridge_lock protects bus2bridge array */ -spinlock_t bus2bridge_lock; -#define MAX_BUSES 256 -struct { -u8 map; -u8 bus; -u8 devfn; -} bus2bridge[MAX_BUSES]; -}; spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED; static struct radix_tree_root pci_segments; @@ -282,22 +266,10 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) *((u8*) pdev-bus) = bus; *((u8*) pdev-devfn) = devfn; pdev-domain = NULL; -INIT_LIST_HEAD(pdev-msi_list); - -if ( pci_find_cap_offset(pseg-nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), - PCI_CAP_ID_MSIX) ) +if (!pci_alloc_msix (pdev, pseg, bus, devfn)) { -struct arch_msix *msix = xzalloc(struct arch_msix); - -if ( !msix ) -{ -xfree(pdev); -return NULL; -} -spin_lock_init(msix-table_lock); -pdev-msix = msix; +return NULL; } - list_add(pdev-alldevs_list, pseg-alldevs_list); /* update bus2bridge */ @@ -755,54 +727,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn) return ret; } -static int pci_clean_dpci_irq(struct domain *d, - struct hvm_pirq_dpci *pirq_dpci, void *arg) -{ -struct dev_intx_gsi_link *digl, *tmp; - -pirq_guest_unbind(d, dpci_pirq(pirq_dpci)); - -if ( pt_irq_need_timer(pirq_dpci-flags) ) -kill_timer(pirq_dpci-timer); - -list_for_each_entry_safe ( digl, tmp, pirq_dpci-digl_list, list ) -{ -list_del(digl-list); -xfree(digl); -} - -return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0; -} - -static int pci_clean_dpci_irqs(struct domain *d) -{ -struct hvm_irq_dpci *hvm_irq_dpci = NULL; - -if ( !iommu_enabled ) -return 0; - -if ( !is_hvm_domain(d) ) -return 0; - -spin_lock(d-event_lock); -hvm_irq_dpci = domain_get_irq_dpci(d); -if ( hvm_irq_dpci != NULL ) -{ -int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); - -if ( ret ) -{ -spin_unlock(d-event_lock); -return ret; -} - -d-arch.hvm_domain.irq.dpci = NULL; -free_hvm_irq_dpci(hvm_irq_dpci); -} -spin_unlock(d-event_lock); -return 0; -} - int pci_release_devices(struct domain *d) { struct pci_dev *pdev; @@ -1186,28 +1110,6 @@ bool_t pcie_aer_get_firmware_first(const struct pci_dev *pdev) } #endif -static int _dump_pci_devices(struct pci_seg *pseg, void *arg) -{ -struct pci_dev *pdev; -struct msi_desc *msi; - -printk( segment %04x \n, pseg-nr); - -list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list ) -{ -printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs , - pseg-nr, pdev-bus, - PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn), - pdev-domain ? pdev-domain-domain_id : -1, - (pdev-node != NUMA_NO_NODE) ? pdev-node : -1); -list_for_each_entry ( msi, pdev-msi_list, list ) - printk(%d , msi-irq); -printk(\n); -} - -return 0; -} - static void dump_pci_devices(unsigned char ch) { printk( PCI devices \n); diff --git a/xen/drivers/passthrough/x86/Makefile b/xen/drivers/passthrough/x86/Makefile index a70cf94..a2bcf94 100644 --- a/xen/drivers/passthrough/x86/Makefile +++ b/xen/drivers/passthrough/x86/Makefile @@ -1,2 +1,3 @@ obj-y += ats.o obj-y += iommu.o +obj-$(HAS_PCI) += pci.o diff --git a/xen/drivers/passthrough/x86/pci.c b/xen/drivers/passthrough/x86/pci.c new file mode 100644 index 000..cf37b0a --- /dev/null +++ b/xen/drivers/passthrough/x86/pci.c @@ -0,0 +1,115 @@ +/* + * x86 specific code for PCI MSI + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + *
[Xen-devel] linux-next: build failure after merge of the xen-tip tree
Hi all, After merging the xen-tip tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/char/tpm/xen-tpmfront.c: In function 'setup_ring': drivers/char/tpm/xen-tpmfront.c:203:7: warning: passing argument 2 of 'xenbus_grant_ring' makes pointer from integer without a cast rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr)); ^ In file included from drivers/char/tpm/xen-tpmfront.c:17:0: include/xen/xenbus.h:206:5: note: expected 'void *' but argument is of type 'long unsigned int' int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr, ^ drivers/char/tpm/xen-tpmfront.c:203:7: error: too few arguments to function 'xenbus_grant_ring' rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr)); ^ In file included from drivers/char/tpm/xen-tpmfront.c:17:0: include/xen/xenbus.h:206:5: note: declared here int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr, ^ Caused by commit 1b1586eeeb8c (xenbus_client: Extend interface to support multi-page ring). I have used the xen-tip tree from next-20150410 for today. -- Cheers, Stephen Rothwells...@canb.auug.org.au pgpc23mvwiL3u.pgp Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Question. Inject virq to Domain on Xen ARM.
HI I have a question about Inject virq to Domain on Xen ARM. Function 'vgic_vcpu_inject_irq' is inject virq to target vcpu. At the end of vgic_vcpu_inject_irq, like below -- running = v-gt;is_running; vcpu_unblock(v); if ( running amp;amp; v != current ) smp_send_event_check_mask(cpumask_of(v-gt;processor)); -- In code if target vcpu is not current, send SGI to v-gt;processor; I think this function help inject virq to target vcpu immediately in Xen 4.5.0. In Xen 4.5.0, vcpu that receive SGI, will execute 'enter_hypervisor_head' function and move irq from lr_queue to linked register. But I think that in Xen 4.4.X above code is not help inject virq. because there is code to move irq from lr_queue to linked register in hyp handler routine. Thanks ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On 01.04.15 at 11:59, m...@redhat.com wrote: On Wed, Apr 01, 2015 at 10:41:12AM +0100, Andrew Cooper wrote: On 01/04/15 10:20, Stefano Stabellini wrote: CC'ing the author of the patch and xen-devel. FYI I think that Jan is going to be on vacation for a couple of weeks. On Wed, 1 Apr 2015, Michael S. Tsirkin wrote: On Tue, Mar 31, 2015 at 03:18:03PM +0100, Stefano Stabellini wrote: From: Jan Beulich jbeul...@suse.com Otherwise the guest can abuse that control to cause e.g. PCIe Unsupported Request responses (by disabling memory and/or I/O decoding and subsequently causing [CPU side] accesses to the respective address ranges), which (depending on system configuration) may be fatal to the host. This is CVE-2015-2756 / XSA-126. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com The patch description seems somewhat incorrect to me. UR should not be fatal to the system, and it's not platform specific. I think that people have been able to repro this, but I'll let Jan comment on it. Depending on how the BIOS sets up AER (if even available), a UR can very easily be fatal to the system. If firmware first mode is set, Xen (or indeed Linux) can't fix a problematic setup. Experimentally, doing so can cause infinite loops in certain vendors SMM handlers. I think it can, just disable UR reporting, this is up to OS. This is what the PCI spec says - you have snipped the relevant part out from the mail you are replying to. As already said by Andrew, the OS must not do such when the system is in APEI firmware first mode. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-4.3-testing test] 50394: regressions - FAIL
flight 50394 xen-4.3-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/50394/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-rumpuserxen 4 capture-logs!broken in 50358 [st=!broken!] build-i386-rumpuserxen4 capture-logs!broken in 50358 [st=!broken!] build-amd64-rumpuserxen 3 host-install(3) broken in 50358 REGR. vs. 36755 build-i386-rumpuserxen 3 host-install(3) broken in 50358 REGR. vs. 36755 test-amd64-i386-freebsd10-i386 10 guest-start fail REGR. vs. 36755 Tests which are failing intermittently (not blocking): test-armhf-armhf-xl-sedf 3 host-install(3) broken in 50358 pass in 50394 test-amd64-amd64-xl-qemuu-winxpsp3 15 guest-localmigrate/x10 fail in 50358 pass in 50394 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail pass in 50358 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-credit2 7 capture-logs(7) broken in 50358 blocked in 36755 test-armhf-armhf-xl-cubietruck 7 capture-logs(7) broken in 50358 blocked in 36755 test-armhf-armhf-libvirt7 capture-logs(7) broken in 50358 blocked in 36755 test-armhf-armhf-xl-multivcpu 7 capture-logs(7) broken in 50358 blocked in 36755 test-armhf-armhf-xl 7 capture-logs(7) broken in 50358 blocked in 36755 test-armhf-armhf-xl-sedf-pin 7 capture-logs(7) broken in 50358 blocked in 36755 test-armhf-armhf-xl-arndale 7 capture-logs(7) broken in 50358 blocked in 36755 test-amd64-i386-pair21 guest-migrate/src_host/dst_host fail like 36755 Tests which did not succeed, but are not blocking: test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-armhf-armhf-xl-sedf 4 capture-logs(4) broken in 50358 never pass test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 50358 never pass test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail never pass test-armhf-armhf-xl-credit2 6 xen-boot fail never pass test-armhf-armhf-xl-cubietruck 6 xen-boot fail never pass test-armhf-armhf-libvirt 6 xen-boot fail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 6 xen-boot fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xend-winxpsp3 20 leak-check/check fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass build-amd64-rumpuserxen 6 xen-buildfail never pass build-i386-rumpuserxen6 xen-buildfail never pass test-armhf-armhf-xl-sedf 6 xen-boot fail never pass test-armhf-armhf-xl 6 xen-boot fail never pass test-armhf-armhf-xl-sedf-pin 6 xen-boot fail never pass test-armhf-armhf-xl-arndale 6 xen-boot fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xend-qemut-winxpsp3 20 leak-check/checkfail never pass version targeted for testing: xen 46ed0083a76efa82713ea979b312fa69250380b2 baseline version: xen c58b16ef1572176cf2f6a424b527b5ed4bb73f17 People who touched revisions under test: Andrew Cooper andrew.coop...@citrix.com Ian Campbell ian.campb...@citrix.com Ian Jackson ian.jack...@eu.citrix.com Jan Beulich jbeul...@suse.com Konrad Rzeszutek Wilk konrad.w...@oracle.com jobs: build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass
Re: [Xen-devel] linux-next: build failure after merge of the xen-tip tree
Hi Stephen, On 04/13/2015 04:09 PM, Stephen Rothwell wrote: Hi all, After merging the xen-tip tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/char/tpm/xen-tpmfront.c: In function 'setup_ring': drivers/char/tpm/xen-tpmfront.c:203:7: warning: passing argument 2 of 'xenbus_grant_ring' makes pointer from integer without a cast rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr)); ^ In file included from drivers/char/tpm/xen-tpmfront.c:17:0: include/xen/xenbus.h:206:5: note: expected 'void *' but argument is of type 'long unsigned int' int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr, ^ drivers/char/tpm/xen-tpmfront.c:203:7: error: too few arguments to function 'xenbus_grant_ring' rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr)); ^ In file included from drivers/char/tpm/xen-tpmfront.c:17:0: include/xen/xenbus.h:206:5: note: declared here int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr, ^ Caused by commit 1b1586eeeb8c (xenbus_client: Extend interface to support multi-page ring). I have used the xen-tip tree from next-20150410 for today. Sorry for this issue, I missed the xentpm-front.c file in that patch. (Original patch from Wei Liu already included the right modification which didn't exist in Paul's.) Attached patch should fix this build failure. -- Regards, -Bob From 973eacee793595b9790957186ffd27f192f5dd4f Mon Sep 17 00:00:00 2001 From: Bob Liu bob@oracle.com Date: Mon, 13 Apr 2015 16:29:10 +0800 Subject: [PATCH] xen-tpmfront: fix build failure Fix build failure caused by commit 1b1586eeeb8c (xenbus_client: Extend interface to support multi-page ring), xen-tpmfront was missed in that commit. Signed-off-by: Bob Liu bob@oracle.com --- drivers/char/tpm/xen-tpmfront.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/char/tpm/xen-tpmfront.c b/drivers/char/tpm/xen-tpmfront.c index c3b4f5a..3111f27 100644 --- a/drivers/char/tpm/xen-tpmfront.c +++ b/drivers/char/tpm/xen-tpmfront.c @@ -193,6 +193,7 @@ static int setup_ring(struct xenbus_device *dev, struct tpm_private *priv) struct xenbus_transaction xbt; const char *message = NULL; int rv; + grant_ref_t gref; priv-shr = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO); if (!priv-shr) { @@ -200,11 +201,11 @@ static int setup_ring(struct xenbus_device *dev, struct tpm_private *priv) return -ENOMEM; } - rv = xenbus_grant_ring(dev, virt_to_mfn(priv-shr)); + rv = xenbus_grant_ring(dev, priv-shr, 1, gref); if (rv 0) return rv; - priv-ring_ref = rv; + priv-ring_ref = gref; rv = xenbus_alloc_evtchn(dev, priv-evtchn); if (rv) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] preparing for 4.5.1
On 30.03.15 at 15:15, roger@citrix.com wrote: El 26/03/15 a les 18.07, Jan Beulich ha escrit: All, having been released mid January, it is time to get ready for 4.5.1-rc1. Please reply with backport requests which you consider necessary but still missing. Please also be patient with them being pulled in - I'll be gone the coming two weeks, and I'd hope to get an RC1 put together soon thereafter. I would like to request the backport of the shared page tables fix between EPT and IOMMU for PVH: 797842 iommu: fix usage of shared EPT/IOMMU page tables on PVH Apart from you having failed to also request the fixup on top (as pointed out by Julien) you also didn't give a reason for the request. Aiui this is a PVH Dom0 fix only, and PVH Dom0 doesn't even come near to be supported in 4.5. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] x86/MSI-X: access MSI-X table only after having enabled MSI-X
On 10.04.15 at 22:02, konrad.w...@oracle.com wrote: On Wed, Mar 25, 2015 at 04:39:49PM +, Jan Beulich wrote: As done in Linux by f598282f51 (PCI: Fix the NIU MSI-X problem in a better way) and its broken predecessor, make sure we don't access the MSI-X table without having enabled MSI-X first, using the mask-all flag instead to prevent interrupts from occurring. This causes an regression with an Linux guest that has the XSA120 + XSA120 addendum with PV guests (hadn't tried yet HVM). You mentioning XSA-120 and its addendum - are these requirements for the problem to be seen? I admit I may have tested a PV guest only with an SR-IOV VF (and only a HVM guest also with an ordinary device), but I'd like to be clear about the validity of the connection. When PV guest requests an MSI-X, pciback gets: [ 122.778654] xen-pciback: :0a:00.0: enable MSI-X [ 122.778861] pciback :0a:00.0: xen map irq failed -6 for 1 domain [ 122.778887] xen_pciback: :0a:00.0: error enabling MSI-X for guest 1: err -6! Yeah, there were a few adjustments necessary to get similar issues under control for HVM guests - maybe there's a path left not covered by what's done for HVM guests. The device has the PCI_COMMAND enabled correctly: # lspci -s 0a:0.0 -vvv | head 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) Subsystem: Super Micro Computer Inc Device 10c9 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Attaching the 'xl dmesg', 'dmesg', and 'lspci' Right - MSI-X is still disabled if the lspci really matches the state at the time the failure occurred. I'm attaching the original patch with some debugging code left in, which I suppose would show where the problematic path is in case you can get to it before I would. Jan TODO: drop //temp-s As done in Linux by f598282f51 (PCI: Fix the NIU MSI-X problem in a better way) and its broken predecessor, make sure we don't access the MSI-X table without having enabled MSI-X first, using the mask-all flag instead to prevent interrupts from occurring. --- unstable.orig/xen/arch/x86/msi.c2015-02-10 08:19:17.0 +0100 +++ unstable/xen/arch/x86/msi.c 2015-02-10 09:05:12.0 +0100 @@ -142,6 +142,23 @@ static bool_t memory_decoded(const struc PCI_COMMAND_MEMORY); } +static bool_t msix_memory_decoded(const struct pci_dev *dev, unsigned int pos) +{ +u16 control = pci_conf_read16(dev-seg, dev-bus, + PCI_SLOT(dev-devfn), + PCI_FUNC(dev-devfn), + msix_control_reg(pos)); + +if ( !(control PCI_MSIX_FLAGS_ENABLE) ) +{//temp + static bool_t warned; + WARN_ON(!test_and_set_bool(warned)); +return 0; +} + +return memory_decoded(dev); +} + /* * MSI message composition */ @@ -219,7 +236,8 @@ static bool_t read_msi_msg(struct msi_de void __iomem *base; base = entry-mask_base; -if ( unlikely(!memory_decoded(entry-dev)) ) +if ( unlikely(!msix_memory_decoded(entry-dev, + entry-msi_attrib.pos)) ) return 0; msg-address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); msg-address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); @@ -285,7 +303,8 @@ static int write_msi_msg(struct msi_desc void __iomem *base; base = entry-mask_base; -if ( unlikely(!memory_decoded(entry-dev)) ) +if ( unlikely(!msix_memory_decoded(entry-dev, + entry-msi_attrib.pos)) ) return -ENXIO; writel(msg-address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); @@ -379,7 +398,7 @@ static bool_t msi_set_mask_bit(struct ir { struct msi_desc *entry = desc-msi_desc; struct pci_dev *pdev; -u16 seg; +u16 seg, control; u8 bus, slot, func; ASSERT(spin_is_locked(desc-lock)); @@ -401,35 +420,38 @@ static bool_t msi_set_mask_bit(struct ir } break; case PCI_CAP_ID_MSIX: +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos)); +if ( unlikely(!(control PCI_MSIX_FLAGS_ENABLE)) ) +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), + control | (PCI_MSIX_FLAGS_ENABLE | +PCI_MSIX_FLAGS_MASKALL)); if ( likely(memory_decoded(pdev)) ) { writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); -break; +if ( likely(control PCI_MSIX_FLAGS_ENABLE) ) +break; +flag = 1; } -if ( flag ) +
Re: [Xen-devel] [PATCH] xen: arm: X-Gene Storm check GIC DIST address for EOI quirk
On 8 April 2015 at 13:58, Christoffer Dall christoffer.d...@linaro.org wrote: Hi Pranav, On Mon, Apr 06, 2015 at 02:24:01PM +0530, Pranavkumar Sawargaonkar wrote: In old X-Gene Storm firmware and DT, secure mode addresses have been mentioned in GICv2 node. In this case maintenance interrupt is used instead of EOI HW method. This patch checks the GIC Distributor Base Address to enable EOI quirk for old firmware. Ref: http://lists.xen.org/archives/html/xen-devel/2014-07/msg01263.html Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org I have tested this on the m400 cluster and can confirm that it prevents the issue with ab trashing the machine: Tested-by: Christoffer Dall christoffer.d...@linaro.org Thanks Christoffer for cross checking. will sent out a new revision of the patch with comments addressed. Thanks, Pranav ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] PMU in xen enviroment
Hi,everyone We would like to obtain the distribution of CPU-specific hardware events such as clock cycles and cache and TLB misses and branch-miss and etc. in Xen environment . It is said that Xenoprof can be used to do so. We use Intel's Ivy Bridge Processor which is released in 2012, so I think it is better to use oprofile 0.9.8 or above. But http://xenoprof.sourceforge.net/ only released oprofile-0.9.5-xen.patch which is released in 2009. Does it mean no patch is needed since OProfile 0.9.6 ? is there a detailed document describing how to use oprofile in xen environment ? is there any other tools to do so ? Thanks a lot. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4
Monday, April 13, 2015, 11:50:51 AM, you wrote: On 13/04/15 10:39, Sander Eikelenboom wrote: Hi David, I seem to have spotted some trouble with a 4.0 dom0 kernel with the devel/for-linus-4.1 branch pulled on top. Does this remind you of any specific commits in the devel/for-linus-4.1 branch that could likely be involved that i could try to revert ? Yes. This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve performance of MMAPBATCH_V2) which makes the kernel try harder to map all GFNs instead of failing on the first one. I think this is qemu incorrectly trying to map GFNs. David Reverted that specific one, but still get those messages. -- Sander I now get a very large number of these at guest with passthrough start: (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ed type:4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. If we trust Dom0 to do the right thing, then I don't think there is a general issue to be solved. Dom0 can break the system at any time, I don't see any differences here, unless we have a plan to actually be able to handle a misbehaving dom0, in that case I am all for it. No, that gets us in the wrong direction. Dom0 can have legitimate reasons to have to clear memory or I/O decoding on a device at run time (even if current Linux doesn't do so). The more general problem we may need to solve is that of racing config space accesses (one by Dom0, the other by the hypervisor). But that's beyond this series' scope. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. Qemu is either part of Dom0, or doesn't play with devices directly. And how are we going to deal with older unfixed QEMUs? So far we have been using the same policy for QEMU and the Dom0 kernel: Xen doesn't break them -- old Linux kernels and QEMUs are supposed to just work. I'm not sure that's really true for qemu, or if it is, then only by pure luck: The tool stack interface of the hypervisor as well as the libxc interfaces are subject to change between any two releases. I view it as unavoidable to break older qemu here. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote: On 13.04.15 at 13:19, m...@redhat.com wrote: Yes Linux can't fix firmware 1st mode, but PCI express spec says what firmware should do in this case: IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices With 1.0a device Functions, 96 if the Unsupported Request Reporting Enable bit is set, the Function when operating as a Completer will send an uncorrectable error Message (if enabled) when a UR error is detected. On platforms where an uncorrectable error Message is handled as a System Error, this will break PC-compatible Configuration Space probing, so software/firmware on such platforms may need to avoid setting the Unsupported Request Reporting Enable bit. With device Functions implementing Role-Based Error Reporting, setting the Unsupported Request Reporting Enable bit will not interfere with PC-compatible Configuration Space probing, assuming that the severity for UR is left at its default of non-fatal. However, setting the Unsupported Request Reporting Enable bit will enable the Function to report UR errors detected with posted Requests, helping avoid this case for potential silent data corruption. On platforms where robust error handling and PC-compatible Configuration Space probing is required, it is suggested that software or firmware have the Unsupported Request Reporting Enable bit Set for Role-Based Error Reporting Functions, but clear for 1.0a Functions. Software or firmware can distinguish the two classes of Functions by examining the Role-Based Error Reporting bit in the Device Capabilities register. What I think you have is a very old 1.0a system, and you set Unsupported Request Reporting Enable. Can you confirm? No. In at least one of the two cases we got reports of the original problem, triggering the finding of this issue, this is a brand new one, only soon to become available publicly. Furthermore I'm being confused by the mention of PC-compatible config space probing above: The URs we talk about here don't result from config space accessed at all. OK. Can you please explain why does UR cause a system error then? It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't. You will have other problems if your firmware doesn't follow the spec. So how about either - Don't use firmware 1st mode with pci express (Seems no reason to do firmware 1st for PCIE, architecture is completely standard. I saw mentions of using combined/parallel mode, using AER for some devices but not others, but I don't know how this is supposed to be enabled. Any idea?) or - ask your vendor to update firmware if it doesn't do the right thing Both not very practical suggestions, based on experience. Jan Well using OS native mode is definitely practical, the question is how to detect the problematic configurations. There's always XSA-124 which says buggy hardware can cause security problems. -- MST ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL
On 10.04.15 at 19:03, ian.jack...@eu.citrix.com wrote: osstest service user writes ([xen-unstable test] 50291: regressions - FAIL): flight 50291 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/50291/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-freebsd10-i386 11 guest-localmigrate fail REGR. vs. 36514 test-amd64-i386-freebsd10-amd64 11 guest-localmigrate fail REGR. vs. 36514 I have done extensive testing of the network and found no network problems, infrastructure VM stalls, or anything, so I'm pretty sure these are not infrastructure problems but rather real bugs with the software under test. We believe this to be a bug in FreeBSD and are trying to update to FreeBSD 10.1 which may fix it. We think this is not anything to do with Xen. This bug seems to be exposed more in the new colo than the old test lab, for some reason. These were the only problems in this flight. Under the circumstances I think we would be justified in force pushing to xen.git#master: version targeted for testing: xen 3dca8a17f639f4b53cd2eeb687cec4e9f83acf3d Yes please. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4
Monday, April 13, 2015, 2:07:02 PM, you wrote: On 13/04/15 12:21, Sander Eikelenboom wrote: Monday, April 13, 2015, 11:50:51 AM, you wrote: On 13/04/15 10:39, Sander Eikelenboom wrote: Hi David, I seem to have spotted some trouble with a 4.0 dom0 kernel with the devel/for-linus-4.1 branch pulled on top. Does this remind you of any specific commits in the devel/for-linus-4.1 branch that could likely be involved that i could try to revert ? Yes. This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve performance of MMAPBATCH_V2) which makes the kernel try harder to map all GFNs instead of failing on the first one. I think this is qemu incorrectly trying to map GFNs. David Reverted that specific one, but still get those messages. You'll have to bisect it then. Because I don't see any other relevant commits in devel/for-linus-4.1 David Ok .. hmm first candidate of the bisect also looks interessting: [628c28eefd6f2cef03b212081b466ae43fd093a3] xen: unify foreign GFN map/unmap for auto-xlated physmap guests -- Sander ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support
On 01.04.15 at 15:21, feng...@intel.com wrote: Hi Jan, Any more comments about this series? Thanks a lot! I think it makes little sense to wait for my review before posting v2 with issues addressed which others have pointed out. I just returned from vacation, and would have time to look at v1 next week the earliest (with the week after being occupied by travel). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] xen/x86: Patch re-factors MSI/X config code from, drivers/passthrough/pci.c to x86 specific
On Monday 13 April 2015 03:45 PM, Stefano Stabellini wrote: On Mon, 13 Apr 2015, Manish Jaggi wrote: This patch re-factors msi specific code to x86 specific files from xen/drivers/passthrough/pci.c. Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com --- xen/drivers/passthrough/pci.c| 102 +- xen/drivers/passthrough/x86/Makefile |1 + xen/drivers/passthrough/x86/pci.c| 115 ++ xen/include/asm-x86/msi.h|1 - xen/include/xen/pci.h| 20 +- 5 files changed, 137 insertions(+), 102 deletions(-) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index ecd061e..004aba9 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -23,7 +23,6 @@ #include xen/iommu.h #include xen/irq.h #include asm/hvm/iommu.h -#include asm/hvm/irq.h #include xen/delay.h #include xen/keyhandler.h #include xen/event.h @@ -33,21 +32,6 @@ #include xen/softirq.h #include xen/tasklet.h #include xsm/xsm.h -#include asm/msi.h - -struct pci_seg { -struct list_head alldevs_list; -u16 nr; -unsigned long *ro_map; -/* bus2bridge_lock protects bus2bridge array */ -spinlock_t bus2bridge_lock; -#define MAX_BUSES 256 -struct { -u8 map; -u8 bus; -u8 devfn; -} bus2bridge[MAX_BUSES]; -}; spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED; static struct radix_tree_root pci_segments; @@ -282,22 +266,10 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) *((u8*) pdev-bus) = bus; *((u8*) pdev-devfn) = devfn; pdev-domain = NULL; -INIT_LIST_HEAD(pdev-msi_list); - -if ( pci_find_cap_offset(pseg-nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), - PCI_CAP_ID_MSIX) ) +if (!pci_alloc_msix (pdev, pseg, bus, devfn)) { -struct arch_msix *msix = xzalloc(struct arch_msix); - -if ( !msix ) -{ -xfree(pdev); -return NULL; -} -spin_lock_init(msix-table_lock); -pdev-msix = msix; +return NULL; } - From the look of it, this code doesn't seem x86 specific MSIX is only used for x86, dom0/U handles MSI/X in ARM. I think If ARM is not using MSI/X then code has to be in x86 specific file. list_add(pdev-alldevs_list, pseg-alldevs_list); /* update bus2bridge */ @@ -755,54 +727,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn) return ret; } -static int pci_clean_dpci_irq(struct domain *d, - struct hvm_pirq_dpci *pirq_dpci, void *arg) -{ -struct dev_intx_gsi_link *digl, *tmp; - -pirq_guest_unbind(d, dpci_pirq(pirq_dpci)); - -if ( pt_irq_need_timer(pirq_dpci-flags) ) -kill_timer(pirq_dpci-timer); - -list_for_each_entry_safe ( digl, tmp, pirq_dpci-digl_list, list ) -{ -list_del(digl-list); -xfree(digl); -} - -return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0; -} - -static int pci_clean_dpci_irqs(struct domain *d) -{ -struct hvm_irq_dpci *hvm_irq_dpci = NULL; - -if ( !iommu_enabled ) -return 0; - -if ( !is_hvm_domain(d) ) -return 0; - -spin_lock(d-event_lock); -hvm_irq_dpci = domain_get_irq_dpci(d); -if ( hvm_irq_dpci != NULL ) -{ -int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); - -if ( ret ) -{ -spin_unlock(d-event_lock); -return ret; -} - -d-arch.hvm_domain.irq.dpci = NULL; -free_hvm_irq_dpci(hvm_irq_dpci); -} -spin_unlock(d-event_lock); -return 0; -} ..but these two functions do int pci_release_devices(struct domain *d) { struct pci_dev *pdev; @@ -1186,28 +1110,6 @@ bool_t pcie_aer_get_firmware_first(const struct pci_dev *pdev) } #endif -static int _dump_pci_devices(struct pci_seg *pseg, void *arg) -{ -struct pci_dev *pdev; -struct msi_desc *msi; - -printk( segment %04x \n, pseg-nr); - -list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list ) -{ -printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs , - pseg-nr, pdev-bus, - PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn), - pdev-domain ? pdev-domain-domain_id : -1, - (pdev-node != NUMA_NO_NODE) ? pdev-node : -1); -list_for_each_entry ( msi, pdev-msi_list, list ) - printk(%d , msi-irq); -printk(\n); -} - -return 0; -} - static void dump_pci_devices(unsigned char ch) { printk( PCI devices \n); diff --git a/xen/drivers/passthrough/x86/Makefile b/xen/drivers/passthrough/x86/Makefile index a70cf94..a2bcf94 100644 --- a/xen/drivers/passthrough/x86/Makefile +++ b/xen/drivers/passthrough/x86/Makefile @@ -1,2 +1,3 @@ obj-y += ats.o obj-y += iommu.o +obj-$(HAS_PCI) += pci.o diff --git
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On Thu, Apr 9, 2015 at 5:36 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Thu, 9 Apr 2015, Eric Dumazet wrote: On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote: Hi all, I found a performance regression when running netperf -t TCP_MAERTS from an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the virtual machine are 30% slower than v3.18. Through bisection I found that the perf regression is caused by the prensence of the following commit in the guest kernel: commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 Author: Eric Dumazet eduma...@google.com Date: Sun Dec 7 12:22:18 2014 -0800 tcp: refine TSO autosizing [snip] This commit restored original TCP Small Queue behavior, which is the first step to fight bufferbloat. Some network drivers are known to be problematic because of a delayed TX completion. [snip] Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it makes a difference ? A very big difference: echo 262144 /proc/sys/net/ipv4/tcp_limit_output_bytes brings us much closer to the original performance, the slowdown is just 8% echo 1048576 /proc/sys/net/ipv4/tcp_limit_output_bytes fills the gap entirely, same performance as before refine TSO autosizing What would be the next step for here? Should I just document this as an important performance tweaking step for Xen, or is there something else we can do? Is the problem perhaps that netback/netfront delays TX completion? Would it be better to see if that can be addressed properly, so that the original purpose of the patch (fighting bufferbloat) can be achieved while not degrading performance for Xen? Or at least, so that people get decent perfomance out of the box without having to tweak TCP parameters? -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4
On 13/04/15 12:21, Sander Eikelenboom wrote: Monday, April 13, 2015, 11:50:51 AM, you wrote: On 13/04/15 10:39, Sander Eikelenboom wrote: Hi David, I seem to have spotted some trouble with a 4.0 dom0 kernel with the devel/for-linus-4.1 branch pulled on top. Does this remind you of any specific commits in the devel/for-linus-4.1 branch that could likely be involved that i could try to revert ? Yes. This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve performance of MMAPBATCH_V2) which makes the kernel try harder to map all GFNs instead of failing on the first one. I think this is qemu incorrectly trying to map GFNs. David Reverted that specific one, but still get those messages. You'll have to bisect it then. Because I don't see any other relevant commits in devel/for-linus-4.1 David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle
Hi Ian, On 09/04/15 18:04, Ian Jackson wrote: Julien Grall writes (Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle): On 09/04/15 17:52, Ian Jackson wrote: What would happen if our assumption about the DT compiler were violated ? The phandle would be present in 2 different nodes of the DT. FYI, that may also happen if a user use 2 times the same phandle in the partial DT. So we are making an assumption about the DT compiler which we don't check, and when the assumption is violated, we produce a corrupted DT which produces undefined behaviour in the guest. Can we do some kind of check ? As discussed IRL, I will add a note in the documentation about this restriction. This will be removed when the phandle will be generated dynamically. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI
Hi Manish, On 13/04/15 08:37, Manish Jaggi wrote: Xen currently does not have PCI support for ARM builds. This patch set makes the code compilable for ARM PCI and adds places-holder code which would be replaced with PCI pass-through support patch series. May I ask why you did send directly all the code to support PCI on ARM? Without the rest it's hard to tell whether these patches make sense or not. Re-factor MSI Handling - There is a some x86 specific code which is found in common code: xen/drivers/passthrough/pci.c which needs to be re factored. MSI/X are configured and handled by dom0 or domU code on ARM64 and is not required to be part of common code. However there are functions which are used as part of common code and calls to these functions cannot be easily re factored like pci_cleanup_msi. On x86, the hypervisor is taking care of MSI (enabling/disabling in the config space) as long as doing sanity check on MSI used by a given domain. How do you plan to handle it on ARM? IHMO, the MSI code is very useful and would also help GIC MSI implement (ITS + V2M). FWIW, I've pointed out the same issue on the ITS series a couple of weeks ago. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2] Re-factoring passthrough/pci.c and adding place-holder code for ARM/PCI
BTW, your series is not threaded. I've already said it on another series. Please look at [1] to see how to send correctly your series. [1] http://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches#Sending_the_patches_to_the_list Regards, On 13/04/15 11:19, Julien Grall wrote: Hi Manish, On 13/04/15 08:37, Manish Jaggi wrote: Xen currently does not have PCI support for ARM builds. This patch set makes the code compilable for ARM PCI and adds places-holder code which would be replaced with PCI pass-through support patch series. May I ask why you did send directly all the code to support PCI on ARM? Without the rest it's hard to tell whether these patches make sense or not. Re-factor MSI Handling - There is a some x86 specific code which is found in common code: xen/drivers/passthrough/pci.c which needs to be re factored. MSI/X are configured and handled by dom0 or domU code on ARM64 and is not required to be part of common code. However there are functions which are used as part of common code and calls to these functions cannot be easily re factored like pci_cleanup_msi. On x86, the hypervisor is taking care of MSI (enabling/disabling in the config space) as long as doing sanity check on MSI used by a given domain. How do you plan to handle it on ARM? IHMO, the MSI code is very useful and would also help GIC MSI implement (ITS + V2M). FWIW, I've pointed out the same issue on the ITS series a couple of weeks ago. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On 13.04.15 at 13:19, m...@redhat.com wrote: Yes Linux can't fix firmware 1st mode, but PCI express spec says what firmware should do in this case: IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices With 1.0a device Functions, 96 if the Unsupported Request Reporting Enable bit is set, the Function when operating as a Completer will send an uncorrectable error Message (if enabled) when a UR error is detected. On platforms where an uncorrectable error Message is handled as a System Error, this will break PC-compatible Configuration Space probing, so software/firmware on such platforms may need to avoid setting the Unsupported Request Reporting Enable bit. With device Functions implementing Role-Based Error Reporting, setting the Unsupported Request Reporting Enable bit will not interfere with PC-compatible Configuration Space probing, assuming that the severity for UR is left at its default of non-fatal. However, setting the Unsupported Request Reporting Enable bit will enable the Function to report UR errors detected with posted Requests, helping avoid this case for potential silent data corruption. On platforms where robust error handling and PC-compatible Configuration Space probing is required, it is suggested that software or firmware have the Unsupported Request Reporting Enable bit Set for Role-Based Error Reporting Functions, but clear for 1.0a Functions. Software or firmware can distinguish the two classes of Functions by examining the Role-Based Error Reporting bit in the Device Capabilities register. What I think you have is a very old 1.0a system, and you set Unsupported Request Reporting Enable. Can you confirm? No. In at least one of the two cases we got reports of the original problem, triggering the finding of this issue, this is a brand new one, only soon to become available publicly. Furthermore I'm being confused by the mention of PC-compatible config space probing above: The URs we talk about here don't result from config space accessed at all. You will have other problems if your firmware doesn't follow the spec. So how about either - Don't use firmware 1st mode with pci express (Seems no reason to do firmware 1st for PCIE, architecture is completely standard. I saw mentions of using combined/parallel mode, using AER for some devices but not others, but I don't know how this is supposed to be enabled. Any idea?) or - ask your vendor to update firmware if it doesn't do the right thing Both not very practical suggestions, based on experience. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] xen/x86: Patch re-factors MSI/X config code from, drivers/passthrough/pci.c to x86 specific
On Mon, 13 Apr 2015, Manish Jaggi wrote: This patch re-factors msi specific code to x86 specific files from xen/drivers/passthrough/pci.c. Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com --- xen/drivers/passthrough/pci.c| 102 +- xen/drivers/passthrough/x86/Makefile |1 + xen/drivers/passthrough/x86/pci.c| 115 ++ xen/include/asm-x86/msi.h|1 - xen/include/xen/pci.h| 20 +- 5 files changed, 137 insertions(+), 102 deletions(-) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index ecd061e..004aba9 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -23,7 +23,6 @@ #include xen/iommu.h #include xen/irq.h #include asm/hvm/iommu.h -#include asm/hvm/irq.h #include xen/delay.h #include xen/keyhandler.h #include xen/event.h @@ -33,21 +32,6 @@ #include xen/softirq.h #include xen/tasklet.h #include xsm/xsm.h -#include asm/msi.h - -struct pci_seg { -struct list_head alldevs_list; -u16 nr; -unsigned long *ro_map; -/* bus2bridge_lock protects bus2bridge array */ -spinlock_t bus2bridge_lock; -#define MAX_BUSES 256 -struct { -u8 map; -u8 bus; -u8 devfn; -} bus2bridge[MAX_BUSES]; -}; spinlock_t pcidevs_lock = SPIN_LOCK_UNLOCKED; static struct radix_tree_root pci_segments; @@ -282,22 +266,10 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) *((u8*) pdev-bus) = bus; *((u8*) pdev-devfn) = devfn; pdev-domain = NULL; -INIT_LIST_HEAD(pdev-msi_list); - -if ( pci_find_cap_offset(pseg-nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), - PCI_CAP_ID_MSIX) ) +if (!pci_alloc_msix (pdev, pseg, bus, devfn)) { -struct arch_msix *msix = xzalloc(struct arch_msix); - -if ( !msix ) -{ -xfree(pdev); -return NULL; -} -spin_lock_init(msix-table_lock); -pdev-msix = msix; +return NULL; } - From the look of it, this code doesn't seem x86 specific list_add(pdev-alldevs_list, pseg-alldevs_list); /* update bus2bridge */ @@ -755,54 +727,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn) return ret; } -static int pci_clean_dpci_irq(struct domain *d, - struct hvm_pirq_dpci *pirq_dpci, void *arg) -{ -struct dev_intx_gsi_link *digl, *tmp; - -pirq_guest_unbind(d, dpci_pirq(pirq_dpci)); - -if ( pt_irq_need_timer(pirq_dpci-flags) ) -kill_timer(pirq_dpci-timer); - -list_for_each_entry_safe ( digl, tmp, pirq_dpci-digl_list, list ) -{ -list_del(digl-list); -xfree(digl); -} - -return pt_pirq_softirq_active(pirq_dpci) ? -ERESTART : 0; -} - -static int pci_clean_dpci_irqs(struct domain *d) -{ -struct hvm_irq_dpci *hvm_irq_dpci = NULL; - -if ( !iommu_enabled ) -return 0; - -if ( !is_hvm_domain(d) ) -return 0; - -spin_lock(d-event_lock); -hvm_irq_dpci = domain_get_irq_dpci(d); -if ( hvm_irq_dpci != NULL ) -{ -int ret = pt_pirq_iterate(d, pci_clean_dpci_irq, NULL); - -if ( ret ) -{ -spin_unlock(d-event_lock); -return ret; -} - -d-arch.hvm_domain.irq.dpci = NULL; -free_hvm_irq_dpci(hvm_irq_dpci); -} -spin_unlock(d-event_lock); -return 0; -} ..but these two functions do int pci_release_devices(struct domain *d) { struct pci_dev *pdev; @@ -1186,28 +1110,6 @@ bool_t pcie_aer_get_firmware_first(const struct pci_dev *pdev) } #endif -static int _dump_pci_devices(struct pci_seg *pseg, void *arg) -{ -struct pci_dev *pdev; -struct msi_desc *msi; - -printk( segment %04x \n, pseg-nr); - -list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list ) -{ -printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs , - pseg-nr, pdev-bus, - PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn), - pdev-domain ? pdev-domain-domain_id : -1, - (pdev-node != NUMA_NO_NODE) ? pdev-node : -1); -list_for_each_entry ( msi, pdev-msi_list, list ) - printk(%d , msi-irq); -printk(\n); -} - -return 0; -} - static void dump_pci_devices(unsigned char ch) { printk( PCI devices \n); diff --git a/xen/drivers/passthrough/x86/Makefile b/xen/drivers/passthrough/x86/Makefile index a70cf94..a2bcf94 100644 --- a/xen/drivers/passthrough/x86/Makefile +++ b/xen/drivers/passthrough/x86/Makefile @@ -1,2 +1,3 @@ obj-y += ats.o obj-y += iommu.o +obj-$(HAS_PCI) += pci.o diff --git a/xen/drivers/passthrough/x86/pci.c b/xen/drivers/passthrough/x86/pci.c new file mode 100644
Re: [Xen-devel] [PATCH v3 3/8] x86: detect and initialize Intel CAT feature
On 13/04/15 11:51, Jan Beulich wrote: On 27.03.15 at 19:31, andrew.coop...@citrix.com wrote: On 26/03/15 12:38, Chao Peng wrote: +cpuid_count(PSR_CPUID_LEVEL_CAT, 0, eax, ebx, ecx, edx); +if ( ebx PSR_RESOURCE_TYPE_L3 ) +{ +cpuid_count(PSR_CPUID_LEVEL_CAT, 1, eax, ebx, ecx, edx); +info-cbm_len = (eax 0x1f) + 1; +info-cos_max = (edx 0x); + +info-enabled = 1; +printk(XENLOG_DEBUG CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n, + socket, info-cos_max, info-cbm_len); I would bump this to INFO, at least for socket 0. Similar to CMT, a user which has gone to the effort of enabling CAT will want some indication in the boot log without also having to bump the logging level. Which INFO wouldn't be sufficient for (at least not with the xen.org defaults, not sure about XenServer's). Huh - you are right. XenServer doesn't patch this, but I spend most of my time using debug builds so I tend not to notice. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On Mon, Apr 13, 2015 at 09:17:04AM +0100, Jan Beulich wrote: On 01.04.15 at 11:59, m...@redhat.com wrote: On Wed, Apr 01, 2015 at 10:41:12AM +0100, Andrew Cooper wrote: On 01/04/15 10:20, Stefano Stabellini wrote: CC'ing the author of the patch and xen-devel. FYI I think that Jan is going to be on vacation for a couple of weeks. On Wed, 1 Apr 2015, Michael S. Tsirkin wrote: On Tue, Mar 31, 2015 at 03:18:03PM +0100, Stefano Stabellini wrote: From: Jan Beulich jbeul...@suse.com Otherwise the guest can abuse that control to cause e.g. PCIe Unsupported Request responses (by disabling memory and/or I/O decoding and subsequently causing [CPU side] accesses to the respective address ranges), which (depending on system configuration) may be fatal to the host. This is CVE-2015-2756 / XSA-126. Signed-off-by: Jan Beulich jbeul...@suse.com Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com The patch description seems somewhat incorrect to me. UR should not be fatal to the system, and it's not platform specific. I think that people have been able to repro this, but I'll let Jan comment on it. Depending on how the BIOS sets up AER (if even available), a UR can very easily be fatal to the system. If firmware first mode is set, Xen (or indeed Linux) can't fix a problematic setup. Experimentally, doing so can cause infinite loops in certain vendors SMM handlers. I think it can, just disable UR reporting, this is up to OS. This is what the PCI spec says - you have snipped the relevant part out from the mail you are replying to. As already said by Andrew, the OS must not do such when the system is in APEI firmware first mode. Jan Yes Linux can't fix firmware 1st mode, but PCI express spec says what firmware should do in this case: IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices With 1.0a device Functions, 96 if the Unsupported Request Reporting Enable bit is set, the Function when operating as a Completer will send an uncorrectable error Message (if enabled) when a UR error is detected. On platforms where an uncorrectable error Message is handled as a System Error, this will break PC-compatible Configuration Space probing, so software/firmware on such platforms may need to avoid setting the Unsupported Request Reporting Enable bit. With device Functions implementing Role-Based Error Reporting, setting the Unsupported Request Reporting Enable bit will not interfere with PC-compatible Configuration Space probing, assuming that the severity for UR is left at its default of non-fatal. However, setting the Unsupported Request Reporting Enable bit will enable the Function to report UR errors detected with posted Requests, helping avoid this case for potential silent data corruption. On platforms where robust error handling and PC-compatible Configuration Space probing is required, it is suggested that software or firmware have the Unsupported Request Reporting Enable bit Set for Role-Based Error Reporting Functions, but clear for 1.0a Functions. Software or firmware can distinguish the two classes of Functions by examining the Role-Based Error Reporting bit in the Device Capabilities register. What I think you have is a very old 1.0a system, and you set Unsupported Request Reporting Enable. Can you confirm? If you have access to the system in question, I can provide a test script to detect this. You will have other problems if your firmware doesn't follow the spec. So how about either - Don't use firmware 1st mode with pci express (Seems no reason to do firmware 1st for PCIE, architecture is completely standard. I saw mentions of using combined/parallel mode, using AER for some devices but not others, but I don't know how this is supposed to be enabled. Any idea?) or - ask your vendor to update firmware if it doesn't do the right thing -- MST ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4
On 13/04/15 10:39, Sander Eikelenboom wrote: Hi David, I seem to have spotted some trouble with a 4.0 dom0 kernel with the devel/for-linus-4.1 branch pulled on top. Does this remind you of any specific commits in the devel/for-linus-4.1 branch that could likely be involved that i could try to revert ? Yes. This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve performance of MMAPBATCH_V2) which makes the kernel try harder to map all GFNs instead of failing on the first one. I think this is qemu incorrectly trying to map GFNs. David I now get a very large number of these at guest with passthrough start: (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ed type:4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/8] Add core.sh and wrapper function
On 04/10/2015 03:29 PM, Stefano Stabellini wrote: diff --git a/raise b/raise new file mode 100755 index 000..7f3faae --- /dev/null +++ b/raise @@ -0,0 +1,16 @@ +#!/usr/bin/env bash It is important to set -e immediately to catch all possible errors: -e Exit immediately if a command exits with a non-zero status. Then you can drop set -e from raise.sh and unraise.sh when you turn them into libraries. set -e gets tricky when you sometimes expect commands to fail; particularly if you may want to do some sort of clean-up. But let's cross that bridge when we come to it. :-) -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code
On Monday 13 April 2015 03:44 PM, Stefano Stabellini wrote: On Mon, 13 Apr 2015, Manish Jaggi wrote: Add ARM PCI Support --- a) Place holder functions are added for pci_conf_read/write calls. b) Macros dev_is_pci, pci_to_dev are implemented in drivers/passthrough/pci/arm code Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com --- xen/arch/arm/Makefile|1 + xen/arch/arm/pci.c | 60 +++ xen/drivers/passthrough/arm/Makefile |1 + xen/drivers/passthrough/arm/pci.c| 88 ++ xen/drivers/passthrough/arm/smmu.c |1 - xen/drivers/passthrough/pci.c|9 ++-- xen/include/asm-arm/device.h | 33 + xen/include/asm-arm/domain.h |3 ++ xen/include/asm-arm/pci.h|7 +-- 9 files changed, 186 insertions(+), 17 deletions(-) diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 935999e..aaf5d88 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -39,6 +39,7 @@ obj-y += device.o obj-y += decode.o obj-y += processor.o obj-y += smc.o +obj-$(HAS_PCI) += pci.o #obj-bin-y += o diff --git a/xen/arch/arm/pci.c b/xen/arch/arm/pci.c new file mode 100644 index 000..9438f41 --- /dev/null +++ b/xen/arch/arm/pci.c @@ -0,0 +1,60 @@ +#include xen/pci.h + +void _pci_conf_write( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint32_t data, int bytes) +{ +} + +uint32_t _pci_conf_read( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, int bytes) +{ +return 0; +} I guess that they are going to be implemented with platform specific code? Although I thought that the mmcfg stuff is somewhat standard across architectures. yes. _pci_conf_read/write will call pcihb_conf_read/write (pcihb = pci_host_bridge). ref: http://lists.xen.org/archives/html/xen-devel/2015-02/msg02717.html +uint8_t pci_conf_read8( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 1); +} + +uint16_t pci_conf_read16( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 2); +} + +uint32_t pci_conf_read32( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 4); +} + +void pci_conf_write8( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint8_t data) +{ + _pci_conf_write(seg, bus, dev, func, reg, data, 1); +} + +void pci_conf_write16( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint16_t data) +{ + _pci_conf_write(seg, bus, dev, func, reg, data, 2); +} + +void pci_conf_write32( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint32_t data) +{ + _pci_conf_write(seg, bus, dev, func, reg, data, 4); +} + +void arch_pci_ro_device(int seg, int bdf) +{ +} This is also missing an implementation. Maybe you should add /* TODO */ here too ok. diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile index f4cd26e..1a41549 100644 --- a/xen/drivers/passthrough/arm/Makefile +++ b/xen/drivers/passthrough/arm/Makefile @@ -1,2 +1,3 @@ obj-y += iommu.o obj-y += smmu.o +obj-$(HAS_PCI) += pci.o diff --git a/xen/drivers/passthrough/arm/pci.c b/xen/drivers/passthrough/arm/pci.c new file mode 100644 index 000..07a5a78 --- /dev/null +++ b/xen/drivers/passthrough/arm/pci.c @@ -0,0 +1,88 @@ +/* + * PCI helper functions for arm for passthrough support. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) 2015 Cavium Networks + * + * Author: Manish Jaggi manish.ja...@cavium.com + */ +#include xen/pci.h +#include asm/device.h +#include xen/sched.h + +int _dump_pci_devices(struct pci_seg *pseg, void *arg) +{ +struct pci_dev *pdev; + +printk( segment %04x \n, pseg-nr); + +list_for_each_entry ( pdev, pseg-alldevs_list, alldevs_list ) +{ +printk(%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs , + pseg-nr, pdev-bus, + PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn), +
Re: [Xen-devel] [PATCH 1/8] Add core.sh and wrapper function
On 04/10/2015 03:29 PM, Stefano Stabellini wrote: +arg_parse_cmd=\ +local -a args; +local _a; +local _vn; +local _m; + +_m=true; + +for _a in \\$@\ ; do +false echo \Evaluating \${_a} [[ \\${_a/=}\ = \\${_a}\ ]]\; +if \$_m [[ \\${_a/=}\ != \\${_a}\ ]] ; then +false echo Parameter; +_vn=\${_a%%=*}; +eval \local \$_vn\; +eval \\$_a\; +elif \$_m [[ \\${_a}\ == \--\ ]] ; then +false echo Separator; +_m=false; +else +false echo Argument; +_m=false; +args+=(\\$_a\); +fi; +done I am sorry but I cannot bear myself to introduce so much complexity into the world. raisin is supposed to be easy to read. If we end up actually needing something so powerful and so complex in the future we can import it then, but I think that at the moment we can do nicely without it. I can certainly understand the sentiment, but the idea behind this calling convention is actually to make it *easier* both to read and debug most of the source file, at the cost of two very complex, but well-tested and generic macros ($arg_parse and $requireargs). Consider some criteria: - Difficulty of writing or reading code that calls a function - Difficulty of writing a function which will be called - Availability of sensible defaults - Ability to make sure all necessary arguments have been specified - Passing function arguments into sub-functions - Global variables are nasty to debug Consider alternate ways of passing arguments into functions: 1. Positional i.e., cp SRC DEST In this model, it's easy to make sure that all necessary arguments have been specified; and there are no nasty global variables to work with. However is really hard to program and read because you have to remember what argument goes where. You also have to specify all possible arguments -- you can't have a default. And if you need to take an argument and pass it to another function you call, you have to repeat it again. 2. Option i.e., -v --components='a b' This is easier to read if you use long options (or common ones like -v or -y), and you can also have defaults. Additionally, you can have a certain amount of error checking -- if you get an argument you don't recognize you can throw an error. But you have to write a lot of boilerplate option processing at the top of each function; and if you have anything a bit complex (line --components= above), it makes it a lot harder to read. And you have to write new code every time, giving you more opportunity for bugs. Consider the argument parsing code you have in raise at the moment -- it assumes that you will run a command without any additional parameters. What if you wanted to do something like this: raise -v build xen The current code would have to be refactored. And you have to write a lot of custom code to check to make sure that all the required arguments have been passed in. And you have the same problem that if you need to pass an argument further down the call stack, you have to repeat it again. 3. Global variables i.e., VAR=foo function This is a bit easier to read in the sense that you're using full variable names for options. It's easier to write, since you don't have any parsing code. Additionally, variables can be passed further down the calling stack automatically without having to be explicitly copied around. In theory you should be able to have sensible local defaults. But you have all the problems with global variables being really hard to debug. 4. Prefix variable i.e., YES=y check-builddep This is a bit easier to read in the sense that you're using full variable names for options. It's easier to write, since you don't have any parsing code. Additionally, variables can be passed further down the calling stack automatically without having to be explicitly copied around. In theory you should be able to have sensible local defaults. I hadn't thought about this as a calling convention. It certainly has the advantage that it's build into bash and fairly well-understood by experienced users. But I do think it's a bit unnatural to have to prefix all the variables to the function. 5. VAR=VAL automatic parsing with inheritance This allows you to have a rich, consistent way to pass in arguments that are descriptive. It's easy to write code to parse the arguments and check to make sure you have all the ones you need: Just add $arg_parse and $requireargs at the top of your function. They are automatically declared local, so there's no worry about polluting the global namespace. They are automatically inherited, so you don't have to keep passing information further down the stack. And you can have sensible defaults. I think #5 is well worth the little bit of macro hackery confined to a handful of well-tested macros, particularly if raise becomes a fairly large and fully-functional script. I've been using this calling convention for several years now in my
Re: [Xen-devel] [PATCH 2/2] xen/arm: Make HAS_PCI compilable on ARM by adding place-holder code
Hi Manish, On 13/04/15 08:48, Manish Jaggi wrote: Add ARM PCI Support --- a) Place holder functions are added for pci_conf_read/write calls. b) Macros dev_is_pci, pci_to_dev are implemented in drivers/passthrough/pci/arm code Signed-off-by: Manish Jaggi manish.ja...@caviumnetworks.com --- xen/arch/arm/Makefile|1 + xen/arch/arm/pci.c | 60 +++ xen/drivers/passthrough/arm/Makefile |1 + xen/drivers/passthrough/arm/pci.c| 88 ++ xen/drivers/passthrough/arm/smmu.c |1 - xen/drivers/passthrough/pci.c|9 ++-- xen/include/asm-arm/device.h | 33 + xen/include/asm-arm/domain.h |3 ++ xen/include/asm-arm/pci.h|7 +-- 9 files changed, 186 insertions(+), 17 deletions(-) diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 935999e..aaf5d88 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -39,6 +39,7 @@ obj-y += device.o obj-y += decode.o obj-y += processor.o obj-y += smc.o +obj-$(HAS_PCI) += pci.o #obj-bin-y += o diff --git a/xen/arch/arm/pci.c b/xen/arch/arm/pci.c new file mode 100644 index 000..9438f41 --- /dev/null +++ b/xen/arch/arm/pci.c @@ -0,0 +1,60 @@ +#include xen/pci.h + +void _pci_conf_write( The _ is not necessary here. Please name the function pci_conf_write. +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, uint32_t data, int bytes) unsigned int bytes? +{ +} + +uint32_t _pci_conf_read( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg, int bytes) +{ +return 0; +} Same here. + +uint8_t pci_conf_read8( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 1); +} + +uint16_t pci_conf_read16( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 2); Wrong cast. +} + +uint32_t pci_conf_read32( +uint32_t seg, uint32_t bus, uint32_t dev, uint32_t func, +uint32_t reg) +{ +return (uint8_t)_pci_conf_read(seg, bus, dev, func, reg, 4); Wrong cast. [..] diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 004aba9..68c292b 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1252,9 +1252,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn) /* Prevent device assign if mem paging or mem sharing have been * enabled for this domain */ if ( unlikely(!need_iommu(d) -(d-arch.hvm_domain.mem_sharing_enabled || - d-mem_event-paging.ring_page || - p2m_get_hostp2m(d)-global_logdirty)) ) +(d-mem_event-paging.ring_page +#ifdef CONFIG_X86 + || d-arch.hvm_domain.mem_sharing_enabled || + p2m_get_hostp2m(d)-global_logdirty +#endif +)) ) Please avoid #ifdef CONFIG_* and introduce an arch macro. return -EXDEV; if ( !spin_trylock(pcidevs_lock) ) diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h index a72f7c9..009ff0a 100644 --- a/xen/include/asm-arm/device.h +++ b/xen/include/asm-arm/device.h @@ -6,6 +6,17 @@ enum device_type { DEV_DT, +DEV_ENUMERATED, +}; + +enum device_class +{ +DEVICE_SERIAL, +DEVICE_IOMMU, +DEVICE_GIC, +DEVICE_PCI, +/* Use for error */ +DEVICE_UNKNOWN, }; Why? It will be very unlikely that we have to create a structure device for the IOMMU, GIC and SERIAL. It would make more sense to add a DEV_PCI directly to device_type. struct dev_archdata { @@ -16,28 +27,30 @@ struct dev_archdata { struct device { enum device_type type; +enum device_class class; #ifdef HAS_DEVICE_TREE struct dt_device_node *of_node; /* Used by drivers imported from Linux */ #endif struct dev_archdata archdata; +#ifdef HAS_PCI +void *pci_dev; This field is not necessary. I've added one for the DT for keeping compatibility with Linux. +#endif [..] +int dev_is_pci(device_t *dev); This could easily be a macro. struct device_desc { /* Device name */ diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 9e0419e..41ae847 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -42,6 +42,8 @@ struct vtimer { uint64_t cval; }; +#define has_arch_pdevs(d)(!list_empty((d)-arch.pdev_list)) + struct arch_domain { #ifdef CONFIG_ARM_64 @@ -56,6 +58,7 @@ struct arch_domain xen_pfn_t *grant_table_gpfn; struct io_handler io_handlers; +struct list_head pdev_list; /* Continuable domain_relinquish_resources(). */ enum {
[Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4
Hi David, I seem to have spotted some trouble with a 4.0 dom0 kernel with the devel/for-linus-4.1 branch pulled on top. Does this remind you of any specific commits in the devel/for-linus-4.1 branch that could likely be involved that i could try to revert ? -- Sander I now get a very large number of these at guest with passthrough start: (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ed type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ee type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ef type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f0 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f1 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f2 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f3 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f4 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f5 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f6 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f7 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f8 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001f9 type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001fa type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001fb type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001fc type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001fd type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001fe type:4 (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ff type:4 (XEN) [2015-04-12 14:55:20.226] memory_map: error -22 removing dom1 access to [fe200,fe3ff] But the guest starts after that .. and the device seems to work fine: (d1) [2015-04-12 14:55:20.230] pci dev 05:0 INTA-IRQ10 (d1) [2015-04-12 14:55:20.246] No RAM in high memory; setting high_mem resource base to 1 (d1) [2015-04-12 14:55:20.247] pci dev 03:0 bar 10 size 00200: 0f008 (d1) [2015-04-12 14:55:20.248] pci dev 02:0 bar 14 size 00100: 0f208 (d1) [2015-04-12 14:55:20.250] pci dev 05:0 bar 10 size 00020: 0f304 (XEN) [2015-04-12 14:55:20.250] memory_map:add: dom1 gfn=f3000 mfn=fe200 nr=200 (d1) [2015-04-12 14:55:20.255] pci dev 04:0 bar 30 size 4: 0f320 (d1) [2015-04-12 14:55:20.257] pci dev 04:0 bar 10 size 2: 0f324 (d1) [2015-04-12 14:55:20.257] pci dev 03:0 bar 30 size 1: 0f326 (d1) [2015-04-12 14:55:20.259] pci dev 03:0 bar 14 size 01000: 0f327 (d1) [2015-04-12 14:55:20.260] pci dev 02:0 bar 10 size 00100: 0c001 (d1) [2015-04-12 14:55:20.262] pci dev 04:0 bar 14 size 00040: 0c101 (d1) [2015-04-12 14:55:20.263] pci dev 01:1 bar 20 size 00010: 0c141 (d1) [2015-04-12 14:55:20.266] Multiprocessor initialisation: (d1) [2015-04-12 14:55:20.286] - CPU0 ... 48-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. (d1) [2015-04-12 14:55:20.307] - CPU1 ... 48-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. (d1) [2015-04-12 14:55:20.333] - CPU2 ... 48-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. (d1) [2015-04-12 14:55:20.359] - CPU3 ... 48-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done. (d1) [2015-04-12 14:55:20.359] Testing HVM environment: (d1) [2015-04-12 14:55:20.375] - REP INSB across page boundaries ... passed (d1) [2015-04-12 14:55:20.388] - GS base MSRs and SWAPGS ... passed (d1) [2015-04-12 14:55:20.388] Passed 2 of 2 tests (d1) [2015-04-12 14:55:20.388] Writing SMBIOS tables ... (d1) [2015-04-12 14:55:20.389] Loading SeaBIOS ... (d1) [2015-04-12 14:55:20.389] Creating MP tables ... (d1) [2015-04-12 14:55:20.389] Loading ACPI ... (d1) [2015-04-12 14:55:20.390] vm86 TSS at fc00a200 (d1) [2015-04-12 14:55:20.391] BIOS map: (d1) [2015-04-12 14:55:20.391] 1-100d3: Scratch space (d1) [2015-04-12 14:55:20.391] c-f: Main BIOS (d1) [2015-04-12 14:55:20.391] E820 table: (d1) [2015-04-12 14:55:20.391] [00]: : - :000a: RAM (d1) [2015-04-12 14:55:20.391] HOLE: :000a - :000c (d1) [2015-04-12 14:55:20.391] [01]: :000c - :0010: RESERVED (d1) [2015-04-12 14:55:20.391] [02]: :0010 - :3f80: RAM (d1) [2015-04-12 14:55:20.391] HOLE: :3f80 - :fc00 (d1) [2015-04-12 14:55:20.391] [03]: :fc00 - 0001:: RESERVED (d1) [2015-04-12 14:55:20.391] Invoking SeaBIOS ... (d1)
Re: [Xen-devel] Outstanding fixes for Linux 4.1?
On 02/04/15 20:22, Konrad Rzeszutek Wilk wrote: On Thu, Apr 02, 2015 at 05:21:50PM +0100, David Vrabel wrote: All, Are there any outstanding patches that should be added to Linux 4.1 (the merge window for which will be opening shortly)? Bob's two patches to the deferred grant mechanism. These need acking by a blkback maintainer. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On Mon, 13 Apr 2015, Jan Beulich wrote: On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. If we trust Dom0 to do the right thing, then I don't think there is a general issue to be solved. Dom0 can break the system at any time, I don't see any differences here, unless we have a plan to actually be able to handle a misbehaving dom0, in that case I am all for it. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. In any case we should make it clear somewhere who is supposed to write to the command register (and other PCI reigsters) at any given time, otherwise it would be very easy for a new kernel update to break the hypervisor and we wouldn't even know why it happened. We should, but at this point in time this is going to be rather problematic. Such a separation of responsibilities should have been done before all the pass-through code got written. That is true. @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_ } break; case PCI_CAP_ID_MSIX: -{ -int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET; -writel(flag, entry-mask_base + offset); -readl(entry-mask_base + offset); -break; -} +if ( likely(memory_decoded(pdev)) ) +{ +writel(flag, entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +readl(entry-mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); +break; +} +if ( flag ) +{ +u16 control; +domid_t domid = pdev-domain-domain_id; + +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos)); +if ( control PCI_MSIX_FLAGS_MASKALL ) +break; +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry-msi_attrib.pos), + control | PCI_MSIX_FLAGS_MASKALL); How is that going to interact with Linux (Dom0) writing to the command register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for devices assigned to HVM guests. Could this cause any conflicts? I don't see the relation to the comment register here - all quoted code only deals with the MSI-X control register. I meant control register, sorry for the confusion. One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but as a matter of fact QEMU has done that for years and we cannot break that behaviour without introducing regressions. In fact as it stands QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM guests, not Xen. Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed through devices to running domains? I think that might be a good enough separation of responsibilities between Xen and QEMU. No, the bits has to be under hypervisor control (just like any other hardware bits controlling interrupts). Maybe they should be under hypervisor control, but keep in mind that they are not and they haven't been for years now. We cannot go ahead and break existing code paths and use cases. As it stands, QEMU is in charge of PCI_MSIX_FLAGS_MASKALL for passthrough devices and this is a regression. Dealing with this is the subject of patches I have ready, but didn't post yet. OK, that is very good. But if we really want to make this change, then I think that those patches would need to be part of this series so that they can be committed
Re: [Xen-devel] [PATCH v3 3/8] x86: detect and initialize Intel CAT feature
On 27.03.15 at 19:31, andrew.coop...@citrix.com wrote: On 26/03/15 12:38, Chao Peng wrote: +cpuid_count(PSR_CPUID_LEVEL_CAT, 0, eax, ebx, ecx, edx); +if ( ebx PSR_RESOURCE_TYPE_L3 ) +{ +cpuid_count(PSR_CPUID_LEVEL_CAT, 1, eax, ebx, ecx, edx); +info-cbm_len = (eax 0x1f) + 1; +info-cos_max = (edx 0x); + +info-enabled = 1; +printk(XENLOG_DEBUG CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n, + socket, info-cos_max, info-cbm_len); I would bump this to INFO, at least for socket 0. Similar to CMT, a user which has gone to the effort of enabling CAT will want some indication in the boot log without also having to bump the logging level. Which INFO wouldn't be sufficient for (at least not with the xen.org defaults, not sure about XenServer's). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC][PATCH 10/13] tools: extend XENMEM_set_memory_map
On Mon, Apr 13, 2015 at 10:09:51AM +0800, Chen, Tiejun wrote: [...] Hardcoded value? Yes. Actually, we intend to use this to present that lowmem entry, tools/firmware/hvmloader/e820.c: /* Low RAM goes here. Reserve space for special pages. */ ... e820[nr].addr = 0x10; I don't like the idea of having two hardcoded values in different locations. Please put this value into a header file and reference it here and in hvmloader. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On Mon, 13 Apr 2015, Jan Beulich wrote: On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: On 02.04.15 at 18:49, stefano.stabell...@eu.citrix.com wrote: On Wed, 25 Mar 2015, Jan Beulich wrote: When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. I don't think that it is a good idea for both Xen and Linux to access the command register simultaneously. Working around Linux in Xen doesn't sound like an optimal solution. Maybe we could just fix the pciback and that would be enough. I'm afraid that would just eliminate the specific case, but not the general issue. If we trust Dom0 to do the right thing, then I don't think there is a general issue to be solved. Dom0 can break the system at any time, I don't see any differences here, unless we have a plan to actually be able to handle a misbehaving dom0, in that case I am all for it. No, that gets us in the wrong direction. Dom0 can have legitimate reasons to have to clear memory or I/O decoding on a device at run time (even if current Linux doesn't do so). The more general problem we may need to solve is that of racing config space accesses (one by Dom0, the other by the hypervisor). But that's beyond this series' scope. This is why I was asking for a document that describes who is in charge of what and when. I don't think we can move forward without it. While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. Qemu is either part of Dom0, or doesn't play with devices directly. I don't understand the point you are trying to make here. If your intention is to point out that QEMU shouldn't be writing to the control register, as I wrote earlier, the current codebase disagrees with you and has been that way for years. And how are we going to deal with older unfixed QEMUs? So far we have been using the same policy for QEMU and the Dom0 kernel: Xen doesn't break them -- old Linux kernels and QEMUs are supposed to just work. I'm not sure that's really true for qemu, or if it is, then only by pure luck: This is false, it is so by design. QEMU has an internal libxc compatibility layer. The tool stack interface of the hypervisor as well as the libxc interfaces are subject to change between any two releases. QEMU knows how to cope with libxc interface changes. I view it as unavoidable to break older qemu here. I disagree and I am opposed to any patches series that would deliberately break QEMU. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] x86/efi: Reserve SMBIOS table region when EFI booting
On 02.04.15 at 10:24, ross.lagerw...@citrix.com wrote: Some EFI firmware implementations may place the SMBIOS table in RAM marked as BootServicesData, which Xen does not consider as reserved. When dom0 tries to access the SMBIOS, the region is not contained in the initial P2M and it crashes with a page fault. To fix this, reserve the SMBIOS region. Also, fix the memcmp check for existence of the SMBIOS As Andrew said, put the memcmp() fixes all in a separate patch, or fold the one remaining one in here. and the DMI checksum calculation. This I don't agree with, and even less so without explanation. The spec says 15h Intermediate Checksum BYTE Checksum of Intermediate Entry Point Structure (IEPS). This value, when added to all other bytes in the IEPS, results in the value 00h (using 8-bit addition calculations). Values in the IEPS are summed starting at offset 10h, for 0Fh bytes. I.e. I can't see why the sizeof() you replace would be wrong (other than for broken firmware). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On Mon, Apr 13, 2015 at 01:40:59PM +0100, Jan Beulich wrote: On 13.04.15 at 13:47, m...@redhat.com wrote: On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote: On 13.04.15 at 13:19, m...@redhat.com wrote: Yes Linux can't fix firmware 1st mode, but PCI express spec says what firmware should do in this case: IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices With 1.0a device Functions, 96 if the Unsupported Request Reporting Enable bit is set, the Function when operating as a Completer will send an uncorrectable error Message (if enabled) when a UR error is detected. On platforms where an uncorrectable error Message is handled as a System Error, this will break PC-compatible Configuration Space probing, so software/firmware on such platforms may need to avoid setting the Unsupported Request Reporting Enable bit. With device Functions implementing Role-Based Error Reporting, setting the Unsupported Request Reporting Enable bit will not interfere with PC-compatible Configuration Space probing, assuming that the severity for UR is left at its default of non-fatal. However, setting the Unsupported Request Reporting Enable bit will enable the Function to report UR errors detected with posted Requests, helping avoid this case for potential silent data corruption. On platforms where robust error handling and PC-compatible Configuration Space probing is required, it is suggested that software or firmware have the Unsupported Request Reporting Enable bit Set for Role-Based Error Reporting Functions, but clear for 1.0a Functions. Software or firmware can distinguish the two classes of Functions by examining the Role-Based Error Reporting bit in the Device Capabilities register. What I think you have is a very old 1.0a system, and you set Unsupported Request Reporting Enable. Can you confirm? No. In at least one of the two cases we got reports of the original problem, triggering the finding of this issue, this is a brand new one, only soon to become available publicly. Furthermore I'm being confused by the mention of PC-compatible config space probing above: The URs we talk about here don't result from config space accessed at all. OK. Can you please explain why does UR cause a system error then? It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't. Quite possible. Looking at the ITP log we were provided, the UR severity bit is clear (non-fatal), yet the error got surfaced to the OS as a fatal one (I would guess because it validly gets flagged as uncorrectable at the same time). Jan No, that's not valid. Can you check device capabilities register, offset 0x4 within pci express capability structure? Bit 15 is 15 Role-Based Error Reporting. Is it set? The spec says: 15 On platforms where robust error handling and PC-compatible Configuration Space probing is required, it is suggested that software or firmware have the Unsupported Request Reporting Enable bit Set for Role-Based Error Reporting Functions, but clear for 1.0a Functions. Software or firmware can distinguish the two classes of Functions by examining the Role-Based Error Reporting bit in the Device Capabilities register. -- MST ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv2 1/6] x86/hvm: don't include asm/spinlock.h
On 10/04/15 16:24, Andrew Cooper wrote: On 10/04/15 15:19, David Vrabel wrote: asm/spinlock.h should not be included directly. Signed-off-by: David Vrabel david.vra...@citrix.com s/asm/xen/g instead of a straight delete? Otherwise you are relying on pulling in xen/spinlock.h implicitly. None of these files declare a spinlock so getting the header implicitly is correct, IMO. David xen/arch/x86/hvm/hvm.c |1 - xen/arch/x86/hvm/svm/svm.c |1 - xen/arch/x86/hvm/vmx/vmx.c |1 - ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL
Jan Beulich writes (Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL): On 10.04.15 at 19:03, ian.jack...@eu.citrix.com wrote: These were the only problems in this flight. Under the circumstances I think we would be justified in force pushing to xen.git#master: version targeted for testing: xen 3dca8a17f639f4b53cd2eeb687cec4e9f83acf3d Yes please. Done. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL
On 13.04.15 at 15:55, ian.jack...@eu.citrix.com wrote: Jan Beulich writes (Re: [Xen-devel] [xen-unstable test] 50291: regressions - FAIL): On 10.04.15 at 19:03, ian.jack...@eu.citrix.com wrote: These were the only problems in this flight. Under the circumstances I think we would be justified in force pushing to xen.git#master: version targeted for testing: xen 3dca8a17f639f4b53cd2eeb687cec4e9f83acf3d Yes please. Done. Thanks! Perhaps similarly consider flight 50355 a good one for 4.5 then? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC] tcp: Allow sk_wmem_alloc to exceed sysctl_tcp_limit_output_bytes
On Mon, 2015-04-13 at 14:46 +0100, David Vrabel wrote: And the proof-of-concept patch for idea (b) I used was: @@ -552,6 +552,8 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) goto drop; } +skb_orphan(skb); + page = virt_to_page(skb-data); offset = offset_in_page(skb-data); len = skb_headlen(skb); No. This a bunch of allocations and a full memcpy of all the frags. skb_orphan() does nothing like that. But the main concern here is it basically breaks back pressure. And you do not want this, unless there is no other choice. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] xen-pt: Fix bug cause PCI devices re-attach failed
Use the option like 'pci=[ '07:10,1', '0b:10.1', '81:10.1']' in HVM guest configuration file to assign more than one VF PCI devices to a guest, after the guest boot up, detach the VFs in sequence by 'xl pci-detach $DOM_ID $VF_BDF', and then attach the VFs by 'xl pci-attach $DOM_ID $VF_BDF' in sequence, an error message will be reported like following: libxl: error: libxl_qmp.c:287:qmp_handle_error_response: receive an error message from QMP server: Duplicate ID 'pci-pt-81_10.1' for device. The error message will be printed in two cases: 1. Attach and detach very quickly. The message will disappear if retry, it is expected because of the asynchronous unplug mechanism. 2. Do the attach and detach operation with a time interval. eg. 10s. The error message will not disappear if retry, in this case, it's a bug. In the 'xen_pt_region_add' and 'xen_pt_region_del', we should only care about the 'xen-pci-pt-*' memory region, this can avoid the region's reference count is not equal with the dereference count when the device is detached and prevent the device's related QemuOpts object from being released properly, and then trigger the bug when the device is re-attached. I sent a patch to fix a similar bug before, but the patch could not fix the issue completely. Signed-off-by: Liang Li liang.z...@intel.com --- hw/xen/xen_pt.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c index f2893b2..8aab769 100644 --- a/hw/xen/xen_pt.c +++ b/hw/xen/xen_pt.c @@ -61,6 +61,7 @@ #include qemu/range.h #include exec/address-spaces.h +#define XEN_PT_NAME_PREFIX xen-pci-pt #define XEN_PT_NR_IRQS (256) static uint8_t xen_pt_mapped_machine_irq[XEN_PT_NR_IRQS] = {0}; @@ -585,6 +586,10 @@ static void xen_pt_region_update(XenPCIPassthroughState *s, static void xen_pt_region_add(MemoryListener *l, MemoryRegionSection *sec) { +if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX, +sizeof(XEN_PT_NAME_PREFIX))) { +return; +} XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState, memory_listener); @@ -594,6 +599,11 @@ static void xen_pt_region_add(MemoryListener *l, MemoryRegionSection *sec) static void xen_pt_region_del(MemoryListener *l, MemoryRegionSection *sec) { +if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX, +sizeof(XEN_PT_NAME_PREFIX))) { +return; +} + XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState, memory_listener); @@ -603,6 +613,10 @@ static void xen_pt_region_del(MemoryListener *l, MemoryRegionSection *sec) static void xen_pt_io_region_add(MemoryListener *l, MemoryRegionSection *sec) { +if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX, +sizeof(XEN_PT_NAME_PREFIX))) { +return; +} XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState, io_listener); @@ -612,6 +626,10 @@ static void xen_pt_io_region_add(MemoryListener *l, MemoryRegionSection *sec) static void xen_pt_io_region_del(MemoryListener *l, MemoryRegionSection *sec) { +if (strncmp(sec-mr-name, XEN_PT_NAME_PREFIX, +sizeof(XEN_PT_NAME_PREFIX))) { +return; +} XenPCIPassthroughState *s = container_of(l, XenPCIPassthroughState, io_listener); -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote: +void __init __pv_init_lock_hash(void) +{ + int pv_hash_size = 4 * num_possible_cpus(); + + if (pv_hash_size (1U LFSR_MIN_BITS)) + pv_hash_size = (1U LFSR_MIN_BITS); + /* +* Allocate space from bootmem which should be page-size aligned +* and hence cacheline aligned. +*/ + pv_lock_hash = alloc_large_system_hash(PV qspinlock, + sizeof(struct pv_hash_bucket), + pv_hash_size, 0, HASH_EARLY, + pv_lock_hash_bits, NULL, + pv_hash_size, pv_hash_size); pv_taps = lfsr_taps(pv_lock_hash_bits); I don't understand what you meant here. Let me explain (even though I propose taking all the LFSR stuff out). pv_lock_hash_bit is a runtime variable, therefore it cannot compile time evaluate the forest of if statements required to compute the taps value. Therefore its best to compute the taps _once_, and this seems like a good place to do so. + goto done; + } + } + + hash = lfsr(hash, pv_lock_hash_bits, 0); Since pv_lock_hash_bits is a variable, you end up running through that massive if() forest to find the corresponding tap every single time. It cannot compile-time optimize it. The minimum bits size is now 8. So unless the system has more than 64 vCPUs, it will get the right value in the first if statement. Still, no reason to not pre-compute the taps value, its simple enough. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] More force pushes (xen-4.5-testing, xen-unstable, qemu-mainline)
Here are some more which I propose to force push. In each case I show all the blocking tests mentioned in the osstest test report. osstest service user writes ([xen-4.5-testing test] 50373: regressions - FAIL): flight 50373 xen-4.5-testing real [real] ... test-amd64-i386-freebsd10-amd64 14 guest-localmigrate/x10 fail in 50317 REGR. vs. 50268 ... version targeted for testing: xen 0b754fb3ed6b7b6d0f2e1f7c1877d3c0a7da2168 osstest service user writes ([xen-unstable test] 50390: regressions - FAIL): flight 50390 xen-unstable real [real] test-amd64-i386-freebsd10-i386 13 guest-localmigrate fail REGR. vs. 36514 test-amd64-i386-freebsd10-amd64 15 guest-localmigrate.2 fail REGR. vs. 36514 ... version targeted for testing: xen 123c7793797502b222300eb710cd3873dcca41ee osstest service user writes ([qemu-mainline test] 50391: regressions - FAIL): flight 50391 qemu-mainline real [real] ... test-amd64-i386-freebsd10-i386 13 guest-localmigrate fail REGR. vs. 36709 test-amd64-i386-freebsd10-amd64 15 guest-localmigrate.2 fail REGR. vs. 36709 version targeted for testing: qemuu58c24a4775ef7ccf16cfcd695753dfdf149fe29d These are all FreeBSD migration tests. These seem to be failing much more often in the new colo. As I say I don't know why this is. We hope to fix it with the FreeBSD 10.1 update which Roger has submitted and which is currently in the osstest self-push-gate. In the general case they fail after a number of successful runs. In each case the symptom is that the guest simply doesn't come back onto the network after the one of the migrations. I double checked http://logs.test-lab.xenproject.org/osstest/logs/50391/test-amd64-i386-freebsd10-amd64/info.html and as with the earlier tests there doesn't seem to be any hints in any of the logs. These are all localhost migrations. This rules out a problem with the colo infrastructure dropping or not honouring gratuitous ARP or MAC address migration, since the guest's MAC and IP address remain (from the infrastructure's point of view) on the same switch port. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen-pt: Fix bug cause PCI devices re-attach failed
On 13/04/2015 16:12, Liang Li wrote: 2. Do the attach and detach operation with a time interval. eg. 10s. The error message will not disappear if retry, in this case, it's a bug. In the 'xen_pt_region_add' and 'xen_pt_region_del', we should only care about the 'xen-pci-pt-*' memory region, this can avoid the region's reference count is not equal with the dereference count when the device is detached and prevent the device's related QemuOpts object from being released properly, and then trigger the bug when the device is re-attached. This doesn't explain _which_ region is causing the bug and how. Assuming this is the right fix, should you instead move the memory_region_ref/unref pair from xen_pt_region_add/del after this conditional: if (bar == -1 (!s-msix || s-msix-mmio != mr)) { return; } in xen_pt_region_update? Paolo ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] x86/domctl: cleanup
On 01.04.15 at 17:31, andrew.coop...@citrix.com wrote: case XEN_DOMCTL_gettscinfo: -{ -xen_guest_tsc_info_t info; - -ret = -EINVAL; -if ( d == current-domain ) /* no domain_pause() */ -break; - -domain_pause(d); -tsc_get_info(d, info.tsc_mode, -info.elapsed_nsec, -info.gtsc_khz, -info.incarnation); -if ( copy_to_guest(domctl-u.tsc_info.out_info, info, 1) ) -ret = -EFAULT; +if ( d == currd ) /* no domain_pause() */ +ret = -EINVAL; else -ret = 0; -domain_unpause(d); -} -break; +{ +xen_guest_tsc_info_t info; + +domain_pause(d); +tsc_get_info(d, info.tsc_mode, + info.elapsed_nsec, + info.gtsc_khz, + info.incarnation); +domain_unpause(d); +copyback = 1; If you want to use copyback here, you need to pass pointers into domctl-u.tsc_info.out_info to tsc_get_info(). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 10/15] tools/libxc: x86 HVM save code
Andrew Cooper andrew.coop...@citrix.com writes: Save the x86 HVM specific parts of the domain. This is considerably simpler than an x86 PV domain. Only the HVM_CONTEXT and HVM_PARAMS records are needed. There is no need for any page normalisation. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/libxc/Makefile |1 + tools/libxc/xc_sr_common.h |1 + tools/libxc/xc_sr_save_x86_hvm.c | 220 ++ 3 files changed, 222 insertions(+) create mode 100644 tools/libxc/xc_sr_save_x86_hvm.c diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile index fc1e668..ba124f4 100644 --- a/tools/libxc/Makefile +++ b/tools/libxc/Makefile @@ -58,6 +58,7 @@ GUEST_SRCS-$(CONFIG_X86) += xc_sr_common_x86.c GUEST_SRCS-$(CONFIG_X86) += xc_sr_common_x86_pv.c GUEST_SRCS-$(CONFIG_X86) += xc_sr_restore_x86_pv.c GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_pv.c +GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_hvm.c GUEST_SRCS-y += xc_sr_restore.c GUEST_SRCS-y += xc_sr_save.c GUEST_SRCS-y += xc_offline_page.c xc_compression.c diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 4f33839..ec14d59 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -286,6 +286,7 @@ struct xc_sr_context }; extern struct xc_sr_save_ops save_ops_x86_pv; +extern struct xc_sr_save_ops save_ops_x86_hvm; extern struct xc_sr_restore_ops restore_ops_x86_pv; diff --git a/tools/libxc/xc_sr_save_x86_hvm.c b/tools/libxc/xc_sr_save_x86_hvm.c new file mode 100644 index 000..0928f19 --- /dev/null +++ b/tools/libxc/xc_sr_save_x86_hvm.c @@ -0,0 +1,220 @@ +#include assert.h + +#include xc_sr_common_x86.h + +#include xen/hvm/params.h + +/* + * Query for the HVM context and write an HVM_CONTEXT record into the stream. + */ +static int write_hvm_context(struct xc_sr_context *ctx) +{ +xc_interface *xch = ctx-xch; +int rc, hvm_buf_size; +struct xc_sr_record hvm_rec = +{ +.type = REC_TYPE_HVM_CONTEXT, +}; + +hvm_buf_size = xc_domain_hvm_getcontext(xch, ctx-domid, 0, 0); +if ( hvm_buf_size 0 ) +{ +PERROR(Couldn't get HVM context size from Xen); +rc = -1; +goto out; +} + +hvm_rec.data = malloc(hvm_buf_size); +if ( !hvm_rec.data ) +{ +PERROR(Couldn't allocate memory); +rc = -1; +goto out; +} + +hvm_buf_size = xc_domain_hvm_getcontext(xch, ctx-domid, +hvm_rec.data, hvm_buf_size); +if ( hvm_buf_size 0 ) +{ +PERROR(Couldn't get HVM context from Xen); +rc = -1; +goto out; +} + +hvm_rec.length = hvm_buf_size; +rc = write_record(ctx, hvm_rec); +if ( rc 0 ) +{ +PERROR(error write HVM_CONTEXT record); +goto out; +} + + out: +free(hvm_rec.data); +return rc; +} + +/* + * Query for a range of HVM parameters and write an HVM_PARAMS record into the + * stream. + */ +static int write_hvm_params(struct xc_sr_context *ctx) +{ +static const unsigned int params[] = { +HVM_PARAM_STORE_PFN, +HVM_PARAM_IOREQ_PFN, +HVM_PARAM_BUFIOREQ_PFN, +HVM_PARAM_PAGING_RING_PFN, +HVM_PARAM_MONITOR_RING_PFN, +HVM_PARAM_SHARING_RING_PFN, +HVM_PARAM_VM86_TSS, +HVM_PARAM_CONSOLE_PFN, +HVM_PARAM_ACPI_IOPORTS_LOCATION, +HVM_PARAM_VIRIDIAN, +HVM_PARAM_IDENT_PT, +HVM_PARAM_PAE_ENABLED, +HVM_PARAM_VM_GENERATION_ID_ADDR, +HVM_PARAM_IOREQ_SERVER_PFN, +HVM_PARAM_NR_IOREQ_SERVER_PAGES, +}; + +xc_interface *xch = ctx-xch; +struct xc_sr_rec_hvm_params_entry entries[ARRAY_SIZE(params)]; +struct xc_sr_rec_hvm_params hdr = { +.count = 0, +}; +struct xc_sr_record rec = { +.type = REC_TYPE_HVM_PARAMS, +.length = sizeof(hdr), +.data = hdr, +}; +unsigned int i; +int rc; + +for ( i = 0; i ARRAY_SIZE(params); i++ ) +{ +uint32_t index = params[i]; +uint64_t value; + +rc = xc_hvm_param_get(xch, ctx-domid, index, value); +if ( rc ) +{ +PERROR(Failed to get HVMPARAM at index %u, index); +return rc; +} + +if ( value != 0 ) +{ +entries[hdr.count].index = index; +entries[hdr.count].value = value; +hdr.count++; +} +} While reviewing my 'soft reset' series Ian C raised a question about the unsafeness of sequential get/set of HVM params: http://lists.xen.org/archives/html/xen-devel/2015-01/msg01177.html In case the concern is valid I think the requirements for migration are
Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4
Monday, April 13, 2015, 2:21:21 PM, you wrote: On 13/04/15 13:14, Sander Eikelenboom wrote: Monday, April 13, 2015, 2:07:02 PM, you wrote: On 13/04/15 12:21, Sander Eikelenboom wrote: Monday, April 13, 2015, 11:50:51 AM, you wrote: On 13/04/15 10:39, Sander Eikelenboom wrote: Hi David, I seem to have spotted some trouble with a 4.0 dom0 kernel with the devel/for-linus-4.1 branch pulled on top. Does this remind you of any specific commits in the devel/for-linus-4.1 branch that could likely be involved that i could try to revert ? Yes. This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve performance of MMAPBATCH_V2) which makes the kernel try harder to map all GFNs instead of failing on the first one. I think this is qemu incorrectly trying to map GFNs. David Reverted that specific one, but still get those messages. You'll have to bisect it then. Because I don't see any other relevant commits in devel/for-linus-4.1 David Ok .. hmm first candidate of the bisect also looks interessting: [628c28eefd6f2cef03b212081b466ae43fd093a3] xen: unify foreign GFN map/unmap for auto-xlated physmap guests Unless your dom0 is PVH, no. David Nope .. but bisect is underway .. so hopefully we will know it soon enough :-) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On 13/04/15 11:56, George Dunlap wrote: On Thu, Apr 9, 2015 at 5:36 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Thu, 9 Apr 2015, Eric Dumazet wrote: On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote: Hi all, I found a performance regression when running netperf -t TCP_MAERTS from an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the virtual machine are 30% slower than v3.18. Through bisection I found that the perf regression is caused by the prensence of the following commit in the guest kernel: commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 Author: Eric Dumazet eduma...@google.com Date: Sun Dec 7 12:22:18 2014 -0800 tcp: refine TSO autosizing [snip] I recently discussed this issue on netdev in the following thread: https://www.marc.info/?l=linux-netdevm=142738853820517 This commit restored original TCP Small Queue behavior, which is the first step to fight bufferbloat. Some network drivers are known to be problematic because of a delayed TX completion. [snip] Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it makes a difference ? A very big difference: echo 262144 /proc/sys/net/ipv4/tcp_limit_output_bytes brings us much closer to the original performance, the slowdown is just 8% echo 1048576 /proc/sys/net/ipv4/tcp_limit_output_bytes fills the gap entirely, same performance as before refine TSO autosizing What would be the next step for here? Should I just document this as an important performance tweaking step for Xen, or is there something else we can do? Is the problem perhaps that netback/netfront delays TX completion? Would it be better to see if that can be addressed properly, so that the original purpose of the patch (fighting bufferbloat) can be achieved while not degrading performance for Xen? Or at least, so that people get decent perfomance out of the box without having to tweak TCP parameters? I agree; reducing the completion latency should be the ultimate goal. However, that won't be easy, so we need a work-around in the short term. I don't like the idea of relying on documenting the recommendation to change tcp_limit_output_bytes; too many people won't read this advice and will expect the out-of-the-box defaults to be reasonable. Following Eric's pointers to where a similar problem had been experienced in wifi drivers, I came up with two proof-of-concept patches that gave a similar performance gain without any changes to sysctl parameters or core tcp/ip code. See https://www.marc.info/?l=linux-netdevm=142746161307283. I haven't yet received any feedback from the xen-netfront maintainers about whether those ideas could be reasonably adopted. Jonathan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On 13.04.15 at 13:47, m...@redhat.com wrote: On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote: On 13.04.15 at 13:19, m...@redhat.com wrote: Yes Linux can't fix firmware 1st mode, but PCI express spec says what firmware should do in this case: IMPLEMENTATION NOTE Software UR Reporting Compatibility with 1.0a Devices With 1.0a device Functions, 96 if the Unsupported Request Reporting Enable bit is set, the Function when operating as a Completer will send an uncorrectable error Message (if enabled) when a UR error is detected. On platforms where an uncorrectable error Message is handled as a System Error, this will break PC-compatible Configuration Space probing, so software/firmware on such platforms may need to avoid setting the Unsupported Request Reporting Enable bit. With device Functions implementing Role-Based Error Reporting, setting the Unsupported Request Reporting Enable bit will not interfere with PC-compatible Configuration Space probing, assuming that the severity for UR is left at its default of non-fatal. However, setting the Unsupported Request Reporting Enable bit will enable the Function to report UR errors detected with posted Requests, helping avoid this case for potential silent data corruption. On platforms where robust error handling and PC-compatible Configuration Space probing is required, it is suggested that software or firmware have the Unsupported Request Reporting Enable bit Set for Role-Based Error Reporting Functions, but clear for 1.0a Functions. Software or firmware can distinguish the two classes of Functions by examining the Role-Based Error Reporting bit in the Device Capabilities register. What I think you have is a very old 1.0a system, and you set Unsupported Request Reporting Enable. Can you confirm? No. In at least one of the two cases we got reports of the original problem, triggering the finding of this issue, this is a brand new one, only soon to become available publicly. Furthermore I'm being confused by the mention of PC-compatible config space probing above: The URs we talk about here don't result from config space accessed at all. OK. Can you please explain why does UR cause a system error then? It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't. Quite possible. Looking at the ITP log we were provided, the UR severity bit is clear (non-fatal), yet the error got surfaced to the OS as a fatal one (I would guess because it validly gets flagged as uncorrectable at the same time). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
On 13.04.15 at 14:01, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: On 13.04.15 at 12:50, stefano.stabell...@eu.citrix.com wrote: On Mon, 13 Apr 2015, Jan Beulich wrote: While we trust Dom0 to not do outright bad things, the hypervisor should still avoid doing things that can go wrong due to the state a device is put (or left) in by Dom0. Xen should also avoid doing things that can go wrong because of the state a device is put in by QEMU or other components in the system. There isn't much room for Xen to play with. Qemu is either part of Dom0, or doesn't play with devices directly. I don't understand the point you are trying to make here. If your intention is to point out that QEMU shouldn't be writing to the control register, as I wrote earlier, the current codebase disagrees with you and has been that way for years. Yes, this was precisely my intention. Qemu having done a certain thing for many years doesn't mean this is correct, and shouldn't be fixed if broken. And how are we going to deal with older unfixed QEMUs? So far we have been using the same policy for QEMU and the Dom0 kernel: Xen doesn't break them -- old Linux kernels and QEMUs are supposed to just work. I'm not sure that's really true for qemu, or if it is, then only by pure luck: This is false, it is so by design. QEMU has an internal libxc compatibility layer. Which could necessarily only ever be updated after the fact (of a libxc change) and only ever in maintained branches. I.e. as widely or as narrow as fixing the issues here would require. I view it as unavoidable to break older qemu here. I disagree and I am opposed to any patches series that would deliberately break QEMU. If that was without also adjusting qemu I would understand you. Considering that the issues here haven't been brought up just yesterday, the lack of any substantial counter proposal on how to fix this is signaling to me that there are little alternatives. Or do you have any going beyond do it another way? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-4.2-testing test] 50395: regressions - FAIL
flight 50395 xen-4.2-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/50395/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-qemuu-freebsd10-i386 10 guest-start fail REGR. vs. 36512 test-i386-i386-pair21 guest-migrate/src_host/dst_host fail REGR. vs. 36512 test-amd64-i386-pair 21 guest-migrate/src_host/dst_host fail REGR. vs. 36512 Tests which are failing intermittently (not blocking): test-amd64-amd64-pair 9 xen-boot/src_host fail pass in 50375 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-localmigrate.2 fail pass in 50375 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemut-debianhvm-amd64 18 guest-start/debianhvm.repeat fail in 50375 blocked in 36512 Tests which did not succeed, but are not blocking: test-i386-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 50375 never pass test-amd64-amd64-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail never pass test-amd64-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-libvirt 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass test-i386-i386-xl-qemut-winxpsp3 16 guest-stop fail never pass test-i386-i386-libvirt 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-i386-i386-xl-qemuu-winxpsp3 16 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 16 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 16 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 16 guest-stop fail never pass test-i386-i386-xl-winxpsp3 16 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 16 guest-stop fail never pass build-amd64-rumpuserxen 5 rumpuserxen-buildfail never pass test-amd64-i386-xend-winxpsp3 20 leak-check/check fail never pass build-i386-rumpuserxen5 rumpuserxen-buildfail never pass test-amd64-i386-xend-qemut-winxpsp3 20 leak-check/checkfail never pass test-amd64-amd64-xl-win7-amd64 16 guest-stop fail never pass version targeted for testing: xen e3bfa4003ceaa2746cdd77655953ab2601acaf9c baseline version: xen 5bec01c19839e150e489dd04376c65f961830c86 People who touched revisions under test: Ian Campbell ian.campb...@citrix.com Ian Jackson ian.jack...@eu.citrix.com Jan Beulich jbeul...@suse.com Konrad Rzeszutek Wilk konrad.w...@oracle.com jobs: build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-xl pass test-amd64-i386-xl pass test-i386-i386-xlpass test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-qemuu-freebsd10-amd64pass test-amd64-amd64-xl-qemuu-ovmf-amd64 fail test-amd64-i386-xl-qemuu-ovmf-amd64
Re: [Xen-devel] [PATCH RFC] tcp: Allow sk_wmem_alloc to exceed sysctl_tcp_limit_output_bytes
On 27/03/15 13:06, Jonathan Davies wrote: On 26/03/15 17:14, Eric Dumazet wrote: On Thu, 2015-03-26 at 16:46 +, Jonathan Davies wrote: Network drivers with slow TX completion can experience poor network transmit throughput, limited by hitting the sk_wmem_alloc limit check in tcp_write_xmit. The limit is 128 KB (by default), which means we are limited to two 64 KB skbs in-flight. This has been observed to limit transmit throughput with xen-netfront because its TX completion can be relatively slow compared to physical NIC drivers. There have been several modifications to the calculation of the sk_wmem_alloc limit in the past. Here is a brief history: * Since TSQ was introduced, the queue size limit was sysctl_tcp_limit_output_bytes. * Commit c9eeec26 (tcp: TSQ can use a dynamic limit) made the limit max(skb-truesize, sk-sk_pacing_rate 10). This allows more packets in-flight according to the estimated rate. * Commit 98e09386 (tcp: tsq: restore minimal amount of queueing) made the limit max_t(unsigned int, sysctl_tcp_limit_output_bytes, sk-sk_pacing_rate 10). This ensures at least sysctl_tcp_limit_output_bytes in flight but allowed more if rate estimation shows this to be worthwhile. * Commit 605ad7f1 (tcp: refine TSO autosizing) made the limit min_t(u32, max(2 * skb-truesize, sk-sk_pacing_rate 10), sysctl_tcp_limit_output_bytes). This meant that the limit can never exceed sysctl_tcp_limit_output_bytes, regardless of what rate estimation suggests. It's not clear from the commit message why this significant change was justified, changing sysctl_tcp_limit_output_bytes from being a lower bound to an upper bound. This patch restores the behaviour that allows the limit to grow above sysctl_tcp_limit_output_bytes according to the rate estimation. This has been measured to improve xen-netfront throughput from a domU to dom0 from 5.5 Gb/s to 8.0 Gb/s. Or, in the case of transmitting from one domU to another (on the same host), throughput rose from 2.8 Gb/s to 8.0 Gb/s. In the latter case, TX completion is especially slow, explaining the large improvement. These values were measured against 4.0-rc5 using iperf -c ip -i 1 using CentOS 7.0 VM(s) on Citrix XenServer 6.5 on a Dell R730 host with a pair of Xeon E5-2650 v3 CPUs. Fixes: 605ad7f184b6 (tcp: refine TSO autosizing) Signed-off-by: Jonathan Davies jonathan.dav...@citrix.com --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 1db253e..3a49af8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2052,7 +2052,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, * One example is wifi aggregation (802.11 AMPDU) */ limit = max(2 * skb-truesize, sk-sk_pacing_rate 10); -limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes); +limit = max_t(u32, limit, sysctl_tcp_limit_output_bytes); if (atomic_read(sk-sk_wmem_alloc) limit) { set_bit(TSQ_THROTTLED, tp-tsq_flags); That will get a NACK from me and Google TCP team of course. Try your patch with low throughput flows, like 64kbps, GSO off... And observe TCP timestamps and rtt estimations, critical for vegas or any CC based on delay. I get line rate on 40Gbps NIC using this (low) limit of 2 TSO packets. This topic was discussed for wifi recently, and I suggested multiple ways to solve the problem on problematic drivers. There is a reason sysctl_tcp_limit_output_bytes exists : You can change its value to whatever you want. vi +652 Documentation/networking/ip-sysctl.txt tcp_limit_output_bytes - INTEGER Controls TCP Small Queue limit per tcp socket. TCP bulk sender tends to increase packets in flight until it gets losses notifications. With SNDBUF autotuning, this can result in a large amount of packets queued in qdisc/device on the local machine, hurting latency of other flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes limits the number of bytes on qdisc or device to reduce artificial RTT/cwnd and reduce bufferbloat. Default: 131072 Documentation is pretty clear : This is the max value, not a min one. Okay, thanks for your feedback. It wasn't clear to me from the commit message in 605ad7f1 (tcp: refine TSO autosizing) why tcp_limit_output_bytes changed from being a lower bound to an upper bound in that patch. But it's now clear from your reply that that's the behaviour you intend as correct. Thanks for drawing my attention to the wifi thread, which is helpful. As I understand it, the only available options for fixing the performance regression experienced by xen-netfront are:
Re: [Xen-devel] Dom0 linux 4.0 + devel/for-linus-4.1 branch: p2m.c:884:d0v0 gfn_to_mfn failed! gfn=ffffffff001ed type:4
On 13/04/15 13:14, Sander Eikelenboom wrote: Monday, April 13, 2015, 2:07:02 PM, you wrote: On 13/04/15 12:21, Sander Eikelenboom wrote: Monday, April 13, 2015, 11:50:51 AM, you wrote: On 13/04/15 10:39, Sander Eikelenboom wrote: Hi David, I seem to have spotted some trouble with a 4.0 dom0 kernel with the devel/for-linus-4.1 branch pulled on top. Does this remind you of any specific commits in the devel/for-linus-4.1 branch that could likely be involved that i could try to revert ? Yes. This will probably be 4e8c0c8c4bf3a (xen/privcmd: improve performance of MMAPBATCH_V2) which makes the kernel try harder to map all GFNs instead of failing on the first one. I think this is qemu incorrectly trying to map GFNs. David Reverted that specific one, but still get those messages. You'll have to bisect it then. Because I don't see any other relevant commits in devel/for-linus-4.1 David Ok .. hmm first candidate of the bisect also looks interessting: [628c28eefd6f2cef03b212081b466ae43fd093a3] xen: unify foreign GFN map/unmap for auto-xlated physmap guests Unless your dom0 is PVH, no. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On 13.04.15 at 14:47, m...@redhat.com wrote: On Mon, Apr 13, 2015 at 01:40:59PM +0100, Jan Beulich wrote: Quite possible. Looking at the ITP log we were provided, the UR severity bit is clear (non-fatal), yet the error got surfaced to the OS as a fatal one (I would guess because it validly gets flagged as uncorrectable at the same time). No, that's not valid. Can you check device capabilities register, offset 0x4 within pci express capability structure? Bit 15 is 15 Role-Based Error Reporting. Is it set? The spec says: 15 On platforms where robust error handling and PC-compatible Configuration Space probing is required, it is suggested that software or firmware have the Unsupported Request Reporting Enable bit Set for Role-Based Error Reporting Functions, but clear for 1.0a Functions. Software or firmware can distinguish the two classes of Functions by examining the Role-Based Error Reporting bit in the Device Capabilities register. Yes, that bit is set. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv2 1/6] x86/hvm: don't include asm/spinlock.h
On 13/04/15 14:13, David Vrabel wrote: On 10/04/15 16:24, Andrew Cooper wrote: On 10/04/15 15:19, David Vrabel wrote: asm/spinlock.h should not be included directly. Signed-off-by: David Vrabel david.vra...@citrix.com s/asm/xen/g instead of a straight delete? Otherwise you are relying on pulling in xen/spinlock.h implicitly. None of these files declare a spinlock so getting the header implicitly is correct, IMO. In which case, good riddance. Acked-by: Andrew Cooper andrew.coop...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 10/15] tools/libxc: x86 HVM save code
On 13/04/15 13:28, Vitaly Kuznetsov wrote: Andrew Cooper andrew.coop...@citrix.com writes: +/* + * Query for a range of HVM parameters and write an HVM_PARAMS record into the + * stream. + */ +static int write_hvm_params(struct xc_sr_context *ctx) +{ +static const unsigned int params[] = { +HVM_PARAM_STORE_PFN, +HVM_PARAM_IOREQ_PFN, +HVM_PARAM_BUFIOREQ_PFN, +HVM_PARAM_PAGING_RING_PFN, +HVM_PARAM_MONITOR_RING_PFN, +HVM_PARAM_SHARING_RING_PFN, +HVM_PARAM_VM86_TSS, +HVM_PARAM_CONSOLE_PFN, +HVM_PARAM_ACPI_IOPORTS_LOCATION, +HVM_PARAM_VIRIDIAN, +HVM_PARAM_IDENT_PT, +HVM_PARAM_PAE_ENABLED, +HVM_PARAM_VM_GENERATION_ID_ADDR, +HVM_PARAM_IOREQ_SERVER_PFN, +HVM_PARAM_NR_IOREQ_SERVER_PAGES, +}; + +xc_interface *xch = ctx-xch; +struct xc_sr_rec_hvm_params_entry entries[ARRAY_SIZE(params)]; +struct xc_sr_rec_hvm_params hdr = { +.count = 0, +}; +struct xc_sr_record rec = { +.type = REC_TYPE_HVM_PARAMS, +.length = sizeof(hdr), +.data = hdr, +}; +unsigned int i; +int rc; + +for ( i = 0; i ARRAY_SIZE(params); i++ ) +{ +uint32_t index = params[i]; +uint64_t value; + +rc = xc_hvm_param_get(xch, ctx-domid, index, value); +if ( rc ) +{ +PERROR(Failed to get HVMPARAM at index %u, index); +return rc; +} + +if ( value != 0 ) +{ +entries[hdr.count].index = index; +entries[hdr.count].value = value; +hdr.count++; +} +} While reviewing my 'soft reset' series Ian C raised a question about the unsafeness of sequential get/set of HVM params: http://lists.xen.org/archives/html/xen-devel/2015-01/msg01177.html In case the concern is valid I think the requirements for migration are even stricter as we can migrate live. In case it is not, would it be possible to move your params[] array to some common place so it can be reused? This code currently mirrors what the old migration did. This was one area we deliberately didn't try to clean up (we were focusing on a functional replacement). I am not a fan of hvm params. They are used as a dumping ground for things which IMO should live elsewhere. The result is this mess, without any clear direction of who owns which parameter, which should move on migration, which shouldn't etc. There are many areas of the toolstack which independently poke at the parameters. I would be hesitant to blindly lift this code out, although it is probably the best start you will find. What is really needed is formal statement of who owns which hvm params, and what their purpose is. That would at least be a good starting point for evaluating questions like this. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On Mon, 2015-04-13 at 11:56 +0100, George Dunlap wrote: Is the problem perhaps that netback/netfront delays TX completion? Would it be better to see if that can be addressed properly, so that the original purpose of the patch (fighting bufferbloat) can be achieved while not degrading performance for Xen? Or at least, so that people get decent perfomance out of the box without having to tweak TCP parameters? Sure, please provide a patch, that does not break back pressure. But just in case, if Xen performance relied on bufferbloat, it might be very difficult to reach a stable equilibrium : Any small change in stack or scheduling might introduce a significant difference in 'raw performance'. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125
On 13/04/15 17:33, Ian Jackson wrote: Andrew Cooper writes ([Xen-devel] [PATCH] tools/libxc: Fix build of 32bit toolstacks on CentOS 5.x following XSA-125): gcc 4.1 of CentOS 5.x era does not like the typecheck in min() between uint64_t and unsigned long. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com As and when this is acked and pushed to #staging, we should update the advisory. This needs backporting to 4.5 To be clear, do you know whether this is a trivial cherry pick ? It is trivial to cherrypick. 4.4 is the problematic case which I did the backport for separately, but then that is trivial to cherrypick back to 4.2 ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] new functions libxl_bitmap_{or,and}
provide logical and and or of two bitmaps --- v.1 updated comments and format v.2 rewrote bitmap functions to manipulate bytes not bits Signed-off-by: Linda Jacobson lin...@jma3.com --- tools/libxl/libxl_utils.c | 74 +++ tools/libxl/libxl_utils.h | 5 2 files changed, 79 insertions(+) diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c index 9053b27..5c7178f 100644 --- a/tools/libxl/libxl_utils.c +++ b/tools/libxl/libxl_utils.c @@ -691,6 +691,80 @@ void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit) bitmap-map[bit / 8] = ~(1 (bit 7)); } +/* provide logical or and logical and of two bitmaps */ +int libxl_bitmap_or(libxl_ctx *ctx, libxl_bitmap *or_map, +libxl_bitmap *map1, libxl_bitmap *map2) +{ +GC_INIT(ctx); +int rc; +uint32_t i; +libxl_bitmap *lgmap; +libxl_bitmap *smap; + +if (map1-size map2-size) { +lgmap = map1; +smap = map2; +} +else { +lgmap = map2; +smap = map1; +} + +rc = libxl_bitmap_alloc(ctx, or_map, lgmap-size * 8); +if (rc) +goto out; + +/* + * if bitmaps aren't the same size, their union (logical or) will + * be size of larger bit map. Any bit past the end of the + * smaller bit map, will match the larger one. + */ +for (i = 0; i smap-size; i++) +or_map-map[i] = (smap-map[i] | lgmap-map[i]); + +for (i = smap-size; i lgmap-size; i++) +or_map-map[i] = lgmap-map[i]; + +out: +GC_FREE; +return rc; +} + +int libxl_bitmap_and(libxl_ctx *ctx, libxl_bitmap *and_map, + libxl_bitmap *map1, libxl_bitmap *map2) +{ +GC_INIT(ctx); +int rc; +uint32_t i; +libxl_bitmap *lgmap, *smap; + +if (map1-size map2-size) { +lgmap = map1; +smap = map2; +} +else { +lgmap = map2; +smap = map1; +} + + +rc = libxl_bitmap_alloc(ctx, and_map, smap-size * 8); +if (rc) +goto out; + +/* + * if bitmaps aren't same size, their 'and' will be size of + * smaller bit map + */ +for (i = 0; i and_map-size; i++) +and_map-map[i] = (lgmap-map[i] smap-map[i]); + +out: +GC_FREE; +return rc; + +} + int libxl_bitmap_count_set(const libxl_bitmap *bitmap) { int i, nr_set_bits = 0; diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h index acacdd9..1c0086b 100644 --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h @@ -90,6 +90,11 @@ int libxl_bitmap_test(const libxl_bitmap *bitmap, int bit); void libxl_bitmap_set(libxl_bitmap *bitmap, int bit); void libxl_bitmap_reset(libxl_bitmap *bitmap, int bit); int libxl_bitmap_count_set(const libxl_bitmap *cpumap); +/* or and and functions for two bitmaps */ +int libxl_bitmap_or(libxl_ctx *ctx, libxl_bitmap *or_map, +libxl_bitmap *map1, libxl_bitmap *map2); +int libxl_bitmap_and(libxl_ctx *ctx, libxl_bitmap *and_map, + libxl_bitmap *map1, libxl_bitmap *map2); char *libxl_bitmap_to_hex_string(libxl_ctx *ctx, const libxl_bitmap *cpumap); static inline void libxl_bitmap_set_any(libxl_bitmap *bitmap) { -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC] tcp: Allow sk_wmem_alloc to exceed sysctl_tcp_limit_output_bytes
On 13/04/15 15:05, Eric Dumazet wrote: On Mon, 2015-04-13 at 14:46 +0100, David Vrabel wrote: And the proof-of-concept patch for idea (b) I used was: @@ -552,6 +552,8 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) goto drop; } +skb_orphan(skb); + page = virt_to_page(skb-data); offset = offset_in_page(skb-data); len = skb_headlen(skb); No. This a bunch of allocations and a full memcpy of all the frags. skb_orphan() does nothing like that. I think David accidently read the skb_orphan_frags() code. But the main concern here is it basically breaks back pressure. And you do not want this, unless there is no other choice. virtio_net already use's skb_orphan() in it's transmit path. It seems only fair that other virtual network drivers behave in the same way. There are no easy solutions to decrease the transmit latency for netback/netfront. We map the guest memory through to the backend to avoid memory copies. The frontend memory can only be freed once the network driver has completed transmitting the packet in the backend. Modern network drivers can be quite slow at freeing the skb's once transmitted (the packet is already on the wire as far as they are concerned) and this delay is compounded by needing the signal the completion of the transmit back to the frontend (by IPI in worst case). From a networking point of view, the backend is a switch. Is it OK to consider the packet to have been transmitted from the guest point of view once the backend is aware of the packet? This would help justify the skb_orphan() in the frontend. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote: +static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node) +{ + struct __qspinlock *l = (void *)lock; + struct qspinlock **lp = NULL; + struct pv_node *pn = (struct pv_node *)node; + int slow_set = false; + int loop; + + for (;;) { + for (loop = SPIN_THRESHOLD; loop; loop--) { + if (!READ_ONCE(l-locked)) + return; + + cpu_relax(); + } + + WRITE_ONCE(pn-state, vcpu_halted); + if (!lp) + lp = pv_hash(lock, pn); + /* +* lp must be set before setting _Q_SLOW_VAL +* +* [S] lp = lock[RmW] l = l-locked = 0 +* MB MB +* [S] l-locked = _Q_SLOW_VAL [L] lp +* +* Matches the cmpxchg() in pv_queue_spin_unlock(). +*/ + if (!slow_set + !cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) { + /* +* The lock is free and _Q_SLOW_VAL has never been +* set. Need to clear the hash bucket before getting +* the lock. +*/ + WRITE_ONCE(*lp, NULL); + return; + } else if (slow_set !READ_ONCE(l-locked)) + return; + slow_set = true; I'm somewhat puzzled by the slow_set thing; what is wrong with the thing I had, namely: if (!lp) { lp = pv_hash(lock, pn); /* * comment */ lv = cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL); if (lv != _Q_LOCKED_VAL) { /* we're woken, unhash and return */ WRITE_ONCE(*lp, NULL); return; } } + + pv_wait(l-locked, _Q_SLOW_VAL); If we get a spurious wakeup (due to device interrupts or random kick) we'll loop around but -locked will remain _Q_SLOW_VAL. The purpose of the slow_set flag is not about the lock value. It is to make sure that pv_hash_find() will always find a match. Consider the following scenario: cpu1cpu2cpu3 pv_wait spurious wakeup loop l-locked read _Q_SLOW_VAL pv_hash_find() unlock pv_hash() - same entry cmpxchg(l-locked) clear hash (?) Here, the entry for cpu3 will be removed leading to panic when pv_hash_find() can find the entry. So the hash entry can only be removed if the other cpu has no chance to see _Q_SLOW_VAL. Still confused. Afaict that cannot happen with my code. We only do the cmpxchg(, _Q_SLOW_VAL) _once_. Only on the first time around that loop will we hash the lock and set the slow flag. And cpu3 cannot come in on the same entry because we've not yet released the lock when we find and unhash. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory
If QEMU has called on xc_domain_setmaxmem to add more memory for option ROMs, domain restore needs to also increase the memory. Signed-off-by: Don Slutz dsl...@verizon.com --- To see the hvmloader loader issue: xl cre -p e1000x8.xfg xl save e1000x8 e1000x8.save xl restore e1000x8.save With e1000x8.xfg: --- builder = hvm device_model_args_hvm = [ -monitor, pty, -boot, menu=on, ] device_model_version = qemu-xen disk = [ /dev/etherd/e500.1,,xvda, /dev/etherd/e500.2,,xvdb, /dev/etherd/e500.3,,xvde, /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD1.iso,,hdc,cdrom, /local-isos/iso/centos/x86_64/CentOS-6.3-x86_64-bin-DVD2.iso,,hdd,cdrom, ] maxmem = 8192 memory = 8192 name = e1000x8 serial = pty tsc_mode = native_paravirt uuid = e5000105-3d83-c962-07ae-2bc46c3644a0 videoram = 16 vif = [ model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:a0, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:aa, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:b4, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:be, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:c8, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:d2, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:dc, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:e6, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:f0, model=e1000,bridge=xenbr0,mac=00:0c:29:86:44:fa, ] viridian = 0 xen_platform_pci = 1 mmio_hole = 2048 vcpus = 1 maxvcpus = 6 on_poweroff = preserve on_reboot = preserve on_watchdog = preserve on_crash = preserve --- tools/libxc/xc_domain_restore.c | 75 +++-- 1 file changed, 73 insertions(+), 2 deletions(-) diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c index 2ab9f46..28b4fa6 100644 --- a/tools/libxc/xc_domain_restore.c +++ b/tools/libxc/xc_domain_restore.c @@ -47,6 +47,13 @@ #include xen/hvm/ioreq.h #include xen/hvm/params.h +/* Leave some slack so that hvmloader does not complain about lack of + * memory at boot time (Could not allocate order=0 extent). + * Once hvmloader is modified to cope with that situation without + * printing warning messages, QEMU_SPARE_PAGES can be removed. + */ +#define QEMU_SPARE_PAGES 16 + struct restore_ctx { unsigned long max_mfn; /* max mfn of the current host machine */ unsigned long hvirt_start; /* virtual starting address of the hypervisor */ @@ -209,12 +216,44 @@ static int uncanonicalize_pagetable( if (!ctx-hvm ctx-superpages) rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns); else +{ +xc_domaininfo_t info; +unsigned long free_pages; + +if ((xc_domain_getinfolist(xch, dom, 1, info) != 1) || +(info.domain != dom)) { +ERROR(Failed xc_domain_getinfolist for batch (uncanonicalize_pagetable)\n); +errno = ENOMEM; +return 0; +} +free_pages = info.max_pages - info.tot_pages; +if (free_pages QEMU_SPARE_PAGES) { +free_pages -= QEMU_SPARE_PAGES; +} else { +free_pages = 0; +} +if (free_pages nr_mfns) +{ +DPRINTF(Adjust memory for batch (uncanonicalize_pagetable): free_pages=%lu nr_mfns=%d max_pages=%lu tot_pages=%lu max_mfn=%lu\n, +free_pages, nr_mfns, (unsigned long)info.max_pages, +(unsigned long)info.tot_pages, ctx-max_mfn); +if (xc_domain_setmaxmem(xch, dom, +((info.max_pages + nr_mfns - free_pages) + (XC_PAGE_SHIFT - 10))) 0) +{ +ERROR(Failed xc_domain_setmaxmem for batch (uncanonicalize_pagetable)\n); +errno = ENOMEM; +return 0; +} +} rc = xc_domain_populate_physmap_exact(xch, dom, nr_mfns, 0, 0, ctx-p2m_batch); +} if (rc) { -ERROR(Failed to allocate memory for batch.!\n); +ERROR(Failed to allocate memory for batch. rc=%d nr_mfns=%d!\n, + rc, nr_mfns); errno = ENOMEM; return 0; } @@ -1241,12 +1280,44 @@ static int apply_batch(xc_interface *xch, uint32_t dom, struct restore_ctx *ctx, if (!ctx-hvm ctx-superpages) rc = alloc_superpage_mfns(xch, dom, ctx, nr_mfns); else +{ +xc_domaininfo_t info; +unsigned long free_pages; + +if ((xc_domain_getinfolist(xch, dom, 1, info) != 1) || +(info.domain != dom)) { +ERROR(Failed xc_domain_getinfolist for apply_batch\n); +errno = ENOMEM; +
Re: [Xen-devel] [PATCH 1/1] x86: Set regs-entry_vector for early_page_fault
On 13/04/15 17:04, Don Slutz wrote: This changes: (XEN) Early fatal page fault at e008:82d080164252 (cr2=, ec=) (XEN) [ Xen-4.6-unstable x86_64 debug=y Not tainted ] (XEN) CPU:0 (XEN) RIP:e008:[82d080164252] arch_domain_create+0x3e/0x4ef ... (XEN) Xen call trace: (XEN)[82d080164252] arch_domain_create+0x3e/0x4ef (XEN)[82d080105262] domain_create+0x384/0x556 (XEN)[82d0802a0de4] scheduler_init+0x1c4/0x244 (XEN)[82d0802be359] __start_xen+0x1d0e/0x22a1 (XEN)[82d080100067] __high_start+0x53/0x58 (XEN) (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) FATAL TRAP: vector = 0 (divide error) (XEN) [error_code=] , IN INTERRUPT CONTEXT (XEN) ... to: (XEN) Early fatal page fault at e008:82d080164252 (cr2=, ec=) (XEN) [ Xen-4.6-unstable x86_64 debug=y Not tainted ] (XEN) CPU:0 (XEN) RIP:e008:[82d080164252] arch_domain_create+0x3e/0x4ef ... (XEN) Xen call trace: (XEN)[82d080164252] arch_domain_create+0x3e/0x4ef (XEN)[82d080105262] domain_create+0x384/0x556 (XEN)[82d0802a0de4] scheduler_init+0x1c4/0x244 (XEN)[82d0802be359] __start_xen+0x1d0e/0x22a1 (XEN)[82d080100067] __high_start+0x53/0x58 (XEN) (XEN) Faulting linear address: (XEN) Pagetable walk from : (XEN) L4[0x000] = 00083a1a6063 (XEN) L3[0x000] = 00083a1a5063 (XEN) L2[0x000] = 00083a1a4063 (XEN) L1[0x000] = (XEN) (XEN) (XEN) Panic on CPU 0: (XEN) FATAL TRAP: vector = 14 (page fault) (XEN) [error_code=] , IN INTERRUPT CONTEXT (XEN) ... Signed-off-by: Don Slutz dsl...@verizon.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- xen/arch/x86/x86_64/entry.S | 1 + 1 file changed, 1 insertion(+) diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S index 2d25d57..7e63e64 100644 --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -626,6 +626,7 @@ ENTRY(double_fault) .pushsection .init.text, ax, @progbits ENTRY(early_page_fault) +movl $TRAP_page_fault,4(%rsp) SAVE_ALL movq %rsp,%rdi call do_early_page_fault ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On 04/13/2015 11:09 AM, Peter Zijlstra wrote: On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote: +__visible void __pv_queue_spin_unlock(struct qspinlock *lock) +{ + struct __qspinlock *l = (void *)lock; + struct pv_node *node; + + if (likely(cmpxchg(l-locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL)) + return; + + /* +* The queue head has been halted. Need to locate it and wake it up. +*/ + node = pv_hash_find(lock); + smp_store_release(l-locked, 0); Ah yes, clever that. + /* +* At this point the memory pointed at by lock can be freed/reused, +* however we can still use the PV node to kick the CPU. +*/ + if (READ_ONCE(node-state) == vcpu_halted) + pv_kick(node-cpu); +} +PV_CALLEE_SAVE_REGS_THUNK(__pv_queue_spin_unlock); However I feel the PV_CALLEE_SAVE_REGS_THUNK thing belongs in the x86 code. That is why I originally put my version of the qspinlock_paravirt.h header file under arch/x86/include/asm. Maybe we should move it back there. Putting the thunk in arch/x86/kernel/kvm.c didn't work when you consider that the Xen code also need that. Well the function is 'generic' and belong here I think. Its just the PV_CALLEE_SAVE_REGS_THUNK thing that arch specific. Should have live in arch/x86/kernel/paravirt-spinlocks.c instead? Another alternative is to put the PV_CALLEE_SAVE_REGS_THUNK into a separate asm/qspinlock_paravirt.h file and included in the generic qspinlock_paravirt.h file. By putting the thunk with the __pv_queue_spin_unlock together in the same file, we can make the thunk and the unlock fast path (_Q_SLOW_VAL not set) go into two adjacent instruction cachelines. I think that may help a bit in term of performance. Cheers, Longman ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/8] Add core.sh and wrapper function
On Mon, 13 Apr 2015, George Dunlap wrote: On 04/10/2015 03:29 PM, Stefano Stabellini wrote: +arg_parse_cmd=\ +local -a args; +local _a; +local _vn; +local _m; + +_m=true; + +for _a in \\$@\ ; do +false echo \Evaluating \${_a} [[ \\${_a/=}\ = \\${_a}\ ]]\; +if \$_m [[ \\${_a/=}\ != \\${_a}\ ]] ; then +false echo Parameter; +_vn=\${_a%%=*}; +eval \local \$_vn\; +eval \\$_a\; +elif \$_m [[ \\${_a}\ == \--\ ]] ; then +false echo Separator; +_m=false; +else +false echo Argument; +_m=false; +args+=(\\$_a\); +fi; +done I am sorry but I cannot bear myself to introduce so much complexity into the world. raisin is supposed to be easy to read. If we end up actually needing something so powerful and so complex in the future we can import it then, but I think that at the moment we can do nicely without it. I can certainly understand the sentiment, but the idea behind this calling convention is actually to make it *easier* both to read and debug most of the source file, at the cost of two very complex, but well-tested and generic macros ($arg_parse and $requireargs). Consider some criteria: - Difficulty of writing or reading code that calls a function - Difficulty of writing a function which will be called - Availability of sensible defaults - Ability to make sure all necessary arguments have been specified - Passing function arguments into sub-functions - Global variables are nasty to debug Consider alternate ways of passing arguments into functions: 1. Positional i.e., cp SRC DEST In this model, it's easy to make sure that all necessary arguments have been specified; and there are no nasty global variables to work with. However is really hard to program and read because you have to remember what argument goes where. You also have to specify all possible arguments -- you can't have a default. And if you need to take an argument and pass it to another function you call, you have to repeat it again. 2. Option i.e., -v --components='a b' This is easier to read if you use long options (or common ones like -v or -y), and you can also have defaults. Additionally, you can have a certain amount of error checking -- if you get an argument you don't recognize you can throw an error. But you have to write a lot of boilerplate option processing at the top of each function; and if you have anything a bit complex (line --components= above), it makes it a lot harder to read. And you have to write new code every time, giving you more opportunity for bugs. Consider the argument parsing code you have in raise at the moment -- it assumes that you will run a command without any additional parameters. What if you wanted to do something like this: raise -v build xen The current code would have to be refactored. And you have to write a lot of custom code to check to make sure that all the required arguments have been passed in. And you have the same problem that if you need to pass an argument further down the call stack, you have to repeat it again. 3. Global variables i.e., VAR=foo function This is a bit easier to read in the sense that you're using full variable names for options. It's easier to write, since you don't have any parsing code. Additionally, variables can be passed further down the calling stack automatically without having to be explicitly copied around. In theory you should be able to have sensible local defaults. But you have all the problems with global variables being really hard to debug. 4. Prefix variable i.e., YES=y check-builddep This is a bit easier to read in the sense that you're using full variable names for options. It's easier to write, since you don't have any parsing code. Additionally, variables can be passed further down the calling stack automatically without having to be explicitly copied around. In theory you should be able to have sensible local defaults. I hadn't thought about this as a calling convention. It certainly has the advantage that it's build into bash and fairly well-understood by experienced users. But I do think it's a bit unnatural to have to prefix all the variables to the function. 5. VAR=VAL automatic parsing with inheritance This allows you to have a rich, consistent way to pass in arguments that are descriptive. It's easy to write code to parse the arguments and check to make sure you have all the ones you need: Just add $arg_parse and $requireargs at the top of your function. They are automatically declared local, so there's no worry about polluting the global namespace. They are automatically inherited, so you don't have to keep passing information further down the stack. And you can have sensible defaults. I think #5 is well worth the little bit of macro hackery confined to a
Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory
On 13/04/15 17:09, Don Slutz wrote: If QEMU has called on xc_domain_setmaxmem to add more memory for option ROMs, domain restore needs to also increase the memory. Signed-off-by: Don Slutz dsl...@verizon.com hvmloader has no interaction with xc_domain_restore(). It is xl's job to propagate the current memory as part of migration. If qemu has had to bump it up, this should be reflected in the domain config file. The migration path should not be hacked up to fix a higher level toolstack bug. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On 04/13/2015 10:47 AM, Peter Zijlstra wrote: On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote: +void __init __pv_init_lock_hash(void) +{ + int pv_hash_size = 4 * num_possible_cpus(); + + if (pv_hash_size (1U LFSR_MIN_BITS)) + pv_hash_size = (1U LFSR_MIN_BITS); + /* +* Allocate space from bootmem which should be page-size aligned +* and hence cacheline aligned. +*/ + pv_lock_hash = alloc_large_system_hash(PV qspinlock, + sizeof(struct pv_hash_bucket), + pv_hash_size, 0, HASH_EARLY, + pv_lock_hash_bits, NULL, + pv_hash_size, pv_hash_size); pv_taps = lfsr_taps(pv_lock_hash_bits); I don't understand what you meant here. Let me explain (even though I propose taking all the LFSR stuff out). pv_lock_hash_bit is a runtime variable, therefore it cannot compile time evaluate the forest of if statements required to compute the taps value. Therefore its best to compute the taps _once_, and this seems like a good place to do so. OK, I got it. That make sense. + goto done; + } + } + + hash = lfsr(hash, pv_lock_hash_bits, 0); Since pv_lock_hash_bits is a variable, you end up running through that massive if() forest to find the corresponding tap every single time. It cannot compile-time optimize it. The minimum bits size is now 8. So unless the system has more than 64 vCPUs, it will get the right value in the first if statement. Still, no reason to not pre-compute the taps value, its simple enough. Still, we need to keep the hash_bits value as it will needed by the hashing function. I have taken out the lfsr code and use linear probing in the updated qspinlock patch that I am working on. However, we can always add that back in as an additional patch. Cheers, Longman ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 1/2] x86/domctl: cleanup
* latch curr/currd once at start * drop redundant ret = 0 and braces * use copyback = 1 when appropriate * move break statements inside case-specific braced scopes * don't bother check for NULL before calling xfree() * eliminate trailing whitespace * Xen style corrections Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com --- v2: Fix use of copyback in XEN_DOMCTL_gettscinfo --- xen/arch/x86/domctl.c | 274 - 1 file changed, 111 insertions(+), 163 deletions(-) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index 9450795..b6df23a 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -1,6 +1,6 @@ /** * Arch-specific domctl.c - * + * * Copyright (c) 2002-2006, K A Fraser */ @@ -39,7 +39,7 @@ static int gdbsx_guest_mem_io( domid_t domid, struct xen_domctl_gdbsx_memio *iop) -{ +{ ulong l_uva = (ulong)iop-uva; iop-remain = dbg_rw_mem( (dbgva_t)iop-gva, (dbgbyte_t *)l_uva, iop-len, domid, @@ -53,6 +53,8 @@ long arch_do_domctl( struct xen_domctl *domctl, struct domain *d, XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) { +struct vcpu *curr = current; +struct domain *currd = curr-domain; long ret = 0; bool_t copyback = 0; unsigned long i; @@ -61,15 +63,13 @@ long arch_do_domctl( { case XEN_DOMCTL_shadow_op: -{ ret = paging_domctl(d, domctl-u.shadow_op, guest_handle_cast(u_domctl, void), 0); if ( ret == -ERESTART ) return hypercall_create_continuation(__HYPERVISOR_arch_1, h, u_domctl); copyback = 1; -} -break; +break; case XEN_DOMCTL_ioport_permission: { @@ -79,8 +79,7 @@ long arch_do_domctl( if ( (fp + np) = fp || (fp + np) MAX_IOPORTS ) ret = -EINVAL; -else if ( !ioports_access_permitted(current-domain, -fp, fp + np - 1) || +else if ( !ioports_access_permitted(currd, fp, fp + np - 1) || xsm_ioport_permission(XSM_HOOK, d, fp, fp + np - 1, allow) ) ret = -EPERM; else if ( allow ) @@ -89,8 +88,8 @@ long arch_do_domctl( ret = ioports_deny_access(d, fp, fp + np - 1); if ( !ret ) memory_type_changed(d); +break; } -break; case XEN_DOMCTL_getpageframeinfo: { @@ -127,16 +126,16 @@ long arch_do_domctl( break; } } - + put_page(page); } copyback = 1; +break; } -break; case XEN_DOMCTL_getpageframeinfo3: -if (!has_32bit_shinfo(current-domain)) +if ( !has_32bit_shinfo(currd) ) { unsigned int n, j; unsigned int num = domctl-u.getpageframeinfo3.num; @@ -150,7 +149,7 @@ long arch_do_domctl( break; } -page = alloc_domheap_page(current-domain, MEMF_no_owner); +page = alloc_domheap_page(currd, MEMF_no_owner); if ( !page ) { ret = -ENOMEM; @@ -251,7 +250,7 @@ long arch_do_domctl( ret = -ENOMEM; break; } - + ret = 0; for ( n = 0; n num; ) { @@ -266,9 +265,9 @@ long arch_do_domctl( ret = -EFAULT; break; } - + for ( j = 0; j k; j++ ) -{ +{ struct page_info *page; unsigned long gfn = arr32[j]; @@ -320,8 +319,8 @@ long arch_do_domctl( } free_xenheap_page(arr32); +break; } -break; case XEN_DOMCTL_getmemlist: { @@ -329,7 +328,8 @@ long arch_do_domctl( uint64_t mfn; struct page_info *page; -if ( unlikely(d-is_dying) ) { +if ( unlikely(d-is_dying) ) +{ ret = -EINVAL; break; } @@ -346,7 +346,7 @@ long arch_do_domctl( * rather than trying to fix it we restrict it for the time being. */ if ( /* No nested locks inside copy_to_guest_offset(). */ - paging_mode_external(current-domain) || + paging_mode_external(currd) || /* Arbitrary limit capping processing time. */ max_pfns GB(4) / PAGE_SIZE ) { @@ -375,8 +375,8 @@ long arch_do_domctl( domctl-u.getmemlist.num_pfns = i; copyback = 1; +break; } -break; case XEN_DOMCTL_hypercall_init: { @@ -403,15 +403,15 @@ long arch_do_domctl( unmap_domain_page(hypercall_page);
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On 04/13/2015 11:08 AM, Peter Zijlstra wrote: On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote: +static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node) +{ + struct __qspinlock *l = (void *)lock; + struct qspinlock **lp = NULL; + struct pv_node *pn = (struct pv_node *)node; + int slow_set = false; + int loop; + + for (;;) { + for (loop = SPIN_THRESHOLD; loop; loop--) { + if (!READ_ONCE(l-locked)) + return; + + cpu_relax(); + } + + WRITE_ONCE(pn-state, vcpu_halted); + if (!lp) + lp = pv_hash(lock, pn); + /* +* lp must be set before setting _Q_SLOW_VAL +* +* [S] lp = lock[RmW] l = l-locked = 0 +* MB MB +* [S] l-locked = _Q_SLOW_VAL [L] lp +* +* Matches the cmpxchg() in pv_queue_spin_unlock(). +*/ + if (!slow_set + !cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) { + /* +* The lock is free and _Q_SLOW_VAL has never been +* set. Need to clear the hash bucket before getting +* the lock. +*/ + WRITE_ONCE(*lp, NULL); + return; + } else if (slow_set !READ_ONCE(l-locked)) + return; + slow_set = true; I'm somewhat puzzled by the slow_set thing; what is wrong with the thing I had, namely: if (!lp) { lp = pv_hash(lock, pn); /* * comment */ lv = cmpxchg(l-locked, _Q_LOCKED_VAL, _Q_SLOW_VAL); if (lv != _Q_LOCKED_VAL) { /* we're woken, unhash and return */ WRITE_ONCE(*lp, NULL); return; } } + + pv_wait(l-locked, _Q_SLOW_VAL); If we get a spurious wakeup (due to device interrupts or random kick) we'll loop around but -locked will remain _Q_SLOW_VAL. The purpose of the slow_set flag is not about the lock value. It is to make sure that pv_hash_find() will always find a match. Consider the following scenario: cpu1cpu2cpu3 pv_wait spurious wakeup loop l-locked read _Q_SLOW_VAL pv_hash_find() unlock pv_hash()- same entry cmpxchg(l-locked) clear hash (?) Here, the entry for cpu3 will be removed leading to panic when pv_hash_find() can find the entry. So the hash entry can only be removed if the other cpu has no chance to see _Q_SLOW_VAL. Still confused. Afaict that cannot happen with my code. We only do the cmpxchg(, _Q_SLOW_VAL) _once_. Only on the first time around that loop will we hash the lock and set the slow flag. And cpu3 cannot come in on the same entry because we've not yet released the lock when we find and unhash. Maybe I am not clear in my illustration. More than one locks can be hashed to the same value. So cpu3 is accessing a different lock which has the same hashed value as the lock used by cpu1 and cpu2. Anyway, I remove the slow_set flag by unrolling the retry loop so that after pv_wait(), it goes into the 2nd loop instead of going back to the top. As a result, cmpxchg and pv_hash can only be called once. Cheers, Longman ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 1/5] sysctl: Make XEN_SYSCTL_numainfo a little more efficient
On 07.04.15 at 18:57, boris.ostrov...@oracle.com wrote: On 04/07/2015 12:04 PM, Andrew Cooper wrote: On 06/04/15 23:12, Boris Ostrovsky wrote: A number of changes to XEN_SYSCTL_numainfo interface: * Make sysctl NUMA topology query use fewer copies by combining some fields into a single structure and copying distances for each node in a single copy. * NULL meminfo and distance handles are a request for maximum number of nodes (num_nodes). If those handles are valid and num_nodes is is smaller than the number of nodes in the system then -ENOBUFS is returned (and correct num_nodes is provided) * Instead of using max_node_index for passing number of nodes keep this value in num_nodes: almost all uses of max_node_index required adding or subtracting one to eventually get to number of nodes anyway. * Replace INVALID_NUMAINFO_ID with XEN_INVALID_MEM_SZ and add XEN_INVALID_NODE_DIST. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Acked-by: Ian Campbell ian.campb...@citrix.com This subtly changes the behaviour of XEN_SYSCTL_numainfo with regards to NULL guest handles. Previously, a caller was able to select which information they wanted by choosing which guest handles were non-NULL. With the new semantics, the caller must pass both ni-meminfo and ni-distance to get either bit of information. Each copy_to_guest_offset() should be gated on a !guest_handle_is_null() so a caller can request meminfo information without distance information. Currently the caller, in fact, can have either of three pointers (node_to_memsize, node_to_memfree or node_to_node_distance) as NULL and the hypervisor will fill whichever pointer is valid. Because I put the first two together into a struct we are already changing behavior in that regard. Not to mention that having all three as NULL now has new meaning as well. I thought that either both pointers should be valid or neither. If people disagree I can change this. I agree with Andrew's view, fwiw. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/1] x86/hvm: prevent hvm_free_ioreq_gmfn() clobber of arbitrary memory
This will prevent a hard to track down bug. It is related to commit ffdb781883abd3215287ba1b1853f3d437d1240c x86/hvm: prevent gcc uninitialised var warning Which will preset gmfn to ~0UL. This code will check if there is a path where bufioreq_pfn is passed to hvm_free_ioreq_gmfn() and it is uninitialised, the BUG_ON will report it. Reported-by: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Don Slutz dsl...@verizon.com --- xen/arch/x86/hvm/hvm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index ade99c0..0abac7c 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -536,8 +536,9 @@ static int hvm_alloc_ioreq_gmfn(struct domain *d, unsigned long *gmfn) static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn) { -unsigned int i = gmfn - d-arch.hvm_domain.ioreq_gmfn.base; +unsigned long i = gmfn - d-arch.hvm_domain.ioreq_gmfn.base; +BUG_ON(i = sizeof(d-arch.hvm_domain.ioreq_gmfn.mask) * 8); clear_bit(i, d-arch.hvm_domain.ioreq_gmfn.mask); } -- 1.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel