Re: [Xen-devel] [PATCH v5 4/4] MAINTAINERS: support for xen-access
On Thu, Jul 9, 2015 at 9:17 AM, Razvan Cojocaru rcojoc...@bitdefender.com wrote:

On 07/09/2015 04:14 PM, Tamas K Lengyel wrote:

Add tools/tests/xen-access to the supported list under VM EVENT/MEM ACCESS.

Signed-off-by: Tamas K Lengyel tleng...@novetta.com
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1e74688..dae0aa3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -389,6 +389,7 @@
 F: xen/common/mem_access.c
 F: xen/arch/x86/hvm/event.c
 F: xen/arch/x86/monitor.c
 F: xen/arch/x86/vm_event.c
+F: tools/tests/xen-access

This seems to use the wrong indentation type; it doesn't align with the rest. With that fixed:

Acked-by: Razvan Cojocaru rcojoc...@bitdefender.com

Cheers,
Razvan

Ah what, it's the only spot where it's not using whitespace for indentation? =) Nice catch.

Tamas

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PCI Pass-through in Xen ARM - Draft 2.
On 09/07/2015 12:30, Manish Jaggi wrote:
On Thursday 09 July 2015 01:38 PM, Julien Grall wrote:
On 09/07/2015 08:13, Manish Jaggi wrote:

If this was a domctl there might be scope for accepting an implementation which made assumptions such as sbdf == deviceid. However I'd still like to see this topic given proper treatment in the design and not just glossed over with "this is how ThunderX does things".

I got your point.

Or maybe the solution is simple and we should just do it now -- i.e. can we add a new field to the PHYSDEVOP_pci_host_bridge_add argument struct which contains the base deviceid for that bridge?

deviceId would be the same as sbdf, as we don't have a way to translate sbdf to deviceID.

I think we have to be clear in this design document about the different meanings. When the Device Tree is used, it's assumed that the deviceID will be equal to the requester ID and not the sbdf.

Does SMMUv2 have a concept of requester ID? I see the requester ID term in SMMUv3.

The requester ID is part of the PCI spec and not the SMMU. The SMMUv2 spec doesn't mention anything about PCI; I suspect this is because the spec was written before PCI support was introduced, and therefore this is implementation defined. Linux provides a function (pci_for_each_dma_alias) which will return a requester ID for a given PCI device. It appears that the BDF (the 's' of sBDF is only internal to Linux and not part of the hardware) is equal to the requester ID on your platform, but we can't assume it for anyone else.

So you mean requesterID = pci_for_each_dma_alias(sbdf)?

Yes. When we have a PCI device in hand, we have to find the requester ID for this device.

That is the question: how to map requesterID to sbdf?

See above. Once we have it we can deduce the streamID and the deviceID.
The way to do it will depend on whether we use device tree or ACPI:
- For device tree, the streamID and deviceID will be equal to the requester ID.

What do you think should be the streamID when a device is a PCI EP and is enumerated? Also, per the ARM SMMU 2.0 spec, StreamID is implementation specific. As per the SMMUv3 spec: "For PCI, it is intended that StreamID is generated from the PCI RequesterID. The generation function may be 1:1 where one Root Complex is hosted by one SMMU."

I think my sentence "For device tree, the streamID and deviceID will be equal to the requester ID" is pretty clear. FWIW, this is the solution chosen for Linux: "Assume Stream ID == Requester ID for now. We need a way to describe the ID mappings in FDT" (see arm_smmu_add_pci_device in drivers/iommu/arm-smmu.c).

You can refer to my point below about ACPI tables. The solution would be exactly the same. If we have a requester ID in hand we can do pretty much everything. The whole point of my previous email was to give insight about what we need and what we can deduce based on firmware tables. I didn't cover any implementation details.

Regards,

--
Julien Grall
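The identity mapping the thread converges on (requester ID derived from the BDF, then streamID == deviceID == requester ID under device tree) can be sketched as standalone C. This is an illustration only, not Xen or Linux code: the helper names are invented here, and the BDF-to-requester-ID identity holds only when no bridge aliasing occurs (the real derivation is what pci_for_each_dma_alias walks).

```c
#include <stdint.h>

/*
 * Illustrative sketch only.  Packs bus/dev/fn into the conventional
 * 16-bit PCI requester ID.  The "s" (segment) of sBDF is software-only
 * and deliberately not part of the hardware ID.
 */
typedef uint16_t requester_id_t;

static requester_id_t pci_bdf_to_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    /* Identity mapping: valid only when no PCI bridge aliases the ID. */
    return (requester_id_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Device-tree assumption from the thread: streamID == requester ID. */
static uint32_t rid_to_stream_id(requester_id_t rid)
{
    return rid;
}

/* Likewise: deviceID (for the ITS) == requester ID. */
static uint32_t rid_to_device_id(requester_id_t rid)
{
    return rid;
}
```

With ACPI the generation function may differ per root complex, which is exactly why the document needs to spell the mapping out rather than assume 1:1.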
Re: [Xen-devel] [PATCH v3 07/13] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.
On 01.07.15 at 20:09, edmund.h.wh...@intel.com wrote:

@@ -1830,6 +1831,20 @@ static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
     vmx_vmcs_exit(v);
 }

+static int vmx_vcpu_emulate_vmfunc(struct cpu_user_regs *regs)
+{
+    int rc = X86EMUL_EXCEPTION;
+    struct vcpu *v = current;

curr

+    if ( !cpu_has_vmx_vmfunc && altp2m_active(v->domain) &&
+         regs->eax == 0 &&
+         p2m_switch_vcpu_altp2m_by_id(v, (uint16_t)regs->ecx) )
+    {
+        rc = X86EMUL_OKAY;
+    }

Pointless braces.

@@ -2095,6 +2112,12 @@ static void vmx_invlpg_intercept(unsigned long vaddr)
     vpid_sync_vcpu_gva(curr, vaddr);
 }

+static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
+{
+    gdprintk(XENLOG_ERR, "Failed guest VMFUNC execution\n");
+    return X86EMUL_EXCEPTION;
+}
+
 static int vmx_cr_access(unsigned long exit_qualification)
 {
     struct vcpu *curr = current;

@@ -3245,6 +3268,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         update_guest_eip();
         break;

+    case EXIT_REASON_VMFUNC:
+        if ( vmx_vmfunc_intercept(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        else
+            hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        break;

The two changes don't fit together (and continue to look pointless considering that the helper unconditionally returns X86EMUL_EXCEPTION): != X86EMUL_OKAY doesn't mean == X86EMUL_EXCEPTION.
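Jan's last point — that "not OKAY" is a strictly weaker condition than "EXCEPTION" — can be shown with a tiny standalone mock. The enum values mirror the spirit of x86_emulate.h but the handler below is invented for illustration, not the real intercept:

```c
/*
 * Sketch: the emulator return set has more than two members, so a
 * caller that wants to inject #UD only on EXCEPTION must test for
 * that value explicitly.  Mock code, not Xen's.
 */
enum x86emul_rc {
    X86EMUL_OKAY,
    X86EMUL_UNHANDLEABLE,
    X86EMUL_EXCEPTION,
    X86EMUL_RETRY,
};

/* Hypothetical handler that fails, but not with an exception. */
static enum x86emul_rc mock_vmfunc_intercept(void)
{
    return X86EMUL_RETRY;
}

/* Returns 1 iff #UD should be injected. */
static int should_inject_ud(enum x86emul_rc rc)
{
    /* Explicit test -- merely rc != X86EMUL_OKAY would be wrong here. */
    return rc == X86EMUL_EXCEPTION;
}
```

The mock returns RETRY: it is "not OKAY", yet injecting #UD for it would be a bug — which is exactly why the two hunks above don't fit together.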
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3815,28 +3815,40 @@ x86_emulate(
     case 0x01: /* Grp7 */ {
         struct segment_register reg;
         unsigned long base, limit, cr0, cr0w;
+        uint64_t tsc_aux;

-        if ( modrm == 0xdf ) /* invlpga */
+        switch( modrm )
         {
-            generate_exception_if(!in_protmode(ctxt, ops), EXC_UD, -1);
-            generate_exception_if(!mode_ring0(), EXC_GP, 0);
-            fail_if(ops->invlpg == NULL);
-            if ( (rc = ops->invlpg(x86_seg_none, truncate_ea(_regs.eax),
-                                   ctxt)) )
-                goto done;
-            break;
-        }
-
-        if ( modrm == 0xf9 ) /* rdtscp */
-        {
-            uint64_t tsc_aux;
-            fail_if(ops->read_msr == NULL);
-            if ( (rc = ops->read_msr(MSR_TSC_AUX, &tsc_aux, ctxt)) != 0 )
-                goto done;
-            _regs.ecx = (uint32_t)tsc_aux;
-            goto rdtsc;
+        case 0xdf: /* invlpga AMD */

Case labels indented the same as the containing switch() please.

+        case 0xd4: /* vmfunc */
+            generate_exception_if(
+                (lock_prefix |
+                 rep_prefix() |
+                 (vex.pfx == vex_66)),
+                EXC_UD, -1);

FWIW, while Andrew pointed out that this doesn't match the doc, I suppose it's the doc that's wrong here, so I would be inclined to suggest keeping it as is. What I don't like though is the formatting - why does this need to span across 5 lines?

+            fail_if(ops->vmfunc == NULL);
+            if ( (rc = ops->vmfunc(ctxt) != X86EMUL_OKAY) )
+                goto done;
+            break;
+        default:
+            goto continue_grp7;
         }
+        break;
+    continue_grp7:

Labels indented by at least one space please.

Jan
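The dispatch shape Jan is asking for — case labels at the same indentation level as the switch, and the continuation label indented by at least one space — can be mocked in a few self-contained lines. This is a layout illustration only; the function name and return values are invented, not the emulator's:

```c
/*
 * Mock of the Grp7 opcode dispatch, showing Xen coding style:
 * case labels align with "switch", the label after the switch is
 * indented by one space, and the default case falls through to the
 * remaining (non-switch-based) Grp7 handling via goto.
 */
static int grp7_dispatch(unsigned int modrm)
{
    switch ( modrm )
    {
    case 0xdf: /* invlpga */
        return 1;
    case 0xd4: /* vmfunc */
        return 2;
    default:
        goto continue_grp7;
    }

 continue_grp7:
    /* Remaining modrm values (e.g. 0xf9 rdtscp) handled elsewhere. */
    return 0;
}
```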
Re: [Xen-devel] [xen-unstable test] 59207: regressions - FAIL
osstest service owner writes ([xen-unstable test] 59207: regressions - FAIL): flight 59207 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59207/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: ... test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot \ fail REGR. vs. 58965 Xen crash in set_cpu_sibling_map AFAICT. osstest reports failures in PV jobs too. ARM seems unaffected. Ian. Jul 9 08:38:43.197031 (XEN) If possible, disable SATA Combined mode in BIOS or contact your vendor for BIOS update. Jul 9 08:38:43.205026 (XEN) AMD-Vi: Error initialization Jul 9 08:38:43.213017 (XEN) I/O virtualisation disabled Jul 9 08:38:43.221011 (XEN) [ Xen-4.6-unstable x86_64 debug=y Not tainted ] Jul 9 08:38:43.221048 (XEN) CPU:0 Jul 9 08:38:43.229024 (XEN) RIP:e008:[82d08019c044] set_cpu_sibling_map+0x50/0x391 Jul 9 08:38:43.237034 (XEN) RFLAGS: 00010293 CONTEXT: hypervisor Jul 9 08:38:43.245023 (XEN) rax: rbx: 82d08034cf40 rcx: Jul 9 08:38:43.253022 (XEN) rdx: 0001 rsi: rdi: Jul 9 08:38:43.261021 (XEN) rbp: 82d080307e28 rsp: 82d080307dd8 r8: 830219028930 Jul 9 08:38:43.269024 (XEN) r9: r10: 0004 r11: 82d0803140d0 Jul 9 08:38:43.277033 (XEN) r12: 83021903afe0 r13: 82d08034cec0 r14: Jul 9 08:38:43.285020 (XEN) r15: 0005 cr0: 8005003b cr4: 000406e0 Jul 9 08:38:43.293007 (XEN) cr3: dfcb4000 cr2: Jul 9 08:38:43.293040 (XEN) ds: es: fs: gs: ss: cs: e008 Jul 9 08:38:43.301037 (XEN) Xen stack trace from rsp=82d080307dd8: Jul 9 08:38:43.309032 (XEN)82d080307e28 0020 83021903afe0 82d08034cec0 Jul 9 08:38:43.317031 (XEN)82d08034cec0 82d08034cf40 83021903afe0 82d08034cec0 Jul 9 08:38:43.325031 (XEN)82d08034cec0 0005 82d080307e48 82d0802dae3e Jul 9 08:38:43.333029 (XEN)82d08030 82d08030 82d080307f08 82d0802da680 Jul 9 08:38:43.341040 (XEN)0003 00ae2000 0021f000 Jul 9 08:38:43.349033 (XEN)83085d70 0011 00021d02e000 Jul 9 08:38:43.357026 (XEN)0015024d 83085f00 83085fb0 82d5 Jul 9 08:38:43.365037 (XEN)0008 0001006e 
0003 02f8 Jul 9 08:38:43.373026 (XEN) Jul 9 08:38:43.381037 (XEN) 82d080100073 Jul 9 08:38:43.389028 (XEN) Jul 9 08:38:43.397035 (XEN) Jul 9 08:38:43.405039 (XEN) Jul 9 08:38:43.413035 (XEN) Jul 9 08:38:43.421040 (XEN) Jul 9 08:38:43.429040 (XEN) Jul 9 08:38:43.437028 (XEN) 8300dfd4d000 Jul 9 08:38:43.445033 (XEN) Jul 9 08:38:43.453008 (XEN) Xen call trace: Jul 9 08:38:43.453038 (XEN)[82d08019c044] set_cpu_sibling_map+0x50/0x391 Jul 9 08:38:43.461024 (XEN)[82d0802dae3e] smp_prepare_cpus+0x12e/0x254 Jul 9 08:38:43.469122 (XEN)[82d0802da680] __start_xen+0x20b4/0x265c Jul 9 08:38:43.477198 (XEN)[82d080100073] __high_start+0x53/0x58 Jul 9 08:38:43.485020 (XEN) Jul 9 08:38:43.485048 (XEN) Pagetable walk from : Jul 9 08:38:43.493009 (XEN) L4[0x000] = 000219036063 Jul 9 08:38:43.493043 (XEN) L3[0x000] = 000219035063 Jul 9 08:38:43.501045 (XEN) L2[0x000] = 000219034063 Jul 9 08:38:43.509034 (XEN) L1[0x000] = Jul 9 08:38:43.517005 (XEN) Jul 9 08:38:43.517030 (XEN) Jul 9 08:38:43.525030 (XEN) Panic on CPU 0: Jul 9 08:38:43.525060 (XEN) FATAL PAGE FAULT Jul 9 08:38:43.525086 (XEN) [error_code=0002] Jul 9 08:38:43.533038 (XEN) Faulting linear address: Jul 9 08:38:43.541011 (XEN) Jul 9 08:38:43.541043 (XEN) Jul 9 08:38:43.548990 (XEN) Reboot in five seconds... ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling
On Thu, 2015-07-09 at 14:44 +0100, Jan Beulich wrote:
On 09.07.15 at 14:53, george.dun...@eu.citrix.com wrote:

Consider the following scenario:
- v1 blocks on pcpu 0.
- vcpu_runstate_change() will do everything necessary for v1 on p0.
- The scheduler does load balancing and moves v1 to p1, calling vcpu_migrate(). Because the vcpu is still blocked, vcpu_runstate_change() is not called.
- A device interrupt is generated.

What happens to the interrupt? Does everything still work properly, or will the device wake-up interrupt go to the wrong pcpu (p0 rather than p1)?

I think much of this was discussed before, since I also disliked the hooking into vcpu_runstate_change(). What I remember having been told is that it really only matters which pCPU's list a vCPU is on, not what v->processor says.

Right. But, as far as I could understand from the patches I've seen, a vcpu ends up in a list when it blocks, and when it blocks there will be a context switch, and hence we can deal with the queueing during the context switch itself (which is, in part, an arch specific operation already). What am I missing?

Maybe (looking at vmx_pi_desc_update() in patch 14), something that is right now dealt with by means of RUNSTATE_offline? What (if yes)? Can it (if yes) be explained in some non-runstate-based way, so we can judge whether that could also be achieved differently than by actually hooking into vcpu_runstate_change()?

Regards,
Dario

--
"This happens because I choose it to happen!" (Raistlin Majere)
- Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry
On 09.07.15 at 16:00, paul.durr...@citrix.com wrote:

So, the only significant difference I can see is that using a signed value causes an extra cltq instruction to be used. Or maybe I need some specific level of optimization for there to be a big difference?

No, this is the difference I'm trying to point out (and if the index register wasn't %rax, it would be a movslq or whatever its AT&T name is).

Jan
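The cltq/movslq Jan mentions exists because a signed 32-bit index must be sign-extended before 64-bit address arithmetic, whereas a 32-bit unsigned value is zero-extended for free on x86-64. The semantic difference behind that extra instruction can be demonstrated directly (standalone sketch, names invented here):

```c
#include <stdint.h>

/*
 * The two helpers differ only in the index type.  Once the index has
 * its high bit set, the signed version widens to a huge 64-bit offset
 * (sign extension -- the cltq/movslq), while the unsigned version is
 * simply zero-extended.
 */
static uint64_t off_signed(int32_t i)
{
    return (uint64_t)(int64_t)i;   /* sign-extend, then reinterpret */
}

static uint64_t off_unsigned(uint32_t i)
{
    return (uint64_t)i;            /* zero-extend: free on x86-64 */
}
```

For non-negative indices the results coincide, which is why the difference only shows up in the generated code (and in edge cases), not in routine testing.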
Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling
On 07/09/2015 03:18 PM, Dario Faggioli wrote:
On Thu, 2015-07-09 at 14:44 +0100, Jan Beulich wrote:
On 09.07.15 at 14:53, george.dun...@eu.citrix.com wrote:

Consider the following scenario:
- v1 blocks on pcpu 0.
- vcpu_runstate_change() will do everything necessary for v1 on p0.
- The scheduler does load balancing and moves v1 to p1, calling vcpu_migrate(). Because the vcpu is still blocked, vcpu_runstate_change() is not called.
- A device interrupt is generated.

What happens to the interrupt? Does everything still work properly, or will the device wake-up interrupt go to the wrong pcpu (p0 rather than p1)?

I think much of this was discussed before, since I also disliked the hooking into vcpu_runstate_change(). What I remember having been told is that it really only matters which pCPU's list a vCPU is on, not what v->processor says.

Right. But, as far as I could understand from the patches I've seen, a vcpu ends up in a list when it blocks, and when it blocks there will be a context switch, and hence we can deal with the queueing during the context switch itself (which is, in part, an arch specific operation already). What am I missing?

I think what you're missing is that Jan is answering my question about migrating a blocked vcpu, not arguing that vcpu_runstate_change() is the right way to go. At least that's how I understood him. :-)

But regarding context_switch: I think the reason we need more hooks than that is that context_switch only changes into and out of the running state. There are also changes that need to happen when you change from blocked to offline, offline to blocked, blocked to runnable, &c.; these don't go through context_switch. That's why I was suggesting some architectural equivalents to the SCHED_OP() callbacks be added to vcpu_wake &c.
vcpu_runstate_change() is at the moment a nice quiet cul-de-sac that just does a little bit of accounting; I'd rather not have it suddenly become a major thoroughfare for runstate change hooks, if we can avoid it. :-)

-George
Re: [Xen-devel] [PATCH v3 11/13] x86/altp2m: define and implement alternate p2m HVMOP types.
On 01.07.15 at 20:09, edmund.h.wh...@intel.com wrote:

+    case HVMOP_altp2m_set_domain_state:
+    {
+        struct xen_hvm_altp2m_domain_state a;
+        struct domain *d;
+        struct vcpu *v;
+        bool_t ostate;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;

Is it indeed intended for a guest to enable this on itself?

+        rc = -EINVAL;
+        if ( is_hvm_domain(d) && hvm_altp2m_supported() &&
+             !nestedhvm_enabled(d) )
+        {
+            ostate = d->arch.altp2m_active;
+            d->arch.altp2m_active = !!a.state;
+
+            rc = 0;
+
+            /* If the alternate p2m state has changed, handle appropriately */
+            if ( d->arch.altp2m_active != ostate )
+            {
+                if ( ostate || !(rc = p2m_init_altp2m_by_id(d, 0)) )
+                {

Two if()-s like these should be combined into one.

+    case HVMOP_altp2m_vcpu_enable_notify:
+    {
+        struct domain *curr_d = current->domain;
+        struct vcpu *curr = current;

The other way around please.

+    case HVMOP_altp2m_set_mem_access:
+    {
+        struct xen_hvm_altp2m_set_mem_access a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;

Again - is it intended for this to be invokable by the guest for itself? If so, is it being made certain that it can't increase access permissions beyond what its controlling domain dictates?

--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -396,6 +396,75 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
 #endif /* defined(__i386__) || defined(__x86_64__) */

+/* Set/get the altp2m state for a domain */
+#define HVMOP_altp2m_set_domain_state 24
+#define HVMOP_altp2m_get_domain_state 25
+struct xen_hvm_altp2m_domain_state {
+    /* Domain to be updated or queried */
+    domid_t domid;
+    /* IN or OUT variable on/off */
+    uint8_t state;
+};

Not sure if it was said before - explicit padding please (with padding fields verified to be zero for future extensibility).
+struct xen_hvm_altp2m_view {
+    /* Domain to be updated */
+    domid_t domid;
+    /* IN/OUT variable */
+    uint16_t view;

Is it certain that the number of views will never exceed 64k?

Jan
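The explicit-padding convention Jan asks for can be sketched as follows. This is an illustration of the convention, not the final hvm_op.h layout: the struct and checker names are invented, and the error value mimics -EINVAL:

```c
#include <stdint.h>

/*
 * Public-ABI structs should have no compiler-inserted holes: make any
 * padding an explicit field, and have the hypercall handler reject
 * nonzero padding so the bytes can gain meaning in a later ABI
 * revision without breaking old callers.
 */
struct altp2m_domain_state_sketch {
    uint16_t domid;   /* stands in for domid_t */
    uint8_t  state;   /* IN/OUT: on/off */
    uint8_t  pad;     /* explicit; must be zero */
};

static int check_pad_zero(const struct altp2m_domain_state_sketch *a)
{
    return a->pad == 0 ? 0 : -22 /* -EINVAL */;
}
```

Without the explicit `pad`, the compiler would insert an invisible hole after `state` anyway (to round the struct up to 2-byte alignment), but its contents would be undefined and unusable later.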
Re: [Xen-devel] [PATCH v2] x86: correct socket_cpumask allocation
On Thu, Jul 09, 2015 at 10:41:55AM +0100, Jan Beulich wrote:
On 09.07.15 at 10:26, chao.p.p...@linux.intel.com wrote:

@@ -748,8 +758,9 @@ static int cpu_smpboot_alloc(unsigned int cpu)
         goto oom;
     per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);

-    if ( !socket_cpumask[socket] &&
-         !zalloc_cpumask_var(socket_cpumask + socket) )
+    if ( secondary_socket_cpumask == NULL &&
+         (secondary_socket_cpumask = _xzalloc(nr_cpumask_bits / 8,
+                                              sizeof(long))) == NULL )

This is horrible since completely type-unsafe, and correct only because _xmalloc() happens to allocate more space than requested if the size isn't a multiple of MEM_ALIGN. And it makes me realize why on IRC I first suggested xzalloc_array(): that would at least have taken care of that latent bug. And remember that I did _not_ suggest _xzalloc(), but xzalloc(). Taken together I think we should stay with using zalloc_cpumask_var(), and introduce zap_cpumask_var() (storing NULL in the big NR_CPUS case and doing nothing in the small one).

Apart from zap_cpumask_var() there is a need to check whether the cpumask_var_t is NULL as well, which is awkward in the small NR_CPUS case where it exists only to satisfy the compiler.

Should I be overlooking something that still prevents this from building in both cases, the above allocation should be changed to at least be type safe (and I guess I'd rather waste a few bytes here than see you add fragile casts or some such).

So this solution was finally adopted. The new version has already been sent out.

Chao
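Jan's zap_cpumask_var() suggestion can be sketched outside Xen. This is a simplified mock — the real `cpumask_var_t` definitions live in xen/include/xen/cpumask.h, and the `LARGE_NR_CPUS` macro here merely stands in for the real `NR_CPUS > 2 * BITS_PER_LONG` split:

```c
#include <stdlib.h>

#define LARGE_NR_CPUS 1   /* pretend we're in the big-NR_CPUS build */

#if LARGE_NR_CPUS
/* Big NR_CPUS: the mask is a separately allocated object. */
typedef unsigned long *cpumask_var_t;

static int alloc_cpumask_var(cpumask_var_t *m)
{
    *m = calloc(1, sizeof(unsigned long));   /* zeroed, like zalloc_ */
    return *m != NULL;
}

static void free_cpumask_var(cpumask_var_t m)
{
    free(m);
}

/*
 * The helper under discussion: store NULL after freeing, so a later
 * "already allocated?" NULL check on the slot is meaningful.
 */
static void zap_cpumask_var(cpumask_var_t *m)
{
    *m = NULL;
}
#else
/* Small NR_CPUS: the mask is embedded; zapping is a no-op. */
typedef unsigned long cpumask_var_t[1];
static void zap_cpumask_var(cpumask_var_t *m) { (void)m; }
#endif
```

The point of the split is that callers can write one `zap_cpumask_var(&slot);` unconditionally, and the NULL check that guards re-allocation compiles away in the embedded case.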
[Xen-devel] [ovmf test] 59220: regressions - FAIL
flight 59220 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/59220/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemuu-ovmf-amd64 17 guest-start.2 fail REGR. vs. 59179 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 59179 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 59179 version targeted for testing: ovmf 51dc120608129b652a1fdbb3f0d9c9a7f8a4b748 baseline version: ovmf e109e3fec9245e3fa777802d17e00431e349a6bd Last test of basis59179 2015-07-08 01:15:55 Z1 days Testing same since59220 2015-07-08 19:32:31 Z0 days1 attempts People who touched revisions under test: Bruce Cran br...@cran.org.uk Feng Tian feng.t...@intel.com Hess Chen hesheng.c...@intel.com Jeff Fan jeff@intel.com jiaxinwu jiaxin...@intel.com Olivier Martin olivier.mar...@arm.com Qiu Shumin shumin@intel.com Star Zeng star.z...@intel.com Zhang Lubo lubo.zh...@intel.com jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 fail test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-amd64-i386-qemuu-rhel6hvm-intel pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass test-amd64-amd64-xl-qemuu-winxpsp3 pass test-amd64-i386-xl-qemuu-winxpsp3pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. 
are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 355 lines long.)
Re: [Xen-devel] [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

@@ -424,8 +426,22 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
     }
 }

-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(const struct hvm_io_handler *handler,
+                            uint64_t addr, uint32_t size,
+                            uint64_t data)
 {
+    struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
+    ioreq_t p = { .type = IOREQ_TYPE_COPY,
+                  .addr = addr,
+                  .size = size,
+                  .count = 1,
+                  .dir = IOREQ_WRITE,
+                  .data = data,
+    };

Indentation (still - I know I pointed this out on v6, just perhaps at another example). See e.g. the context of the earlier change to the beginning of hvm_mmio_internal() in this patch for how this should look like.

-int stdvga_intercept_mmio(ioreq_t *p)
+static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler,
+                                const ioreq_t *p)
 {
-    struct domain *d = current->domain;
-    struct hvm_hw_stdvga *s = &d->arch.hvm_domain.stdvga;
-    uint64_t start, end, addr = p->addr, count = p->count, size = p->size;
-    int buf = 0, rc;
-
-    if ( unlikely(p->df) )
-    {
-        start = (addr - (count - 1) * size);
-        end = addr + size;
-    }
-    else
-    {
-        start = addr;
-        end = addr + count * size;
-    }
-
-    if ( (start < VGA_MEM_BASE) || (end > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
-        return X86EMUL_UNHANDLEABLE;
-
-    if ( size > 8 )
-    {
-        gdprintk(XENLOG_WARNING, "invalid mmio size %d\n", (int)p->size);
-        return X86EMUL_UNHANDLEABLE;
-    }
+    struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;

     spin_lock(&s->lock);

-    if ( s->stdvga && s->cache )
+    if ( !s->stdvga ||
+         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
+         (hvm_mmio_last_byte(p) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )

If "last" means what it says, you need >= here.

Jan
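Jan's off-by-one point — an *inclusive* last-byte address compared against an *exclusive* upper limit — is worth spelling out. The sketch below is standalone (constants match the conventional legacy VGA window; the helper shape is invented, not the patch's code):

```c
#include <stdint.h>

/*
 * If last_byte is the address of the last byte covered (inclusive),
 * the accept test against the exclusive limit BASE + SIZE must reject
 * last_byte >= BASE + SIZE, i.e. accept only last_byte < BASE + SIZE.
 * Using ">" on the reject side (as in the quoted hunk) wrongly accepts
 * an access whose final byte sits exactly at BASE + SIZE.
 */
#define VGA_MEM_BASE 0xa0000ULL
#define VGA_MEM_SIZE 0x20000ULL

static int vga_range_accept(uint64_t first_byte, uint64_t last_byte)
{
    return first_byte >= VGA_MEM_BASE &&
           last_byte  <  VGA_MEM_BASE + VGA_MEM_SIZE;
}
```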
[Xen-devel] [xen-unstable test] 59207: regressions - FAIL
flight 59207 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59207/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-pvh-amd 6 xen-boot fail REGR. vs. 58965 test-amd64-amd64-xl-xsm 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-xl-xsm6 xen-boot fail REGR. vs. 58965 test-amd64-i386-rumpuserxen-i386 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-qemut-rhel6hvm-amd 6 xen-bootfail REGR. vs. 58965 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-qemuu-rhel6hvm-amd 6 xen-bootfail REGR. vs. 58965 test-amd64-i386-xl-qemut-win7-amd64 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 58965 test-amd64-i386-freebsd10-amd64 6 xen-boot fail REGR. vs. 58965 test-armhf-armhf-xl-multivcpu 15 guest-start/debian.repeat fail REGR. vs. 58965 test-amd64-amd64-xl-qemut-debianhvm-amd64 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-xl-qemut-debianhvm-amd64 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-xl-qemuu-ovmf-amd64 6 xen-boot fail REGR. vs. 58965 test-amd64-amd64-xl-qemuu-win7-amd64 9 windows-install fail REGR. vs. 58965 test-amd64-i386-xl-qemut-winxpsp3 6 xen-boot fail REGR. vs. 58965 test-amd64-amd64-xl-qemuu-winxpsp3 6 xen-bootfail REGR. vs. 58965 Regressions which are regarded as allowable (not blocking): test-amd64-i386-libvirt-xsm 6 xen-boot fail REGR. vs. 58965 test-amd64-i386-libvirt 6 xen-boot fail REGR. vs. 58965 test-amd64-amd64-libvirt 6 xen-boot fail REGR. vs. 58965 test-amd64-amd64-libvirt-xsm 6 xen-boot fail REGR. vs. 58965 test-amd64-amd64-xl-rtds 6 xen-boot fail REGR. vs. 
58965 test-armhf-armhf-xl-rtds 11 guest-start fail like 58965 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58965 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 58965 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 12 migrate-support-checkfail never pass version targeted for testing: xen 89483dc2b3eca29b9b104019bda1883c08150dc9 baseline version: xen c40317f11b3f05e7c06a2213560c8471081f2662 Last test of basis58965 2015-06-29 02:08:30 Z 10 days Failing since 58974 2015-06-29 15:11:59 Z9 days 10 attempts Testing same since59207 2015-07-08 13:02:00 Z1 days1 attempts People who touched revisions under test: Andrew Cooper andrew.coop...@citrix.com Anthony PERARD anthony.per...@citrix.com Ard Biesheuvel a...@linaro.org Boris Ostrovsky boris.ostrov...@oracle.com Chao Peng chao.p.p...@linux.intel.com Chen Baozi baoz...@gmail.com Daniel De Graaf dgde...@tycho.nsa.gov Dario Faggioli dario.faggi...@citrix.com Euan Harris euan.har...@citrix.com Feng Wu feng...@intel.com George Dunlap george.dun...@eu.citrix.com Ian Campbell ian,campb...@citrix.com Ian Campbell ian.campb...@citrix.com Ian Jackson ian.jack...@eu.citrix.com Jan Beulich jbeul...@suse.com Jennifer Herbert jennifer.herb...@citrix.com Juergen Gross jgr...@suse.com Julien Grall julien.gr...@citrix.com Julien Grall julien.gr...@linaro.org Liang Li liang.z...@intel.com Paul Durrant paul.durr...@citrix.com Roger Pau Monné roger@citrix.com Samuel 
Thibault samuel.thiba...@ens-lyon.org Thomas Leonard tal...@gmail.com Tiejun Chen tiejun.c...@intel.com Wei Liu wei.l...@citrix.com Wen Congyang we...@cn.fujitsu.com Yang Zhang yang.z.zh...@intel.com jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64
Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m
On 09.07.15 at 13:49, wei.l...@citrix.com wrote:

As far as I can tell, there are several issues here with regard to this patch series on the hypervisor side:

1. Patch #1 is neither acked nor reviewed.

The patch is obvious and straightforward - the only reason I didn't commit it yet is that it would be dead code right now.

3. Patch #7 has an issue with regard to the instruction emulator.

Those would seem to be easily fixed.

4. Patch #8 in this version (#7 in the previous version, v2) is reviewed, but I'm not sure if the concern raised in v2 went away. Judging from the date of the v3 submission and the date the issue was raised on v2, the issue is still there.

No, the concern did not go away, but George actually had provided a patch to address the concern.

5. Patch #11 is not acked, and because it's an ABI matter we need to be very careful about it.

Indeed. But directions were given, so it should be mostly a mechanical thing.

Question for hypervisor maintainers: how much risk do you think this series imposes? If this feature is off, is the code path more or less the same as before?

It looks like it mostly is, so I don't view the risk as too high.

Jan
[Xen-devel] [PATCH OSSTEST v8 2/2] mg-all-branch-statuses: Show how up to date each branch is
Using report_find_push_age_info allows us to provide counts of attempts since the last baseline on current tip as well as the first attempt of each of those. Since everything serialises on the repo lock I didn't bother trying to parallelise anything. It's a little terse to keep it in 80 chars. cron infrastructure is added to run this in a new $HOME/for-maintjobs.git on a daily basis and push the output to the location given in the Publish config variable, if it is set (it is set in production but not for cambridge). Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: Use new report_find_push_age_info functionality, output condensed. v3: - Correctly quote parameters to printf, so empty ones don't misalign the rest. - Drop dates in favour of number of days, and n/a in those cases - Print Error! if no tip is available and UpToDate if tip==basis. v4: - Simplify $onevar, dropping $dflt, dropping export, subsuming $oneflightvar. - Right justify some stuff - s/UpToDate/same/ - Simplify some pointless or complex shell conditionals v5: - Add cron infrastructure v6: - Propagate cron output to logs (untested apart from bash -n due to lack of local infra). v7: - Use cr-publish-flight-logs. v8: - Output to $c{Results} time with an recently updates set of Repos says: 57.34user 28.28system 5:34.16elapsed 25%CPU (0avgtext+0avgdata 47256maxresident)k 100216inputs+600outputs (673major+2332436minor)pagefaults 0swaps So it's not quick... Example output: Branch BasisTip #Tip #Tot 1stTip 1stNew libvirt d10a5f58 285407940 11 n/a 9 days linux-3.0e1c63f9f 5dba9ddd29 3 days 870 days linux-3.10 b3d78448 same linux-3.14 762167f9 same linux-3.16 26749e75 same linux-3.18 d048c068 ea5dd38e55 3 days 3 days linux-3.4bb4a05a0 cf1b3dad 17 16312 days 213 days linux-4.1b953c0d2 6a010c0a00 n/an/a linux-arm-xen64972ceb same linux-linus 6aaf0da8 9bdc771f01 n/a 1 day linux-mingo-tip-master d935d0f7 21316d690 16 n/a 1175 days linux-nexte58c24350 219 n/a448 days osstest 15d2dd50 Error! 
0- n/an/a ovmf 3735017c 0bb5d7a501 n/a 0 days qemu-mainlined2966f80 3536064201 n/a 0 days qemu-upstream-4.2-testingd2382550 same qemu-upstream-4.3-testingefae5e0f same qemu-upstream-4.4-testing32226f42 same qemu-upstream-4.5-testingd9552b0a same qemu-upstream-unstable c4a962ec same rumpuserxen 30d72f3f 3b91e449 62 11478 days 150 days seabios f24eb2f8 6cfebb4e00 n/an/a xen-4.0-testing 2692df2a same xen-4.1-testing 40feff87 same xen-4.2-testing 38fcda22 same xen-4.3-testing e7c02297 same xen-4.4-testing 6c1cb3db same xen-4.5-testing e3bd3cef same xen-unstable c40317f1 9379af0805 n/a 3 days --- cr-all-branch-statuses | 32 + crontab| 1 + crontab-cambridge | 1 + mg-all-branch-statuses | 120 + 4 files changed, 154 insertions(+) create mode 100755 cr-all-branch-statuses create mode 100755 mg-all-branch-statuses diff --git a/cr-all-branch-statuses b/cr-all-branch-statuses new file mode 100755 index 000..f9885db --- /dev/null +++ b/cr-all-branch-statuses @@ -0,0 +1,32 @@ +#!/bin/bash + +# This is part of osstest, an automated testing framework for Xen. +# Copyright (C) 2015 Citrix Inc. +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. + +set -ex + +. cri-args-hostlists +branch=$1; shift + +check_stop all-branch-statuses. 
+
+results=`getconfig Results`
+
+abs=$results/all-branch-statuses.txt
+( date ; echo ; bash -x ./mg-all-branch-statuses ) >$abs.new
+mv $abs.new $abs
+
+./cr-publish-flight-logs
diff --git a/crontab b/crontab
index 5dc8675..436645b 100644
--- a/crontab
+++ b/crontab
@@ -7,4 +7,5 @@
Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain
On 07/09/2015 10:17 AM, Jan Beulich wrote: On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote: On 07/09/2015 03:02 AM, Jan Beulich wrote: On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote: As I started to update the patches I realized that in some cases (especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't have a VCPU (which is what hvm_guest_x86_mode() wants) but rather only the domain. d->vcpu[0] should work. Otherwise I'll either need a new field in struct domain or wrap has_32bit_shinfo into something PVH-specific, like is_32bit_pvh_vcpu(). Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like for PVH, especially if you also intend the tools to use the 64-bit guest context variant even for 32-bit PVH? Once again - are you intending to prohibit 32-bit PVH switching to 64-bit mode (which would seem both wrong and possibly cumbersome to me)? With current PVH implementation I don't think we can switch. We are starting the guest in very much PV-like fashion. That's why we are getting into switch_compat() --- via XEN_DOMCTL_set_address_size. For XEN_DOMCTL_get_address_size specifically we need to be able to figure out the mode *before* the guest is running because we use it to set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't change the mode. Okay - but is there code (being put) in place to refuse switch attempts? No, I should add code to deal with this. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-unstable test] 59207: regressions - FAIL
On 09.07.15 at 16:07, ian.jack...@eu.citrix.com wrote: osstest service owner writes ([xen-unstable test] 59207: regressions - FAIL): flight 59207 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/59207/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: ... test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot \ fail REGR. vs. 58965 Xen crash in set_cpu_sibling_map AFAICT. I'm expecting a v3 of the patch any minute. Jan
[Xen-devel] [PATCH OSSTEST v8 1/2] cr-publish-flight-logs: allow publishing only the Results
Removing the requirement that a flight be provided allows this to be used to publish the results directory even in contexts which have no flight.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
v7: New patch
v8: No need for a list of files, just allow to omit publishing flight.
    Therefore flight becomes optional.
    Renamed from: cr-publish-flight-logs: support publishing files from $HOME/public_html
---
 cr-publish-flight-logs | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/cr-publish-flight-logs b/cr-publish-flight-logs
index 94c74c8..8a19d88 100755
--- a/cr-publish-flight-logs
+++ b/cr-publish-flight-logs
@@ -31,10 +31,10 @@
 if (@ARGV && $ARGV[0] eq '--push-harness') {
     $push_harness = 1;
 }
-die "usage: ./cr-publish-flight-logs <flight>" unless @ARGV==1;
+my $flight= shift @ARGV // '';
+die unless $flight =~ m/^\d*$/;

-my $flight= shift @ARGV;
-die unless $flight =~ m/^\d+$/;
+die "usage: ./cr-publish-flight-logs [<flight>]" unless @ARGV==0;

 open LOCK, "> $c{GlobalLockDir}/publish-lock" or die $!;
 flock LOCK, LOCK_EX or die $!;
@@ -59,5 +59,6 @@ sub copydir ($$) {
     $!=0; $?=0; system @cmd; die "rsync $? $!" if $? or $!;
 }
-copydir("$c{Logs}/$flight/", "$c{LogsPublish}/$flight") if $c{LogsPublish};
+copydir("$c{Logs}/$flight/", "$c{LogsPublish}/$flight")
+    if $flight && $c{LogsPublish};
 copydir("$c{Results}/", $c{ResultsPublish}) if $c{ResultsPublish};
--
2.1.4
Re: [Xen-devel] [PATCH OSSTEST v8 1/2] cr-publish-flight-logs: allow publishing only the Results
Ian Campbell writes ([PATCH OSSTEST v8 1/2] cr-publish-flight-logs: allow publishing only the Results): Removing the requirement that a flight be provided allows this to be used to publish the results directory even in contexts which have no flight. Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling
On Thu, 2015-07-09 at 15:27 +0100, George Dunlap wrote: On 07/09/2015 03:18 PM, Dario Faggioli wrote: On Thu, 2015-07-09 at 14:44 +0100, Jan Beulich wrote: On 09.07.15 at 14:53, george.dun...@eu.citrix.com wrote: Consider the following scenario: - v1 blocks on pcpu 0. - vcpu_runstate_change() will do everything necessary for v1 on p0. - The scheduler does load balancing and moves v1 to p1, calling vcpu_migrate(). Because the vcpu is still blocked, vcpu_runstate_change() is not called. - A device interrupt is generated. What happens to the interrupt? Does everything still work properly, or will the device wake-up interrupt go to the wrong pcpu (p0 rather than p1)? I think much of this was discussed before, since I also disliked the hooking into vcpu_runstate_change(). What I remember having been told is that it really only matters which pCPU's list a vCPU is on, not what v->processor says. Right. But, as far as I could understand from the patches I've seen, a vcpu ends up in a list when it blocks, and when it blocks there will be a context switch, and hence we can deal with the queueing during the context switch itself (which is, in part, an arch specific operation already). What am I missing? I think what you're missing is that Jan is answering my question about migrating a blocked vcpu, not arguing that vcpu_runstate_change() is the right way to go. At least that's how I understood him. :-) Eheh, no, I got that... I just wanted to take one more chance to try to get an answer (from Feng, mainly, but from anyone who knows or has an idea, in reality! :-D) on why another approach is not possible. Sorry for abusing the reply for my own mean purposes. :-D But regarding context_switch: I think the reason we need more hooks than that is that context_switch only changes into and out of running state. Sure. There are also changes that need to happen when you change from blocked to offline, offline to blocked, blocked to runnable, &c; these don't go through context_switch.
Right, those ones. And about them, I could say for the third time that having a non-runstate biased description of what we need would help to understand where to do things, whether there are suitable spots already or we need to add new ones, etc. But I guess I don't want to be so repetitive! :-PP That's why I was suggesting some architectural equivalents to the SCHED_OP() callbacks to be added to vcpu_wake &c. Yep. If (or at least for those of which) really there is no other place already, adding these would be TheRightThing for me as well. vcpu_runstate_change() is at the moment a nice quiet cul-de-sac that just does a little bit of accounting; Indeed! I'd rather not have it suddenly become a major thoroughfare for runstate change hooks, if we can avoid it. :-) Yep, that was my understanding of it, and my idea too. Regards, Dario -- This happens because I choose it to happen! (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] [PATCH v3 3/4] Convert map_domain_page() to use the new mfn_t type
Reworked the internals and declaration, applying (un)boxing where needed. Converted calls to map_domain_page() to provide mfn_t types, boxing where needed. Signed-off-by: Ben Catterall ben.catter...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- Changed since v1: * Created paddr_to_mfn() and mfn_to_paddr() for both x86 and ARM * Converted code to use the new paddr_to_mfn() rather than e.g. paddrPAGE_SHIFT Changed since v2: * Switch to using paddr_to_pfn() and pfn_to_addr(). * Removed paddr_to_mfn() and mfn_to_paddr() * Added missing blank line --- xen/arch/arm/domain_build.c | 2 +- xen/arch/arm/kernel.c | 2 +- xen/arch/arm/mm.c | 12 +- xen/arch/arm/p2m.c| 4 ++-- xen/arch/arm/traps.c | 4 ++-- xen/arch/x86/debug.c | 10 xen/arch/x86/domain.c | 4 ++-- xen/arch/x86/domain_build.c | 10 xen/arch/x86/domain_page.c| 22 - xen/arch/x86/domctl.c | 2 +- xen/arch/x86/mm.c | 40 +++ xen/arch/x86/mm/guest_walk.c | 2 +- xen/arch/x86/mm/hap/guest_walk.c | 2 +- xen/arch/x86/mm/mem_sharing.c | 4 ++-- xen/arch/x86/mm/p2m-ept.c | 22 - xen/arch/x86/mm/p2m-pod.c | 8 +++ xen/arch/x86/mm/p2m-pt.c | 28 +++--- xen/arch/x86/mm/p2m.c | 2 +- xen/arch/x86/mm/paging.c | 32 - xen/arch/x86/mm/shadow/common.c | 2 +- xen/arch/x86/mm/shadow/multi.c| 4 ++-- xen/arch/x86/mm/shadow/private.h | 2 +- xen/arch/x86/smpboot.c| 2 +- xen/arch/x86/tboot.c | 5 ++-- xen/arch/x86/traps.c | 12 +- xen/arch/x86/x86_64/mm.c | 14 +-- xen/arch/x86/x86_64/traps.c | 10 xen/arch/x86/x86_emulate.c| 10 xen/common/grant_table.c | 4 ++-- xen/common/kexec.c| 4 ++-- xen/common/kimage.c | 10 xen/common/memory.c | 6 ++--- xen/common/tmem_xen.c | 6 ++--- xen/drivers/passthrough/amd/iommu_guest.c | 10 xen/drivers/passthrough/amd/iommu_map.c | 14 +-- xen/drivers/passthrough/vtd/x86/vtd.c | 2 +- xen/include/asm-x86/hap.h | 2 +- xen/include/asm-x86/page.h| 7 +++--- xen/include/asm-x86/paging.h | 2 +- xen/include/xen/domain_page.h | 8 +++ 40 files changed, 175 insertions(+), 173 deletions(-) diff --git 
a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index 8556afd..a059de6 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -1408,7 +1408,7 @@ static void initrd_load(struct kernel_info *kinfo) return; } -dst = map_domain_page(maPAGE_SHIFT); +dst = map_domain_page(_mfn(paddr_to_pfn(ma))); copy_from_paddr(dst + s, paddr + offs, l); diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c index 209c3dd..f641b12 100644 --- a/xen/arch/arm/kernel.c +++ b/xen/arch/arm/kernel.c @@ -182,7 +182,7 @@ static void kernel_zimage_load(struct kernel_info *info) return; } -dst = map_domain_page(maPAGE_SHIFT); +dst = map_domain_page(_mfn(paddr_to_pfn(ma))); copy_from_paddr(dst + s, paddr + offs, l); diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index d479048..ae0f34c 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -213,7 +213,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr, else root_table = 0; -mapping = map_domain_page(root_pfn + root_table); +mapping = map_domain_page(_mfn(root_pfn + root_table)); for ( level = root_level; ; level++ ) { @@ -230,7 +230,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr, /* For next iteration */ unmap_domain_page(mapping); -mapping = map_domain_page(pte.walk.base); +mapping = map_domain_page(_mfn(pte.walk.base)); } unmap_domain_page(mapping); @@ -282,11 +282,11 @@ void unmap_domain_page_global(const void *va) } /* Map a page of domheap memory */ -void *map_domain_page(unsigned long mfn) +void *map_domain_page(mfn_t mfn) { unsigned long flags; lpae_t *map = this_cpu(xen_dommap); -unsigned long slot_mfn = mfn ~LPAE_ENTRY_MASK; +unsigned long slot_mfn = mfn_x(mfn) ~LPAE_ENTRY_MASK; vaddr_t va; lpae_t pte; int i, slot; @@ -339,7 +339,7 @@ void *map_domain_page(unsigned long mfn) va = (DOMHEAP_VIRT_START + (slot SECOND_SHIFT) - +
[Xen-devel] [PATCH v3 1/4] xen/domain_page: Convert map_domain_page_global() to using mfn_t
From: Andrew Cooper andrew.coop...@citrix.com The sh_map/unmap wrappers can be dropped, and take the opportunity to turn some #define's into static inlines, for added type safety. As part of adding the type safety, GCC highlights a problematic include cycle with arm/mm.h including domain_page.h which includes xen/mm.h and falls over __page_to_mfn being used before being declared. Simply dropping the inclusion of domain_page.h fixes the compilation issue. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Jan Beulich jbeul...@suse.com CC: Tim Deegan t...@xen.org CC: Ian Campbell ian.campb...@citrix.com CC: Stefano Stabellini stefano.stabell...@citrix.com --- Changed since v2 * (un)map_domain_page_global() now take ptr-to-const Reviewed-by: Jan Beulich jbeul...@suse.com For ARM: Acked-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/mm.c| 6 ++ xen/arch/x86/domain_page.c | 9 - xen/arch/x86/mm/shadow/multi.c | 10 +- xen/arch/x86/mm/shadow/private.h | 12 xen/include/asm-arm/mm.h | 1 - xen/include/xen/domain_page.h| 22 +- 6 files changed, 28 insertions(+), 32 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index ff1b330..d479048 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -271,11 +271,9 @@ void clear_fixmap(unsigned map) } #ifdef CONFIG_DOMAIN_PAGE -void *map_domain_page_global(unsigned long mfn) +void *map_domain_page_global(mfn_t mfn) { -mfn_t m = _mfn(mfn); - -return vmap(m, 1); +return vmap(mfn, 1); } void unmap_domain_page_global(const void *va) diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index d684b2f..0f7548b 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -302,17 +302,16 @@ int mapcache_vcpu_init(struct vcpu *v) return 0; } -void *map_domain_page_global(unsigned long mfn) +void *map_domain_page_global(mfn_t mfn) { -mfn_t m = _mfn(mfn); ASSERT(!in_irq() && local_irq_is_enabled()); #ifdef NDEBUG -if ( mfn >= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) -return
mfn_to_virt(mfn); +if ( mfn_x(mfn) = PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) ) +return mfn_to_virt(mfn_x(mfn)); #endif -return vmap(m, 1); +return vmap(mfn, 1); } void unmap_domain_page_global(const void *ptr) diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 4058a9c..19644d2 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -3806,7 +3806,7 @@ sh_detach_old_tables(struct vcpu *v) if ( v-arch.paging.shadow.guest_vtable ) { if ( shadow_mode_external(d) || shadow_mode_translate(d) ) -sh_unmap_domain_page_global(v-arch.paging.shadow.guest_vtable); +unmap_domain_page_global(v-arch.paging.shadow.guest_vtable); v-arch.paging.shadow.guest_vtable = NULL; } #endif // !NDEBUG @@ -3977,8 +3977,8 @@ sh_update_cr3(struct vcpu *v, int do_locking) if ( shadow_mode_external(d) || shadow_mode_translate(d) ) { if ( v-arch.paging.shadow.guest_vtable ) -sh_unmap_domain_page_global(v-arch.paging.shadow.guest_vtable); -v-arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn); +unmap_domain_page_global(v-arch.paging.shadow.guest_vtable); +v-arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn); /* PAGING_LEVELS==4 implies 64-bit, which means that * map_domain_page_global can't fail */ BUG_ON(v-arch.paging.shadow.guest_vtable == NULL); @@ -4010,8 +4010,8 @@ sh_update_cr3(struct vcpu *v, int do_locking) if ( shadow_mode_external(d) || shadow_mode_translate(d) ) { if ( v-arch.paging.shadow.guest_vtable ) -sh_unmap_domain_page_global(v-arch.paging.shadow.guest_vtable); -v-arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn); +unmap_domain_page_global(v-arch.paging.shadow.guest_vtable); +v-arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn); /* Does this really need map_domain_page_global? Handle the * error properly if so. 
*/ BUG_ON(v-arch.paging.shadow.guest_vtable == NULL); /* XXX */ diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h index f72ea9f..eff39dc 100644 --- a/xen/arch/x86/mm/shadow/private.h +++ b/xen/arch/x86/mm/shadow/private.h @@ -517,18 +517,6 @@ sh_unmap_domain_page(void *p) unmap_domain_page(p); } -static inline void * -sh_map_domain_page_global(mfn_t mfn) -{ -return map_domain_page_global(mfn_x(mfn)); -} - -static inline void -sh_unmap_domain_page_global(void *p) -{ -unmap_domain_page_global(p); -} - /**/ /* Shadow-page refcounting. */ diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index
[Xen-devel] [PATCH v3 4/4] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()
Removed as they were wrappers around map_domain_page() to make it appear to take an mfn_t type. Signed-off-by: Ben Catterall ben.catter...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com Reviewed-by: Tim Deegan t...@xen.org --- xen/arch/x86/mm/hap/hap.c| 4 +- xen/arch/x86/mm/shadow/common.c | 22 +++--- xen/arch/x86/mm/shadow/multi.c | 152 +++ xen/arch/x86/mm/shadow/private.h | 13 xen/include/asm-x86/hap.h| 15 5 files changed, 89 insertions(+), 117 deletions(-) diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c index d0d3f1e..63980af 100644 --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -395,7 +395,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn) struct domain *d = v-domain; l4_pgentry_t *l4e; -l4e = hap_map_domain_page(l4mfn); +l4e = map_domain_page(l4mfn); /* Copy the common Xen mappings from the idle domain */ memcpy(l4e[ROOT_PAGETABLE_FIRST_XEN_SLOT], @@ -411,7 +411,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn) l4e[l4_table_offset(LINEAR_PT_VIRT_START)] = l4e_from_pfn(mfn_x(l4mfn), __PAGE_HYPERVISOR); -hap_unmap_domain_page(l4e); +unmap_domain_page(l4e); } static mfn_t hap_make_monitor_table(struct vcpu *v) diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c index c36ffeb..6574206 100644 --- a/xen/arch/x86/mm/shadow/common.c +++ b/xen/arch/x86/mm/shadow/common.c @@ -781,11 +781,11 @@ static void oos_hash_add(struct vcpu *v, mfn_t gmfn) if ( swap ) SWAP(oos_snapshot[idx], oos_snapshot[oidx]); -gptr = sh_map_domain_page(oos[oidx]); -gsnpptr = sh_map_domain_page(oos_snapshot[oidx]); +gptr = map_domain_page(oos[oidx]); +gsnpptr = map_domain_page(oos_snapshot[oidx]); memcpy(gsnpptr, gptr, PAGE_SIZE); -sh_unmap_domain_page(gptr); -sh_unmap_domain_page(gsnpptr); +unmap_domain_page(gptr); +unmap_domain_page(gsnpptr); } /* Remove an MFN from the list of out-of-sync guest pagetables */ @@ -1498,7 +1498,7 @@ mfn_t shadow_alloc(struct 
domain *d, p = __map_domain_page(sp); ASSERT(p != NULL); clear_page(p); -sh_unmap_domain_page(p); +unmap_domain_page(p); INIT_PAGE_LIST_ENTRY(sp-list); page_list_add(sp, tmp_list); sp-u.sh.type = shadow_type; @@ -2524,7 +2524,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn) if (sp-up == 0) return 0; pmfn = _mfn(sp-up PAGE_SHIFT); ASSERT(mfn_valid(pmfn)); -vaddr = sh_map_domain_page(pmfn); +vaddr = map_domain_page(pmfn); ASSERT(vaddr); vaddr += sp-up (PAGE_SIZE-1); ASSERT(l1e_get_pfn(*(l1_pgentry_t *)vaddr) == mfn_x(smfn)); @@ -2554,7 +2554,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn) default: BUG(); /* Some wierd unknown shadow type */ } -sh_unmap_domain_page(vaddr); +unmap_domain_page(vaddr); if ( rc ) perfc_incr(shadow_up_pointer); else @@ -3028,7 +3028,7 @@ int shadow_enable(struct domain *d, u32 mode) e[i] = ((0x40U * i) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); -sh_unmap_domain_page(e); +unmap_domain_page(e); pg-u.inuse.type_info = PGT_l2_page_table | 1 | PGT_validated; } @@ -3631,8 +3631,8 @@ int shadow_track_dirty_vram(struct domain *d, if ( sl1mfn != map_mfn ) { if ( map_sl1p ) -sh_unmap_domain_page(map_sl1p); -map_sl1p = sh_map_domain_page(_mfn(sl1mfn)); +unmap_domain_page(map_sl1p); +map_sl1p = map_domain_page(_mfn(sl1mfn)); map_mfn = sl1mfn; } sl1e = map_sl1p + (sl1ma ~PAGE_MASK); @@ -3663,7 +3663,7 @@ int shadow_track_dirty_vram(struct domain *d, } if ( map_sl1p ) -sh_unmap_domain_page(map_sl1p); +unmap_domain_page(map_sl1p); memcpy(dirty_bitmap, dirty_vram-dirty_bitmap, dirty_size); memset(dirty_vram-dirty_bitmap, 0, dirty_size); diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 0a942f8..00e8f1f 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -221,16 +221,16 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version) #if GUEST_PAGING_LEVELS = 4 /* 64-bit only... 
*/ l4p = (guest_l4e_t *)v-arch.paging.shadow.guest_vtable; mismatch |= (gw-l4e.l4 != l4p[guest_l4_table_offset(va)].l4); -l3p = sh_map_domain_page(gw-l3mfn); +l3p = map_domain_page(gw-l3mfn); mismatch
[Xen-devel] [PATCH v3 2/4] xen/domain_page: Convert copy/clear_domain_page() to using mfn_t
From: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Andrew Cooper andrew.coop...@citrix.com [Convert grant_table.c to pass mfn_t types and fix ARM compiling] Signed-off-by: Ben Catterall ben.catter...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Jan Beulich jbeul...@suse.com --- xen/arch/x86/mm.c | 7 --- xen/common/grant_table.c | 2 +- xen/common/kimage.c | 12 ++-- xen/common/memory.c | 12 +--- xen/include/xen/domain_page.h | 15 ++- 5 files changed, 22 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index b011c95..df9c190 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -3293,7 +3293,7 @@ long do_mmuext_op( /* A page is dirtied when it's being cleared. */ paging_mark_dirty(pg_owner, page_to_mfn(page)); -clear_domain_page(page_to_mfn(page)); +clear_domain_page(_mfn(page_to_mfn(page))); put_page_and_type(page); break; @@ -3327,7 +3327,8 @@ long do_mmuext_op( /* A page is dirtied when it's being copied to. */ paging_mark_dirty(pg_owner, page_to_mfn(dst_page)); -copy_domain_page(page_to_mfn(dst_page), page_to_mfn(src_page)); +copy_domain_page(_mfn(page_to_mfn(dst_page)), + _mfn(page_to_mfn(src_page))); put_page_and_type(dst_page); put_page(src_page); @@ -6003,7 +6004,7 @@ int create_perdomain_mapping(struct domain *d, unsigned long va, pg = alloc_domheap_page(d, MEMF_no_owner); if ( pg ) { -clear_domain_page(page_to_mfn(pg)); +clear_domain_page(_mfn(page_to_mfn(pg))); if ( !IS_NIL(ppg) ) *ppg++ = pg; l1tab[l1_table_offset(va)] = diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c index dc3487d..681a553 100644 --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1848,7 +1848,7 @@ gnttab_transfer( goto unlock_and_copyback; } -copy_domain_page(page_to_mfn(new_page), mfn); +copy_domain_page(_mfn(page_to_mfn(new_page)), _mfn(mfn)); page-count_info = ~(PGC_count_mask|PGC_allocated); free_domheap_page(page); diff --git a/xen/common/kimage.c b/xen/common/kimage.c index 
8c4854d..742e4e8 100644 --- a/xen/common/kimage.c +++ b/xen/common/kimage.c @@ -77,7 +77,7 @@ static struct page_info *kimage_alloc_zeroed_page(unsigned memflags) if ( !page ) return NULL; -clear_domain_page(page_to_mfn(page)); +clear_domain_page(_mfn(page_to_mfn(page))); return page; } @@ -409,7 +409,7 @@ static struct page_info *kimage_alloc_crash_control_page(struct kexec_image *ima if ( page ) { image-next_crash_page = hole_end; -clear_domain_page(page_to_mfn(page)); +clear_domain_page(_mfn(page_to_mfn(page))); } return page; @@ -637,15 +637,15 @@ static struct page_info *kimage_alloc_page(struct kexec_image *image, if ( old ) { /* If so move it. */ -unsigned long old_mfn = *old PAGE_SHIFT; -unsigned long mfn = addr PAGE_SHIFT; +mfn_t old_mfn = _mfn(*old PAGE_SHIFT); +mfn_t mfn = _mfn(addr PAGE_SHIFT); copy_domain_page(mfn, old_mfn); clear_domain_page(old_mfn); *old = (addr ~PAGE_MASK) | IND_SOURCE; unmap_domain_page(old); -page = mfn_to_page(old_mfn); +page = mfn_to_page(mfn_x(old_mfn)); break; } else @@ -917,7 +917,7 @@ int kimage_build_ind(struct kexec_image *image, unsigned long ind_mfn, goto done; } -copy_domain_page(page_to_mfn(xen_page), mfn); +copy_domain_page(_mfn(page_to_mfn(xen_page)), _mfn(mfn)); put_page(guest_page); ret = kimage_add_page(image, page_to_maddr(xen_page)); diff --git a/xen/common/memory.c b/xen/common/memory.c index c84fcdd..ae4c32e 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -1170,25 +1170,23 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) return rc; } -#ifdef CONFIG_DOMAIN_PAGE -void clear_domain_page(unsigned long mfn) +void clear_domain_page(mfn_t mfn) { -void *ptr = map_domain_page(mfn); +void *ptr = map_domain_page(mfn_x(mfn)); clear_page(ptr); unmap_domain_page(ptr); } -void copy_domain_page(unsigned long dmfn, unsigned long smfn) +void copy_domain_page(mfn_t dest, mfn_t source) { -const void *src = map_domain_page(smfn); -void *dst = map_domain_page(dmfn); +const void *src = 
map_domain_page(mfn_x(source)); +void *dst = map_domain_page(mfn_x(dest)); copy_page(dst, src);
Re: [Xen-devel] [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops
On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

+static int hvmemul_linear_mmio_access(
+    unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
+    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t known_gpfn)
+{
+    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+    unsigned long offset = gla & ~PAGE_MASK;
+    unsigned int chunk;
+    paddr_t gpa;
+    unsigned long one_rep = 1;
+    int rc;
+
+    chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
+
+    if ( known_gpfn )
+        gpa = pfn_to_paddr(vio->mmio_gpfn) | offset;
+    else
+    {
+        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+                                    hvmemul_ctxt);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+    }
+
+    for ( ;; )
+    {
+        rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
+        if ( rc != X86EMUL_OKAY )
+            break;
+
+        gla += chunk;
+        buffer += chunk;
+        size -= chunk;
+
+        if ( size == 0 )
+            break;
+
+        ASSERT((gla & ~PAGE_MASK) == 0);

While I don't mean you to re-submit another time, I'd still like to get my question answered: Does this really matter for the code below?

Jan
Re: [Xen-devel] [PATCH v7 1/2] OSSTEST: introduce a raisin build test
On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote: Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com In general I'm very suspicious of a 200 line patch with _no_ commit message at all. What about the details of the stuff around overriding the versions iff tree and revision are set etc and why that matters and how the two projects therefore interact wrt selecting what to build? I remember discussing this at length either IRL or in older versions of the thread and we both know it wasn't trivial to arrive at how this should be done, but I _already_ can't remember how we reached this point or why and that is information which really needs to be recorded so it can be referred to later when we wonder why it does this. The scope is also missing, i.e. this is currently just a straight raisin build test, and the results are not consumed by anything. This is important because of all the stuff about splitting up the dist dir and incremental builds which we discussed means that as it stands this test step is not producing build artefacts suitable as an input for actual tests. I expect there's also stuff in the intra-version changelog which actually belongs in the main changelog, such as why you are refactoring certain things into BuildSupport.pm, which deserves some sort of brief mention IMHO. [...] Changes in v3: - use $raisindir throughout ts-raisin-build - do not specify ENABLED_COMPONENTS so that empty REVISION variables can be used to disable building a raisin component I see we do again in the code below, which I suspect was deliberate and based on some discussion, but it's not mentioned in the Changes in v4+ and since the changelog doesn't explain anything I can't even begin to guess why or how we arrived at this point. [...] 
+sub build () {
+target_cmd_root($ho, <<END);
+cd $raisindir
+./raise -y install-builddep

If two of these happen to run in parallel (build machines can be shared) then one or the other risks timing out on the underlying dpkg lock waiting for the other, since the wait might be arbitrarily long depending on what is going on. It also risks other builds happening in an environment which differs from one build to the next (if this happened to run in the middle) or even things changing while a build is happening. This is why all build dependencies are installed from ts-xen-build-prep, that step is run once for each build host as it is installed. This does unfortunately mean that I think we can't take advantage of raisin's tracking of necessary build dependencies, but at least it will be checking things for us. Ian.
[Xen-devel] [PATCH v3] x86: correct socket_cpumask allocation
For booting cpu, the socket number need not be 0, so it needs to be computed from the cpu number. For secondary cpu, phys_proc_id is not valid in CPU_PREPARE notifier(cpu_smpboot_alloc), so cpu_to_socket(cpu) can't be used. Instead, pre-allocate secondary_socket_cpumask in cpu_smpboot_alloc() and later consume it in smp_store_cpu_info(). This patch also changes socket_cpumask type from 'cpumask_var_t *' to 'cpumask_t **' so that smaller NR_CPUS works. Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com Tested-by: Dario Faggioli dario.faggi...@citrix.com Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- Changes in v3: * use type safe xzalloc(cpumask_t) Changes in v2: * Fix case that booting cpu is on the socket other than socket0. * cpumask_var_t => cpumask_t * to make smaller NR_CPUS builds. --- xen/arch/x86/smpboot.c| 25 ++--- xen/include/asm-x86/smp.h | 2 +- 2 files changed, 19 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index c73aa1b..0f03364 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -61,7 +61,8 @@ cpumask_t cpu_online_map __read_mostly; EXPORT_SYMBOL(cpu_online_map); unsigned int __read_mostly nr_sockets; -cpumask_var_t *__read_mostly socket_cpumask; +cpumask_t **__read_mostly socket_cpumask; +static cpumask_t *secondary_socket_cpumask; struct cpuinfo_x86 cpu_data[NR_CPUS]; @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS]; static void smp_store_cpu_info(int id) { struct cpuinfo_x86 *c = cpu_data + id; +unsigned int socket; *c = boot_cpu_data; if ( id != 0 ) +{ identify_cpu(c); +socket = cpu_to_socket(id); +if ( !socket_cpumask[socket] ) +{ +socket_cpumask[socket] = secondary_socket_cpumask; +secondary_socket_cpumask = NULL; +} +} + /* * Certain Athlons might work (for various values of 'work') in SMP * but they are not certified as MP capable.
@@ -658,7 +669,7 @@ static void cpu_smpboot_free(unsigned int cpu) if ( cpumask_empty(socket_cpumask[socket]) ) { -free_cpumask_var(socket_cpumask[socket]); +xfree(socket_cpumask[socket]); socket_cpumask[socket] = NULL; } @@ -705,7 +716,6 @@ static int cpu_smpboot_alloc(unsigned int cpu) nodeid_t node = cpu_to_node(cpu); struct desc_struct *gdt; unsigned long stub_page; -unsigned int socket = cpu_to_socket(cpu); if ( node != NUMA_NO_NODE ) memflags = MEMF_node(node); @@ -748,8 +758,8 @@ static int cpu_smpboot_alloc(unsigned int cpu) goto oom; per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu); -if ( !socket_cpumask[socket] - && !zalloc_cpumask_var(socket_cpumask + socket) ) +if ( secondary_socket_cpumask == NULL + && (secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL ) goto oom; if ( zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) @@ -804,8 +814,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus) set_nr_sockets(); -socket_cpumask = xzalloc_array(cpumask_var_t, nr_sockets); -if ( !socket_cpumask || !zalloc_cpumask_var(socket_cpumask) ) +socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets); +if ( socket_cpumask == NULL || + (socket_cpumask[cpu_to_socket(0)] = xzalloc(cpumask_t)) == NULL ) panic("No memory for socket CPU siblings map"); if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, 0)) || diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h index e594062..ea07888 100644 --- a/xen/include/asm-x86/smp.h +++ b/xen/include/asm-x86/smp.h @@ -67,7 +67,7 @@ extern unsigned int nr_sockets; void set_nr_sockets(void); /* Representing HT and core siblings in each socket. */ -extern cpumask_var_t *socket_cpumask; +extern cpumask_t **socket_cpumask; #endif /* !__ASSEMBLY__ */ -- 1.9.1
Re: [Xen-devel] [PATCH v3] x86: correct socket_cpumask allocation
On 07/09/2015 11:16 AM, Andrew Cooper wrote: On 09/07/15 15:23, Chao Peng wrote: For booting cpu, the socket number is not needed to be 0 so it needs to be computed by cpu number. For secondary cpu, phys_proc_id is not valid in CPU_PREPARE notifier(cpu_smpboot_alloc), so cpu_to_socket(cpu) can't be used. Instead, pre-allocate secondary_cpu_mask in cpu_smpboot_alloc() and later consume it in smp_store_cpu_info(). This patch also changes socket_cpumask type from 'cpumask_var_t *' to 'cpumask_t **' so that smaller NR_CPUS works. Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com Tested-by: Dario Faggioli dario.faggi...@citrix.com Signed-off-by: Chao Peng chao.p.p...@linux.intel.com The v2 of this patch seems to be holding up fine against the XenServer test pool. No failures to boot encountered so far. Tested-by: Andrew Cooper andrew.coop...@citrix.com As the differences between v2 and v3 appear largely mechanical and not functional in nature, this Tested-by can probably still stand. Works for me as well. (You can add 'Tested-by: Boris Ostrovsky boris.ostrov...@oracle.com' although I think by now there are enough of those) -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST v7 1/2] cr-publish-flight-logs: support publishing files from $HOME/public_html
On Thu, 2015-07-09 at 13:48 +0100, Ian Jackson wrote: Ian Campbell writes ([PATCH OSSTEST v7 1/2] cr-publish-flight-logs: support publishing files from $HOME/public_html): Any arguments given after the flight are assumed to be files or directories relative to $HOME/public_html which should be published to $c{Publish} at the same relative path. Removing the requirement that a flight be provided, by accepting '', allows this to be used even in contexts which have no flight. The results subtree is always published if $c{ResultsPublish} is set, since it doesn't depend on any particular flight. Uh. Looking at this, I hadn't spotted that in your other patch you generate the output in ~/public_html. Shouldn't it be in results/ (which is nowadays a general place for publicly visible output)? At the very least, if you do mean ~/public_html, you should be using WebspaceFile. Results would be ok too, if you prefer (which you seem to, so I shall make that change). Also: -sub copydir ($$) { +sub copylogs ($$) { Why did you rename this function? I meant to mention this in the commit message and forgot, it was just because the thing I was now copying wasn't a dir... But if I generate the file into results this isn't needed, the existing publication of Results will do the job for free. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry
-Original Message- From: Paul Durrant Sent: 09 July 2015 14:43 To: 'Jan Beulich' Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 09 July 2015 14:38 To: Paul Durrant Cc: Andrew Cooper; xen-de...@lists.xenproject.org; Keir (Xen.org) Subject: RE: [PATCH v6 09/16] x86/hvm: limit reps to avoid the need to handle retry On 09.07.15 at 14:50, paul.durr...@citrix.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 09 July 2015 13:04 On 09.07.15 at 13:11, paul.durr...@citrix.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 09 July 2015 11:05 On 03.07.15 at 18:25, paul.durr...@citrix.com wrote: @@ -287,17 +271,56 @@ static int hvmemul_do_io_addr( bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa) { -struct page_info *ram_page; +struct vcpu *v = current; +unsigned long ram_gmfn = paddr_to_pfn(ram_gpa); +unsigned int page_off = ram_gpa & (PAGE_SIZE - 1); +struct page_info *ram_page[2]; +int nr_pages = 0; unsigned int No, it's intentionally signed because the unwind code at the end of the function is: while ( --nr_pages >= 0 ) hvmemul_release_page(ram_page[nr_pages]); I.e. the loop terminates when nr_pages gets to -1. while ( nr_pages-- ) hvmemul_release_page(ram_page[nr_pages]); But that wouldn't be correct, since the loop would try to release ram_page[1] and ram_page[0] in the case where only ram_page[0] had been acquired. It would have to be: while ( nr_pages ) hvmemul_release_page(ram_page[--nr_pages]); Unless you need the value of nr_pages after the loop, I can't see the difference between your and my proposal - the decrement in both cases happens between determining whether to continue the loop and accessing the intended array element. D'oh. Of course it does. which I'll do, since you so dislike signed ints. 
Sounds like you're not understanding why. Just look at the generated code when using signed vs unsigned int as array index. Ok, I'll take a look because I would not have thought it would make a huge difference. So Two test programs: test1.c: --- #include <stdio.h> int main(void) { int i = 10; int a[10]; while ( --i >= 0 ) { a[i] = i; printf("%d\n", i); } return 0; } test2.c: --- #include <stdio.h> int main(void) { unsigned int i = 10; int a[10]; while ( i ) { a[--i] = i; printf("%d\n", i); } return 0; } I compiled them just using 'make test1' and 'make test2', so using whatever the default options are for gcc (on a debian wheezy box). The relevant bits of the objdump are: test1 - 0040050c <main>: 40050c: 55 push %rbp 40050d: 48 89 e5 mov %rsp,%rbp 400510: 48 83 ec 30 sub $0x30,%rsp 400514: c7 45 fc 0a 00 00 00 movl $0xa,-0x4(%rbp) 40051b: eb 20 jmp 40053d <main+0x31> 40051d: 8b 45 fc mov -0x4(%rbp),%eax 400520: 48 98 cltq 400522: 8b 55 fc mov -0x4(%rbp),%edx 400525: 89 54 85 d0 mov %edx,-0x30(%rbp,%rax,4) 400529: 8b 45 fc mov -0x4(%rbp),%eax 40052c: 89 c6 mov %eax,%esi 40052e: bf fc 05 40 00 mov $0x4005fc,%edi 400533: b8 00 00 00 00 mov $0x0,%eax 400538: e8 a3 fe ff ff callq 4003e0 <printf@plt> 40053d: 83 6d fc 01 subl $0x1,-0x4(%rbp) 400541: 83 7d fc 00 cmpl $0x0,-0x4(%rbp) 400545: 79 d6 jns 40051d <main+0x11> 400547: b8 00 00 00 00 mov $0x0,%eax 40054c: c9 leaveq 40054d: c3 retq 40054e: 90 nop 40054f: 90 nop test2 - 0040050c <main>: 40050c: 55 push %rbp 40050d: 48 89 e5 mov %rsp,%rbp 400510: 48 83 ec 30 sub $0x30,%rsp 400514: c7 45 fc 0a 00 00 00 movl $0xa,-0x4(%rbp) 40051b: eb 22 jmp 40053f <main+0x33> 40051d: 83 6d fc 01 subl $0x1,-0x4(%rbp) 400521: 8b 55 fc mov -0x4(%rbp),%edx 400524: 8b 45 fc mov -0x4(%rbp),%eax 400527: 89 54 85 d0 mov %edx,-0x30(%rbp,%rax,4) 40052b: 8b 45 fc mov -0x4(%rbp),%eax 40052e:
Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int
On 09.07.15 at 15:10, paul.durr...@citrix.com wrote: Building on the previous patch, this patch restricts portio port numbers to uint16_t in registration/relocate calls. In portio_action_t the port number is changed to unsigned int though to avoid the compiler generating 16-bit operations unnecessarily. The patch also changes I/O sizes to unsigned int which then allows the io_handler size field to reduce to an unsigned int. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v7: - Change port type in portio_action_t to unsigned int as requested by Jan Yet title and description were left in place, and ... @@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p); int hvm_buffered_io_send(ioreq_t *p); static inline void register_portio_handler( -struct domain *d, unsigned long addr, -unsigned long size, portio_action_t action) +struct domain *d, uint16_t port, unsigned int size, +portio_action_t action) { -register_io_handler(d, addr, size, action, HVM_PORTIO); +register_io_handler(d, port, size, action, HVM_PORTIO); } static inline void relocate_portio_handler( -struct domain *d, unsigned long old_addr, unsigned long new_addr, -unsigned long size) +struct domain *d, uint16_t old_port, uint16_t new_port, +unsigned int size) { -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO); +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO); } ... these still use uint16_t. I'm pretty sure I gave my comment in a way indicating that this should generally change, perhaps just at the example of portio_action_t. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST v8 2/2] mg-all-branch-statuses: Show how up to date each branch is
Ian Campbell writes ([PATCH OSSTEST v8 2/2] mg-all-branch-statuses: Show how up to date each branch is): Using report_find_push_age_info allows us to provide counts of attempts since the last baseline on current tip as well as the first attempt of each of those. Since everything serialises on the repo lock I didn't bother trying to parallelise anything. Acked-by: Ian Jackson ian.jack...@eu.citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 2/2] OSSTest: push successful raisin builds
On Mon, 2015-06-22 at 16:16 +0100, Stefano Stabellini wrote: Determine the most recent raisin revision that needs to be tested, by comparing with the already tested xen-tested-master branch. Push to raisin.git:xen-tested-master when the build is successful. The mechanics here look right to me, but do you perhaps want to adopt the staging-master push gate scheme used in xen.git etc? This would make master (the default thing which users use) be something which has undergone at least some amount of testing. We typically use xen-tested-master for 3rd party things where the output of the test gate is mainly just for osstest's bookkeeping and not really aimed at actual users. Oh, one thing you missed is adding raisin to the default $BRANCHES at the top of cr-for-branches, without that there will be no raisin flights run and therefore no push. You might also want to consider removing all non-raisin build and test jobs from the raisin flight, since they are a waste of time until the raisin build results are actually used for something. You can filter tests in make-flight:job_create_test_filter_callback and for jobs you will need my "mfi-common: Allow make-*flight to filter the set of build jobs" patch which adds job_create_build_filter_callback. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] MSR_PLATFORM_INFO register can't be emulated in vmx_msr_read_intercept (xen latest version)
Latest Xen version, CPU: Ivybridge. The SiSoftware Sandra software fails to run on a Windows VM; the guest prints BSOD 3B. After debugging I traced the problem to the function vmx_msr_read_intercept. The Windows VM first reads from register 0xce and then reads 0x620, which caused a gp_fault, via the following path: vmx_msr_read_intercept -> rdmsr_safe(). 0xce is MSR_PLATFORM_INFO, as documented in the Intel(r) 64 and IA-32 Architectures Software Developer's Manual. MSR_PLATFORM_INFO can't be emulated in Xen, so can anyone give me some advice to solve the problem and make the SiSoftware Sandra software work well? Thanks Henglong Fan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST v7 1/2] cr-publish-flight-logs: support publishing files from $HOME/public_html
Ian Campbell writes (Re: [PATCH OSSTEST v7 1/2] cr-publish-flight-logs: support publishing files from $HOME/public_html): On Thu, 2015-07-09 at 13:48 +0100, Ian Jackson wrote: At the very least, if you do mean ~/public_html, you should be using WebspaceFile. Results would be ok too, if you prefer (which you seem to so I shall make that change). Thanks. Also: -sub copydir ($$) { +sub copylogs ($$) { Why did you rename this function ? I meant to mention this in the commit message and forgot, it was just because the thing I was now copying wasn't a dir... But if I generate the file into results this isn't need, the existing publication of Results will do the job for free. Indeed. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain
On 07/09/2015 03:02 AM, Jan Beulich wrote: On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote: As I started to update the patches I realized that in some cases (especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the domain. d->vcpu[0] should work. Otherwise I'll either need a new field in struct domain or wrap has_32bit_shinfo into something PVH-specific, like is_32bit_pvh_vcpu(). Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like for PVH, especially if you also intend the tools to use the 64-bit guest context variant even for 32-bit PVH? Once again - are you intending to prohibit 32-bit PVH switching to 64-bit mode (which would seem both wrong and possibly cumbersome to me)? With the current PVH implementation I don't think we can switch. We are starting the guest in very much PV-like fashion. That's why we are getting into switch_compat() --- via XEN_DOMCTL_set_address_size. For XEN_DOMCTL_get_address_size specifically we need to be able to figure out the mode *before* the guest is running because we use it to set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't change the mode. -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain
On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote: On 07/09/2015 03:02 AM, Jan Beulich wrote: On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote: As I started to update the patches I realized that in some cases (especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the domain. d->vcpu[0] should work. Otherwise I'll either need a new field in struct domain or wrap has_32bit_shinfo into something PVH-specific, like is_32bit_pvh_vcpu(). Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like for PVH, especially if you also intend the tools to use the 64-bit guest context variant even for 32-bit PVH? Once again - are you intending to prohibit 32-bit PVH switching to 64-bit mode (which would seem both wrong and possibly cumbersome to me)? With current PVH implementation I don't think we can switch. We are starting the guest in very much PV-like fashion. That's why we are getting into switch_compat() --- via XEN_DOMCTL_set_address_size. For XEN_DOMCTL_get_address_size specifically we need to be able to figure out the mode *before* the guest is running because we use it to set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't change the mode. Okay - but is there code (being put) in place to refuse switch attempts? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 01/13] x86/hvm: change portio port numbers and sizes to unsigned int
Building on the previous patch, this patch changes portio port numbers and sizes to unsigned int which then allows the io_handler size field to reduce to an unsigned int. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - Get rid of remaining uint16_t port numbers as requested by Jan v7: - Change port type in portio_action_t to unsigned int as requested by Jan v6: - Added Andrew's reviewed-by v5: - New patch to tidy up more types --- xen/arch/x86/hvm/hvm.c |6 +++--- xen/arch/x86/hvm/i8254.c |8 xen/arch/x86/hvm/intercept.c |4 ++-- xen/arch/x86/hvm/pmtimer.c |4 ++-- xen/arch/x86/hvm/rtc.c |2 +- xen/arch/x86/hvm/stdvga.c|2 +- xen/arch/x86/hvm/vpic.c |4 ++-- xen/include/asm-x86/hvm/io.h | 20 ++-- 8 files changed, 25 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 554e239..126882d 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -559,7 +559,7 @@ static int hvm_add_ioreq_gmfn( } static int hvm_print_line( -int dir, uint32_t port, uint32_t bytes, uint32_t *val) +int dir, unsigned int port, unsigned int bytes, uint32_t *val) { struct domain *cd = current-domain; char c = *val; @@ -585,7 +585,7 @@ static int hvm_print_line( } static int hvm_access_cf8( -int dir, uint32_t port, uint32_t bytes, uint32_t *val) +int dir, unsigned int port, unsigned int bytes, uint32_t *val) { struct domain *d = current-domain; @@ -597,7 +597,7 @@ static int hvm_access_cf8( } static int handle_pvh_io( -int dir, uint32_t port, uint32_t bytes, uint32_t *val) +int dir, unsigned int port, unsigned int bytes, uint32_t *val) { struct domain *currd = current-domain; diff --git a/xen/arch/x86/hvm/i8254.c b/xen/arch/x86/hvm/i8254.c index 36a0a53..8a93c88 100644 --- a/xen/arch/x86/hvm/i8254.c +++ b/xen/arch/x86/hvm/i8254.c @@ -50,9 +50,9 @@ #define RW_STATE_WORD1 4 static int handle_pit_io( -int dir, uint32_t 
port, uint32_t bytes, uint32_t *val); +int dir, unsigned int port, unsigned int bytes, uint32_t *val); static int handle_speaker_io( -int dir, uint32_t port, uint32_t bytes, uint32_t *val); +int dir, unsigned int port, unsigned int bytes, uint32_t *val); #define get_guest_time(v) \ (is_hvm_vcpu(v) ? hvm_get_guest_time(v) : (u64)get_s_time()) @@ -479,7 +479,7 @@ void pit_deinit(struct domain *d) /* the intercept action for PIT DM retval:0--not handled; 1--handled */ static int handle_pit_io( -int dir, uint32_t port, uint32_t bytes, uint32_t *val) +int dir, unsigned int port, unsigned int bytes, uint32_t *val) { struct PITState *vpit = vcpu_vpit(current); @@ -522,7 +522,7 @@ static uint32_t speaker_ioport_read( } static int handle_speaker_io( -int dir, uint32_t port, uint32_t bytes, uint32_t *val) +int dir, unsigned int port, uint32_t bytes, uint32_t *val) { struct PITState *vpit = vcpu_vpit(current); diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c index cc44733..52879ff 100644 --- a/xen/arch/x86/hvm/intercept.c +++ b/xen/arch/x86/hvm/intercept.c @@ -369,7 +369,7 @@ int hvm_io_intercept(ioreq_t *p, int type) } void register_io_handler( -struct domain *d, unsigned long addr, unsigned long size, +struct domain *d, unsigned long addr, unsigned int size, void *action, int type) { struct hvm_io_handler *handler = d-arch.hvm_domain.io_handler; @@ -386,7 +386,7 @@ void register_io_handler( void relocate_io_handler( struct domain *d, unsigned long old_addr, unsigned long new_addr, -unsigned long size, int type) +unsigned int size, int type) { struct hvm_io_handler *handler = d-arch.hvm_domain.io_handler; int i; diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c index 6ad2797..34fc062 100644 --- a/xen/arch/x86/hvm/pmtimer.c +++ b/xen/arch/x86/hvm/pmtimer.c @@ -142,7 +142,7 @@ static void pmt_timer_callback(void *opaque) /* Handle port I/O to the PM1a_STS and PM1a_EN registers */ static int handle_evt_io( -int dir, uint32_t port, 
uint32_t bytes, uint32_t *val) +int dir, unsigned int port, unsigned int bytes, uint32_t *val) { struct vcpu *v = current; PMTState *s = v-domain-arch.hvm_domain.pl_time.vpmt; @@ -205,7 +205,7 @@ static int handle_evt_io( /* Handle port I/O to the TMR_VAL register */ static int handle_pmt_io( -int dir, uint32_t port, uint32_t bytes, uint32_t *val) +int dir, unsigned int port, unsigned int bytes, uint32_t *val) { struct vcpu *v = current; PMTState *s = v-domain-arch.hvm_domain.pl_time.vpmt; diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c index 3448971..a9efeaf 100644 --- a/xen/arch/x86/hvm/rtc.c +++ b/xen/arch/x86/hvm/rtc.c @@ -697,7 +697,7 @@ static uint32_t
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
- jbeul...@suse.com wrote: On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote: You are right, it needs to be rebased. I can post later rebased on memory leak fix version, if you thin its a way to go. I didn't look at v9 yet, and can't predict when I will be able to. Jan Jan Would you like me to post v10 with memory leak patch included in the patchset before you start looking at v9? Elena ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain
On 07/09/2015 10:30 AM, Boris Ostrovsky wrote: On 07/09/2015 10:17 AM, Jan Beulich wrote: On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote: On 07/09/2015 03:02 AM, Jan Beulich wrote: On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote: As I started to update the patches I realized that in some cases (especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the domain. d->vcpu[0] should work. Otherwise I'll either need a new field in struct domain or wrap has_32bit_shinfo into something PVH-specific, like is_32bit_pvh_vcpu(). Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like for PVH, especially if you also intend the tools to use the 64-bit guest context variant even for 32-bit PVH? Once again - are you intending to prohibit 32-bit PVH switching to 64-bit mode (which would seem both wrong and possibly cumbersome to me)? With current PVH implementation I don't think we can switch. We are starting the guest in very much PV-like fashion. That's why we are getting into switch_compat() --- via XEN_DOMCTL_set_address_size. For XEN_DOMCTL_get_address_size specifically we need to be able to figure out the mode *before* the guest is running because we use it to set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't change the mode. Okay - but is there code (being put) in place to refuse switch attempts? No, I should add code to deal with this. Forgot to include here --- so what is your preference wrt what I am asking in the first paragraph? d->vcpu[0], new field (or maybe a flag with bits per 32bit-pv and 32bit-pvh), or a PVH-wrapper for has_32bit_shinfo? -boris -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m
From: Wei Liu [mailto:wei.l...@citrix.com] Sent: Thursday, July 09, 2015 4:49 AM Question for you and Ed, how much testing have you done on this series? I assume you've done testing with it turned on and off, to the point you get a functioning guest. Have you run any test suites and got positive results? Yes, we have tested with Windows hvm guests and got positive results, and tests based on Tamas' tool patches have as well. [0]: e1z8rcu-0003v6...@ukmail1.uk.xensource.com This ref link was messed up Thanks, Ravi ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int
On 09.07.15 at 18:10, paul.durr...@citrix.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 09 July 2015 16:24 On 09.07.15 at 15:10, paul.durr...@citrix.com wrote: Building on the previous patch, this patch restricts portio port numbers to uint16_t in registration/relocate calls. In portio_action_t the port number is changed to unsigned int though to avoid the compiler generating 16-bit operations unnecessarily. The patch also changes I/O sizes to unsigned int which then allows the io_handler size field to reduce to an unsigned int. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v7: - Change port type in portio_action_t to unsigned int as requested by Jan Yet title and description were left in place, and ... The title remains. The description was modified: "In portio_action_t the port number is changed to unsigned int though to avoid the compiler generating 16-bit operations unnecessarily." Maybe I'm just being confused by the two "and"s in the title, which I assumed can't both be meant to be "and", or else you'd have written "port numbers, uint16_t, and sizes". Plus it seems unclear what restricting a uint16_t to unsigned int actually means. 
@@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p); int hvm_buffered_io_send(ioreq_t *p); static inline void register_portio_handler( -struct domain *d, unsigned long addr, -unsigned long size, portio_action_t action) +struct domain *d, uint16_t port, unsigned int size, +portio_action_t action) { -register_io_handler(d, addr, size, action, HVM_PORTIO); +register_io_handler(d, port, size, action, HVM_PORTIO); } static inline void relocate_portio_handler( -struct domain *d, unsigned long old_addr, unsigned long new_addr, -unsigned long size) +struct domain *d, uint16_t old_port, uint16_t new_port, +unsigned int size) { -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO); +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO); } ... these still use uint16_t. I'm pretty sure I gave my comment in a way indicating that this should generally change, perhaps just at the example of portio_action_t. Why? Do we not want to restrict to uint16_t at the interface level? Quite the inverse - why would we want to? This just makes the calling sequence less efficient (due to the needed operand size overrides), and hides mistakes in callers not properly truncating when reading guest registers. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m
On Thu, 2015-07-09 at 16:13 +, Sahita, Ravi wrote: [0]: e1z8rcu-0003v6...@ukmail1.uk.xensource.com This ref link was messed up It was a message id, you can paste it into the URL of your favourite ML archive which supports such things. e.g. http://mid.gmane.org/e1z8rcu-0003v6...@ukmail1.uk.xensource.com Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.
I think you might assume that the first M VCPUs in the runq are the current running VCPUs on the M pCPUs. Am I correct? (From what you described in the following example, I think I'm correct. ;-) ) Mmm... Interesting. Yes, I was. I was basing this assumption on this chunk of Dagaen's patch: // If we become one of top [# CPUs] in the runq, tickle it // TODO: make this work when multiple tickles are required if ( new_position > 0 && new_position <= prv->NUM_CPUS ) runq_tickle(ops, svc); And forgot (and did not go check) about the __q_remove() in rt_schedule(). My bad again. But then, since we don't have the running vCPUs in the runq, how is the code above supposed to be correct? This was my bad. I need to change so running vcpus are in runq, and this would then be correct. I can tell that you make the above assumption from here. However, in the current implementation, runq does not hold the current running VCPUs on the pCPUs. We remove the vcpu from runq in the rt_schedule() function. What you described above makes perfect sense if we decide to make runq hold the current running VCPUs. Yep. And it indeed seems to me that we may well think about doing so. It will make it possible to base on the position for making/optimizing scheduling decisions, and at the same time I don't think I see many downsides in that, do you? Actually, after thinking about the example you described, I think we can hold the current running VCPUs *and* the current idle pCPUs in the scheduler-wide structure; What do you mean with 'current idle pCPUs'? I said something similar as well, and what I meant was a cpumask with bit i set if the i-th pCPU is idle, do you also mean this? About the running vCPUs, why just not leave them in the actual runq? This seemed like the straightforward way to me, too. In other words, we can have another runningq (not runq) and an idle_pcpu list in the rt_private; Now all VCPUs are stored in three queues: runningq, runq, and depletedq, in increasing priority order. 
Perhaps, but I'm not sure I see the need for another list. Again, why just not leave them in runq? I appreciate this is a rather big change (although, perhaps it looks bigger said than done), but I think it could be worth pursuing. For double checking, asserting, and making sure that we are able to identify the running svc-s, we have the __RTDS_scheduled flag. I also don't feel we need another list. Regards, ~Dagaen Golomb ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 07/13] x86/hvm: only call hvm_io_assist() from hvm_wait_for_io()
By removing the calls in hvmemul_do_io() (which is replaced by a single assignment) and hvm_complete_assist_request() (which is replaced by a call to hvm_process_portio_intercept() with a suitable set of ops) then hvm_io_assist() can be moved into hvm.c and made static (and hence be a candidate for inlining). The calls to msix_write_completion() and vcpu_end_shutdown_deferral() are also made unconditionally because the ioreq state will always be STATE_IOREQ_NONE at the end of hvm_io_assist() so the test was pointless. These calls are also only relevant when the emulation has been handled externally which is now always the case. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Acked-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - No change v7: - No change v6: - Added Andrew's reviewed-by v5: - Added Jan's acked-by --- xen/arch/x86/hvm/emulate.c| 34 ++--- xen/arch/x86/hvm/hvm.c| 67 ++--- xen/arch/x86/hvm/io.c | 39 xen/include/asm-x86/hvm/hvm.h |1 - xen/include/asm-x86/hvm/io.h |1 - 5 files changed, 66 insertions(+), 76 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 02bde26..c26be54 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -50,6 +50,32 @@ static void hvmtrace_io_assist(const ioreq_t *p) trace_var(event, 0/*!cycles*/, size, buffer); } +static int null_read(const struct hvm_io_handler *io_handler, + uint64_t addr, + uint32_t size, + uint64_t *data) +{ +*data = ~0ul; +return X86EMUL_OKAY; +} + +static int null_write(const struct hvm_io_handler *handler, + uint64_t addr, + uint32_t size, + uint64_t data) +{ +return X86EMUL_OKAY; +} + +static const struct hvm_io_ops null_ops = { +.read = null_read, +.write = null_write +}; + +static const struct hvm_io_handler null_handler = { +.ops = null_ops +}; + static int hvmemul_do_io( bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size, uint8_t dir, bool_t df, 
bool_t data_is_addr, uintptr_t data) @@ -139,8 +165,7 @@ static int hvmemul_do_io( switch ( rc ) { case X86EMUL_OKAY: -p.state = STATE_IORESP_READY; -hvm_io_assist(p); +vio-io_data = p.data; vio-io_state = HVMIO_none; break; case X86EMUL_UNHANDLEABLE: @@ -151,8 +176,9 @@ static int hvmemul_do_io( /* If there is no suitable backing DM, just ignore accesses */ if ( !s ) { -hvm_complete_assist_req(p); -rc = X86EMUL_OKAY; +rc = hvm_process_io_intercept(null_handler, p); +if ( rc == X86EMUL_OKAY ) +vio-io_data = p.data; vio-io_state = HVMIO_none; } else diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 9bdc1d6..7358acf 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -411,6 +411,42 @@ bool_t hvm_io_pending(struct vcpu *v) return 0; } +static void hvm_io_assist(ioreq_t *p) +{ +struct vcpu *curr = current; +struct hvm_vcpu_io *vio = curr-arch.hvm_vcpu.hvm_io; +enum hvm_io_state io_state; + +p-state = STATE_IOREQ_NONE; + +io_state = vio-io_state; +vio-io_state = HVMIO_none; + +switch ( io_state ) +{ +case HVMIO_awaiting_completion: +vio-io_state = HVMIO_completed; +vio-io_data = p-data; +break; +case HVMIO_handle_mmio_awaiting_completion: +vio-io_state = HVMIO_completed; +vio-io_data = p-data; +(void)handle_mmio(); +break; +case HVMIO_handle_pio_awaiting_completion: +if ( vio-io_size == 4 ) /* Needs zero extension. */ +guest_cpu_user_regs()-rax = (uint32_t)p-data; +else +memcpy(guest_cpu_user_regs()-rax, p-data, vio-io_size); +break; +default: +break; +} + +msix_write_completion(curr); +vcpu_end_shutdown_deferral(curr); +} + static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) { /* NB. Optimised for common case (p-state == STATE_IOREQ_NONE). 
*/ @@ -2663,37 +2699,6 @@ int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p) return X86EMUL_UNHANDLEABLE; } -void hvm_complete_assist_req(ioreq_t *p) -{ -switch ( p-type ) -{ -case IOREQ_TYPE_PCI_CONFIG: -ASSERT_UNREACHABLE(); -break; -case IOREQ_TYPE_COPY: -case IOREQ_TYPE_PIO: -if ( p-dir == IOREQ_READ ) -{ -if ( !p-data_is_ptr ) -p-data = ~0ul; -else -{ -int i, step = p-df ? -p-size : p-size; -uint32_t data = ~0; - -for ( i = 0; i
[Xen-devel] [PATCH v8 08/13] x86/hvm: split I/O completion handling from state model
The state of in-flight I/O and how its completion will be handled are logically separate and conflating the two makes the code unnecessarily confusing. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - No change v7: - Modified struct field placement as requested by Jan v6: - Added Andrew's reviewed-by v5: - Confirmed call to msix_write_completion() is in the correct place. --- xen/arch/x86/hvm/hvm.c| 44 +++-- xen/arch/x86/hvm/io.c |6 ++--- xen/arch/x86/hvm/vmx/realmode.c | 27 +-- xen/include/asm-x86/hvm/vcpu.h| 16 +- xen/include/asm-x86/hvm/vmx/vmx.h |1 + 5 files changed, 67 insertions(+), 27 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 7358acf..093a710 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -59,6 +59,7 @@ #include asm/hvm/trace.h #include asm/hvm/nestedhvm.h #include asm/hvm/event.h +#include asm/hvm/vmx/vmx.h #include asm/mtrr.h #include asm/apic.h #include public/sched.h @@ -428,17 +429,6 @@ static void hvm_io_assist(ioreq_t *p) vio-io_state = HVMIO_completed; vio-io_data = p-data; break; -case HVMIO_handle_mmio_awaiting_completion: -vio-io_state = HVMIO_completed; -vio-io_data = p-data; -(void)handle_mmio(); -break; -case HVMIO_handle_pio_awaiting_completion: -if ( vio-io_size == 4 ) /* Needs zero extension. 
*/ -guest_cpu_user_regs()-rax = (uint32_t)p-data; -else -memcpy(guest_cpu_user_regs()-rax, p-data, vio-io_size); -break; default: break; } @@ -479,6 +469,7 @@ void hvm_do_resume(struct vcpu *v) struct hvm_vcpu_io *vio = v-arch.hvm_vcpu.hvm_io; struct domain *d = v-domain; struct hvm_ioreq_server *s; +enum hvm_io_completion io_completion; check_wakeup_from_wait(); @@ -505,8 +496,37 @@ void hvm_do_resume(struct vcpu *v) } } -if ( vio-mmio_retry ) +io_completion = vio-io_completion; +vio-io_completion = HVMIO_no_completion; + +switch ( io_completion ) +{ +case HVMIO_no_completion: +break; +case HVMIO_mmio_completion: handle_mmio(); +break; +case HVMIO_pio_completion: +if ( vio-io_size == 4 ) /* Needs zero extension. */ +guest_cpu_user_regs()-rax = (uint32_t)vio-io_data; +else +memcpy(guest_cpu_user_regs()-rax, vio-io_data, vio-io_size); +vio-io_state = HVMIO_none; +break; +case HVMIO_realmode_completion: +{ +struct hvm_emulate_ctxt ctxt; + +hvm_emulate_prepare(ctxt, guest_cpu_user_regs()); +vmx_realmode_emulate_one(ctxt); +hvm_emulate_writeback(ctxt); + +break; +} +default: +ASSERT_UNREACHABLE(); +break; +} /* Inject pending hw/sw trap */ if ( v-arch.hvm_vcpu.inject_trap.vector != -1 ) diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index fe099d8..221d05e 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -92,8 +92,8 @@ int handle_mmio(void) if ( rc != X86EMUL_RETRY ) vio-io_state = HVMIO_none; -if ( vio-io_state == HVMIO_awaiting_completion ) -vio-io_state = HVMIO_handle_mmio_awaiting_completion; +if ( vio-io_state == HVMIO_awaiting_completion || vio-mmio_retry ) +vio-io_completion = HVMIO_mmio_completion; else vio-mmio_access = (struct npfec){}; @@ -158,7 +158,7 @@ int handle_pio(uint16_t port, unsigned int size, int dir) return 0; /* Completion in hvm_io_assist() with no re-emulation required. 
*/ ASSERT(dir == IOREQ_READ); -vio-io_state = HVMIO_handle_pio_awaiting_completion; +vio-io_completion = HVMIO_pio_completion; break; default: gdprintk(XENLOG_ERR, Weird HVM ioemulation status %d.\n, rc); diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c index fe8b4a0..76ff9a5 100644 --- a/xen/arch/x86/hvm/vmx/realmode.c +++ b/xen/arch/x86/hvm/vmx/realmode.c @@ -101,15 +101,19 @@ static void realmode_deliver_exception( } } -static void realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt) +void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt) { struct vcpu *curr = current; +struct hvm_vcpu_io *vio = curr-arch.hvm_vcpu.hvm_io; int rc; perfc_incr(realmode_emulations); rc = hvm_emulate_one(hvmemul_ctxt); +if ( vio-io_state == HVMIO_awaiting_completion || vio-mmio_retry ) +vio-io_completion = HVMIO_realmode_completion; + if ( rc == X86EMUL_UNHANDLEABLE ) { gdprintk(XENLOG_ERR, Failed to emulate insn.\n); @@ -177,9 +181,6 @@ void vmx_realmode(struct
[Xen-devel] [PATCH v8 09/13] x86/hvm: remove HVMIO_dispatched I/O state
By removing the HVMIO_dispatched state and making all pending emulations (i.e. all those not handled by the hypervisor) use HVMIO_awaiting_completion, various code-paths can be simplified. The completion case for HVMIO_dispatched can also be trivially removed from hvmemul_do_io() as it was already unreachable. This is because that state was only ever used for writes or I/O to/from a guest page and hvmemul_do_io() is never called to complete such I/O. NOTE: There is one subtlety in handle_pio()... The only case when handle_pio() got a return code of X86EMUL_RETRY back from hvmemul_do_pio_buffer() and found io_state was not HVMIO_awaiting_completion was in the case where the domain is shutting down. This is because all writes normally yield a return of HVMEMUL_OKAY and all reads put io_state into HVMIO_awaiting_completion. Hence the io_state check there is replaced with a check of the is_shutting_down flag on the domain. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - No change v7: - Modified struct field placement as knock-on from previous patch v6: - Added Andrew's reviewed-by v5: - Added some extra comments to the commit --- xen/arch/x86/hvm/emulate.c | 12 +++- xen/arch/x86/hvm/hvm.c | 12 +++- xen/arch/x86/hvm/io.c | 14 +++--- xen/arch/x86/hvm/vmx/realmode.c |2 +- xen/include/asm-x86/hvm/vcpu.h | 10 +- 5 files changed, 23 insertions(+), 27 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index c26be54..933a605 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -137,20 +137,14 @@ static int hvmemul_do_io( if ( data_is_addr || dir == IOREQ_WRITE ) return X86EMUL_UNHANDLEABLE; goto finish_access; -case HVMIO_dispatched: -/* May have to wait for previous cycle of a multi-write to complete.
*/ -if ( is_mmio !data_is_addr (dir == IOREQ_WRITE) - (addr == (vio-mmio_large_write_pa + - vio-mmio_large_write_bytes)) ) -return X86EMUL_RETRY; -/* fallthrough */ default: return X86EMUL_UNHANDLEABLE; } -vio-io_state = (data_is_addr || dir == IOREQ_WRITE) ? -HVMIO_dispatched : HVMIO_awaiting_completion; +vio-io_state = HVMIO_awaiting_completion; vio-io_size = size; +vio-io_dir = dir; +vio-io_data_is_addr = data_is_addr; if ( dir == IOREQ_WRITE ) { diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 093a710..8e487d4 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -416,22 +416,16 @@ static void hvm_io_assist(ioreq_t *p) { struct vcpu *curr = current; struct hvm_vcpu_io *vio = curr-arch.hvm_vcpu.hvm_io; -enum hvm_io_state io_state; p-state = STATE_IOREQ_NONE; -io_state = vio-io_state; -vio-io_state = HVMIO_none; - -switch ( io_state ) +if ( hvm_vcpu_io_need_completion(vio) ) { -case HVMIO_awaiting_completion: vio-io_state = HVMIO_completed; vio-io_data = p-data; -break; -default: -break; } +else +vio-io_state = HVMIO_none; msix_write_completion(curr); vcpu_end_shutdown_deferral(curr); diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index 221d05e..3b51d59 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -90,9 +90,7 @@ int handle_mmio(void) rc = hvm_emulate_one(ctxt); -if ( rc != X86EMUL_RETRY ) -vio-io_state = HVMIO_none; -if ( vio-io_state == HVMIO_awaiting_completion || vio-mmio_retry ) +if ( hvm_vcpu_io_need_completion(vio) || vio-mmio_retry ) vio-io_completion = HVMIO_mmio_completion; else vio-mmio_access = (struct npfec){}; @@ -142,6 +140,9 @@ int handle_pio(uint16_t port, unsigned int size, int dir) rc = hvmemul_do_pio_buffer(port, size, dir, data); +if ( hvm_vcpu_io_need_completion(vio) ) +vio-io_completion = HVMIO_pio_completion; + switch ( rc ) { case X86EMUL_OKAY: @@ -154,11 +155,10 @@ int handle_pio(uint16_t port, unsigned int size, int dir) } break; case X86EMUL_RETRY: -if ( vio-io_state != 
HVMIO_awaiting_completion ) +/* We should not advance RIP/EIP if the domain is shutting down */ +if ( curr-domain-is_shutting_down ) return 0; -/* Completion in hvm_io_assist() with no re-emulation required. */ -ASSERT(dir == IOREQ_READ); -vio-io_completion = HVMIO_pio_completion; + break; default: gdprintk(XENLOG_ERR, Weird HVM ioemulation status %d.\n, rc); diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c index 76ff9a5..deb53ae 100644 ---
[Xen-devel] [PATCH v8 05/13] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
It's clear from the following check in hvmemul_rep_movs: if ( sp2mt == p2m_mmio_direct || dp2mt == p2m_mmio_direct || (sp2mt == p2m_mmio_dm dp2mt == p2m_mmio_dm) ) return X86EMUL_UNHANDLEABLE; that mmio - mmio copy is not handled. This means the code in the stdvga mmio intercept that explicitly handles mmio - mmio copy when hvm_copy_to/from_guest_phys() fails is never going to be executed. This patch therefore adds a check in hvmemul_do_io_addr() to make sure mmio - mmio is disallowed and then registers standard mmio intercept ops in stdvga_init(). With this patch all mmio and portio handled within Xen now goes through process_io_intercept(). Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - Addressed further comments from Jan v7: - Use hvm_mmio_{first,last}_byte in stdvga_mem_accept for correctness - Add comments requested by Jan v6: - Added Andrew's reviewed-by v5: - Fixed semantic problems pointed out by Jan --- xen/arch/x86/hvm/emulate.c |9 ++ xen/arch/x86/hvm/intercept.c | 30 -- xen/arch/x86/hvm/stdvga.c| 212 +++--- xen/include/asm-x86/hvm/io.h | 10 +- 4 files changed, 113 insertions(+), 148 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 7eeaaea..195de00 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -266,6 +266,15 @@ static int hvmemul_acquire_page(unsigned long gmfn, struct page_info **page) return X86EMUL_RETRY; } +/* This code should not be reached if the gmfn is not RAM */ +if ( p2m_is_mmio(p2mt) ) +{ +domain_crash(curr_d); + +put_page(*page); +return X86EMUL_UNHANDLEABLE; +} + return X86EMUL_OKAY; } diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c index 71c4a0f..e36189e 100644 --- a/xen/arch/x86/hvm/intercept.c +++ b/xen/arch/x86/hvm/intercept.c @@ -263,20 +263,21 @@ const struct hvm_io_handler *hvm_find_io_handler(ioreq_t *p) int 
hvm_io_intercept(ioreq_t *p) { const struct hvm_io_handler *handler; - -if ( p-type == IOREQ_TYPE_COPY ) -{ -int rc = stdvga_intercept_mmio(p); -if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) ) -return rc; -} +const struct hvm_io_ops *ops; +int rc; handler = hvm_find_io_handler(p); if ( handler == NULL ) return X86EMUL_UNHANDLEABLE; -return hvm_process_io_intercept(handler, p); +rc = hvm_process_io_intercept(handler, p); + +ops = handler-ops; +if ( ops-complete != NULL ) +ops-complete(handler); + +return rc; } struct hvm_io_handler *hvm_next_io_handler(struct domain *d) @@ -338,6 +339,8 @@ void relocate_portio_handler(struct domain *d, unsigned int old_port, bool_t hvm_mmio_internal(paddr_t gpa) { +const struct hvm_io_handler *handler; +const struct hvm_io_ops *ops; ioreq_t p = { .type = IOREQ_TYPE_COPY, .addr = gpa, @@ -345,7 +348,16 @@ bool_t hvm_mmio_internal(paddr_t gpa) .size = 1, }; -return hvm_find_io_handler(p) != NULL; +handler = hvm_find_io_handler(p); + +if ( handler == NULL ) +return 0; + +ops = handler-ops; +if ( ops-complete != NULL ) +ops-complete(handler); + +return 1; } /* diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index e6dfdb7..4a7593d 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -148,7 +148,7 @@ static int stdvga_outb(uint64_t addr, uint8_t val) } else if ( prev_stdvga !s-stdvga ) { -gdprintk(XENLOG_INFO, leaving stdvga\n); +gdprintk(XENLOG_INFO, leaving stdvga mode\n); } return rc; @@ -275,9 +275,10 @@ static uint8_t stdvga_mem_readb(uint64_t addr) return ret; } -static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size) +static int stdvga_mem_read(const struct hvm_io_handler *handler, + uint64_t addr, uint32_t size, uint64_t *p_data) { -uint64_t data = 0; +uint64_t data = ~0ul; switch ( size ) { @@ -309,11 +310,12 @@ static uint64_t stdvga_mem_read(uint64_t addr, uint64_t size) break; default: -gdprintk(XENLOG_WARNING, invalid io size: %PRId64\n, size); +gdprintk(XENLOG_WARNING, 
invalid io size: %u\n, size); break; } -return data; +*p_data = data; +return X86EMUL_OKAY; } static void stdvga_mem_writeb(uint64_t addr, uint32_t val) @@ -424,8 +426,23 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val) } } -static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size) +static int stdvga_mem_write(const struct hvm_io_handler *handler, +uint64_t addr, uint32_t size, +
[Xen-devel] [PATCH v8 06/13] x86/hvm: limit reps to avoid the need to handle retry
By limiting hvmemul_do_io_addr() to reps falling within the page on which a reference has already been taken, we can guarantee that calls to hvm_copy_to/from_guest_phys() will not hit the HVMCOPY_gfn_paged_out or HVMCOPY_gfn_shared cases. Thus we can remove the retry logic (added by c/s 82ed8716b fix direct PCI port I/O emulation retry and error handling) from the intercept code and simplify it significantly. Normally hvmemul_do_io_addr() will only reference single page at a time. It will, however, take an extra page reference for I/O spanning a page boundary. It is still important to know, upon returning from x86_emulate(), whether the number of reps was reduced so the mmio_retry flag is retained for that purpose. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - No change v7: - Fixed flaw in calculation pointed out by Jan - Make nr_pages unsigned as requested by Jan - Added Andrew's reviewed-by v6: - Added comment requested by Andrew v5: - Addressed further comments from Jan --- xen/arch/x86/hvm/emulate.c | 76 ++-- xen/arch/x86/hvm/hvm.c |4 +++ xen/arch/x86/hvm/intercept.c | 57 +++--- xen/include/asm-x86/hvm/vcpu.h |2 +- 4 files changed, 66 insertions(+), 73 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 195de00..02bde26 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -51,7 +51,7 @@ static void hvmtrace_io_assist(const ioreq_t *p) } static int hvmemul_do_io( -bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size, +bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size, uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data) { struct vcpu *curr = current; @@ -60,6 +60,7 @@ static int hvmemul_do_io( .type = is_mmio ? 
IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, .addr = addr, .size = size, +.count = reps, .dir = dir, .df = df, .data = data, @@ -125,15 +126,6 @@ static int hvmemul_do_io( HVMIO_dispatched : HVMIO_awaiting_completion; vio-io_size = size; -/* - * When retrying a repeated string instruction, force exit to guest after - * completion of the retried iteration to allow handling of interrupts. - */ -if ( vio-mmio_retrying ) -*reps = 1; - -p.count = *reps; - if ( dir == IOREQ_WRITE ) { if ( !data_is_addr ) @@ -147,17 +139,9 @@ static int hvmemul_do_io( switch ( rc ) { case X86EMUL_OKAY: -case X86EMUL_RETRY: -*reps = p.count; p.state = STATE_IORESP_READY; -if ( !vio-mmio_retry ) -{ -hvm_io_assist(p); -vio-io_state = HVMIO_none; -} -else -/* Defer hvm_io_assist() invocation to hvm_do_resume(). */ -vio-io_state = HVMIO_handle_mmio_awaiting_completion; +hvm_io_assist(p); +vio-io_state = HVMIO_none; break; case X86EMUL_UNHANDLEABLE: { @@ -235,7 +219,7 @@ static int hvmemul_do_io_buffer( BUG_ON(buffer == NULL); -rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0, +rc = hvmemul_do_io(is_mmio, addr, *reps, size, dir, df, 0, (uintptr_t)buffer); if ( rc == X86EMUL_UNHANDLEABLE dir == IOREQ_READ ) memset(buffer, 0xff, size); @@ -287,17 +271,56 @@ static int hvmemul_do_io_addr( bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size, uint8_t dir, bool_t df, paddr_t ram_gpa) { -struct page_info *ram_page; +struct vcpu *v = current; +unsigned long ram_gmfn = paddr_to_pfn(ram_gpa); +unsigned int page_off = ram_gpa (PAGE_SIZE - 1); +struct page_info *ram_page[2]; +unsigned int nr_pages = 0; +unsigned long count; int rc; -rc = hvmemul_acquire_page(paddr_to_pfn(ram_gpa), ram_page); +rc = hvmemul_acquire_page(ram_gmfn, ram_page[nr_pages]); if ( rc != X86EMUL_OKAY ) -return rc; +goto out; -rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 1, +nr_pages++; + +/* Detemine how many reps will fit within this page */ +count = min_t(unsigned long, + *reps, + df ? 
+ ((page_off + size - 1) ~PAGE_MASK) / size : + (PAGE_SIZE - page_off) / size); + +if ( count == 0 ) +{ +/* + * This access must span two pages, so grab a reference to + * the next page and do a single rep. + * It is safe to assume multiple pages are physically + * contiguous at this point as hvmemul_linear_to_phys() will + * ensure this is the case. + */ +rc = hvmemul_acquire_page(df ? ram_gmfn - 1 : ram_gmfn + 1, +
[Xen-devel] [PATCH v8 02/13] x86/hvm: unify internal portio and mmio intercepts
The implementation of mmio and portio intercepts is unnecessarily different. This leads to much code duplication. This patch unifies much of the intercept handling, leaving only distinct handlers for stdvga mmio and dpci portio. Subsequent patches will unify those handlers. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - Re-base v7: - Change start/end fields in portio handler to port/size so that relocation only has to modify a single field - Addressed further comments from Jan v6: - Added Andrew's reviewed-by and made the modification requested by Roger v5: - Addressed further comments from Jan and simplified implementation by passing ioreq_t to accept() function --- xen/arch/x86/hvm/emulate.c| 11 +- xen/arch/x86/hvm/hpet.c |4 +- xen/arch/x86/hvm/hvm.c|9 +- xen/arch/x86/hvm/intercept.c | 475 + xen/arch/x86/hvm/stdvga.c | 32 +- xen/arch/x86/hvm/vioapic.c|4 +- xen/arch/x86/hvm/vlapic.c |5 +- xen/arch/x86/hvm/vmsi.c | 10 +- xen/drivers/passthrough/amd/iommu_guest.c | 30 +- xen/include/asm-x86/hvm/domain.h |1 + xen/include/asm-x86/hvm/io.h | 118 --- 11 files changed, 331 insertions(+), 368 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 7d2e3c1..7eeaaea 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -142,16 +142,7 @@ static int hvmemul_do_io( hvmtrace_io_assist(p); } -if ( is_mmio ) -{ -rc = hvm_mmio_intercept(p); -if ( rc == X86EMUL_UNHANDLEABLE ) -rc = hvm_buffered_io_intercept(p); -} -else -{ -rc = hvm_portio_intercept(p); -} +rc = hvm_io_intercept(p); switch ( rc ) { diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c index 30ac5dd..732504a 100644 --- a/xen/arch/x86/hvm/hpet.c +++ b/xen/arch/x86/hvm/hpet.c @@ -504,7 +504,7 @@ static int hpet_range(struct vcpu *v, unsigned long addr) (addr (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) ); } -const struct hvm_mmio_ops 
hpet_mmio_ops = { +static const struct hvm_mmio_ops hpet_mmio_ops = { .check = hpet_range, .read = hpet_read, .write = hpet_write @@ -659,6 +659,8 @@ void hpet_init(struct domain *d) h-hpet.comparator64[i] = ~0ULL; h-pt[i].source = PTSRC_isa; } + +register_mmio_handler(d, hpet_mmio_ops); } void hpet_deinit(struct domain *d) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 126882d..1fd5efc 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1455,9 +1455,6 @@ int hvm_domain_initialise(struct domain *d) spin_lock_init(d-arch.hvm_domain.irq_lock); spin_lock_init(d-arch.hvm_domain.uc_lock); -INIT_LIST_HEAD(d-arch.hvm_domain.msixtbl_list); -spin_lock_init(d-arch.hvm_domain.msixtbl_list_lock); - hvm_init_cacheattr_region_list(d); rc = paging_enable(d, PG_refcounts|PG_translate|PG_external); @@ -1465,11 +1462,11 @@ int hvm_domain_initialise(struct domain *d) goto fail0; d-arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS); -d-arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler); +d-arch.hvm_domain.io_handler = xzalloc_array(struct hvm_io_handler, + NR_IO_HANDLERS); rc = -ENOMEM; if ( !d-arch.hvm_domain.params || !d-arch.hvm_domain.io_handler ) goto fail1; -d-arch.hvm_domain.io_handler-num_slot = 0; /* Set the default IO Bitmap. 
*/ if ( is_hardware_domain(d) ) @@ -1506,6 +1503,8 @@ int hvm_domain_initialise(struct domain *d) rtc_init(d); +msixtbl_init(d); + register_portio_handler(d, 0xe9, 1, hvm_print_line); register_portio_handler(d, 0xcf8, 4, hvm_access_cf8); diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c index 52879ff..f97ee52 100644 --- a/xen/arch/x86/hvm/intercept.c +++ b/xen/arch/x86/hvm/intercept.c @@ -32,205 +32,86 @@ #include xen/event.h #include xen/iommu.h -static const struct hvm_mmio_ops *const -hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] = +static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler, + const ioreq_t *p) { -hpet_mmio_ops, -vlapic_mmio_ops, -vioapic_mmio_ops, -msixtbl_mmio_ops, -iommu_mmio_ops -}; +BUG_ON(handler-type != IOREQ_TYPE_COPY); + +return handler-mmio.ops-check(current, p-addr); +} -static int hvm_mmio_access(struct vcpu *v, - ioreq_t *p, - hvm_mmio_read_t read, - hvm_mmio_write_t write) +static int hvm_mmio_read(const struct hvm_io_handler
[Xen-devel] [PATCH v8 00/13] x86/hvm: I/O emulation cleanup and fix
This patch series re-works much of the code involved in emulation of port and memory mapped I/O for HVM guests. The code has become very convoluted and, at least by inspection, certain emulations will apparently malfunction. The series is broken down into 13 patches (which are also available in my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git on the emulation36 branch). Previous changelog -- v4: - Removed previous patch (make sure translated MMIO reads or writes fall within a page) and rebased rest of series. - Address Jan's comments on patch #1 v3: - Addressed comments from Jan - Re-ordered series to bring a couple of more trivial patches to the front - Backport to XenServer (4.5) now passing automated tests - Tested on unstable with QEMU upstream and trad, with and without HAP (to force shadow emulation) v2: - Removed bogus assertion from patch #15 - Re-worked patch #17 after basic testing of back-port onto XenServer Subsequent changes are logged in the individual patch files (thanks to David Vrabel for that). Testing --- v6 of the series was back-ported to staging-4.5 and then dropped onto the XenServer (Dundee) patch queue. All automated branch-safety tests pass. v8 has only been compile-tested, since the changes since v6 are largely cosmetic. It will be back-ported in the near future. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 03/13] x86/hvm: add length to mmio check op
When memory mapped I/O is range checked by internal handlers, the length of the access should be taken into account. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - No change v7: - No change v6: - Added Andrew's reviewed-by v5: - Simplified by leaving mmio_check() implementation alone and calling to check last byte if first-byte check passes --- xen/arch/x86/hvm/intercept.c | 23 --- xen/include/asm-x86/hvm/io.h | 16 2 files changed, 36 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c index f97ee52..f4dbf17 100644 --- a/xen/arch/x86/hvm/intercept.c +++ b/xen/arch/x86/hvm/intercept.c @@ -35,9 +35,20 @@ static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler, const ioreq_t *p) { +paddr_t first = hvm_mmio_first_byte(p); +paddr_t last = hvm_mmio_last_byte(p); + BUG_ON(handler-type != IOREQ_TYPE_COPY); -return handler-mmio.ops-check(current, p-addr); +if ( !handler-mmio.ops-check(current, first) ) +return 0; + +/* Make sure the handler will accept the whole access */ +if ( p-size 1 + !handler-mmio.ops-check(current, last) ) +domain_crash(current-domain); + +return 1; } static int hvm_mmio_read(const struct hvm_io_handler *handler, @@ -106,7 +117,8 @@ static const struct hvm_io_ops portio_ops = { int hvm_process_io_intercept(const struct hvm_io_handler *handler, ioreq_t *p) { -struct hvm_vcpu_io *vio = current-arch.hvm_vcpu.hvm_io; +struct vcpu *curr = current; +struct hvm_vcpu_io *vio = curr-arch.hvm_vcpu.hvm_io; const struct hvm_io_ops *ops = (p-type == IOREQ_TYPE_COPY) ? mmio_ops : portio_ops; int rc = X86EMUL_OKAY, i, step = p-df ? 
-p-size : p-size; @@ -215,6 +227,9 @@ int hvm_process_io_intercept(const struct hvm_io_handler *handler, if ( i != 0 ) { +if ( rc == X86EMUL_UNHANDLEABLE ) +domain_crash(curr-domain); + p-count = i; rc = X86EMUL_OKAY; } @@ -331,7 +346,9 @@ bool_t hvm_mmio_internal(paddr_t gpa) { ioreq_t p = { .type = IOREQ_TYPE_COPY, -.addr = gpa +.addr = gpa, +.count = 1, +.size = 1, }; return hvm_find_io_handler(p) != NULL; diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h index 4594e91..2b22b50 100644 --- a/xen/include/asm-x86/hvm/io.h +++ b/xen/include/asm-x86/hvm/io.h @@ -43,6 +43,22 @@ struct hvm_mmio_ops { hvm_mmio_write_t write; }; +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) +{ +return p-df ? + p-addr - (p-count - 1ul) * p-size : + p-addr; +} + +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p) +{ +unsigned long count = p-count; + +return p-df ? + p-addr + p-size - 1: + p-addr + (count * p-size) - 1; +} + typedef int (*portio_action_t)( int dir, unsigned int port, unsigned int bytes, uint32_t *val); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
On Thu, Jul 09, 2015 at 05:00:45PM +0100, Jan Beulich wrote: On 09.07.15 at 17:53, elena.ufimts...@oracle.com wrote: - jbeul...@suse.com wrote: On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote: You are right, it needs to be rebased. I can post later rebased on memory leak fix version, if you think it's the way to go. I didn't look at v9 yet, and can't predict when I will be able to. Would you like me to post v10 with memory leak patch included in the patchset before you start looking at v9? If there is a dependency on the changes in the leak fix v6, then this would be a good idea. If not, you can keep things as they are now. I view the entire set more as a bug fix than a feature anyway, and hence see no reason not to get this in after the freeze. But I'm adding Wei just in case... I just looked at v9. The first three patches are quite mechanical. The fourth patch is relatively bigger but it's also quite straightforward (mostly parsing input). All in all, this series itself is self-contained. I don't think OSSTest is able to test that, so it would not cause visible regression on our side. I also agree it's a bug fix. Preferably this series should be applied before the first RC. Wei. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] bsd-sys-queue-h-seddery: Massage `offsetof'
For some reason BSD's queue.h uses `__offsetof'. It expects it to work just like offsetof. So use offsetof. Reported-by: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com --- tools/include/xen-external/bsd-sys-queue-h-seddery |2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/include/xen-external/bsd-sys-queue-h-seddery b/tools/include/xen-external/bsd-sys-queue-h-seddery index 7a957e3..3f8716d 100755 --- a/tools/include/xen-external/bsd-sys-queue-h-seddery +++ b/tools/include/xen-external/bsd-sys-queue-h-seddery @@ -69,4 +69,6 @@ s/\b struct \s+ type \b/type/xg; s,^\#include.*sys/cdefs.*,/* $ */,xg; +s,\b __offsetof \b ,offsetof,xg; + s/\b( NULL )/0/xg; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-mainline test] 59214: regressions - FAIL
flight 59214 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59214/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-armhf-armhf-xl-xsm                       15 guest-start/debian.repeat fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-debianhvm-amd64     11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-ovmf-amd64          11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-freebsd10-amd64               12 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-ovmf-amd64           11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-freebsd10-i386                12 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-debianhvm-amd64      11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-win7-amd64          11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1      11 guest-saverestore fail REGR. vs. 59059
 test-amd64-amd64-xl-qemuu-winxpsp3            11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-win7-amd64           11 guest-saverestore fail REGR. vs. 59059
 test-amd64-i386-xl-qemuu-winxpsp3             11 guest-saverestore fail REGR. vs. 59059

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds     11 guest-start fail REGR. vs. 59059
 test-amd64-i386-libvirt      11 guest-start fail like 59059
 test-amd64-amd64-libvirt     11 guest-start fail like 59059
 test-amd64-i386-libvirt-xsm  11 guest-start fail like 59059

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd     11 guest-start           fail never pass
 test-amd64-amd64-xl-pvh-intel   11 guest-start           fail never pass
 test-amd64-amd64-libvirt-xsm    11 guest-start           fail never pass
 test-armhf-armhf-xl-xsm         12 migrate-support-check fail never pass
 test-armhf-armhf-xl             12 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck  12 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu   12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt        12 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale     12 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-xsm    12 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2     12 migrate-support-check fail never pass

version targeted for testing:
 qemuu                d09952ee8caeea928695d5a3dc3ec50d8afb98c6
baseline version:
 qemuu                35360642d043c2a5366e8a04a10e5545e7353bd5

Last test of basis    59059  2015-07-05 10:39:20 Z  4 days
Failing since         59109  2015-07-06 14:58:21 Z  3 days  3 attempts
Testing same since    59214  2015-07-08 17:07:56 Z  0 days  1 attempts

People who touched revisions under test:
 Alberto Garcia be...@igalia.com
 Alex Williamson alex.william...@redhat.com
 Alexander Graf ag...@suse.de
 Alexey Kardashevskiy a...@ozlabs.ru
 Alvise Rigo a.r...@virtualopensystems.com
 Andrew Jones drjo...@redhat.com
 Artyom Tarasenko atar4q...@gmail.com
 Aurelien Jarno aurel...@aurel32.net
 Benjamin Herrenschmidt b...@kernel.crashing.org
 Bharata B Rao bhar...@linux.vnet.ibm.com
 Brian Kress kre...@moose.net
 Cormac O'Brien i.am.cormac.obr...@gmail.com
 Cornelia Huck cornelia.h...@de.ibm.com
 David Gibson da...@gibson.dropbear.id.au
 Denis V. Lunev d...@openvz.org
 Dmitry Osipenko dig...@gmail.com
 Dr. David Alan Gilbert dgilb...@redhat.com
 Eduardo Habkost ehabk...@redhat.com
 Eric Auger eric.au...@linaro.org
 Fam Zheng f...@redhat.com
 Gabriel Laupre glau...@chelsio.com
 Gavin Shan gws...@linux.vnet.ibm.com
 Gerd Hoffmann kra...@redhat.com
 Gonglei arei.gong...@huawei.com
 Greg Kurz gk...@linux.vnet.ibm.com
 Igor Mammedov imamm...@redhat.com
 Jan Kiszka jan.kis...@siemens.com
 Johannes Schlatow schla...@ida.ing.tu-bs.de
 John Snow js...@redhat.com
 Juan Quintela quint...@redhat.com
 Justin Ossevoort jus...@quarantainenet.nl
 Kirk Allan kal...@suse.com
 Laszlo Ersek ler...@redhat.com
 Laurent Vivier laur...@vivier.eu
 Laurent Vivier lviv...@redhat.com
 Li Zhijian lizhij...@cn.fujitsu.com
 Marc-André Lureau marcandre.lur...@redhat.com
 Markus Armbruster arm...@redhat.com
 Max Filippov jcmvb...@gmail.com
 Michael Roth mdr...@linux.vnet.ibm.com
 Michael S. Tsirkin m...@redhat.com
 Nikunj A Dadhania nik...@linux.vnet.ibm.com
 Olga Krishtal okrish...@virtuozzo.com
 Paolo Bonzini pbonz...@redhat.com
 Paul Durrant paul.durr...@citrix.com
 Paulo Alcantara pca...@gmail.com
 Paulo Alcantara pca...@zytor.com
 Peter Crosthwaite crosthwaite.pe...@gmail.com
 Peter
Re: [Xen-devel] [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops
-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 09 July 2015 16:13
To: Paul Durrant
Cc: xen-devel@lists.xen.org; Keir (Xen.org)
Subject: Re: [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops

On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

+static int hvmemul_linear_mmio_access(
+    unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
+    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t known_gpfn)
+{
+    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+    unsigned long offset = gla & ~PAGE_MASK;
+    unsigned int chunk;
+    paddr_t gpa;
+    unsigned long one_rep = 1;
+    int rc;
+
+    chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
+
+    if ( known_gpfn )
+        gpa = pfn_to_paddr(vio->mmio_gpfn) | offset;
+    else
+    {
+        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+                                    hvmemul_ctxt);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+    }
+
+    for ( ;; )
+    {
+        rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
+        if ( rc != X86EMUL_OKAY )
+            break;
+
+        gla += chunk;
+        buffer += chunk;
+        size -= chunk;
+
+        if ( size == 0 )
+            break;
+
+        ASSERT((gla & ~PAGE_MASK) == 0);

While I don't mean you to re-submit another time, I'd still like to get my question answered: Does this really matter for the code below?

No, it doesn't, but if it's not true then an incorrect chunk size was chosen above.

  Paul

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops
On 09.07.15 at 18:16, paul.durr...@citrix.com wrote:

-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 09 July 2015 16:13
To: Paul Durrant
Cc: xen-devel@lists.xen.org; Keir (Xen.org)
Subject: Re: [PATCH v7 01/15] x86/hvm: remove multiple open coded 'chunking' loops

On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

+static int hvmemul_linear_mmio_access(
+    unsigned long gla, unsigned int size, uint8_t dir, uint8_t *buffer,
+    uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t known_gpfn)
+{
+    struct hvm_vcpu_io *vio = &current->arch.hvm_vcpu.hvm_io;
+    unsigned long offset = gla & ~PAGE_MASK;
+    unsigned int chunk;
+    paddr_t gpa;
+    unsigned long one_rep = 1;
+    int rc;
+
+    chunk = min_t(unsigned int, size, PAGE_SIZE - offset);
+
+    if ( known_gpfn )
+        gpa = pfn_to_paddr(vio->mmio_gpfn) | offset;
+    else
+    {
+        rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+                                    hvmemul_ctxt);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+    }
+
+    for ( ;; )
+    {
+        rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer);
+        if ( rc != X86EMUL_OKAY )
+            break;
+
+        gla += chunk;
+        buffer += chunk;
+        size -= chunk;
+
+        if ( size == 0 )
+            break;
+
+        ASSERT((gla & ~PAGE_MASK) == 0);

While I don't mean you to re-submit another time, I'd still like to get my question answered: Does this really matter for the code below?

No, it doesn't, but if it's not true then an incorrect chunk size was chosen above.

I suspected as much. It's then just slightly less odd than having

    x = 0;
    ASSERT(x == 0);

I guess I'll strip the ASSERT() when applying.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 09 July 2015 17:22
To: Paul Durrant
Cc: xen-devel@lists.xen.org; Keir (Xen.org)
Subject: RE: [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with standard mmio intercept

On 09.07.15 at 18:12, paul.durr...@citrix.com wrote:

From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 09 July 2015 16:33

On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

@@ -424,8 +426,22 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
     }
 }

-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(const struct hvm_io_handler *handler,
+                            uint64_t addr, uint32_t size,
+                            uint64_t data)
 {
+    struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
+    ioreq_t p = { .type = IOREQ_TYPE_COPY,
+                      .addr = addr,
+                      .size = size,
+                      .count = 1,
+                      .dir = IOREQ_WRITE,
+                      .data = data,
+    };

Indentation (still - I know I pointed this out on v6, just perhaps at another example). See e.g. the context of the earlier change to the beginning of hvm_mmio_internal() in this patch for how this should look like.

It has to be something my emacs is doing then; I can't see any brokenness.

Indentation simply is too deep. See the good example I pointed you at.

Yes, I'll look... and maybe see what it looks like with vi too.

  Paul

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int
On 09.07.15 at 18:23, paul.durr...@citrix.com wrote: I would have thought that the compiler would point out the overflow if a caller tried to pass 64k in the port number, That would at best happen when passing literal numbers. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.
I think you might assume that the first M VCPUs in the runq are the current running VCPUs on the M pCPUs. Am I correct? (From what you described in the following example, I think I'm correct. ;-) ) This is an assumption he gathered from my implementation, because I wrote code that uses this fact. Right now the code is wrong since current vcpus are not kept in the queue. I tell that you make the above assumption from here. However, in the current implementation, runq does not hold the current running VCPUs on the pCPUs. We remove the vcpu from runq in rt_schedule() function. What you described above make perfect sense if we decide to make runq hold the current running VCPUs. Actually, after thinking about the example you described, I think we can hold the current running VCPUs *and* the current idle pCPUs in the scheduler-wide structure; In other words, we can have another runningq (not runq) and a idle_pcpu list in the rt_private; Now all VCPUs are stored in three queues: runningq, runq, and depletedq, in increasing priority order. When we make the tickle decision, we only need to scan the idle_pcpu and then runningq to figure out which pCPU to tickle. All of other design you describe still hold here, except that the position where a VCPU is inserted into runq cannot directly give us which pCPU to tickle. What do you think? This is an good idea too. I think having the running vcpus simply at the beginning of the runq could work, too. In case (b) there may be idle pCPUs (and, if that's the case, we should tickle one of them, of course) or not. If not, we need to go figure out which pCPU to tickle, which is exactly what runq_tickle() does, but we at least know for sure that we want to tickle the pCPU where vCPU k runs, or others where vCPUs with deadline greater than vCPU k run. Does this make sense? Yes, if we decide to hold the currently running VCPUs in scheduler-wide structure: it can be runq or runningq. Right. 
I think for now I'll just keep them in runq to keep most of the selection logic the same. Thank you again! It is very clear and I'm clear which part is unclear now. :-D Dagaen, Meng, any question? I really think that, if we manage to implement all this, code quality and performance would improve a lot. Oh, considering all the various and (logically) different changes that I've hinted at, the patch needs to become a patch series! :-D Sure! Dagaen, what do you think? Yes, this may become a series with several changes like this. For now I am going to get it working with the running vcpus in the runq. I thought returning the inserted index was a good way of checking if we need to tickle, and moving the running vcpus to the runq will make the scheduler almost done as far as functional correctness goes. Various other features hinted at would be a series on top of this. Regards, ~Dagaen Golomb ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 15/15] x86/hvm: track large memory mapped accesses by buffer offset
On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

@@ -635,13 +605,49 @@ static int hvmemul_phys_mmio_access(
     return rc;
 }

+/*
+ * Multi-cycle MMIO handling is based upon the assumption that emulation
+ * of the same instruction will not access the same MMIO region more
+ * than once. Hence we can deal with re-emulation (for secondary or
+ * subsequent cycles) by looking up the result or previous I/O in a
+ * cache indexed by linear MMIO address.
+ */
+static struct hvm_mmio_cache *hvmemul_find_mmio_cache(
+    struct hvm_vcpu_io *vio, unsigned long gla, uint8_t dir)
+{
+    unsigned int i;
+    struct hvm_mmio_cache *cache;
+
+    for ( i = 0; i < vio->mmio_cache_count; i ++ )
+    {
+        cache = &vio->mmio_cache[i];
+
+        if ( gla == cache->gla &&
+             dir == cache->dir )
+            return cache;
+    }
+
+    i = vio->mmio_cache_count++;
+    if( i == ARRAY_SIZE(vio->mmio_cache) )
+        domain_crash(current->domain);

But you mustn't continue here, or at least force i into range so you don't corrupt other data. And while doing that please also add the missing space on the if() line.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/4] x86/compat: Test whether guest has 32b shinfo instead of being a PV 32b domain
On 09.07.15 at 18:05, boris.ostrov...@oracle.com wrote: On 07/09/2015 10:30 AM, Boris Ostrovsky wrote: On 07/09/2015 10:17 AM, Jan Beulich wrote: On 09.07.15 at 16:10, boris.ostrov...@oracle.com wrote: On 07/09/2015 03:02 AM, Jan Beulich wrote: On 08.07.15 at 22:57, boris.ostrov...@oracle.com wrote: As I started to update the patches I realized that in some cases (especially in arch_do_domctl():XEN_DOMCTL_get_address_size) we don't have VCPU (which is what hvm_guest_x86_mode() wants) but rather only the domain. d->vcpu[0] should work. Otherwise I'll either need a new field in struct domain or wrap has_32bit_shinfo into something PVH-specific, like is_32bit_pvh_vcpu(). Shouldn't XEN_DOMCTL_get_address_size be handled HVM-like for PVH, especially if you also intend the tools to use the 64-bit guest context variant even for 32-bit PVH? Once again - are you intending to prohibit 32-bit PVH switching to 64-bit mode (which would seem both wrong and possibly cumbersome to me)? With current PVH implementation I don't think we can switch. We are starting the guest in very much PV-like fashion. That's why we are getting into switch_compat() --- via XEN_DOMCTL_set_address_size. For XEN_DOMCTL_get_address_size specifically we need to be able to figure out the mode *before* the guest is running because we use it to set cpuid bits in xc_cpuid_pv_policy(). So just that means we can't change the mode. Okay - but is there code (being put) in place to refuse switch attempts? No, I should add code to deal with this. Forgot to include here --- so what is your preference wrt what I am asking in the first paragraph? d->vcpu[0], new field (or maybe a flag with bits per 32bit-pv and 32bit-pvh), or a PVH-wrapper for has_32bit_shinfo? To be honest, my preference would be to have none of those, and have the whole thing more HVM-like from the beginning. Considering that this isn't realistic, a PVH alias for has_32bit_shinfo() would seem least ugly to me.
Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 00/12] Alternate p2m: support multiple copies of host p2m
On Thu, Jul 09, 2015 at 04:13:20PM +, Sahita, Ravi wrote: From: Wei Liu [mailto:wei.l...@citrix.com] Sent: Thursday, July 09, 2015 4:49 AM Question for you and Ed, how much testing have you done to this series? I assume you've done testing with it turned on and off, to the point you get a functioning guest. Have you run any test suites and got positive results? Yes, we have tested with Windows hvm guests and got positive results, and Tamas' tool patches based tests have as well. [0]: e1z8rcu-0003v6...@ukmail1.uk.xensource.com This ref link was messed up That's the message-id of my email. Here is a link to the archive: http://www.gossamer-threads.com/lists/xen/devel/386386 It's just my 2 weeks' reminder email, which has information on how to ask for freeze exception. I believed you saw that already. Wei. Thanks, Ravi ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 07/15] x86/hvm: unify stdvga mmio intercept with standard mmio intercept
On 09.07.15 at 18:12, paul.durr...@citrix.com wrote:

From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 09 July 2015 16:33

On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

@@ -424,8 +426,22 @@ static void stdvga_mem_writeb(uint64_t addr, uint32_t val)
     }
 }

-static void stdvga_mem_write(uint64_t addr, uint64_t data, uint64_t size)
+static int stdvga_mem_write(const struct hvm_io_handler *handler,
+                            uint64_t addr, uint32_t size,
+                            uint64_t data)
 {
+    struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
+    ioreq_t p = { .type = IOREQ_TYPE_COPY,
+                      .addr = addr,
+                      .size = size,
+                      .count = 1,
+                      .dir = IOREQ_WRITE,
+                      .data = data,
+    };

Indentation (still - I know I pointed this out on v6, just perhaps at another example). See e.g. the context of the earlier change to the beginning of hvm_mmio_internal() in this patch for how this should look like.

It has to be something my emacs is doing then; I can't see any brokenness.

Indentation simply is too deep. See the good example I pointed you at.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [xen-unstable test] 59295: trouble: pass/preparing/queued/running
flight 59295 xen-unstable running [real]
http://logs.test-lab.xenproject.org/osstest/logs/59295/

Failures and problems with tests :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 build-i386-rumpuserxen                                  none executed  queued
 build-amd64-rumpuserxen                                 none executed  queued
 test-amd64-amd64-xl-pvh-intel                           none executed  queued
 test-amd64-amd64-xl-pvh-amd                             none executed  queued
 test-amd64-i386-libvirt-xsm                             none executed  queued
 test-amd64-i386-libvirt                                 none executed  queued
 build-i386-libvirt                                      none executed  queued
 test-amd64-amd64-libvirt                                none executed  queued
 test-amd64-amd64-libvirt-xsm                            none executed  queued
 test-amd64-amd64-xl-credit2                             none executed  queued
 test-amd64-amd64-xl-xsm                                 none executed  queued
 build-amd64-libvirt                                     none executed  queued
 test-armhf-armhf-xl-rtds                                none executed  queued
 test-amd64-amd64-xl                                     none executed  queued
 test-amd64-amd64-rumpuserxen-amd64                      none executed  queued
 test-amd64-i386-xl-xsm                                  none executed  queued
 test-amd64-amd64-xl-multivcpu                           none executed  queued
 test-amd64-i386-rumpuserxen-i386                        none executed  queued
 test-amd64-i386-qemuu-rhel6hvm-intel                    none executed  queued
 test-amd64-i386-qemut-rhel6hvm-intel                    none executed  queued
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm           none executed  queued
 test-amd64-i386-qemut-rhel6hvm-amd                      none executed  queued
 test-armhf-armhf-xl-arndale                             none executed  queued
 test-amd64-amd64-pair                                   none executed  queued
 test-armhf-armhf-xl-credit2                             none executed  queued
 test-amd64-i386-xl                                      none executed  queued
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm   none executed  queued
 test-amd64-i386-qemuu-rhel6hvm-amd                      none executed  queued
 test-amd64-amd64-xl-qemuu-debianhvm-amd64               none executed  queued
 test-amd64-i386-xl-qemut-win7-amd64                     none executed  queued
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm    none executed  queued
 test-amd64-amd64-xl-rtds                                none executed  queued
 test-amd64-i386-pair                                    none executed  queued
 test-amd64-i386-freebsd10-amd64                         none executed  queued
 test-amd64-amd64-xl-qemut-win7-amd64                    none executed  queued
 test-amd64-i386-freebsd10-i386                          none executed  queued
 test-armhf-armhf-xl-cubietruck                          none executed  queued
 test-armhf-armhf-xl-xsm                                 none executed  queued
 test-armhf-armhf-xl                                     none executed  queued
 test-armhf-armhf-xl-multivcpu                           none executed  queued
 test-armhf-armhf-libvirt-xsm                            none executed  queued
 test-amd64-amd64-xl-qemut-debianhvm-amd64               none executed  queued
 test-armhf-armhf-libvirt                                none executed  queued
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm            none executed  queued
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm           none executed  queued
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm            none executed  queued
 test-amd64-i386-xl-qemuu-debianhvm-amd64                none executed  queued
 test-amd64-i386-xl-qemut-debianhvm-amd64                none executed  queued
 test-amd64-amd64-xl-qemuu-ovmf-amd64                    none executed  queued
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1                none executed  queued
 test-amd64-i386-xl-qemuu-ovmf-amd64                     none executed  queued
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1                none executed  queued
 test-amd64-amd64-xl-qemuu-win7-amd64                    none executed  queued
 test-amd64-i386-xl-qemuu-win7-amd64                     none executed  queued
 test-amd64-i386-xl-qemut-winxpsp3                       none executed  queued
 test-amd64-amd64-xl-qemuu-winxpsp3                      none executed  queued
 test-amd64-i386-xl-qemuu-winxpsp3                       none executed  queued
 test-amd64-amd64-xl-qemut-winxpsp3                      none executed  queued
 build-amd64-oldkern                                     2 hosts-allocate running
 build-i386-oldkern                                      2 hosts-allocate running
 build-amd64-xsm                                         2 hosts-allocate running
 build-i386-xsm                                          2 hosts-allocate running
 build-amd64                                             2 hosts-allocate running
 build-i386                                              2 hosts-allocate running
 build-amd64-pvops                                       2 hosts-allocate running
 build-i386-pvops                                        2 hosts-allocate running
 build-armhf-libvirt                                     5 libvirt-build  running
[Xen-devel] [PATCH v8 13/13] x86/hvm: track large memory mapped accesses by buffer offset
The code in hvmemul_do_io() that tracks large reads or writes, to avoid re-issue of component I/O, is defeated by accesses across a page boundary because it uses physical address. The code is also only relevant to memory mapped I/O to or from a buffer.

This patch re-factors the code and moves it into hvmemul_phys_mmio_access() where it is relevant and tracks using buffer offset rather than address. Separate I/O emulations (of which there may be up to three per instruction) are distinguished by linear address.

Signed-off-by: Paul Durrant paul.durr...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
---
v8:
 - Make sure emulation does not continue after domain_crash()

v7:
 - Added comment requested by Jan
 - Changed BUG_ON() to domain_crash()

v6:
 - Added Andrew's reviewed-by

v5:
 - Fixed to cache up three distinct I/O emulations per instruction
---
 xen/arch/x86/hvm/emulate.c     | 136 ++--
 xen/include/asm-x86/hvm/vcpu.h |  25 +---
 2 files changed, 92 insertions(+), 69 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 89b1616..01ee972 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -106,29 +106,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }

-    if ( is_mmio && !data_is_addr )
-    {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-                return X86EMUL_OKAY;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-            {
-                memcpy(p_data, &vio->mmio_large_read[addr - pa],
-                       size);
-                return X86EMUL_OKAY;
-            }
-        }
-    }
-
     switch ( vio->io_req.state )
     {
     case STATE_IOREQ_NONE:
@@ -208,33 +185,6 @@ static int hvmemul_do_io(
             memcpy(p_data, p.data, size);
     }

-    if ( is_mmio && !data_is_addr )
-    {
-        /* Part of a multi-cycle read or write? */
-        if ( dir == IOREQ_WRITE )
-        {
-            paddr_t pa = vio->mmio_large_write_pa;
-            unsigned int bytes = vio->mmio_large_write_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_write_pa = addr;
-            if ( addr == (pa + bytes) )
-                vio->mmio_large_write_bytes += size;
-        }
-        else
-        {
-            paddr_t pa = vio->mmio_large_read_pa;
-            unsigned int bytes = vio->mmio_large_read_bytes;
-            if ( bytes == 0 )
-                pa = vio->mmio_large_read_pa = addr;
-            if ( (addr == (pa + bytes)) &&
-                 ((bytes + size) <= sizeof(vio->mmio_large_read)) )
-            {
-                memcpy(&vio->mmio_large_read[bytes], p_data, size);
-                vio->mmio_large_read_bytes += size;
-            }
-        }
-    }
-
     return X86EMUL_OKAY;
 }

@@ -590,11 +540,12 @@ static int hvmemul_virtual_to_linear(
 }

 static int hvmemul_phys_mmio_access(
-    paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer)
+    struct hvm_mmio_cache *cache, paddr_t gpa, unsigned int size, uint8_t dir,
+    uint8_t *buffer, unsigned int offset)
 {
     unsigned long one_rep = 1;
     unsigned int chunk;
-    int rc;
+    int rc = X86EMUL_OKAY;

     /* Accesses must fall within a page. */
     BUG_ON((gpa & ~PAGE_MASK) + size > PAGE_SIZE);
@@ -611,14 +562,33 @@ static int hvmemul_phys_mmio_access(

     for ( ;; )
     {
-        rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
-                                    buffer);
-        if ( rc != X86EMUL_OKAY )
-            break;
+        /* Have we already done this chunk? */
+        if ( offset < cache->size )
+        {
+            ASSERT((offset + chunk) <= cache->size);
+
+            if ( dir == IOREQ_READ )
+                memcpy(&buffer[offset], &cache->buffer[offset], chunk);
+            else if ( memcmp(&buffer[offset], &cache->buffer[offset], chunk) != 0 )
+                domain_crash(current->domain);
+        }
+        else
+        {
+            ASSERT(offset == cache->size);
+
+            rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+                                        &buffer[offset]);
+            if ( rc != X86EMUL_OKAY )
+                break;
+
+            /* Note that we have now done this chunk. */
+            memcpy(&cache->buffer[offset], &buffer[offset], chunk);
+            cache->size += chunk;
+        }

         /* Advance to the next chunk. */
         gpa += chunk;
-        buffer += chunk;
+        offset += chunk;
         size -= chunk;

         if (
Re: [Xen-devel] [PATCH v7 15/15] x86/hvm: track large memory mapped accesses by buffer offset
-----Original Message-----
From: Jan Beulich [mailto:jbeul...@suse.com]
Sent: 09 July 2015 16:46
To: Paul Durrant
Cc: xen-devel@lists.xen.org; Keir (Xen.org)
Subject: Re: [PATCH v7 15/15] x86/hvm: track large memory mapped accesses by buffer offset

On 09.07.15 at 15:10, paul.durr...@citrix.com wrote:

@@ -635,13 +605,49 @@ static int hvmemul_phys_mmio_access(
     return rc;
 }

+/*
+ * Multi-cycle MMIO handling is based upon the assumption that emulation
+ * of the same instruction will not access the same MMIO region more
+ * than once. Hence we can deal with re-emulation (for secondary or
+ * subsequent cycles) by looking up the result or previous I/O in a
+ * cache indexed by linear MMIO address.
+ */
+static struct hvm_mmio_cache *hvmemul_find_mmio_cache(
+    struct hvm_vcpu_io *vio, unsigned long gla, uint8_t dir)
+{
+    unsigned int i;
+    struct hvm_mmio_cache *cache;
+
+    for ( i = 0; i < vio->mmio_cache_count; i ++ )
+    {
+        cache = &vio->mmio_cache[i];
+
+        if ( gla == cache->gla &&
+             dir == cache->dir )
+            return cache;
+    }
+
+    i = vio->mmio_cache_count++;
+    if( i == ARRAY_SIZE(vio->mmio_cache) )
+        domain_crash(current->domain);

But you mustn't continue here, or at least force i into range so you don't corrupt other data. And while doing that please also add the missing space on the if() line.

Sorry, I was forgetting domain_crash() is not immediate. I'll send v8.

  Paul

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 09 July 2015 17:20 To: Paul Durrant Cc: xen-devel@lists.xen.org; Keir (Xen.org) Subject: RE: [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int On 09.07.15 at 18:10, paul.durr...@citrix.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 09 July 2015 16:24 On 09.07.15 at 15:10, paul.durr...@citrix.com wrote: Building on the previous patch, this patch restricts portio port numbers to uint16_t in registration/relocate calls. In portio_action_t the port number is change to unsigned int though to avoid the compiler generating 16-bit operations unnecessarily. The patch also changes I/O sizes to unsigned int which then allows the io_handler size field to reduce to an unsigned int. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v7: - Change port type in portio_action_t to unsigned int as requested by Jan Yet title and description were left in places, and ... The title remains. The description was modified: In portio_action_t the port number is change to unsigned int though to avoid the compiler generating 16-bit operations unnecessarily. Maybe I'm just being confused by the two and-s in the title, which I assumed can't both be meant to be and, or else you'd have written port numbers, uint16_t, and sizes. Plus it seems unclear what restrict a uint16_t to unsigned int actually means. 
@@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p); int hvm_buffered_io_send(ioreq_t *p); static inline void register_portio_handler( -struct domain *d, unsigned long addr, -unsigned long size, portio_action_t action) +struct domain *d, uint16_t port, unsigned int size, +portio_action_t action) { -register_io_handler(d, addr, size, action, HVM_PORTIO); +register_io_handler(d, port, size, action, HVM_PORTIO); } static inline void relocate_portio_handler( -struct domain *d, unsigned long old_addr, unsigned long new_addr, -unsigned long size) +struct domain *d, uint16_t old_port, uint16_t new_port, +unsigned int size) { -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO); +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO); } ... these still use uint16_t. I'm pretty sure I gave my comment in a way indicating that this should generally change, perhaps just at the example of portio_action_t. Why? Do we not want to restrict to uint16_t at the interface level? Quite the inverse - why would we want to? This just makes the calling sequence less efficient (due to the needed operand size overrides), and hides mistakes in callers properly truncating when reading guest registers. I would have thought that the compiler would point out the overflow if a caller tried to pass 64k in the port number, but I'll get rid of the remaining uint16_ts. Paul Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 05/13] x86/altp2m: basic data structures and support routines.
On Wed, Jul 1, 2015 at 7:09 PM, Ed White edmund.h.wh...@intel.com wrote:

diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index b4f035e..301ca59 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -217,7 +217,7 @@ declare_mm_lock(nestedp2m)
 #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
 #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)

-/* P2M lock (per-p2m-table)
+/* P2M lock (per-non-alt-p2m-table)
  *
  * This protects all queries and updates to the p2m table.
  * Queries may be made under the read lock but all modifications
@@ -225,10 +225,44 @@ declare_mm_lock(nestedp2m)
  *
  * The write lock is recursive as it is common for a code path to look
  * up a gfn and later mutate it.
+ *
+ * Note that this lock shares its implementation with the altp2m
+ * lock (not the altp2m list lock), so the implementation
+ * is found there.
  */

 declare_mm_rwlock(p2m);
-#define p2m_lock(p)           mm_write_lock(p2m, &(p)->lock);
+
+/* Alternate P2M list lock (per-domain)
+ *
+ * A per-domain lock that protects the list of alternate p2m's.
+ * Any operation that walks the list needs to acquire this lock.
+ * Additionally, before destroying an alternate p2m all VCPU's
+ * in the target domain must be paused.
+ */
+
+declare_mm_lock(altp2mlist)
+#define altp2m_lock(d)   mm_lock(altp2mlist, &(d)->arch.altp2m_lock)
+#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)

This is unlocking an altp2mlist lock type; and the lock above is protecting an altp2m list, not an altp2m. Please use altp2m_list_lock, both for the lock itself and the macro that locks it.

Also, even after reading Tim's comment and going through the whole series, I still don't have a clear idea in what circumstances the various locks (p2m, altp2mlist, and altp2m) will be acquired after one another, such that this sandwiched lock structure is required.
Please include a comment brief description of what codepaths might cause different pairs of locks to be acquired, so that someone coming and looking at this code doesn't have to try to figure it out themselves. Everything else looks OK to me. Thanks. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: 09 July 2015 16:24 To: Paul Durrant Cc: xen-devel@lists.xen.org; Keir (Xen.org) Subject: Re: [PATCH v7 03/15] x86/hvm: restrict port numbers and uint16_t and sizes to unsigned int On 09.07.15 at 15:10, paul.durr...@citrix.com wrote: Building on the previous patch, this patch restricts portio port numbers to uint16_t in registration/relocate calls. In portio_action_t the port number is change to unsigned int though to avoid the compiler generating 16-bit operations unnecessarily. The patch also changes I/O sizes to unsigned int which then allows the io_handler size field to reduce to an unsigned int. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v7: - Change port type in portio_action_t to unsigned int as requested by Jan Yet title and description were left in places, and ... The title remains. The description was modified: In portio_action_t the port number is change to unsigned int though to avoid the compiler generating 16-bit operations unnecessarily. @@ -96,17 +96,17 @@ int hvm_mmio_intercept(ioreq_t *p); int hvm_buffered_io_send(ioreq_t *p); static inline void register_portio_handler( -struct domain *d, unsigned long addr, -unsigned long size, portio_action_t action) +struct domain *d, uint16_t port, unsigned int size, +portio_action_t action) { -register_io_handler(d, addr, size, action, HVM_PORTIO); +register_io_handler(d, port, size, action, HVM_PORTIO); } static inline void relocate_portio_handler( -struct domain *d, unsigned long old_addr, unsigned long new_addr, -unsigned long size) +struct domain *d, uint16_t old_port, uint16_t new_port, +unsigned int size) { -relocate_io_handler(d, old_addr, new_addr, size, HVM_PORTIO); +relocate_io_handler(d, old_port, new_port, size, HVM_PORTIO); } ... these still use uint16_t. 
I'm pretty sure I gave my comment in a way indicating that this should generally change, perhaps just at the example of portio_action_t. Why? Do we not want to restrict to uint16_t at the interface level? Paul Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] Modified RTDS scheduler to use an event-driven model instead of polling.
Can you summarize the design discussion with Dario on the first version of the patch? The above information is ok, but it does not say much about the design. As you can see in our discussion with Dario, there exist several design choices to achieve this. Explicitly state it here can help people understand what this patch is doing. Yes, that would be helpful. It's not required to summarize all the various ML thread in the actual changelog(s), but, really, a summary and maybe a few links and pointers in the cover letter, would be useful, to give interested people that may join late the chance to get some context. Note that, even for single patch and not only for series, it is ok to include a cover letter (i.e., a [PATCH 0/x]), e.g., right for this purpose. So, as I was saying, I like the architecture now. There are issues, or at least I have some questions about specific decisions or code chunks, but I think this is on the right track now. I'll strengthen the summary/cover letter on future patches. +struct timer replenishment_timer; +unsigned NUM_CPUS; First, it does not follow the convension of variable names. UPPER CASE should be the macro definition and a constant, which is not the case. You can use a more proper name, such as ncpus, here and add what it is used for. Agreed. Will change. cpumask_t tickled; /* cpus been tickled */ }; @@ -178,6 +181,8 @@ struct rt_vcpu { unsigned flags; /* mark __RTDS_scheduled, etc.. */ }; +static void runq_tickle(const struct scheduler *, struct rt_vcpu *); Not sure if declaring the function is the best approach or having another patch to move the function forward. But anyway, we don't have to worry about this yet right now, considering there are more important issues to tackle for this patch. Sure. Personally, I would like moving the function better (in a separate patch). Forward declarations make it (slightly, yes, but still...) more difficult to find (and switch, like in editors) between call sites and where the actual code is. 
I'd like to move it, but as mentioned that can be a separate patch. @@ -411,15 +422,77 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc) struct rt_vcpu * iter_svc = __q_elem(iter); if ( svc->cur_deadline <= iter_svc->cur_deadline ) break; +else +++inserted_index; } list_add_tail(&svc->q_elem, iter); +return inserted_index; } else { -list_add(&svc->q_elem, &prv->depletedq); +// If we are inserting into a previously empty depletedq, re-arm +// the replenishment timer. It will handle further scheduling until +// the depletedq is empty again (if ever) +if( list_empty(depletedq) ) +set_timer( &prv->replenishment_timer, svc->cur_deadline ); Hmm, can you handle the set_timer outside of this function? It should be possible, and would make this runq_insert() function serve only one purpose. TBH, I think I like it being here. I mean, __runq_insert is called in many places (at least right now), and I wouldn't like to have the check and the arming repeated everywhere. That being said, I'm still hoping that some of these calls can go away, and maybe there are call sites where we know that the timer won't be armed in any case... So, we'll have to see about this. For now, just take this comment as: I don't dislike the timer machinery being in here. :-) I thought it was an acceptable way of accomplishing this. We can focus on trimming more later if this doesn't seem ideal. +// If we become one of top [# CPUs] in the runq, tickle it +// TODO: make this work when multiple tickles are required Why do you need multiple tickles? Because the loop above may have done more than one replenishment, which makes sense. While we are here: I've said that I don't like the fact that we have one big spinlock for the whole scheduler. However, we do have it right now, so let's make good use of it. So (but please double check what I'm about to say, and provide your view), it should be safe to access the various svc->vc->processor fields from inside here, since we're holding the big lock.
If that's the case, then it should be fairly easy to figure out which are the pCPUs that need to reschedule (playing with the index, etc.), and then use their vc->processor to decide what pCPU to actually tickle, isn't this the case? (let me know if I've explained myself...) +// Replenishment timer, for now always on CPU 0 +init_timer(&prv->replenishment_timer, replenishment_handler, ops, 0); Oh, yes, I wanted to discuss this! It is true that in Xen, timers are bound to cpus (as, e.g., in Linux, although both the interface and the implementation are different). So, for now, you're doing the right thing in just putting
[Xen-devel] [PATCH v8 12/13] x86/hvm: always re-emulate I/O from a buffer
If memory mapped I/O is 'chunked' then the I/O must be re-emulated, otherwise only the first chunk will be processed. This patch makes sure all I/O from a buffer is re-emulated regardless of whether it is a read or a write. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Acked-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v7: - No change v6: - Added Andrew's reviewed-by v5: - Added Jan's acked-by --- xen/arch/x86/hvm/emulate.c |4 ++-- xen/include/asm-x86/hvm/vcpu.h |3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 1444b56..89b1616 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -147,7 +147,7 @@ static int hvmemul_do_io( (p.data_is_ptr != data_is_addr) ) domain_crash(curr-domain); -if ( data_is_addr || dir == IOREQ_WRITE ) +if ( data_is_addr ) return X86EMUL_UNHANDLEABLE; goto finish_access; default: @@ -187,7 +187,7 @@ static int hvmemul_do_io( rc = hvm_send_assist_req(s, p); if ( rc != X86EMUL_RETRY || curr-domain-is_shutting_down ) vio-io_req.state = STATE_IOREQ_NONE; -else if ( data_is_addr || dir == IOREQ_WRITE ) +else if ( data_is_addr ) rc = X86EMUL_OKAY; } break; diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h index 01cbfe5..13ff54f 100644 --- a/xen/include/asm-x86/hvm/vcpu.h +++ b/xen/include/asm-x86/hvm/vcpu.h @@ -81,8 +81,7 @@ struct hvm_vcpu_io { static inline bool_t hvm_vcpu_io_need_completion(const struct hvm_vcpu_io *vio) { return (vio-io_req.state == STATE_IOREQ_READY) - !vio-io_req.data_is_ptr - (vio-io_req.dir == IOREQ_READ); + !vio-io_req.data_is_ptr; } #define VMCX_EADDR(~0ULL) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 11/13] x86/hvm: use ioreq_t to track in-flight state
Use an ioreq_t rather than open coded state, size, dir and data fields in struct hvm_vcpu_io. This also allows PIO completion to be handled similarly to MMIO completion by re-issuing the handle_pio() call. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Cc: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - No change v7: - Modified struct field placement as knock-on from previous patch v6: - Added Andrew's reviewed-by v5: - Added missing hunk with call to handle_pio() --- xen/arch/x86/hvm/emulate.c | 35 +-- xen/arch/x86/hvm/hvm.c | 13 + xen/arch/x86/hvm/svm/nestedsvm.c |2 +- xen/arch/x86/hvm/vmx/realmode.c |4 ++-- xen/include/asm-x86/hvm/vcpu.h | 12 5 files changed, 33 insertions(+), 33 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 010e2fd..1444b56 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -91,6 +91,7 @@ static int hvmemul_do_io( .df = df, .data = data, .data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */ +.state = STATE_IOREQ_READY, }; void *p_data = (void *)data; int rc; @@ -128,12 +129,24 @@ static int hvmemul_do_io( } } -switch ( vio-io_state ) +switch ( vio-io_req.state ) { case STATE_IOREQ_NONE: break; case STATE_IORESP_READY: -vio-io_state = STATE_IOREQ_NONE; +vio-io_req.state = STATE_IOREQ_NONE; +p = vio-io_req; + +/* Verify the emulation request has been correctly re-issued */ +if ( (p.type != is_mmio ? 
IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) || + (p.addr != addr) || + (p.size != size) || + (p.count != reps) || + (p.dir != dir) || + (p.df != df) || + (p.data_is_ptr != data_is_addr) ) +domain_crash(curr-domain); + if ( data_is_addr || dir == IOREQ_WRITE ) return X86EMUL_UNHANDLEABLE; goto finish_access; @@ -141,11 +154,6 @@ static int hvmemul_do_io( return X86EMUL_UNHANDLEABLE; } -vio-io_state = STATE_IOREQ_READY; -vio-io_size = size; -vio-io_dir = dir; -vio-io_data_is_addr = data_is_addr; - if ( dir == IOREQ_WRITE ) { if ( !data_is_addr ) @@ -154,13 +162,14 @@ static int hvmemul_do_io( hvmtrace_io_assist(p); } +vio-io_req = p; + rc = hvm_io_intercept(p); switch ( rc ) { case X86EMUL_OKAY: -vio-io_data = p.data; -vio-io_state = STATE_IOREQ_NONE; +vio-io_req.state = STATE_IOREQ_NONE; break; case X86EMUL_UNHANDLEABLE: { @@ -171,15 +180,13 @@ static int hvmemul_do_io( if ( !s ) { rc = hvm_process_io_intercept(null_handler, p); -if ( rc == X86EMUL_OKAY ) -vio-io_data = p.data; -vio-io_state = STATE_IOREQ_NONE; +vio-io_req.state = STATE_IOREQ_NONE; } else { rc = hvm_send_assist_req(s, p); if ( rc != X86EMUL_RETRY || curr-domain-is_shutting_down ) -vio-io_state = STATE_IOREQ_NONE; +vio-io_req.state = STATE_IOREQ_NONE; else if ( data_is_addr || dir == IOREQ_WRITE ) rc = X86EMUL_OKAY; } @@ -198,7 +205,7 @@ static int hvmemul_do_io( hvmtrace_io_assist(p); if ( !data_is_addr ) -memcpy(p_data, vio-io_data, size); +memcpy(p_data, p.data, size); } if ( is_mmio !data_is_addr ) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3be17f9..31ae4d4 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p) if ( hvm_vcpu_io_need_completion(vio) ) { -vio-io_state = STATE_IORESP_READY; -vio-io_data = p-data; +vio-io_req.state = STATE_IORESP_READY; +vio-io_req.data = p-data; } else -vio-io_state = STATE_IOREQ_NONE; +vio-io_req.state = STATE_IOREQ_NONE; msix_write_completion(curr); 
vcpu_end_shutdown_deferral(curr); @@ -501,11 +501,8 @@ void hvm_do_resume(struct vcpu *v) handle_mmio(); break; case HVMIO_pio_completion: -if ( vio-io_size == 4 ) /* Needs zero extension. */ -guest_cpu_user_regs()-rax = (uint32_t)vio-io_data; -else -memcpy(guest_cpu_user_regs()-rax, vio-io_data, vio-io_size); -vio-io_state = STATE_IOREQ_NONE; +(void)handle_pio(vio-io_req.addr, vio-io_req.size, + vio-io_req.dir); break; case HVMIO_realmode_completion: { diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c index 2243873..2653bc1 100644 ---
[Xen-devel] [PATCH v8 10/13] x86/hvm: remove hvm_io_state enumeration
Emulation request status is already covered by STATE_IOREQ_XXX values so just use those. The mapping is: HVMIO_none- STATE_IOREQ_NONE HVMIO_awaiting_completion - STATE_IOREQ_READY HVMIO_completed - STATE_IORESP_READY Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Keir Fraser k...@xen.org Acked-by: Jan Beulich jbeul...@suse.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v8: - No change v7: - Modified struct field placement as knock-on from previous patch v6: - Added Andrew's reviewed-by v5: - Added Jan's acked-by --- xen/arch/x86/hvm/emulate.c | 14 +++--- xen/arch/x86/hvm/hvm.c |6 +++--- xen/arch/x86/hvm/svm/nestedsvm.c |2 +- xen/arch/x86/hvm/vmx/realmode.c |4 ++-- xen/include/asm-x86/hvm/vcpu.h | 10 ++ 5 files changed, 15 insertions(+), 21 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 933a605..010e2fd 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -130,10 +130,10 @@ static int hvmemul_do_io( switch ( vio-io_state ) { -case HVMIO_none: +case STATE_IOREQ_NONE: break; -case HVMIO_completed: -vio-io_state = HVMIO_none; +case STATE_IORESP_READY: +vio-io_state = STATE_IOREQ_NONE; if ( data_is_addr || dir == IOREQ_WRITE ) return X86EMUL_UNHANDLEABLE; goto finish_access; @@ -141,7 +141,7 @@ static int hvmemul_do_io( return X86EMUL_UNHANDLEABLE; } -vio-io_state = HVMIO_awaiting_completion; +vio-io_state = STATE_IOREQ_READY; vio-io_size = size; vio-io_dir = dir; vio-io_data_is_addr = data_is_addr; @@ -160,7 +160,7 @@ static int hvmemul_do_io( { case X86EMUL_OKAY: vio-io_data = p.data; -vio-io_state = HVMIO_none; +vio-io_state = STATE_IOREQ_NONE; break; case X86EMUL_UNHANDLEABLE: { @@ -173,13 +173,13 @@ static int hvmemul_do_io( rc = hvm_process_io_intercept(null_handler, p); if ( rc == X86EMUL_OKAY ) vio-io_data = p.data; -vio-io_state = HVMIO_none; +vio-io_state = STATE_IOREQ_NONE; } else { rc = hvm_send_assist_req(s, p); if ( rc != X86EMUL_RETRY || curr-domain-is_shutting_down ) 
-vio-io_state = HVMIO_none; +vio-io_state = STATE_IOREQ_NONE; else if ( data_is_addr || dir == IOREQ_WRITE ) rc = X86EMUL_OKAY; } diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 8e487d4..3be17f9 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -421,11 +421,11 @@ static void hvm_io_assist(ioreq_t *p) if ( hvm_vcpu_io_need_completion(vio) ) { -vio-io_state = HVMIO_completed; +vio-io_state = STATE_IORESP_READY; vio-io_data = p-data; } else -vio-io_state = HVMIO_none; +vio-io_state = STATE_IOREQ_NONE; msix_write_completion(curr); vcpu_end_shutdown_deferral(curr); @@ -505,7 +505,7 @@ void hvm_do_resume(struct vcpu *v) guest_cpu_user_regs()-rax = (uint32_t)vio-io_data; else memcpy(guest_cpu_user_regs()-rax, vio-io_data, vio-io_size); -vio-io_state = HVMIO_none; +vio-io_state = STATE_IOREQ_NONE; break; case HVMIO_realmode_completion: { diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c index ad8afb4..2243873 100644 --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -1221,7 +1221,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v) * Delay the injection because this would result in delivering * an interrupt *within* the execution of an instruction. 
*/ -if ( v-arch.hvm_vcpu.hvm_io.io_state != HVMIO_none ) +if ( v-arch.hvm_vcpu.hvm_io.io_state != STATE_IOREQ_NONE ) return hvm_intblk_shadow; if ( !nv-nv_vmexit_pending n2vmcb-exitintinfo.bytes != 0 ) { diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c index deb53ae..25533dc 100644 --- a/xen/arch/x86/hvm/vmx/realmode.c +++ b/xen/arch/x86/hvm/vmx/realmode.c @@ -205,7 +205,7 @@ void vmx_realmode(struct cpu_user_regs *regs) vmx_realmode_emulate_one(hvmemul_ctxt); -if ( vio-io_state != HVMIO_none || vio-mmio_retry ) +if ( vio-io_state != STATE_IOREQ_NONE || vio-mmio_retry ) break; /* Stop emulating unless our segment state is not safe */ @@ -219,7 +219,7 @@ void vmx_realmode(struct cpu_user_regs *regs) } /* Need to emulate next time if we've started an IO operation */ -if ( vio-io_state != HVMIO_none ) +if ( vio-io_state != STATE_IOREQ_NONE ) curr-arch.hvm_vmx.vmx_emulate = 1; if ( !curr-arch.hvm_vmx.vmx_emulate
Re: [Xen-devel] [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy
Tiejun Chen writes ([v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy): This patch introduces user configurable parameters to specify RDM resource and according policies, ... int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci) { +/* We'd like to force reserve rdm specific to a device by default.*/ +if ( pci-rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID) ^ I have just spotted that spurious whitespace. However I won't block this for that. Acked-by: Ian Jackson ian.jack...@eu.citrix.com (actually). I would appreciate it if you could ensure that this is fixed in any repost. You may retain my ack if you do that. Committers should feel free to fix it on commit. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest
Tiejun Chen writes ([v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest): Here we'll construct a basic guest e820 table via XENMEM_set_memory_map. This table includes lowmem, highmem and RDMs if they exist, and hvmloader will need this info later. Note this guest e820 table will be the same as before if the platform has no RDM or if we disable RDM (the default). ... tools/libxl/libxl_dom.c | 5 +++ tools/libxl/libxl_internal.h | 24 + tools/libxl/libxl_x86.c | 83 ... diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 62ef120..41da479 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, goto out; } +if (libxl__domain_construct_e820(gc, d_config, domid, &args)) { +LOG(ERROR, "setting domain memory map failed"); +goto out; +} This is platform-independent code, isn't it? In which case this will break the build on ARM, I think. Would an ARM maintainer please confirm. Aside from that I have no issues with this patch. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 4/4] iommu: add rmrr Xen command line option for extra rmrrs
On Wed, Jul 08, 2015 at 09:50:26PM -0400, elena.ufimts...@oracle.com wrote: From: Elena Ufimtseva elena.ufimts...@oracle.com On some platforms RMRR regions may not be specified in ACPI and thus will not be mapped 1:1 in dom0. This causes IO Page Faults and prevents dom0 from booting in PVH mode. The new Xen command line option rmrr allows specifying such devices and memory regions. These regions are added to the list of RMRRs defined in ACPI if the device is present in the system. As a result, the additional RMRRs will be mapped 1:1 in dom0 with correct permissions. The problems mentioned above were discovered during PVH work with a ThinkCentre M and a Dell 5600T. No official documentation has been found so far as to which devices cause this, or why. Experiments show that ThinkCentre M USB devices with the debug port enabled generate DMA read transactions to regions of memory marked reserved in the host e820 map. For the Dell 5600T the device and faulting addresses have not been found yet. For the detailed history of the discussion please check the following threads: http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html Format for the rmrr Xen command line option: rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] If grub2 is used and multiple ranges are specified, ';' should be quoted/escaped; refer to the grub2 manual for more information. Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Thanks! Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- docs/misc/xen-command-line.markdown | 13 +++ xen/drivers/passthrough/vtd/dmar.c | 209 +++- 2 files changed, 221 insertions(+), 1 deletion(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index aa684c0..f307f3d 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1197,6 +1197,19 @@ Specify the host reboot method.
'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first). +### rmrr + '= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] + +Define RMRR units that are missing from ACPI table along with device they +belong to and use them for 1:1 mapping. End addresses can be omitted and one +page will be mapped. The ranges are inclusive when start and end are specified. +If segment of the first device is not specified, segment zero will be used. +If other segments are not specified, first device segment will be used. +If a segment is specified for other than the first device and it does not match +the one specified for the first one, an error will be reported. +Note: grub2 requires to escape or use quotations if special characters are used, +namely ';', refer to the grub2 documentation if multiple ranges are specified. + ### ro-hpet `= boolean` diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index a8e1e5d..f62fb02 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -869,6 +869,145 @@ out: return ret; } +#define MAX_EXTRA_RMRR_PAGES 16 +#define MAX_EXTRA_RMRR 10 + +/* RMRR units derived from command line rmrr option. */ +#define MAX_EXTRA_RMRR_DEV 20 +struct extra_rmrr_unit { +struct list_head list; +unsigned long base_pfn, end_pfn; +unsigned int dev_count; +u32sbdf[MAX_EXTRA_RMRR_DEV]; +}; +static __initdata unsigned int nr_rmrr; +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR]; + +/* Macro for RMRR inclusive range formatting. 
*/ +#define PRI_RMRR(s,e) [%lx-%lx] + +static void __init add_extra_rmrr(void) +{ +struct acpi_rmrr_unit *acpi_rmrr; +struct acpi_rmrr_unit *rmrru; +unsigned int dev, seg, i, j; +unsigned long pfn; +bool_t overlap; + +for ( i = 0; i nr_rmrr; i++ ) +{ +if ( extra_rmrr_units[i].base_pfn extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + Invalid RMRR Range PRI_RMRR(s,e)\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn = + MAX_EXTRA_RMRR_PAGES ) +{ +printk(XENLOG_ERR VTDPREFIX + RMRR range PRI_RMRR(s,e) exceeds __stringify(MAX_EXTRA_RMRR_PAGES) pages\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +for ( j = 0; j nr_rmrr; j++ ) +{ +if ( i != j + extra_rmrr_units[i].base_pfn = extra_rmrr_units[j].end_pfn + extra_rmrr_units[j].base_pfn = extra_rmrr_units[i].end_pfn ) +{ +
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
- wei.l...@citrix.com wrote: On Thu, Jul 09, 2015 at 05:00:45PM +0100, Jan Beulich wrote: On 09.07.15 at 17:53, elena.ufimts...@oracle.com wrote: - jbeul...@suse.com wrote: On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote: You are right, it needs to be rebased. I can post later a version rebased on the memory leak fix, if you think that's the way to go. I didn't look at v9 yet, and can't predict when I will be able to. Would you like me to post v10 with the memory leak patch included in the patchset before you start looking at v9? If there is a dependency on the changes in the leak fix v6, then this would be a good idea. If not, you can keep things as they are now. I view the entire set more as a bug fix than a feature anyway, and hence see no reason not to get this in after the freeze. But I'm adding Wei just in case... Thanks Jan. The dependency exists on the memory leak patch, so I will add it to this series and squash the first patch from v9. I just looked at v9. The first three patches are quite mechanical. The fourth patch is relatively bigger but it's also quite straightforward (mostly parsing input). All in all, this series is self-contained. I don't think OSSTest is able to test that, so it would not cause a visible regression on our side. I also agree it's a bug fix. Preferably this series should be applied before the first RC. Wei. Thank you Wei. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
Tiejun Chen writes ([v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): While building a VM, the HVM domain builder provides struct hvm_info_table{} to help hvmloader. Currently it includes two fields used by hvmloader to construct the guest e820 table, low_mem_pgend and high_mem_pgend. So we should check them to fix any conflict with RDM. ... CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevint.t...@intel.com I have found a few things in this patch which I would like to see improved. See below. Given how late I am with this review, I do not feel that I should be nacking it at this time. You have a tools ack from Wei, so my comments are not a blocker for this series. But if you need to respin, please take these comments into account, and consider which are feasible to fix in the time available. If you are respinning this series targeting Xen 4.7 or later, please address all of the points I make below. Thanks. +int libxl__domain_device_construct_rdm(libxl__gc *gc, + libxl_domain_config *d_config, + uint64_t rdm_mem_boundary, + struct xc_hvm_build_args *args) ... +uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull << 32); ... +if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE && !d_config->num_pcidevs) There are quite a few of these long lines, which should be wrapped. See tools/libxl/CODING_STYLE. +d_config->num_rdms = nr_entries; +d_config->rdms = libxl__realloc(NOGC, d_config->rdms, +d_config->num_rdms * sizeof(libxl_device_rdm)); This code is remarkably similar to a function later on which adds an rdm. Please can you factor it out. +} else +d_config->num_rdms = 0; Please can you put { } around the else block too. I don't think this mixed style is good.
+for (j = 0; j < d_config->num_rdms; j++) { +if (d_config->rdms[j].start == + (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT) This construct (uint64_t)some_pfn << XC_PAGE_SHIFT appears an awful lot. I would prefer it if it were done in an inline function (or maybe a macro). +libxl_domain_build_info *const info = &d_config->b_info; +/* + * Currently we fix this as 2G to guarantte how to handle ^ Should read guarantee. +ret = libxl__domain_device_construct_rdm(gc, d_config, + rdm_mem_boundary, + &args); +if (ret) { +LOG(ERROR, "checking reserved device memory failed"); +goto out; +} `rc' should be used here rather than `ret'. (It is unfortunate that this function has poor style already, but it would be best not to make it worse.) Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 07/27] tools/libxl: Extra management APIs for the save helper
With migration v2, there are several moving parts needing to be juggled at once. This requires the error handling logic to be able to query the state of each moving part, possibly before they have been started, and be able to cancel them. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- Since v1: * Add an _init() function which allows _inuse() to be safe to call even before the save helper has started. --- tools/libxl/libxl_internal.h |9 + tools/libxl/libxl_save_callout.c | 17 ++--- 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 588cfb8..4bd6ea1 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3272,6 +3272,15 @@ _hidden void libxl__xc_domain_restore(libxl__egc *egc, _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void, int rc, int retval, int errnoval); +_hidden void libxl__save_helper_init(libxl__save_helper_state *shs); +_hidden void libxl__save_helper_abort(libxl__egc *egc, + libxl__save_helper_state *shs); + +static inline bool libxl__save_helper_inuse(const libxl__save_helper_state *shs) +{ +return libxl__ev_child_inuse(shs-child); +} + /* Each time the dm needs to be saved, we must call suspend and then save */ _hidden int libxl__domain_suspend_device_model(libxl__gc *gc, libxl__domain_suspend_state *dss); diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index 1136b79..cd18cd2 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -146,6 +146,13 @@ void libxl__xc_domain_saverestore_async_callback_done(libxl__egc *egc, shs-egc = 0; } +void libxl__save_helper_init(libxl__save_helper_state *shs) +{ +libxl__ao_abortable_init(shs-abrt); +libxl__ev_fd_init(shs-readable); +libxl__ev_child_init(shs-child); +} + /*- helper execution -*/ static void 
run_helper(libxl__egc *egc, libxl__save_helper_state *shs, @@ -167,9 +174,7 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs, shs-rc = 0; shs-completed = 0; shs-pipes[0] = shs-pipes[1] = 0; -libxl__ao_abortable_init(shs-abrt); -libxl__ev_fd_init(shs-readable); -libxl__ev_child_init(shs-child); +libxl__save_helper_init(shs); shs-abrt.ao = shs-ao; shs-abrt.callback = helper_stop; @@ -279,6 +284,12 @@ static void helper_stop(libxl__egc *egc, libxl__ao_abortable *abrt, int rc) libxl__kill(gc, shs-child.pid, SIGTERM, save/restore helper); } +void libxl__save_helper_abort(libxl__egc *egc, + libxl__save_helper_state *shs) +{ +helper_stop(egc, shs-abrt, ERROR_FAIL); +} + static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev, int fd, short events, short revents) { -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 04/27] tools/libxl: Introduce libxl__kill()
as a wrapper to kill(2), and use it in preference to sendsig() in libxl_save_callout.c. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- Logically new in v2 - split out from a v1 change which was itself a cherrypick-and-modify from the AO Abort series --- tools/libxl/libxl_aoutils.c | 15 +++ tools/libxl/libxl_internal.h |2 ++ tools/libxl/libxl_save_callout.c | 10 ++ 3 files changed, 19 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c index 0931eee..274ef39 100644 --- a/tools/libxl/libxl_aoutils.c +++ b/tools/libxl/libxl_aoutils.c @@ -621,3 +621,18 @@ bool libxl__async_exec_inuse(const libxl__async_exec_state *aes) assert(time_inuse == child_inuse); return child_inuse; } + +void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what) +{ +int r = kill(pid, sig); +if (r) LOGE(WARN, "failed to kill() %s [%lu] (signal %d)", +what, (unsigned long)pid, sig); +} + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 19fc425..9147de1 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2244,6 +2244,8 @@ struct libxl__async_exec_state { int libxl__async_exec_start(libxl__async_exec_state *aes); bool libxl__async_exec_inuse(const libxl__async_exec_state *aes); +void libxl__kill(libxl__gc *gc, pid_t pid, int sig, const char *what); + /*- device addition/removal -*/ typedef struct libxl__ao_device libxl__ao_device; diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index 087c2d5..b82a5c1 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -244,12 +244,6 @@ static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs, libxl__carefd_close(childs_pipes[1]); helper_failed(egc,
shs, rc);; } -static void sendsig(libxl__gc *gc, libxl__save_helper_state *shs, int sig) -{ -int r = kill(shs-child.pid, sig); -if (r) LOGE(WARN, failed to kill save/restore helper [%lu] (signal %d), -(unsigned long)shs-child.pid, sig); -} static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs, int rc) @@ -266,7 +260,7 @@ static void helper_failed(libxl__egc *egc, libxl__save_helper_state *shs, return; } -sendsig(gc, shs, SIGKILL); +libxl__kill(gc, shs-child.pid, SIGKILL, save/restore helper); } static void helper_stop(libxl__egc *egc, libxl__ao_abortable *abrt, int rc) @@ -282,7 +276,7 @@ static void helper_stop(libxl__egc *egc, libxl__ao_abortable *abrt, int rc) if (!shs-rc) shs-rc = rc; -sendsig(gc, shs, SIGTERM); +libxl__kill(gc, shs-child.pid, SIGTERM, save/restore helper); } static void helper_stdout_readable(libxl__egc *egc, libxl__ev_fd *ev, -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 05/27] tools/libxl: Stash all restore parameters in domain_create_state
Shortly more parameters will appear, and this saves unboxing each one. libxl_domain_restore_params is mandatory for restore streams, and ignored for plain creation. The old 'checkpointed_stream' was incorrectly identified as a private parameter when it was in fact public. No functional change. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Yang Hongyang yan...@cn.fujitsu.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- Since v1: * Gate validity on restore_fd being valid. --- tools/libxl/libxl_create.c | 13 +++-- tools/libxl/libxl_internal.h |2 +- tools/libxl/libxl_save_callout.c |2 +- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index b785ddd..61515da 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1512,8 +1512,8 @@ static void domain_create_cb(libxl__egc *egc, int rc, uint32_t domid); static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config, -uint32_t *domid, -int restore_fd, int checkpointed_stream, +uint32_t *domid, int restore_fd, +const libxl_domain_restore_params *params, const libxl_asyncop_how *ao_how, const libxl_asyncprogress_how *aop_console_how) { @@ -1526,8 +1526,9 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config, libxl_domain_config_init(&cdcs->dcs.guest_config_saved); libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config); cdcs->dcs.restore_fd = restore_fd; +if (restore_fd > -1) +cdcs->dcs.restore_params = *params; cdcs->dcs.callback = domain_create_cb; -cdcs->dcs.checkpointed_stream = checkpointed_stream; libxl__ao_progress_gethow(&cdcs->dcs.aop_console_how, aop_console_how); cdcs->domid_out = domid; @@ -1553,7 +1554,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config, const libxl_asyncop_how *ao_how, const libxl_asyncprogress_how *aop_console_how) { -return
do_domain_create(ctx, d_config, domid, -1, 0, +return do_domain_create(ctx, d_config, domid, -1, NULL, ao_how, aop_console_how); } @@ -1563,8 +1564,8 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config, const libxl_asyncop_how *ao_how, const libxl_asyncprogress_how *aop_console_how) { -return do_domain_create(ctx, d_config, domid, restore_fd, -params-checkpointed_stream, ao_how, aop_console_how); +return do_domain_create(ctx, d_config, domid, restore_fd, params, +ao_how, aop_console_how); } /* diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 9147de1..5ab945a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3217,11 +3217,11 @@ struct libxl__domain_create_state { libxl_domain_config *guest_config; libxl_domain_config guest_config_saved; /* vanilla config */ int restore_fd; +libxl_domain_restore_params restore_params; libxl__domain_create_cb *callback; libxl_asyncprogress_how aop_console_how; /* private to domain_create */ int guest_domid; -int checkpointed_stream; libxl__domain_build_state build_state; libxl__bootloader_state bl; libxl__stub_dm_spawn_state dmss; diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index b82a5c1..80aae1b 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -60,7 +60,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs, state-store_domid, state-console_port, state-console_domid, hvm, pae, superpages, -cbflags, dcs-checkpointed_stream, +cbflags, dcs-restore_params.checkpointed_stream, }; dcs-shs.ao = ao; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 03/27] tools/libxl: Introduce ROUNDUP()
This is the same as is used by libxc.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxl/libxl_internal.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5235d25..19fc425 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -110,6 +110,9 @@
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))

+#define ROUNDUP(_val, _order) \
+    (((unsigned long)(_val)+(1UL<<(_order))-1) & ~((1UL<<(_order))-1))
+
 #define min(X, Y) ({ \
     const typeof (X) _x = (X); \
     const typeof (Y) _y = (Y); \
--
1.7.10.4
[Xen-devel] [PATCH v2 09/27] docs: Libxl migration v2 stream specification
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 docs/specs/libxl-migration-stream.pandoc | 205 ++
 1 file changed, 205 insertions(+)
 create mode 100644 docs/specs/libxl-migration-stream.pandoc

diff --git a/docs/specs/libxl-migration-stream.pandoc b/docs/specs/libxl-migration-stream.pandoc
new file mode 100644
index 000..7235317
--- /dev/null
+++ b/docs/specs/libxl-migration-stream.pandoc
@@ -0,0 +1,205 @@
+% LibXenLight Domain Image Format
+% Andrew Cooper andrew.coop...@citrix.com
+% Draft B
+
+Introduction
+
+
+For the purposes of this document, `xl` is used as a representation of any
+implementer of the `libxl` API. `xl` should be considered completely
+interchangeable with alternates, such as `libvirt` or `xenopsd-xl`.
+
+Purpose
+---
+
+The _domain image format_ is the context of a running domain used for
+snapshots of a domain or for transferring domains between hosts during
+migration.
+
+There are a number of problems with the domain image format used in Xen 4.5
+and earlier (the _legacy format_)
+
+* There is no `libxl` context information. `xl` is required to send certain
+  pieces of `libxl` context itself.
+
+* The contents of the stream are passed directly through `libxl` to `libxc`.
+  The legacy `libxc` format contained some information which belonged at the
+  `libxl` level, resulting in awkward layer violation to return the
+  information back to `libxl`.
+
+* The legacy `libxc` format was inextensible, causing inextensibility in the
+  legacy `libxl` handling.
+
+This design addresses the above points, allowing for a completely
+self-contained, extensible stream with each layer responsible for its own
+appropriate information.
+
+
+Not Yet Included
+
+
+The following features are not yet fully specified and will be
+included in a future draft.
+
+* Remus
+
+* ARM
+
+
+Overview
+
+
+The image format consists of a _Header_, followed by 1 or more _Records_.
+Each record consists of a type and length field, followed by any type-specific
+data.
+
+\clearpage
+
+Header
+==
+
+The header identifies the stream as a `libxl` stream, including the version of
+this specification that it complies with.
+
+All fields in this header shall be in _big-endian_ byte order, regardless of
+the setting of the endianness bit.
+
+ 0 1 2 3 4 5 6 7 octet
++-+
+| ident |
++---+-+
+| version | options |
++---+-+
+
+
+Field     Description
+--------  -----------
+ident     0x4c6962786c466d74 (LibxlFmt in ASCII).
+
+version   0x0002. The version of this specification.
+
+options   bit 0: Endianness. 0 = little-endian, 1 = big-endian.
+
+          bit 1: Legacy Format. If set, this stream was created by
+                 the legacy conversion tool.
+
+          bits 2-31: Reserved.
+
+
+The endianness shall be 0 (little-endian) for images generated on an
+i386, x86_64, or arm host.
+
+\clearpage
+
+
+Records
+===
+
+A record has a record header, type specific data and a trailing footer. If
+`length` is not a multiple of 8, the body is padded with zeroes to align the
+end of the record on an 8 octet boundary.
+
+ 0 1 2 3 4 5 6 7 octet
++---+-+
+| type | body_length |
++---+---+-+
+| body... |
...
+| | padding (0 to 7 octets) |
++---+-+
+
+
+Field        Description
+-----------  -----------
+type         0x: END
+
+             0x0001: LIBXC_CONTEXT
+
+             0x0002: XENSTORE_DATA
+
+             0x0003: EMULATOR_CONTEXT
+
+             0x0004 - 0x7FFF: Reserved for future _mandatory_
+             records.
+
+             0x8000 - 0x: Reserved for future _optional_
+             records.
+
+body_length  Length in octets of the record body.
+
+body         Content of the record.
+
+padding      0 to 7 octets of zeros to pad the whole record to a multiple
+             of 8 octets.
+
+
+\clearpage
+
+END
+
+
+An end record marks the end of
Re: [Xen-devel] [RFC Patch V1 06/12] PCI: Use for_pci_msi_entry() to access MSI device list
On Thu, Jul 09, 2015 at 04:00:41PM +0800, Jiang Liu wrote: Use accessor for_pci_msi_entry() to access MSI device list, so we could easily move msi_list from struct pci_dev into struct device later. Signed-off-by: Jiang Liu jiang@linux.intel.com --- drivers/pci/msi.c | 39 --- drivers/pci/xen-pcifront.c |2 +- Acked-by on the Xen bits. 2 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 7b4c20c9f9ca..d09afa78d7a1 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -131,7 +131,7 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) if (type == PCI_CAP_ID_MSI nvec 1) return 1; - list_for_each_entry(entry, dev-msi_list, list) { + for_each_pci_msi_entry(entry, dev) { ret = arch_setup_msi_irq(dev, entry); if (ret 0) return ret; @@ -151,7 +151,7 @@ void default_teardown_msi_irqs(struct pci_dev *dev) int i; struct msi_desc *entry; - list_for_each_entry(entry, dev-msi_list, list) + for_each_pci_msi_entry(entry, dev) if (entry-irq) for (i = 0; i entry-nvec_used; i++) arch_teardown_msi_irq(entry-irq + i); @@ -168,7 +168,7 @@ static void default_restore_msi_irq(struct pci_dev *dev, int irq) entry = NULL; if (dev-msix_enabled) { - list_for_each_entry(entry, dev-msi_list, list) { + for_each_pci_msi_entry(entry, dev) { if (irq == entry-irq) break; } @@ -282,7 +282,7 @@ void default_restore_msi_irqs(struct pci_dev *dev) { struct msi_desc *entry; - list_for_each_entry(entry, dev-msi_list, list) + for_each_pci_msi_entry(entry, dev) default_restore_msi_irq(dev, entry-irq); } @@ -363,21 +363,22 @@ EXPORT_SYMBOL_GPL(pci_write_msi_msg); static void free_msi_irqs(struct pci_dev *dev) { + struct list_head *msi_list = dev_to_msi_list(dev-dev); struct msi_desc *entry, *tmp; struct attribute **msi_attrs; struct device_attribute *dev_attr; int i, count = 0; - list_for_each_entry(entry, dev-msi_list, list) + for_each_pci_msi_entry(entry, dev) if (entry-irq) for (i = 0; i entry-nvec_used; i++) 
BUG_ON(irq_has_action(entry-irq + i)); pci_msi_teardown_msi_irqs(dev); - list_for_each_entry_safe(entry, tmp, dev-msi_list, list) { + list_for_each_entry_safe(entry, tmp, msi_list, list) { if (entry-msi_attrib.is_msix) { - if (list_is_last(entry-list, dev-msi_list)) + if (list_is_last(entry-list, msi_list)) iounmap(entry-mask_base); } @@ -448,7 +449,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev) if (!dev-msix_enabled) return; - BUG_ON(list_empty(dev-msi_list)); + BUG_ON(list_empty(dev_to_msi_list(dev-dev))); /* route the table */ pci_intx_for_msi(dev, 0); @@ -456,7 +457,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev) PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); arch_restore_msi_irqs(dev); - list_for_each_entry(entry, dev-msi_list, list) + for_each_pci_msi_entry(entry, dev) msix_mask_irq(entry, entry-masked); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); @@ -501,7 +502,7 @@ static int populate_msi_sysfs(struct pci_dev *pdev) int count = 0; /* Determine how many msi entries we have */ - list_for_each_entry(entry, pdev-msi_list, list) + for_each_pci_msi_entry(entry, pdev) ++num_msi; if (!num_msi) return 0; @@ -510,7 +511,7 @@ static int populate_msi_sysfs(struct pci_dev *pdev) msi_attrs = kzalloc(sizeof(void *) * (num_msi + 1), GFP_KERNEL); if (!msi_attrs) return -ENOMEM; - list_for_each_entry(entry, pdev-msi_list, list) { + for_each_pci_msi_entry(entry, pdev) { msi_dev_attr = kzalloc(sizeof(*msi_dev_attr), GFP_KERNEL); if (!msi_dev_attr) goto error_attrs; @@ -599,7 +600,7 @@ static int msi_verify_entries(struct pci_dev *dev) { struct msi_desc *entry; - list_for_each_entry(entry, dev-msi_list, list) { + for_each_pci_msi_entry(entry, dev) { if (!dev-no_64bit_msi || !entry-msg.address_hi) continue; dev_err(dev-dev, Device has broken 64-bit MSI but arch @@ -636,7 +637,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec) mask = msi_mask(entry-msi_attrib.multi_cap);
[Xen-devel] [PATCH 7/9] libxl: event tests: Provide miniature poll machinery
This allows a test case to have a poll-based event loop with exact control of the sequence of operations. Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com --- tools/libxl/test_common.c | 44 +++- tools/libxl/test_common.h | 15 +++ 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/tools/libxl/test_common.c b/tools/libxl/test_common.c index 83b94eb..c6bbbab 100644 --- a/tools/libxl/test_common.c +++ b/tools/libxl/test_common.c @@ -12,4 +12,46 @@ void test_common_setup(int level) int rc = libxl_ctx_alloc(ctx, LIBXL_VERSION, 0, logger); assert(!rc); -} +} + +struct timeval now; + +void test_common_get_now(void) +{ +int r = gettimeofday(now, 0); assert(!r); +} + +int poll_nfds, poll_nfds_allocd; +struct pollfd *poll_fds; +int poll_timeout; + +void test_common_beforepoll(void) +{ +for (;;) { +test_common_get_now(); + +poll_timeout = -1; +poll_nfds = poll_nfds_allocd; +int rc = libxl_osevent_beforepoll(ctx, poll_nfds, poll_fds, + poll_timeout, now); +if (!rc) return; +assert(rc == ERROR_BUFFERFULL); + +assert(poll_nfds poll_nfds_allocd); +poll_fds = realloc(poll_fds, poll_nfds * sizeof(poll_fds[0])); +assert(poll_fds); +poll_nfds_allocd = poll_nfds; +} +} + +void test_common_dopoll(void) { +errno = 0; +int r = poll(poll_fds, poll_nfds, poll_timeout); +fprintf(stderr, poll: r=%d errno=%s\n, r, strerror(errno)); +} + +void test_common_afterpoll(void) +{ +test_common_get_now(); +libxl_osevent_afterpoll(ctx, poll_nfds, poll_fds, now); +} diff --git a/tools/libxl/test_common.h b/tools/libxl/test_common.h index 8b2471e..10c7166 100644 --- a/tools/libxl/test_common.h +++ b/tools/libxl/test_common.h @@ -6,9 +6,24 @@ #include assert.h #include stdlib.h #include unistd.h +#include fcntl.h +#include sys/stat.h +#include sys/types.h void test_common_setup(int level); extern libxl_ctx *ctx; +void test_common_get_now(void); + +extern struct timeval now; + +void test_common_beforepoll(void); +void test_common_dopoll(void); +void test_common_afterpoll(void); + +extern int 
poll_nfds, poll_nfds_allocd;
+extern struct pollfd *poll_fds;
+extern int poll_timeout;
+
 #endif /*TEST_COMMON_H*/
--
1.7.10.4
[Xen-devel] [PATCH 1/9] libxl: poll: Make libxl__poller_get have only one success return path
In preparation for doing some more work on successful exit. No functional change.

Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
CC: Jim Fehlig jfeh...@suse.com
---
 tools/libxl/libxl_event.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 9072df4..b332dd7 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -1627,15 +1627,14 @@ libxl__poller *libxl__poller_get(libxl__gc *gc)
     libxl__poller *p = LIBXL_LIST_FIRST(&CTX->pollers_idle);
     if (p) {
         LIBXL_LIST_REMOVE(p, entry);
-        return p;
-    }
-
-    p = libxl__zalloc(NOGC, sizeof(*p));
+    } else {
+        p = libxl__zalloc(NOGC, sizeof(*p));

-    rc = libxl__poller_init(gc, p);
-    if (rc) {
-        free(p);
-        return NULL;
+        rc = libxl__poller_init(gc, p);
+        if (rc) {
+            free(p);
+            return NULL;
+        }
     }

     return p;
--
1.7.10.4
[Xen-devel] [PATCH 0/9] libxl: poll: Avoid fd deregistration race POLLNVAL
The first three patches in this series fix a race which I think is the cause of a bug reported by Jim Fehlig (thanks, Jim). They are IMO backport candidates.

The next 5 provide a test case which reproduces the bug. I actually developed these patches in the opposite order, and have verified that without the fix, the test case fails the POLLNVAL assertion as expected.

The final patch is a fix to another libxl event test.

Thanks,
Ian.
[Xen-devel] [PATCH v2 17/27] tools/libxl: Support converting a legacy stream to a v2 stream
When a legacy stream is found, it needs to be converted to a v2 stream for the reading logic. This is done by exec()ing the python conversion utility. One complication is that the caller of this interface needs to assume ownership of the output fd, to prevent it being closed while still in use in a datacopier. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- No major differences since v1. It has gained a new ao_abrt as a result of rebasing over the AO ABORT series, and has been made x86-specific. --- tools/libxl/Makefile|1 + tools/libxl/libxl_convert_callout.c | 172 +++ tools/libxl/libxl_internal.h| 48 ++ 3 files changed, 221 insertions(+) create mode 100644 tools/libxl/libxl_convert_callout.c diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index c71c5fe..0ebc35a 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -59,6 +59,7 @@ endif LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o +LIBXL_OBJS-$(CONFIG_X86) += libxl_convert_callout.o LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o ifeq ($(CONFIG_NetBSD),y) diff --git a/tools/libxl/libxl_convert_callout.c b/tools/libxl/libxl_convert_callout.c new file mode 100644 index 000..02f3b82 --- /dev/null +++ b/tools/libxl/libxl_convert_callout.c @@ -0,0 +1,172 @@ +/* + * Copyright (C) 2014 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU Lesser General Public License for more details. + */ + +#include libxl_osdeps.h + +#include libxl_internal.h + +/* + * Infrastructure for converting a legacy migration stream into a libxl v2 + * stream. + * + * This is done by fork()ing the python conversion script, which takes in a + * legacy stream, and puts out a suitably-formatted v2 stream. + */ + +static void helper_failed(libxl__egc *egc, + libxl__conversion_helper_state *chs, int rc); +static void helper_stop(libxl__egc *egc, libxl__ao_abortable*, int rc); +static void helper_exited(libxl__egc *egc, libxl__ev_child *ch, + pid_t pid, int status); +static void helper_done(libxl__egc *egc, +libxl__conversion_helper_state *chs); + +void libxl__conversion_helper_init(libxl__conversion_helper_state *chs) +{ +libxl__ao_abortable_init(chs-abrt); +libxl__ev_child_init(chs-child); +} + +void libxl__convert_legacy_stream(libxl__egc *egc, + libxl__conversion_helper_state *chs) +{ +STATE_AO_GC(chs-ao); +libxl__carefd *child_in = NULL, *child_out = NULL; +int ret = 0; + +chs-rc = 0; +libxl__conversion_helper_init(chs); + +chs-abrt.ao = chs-ao; +chs-abrt.callback = helper_stop; +ret = libxl__ao_abortable_register(chs-abrt); +if (ret) goto err; + +libxl__carefd_begin(); +int fds[2]; +if (libxl_pipe(CTX, fds)) { +ret = ERROR_FAIL; +libxl__carefd_unlock(); +goto err; +} +child_out = libxl__carefd_record(CTX, fds[0]); +child_in = libxl__carefd_record(CTX, fds[1]); +libxl__carefd_unlock(); + +pid_t pid = libxl__ev_child_fork(gc, chs-child, helper_exited); +if (!pid) { +char * const args[] = +{ +getenv(LIBXL_CONVERT_HELPER) ?: +LIBEXEC_BIN /convert-legacy-stream, +--in, GCSPRINTF(%d, chs-legacy_fd), +--out,GCSPRINTF(%d, fds[1]), +--width, +#ifdef __i386__ +32, +#else +64, +#endif +--guest, chs-hvm ? 
hvm : pv, +--format, libxl, +/* --verbose, */ +NULL, +}; + +libxl_fd_set_cloexec(CTX, chs-legacy_fd, 0); +libxl_fd_set_cloexec(CTX, libxl__carefd_fd(child_in), 0); + +libxl__exec(gc, +-1, -1, -1, +args[0], args, NULL); +} + +libxl__carefd_close(child_in); +chs-v2_carefd = child_out; + +assert(!ret); +return; + + err: +assert(ret); +helper_failed(egc, chs, ret); +} + +void libxl__convert_legacy_stream_abort(libxl__egc *egc, +libxl__conversion_helper_state *chs, +int rc) +{ +helper_stop(egc, chs-abrt, rc); +} + +static void helper_failed(libxl__egc *egc, +
Re: [Xen-devel] [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary
Tiejun Chen writes ([v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary):

Previously we always fix that predefined boundary as 2G to handle conflict between memory and rdm, but now this predefined boundary can be changed with the parameter rdm_mem_boundary in .cfg file.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com

Acked-by: Ian Jackson ian.jack...@eu.citrix.com
[Xen-devel] [PATCH v2 00/27] Libxl migration v2
This series adds support for the libxl migration v2 stream, and untangles the existing layering violations of the toolstack and qemu records. It can be found on the branch libxl-migv2-v2 git://xenbits.xen.org/people/andrewcoop/xen.git http://xenbits.xen.org/git-http/people/andrewcoop/xen.git Major changes in v2 are being rebased over the libxl AO-abort series, and a redesign of the internal logic to support Remus/COLO buffering and failover. At the end of the series, legacy migration is no longer used. The Remus code is untested by me. All other combinations of suspend/migrate/resume have been tested with PV and HVM guests (qemu-trad and qemu-upstream), including 32 - 64 bit migration (which was the underlying bug causing us to write migration v2 in the first place). Anyway, thoughts/comments welcome. Please test! ~Andrew Summary of Acks/Modified/New from v1 N bsd-sys-queue-h-seddery: Massage `offsetof' A tools/libxc: Always compile the compat qemu variables into xc_sr_context A tools/libxl: Introduce ROUNDUP() N tools/libxl: Introduce libxl__kill() AM tools/libxl: Stash all restore parameters in domain_create_state N tools/libxl: Split libxl__domain_create_state.restore_fd in two M tools/libxl: Extra management APIs for the save helper AM tools/xl: Mandatory flag indicating the format of the migration stream docs: Libxl migration v2 stream specification AM tools/python: Libxc migration v2 infrastructure AM tools/python: Libxl migration v2 infrastructure N tools/python: Other migration infrastructure AM tools/python: Verification utility for v2 stream spec compliance AM tools/python: Conversion utility for legacy migration streams M tools/libxl: Migration v2 stream format M tools/libxl: Infrastructure for reading a libxl migration v2 stream M tools/libxl: Support converting a legacy stream to a v2 stream M tools/libxl: Convert a legacy stream if needed M tools/libxc+libxl+xl: Restore v2 streams M tools/libxl: Infrastructure for writing a v2 stream M 
tools/libxc+libxl+xl: Save v2 streams AM docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams M tools/libxl: Write checkpoint records into the stream M tools/libx{c,l}: Introduce restore_callbacks.checkpoint() M tools/libxl: Handle checkpoint records in a libxl migration v2 stream A tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc A tools/libxl: Drop all knowledge of toolstack callbacks Andrew Cooper (23): tools/libxc: Always compile the compat qemu variables into xc_sr_context tools/libxl: Introduce ROUNDUP() tools/libxl: Introduce libxl__kill() tools/libxl: Stash all restore parameters in domain_create_state tools/libxl: Split libxl__domain_create_state.restore_fd in two tools/libxl: Extra management APIs for the save helper tools/xl: Mandatory flag indicating the format of the migration stream docs: Libxl migration v2 stream specification tools/python: Libxc migration v2 infrastructure tools/python: Libxl migration v2 infrastructure tools/python: Other migration infrastructure tools/python: Verification utility for v2 stream spec compliance tools/python: Conversion utility for legacy migration streams tools/libxl: Support converting a legacy stream to a v2 stream tools/libxl: Convert a legacy stream if needed tools/libxc+libxl+xl: Restore v2 streams tools/libxc+libxl+xl: Save v2 streams docs/libxl: Introduce CHECKPOINT_END to support migration v2 remus streams tools/libxl: Write checkpoint records into the stream tools/libx{c,l}: Introduce restore_callbacks.checkpoint() tools/libxl: Handle checkpoint records in a libxl migration v2 stream tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc tools/libxl: Drop all knowledge of toolstack callbacks Ian Jackson (1): bsd-sys-queue-h-seddery: Massage `offsetof' Ross Lagerwall (3): tools/libxl: Migration v2 stream format tools/libxl: Infrastructure for reading a libxl migration v2 stream tools/libxl: Infrastructure for writing a v2 stream docs/specs/libxl-migration-stream.pandoc | 217 
++ tools/include/xen-external/bsd-sys-queue-h-seddery |2 + tools/libxc/Makefile |2 - tools/libxc/include/xenguest.h |9 + tools/libxc/xc_sr_common.h | 12 +- tools/libxc/xc_sr_restore.c| 71 +- tools/libxc/xc_sr_restore_x86_hvm.c| 124 tools/libxc/xc_sr_save_x86_hvm.c | 36 - tools/libxl/Makefile |2 + tools/libxl/libxl.h| 19 + tools/libxl/libxl_aoutils.c| 15 + tools/libxl/libxl_convert_callout.c| 172 + tools/libxl/libxl_create.c | 86 ++- tools/libxl/libxl_dom.c| 65 +- tools/libxl/libxl_internal.h | 192 -
[Xen-devel] [PATCH v2 01/27] bsd-sys-queue-h-seddery: Massage `offsetof'
From: Ian Jackson ian.jack...@eu.citrix.com

For some reason BSD's queue.h uses `__offsetof'. It expects it to work just like offsetof. So use offsetof.

Reported-by: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com
---
 tools/include/xen-external/bsd-sys-queue-h-seddery | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xen-external/bsd-sys-queue-h-seddery b/tools/include/xen-external/bsd-sys-queue-h-seddery
index 7a957e3..3f8716d 100755
--- a/tools/include/xen-external/bsd-sys-queue-h-seddery
+++ b/tools/include/xen-external/bsd-sys-queue-h-seddery
@@ -69,4 +69,6 @@ s/\b struct \s+ type \b/type/xg;
 s,^\#include.*sys/cdefs.*,/* $ */,xg;

+s,\b __offsetof \b ,offsetof,xg;
+
 s/\b( NULL )/0/xg;
--
1.7.10.4
[Xen-devel] [PATCH v2 02/27] tools/libxc: Always compile the compat qemu variables into xc_sr_context
This is safe (as the variables will simply be unused), and is required for correct compilation when midway through untangling the libxc/libxl interaction.

The #define is left in place to highlight that the variables can be removed once the untangling is complete.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
 tools/libxc/xc_sr_common.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 565c5da..08c66db 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -307,10 +307,10 @@ struct xc_sr_context
             void *context;
             size_t contextsz;

-#ifdef XG_LIBXL_HVM_COMPAT
+/* #ifdef XG_LIBXL_HVM_COMPAT */
             uint32_t qlen;
             void *qbuf;
-#endif
+/* #endif */
         } restore;
     };
 } x86_hvm;
--
1.7.10.4
[Xen-devel] [PATCH v2 08/27] tools/xl: Mandatory flag indicating the format of the migration stream
Introduced at this point so the python stream conversion code has a concrete ABI to use.

Later when libxl itself starts supporting a v2 stream, it will be added to XL_MANDATORY_FLAG_ALL.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
v2: Expand commit message
---
 tools/libxl/xl_cmdimpl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 971209c..26b1e7d 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -109,6 +109,7 @@
  */
 #define XL_MANDATORY_FLAG_JSON (1U << 0) /* config data is in JSON format */
+#define XL_MANDATORY_FLAG_STREAMv2 (1U << 1) /* stream is v2 */
 #define XL_MANDATORY_FLAG_ALL (XL_MANDATORY_FLAG_JSON)

 struct save_file_header {
     char magic[32]; /* savefileheader_magic */
--
1.7.10.4
[Xen-devel] [PATCH v2 06/27] tools/libxl: Split libxl__domain_create_state.restore_fd in two
In a future patch, we shall support automatically converting a legacy stream to a v2 stream, in which case libxc needs to read from a different fd.

Simply overwriting restore_fd does not work; the two fd's have different circumstances. The restore_fd needs to be returned to its original state before libxl_domain_create_restore() returns, while in the converted case, the fd needs allocating and deallocating appropriately.

No functional change.

Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Wei Liu wei.l...@citrix.com
---
New in v2
---
 tools/libxl/libxl_create.c       | 2 +-
 tools/libxl/libxl_internal.h     | 2 +-
 tools/libxl/libxl_save_callout.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 61515da..be13204 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1525,7 +1525,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config,
     cdcs->dcs.guest_config = d_config;
     libxl_domain_config_init(&cdcs->dcs.guest_config_saved);
     libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config);
-    cdcs->dcs.restore_fd = restore_fd;
+    cdcs->dcs.restore_fd = cdcs->dcs.libxc_fd = restore_fd;
     if (restore_fd > -1)
         cdcs->dcs.restore_params = *params;
     cdcs->dcs.callback = domain_create_cb;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5ab945a..588cfb8 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3216,7 +3216,7 @@ struct libxl__domain_create_state {
     libxl__ao *ao;
     libxl_domain_config *guest_config;
     libxl_domain_config guest_config_saved; /* vanilla config */
-    int restore_fd;
+    int restore_fd, libxc_fd;
     libxl_domain_restore_params restore_params;
     libxl__domain_create_cb *callback;
     libxl_asyncprogress_how aop_console_how;
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 80aae1b..1136b79 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -48,7 +48,7 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     /* Convenience aliases */
     const uint32_t domid = dcs->guest_domid;
-    const int restore_fd = dcs->restore_fd;
+    const int restore_fd = dcs->libxc_fd;
     libxl__domain_build_state *const state = dcs->build_state;

     unsigned cbflags = libxl__srm_callout_enumcallbacks_restore
--
1.7.10.4
[Xen-devel] [PATCH v2 26/27] tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
Libxl has now been fully adjusted not to need it. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/libxc/xc_sr_common.h |5 -- tools/libxc/xc_sr_restore.c | 18 - tools/libxc/xc_sr_restore_x86_hvm.c | 124 --- tools/libxc/xc_sr_save_x86_hvm.c| 36 -- 4 files changed, 183 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 1f4d4e4..64f6082 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -309,11 +309,6 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; - -/* #ifdef XG_LIBXL_HVM_COMPAT */ -uint32_t qlen; -void *qbuf; -/* #endif */ } restore; }; } x86_hvm; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index c9b5213..2f6a763 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -627,9 +627,6 @@ static void cleanup(struct xc_sr_context *ctx) PERROR(Failed to clean up); } -#ifdef XG_LIBXL_HVM_COMPAT -extern int read_qemu(struct xc_sr_context *ctx); -#endif /* * Restore a domain. 
*/ @@ -656,21 +653,6 @@ static int restore(struct xc_sr_context *ctx) goto err; } -#ifdef XG_LIBXL_HVM_COMPAT -if ( ctx-dominfo.hvm - (rec.type == REC_TYPE_END || rec.type == REC_TYPE_CHECKPOINT) ) -{ -rc = read_qemu(ctx); -if ( rc ) -{ -if ( ctx-restore.buffer_all_records ) -goto remus_failover; -else -goto err; -} -} -#endif - if ( ctx-restore.buffer_all_records rec.type != REC_TYPE_END rec.type != REC_TYPE_CHECKPOINT ) diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c index 6f5af0e..49d22c7 100644 --- a/tools/libxc/xc_sr_restore_x86_hvm.c +++ b/tools/libxc/xc_sr_restore_x86_hvm.c @@ -3,24 +3,6 @@ #include xc_sr_common_x86.h -#ifdef XG_LIBXL_HVM_COMPAT -static int handle_toolstack(struct xc_sr_context *ctx, struct xc_sr_record *rec) -{ -xc_interface *xch = ctx-xch; -int rc; - -if ( !ctx-restore.callbacks || !ctx-restore.callbacks-toolstack_restore ) -return 0; - -rc = ctx-restore.callbacks-toolstack_restore( -ctx-domid, rec-data, rec-length, ctx-restore.callbacks-data); - -if ( rc 0 ) -PERROR(restoring toolstack); -return rc; -} -#endif - /* * Process an HVM_CONTEXT record from the stream. 
*/ @@ -93,98 +75,6 @@ static int handle_hvm_params(struct xc_sr_context *ctx, return 0; } -#ifdef XG_LIBXL_HVM_COMPAT -int read_qemu(struct xc_sr_context *ctx); -int read_qemu(struct xc_sr_context *ctx) -{ -xc_interface *xch = ctx-xch; -char qemusig[21]; -uint32_t qlen; -void *qbuf = NULL; -int rc = -1; - -if ( read_exact(ctx-fd, qemusig, sizeof(qemusig)) ) -{ -PERROR(Error reading QEMU signature); -goto out; -} - -if ( !memcmp(qemusig, DeviceModelRecord0002, sizeof(qemusig)) ) -{ -if ( read_exact(ctx-fd, qlen, sizeof(qlen)) ) -{ -PERROR(Error reading QEMU record length); -goto out; -} - -qbuf = malloc(qlen); -if ( !qbuf ) -{ -PERROR(no memory for device model state); -goto out; -} - -if ( read_exact(ctx-fd, qbuf, qlen) ) -{ -PERROR(Error reading device model state); -goto out; -} -} -else -{ -ERROR(Invalid device model state signature '%*.*s', - (int)sizeof(qemusig), (int)sizeof(qemusig), qemusig); -goto out; -} - -/* With Remus, this could be read many times */ -if ( ctx-x86_hvm.restore.qbuf ) -free(ctx-x86_hvm.restore.qbuf); -ctx-x86_hvm.restore.qbuf = qbuf; -ctx-x86_hvm.restore.qlen = qlen; -rc = 0; - -out: -if (rc) -free(qbuf); -return rc; -} - -static int handle_qemu(struct xc_sr_context *ctx) -{ -xc_interface *xch = ctx-xch; -char path[256]; -uint32_t qlen = ctx-x86_hvm.restore.qlen; -void *qbuf = ctx-x86_hvm.restore.qbuf; -int rc = -1; -FILE *fp = NULL; - -sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE.%u, ctx-domid); -fp = fopen(path, wb); -if ( !fp ) -{ -PERROR(Failed to open '%s' for writing, path); -goto out; -} - -DPRINTF(Writing %u bytes of QEMU data, qlen); -if ( fwrite(qbuf, 1, qlen, fp) != qlen ) -{ -PERROR(Failed to write %u bytes of QEMU data, qlen); -goto out; -} - -rc = 0; - - out: -if ( fp ) -fclose(fp);
[Xen-devel] [PATCH v2 14/27] tools/python: Conversion utility for legacy migration streams
This utility will take a legacy stream as in input, and produce a v2 stream as an output. It is exec()'d by libxl to provide backwards compatibility. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/python/Makefile |4 + tools/python/scripts/convert-legacy-stream | 678 2 files changed, 682 insertions(+) create mode 100755 tools/python/scripts/convert-legacy-stream diff --git a/tools/python/Makefile b/tools/python/Makefile index e933be8..df942a7 100644 --- a/tools/python/Makefile +++ b/tools/python/Makefile @@ -17,9 +17,13 @@ build: genwrap.py $(XEN_ROOT)/tools/libxl/libxl_types.idl \ .PHONY: install install: + $(INSTALL_DIR) $(DESTDIR)$(PRIVATE_BINDIR) + CC=$(CC) CFLAGS=$(PY_CFLAGS) $(PYTHON) setup.py install \ $(PYTHON_PREFIX_ARG) --root=$(DESTDIR) --force + $(INSTALL_PROG) scripts/convert-legacy-stream $(DESTDIR)$(PRIVATE_BINDIR) + .PHONY: test test: export LD_LIBRARY_PATH=$$(readlink -f ../libxc):$$(readlink -f ../xenstore); $(PYTHON) test.py -b -u diff --git a/tools/python/scripts/convert-legacy-stream b/tools/python/scripts/convert-legacy-stream new file mode 100755 index 000..d54fa22 --- /dev/null +++ b/tools/python/scripts/convert-legacy-stream @@ -0,0 +1,678 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- + + +Convert a legacy migration stream to a v2 stream. + + +import sys +import os, os.path +import syslog +import traceback + +from struct import calcsize, unpack, pack + +from xen.migration import legacy, public, libxc, libxl, xl + +__version__ = 1 + +fin = None # Input file/fd +fout = None# Output file/fd +twidth = 0 # Legacy toolstack bitness (32 or 64) +pv = None # Boolean (pv or hvm) +qemu = True# Boolean - process qemu record? +log_to_syslog = False # Boolean - Log to syslog instead of stdout/err? 
+verbose = False# Boolean - Summarise stream contents + +def stream_read(_ = None): +Read from the input +return fin.read(_) + +def stream_write(_): +Write to the output +return fout.write(_) + +def info(msg): +Info message, routed to appropriate destination +if verbose: +if log_to_syslog: +for line in msg.split(\n): +syslog.syslog(syslog.LOG_INFO, line) +else: +print msg + +def err(msg): +Error message, routed to appropriate destination +if log_to_syslog: +for line in msg.split(\n): +syslog.syslog(syslog.LOG_ERR, line) +print sys.stderr, msg + +class StreamError(StandardError): +Error with the incoming migration stream +pass + +class VM(object): +Container of VM parameters + +def __init__(self, fmt): +# Common +self.p2m_size = 0 + +# PV +self.max_vcpu_id = 0 +self.online_vcpu_map = [] +self.width = 0 +self.levels = 0 +self.basic_len = 0 +self.extd = False +self.xsave_len = 0 + +# libxl +self.libxl = fmt == libxl +self.xenstore = [] # Deferred toolstack records + +def write_libxc_ihdr(): +stream_write(pack(libxc.IHDR_FORMAT, + libxc.IHDR_MARKER, # Marker + libxc.IHDR_IDENT, # Ident + libxc.IHDR_VERSION, # Version + libxc.IHDR_OPT_LE, # Options + 0, 0)) # Reserved + +def write_libxc_dhdr(): +if pv: +dtype = libxc.DHDR_TYPE_x86_pv +else: +dtype = libxc.DHDR_TYPE_x86_hvm + +stream_write(pack(libxc.DHDR_FORMAT, + dtype,# Type + 12, # Page size + 0,# Reserved + 0,# Xen major (converted) + __version__)) # Xen minor (converted) + +def write_libxl_hdr(): +stream_write(pack(libxl.HDR_FORMAT, + libxl.HDR_IDENT, # Ident + libxl.HDR_VERSION, # Version 2 + libxl.HDR_OPT_LE | # Options + libxl.HDR_OPT_LEGACY # Little Endian and Legacy + )) + +def write_record(rt, *argl): +alldata = ''.join(argl) +length = len(alldata) + +record = pack(libxc.RH_FORMAT, rt, length) + alldata +plen = (8 - (length 7)) 7 +record += '\x00' * plen + +stream_write(record) + +def write_libxc_pv_info(vm): +write_record(libxc.REC_TYPE_x86_pv_info, + pack(libxc.X86_PV_INFO_FORMAT, + vm.width, vm.levels, 0, 
+                      0))
+
+def write_libxc_pv_p2m_frames(vm, pfns):
+    write_record(libxc.REC_TYPE_x86_pv_p2m_frames,
+                 pack(libxc.X86_PV_P2M_FRAMES_FORMAT,
+                      0, vm.p2m_size - 1),
+
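Note that the archived copy of `write_record()` above reads `plen = (8 - (length 7)) 7`; the `&` operators were lost in extraction. The intended logic pads each v2 record body to an 8-byte boundary. A self-contained sketch (the `"<II"` header format stands in for the patch's `libxc.RH_FORMAT`, which is an assumption here):

```python
import io
from struct import pack

RH_FORMAT = "<II"  # record header: uint32 type, uint32 body length (assumed layout)

def write_record(out, rtype, *chunks):
    """Write one migration-v2 record: header, body, zero padding
    so the next header starts on an 8-byte boundary."""
    body = b"".join(chunks)
    length = len(body)
    record = pack(RH_FORMAT, rtype, length) + body
    plen = (8 - (length & 7)) & 7   # the garbled expression, with '&' restored
    record += b"\x00" * plen
    out.write(record)

buf = io.BytesIO()
write_record(buf, 0x04, b"\x01\x02\x03")   # 3-byte body -> 5 padding bytes
assert len(buf.getvalue()) == 8 + 3 + 5    # header + body + padding
```

The padding keeps every record header naturally aligned without requiring the writer to track a running offset.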
[Xen-devel] [PATCH v2 27/27] tools/libxl: Drop all knowledge of toolstack callbacks
Libxl has now been fully adjusted not to need them. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/libxl/libxl_dom.c|1 - tools/libxl/libxl_internal.h |2 -- tools/libxl/libxl_save_callout.c | 39 +--- tools/libxl/libxl_save_helper.c| 29 --- tools/libxl/libxl_save_msgs_gen.pl |7 ++- 5 files changed, 3 insertions(+), 75 deletions(-) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 9f5ddc9..3c765f4 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -2115,7 +2115,6 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss) callbacks-suspend = libxl__domain_suspend_callback; callbacks-switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty; -dss-shs.callbacks.save.toolstack_save = libxl__toolstack_save; dss-sws.fd = dss-fd; dss-sws.ao = dss-ao; diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 1b62f25..13e2493 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2699,8 +2699,6 @@ _hidden void libxl__datacopier_prefixdata(libxl__egc*, libxl__datacopier_state*, typedef struct libxl__srm_save_callbacks { libxl__srm_save_autogen_callbacks a; -int (*toolstack_save)(uint32_t domid, uint8_t **buf, - uint32_t *len, void *data); } libxl__srm_save_callbacks; typedef struct libxl__srm_restore_callbacks { diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index cd18cd2..2a6662f 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -78,41 +78,12 @@ void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs, void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss) { STATE_AO_GC(dss-ao); -int r, rc, toolstack_data_fd = -1; -uint32_t toolstack_data_len = 0; - -/* Resources we need to free */ -uint8_t 
*toolstack_data_buf = 0; unsigned cbflags = libxl__srm_callout_enumcallbacks_save (dss-shs.callbacks.save.a); -if (dss-shs.callbacks.save.toolstack_save) { -r = dss-shs.callbacks.save.toolstack_save -(dss-domid, toolstack_data_buf, toolstack_data_len, dss); -if (r) { rc = ERROR_FAIL; goto out; } - -dss-shs.toolstack_data_file = tmpfile(); -if (!dss-shs.toolstack_data_file) { -LOGE(ERROR, cannot create toolstack data tmpfile); -rc = ERROR_FAIL; -goto out; -} -toolstack_data_fd = fileno(dss-shs.toolstack_data_file); - -r = libxl_write_exactly(CTX, toolstack_data_fd, -toolstack_data_buf, toolstack_data_len, -toolstack data tmpfile, 0); -if (r) { rc = ERROR_FAIL; goto out; } - -/* file position must be reset before passing to libxl-save-helper. */ -r = lseek(toolstack_data_fd, 0, SEEK_SET); -if (r) { rc = ERROR_FAIL; goto out; } -} - const unsigned long argnums[] = { dss-domid, 0, 0, dss-xcflags, dss-hvm, -toolstack_data_fd, toolstack_data_len, cbflags, }; @@ -123,18 +94,10 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss) dss-shs.caller_state = dss; dss-shs.need_results = 0; -free(toolstack_data_buf); - run_helper(egc, dss-shs, --save-domain, dss-fd, - toolstack_data_fd, 1, + NULL, 0, argnums, ARRAY_SIZE(argnums)); return; - - out: -free(toolstack_data_buf); -if (dss-shs.toolstack_data_file) fclose(dss-shs.toolstack_data_file); - -libxl__xc_domain_save_done(egc, dss, rc, 0, 0); } diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index f196786..1622bb7 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -213,32 +213,8 @@ int helper_getreply(void *user) /*- other callbacks -*/ -static int toolstack_save_fd; -static uint32_t toolstack_save_len; static struct save_callbacks helper_save_callbacks; -static int toolstack_save_cb(uint32_t domid, uint8_t **buf, - uint32_t *len, void *data) -{ -int r; - -assert(toolstack_save_fd 0); - -/* This is a hack for remus */ -if 
(helper_save_callbacks.checkpoint) { -r = lseek(toolstack_save_fd, 0, SEEK_SET); -if (r) fail(errno,rewind toolstack data tmpfile); -} - -*buf = xmalloc(toolstack_save_len); -r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len); -if (r0) fail(errno,read toolstack data); -if (r==0) fail(0,read toolstack data eof); - -*len = toolstack_save_len; -return 0;
[Xen-devel] [PATCH v2 21/27] tools/libxc+libxl+xl: Save v2 streams
This is a complicated set of changes which must be done together for bisectability. * libxl-save-helper is updated to unconditionally use libxc migration v2. * libxl compatibility workarounds in libxc are disabled for save operations. * libxl__stream_write_start() is logically spliced into the event location where libxl__xc_domain_save() used to reside. * xl is updated to indicate that the stream is now v2 Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- tools/libxc/Makefile |2 -- tools/libxl/libxl.h |2 ++ tools/libxl/libxl_dom.c | 46 +- tools/libxl/libxl_save_helper.c |2 +- tools/libxl/libxl_stream_write.c | 35 +++-- tools/libxl/xl_cmdimpl.c |1 + 6 files changed, 47 insertions(+), 41 deletions(-) diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile index 2cd0b1a..1aec848 100644 --- a/tools/libxc/Makefile +++ b/tools/libxc/Makefile @@ -64,8 +64,6 @@ GUEST_SRCS-$(CONFIG_X86) += xc_sr_save_x86_hvm.c GUEST_SRCS-y += xc_sr_restore.c GUEST_SRCS-y += xc_sr_save.c GUEST_SRCS-y += xc_offline_page.c xc_compression.c -xc_sr_save_x86_hvm.o: CFLAGS += -DXG_LIBXL_HVM_COMPAT -xc_sr_save_x86_hvm.opic: CFLAGS += -DXG_LIBXL_HVM_COMPAT else GUEST_SRCS-y += xc_nomigrate.c endif diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index e64a606..4f24e5f 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -812,6 +812,8 @@ * * If this is defined, then the libxl_domain_create_restore() interface takes * a stream_version parameter and supports a value of 2. + * + * libxl_domain_suspend() will produce a v2 stream. 
*/ #define LIBXL_HAVE_STREAM_V2 1 diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 8642192..de05124 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -1153,6 +1153,8 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *ptr, /* Domain suspend (save) */ +static void stream_done(libxl__egc *egc, +libxl__stream_write_state *sws, int rc); static void domain_suspend_done(libxl__egc *egc, libxl__domain_suspend_state *dss, int rc); static void domain_suspend_callback_common_done(libxl__egc *egc, @@ -2117,50 +2119,22 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss) callbacks-switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty; dss-shs.callbacks.save.toolstack_save = libxl__toolstack_save; -libxl__xc_domain_save(egc, dss); +dss-sws.fd = dss-fd; +dss-sws.ao = dss-ao; +dss-sws.completion_callback = stream_done; + +libxl__stream_write_start(egc, dss-sws); return; out: domain_suspend_done(egc, dss, rc); } -void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void, -int rc, int retval, int errnoval) +static void stream_done(libxl__egc *egc, +libxl__stream_write_state *sws, int rc) { -libxl__domain_suspend_state *dss = dss_void; -STATE_AO_GC(dss-ao); - -/* Convenience aliases */ -const libxl_domain_type type = dss-type; - -if (rc) -goto out; - -if (retval) { -LOGEV(ERROR, errnoval, saving domain: %s, - dss-guest_responded ? 
- domain responded to suspend request : - domain did not respond to suspend request); -if ( !dss-guest_responded ) -rc = ERROR_GUEST_TIMEDOUT; -else if (dss-rc) -rc = dss-rc; -else -rc = ERROR_FAIL; -goto out; -} - -if (type == LIBXL_DOMAIN_TYPE_HVM) { -rc = libxl__domain_suspend_device_model(gc, dss); -if (rc) goto out; +libxl__domain_suspend_state *dss = CONTAINER_OF(sws, *dss, sws); -libxl__domain_save_device_model(egc, dss, domain_suspend_done); -return; -} - -rc = 0; - -out: domain_suspend_done(egc, dss, rc); } diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index efbe2eb..f196786 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -286,7 +286,7 @@ int main(int argc, char **argv) startup(save); setup_signals(save_signal_handler); -r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags, +r = xc_domain_save2(xch, io_fd, dom, max_iters, max_factor, flags, helper_save_callbacks, hvm); complete(r); diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c index bf568ad..331173f 100644 --- a/tools/libxl/libxl_stream_write.c +++ b/tools/libxl/libxl_stream_write.c @@ -276,8 +276,39 @@ static void
[Xen-devel] [PATCH v2 12/27] tools/python: Other migration infrastructure
Contains: * Reverse-engineered notes of the legacy format from xg_save_restore.h * Python implementation of the legacy format * Public HVM Params used in the legacy stream * XL header format Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- New in v2 - removes various many magic numbers from subsequent scripts --- tools/python/xen/migration/legacy.py | 279 ++ tools/python/xen/migration/public.py | 21 +++ tools/python/xen/migration/xl.py | 12 ++ 3 files changed, 312 insertions(+) create mode 100644 tools/python/xen/migration/legacy.py create mode 100644 tools/python/xen/migration/public.py create mode 100644 tools/python/xen/migration/xl.py diff --git a/tools/python/xen/migration/legacy.py b/tools/python/xen/migration/legacy.py new file mode 100644 index 000..2f2240a --- /dev/null +++ b/tools/python/xen/migration/legacy.py @@ -0,0 +1,279 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- + + +Libxc legacy migration streams + +Documentation and record structures for legacy migration + + + +SAVE/RESTORE/MIGRATE PROTOCOL += + +The general form of a stream of chunks is a header followed by a +body consisting of a variable number of chunks (terminated by a +chunk with type 0) followed by a trailer. + +For a rolling/checkpoint (e.g. remus) migration then the body and +trailer phases can be repeated until an external event +(e.g. failure) causes the process to terminate and commit to the +most recent complete checkpoint. + +HEADER +-- + +unsigned long: p2m_size + +extended-info (PV-only, optional): + + If first unsigned long == ~0UL then extended info is present, + otherwise unsigned long is part of p2m. Note that p2m_size above + does not include the length of the extended info. 
+ + extended-info: + +unsigned long: signature == ~0UL +uint32_t : number of bytes remaining in extended-info + +1 or more extended-info blocks of form: +char[4] : block identifier +uint32_t : block data size +bytes: block data + +defined extended-info blocks: +vcpu : VCPU context info containing vcpu_guest_context_t. + The precise variant of the context structure + (e.g. 32 vs 64 bit) is distinguished by + the block size. +extv : Presence indicates use of extended VCPU context in + tail, data size is 0. + +p2m (PV-only): + + consists of p2m_size bytes comprising an array of xen_pfn_t sized entries. + +BODY PHASE - Format A (for live migration or Remus without compression) +-- + +A series of chunks with a common header: + int : chunk type + +If the chunk type is +ve then chunk contains guest memory data, and the +type contains the number of pages in the batch: + +unsigned long[] : PFN array, length == number of pages in batch + Each entry consists of XEN_DOMCTL_PFINFO_* + in bits 31-28 and the PFN number in bits 27-0. +page data: PAGE_SIZE bytes for each page marked present in PFN + array + +If the chunk type is -ve then chunk consists of one of a number of +metadata types. See definitions of XC_SAVE_ID_* below. + +If chunk type is 0 then body phase is complete. + + +BODY PHASE - Format B (for Remus with compression) +-- + +A series of chunks with a common header: + int : chunk type + +If the chunk type is +ve then chunk contains array of PFNs corresponding +to guest memory and type contains the number of PFNs in the batch: + +unsigned long[] : PFN array, length == number of pages in batch + Each entry consists of XEN_DOMCTL_PFINFO_* + in bits 31-28 and the PFN number in bits 27-0. + +If the chunk type is -ve then chunk consists of one of a number of +metadata types. See definitions of XC_SAVE_ID_* below. 
+ +If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the +chunk consists of compressed page data, in the following format: + +unsigned long: Size of the compressed chunk to follow +compressed data : variable length data of size indicated above. + This chunk consists of compressed page data. + The number of pages in one chunk depends on + the amount of space available in the sender's + output buffer. + +Format of compressed data: + compressed_data = deltas* + delta = marker, run* + marker = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker] + RUNFLAG = 0 + SKIPFLAG= 1 7 + RUNLEN = 7-bit unsigned value indicating number of WORDS in the run + run =
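The body-phase rules documented above (positive chunk type = page batch, negative = metadata, zero = end of body; PFN entries carry `XEN_DOMCTL_PFINFO_*` in bits 31-28 and the PFN in bits 27-0) condense into a small dispatch sketch. This is illustrative only; note the real entry width is `unsigned long`, i.e. it follows the legacy toolstack's bitness, with the 32-bit layout shown here:

```python
def classify_chunk(chunk_type):
    """Dispatch on a legacy-stream body chunk type."""
    if chunk_type == 0:
        return "end-of-body"
    if chunk_type > 0:
        return "page batch of %d pages" % chunk_type
    return "metadata chunk, XC_SAVE_ID %d" % chunk_type

def decode_pfn_entry(entry):
    """Split a 32-bit PFN array entry into (pfinfo, pfn):
    bits 31-28 are XEN_DOMCTL_PFINFO_*, bits 27-0 the PFN."""
    return (entry >> 28) & 0xF, entry & ((1 << 28) - 1)

assert classify_chunk(0) == "end-of-body"
assert classify_chunk(1024) == "page batch of 1024 pages"
assert decode_pfn_entry(0x90000123) == (0x9, 0x123)
```

The same PFN encoding is reused by Format B, which is why the compressed-data path only changes how the page contents, not the PFN arrays, are transported.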
[Xen-devel] [PATCH v2 20/27] tools/libxl: Infrastructure for writing a v2 stream
From: Ross Lagerwall ross.lagerw...@citrix.com This contains the event machinary and state machines to write non-checkpointed migration v2 stream (with the exception of the xc_domain_save() handling which is spliced later in a bisectable way). Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com --- As with the read side of things, this has undergone substantial changes in v2. --- tools/libxl/Makefile |2 +- tools/libxl/libxl_internal.h | 47 tools/libxl/libxl_stream_write.c | 451 ++ 3 files changed, 499 insertions(+), 1 deletion(-) create mode 100644 tools/libxl/libxl_stream_write.c diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index 0ebc35a..7d44483 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -95,7 +95,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \ libxl_internal.o libxl_utils.o libxl_uuid.o \ libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \ - libxl_stream_read.o \ + libxl_stream_read.o libxl_stream_write.o \ libxl_save_callout.o _libxl_save_msgs_callout.o \ libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y) LIBXL_OBJS += libxl_genid.o diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 1cf1884..2beb534 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2973,6 +2973,52 @@ typedef void libxl__domain_suspend_cb(libxl__egc*, typedef void libxl__save_device_model_cb(libxl__egc*, libxl__domain_suspend_state*, int rc); +/* State for writing a libxl migration v2 stream */ +typedef struct libxl__stream_write_state libxl__stream_write_state; + +typedef void (*sws_record_done_cb)(libxl__egc *egc, + libxl__stream_write_state *sws); + +struct libxl__stream_write_state { +/* filled by the user */ +libxl__ao *ao; +int fd; 
+uint32_t domid; +void (*completion_callback)(libxl__egc *egc, +libxl__stream_write_state *sws, +int rc); +/* Private */ +int rc; +bool running; + +/* Active-stuff handling */ +int joined_rc; + +/* Main stream-writing data */ +size_t padding; +libxl__datacopier_state dc; +sws_record_done_cb record_done_callback; + +/* Emulator blob handling */ +libxl__datacopier_state emu_dc; +libxl__carefd *emu_carefd; +libxl__sr_rec_hdr emu_rec_hdr; +void *emu_body; +}; + +_hidden void libxl__stream_write_start(libxl__egc *egc, + libxl__stream_write_state *stream); + +_hidden void libxl__stream_write_abort(libxl__egc *egc, + libxl__stream_write_state *stream, + int rc); + +static inline bool libxl__stream_write_inuse( +const libxl__stream_write_state *stream) +{ +return stream-running; +} + typedef struct libxl__logdirty_switch { const char *cmd; const char *cmd_path; @@ -3013,6 +3059,7 @@ struct libxl__domain_suspend_state { /* private for libxl__domain_save_device_model */ libxl__save_device_model_cb *save_dm_callback; libxl__datacopier_state save_dm_datacopier; +libxl__stream_write_state sws; }; diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c new file mode 100644 index 000..bf568ad --- /dev/null +++ b/tools/libxl/libxl_stream_write.c @@ -0,0 +1,451 @@ +/* + * Copyright (C) 2015 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. 
+ */ + +#include libxl_osdeps.h /* must come before any other headers */ + +#include libxl_internal.h + +/* + * Infrastructure for writing a domain to a libxl migration v2 stream. + * + * Entry points from outside: + * - libxl__stream_write_start() + * - Start writing a stream from the start. + * + * In normal operation, there are two tasks running at once; this stream + * processing, and the libxl-save-helper. check_stream_finished() is used to + * join all the tasks in both success and error
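The header comment above describes two tasks running concurrently (the stream writer and libxl-save-helper) that `check_stream_finished()` joins on both success and error, with `joined_rc` in the state structure recording the outcome. A hedged sketch of that join pattern; the names besides `joined_rc` are illustrative, not libxl's actual internals:

```python
class Joiner:
    """Join N concurrent tasks: remember the first failing rc,
    and report completion only once every task has finished."""
    def __init__(self, outstanding):
        self.outstanding = outstanding
        self.joined_rc = 0
        self.done = False

    def task_done(self, rc):
        if rc and not self.joined_rc:
            self.joined_rc = rc       # first failure wins
        self.outstanding -= 1
        if self.outstanding == 0:
            self.done = True          # completion callback would fire here

j = Joiner(2)
j.task_done(0)     # stream writing finished cleanly
j.task_done(-3)    # helper failed
assert j.done and j.joined_rc == -3
```

This is why `libxl__stream_write_abort()` takes an rc: an abort from either side records its error and lets the other task wind down before the single completion callback runs.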