performance trouble
Hello,

I use several KVM boxes with no problems at all, except for one application that has bad response times. The VM runs Windows 2008 R2, and the application is a client-server app developed with Progress software; it talks to an Oracle database (on another server), and we access the app through RDS/TSE.

The physical server runs Debian testing in order to get qemu-kvm 1.0, Linux kernel 3.1 and libvirt 0.9.8. We use virtio for disk and network, with the latest Windows drivers (from RH).

We have two reference servers: one physical and one running VMware. Response times:
 o physical = 7s
 o VM under VMware = 8s
 o VM under KVM = 12s (for completeness, with qemu-kvm 0.12.5 and kernel 2.6.32 we get 22s...)

I attach the libvirt XML of my VM.

How can I see what's happening? Do you have any ideas to improve performance?

David.
-- 
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 42635] New: PCIe passthrough broken with AMD iommu after s2disk / resume
https://bugzilla.kernel.org/show_bug.cgi?id=42635

           Summary: PCIe passthrough broken with AMD iommu after s2disk / resume
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.1
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
        AssignedTo: virtualization_...@kernel-bugs.osdl.org
        ReportedBy: kmuel...@justmail.de
        Regression: No

A PCIe Ethernet device is passed through to the guest and runs fine. Some time later, the guest is shut down (e.g. virsh shutdown guest) and the host is suspended (s2disk) and resumed again. The guest is started again (and it starts fine), but the device in the guest no longer works: it can be seen and its own address can be pinged, but nothing outside the device is reachable. The rx and tx counters in the ifconfig output are always 0.

If the guest is shut down and the device is bound to the host, it works as expected. If the device is afterwards bound to the guest again, it fails in the guest just as it did after the resume.

kernel: 3.1 / 64bit / smp
kvm: 1.0
board: GA-990XA-UD3

If you need more information, I can provide it - feel free to ask!

Regards,
Klaus

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
[Bug 42636] New: PCI passthrough does not work with AMD iommu for PCI device
https://bugzilla.kernel.org/show_bug.cgi?id=42636

           Summary: PCI passthrough does not work with AMD iommu for PCI device
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.1
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
        AssignedTo: virtualization_...@kernel-bugs.osdl.org
        ReportedBy: kmuel...@justmail.de
        Regression: No

I want to pass this PCI device through to a KVM guest:

05:07.0 Network controller: RaLink RT2800 802.11n PCI
	Subsystem: Linksys Device 0067
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at fdae (32-bit, non-prefetchable) [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Unfortunately, I always get an error on virsh start guest:

	Failed to assign device hostdev0 : Device or resource busy.
	qemu-kvm: -device pci-assign,host=05:06.0,id=hostdev0,configfd=20,bus=pci.0,addr=0x4: Device 'pci-assign' could not be initialized.

If I add this device (05:06.0) to the guest as well, I get exactly the same error again. Of course, I unloaded the module of this additional device before trying to pass it through to the guest.
lspci -vvs 05:06.0
05:06.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 0c)
	Subsystem: Intel Corporation EtherExpress PRO/100 S Desktop Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (2000ns min, 14000ns max), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at fdaff000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at af00 [size=64]
	Region 2: Memory at fdaa (32-bit, non-prefetchable) [size=128K]
	[virtual] Expansion ROM at fd90 [disabled] [size=64K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-

lspci -vt
-[:00]-+-00.0  ATI Technologies Inc RD890 PCI to PCI bridge (external gfx0 port B)
       +-00.2  ATI Technologies Inc Device 5a23
       +-02.0-[01]--+-00.0  ATI Technologies Inc NI Turks [AMD Radeon HD 6500]
       |            \-00.1  ATI Technologies Inc Device aa90
       +-04.0-[02]----00.0  Device 1b6f:7023
       +-09.0-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
       +-0a.0-[04]----00.0  Device 1b6f:7023
       +-11.0  ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
       +-12.0  ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
       +-12.2  ATI Technologies Inc SB700/SB800 USB EHCI Controller
       +-13.0  ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
       +-13.2  ATI Technologies Inc SB700/SB800 USB EHCI Controller
       +-14.0  ATI Technologies Inc SBx00 SMBus Controller
       +-14.1  ATI Technologies Inc SB700/SB800 IDE Controller
       +-14.2  ATI Technologies Inc SBx00 Azalia (Intel HDA)
       +-14.3  ATI Technologies Inc SB700/SB800 LPC host controller
       +-14.4-[05]--+-06.0  Intel Corporation 82557/8/9/0/1 Ethernet Pro 100
       |            \-07.0  RaLink RT2800 802.11n PCI
       +-14.5  ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
       +-15.0-[06]--
       +-16.0  ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
       +-16.2  ATI Technologies Inc SB700/SB800 USB EHCI Controller
       +-18.0  Advanced Micro Devices [AMD] Device 1600
       +-18.1  Advanced Micro Devices [AMD] Device 1601
       +-18.2  Advanced Micro Devices [AMD] Device 1602
       +-18.3  Advanced Micro Devices [AMD] Device 1603
       +-18.4  Advanced Micro Devices [AMD] Device 1604
       \-18.5  Advanced Micro Devices [AMD] Device 1605

dmesg | grep AMD-Vi
[0.610182] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
[0.610184] AMD-Vi:    mmio-addr: fec3
[0.610359] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
[0.610360] AMD-Vi:   DEV_RANGE_END           devid: 00:00.2
[0.610362] AMD-Vi:   DEV_SELECT              devid: 00:02.0 flags: 00
[0.610363] AMD-Vi:
Re: performance trouble
On Mon, Jan 23, 2012 at 09:28:37AM +0100, David Cure wrote:
> I attach the libvirt xml of my vm.

I forgot to attach it ;)

David.

Attachments: rds.xml (XML document), signature.asc (digital signature)
[PATCH 0/4] KVM: Decouple rmap_pde from lpage_info write_count
The last one is an RFC patch: I think it is better to refactor the rmap code, if needed, before architectures other than x86 start supporting large pages.

	Takuya

 arch/ia64/kvm/kvm-ia64.c            |    8
 arch/powerpc/kvm/book3s_64_mmu_hv.c |    6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |    4 ++--
 arch/x86/kvm/mmu.c                  |   24 ++--
 arch/x86/kvm/mmu_audit.c            |    4 +---
 arch/x86/kvm/x86.c                  |    4 ++--
 include/linux/kvm_host.h            |   10 --
 virt/kvm/kvm_main.c                 |   29 +
 8 files changed, 47 insertions(+), 42 deletions(-)

-- 
1.7.5.4
[PATCH 1/4] KVM: MMU: Use gfn_to_rmap() in audit_write_protection()
We want to eliminate direct access to the rmap array.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu_audit.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 6eabae3..e62fa4f 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -190,15 +190,13 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
 	u64 *spte;
 
 	if (sp->role.direct || sp->unsync || sp->role.invalid)
 		return;
 
-	slot = gfn_to_memslot(kvm, sp->gfn);
-	rmapp = &slot->rmap[sp->gfn - slot->base_gfn];
+	rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
 
 	spte = rmap_next(rmapp, NULL);
 	while (spte) {
-- 
1.7.5.4
[PATCH 2/4] KVM: MMU: Use __gfn_to_rmap() in kvm_handle_hva()
We can hide the implementation details and treat every level uniformly.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 844fcce..0e82d9d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1133,14 +1133,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 			gfn_t gfn = memslot->base_gfn + gfn_offset;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset], data);
+			ret = 0;
 
-			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
-				struct kvm_lpage_info *linfo;
+			for (j = PT_PAGE_TABLE_LEVEL;
+			     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+				unsigned long *rmapp;
 
-				linfo = lpage_info_slot(gfn, memslot,
-							PT_DIRECTORY_LEVEL + j);
-				ret |= handler(kvm, &linfo->rmap_pde, data);
+				rmapp = __gfn_to_rmap(gfn, j, memslot);
+				ret |= handler(kvm, rmapp, data);
 			}
 			trace_kvm_age_page(hva, memslot, ret);
 			retval |= ret;
-- 
1.7.5.4
[PATCH 3/4] KVM: Introduce gfn_to_index() which returns the index for a given level
We can also use this for PT_PAGE_TABLE_LEVEL to treat every level uniformly.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c       |    3 +--
 include/linux/kvm_host.h |    7 +++
 virt/kvm/kvm_main.c      |    4 +---
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0e82d9d..12f5c99 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -688,8 +688,7 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
 {
 	unsigned long idx;
 
-	idx = (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
-	      (slot->base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
+	idx = gfn_to_index(gfn, slot->base_gfn, level);
 	return &slot->lpage_info[level - 2][idx];
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eada8e6..06d4e41 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -656,6 +656,13 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
 	return gfn_to_memslot(kvm, gfn)->id;
 }
 
+static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
+{
+	/* KVM_HPAGE_GFN_SHIFT(PT_PAGE_TABLE_LEVEL) must be 0. */
+	return (gfn >> KVM_HPAGE_GFN_SHIFT(level)) -
+		(base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
+}
+
 static inline unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
 					       gfn_t gfn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9f32bff..4f2574f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -803,9 +803,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		if (new.lpage_info[i])
 			continue;
 
-		lpages = 1 + ((base_gfn + npages - 1) >>
-			     KVM_HPAGE_GFN_SHIFT(level));
-		lpages -= base_gfn >> KVM_HPAGE_GFN_SHIFT(level);
+		lpages = gfn_to_index(base_gfn + npages - 1, base_gfn, level) + 1;
 
 		new.lpage_info[i] = vzalloc(lpages * sizeof(*new.lpage_info[i]));
 
-- 
1.7.5.4
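To see why the new helper also covers the 4 KiB level and how the lpages count falls out of it, here is a standalone re-implementation of the same formula. It is a sketch under one stated assumption: KVM_HPAGE_GFN_SHIFT(level) is taken to be ((level) - 1) * 9, as on x86, so level 1 shifts by 0 (the condition the comment in gfn_to_index() relies on), level 2 by 9 and level 3 by 18. The _sketch names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumption: x86-style levels (1 = 4 KiB, 2 = 2 MiB, 3 = 1 GiB),
 * each level consuming 9 more gfn bits. */
#define SKETCH_HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Same formula as gfn_to_index() in the patch. */
static gfn_t gfn_to_index_sketch(gfn_t gfn, gfn_t base_gfn, int level)
{
	return (gfn >> SKETCH_HPAGE_GFN_SHIFT(level)) -
	       (base_gfn >> SKETCH_HPAGE_GFN_SHIFT(level));
}

/* Entries needed to cover [base_gfn, base_gfn + npages) at one level,
 * computed as in the patched __kvm_set_memory_region(). */
static gfn_t lpages_sketch(gfn_t base_gfn, gfn_t npages, int level)
{
	assert(npages > 0);
	return gfn_to_index_sketch(base_gfn + npages - 1, base_gfn, level) + 1;
}
```

A slot that starts one page below a 2 MiB boundary and spans only two pages straddles that boundary, so it needs two level-2 entries even though it holds far less than 2 MiB; that is why the count is taken from the index of the last gfn rather than from npages alone.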
[RFC PATCH 4/4] KVM: Decouple rmap_pde from lpage_info write_count
Though we have one rmap array for every level, those for large pages, called rmap_pde, are coupled with write_count information and constitute the lpage_info arrays.

To hide these implementation details, we are now using __gfn_to_rmap(), which includes a likely(level == PT_PAGE_TABLE_LEVEL) heuristic; this is not good because we know that it always fails for higher levels.

Furthermore, when we traverse rmap arrays to write-protect pages during dirty logging, the current layout reduces the locality of their elements by placing write_count next to rmap_pde in lpage_info.

This patch mitigates this problem by decoupling rmap_pde from lpage_info write_count and making the rmap array two-dimensional, so that it also holds the old rmap_pde elements.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 arch/ia64/kvm/kvm-ia64.c            |    8
 arch/powerpc/kvm/book3s_64_mmu_hv.c |    6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |    4 ++--
 arch/x86/kvm/mmu.c                  |    9 +++--
 arch/x86/kvm/x86.c                  |    4 ++--
 include/linux/kvm_host.h            |    3 +--
 virt/kvm/kvm_main.c                 |   25 -
 7 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8ca7261..b17eaa1 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1376,8 +1376,8 @@ static void kvm_release_vm_pages(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, slots) {
 		base_gfn = memslot->base_gfn;
 		for (j = 0; j < memslot->npages; j++) {
-			if (memslot->rmap[j])
-				put_page((struct page *)memslot->rmap[j]);
+			if (memslot->rmap[0][j])
+				put_page((struct page *)memslot->rmap[0][j]);
 		}
 	}
 }
@@ -1591,12 +1591,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 			kvm_set_pmt_entry(kvm, base_gfn + i,
 					pfn << PAGE_SHIFT,
 				_PAGE_AR_RWX | _PAGE_MA_WB);
-			memslot->rmap[i] = (unsigned long)pfn_to_page(pfn);
+			memslot->rmap[0][i] = (unsigned long)pfn_to_page(pfn);
 		} else {
 			kvm_set_pmt_entry(kvm, base_gfn + i,
 					GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT),
 					_PAGE_MA_UC);
-			memslot->rmap[i] = 0;
+			memslot->rmap[0][i] = 0;
 		}
 	}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 783cd35..81f9036 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -631,7 +631,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		goto out_unlock;
 	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 
-	rmap = &memslot->rmap[gfn - memslot->base_gfn];
+	rmap = &memslot->rmap[0][gfn - memslot->base_gfn];
 	lock_rmap(rmap);
 
 	/* Check if we might have been invalidated; let the guest retry if so */
@@ -693,7 +693,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset],
+			ret = handler(kvm, &memslot->rmap[0][gfn_offset],
 				      memslot->base_gfn + gfn_offset);
 			retval |= ret;
 		}
@@ -928,7 +928,7 @@ long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 	unsigned long *rmapp, *map;
 
 	preempt_disable();
-	rmapp = memslot->rmap;
+	rmapp = memslot->rmap[0];
 	map = memslot->dirty_bitmap;
 	for (i = 0; i < memslot->npages; ++i) {
 		if (kvm_test_clear_dirty(kvm, rmapp))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5f3c60b..4df9b4a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -103,7 +103,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
 		return;
 
-	rmap = real_vmalloc_addr(&memslot->rmap[gfn - memslot->base_gfn]);
+	rmap = real_vmalloc_addr(&memslot->rmap[0][gfn - memslot->base_gfn]);
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
@@ -199,7 +199,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	if (!slot_is_aligned(memslot, psize))
 		return H_PARAMETER;
 	slot_fn = gfn - memslot->base_gfn;
-	rmap = &memslot->rmap[slot_fn];
+	rmap = &memslot->rmap[0][slot_fn];
 	if
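The decoupled layout the RFC describes can be pictured as one rmap array per page-size level, with write_count moved into arrays of its own. The sketch below is hypothetical: struct slot_sketch and the _sketch functions are illustrative names, not the kernel's struct kvm_memory_slot, and it assumes three x86-style levels with the ((level) - 1) * 9 gfn shift. It shows the property the patch is after: rmap[0] plays the role of the old memslot->rmap, and every level is looked up the same way, with no likely() special case for 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t gfn_t;

#define SKETCH_NR_PAGE_SIZES 3 /* assumption: 4 KiB, 2 MiB, 1 GiB */
#define SKETCH_HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Hypothetical slot: rmap[0] is the old 4 KiB array, rmap[1..] hold what
 * used to be lpage_info[].rmap_pde, and write_count lives apart from them. */
struct slot_sketch {
	gfn_t base_gfn;
	unsigned long npages;
	unsigned long *rmap[SKETCH_NR_PAGE_SIZES];
	int *write_count[SKETCH_NR_PAGE_SIZES - 1];
};

static unsigned long index_at(const struct slot_sketch *s, gfn_t gfn, int level)
{
	return (unsigned long)((gfn >> SKETCH_HPAGE_GFN_SHIFT(level)) -
			       (s->base_gfn >> SKETCH_HPAGE_GFN_SHIFT(level)));
}

static void slot_sketch_free(struct slot_sketch *s)
{
	for (int i = 0; i < SKETCH_NR_PAGE_SIZES; i++) {
		free(s->rmap[i]);
		s->rmap[i] = NULL;
	}
	for (int i = 0; i < SKETCH_NR_PAGE_SIZES - 1; i++) {
		free(s->write_count[i]);
		s->write_count[i] = NULL;
	}
}

/* s must be zero-initialised; on failure everything allocated is freed. */
static int slot_sketch_init(struct slot_sketch *s, gfn_t base_gfn,
			    unsigned long npages)
{
	assert(npages > 0);
	s->base_gfn = base_gfn;
	s->npages = npages;
	for (int i = 0; i < SKETCH_NR_PAGE_SIZES; i++) {
		unsigned long n = index_at(s, base_gfn + npages - 1, i + 1) + 1;

		s->rmap[i] = calloc(n, sizeof(*s->rmap[i]));
		if (i > 0)
			s->write_count[i - 1] = calloc(n, sizeof(*s->write_count[i - 1]));
		if (!s->rmap[i] || (i > 0 && !s->write_count[i - 1])) {
			slot_sketch_free(s);
			return -1;
		}
	}
	return 0;
}

/* Uniform lookup for every level (levels are 1-based, arrays 0-based). */
static unsigned long *gfn_to_rmap_sketch(struct slot_sketch *s, gfn_t gfn, int level)
{
	assert(level >= 1 && level <= SKETCH_NR_PAGE_SIZES);
	assert(gfn >= s->base_gfn && gfn < s->base_gfn + s->npages);
	return &s->rmap[level - 1][index_at(s, gfn, level)];
}
```

Keeping the rmap entries of one level contiguous is what restores locality when dirty logging walks them; in this layout the write_count values no longer sit between them.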
[Bug 42636] PCI passthrough does not work with AMD iommu for PCI device
https://bugzilla.kernel.org/show_bug.cgi?id=42636

Alex Williamson <alex.william...@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alex.william...@redhat.com

--- Comment #1 from Alex Williamson <alex.william...@redhat.com>  2012-01-23 14:56:55 ---
> +-14.4-[05]--+-06.0  Intel Corporation 82557/8/9/0/1 Ethernet Pro100
> |            \-07.0  RaLink RT2800 802.11n PCI

> [0.610382] AMD-Vi:   DEV_SELECT        devid: 00:14.4 flags:00
> [0.610384] AMD-Vi:   DEV_ALIAS_RANGE   devid: 05:00.0 flags:00 devid_to: 00:14.4
> [0.610385] AMD-Vi:   DEV_RANGE_END     devid: 05:1f.7

The devices are behind a PCIe-to-PCI bridge (00:14.4), so both devices get aliased to the same device. You'll need to either add both devices to the guest or sequester the other device by binding it to pci-stub.
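The AMD-Vi lines quoted above can be read as a lookup table: every devid from 05:00.0 through 05:1f.7 is replaced by the bridge's devid 00:14.4 before the IOMMU picks a translation domain. The sketch below models only that lookup; struct alias_range and effective_devid() are hypothetical names, while the devid packing (bus in bits 15:8, device in 7:3, function in 2:0) is the standard PCI bus/device/function encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI bus/device/function packing: bus[15:8] dev[7:3] fn[2:0]. */
static uint16_t devid(unsigned bus, unsigned dev, unsigned fn)
{
	assert(bus < 256 && dev < 32 && fn < 8);
	return (uint16_t)(bus << 8 | dev << 3 | fn);
}

/* Hypothetical model of one DEV_ALIAS_RANGE ... DEV_RANGE_END entry. */
struct alias_range {
	uint16_t first;    /* DEV_ALIAS_RANGE devid */
	uint16_t last;     /* DEV_RANGE_END devid */
	uint16_t alias_to; /* devid_to */
};

/* Requester ID the IOMMU actually sees for a device. */
static uint16_t effective_devid(const struct alias_range *r, uint16_t id)
{
	if (id >= r->first && id <= r->last)
		return r->alias_to;
	return id;
}
```

With the range from this report, 05:06.0 and 05:07.0 both resolve to 00:14.4 (devid 0x00a4), so the IOMMU cannot tell them apart and they can only be put into one domain together. That is why the assignment fails with "Device or resource busy" while the other NIC is still in use, and why assigning both devices or binding the second one to pci-stub is needed.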
[Bug 42635] PCIe passthrough broken with AMD iommu after s2disk / resume
https://bugzilla.kernel.org/show_bug.cgi?id=42635

Alex Williamson <alex.william...@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alex.william...@redhat.com

--- Comment #1 from Alex Williamson <alex.william...@redhat.com>  2012-01-23 15:01:29 ---
Between the first and second paragraphs, you seem to be saying that the device never works in the guest on the second invocation of the guest. Is that true? What is the device?
[PATCH kvm-unit-tests 0/4] VM86 testcase and run_tests.sh
This adds a test case for task switches into/out of VM86. This test case currently fails on KVM; it passes with TCG. I'll send out KVM fixes together with this series.

I also included a small shell script that just runs the tests and prints a PASS/FAIL message for each. I've been using this script locally for a while, but maybe someone else finds it handy, too.

Kevin Wolf (4):
  Add run_tests.sh
  Add taskswitch testcases to unittest.cfg
  Fix i386 build
  x86/taskswitch_vm86: Task switches into/out of VM86

 config-i386.mak       |    3 +-
 lib/x86/desc.c        |   39 +-
 lib/x86/desc.h        |   36
 lib/x86/vm.c          |    4 +-
 lib/x86/vm.h          |    1 +
 run_tests.sh          |  107 +
 x86/taskswitch_vm86.c |   59 +++
 x86/unittests.cfg     |   18
 8 files changed, 227 insertions(+), 40 deletions(-)
 create mode 100755 run_tests.sh
 create mode 100644 x86/taskswitch_vm86.c

-- 
1.7.6.5
[PATCH kvm-unit-tests 1/4] Add run_tests.sh
This adds a convenient way to run all tests without having to set up Autotest.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 run_tests.sh |  107 ++
 1 files changed, 107 insertions(+), 0 deletions(-)
 create mode 100755 run_tests.sh

diff --git a/run_tests.sh b/run_tests.sh
new file mode 100755
index 0000000..8d152b0
--- /dev/null
+++ b/run_tests.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+
+testroot=x86
+config=$testroot/unittests.cfg
+qemu=${qemu:-qemu-system-x86_64}
+verbose=0
+
+function run()
+{
+    local testname="$1"
+    local groups="$2"
+    local smp="$3"
+    local kernel="$4"
+    local opts="$5"
+
+    if [ -z "$testname" ]; then
+        return
+    fi
+
+    if [ -n "$only_group" ] && ! grep -q "$only_group" <<<$groups; then
+        return
+    fi
+
+    cmdline="$qemu -display none -enable-kvm -device testdev,chardev=testlog -chardev stdio,id=testlog -kernel $kernel -smp $smp $opts"
+    if [ $verbose != 0 ]; then
+        echo $cmdline
+    fi
+
+    # extra_params in the config file may contain backticks that need to be
+    # expanded, so use eval to start qemu
+    eval $cmdline >> test.log
+
+    if [ $? == 0 ]; then
+        echo PASS $1
+    else
+        echo FAIL $1
+    fi
+}
+
+function run_all()
+{
+    local config="$1"
+    local testname
+    local smp
+    local kernel
+    local opts
+    local groups
+
+    exec {config_fd}<$config
+
+    while read -u $config_fd line; do
+        if [[ "$line" =~ ^\[(.*)\]$ ]]; then
+            run "$testname" "$groups" "$smp" "$kernel" "$opts"
+            testname=${BASH_REMATCH[1]}
+            smp=1
+            kernel=""
+            opts=""
+            groups=""
+        elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
+            kernel=$testroot/${BASH_REMATCH[1]}
+        elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
+            smp=${BASH_REMATCH[1]}
+        elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
+            opts=${BASH_REMATCH[1]}
+        elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
+            groups=${BASH_REMATCH[1]}
+        fi
+    done
+
+    run "$testname" "$groups" "$smp" "$kernel" "$opts"
+
+    exec {config_fd}<&-
+}
+
+function usage()
+{
+cat <<EOF
+
+Usage: $0 [-g group] [-h] [-v]
+
+    -g: Only execute tests in the given group
+    -h: Output this help text
+    -v: Enables verbose mode
+
+EOF
+}
+
+echo > test.log
+while getopts g:hv opt; do
+    case $opt in
+        g)
+            only_group=$OPTARG
+            ;;
+        h)
+            usage
+            exit
+            ;;
+        v)
+            verbose=1
+            ;;
+        *)
+            exit
+            ;;
+    esac
+done
+
+run_all $config
-- 
1.7.6.5
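The run_all() loop above recognises two kinds of lines in unittests.cfg: a "[name]" section header, which flushes the previous test and starts a new one, and "key = value" lines for the four keys file, smp, extra_params and groups, with optional spaces around the "=". The C sketch below mirrors only that line classification, which can help when porting the format to another runner; parse_cfg_line() and copy_span() are hypothetical names, and unlike bash's read it does not strip leading or trailing whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum cfg_line { CFG_OTHER, CFG_SECTION, CFG_KEY };

/* Copy [src, src + len) into dst (capacity cap), truncating if needed. */
static void copy_span(char *dst, size_t cap, const char *src, size_t len)
{
	assert(cap > 0);
	if (len >= cap)
		len = cap - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Classify one line like the regexes in run_all():
 *   ^\[(.*)\]$        -> CFG_SECTION, test name copied to key
 *   ^<key>\ *=\ *(.*)$ -> CFG_KEY, key and value copied out
 * Anything else (comments, blank lines, unknown keys) is CFG_OTHER.
 */
static enum cfg_line parse_cfg_line(const char *line, char *key, char *val,
				    size_t cap)
{
	static const char *const keys[] = { "file", "smp", "extra_params", "groups" };
	size_t n = strlen(line);

	if (n >= 2 && line[0] == '[' && line[n - 1] == ']') {
		copy_span(key, cap, line + 1, n - 2);
		val[0] = '\0';
		return CFG_SECTION;
	}
	for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
		size_t klen = strlen(keys[i]);
		const char *p;

		if (strncmp(line, keys[i], klen) != 0)
			continue;
		p = line + klen;
		while (*p == ' ')	/* "\ *" before '=' */
			p++;
		if (*p != '=')
			continue;
		p++;
		while (*p == ' ')	/* "\ *" after '=' */
			p++;
		copy_span(key, cap, keys[i], klen);
		copy_span(val, cap, p, strlen(p));
		return CFG_KEY;
	}
	return CFG_OTHER;
}
```

As in the script, a test is only complete once the next header (or end of file) is seen, so a caller would accumulate CFG_KEY values and launch the previous test on each CFG_SECTION.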
[PATCH kvm-unit-tests 2/4] Add taskswitch testcases to unittest.cfg
Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 x86/unittests.cfg |   12
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 065020a..dac7d44 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -64,6 +64,18 @@ file = svm.flat
 smp = 2
 extra_params = -cpu qemu64,-svm
 
+[taskswitch]
+file = taskswitch.flat
+smp = 2
+extra_params = -cpu qemu64,-svm
+groups = task
+
+[taskswitch2]
+file = taskswitch2.flat
+smp = 2
+extra_params = -cpu qemu64,-svm
+groups = task
+
 [kvmclock_test]
 file = kvmclock_test.flat
 smp = 2
-- 
1.7.6.5
[PATCH kvm-unit-tests 3/4] Fix i386 build
Commit 1d946e07 removed idt, but left a reference to idt in i386-only code.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 lib/x86/desc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index c268955..770c250 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -329,7 +329,7 @@ void setup_gdt(void)
 
 static void set_idt_task_gate(int vec, u16 sel)
 {
-    idt_entry_t *e = &idt[vec];
+    idt_entry_t *e = &boot_idt[vec];
 
     memset(e, 0, sizeof *e);
 
-- 
1.7.6.5
[PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86
This adds a test case that jumps into VM86 by iret-ing to a TSS and back to Protected Mode using a task gate in the IDT.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 config-i386.mak       |    3 +-
 lib/x86/desc.c        |   37 +-
 lib/x86/desc.h        |   36 +
 lib/x86/vm.c          |    4 +-
 lib/x86/vm.h          |    1 +
 x86/taskswitch_vm86.c |   59 +
 x86/unittests.cfg     |    6 +
 7 files changed, 107 insertions(+), 39 deletions(-)
 create mode 100644 x86/taskswitch_vm86.c

diff --git a/config-i386.mak b/config-i386.mak
index de52f3d..b5c3b9c 100644
--- a/config-i386.mak
+++ b/config-i386.mak
@@ -5,9 +5,10 @@ ldarch = elf32-i386
 CFLAGS += -D__i386__
 CFLAGS += -I $(KERNELDIR)/include
 
-tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
+tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat $(TEST_DIR)/taskswitch_vm86.flat
 
 include config-x86-common.mak
 
 $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
 $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
+$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 770c250..c4a3607 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -27,41 +27,6 @@ typedef struct {
 	u8 base_high;
 } gdt_entry_t;
 
-typedef struct {
-	u16 prev;
-	u16 res1;
-	u32 esp0;
-	u16 ss0;
-	u16 res2;
-	u32 esp1;
-	u16 ss1;
-	u16 res3;
-	u32 esp2;
-	u16 ss2;
-	u16 res4;
-	u32 cr3;
-	u32 eip;
-	u32 eflags;
-	u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
-	u16 es;
-	u16 res5;
-	u16 cs;
-	u16 res6;
-	u16 ss;
-	u16 res7;
-	u16 ds;
-	u16 res8;
-	u16 fs;
-	u16 res9;
-	u16 gs;
-	u16 res10;
-	u16 ldt;
-	u16 res11;
-	u16 t:1;
-	u16 res12:15;
-	u16 iomap_base;
-} tss32_t;
-
 extern idt_entry_t boot_idt[256];
 
 void set_idt_entry(int vec, void *addr, int dpl)
@@ -327,7 +292,7 @@ void setup_gdt(void)
 	     ".Lflush2: "::"r"(0x10));
 }
 
-static void set_idt_task_gate(int vec, u16 sel)
+void set_idt_task_gate(int vec, u16 sel)
 {
     idt_entry_t *e = &boot_idt[vec];
 
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 0b4897c..f819452 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -24,6 +24,41 @@ struct ex_regs {
     unsigned long rflags;
 };
 
+typedef struct {
+	u16 prev;
+	u16 res1;
+	u32 esp0;
+	u16 ss0;
+	u16 res2;
+	u32 esp1;
+	u16 ss1;
+	u16 res3;
+	u32 esp2;
+	u16 ss2;
+	u16 res4;
+	u32 cr3;
+	u32 eip;
+	u32 eflags;
+	u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
+	u16 es;
+	u16 res5;
+	u16 cs;
+	u16 res6;
+	u16 ss;
+	u16 res7;
+	u16 ds;
+	u16 res8;
+	u16 fs;
+	u16 res9;
+	u16 gs;
+	u16 res10;
+	u16 ldt;
+	u16 res11;
+	u16 t:1;
+	u16 res12:15;
+	u16 iomap_base;
+} tss32_t;
+
 #define ASM_TRY(catch)                  \
     "movl $0, %%gs:4 \n\t"              \
     ".pushsection .data.ex \n\t"        \
@@ -44,6 +79,7 @@ unsigned exception_error_code(void);
 void set_idt_entry(int vec, void *addr, int dpl);
 void set_idt_sel(int vec, u16 sel);
 void set_gdt_entry(int num, u32 base, u32 limit, u8 access, u8 gran);
+void set_idt_task_gate(int vec, u16 sel);
 void set_intr_task_gate(int e, void *fn);
 void print_current_tss_info(void);
 void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
diff --git a/lib/x86/vm.c b/lib/x86/vm.c
index abbb0c9..aae044a 100644
--- a/lib/x86/vm.c
+++ b/lib/x86/vm.c
@@ -108,14 +108,14 @@ void install_large_page(unsigned long *cr3,
                         unsigned long phys,
                         void *virt)
 {
-    install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
+    install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | PTE_PSE, 0);
 }
 
 void install_page(unsigned long *cr3,
                   unsigned long phys,
                   void *virt)
 {
-    install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
+    install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
 }
 
diff --git a/lib/x86/vm.h b/lib/x86/vm.h
index bf8fd52..aebc5c3 100644
--- a/lib/x86/vm.h
+++ b/lib/x86/vm.h
@@ -13,6 +13,7 @@
 #define PTE_PRESENT (1ull << 0)
 #define PTE_PSE     (1ull << 7)
 #define PTE_WRITE   (1ull << 1)
+#define PTE_USER    (1ull << 2)
 #define PTE_ADDR    (0xff000ull)
 
 void setup_vm();
diff --git a/x86/taskswitch_vm86.c b/x86/taskswitch_vm86.c
new file mode 100644
index 0000000..363cb00
--- /dev/null
+++ b/x86/taskswitch_vm86.c
@@ -0,0 +1,59 @@
+#include "libcflat.h"
+#include "desc.h"
+#include
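Because tss32_t now lives in a header and is handed to the CPU as a real task-state segment, its layout has to match the hardware exactly. A cheap way to pin that down is a set of compile-time offset checks. The sketch below copies the fields of tss32_t into a standalone type (tss32_sketch_t, a hypothetical name) and checks it against the 32-bit TSS layout documented in the Intel SDM (CR3 at 0x1C, EIP at 0x20, EFLAGS at 0x24, ES at 0x48, LDT selector at 0x60, I/O map base at 0x66, 104 bytes in total). The t/res12 line relies on GCC/Clang packing the two 16-bit bitfields into a single 16-bit unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Field-for-field copy of tss32_t from lib/x86/desc.h above. */
typedef struct {
	u16 prev, res1;
	u32 esp0;
	u16 ss0, res2;
	u32 esp1;
	u16 ss1, res3;
	u32 esp2;
	u16 ss2, res4;
	u32 cr3;
	u32 eip;
	u32 eflags;
	u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
	u16 es, res5;
	u16 cs, res6;
	u16 ss, res7;
	u16 ds, res8;
	u16 fs, res9;
	u16 gs, res10;
	u16 ldt, res11;
	u16 t:1, res12:15; /* debug trap bit + reserved, one 16-bit unit */
	u16 iomap_base;
} tss32_sketch_t;

/* Offsets of the 32-bit TSS as documented in the Intel SDM. */
static_assert(offsetof(tss32_sketch_t, cr3) == 0x1c, "CR3");
static_assert(offsetof(tss32_sketch_t, eip) == 0x20, "EIP");
static_assert(offsetof(tss32_sketch_t, eflags) == 0x24, "EFLAGS");
static_assert(offsetof(tss32_sketch_t, es) == 0x48, "ES");
static_assert(offsetof(tss32_sketch_t, ldt) == 0x60, "LDT selector");
static_assert(offsetof(tss32_sketch_t, iomap_base) == 0x66, "I/O map base");
static_assert(sizeof(tss32_sketch_t) == 104, "32-bit TSS is 104 bytes");
```

The EFLAGS slot at 0x24 matters for this test in particular: the task switch loads EFLAGS, including the VM bit, from that field of the incoming TSS, and that is how the iret lands in VM86.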
[PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks
Currently, all task switches check privileges against the DPL of the TSS. This is only correct for jmp/call to a TSS. If a task gate is used, the DPL of this task gate is used for the check instead. Exceptions, external interrupts and iret shouldn't perform any check.

This patch fixes the problem for VMX. For SVM, the logic used to determine the source of the task switch is buggy, so we can't pass useful information to the emulator there and just disable the check in all cases.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |    2 +-
 arch/x86/include/asm/kvm_host.h    |    4 +-
 arch/x86/kvm/emulate.c             |   51 +++-
 arch/x86/kvm/svm.c                 |    3 +-
 arch/x86/kvm/vmx.c                 |    5 ++-
 arch/x86/kvm/x86.c                 |    6 ++--
 6 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index ab4092e..c8a9cf3 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt *ctxt);
 #define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
-			 u16 tss_selector, int reason,
+			 u16 tss_selector, int idt_index, int reason,
 			 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52d6640..0533fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
-		    bool has_error_code, u32 error_code);
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+		    int reason, bool has_error_code, u32 error_code);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..1b98a2c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
 	return 1;
 }
 
+static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
+				     u16 index, struct kvm_desc_struct *desc)
+{
+	struct kvm_desc_ptr dt;
+	ulong addr;
+
+	ctxt->ops->get_idt(ctxt, &dt);
+
+	if (dt.size < index * 8 + 7)
+		return emulate_gp(ctxt, index << 3 | 0x2);
+
+	addr = dt.address + index * 8;
+	return ctxt->ops->read_std(ctxt, addr, desc, sizeof *desc,
+				   &ctxt->exception);
+}
+
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
 				     u16 selector, struct desc_ptr *dt)
 {
@@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 }
 
 static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
-				   u16 tss_selector, int reason,
+				   u16 tss_selector, int idt_index, int reason,
 				   bool has_error_code, u32 error_code)
 {
 	struct x86_emulate_ops *ops = ctxt->ops;
@@ -2360,6 +2376,7 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
 	ulong old_tss_base =
 		ops->get_cached_segment_base(ctxt, VCPU_SREG_TR);
 	u32 desc_limit;
+	int dpl;
 
 	/* FIXME: old_tss_base == ~0 ? */
 
@@ -2372,12 +2389,32 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
 
 	/* FIXME: check that next_tss_desc is tss */
 
-	if (reason != TASK_SWITCH_IRET) {
-		if ((tss_selector & 3) > next_tss_desc.dpl ||
-		    ops->cpl(ctxt) > next_tss_desc.dpl)
-			return emulate_gp(ctxt, 0);
+	/*
+	 * Check privileges. The three cases are task switch caused by...
+	 *
+	 * 1. Software interrupt: Check against DPL of the task gate
+	 * 2. Exception/IRQ/iret: No check is performed
+	 * 3. jmp/call: Check agains DPL of the TSS
+	 */
+	dpl = -1;
+	if (reason == TASK_SWITCH_GATE) {
+		if (idt_index != -1) {
+			struct kvm_desc_struct task_gate_desc;
+
+			ret = read_interrupt_descriptor(ctxt, idt_index,
+							&task_gate_desc);
+			if
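The three cases in the new comment amount to a small decision table: pick which DPL (if any) the switch is checked against, then apply the same RPL/CPL comparison the removed code used. The sketch below models that table with hypothetical names (enum ts_reason and the helper functions are not KVM's). The hunk is cut off before the gate check itself, so applying the old "(selector & 3) > dpl || cpl > dpl" comparison to the task-gate DPL is an assumption here. The last two helpers reproduce the IDT limit check and #GP error code from read_interrupt_descriptor().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the emulator's task-switch reasons. */
enum ts_reason { TS_CALL, TS_JMP, TS_IRET, TS_GATE };

/*
 * DPL to check against, or -1 for "no check". idt_index is -1 when a gate
 * switch came from an exception or external interrupt rather than INT n.
 */
static int task_switch_check_dpl(enum ts_reason reason, int idt_index,
				 int task_gate_dpl, int tss_dpl)
{
	assert(task_gate_dpl >= 0 && task_gate_dpl <= 3);
	assert(tss_dpl >= 0 && tss_dpl <= 3);
	if (reason == TS_GATE)
		return idt_index != -1 ? task_gate_dpl : -1; /* case 1 / case 2 */
	if (reason == TS_IRET)
		return -1;                                   /* case 2 */
	return tss_dpl;                                      /* case 3: jmp/call */
}

/* true = switch allowed, false = #GP; same comparison as the removed code. */
static bool task_switch_allowed(int dpl, unsigned int tss_selector, int cpl)
{
	if (dpl < 0)
		return true;
	return (int)(tss_selector & 3) <= dpl && cpl <= dpl;
}

/* IDT limit check from read_interrupt_descriptor(). */
static bool idt_entry_in_limit(unsigned int idt_size, unsigned int index)
{
	return !(idt_size < index * 8 + 7);
}

/* #GP error code for an out-of-limit IDT entry: index, with the IDT bit set. */
static unsigned int idt_limit_error_code(unsigned int index)
{
	return index << 3 | 0x2;
}
```

The case the old code got wrong is a user-mode INT n through a DPL-3 task gate that points at a DPL-0 TSS. Checking against the TSS raised #GP, but with the gate's DPL the switch is allowed.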
[PATCH 2/3] KVM: x86 emulator: VM86 segments must have DPL 3
Setting the segment DPL to 0 for at least the VM86 code segment makes the VM entry fail on VMX.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 arch/x86/kvm/emulate.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1b98a2c..833969e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1243,6 +1243,8 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 		seg_desc.type = 3;
 		seg_desc.p = 1;
 		seg_desc.s = 1;
+		if (ctxt->mode == X86EMUL_MODE_VM86)
+			seg_desc.dpl = 3;
 		goto load;
 	}
 
-- 
1.7.6.5
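In real mode and VM86 a selector is not an index into a descriptor table; the segment is synthesised from the selector value. The sketch below builds such a segment with the fields set in the hunk (type 3, present, s = 1) plus the DPL the patch adds. It assumes the lines above the hunk set base = selector << 4 and limit = 0xffff, the usual real-mode/VM86 values; struct seg_sketch and load_rm_or_vm86_segment() are hypothetical names, not KVM's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a cached segment descriptor. */
struct seg_sketch {
	uint32_t base;
	uint32_t limit;
	unsigned type : 4, s : 1, dpl : 2, p : 1;
};

/*
 * Real-mode/VM86 style segment load. Assumption: base = selector << 4 and
 * limit = 0xffff. type 3 = read/write data, accessed; s = 1 = code/data
 * segment; p = 1 = present. With the patch, VM86 segments get DPL 3, which
 * VMX requires, instead of DPL 0.
 */
static struct seg_sketch load_rm_or_vm86_segment(uint16_t selector, int vm86)
{
	struct seg_sketch d = {0};

	assert(vm86 == 0 || vm86 == 1);
	d.base = (uint32_t)selector << 4;
	d.limit = 0xffff;
	d.type = 3;
	d.s = 1;
	d.p = 1;
	d.dpl = vm86 ? 3 : 0;
	return d;
}
```

This also matches the architecture: code running in VM86 always executes at CPL 3, so a DPL-3 segment is the only consistent choice.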
[PATCH 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch
Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly and privilege checks
succeed.

Signed-off-by: Kevin Wolf <kw...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |   17 +
 arch/x86/kvm/x86.c                 |    6 ++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c8a9cf3..4a21c7d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,7 @@ struct x86_emulate_ops {
 	void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
 	ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
 	int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+	void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
 	int (*cpl)(struct x86_emulate_ctxt *ctxt);
 	int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
 	int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 833969e..52fce89 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2273,6 +2273,23 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
 		return emulate_gp(ctxt, 0);
 	ctxt->_eip = tss->eip;
 	ctxt->eflags = tss->eflags | 2;
+
+	/*
+	 * If we're switching between Protected Mode and VM86, we need to make
+	 * sure to update the mode before loading the segment descriptors so
+	 * that the selectors are interpreted correctly.
+	 *
+	 * Need to get it to the vcpu struct immediately because it influences
+	 * the CPL which is checked when loading the segment descriptors.
+	 */
+	if (ctxt->eflags & X86_EFLAGS_VM)
+		ctxt->mode = X86EMUL_MODE_VM86;
+	else
+		ctxt->mode = X86EMUL_MODE_PROT32;
+
+	ctxt->ops->set_rflags(ctxt, ctxt->eflags);
+
+	/* General purpose registers */
 	ctxt->regs[VCPU_REGS_RAX] = tss->eax;
 	ctxt->regs[VCPU_REGS_RCX] = tss->ecx;
 	ctxt->regs[VCPU_REGS_RDX] = tss->edx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc3e945..502b5c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val)
 	return res;
 }
 
+static void emulator_set_rflags(struct x86_emulate_ctxt *ctxt, ulong val)
+{
+	kvm_set_rflags(emul_to_vcpu(ctxt), val);
+}
+
 static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
 {
 	return kvm_x86_ops->get_cpl(emul_to_vcpu(ctxt));
@@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
 	.set_idt = emulator_set_idt,
 	.get_cr = emulator_get_cr,
 	.set_cr = emulator_set_cr,
+	.set_rflags = emulator_set_rflags,
 	.cpl = emulator_get_cpl,
 	.get_dr = emulator_get_dr,
 	.set_dr = emulator_set_dr,
-- 
1.7.6.5
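The mode selection added to `load_state_from_tss32()` depends only on EFLAGS.VM (bit 17) of the incoming TSS image, with bit 1 forced on as in `tss->eflags | 2`. A hedged sketch with hypothetical names (`load_eflags_from_tss()`, `enum emul_mode`) that mirrors that ordering, deriving the mode before any segment register would be loaded:

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_FIXED_BIT1 0x00000002u  /* bit 1 of EFLAGS always reads as 1 */
#define EFLAGS_VM         0x00020000u  /* bit 17: virtual-8086 mode */

enum emul_mode { EMUL_PROT32, EMUL_VM86 };

struct tss_switch_result {
    uint32_t eflags;
    enum emul_mode mode;
};

/* Take EFLAGS from a 32-bit TSS image and pick the emulation mode from
 * the VM bit, before any selector is interpreted. */
static struct tss_switch_result load_eflags_from_tss(uint32_t tss_eflags)
{
    struct tss_switch_result r;

    r.eflags = tss_eflags | EFLAGS_FIXED_BIT1;
    r.mode = (r.eflags & EFLAGS_VM) ? EMUL_VM86 : EMUL_PROT32;
    assert(r.eflags & EFLAGS_FIXED_BIT1);
    return r;
}
```

Ordering is the whole point: if the mode were updated only after the segment loads, a VM86 selector such as 0x1234 would be looked up in the GDT as a protected-mode selector instead of being turned into base 0x12340.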
Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86
On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
> to Protected Mode using a task gate in the IDT.
>
Can you add the test case to taskswitch2.c?

> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
>  config-i386.mak       |    3 +-
>  lib/x86/desc.c        |   37 +-
>  lib/x86/desc.h        |   36 +
>  lib/x86/vm.c          |    4 +-
>  lib/x86/vm.h          |    1 +
>  x86/taskswitch_vm86.c |   59 +
>  x86/unittests.cfg     |    6 +
>  7 files changed, 107 insertions(+), 39 deletions(-)
>  create mode 100644 x86/taskswitch_vm86.c
>
> diff --git a/config-i386.mak b/config-i386.mak
> index de52f3d..b5c3b9c 100644
> --- a/config-i386.mak
> +++ b/config-i386.mak
> @@ -5,9 +5,10 @@ ldarch = elf32-i386
>  CFLAGS += -D__i386__
>  CFLAGS += -I $(KERNELDIR)/include
>  
> -tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
> +tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat $(TEST_DIR)/taskswitch_vm86.flat
>  
>  include config-x86-common.mak
>  
>  $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
>  $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
> +$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
> diff --git a/lib/x86/desc.c b/lib/x86/desc.c
> index 770c250..c4a3607 100644
> --- a/lib/x86/desc.c
> +++ b/lib/x86/desc.c
> @@ -27,41 +27,6 @@ typedef struct {
>      u8 base_high;
>  } gdt_entry_t;
>  
> -typedef struct {
> -    u16 prev;
> -    u16 res1;
> -    u32 esp0;
> -    u16 ss0;
> -    u16 res2;
> -    u32 esp1;
> -    u16 ss1;
> -    u16 res3;
> -    u32 esp2;
> -    u16 ss2;
> -    u16 res4;
> -    u32 cr3;
> -    u32 eip;
> -    u32 eflags;
> -    u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
> -    u16 es;
> -    u16 res5;
> -    u16 cs;
> -    u16 res6;
> -    u16 ss;
> -    u16 res7;
> -    u16 ds;
> -    u16 res8;
> -    u16 fs;
> -    u16 res9;
> -    u16 gs;
> -    u16 res10;
> -    u16 ldt;
> -    u16 res11;
> -    u16 t:1;
> -    u16 res12:15;
> -    u16 iomap_base;
> -} tss32_t;
> -
>  extern idt_entry_t boot_idt[256];
>  
>  void set_idt_entry(int vec, void *addr, int dpl)
> @@ -327,7 +292,7 @@ void setup_gdt(void)
>           ".Lflush2: "::"r"(0x10));
>  }
>  
> -static void set_idt_task_gate(int vec, u16 sel)
> +void set_idt_task_gate(int vec, u16 sel)
>  {
>      idt_entry_t *e = &boot_idt[vec];
>  
> diff --git a/lib/x86/desc.h b/lib/x86/desc.h
> index 0b4897c..f819452 100644
> --- a/lib/x86/desc.h
> +++ b/lib/x86/desc.h
> @@ -24,6 +24,41 @@ struct ex_regs {
>      unsigned long rflags;
>  };
>  
> +typedef struct {
> +    u16 prev;
> +    u16 res1;
> +    u32 esp0;
> +    u16 ss0;
> +    u16 res2;
> +    u32 esp1;
> +    u16 ss1;
> +    u16 res3;
> +    u32 esp2;
> +    u16 ss2;
> +    u16 res4;
> +    u32 cr3;
> +    u32 eip;
> +    u32 eflags;
> +    u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
> +    u16 es;
> +    u16 res5;
> +    u16 cs;
> +    u16 res6;
> +    u16 ss;
> +    u16 res7;
> +    u16 ds;
> +    u16 res8;
> +    u16 fs;
> +    u16 res9;
> +    u16 gs;
> +    u16 res10;
> +    u16 ldt;
> +    u16 res11;
> +    u16 t:1;
> +    u16 res12:15;
> +    u16 iomap_base;
> +} tss32_t;
> +
>  #define ASM_TRY(catch) \
>      "movl $0, %%gs:4 \n\t" \
>      ".pushsection .data.ex \n\t"\
> @@ -44,6 +79,7 @@ unsigned exception_error_code(void);
>  void set_idt_entry(int vec, void *addr, int dpl);
>  void set_idt_sel(int vec, u16 sel);
>  void set_gdt_entry(int num, u32 base, u32 limit, u8 access, u8 gran);
> +void set_idt_task_gate(int vec, u16 sel);
>  void set_intr_task_gate(int e, void *fn);
>  void print_current_tss_info(void);
>  void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
> diff --git a/lib/x86/vm.c b/lib/x86/vm.c
> index abbb0c9..aae044a 100644
> --- a/lib/x86/vm.c
> +++ b/lib/x86/vm.c
> @@ -108,14 +108,14 @@ void install_large_page(unsigned long *cr3,
>                          unsigned long phys,
>                          void *virt)
>  {
> -    install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
> +    install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | PTE_PSE, 0);
>  }
>  
>  void install_page(unsigned long *cr3,
>                    unsigned long phys,
>                    void *virt)
>  {
> -    install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
> +    install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
>  }
> diff --git a/lib/x86/vm.h b/lib/x86/vm.h
> index bf8fd52..aebc5c3 100644
> --- a/lib/x86/vm.h
> +++ b/lib/x86/vm.h
> @@ -13,6 +13,7 @@
>  #define PTE_PRESENT (1ull << 0)
>  #define PTE_PSE     (1ull << 7)
>  #define PTE_WRITE   (1ull << 1)
> +#define PTE_USER    (1ull << 2)
>  #define PTE_ADDR    (0xff000ull)
>  
>  void setup_vm();
> diff --git a/x86/taskswitch_vm86.c
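The patch makes `tss32_t` public so the new test can build its own TSS images, and adds `PTE_USER` because code running in VM86 executes at CPL 3 and so can only touch user-accessible pages. A compile-time check of the struct against the 32-bit TSS format is a cheap safeguard when a layout like this moves between files. This sketch assumes GCC or Clang on x86 (bit-field packing is implementation-defined), and `test_pte_flags()` is a hypothetical helper, not part of kvm-unit-tests:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Same field list as the tss32_t moved into lib/x86/desc.h. */
typedef struct {
    u16 prev;  u16 res1;
    u32 esp0;
    u16 ss0;   u16 res2;
    u32 esp1;
    u16 ss1;   u16 res3;
    u32 esp2;
    u16 ss2;   u16 res4;
    u32 cr3;
    u32 eip;
    u32 eflags;
    u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
    u16 es;    u16 res5;
    u16 cs;    u16 res6;
    u16 ss;    u16 res7;
    u16 ds;    u16 res8;
    u16 fs;    u16 res9;
    u16 gs;    u16 res10;
    u16 ldt;   u16 res11;
    u16 t:1;   u16 res12:15;
    u16 iomap_base;
} tss32_t;

/* Field offsets of the 32-bit TSS format, checked at compile time. */
static_assert(offsetof(tss32_t, cr3) == 0x1c, "CR3 at 0x1c");
static_assert(offsetof(tss32_t, eip) == 0x20, "EIP at 0x20");
static_assert(offsetof(tss32_t, eflags) == 0x24, "EFLAGS at 0x24");
static_assert(offsetof(tss32_t, es) == 0x48, "ES at 0x48");
static_assert(offsetof(tss32_t, ldt) == 0x60, "LDT at 0x60");
static_assert(offsetof(tss32_t, iomap_base) == 0x66, "I/O map base at 0x66");
static_assert(sizeof(tss32_t) == 0x68, "104-byte TSS");

#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1)
#define PTE_USER    (1ull << 2)
#define PTE_PSE     (1ull << 7)

/* Flags that the patched install_page()/install_large_page() now pass. */
static unsigned long long test_pte_flags(bool large_page)
{
    unsigned long long flags = PTE_PRESENT | PTE_WRITE | PTE_USER;

    return large_page ? (flags | PTE_PSE) : flags;
}
```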
Continuous reboots on qemu-kvm master
Hi all,

I get continuous reboots on my guest system, including these dmesg entries:

[ 31.770538] device tap0 entered promiscuous mode
[ 31.770554] br0: port 2(tap0) entering learning state
[ 39.259921] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
[ 39.259936] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[ 39.259946] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[ 44.870691] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
[ 44.870801] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[ 44.870901] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[ 46.727081] br0: port 2(tap0) entering forwarding state
[ 50.481469] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74
[ 50.481583] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[ 50.481685] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0
[ 55.827950] br0: port 2(tap0) entering disabled state
[ 55.828110] device tap0 left promiscuous mode
[ 55.828200] br0: port 2(tap0) entering disabled state

My ./configure is:

./configure --prefix= --target-list=x86_64-softmmu --disable-vnc-png
  --disable-vnc-jpeg --disable-vnc-tls --disable-vnc-sasl --audio-card-list=
  --audio-drv-list= --enable-sdl --disable-xen --disable-brlapi
  --disable-bluez --disable-nptl --disable-curl --disable-guest-agent
  --disable-guest-base --disable-werror --disable-attr

My qemu cmdline is:

/usr/X11R6/bin/qemu-system-x86_64 -serial /dev/ttyS2
  -readconfig /etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0
  -device usb-tablet -drive file=/dev/sda2,cache=off -m 1536 -net nic
  -net tap,script=/etc/qemu-ifup -no-acpi -monitor stdio
  -L /usr/X11R6/share/qemu -boot c -localtime -enable-kvm

Was fine with qemu-kvm-1.0 and the same options!

Best regards,
Erik
Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86
Am 23.01.2012 17:10, schrieb Gleb Natapov:
> On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
>> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
>> to Protected Mode using a task gate in the IDT.
>>
> Can you add the test case to taskswitch2.c?

That's actually what I intended to do at first, but there's nothing to
share and having a clean environment that can't interfere with other
tests feels nicer. What would we gain from merging the files?

Kevin
[Bug 42635] PCIe passthrough broken with AMD iommu after s2disk / resume
https://bugzilla.kernel.org/show_bug.cgi?id=42635

--- Comment #2 from Klaus Mueller <kmuel...@justmail.de>  2012-01-23 16:18:33 ---
The device never works in the guest after an s2disk/resume cycle of the
host. But it always works when it is used by the host itself, even after
an s2disk/resume cycle.

The device is (as seen by the host):

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
    Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 45
    Region 0: I/O ports at ee00 [size=256]
    Region 2: Memory at fdcff000 (64-bit, prefetchable) [size=4K]
    Region 4: Memory at fdcf8000 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: fee0f00c  Data: 4122
    Capabilities: [70] Express (v2) Endpoint, MSI 01
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB
    Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Vector table: BAR=4 offset=
        PBA: BAR=4 offset=0800
    Capabilities: [d0] Vital Product Data
        Unknown small resource type 00, will not decode more.
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [140 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [160 v1] Device Serial Number 12-34-56-78-12-34-56-78
    Kernel driver in use: pci-stub

One more thing: the device also stops working in the guest after the
guest reboots itself. I have to shut the guest down with "virsh shutdown"
and start it again with "virsh start" to get it working. But this does
not help if the host has gone through a suspend/resume cycle in between.
Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86
On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
> Am 23.01.2012 17:10, schrieb Gleb Natapov:
>> On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
>>> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
>>> to Protected Mode using a task gate in the IDT.
>>>
>> Can you add the test case to taskswitch2.c?
Running one test to check all aspects of taskswitch emulation.

> That's actually what I intended to do at first, but there's nothing to
> share and having a clean environment that can't interfere with other
> tests feels nicer. What would we gain from merging the files?
>
> Kevin
[Bug 42636] PCI passthrough does not work with AMD iommu for PCI device
https://bugzilla.kernel.org/show_bug.cgi?id=42636

--- Comment #2 from Klaus Mueller <kmuel...@justmail.de>  2012-01-23 16:26:49 ---
Well, I did exactly what you proposed, but I got the same error again when
I tried to assign both devices. This is the relevant part of the XML file:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x' bus='0x05' slot='0x06' function='0x0'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x' bus='0x05' slot='0x07' function='0x0'/>
  </source>
</hostdev>

Did I make a mistake?
Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86
Am 23.01.2012 17:22, schrieb Gleb Natapov:
> On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
>> Am 23.01.2012 17:10, schrieb Gleb Natapov:
>>> On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
>>>> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
>>>> to Protected Mode using a task gate in the IDT.
>>>>
>>> Can you add the test case to taskswitch2.c?
> Running one test to check all aspects of taskswitch emulation.

(We all know that top-posting is disliked, but middle-posting looks even
crazier!)

Does having one test provide any value in and of itself? It's just an
implementation detail of the test suite. When testing the KVM patches I
ran all three test cases with './run_tests.sh -g task', which is
hopefully easy enough.

Kevin
Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86
On Mon, Jan 23, 2012 at 05:32:59PM +0100, Kevin Wolf wrote:
> Am 23.01.2012 17:22, schrieb Gleb Natapov:
>> On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
>>> Am 23.01.2012 17:10, schrieb Gleb Natapov:
>>>> On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
>>>>> This adds a test case that jumps into VM86 by iret-ing to a TSS and back
>>>>> to Protected Mode using a task gate in the IDT.
>>>>>
>>>> Can you add the test case to taskswitch2.c?
>> Running one test to check all aspects of taskswitch emulation.
>
> (We all know that top-posting is disliked, but middle-posting looks even
> crazier!)
>
Inserting replies at random places is a new cool thing!

> Does having one test provide any value in and of itself? It's just an
> implementation detail of the test suite. When testing the KVM patches I
> ran all three test cases with './run_tests.sh -g task', which is
> hopefully easy enough.
>
I think it does. I do not have to use an external script to combine tests
on the same topic, or even remember that such a script exists. We do not
create a separate test for each emulated instruction either. And I
usually do not run qemu on the same machine I compile it on, so I need
special tricks to make those test scripts work. Of course, if putting
this code into the existing test file is hard, a separate test is OK, but
is that really the case here?
[PATCH v2 0/5] VFIO core framework
This series includes the core framework for the VFIO driver. VFIO is a
userspace driver interface meant to replace both the KVM device
assignment code and interfaces like UIO. Please see patch 1/5 for a
complete description of VFIO, what it can do, and how it's designed.

This series can also be found here:

git://github.com/awilliam/linux-vfio.git vfio-next

This plus the PCI VFIO bus driver for exposing PCI devices to userspace
can be found here:

git://github.com/awilliam/linux-vfio.git vfio-next-staging

or here for a linux-2.6.git based tree:

git://github.com/awilliam/linux-vfio.git vfio-linux-staging

A fully functional qemu driver for doing non-KVM based PCI device
assignment can be found here:

git://github.com/awilliam/qemu-vfio.git vfio-ng

I'd like to propose VFIO for inclusion in Linux 3.4, starting with this
core framework series. Once we have agreement on these, I'll split up
and post the VFIO PCI bus driver for inclusion as well. I can also host
the above vfio-next branch for inclusion in linux-next. Please review
and comment. Thanks,

Alex

v2:
 Interrupt setup ioctl rework based on comments by Konrad. The interrupt
 ioctls are no longer exclusively targeted at eventfds, allowing more
 flexibility for other vfio bus drivers that use alternate mechanisms.
 Also updated vfio_iommu_info to report common IOMMU geometry fields
 that we know we're going to need for Freescale PAMU. Additional ioctls
 and fields will be added via flags as they're implemented in the IOMMU
 API.
---

Alex Williamson (5):
      vfio: VFIO core Kconfig and Makefile
      vfio: VFIO core IOMMU mapping support
      vfio: VFIO core group interface
      vfio: VFIO core header
      vfio: Introduce documentation for VFIO driver

 Documentation/ioctl/ioctl-number.txt |    1
 Documentation/vfio.txt               |  359 ++
 MAINTAINERS                          |    8
 drivers/Kconfig                      |    2
 drivers/Makefile                     |    1
 drivers/vfio/Kconfig                 |    8
 drivers/vfio/Makefile                |    3
 drivers/vfio/vfio_iommu.c            |  611 +
 drivers/vfio/vfio_main.c             | 1248 ++
 drivers/vfio/vfio_private.h          |   36 +
 include/linux/vfio.h                 |  395 +++
 11 files changed, 2672 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/vfio.txt
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/vfio_iommu.c
 create mode 100644 drivers/vfio/vfio_main.c
 create mode 100644 drivers/vfio/vfio_private.h
 create mode 100644 include/linux/vfio.h
[PATCH v2 1/5] vfio: Introduce documentation for VFIO driver
Including rationale for design, example usage and API description. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/vfio.txt | 359 1 files changed, 359 insertions(+), 0 deletions(-) create mode 100644 Documentation/vfio.txt diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt new file mode 100644 index 000..4dfccf6 --- /dev/null +++ b/Documentation/vfio.txt @@ -0,0 +1,359 @@ +VFIO - Virtual Function I/O[1] +--- +Many modern systems now provide DMA and interrupt remapping facilities +to help ensure I/O devices behave within the boundaries they've been +allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, +POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC +systems such as Freescale PAMU. The VFIO driver is an IOMMU/device +agnostic framework for exposing direct device access to userspace, in +a secure, IOMMU protected environment. In other words, this allows +safe[2], non-privileged, userspace drivers. + +Why do we want that? Virtual machines often make use of direct device +access (device assignment) when configured for the highest possible +I/O performance. From a device and host perspective, this simply +turns the VM into a userspace driver, with the benefits of +significantly reduced latency, higher bandwidth, and direct use of +bare-metal device drivers[3]. + +Some applications, particularly in the high performance computing +field, also benefit from low-overhead, direct device access from +userspace. Examples include network adapters (often non-TCP/IP based) +and compute accelerators. Prior to VFIO, these drivers had to either +go through the full development cycle to become a proper upstream +driver, be maintained out of tree, or make use of the UIO framework, +which has no notion of IOMMU protection, limited interrupt support, +and requires root privileges to access things like PCI configuration +space.
+ +The VFIO driver framework intends to unify these, replacing the +KVM PCI-specific device assignment code and providing a more +secure, more featureful userspace driver environment than UIO. + +Groups, Devices, and IOMMUs +--- + +Userspace drivers are primarily concerned with manipulating individual +devices and setting up mappings in the IOMMU for those devices. +Unfortunately, the IOMMU doesn't always have the granularity to track +mappings for an individual device. Sometimes this is a topology +barrier, such as a PCIe-to-PCI bridge interposing the device and +IOMMU; other times this is an IOMMU limitation. In any case, the +reality is that devices are not always independent with respect to the +IOMMU. Translations set up for one device can be used by another +device in these scenarios. + +The IOMMU API exposes these relationships by identifying an IOMMU +group for these dependent devices. Devices on the same bus with the +same IOMMU group (or just "group" for this document) are not isolated +from each other with respect to DMA mappings. For userspace usage, +this logically means that instead of being able to grant ownership of +an individual device, we must grant ownership of a group, which may +contain one or more devices. + +These groups therefore become a fundamental component of VFIO and the +working unit we use for exposing devices and granting permissions to +userspace. In addition, VFIO makes efforts to ensure the integrity of +the group for user access. This includes ensuring that all devices +within the group are controlled by VFIO (vs native host drivers) +before allowing a user to access any member of the group or the IOMMU +mappings, as well as maintaining the group viability as devices are +dynamically added to or removed from the system. + +To access a device through VFIO, a user must open a character device +for the group that the device belongs to and then issue an ioctl to +retrieve a file descriptor for the individual device.
This ensures +that the user has permissions to the group (file-based access to the +/dev entry) and allows a checkpoint at which VFIO can deny access to +the device if the group is not viable (that is, if not all devices +within the group are controlled by VFIO). A file descriptor for the +IOMMU is obtained in the same fashion. + +VFIO defines a standard set of APIs for access to devices and a +modular interface for adding new, bus-specific VFIO device drivers. +We call these VFIO bus drivers. The vfio-pci module is an example +of a bus driver for exposing PCI devices. When the bus driver module +is loaded it enumerates all of the devices for its bus, registering +each device with the vfio core along with a set of callbacks. For +buses that support hotplug, the bus driver also adds itself to the +notification chain for such events. The callbacks registered with +each device
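The access path described above (open the group's character device, then request a device file descriptor, which is refused while the group is not viable) can be modeled in a few lines. This is a hypothetical user-level sketch, not VFIO code: the enum, the struct, and the `model_*` functions are invented for illustration, and the real checks live in the kernel behind the group's ioctls. It encodes the text's definition that a group is viable only when every device in it is controlled by VFIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: which driver currently controls a device. */
enum model_owner {
    MODEL_OWNER_NONE,        /* no driver bound */
    MODEL_OWNER_HOST_DRIVER, /* a native host driver */
    MODEL_OWNER_VFIO,        /* a VFIO bus driver such as vfio-pci */
};

/* Hypothetical stand-in for an IOMMU group: the devices that share it. */
struct model_group {
    const enum model_owner *devices;
    size_t ndevices;
};

/* Viable means every device in the group is controlled by VFIO; a host
 * driver or no driver on any member makes the whole group non-viable. */
static bool model_group_viable(const struct model_group *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < g->ndevices; i++) {
        if (g->devices[i] != MODEL_OWNER_VFIO)
            return false;
    }
    return true;
}

/* The checkpoint when a device fd is requested: the caller must have been
 * able to open the group's /dev entry, and the group must be viable. */
static bool model_may_get_device_fd(const struct model_group *g,
                                    bool opened_group_chardev)
{
    return opened_group_chardev && model_group_viable(g);
}
```

In this model, handing a user one device of a two-device group requires both devices to be bound to VFIO, which is the "group as the unit of ownership" rule from the text.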
[PATCH v2 2/5] vfio: VFIO core header
This defines both the user and bus driver APIs. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/ioctl/ioctl-number.txt |1 include/linux/vfio.h | 395 ++ 2 files changed, 396 insertions(+), 0 deletions(-) create mode 100644 include/linux/vfio.h diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 2550754..79c5ef8 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -88,6 +88,7 @@ Code Seq#(hex) Include FileComments and kernel/power/user.c '8'all SNP8023 advanced NIC card mailto:m...@solidum.com +';'64-83 linux/vfio.h '@'00-0F linux/radeonfb.hconflict! '@'00-0F drivers/video/aty/aty128fb.cconflict! 'A'00-1F linux/apm_bios.hconflict! diff --git a/include/linux/vfio.h b/include/linux/vfio.h new file mode 100644 index 000..797dbe4 --- /dev/null +++ b/include/linux/vfio.h @@ -0,0 +1,395 @@ +/* + * VFIO API definition + * + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson alex.william...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ +#ifndef VFIO_H +#define VFIO_H + +#include <linux/types.h> + +#ifdef __KERNEL__ /* Internal VFIO-core/bus driver API */ + +/** + * struct vfio_device_ops - VFIO bus driver device callbacks + * + * @match: Return true if buf describes the device + * @claim: Force driver to attach to device + * @open: Called when userspace receives file descriptor for device + * @release: Called when userspace releases file descriptor for device + * @read: Perform read(2) on device file descriptor + * @write: Perform write(2) on device file descriptor + * @ioctl: Perform ioctl(2) on device file descriptor, supporting VFIO_DEVICE_* + * operations documented below + * @mmap: Perform mmap(2) on a region of the device file descriptor + */ +struct vfio_device_ops { + bool (*match)(struct device *dev, const char *buf); + int (*claim)(struct device *dev); + int (*open)(void *device_data); + void (*release)(void *device_data); + ssize_t (*read)(void *device_data, char __user *buf, + size_t count, loff_t *ppos); + ssize_t (*write)(void *device_data, const char __user *buf, + size_t count, loff_t *size); + long (*ioctl)(void *device_data, unsigned int cmd, + unsigned long arg); + int (*mmap)(void *device_data, struct vm_area_struct *vma); +}; + +/** + * vfio_group_add_dev() - Add a device to the vfio-core + * + * @dev: Device to add + * @ops: VFIO bus driver callbacks for device + * + * This registration makes the VFIO core aware of the device, creates + * group objects as required and exposes chardevs under /dev/vfio. + * + * Return 0 on success, errno on failure. + */ +extern int vfio_group_add_dev(struct device *dev, + const struct vfio_device_ops *ops); + +/** + * vfio_group_del_dev() - Remove a device from the vfio-core + * + * @dev: Device to remove + * + * Remove a device previously added to the VFIO core, removing groups + * and chardevs as necessary.
+ */ +extern void vfio_group_del_dev(struct device *dev); + +/** + * vfio_bind_dev() - Indicate device is bound to the VFIO bus driver and + * register private data structure for ops callbacks. + * + * @dev: Device being bound + * @device_data: VFIO bus driver private data + * + * This registration indicates that a device previously registered with + * vfio_group_add_dev() is now available for use by the VFIO core. When + * all devices within a group are available, the group is viable and may + * be used by userspace drivers. Typically called from the VFIO bus driver + * probe function. + * + * Return 0 on success, errno on failure + */ +extern int vfio_bind_dev(struct device *dev, void *device_data); + +/** + * vfio_unbind_dev() - Indicate device is unbinding from VFIO bus driver + * + * @dev: Device being unbound + * + * De-registration of the device previously registered with vfio_bind_dev() + * from VFIO. Upon completion, the device is no longer available for use by + * the VFIO core. Typically called from the VFIO bus driver remove function. + * The VFIO core will attempt to release the device from users and may take + * measures to free the device and/or block as necessary. + * + * Returns pointer to private device_data structure registered with + * vfio_bind_dev(). + */ +extern void *vfio_unbind_dev(struct device *dev); + + +/** + * offsetofend(TYPE, MEMBER) + * + * @TYPE: The type of the structure + * @MEMBER: The member within the
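The header excerpt breaks off while documenting an `offsetofend(TYPE, MEMBER)` helper, so its body is not visible here. The definition below is an assumption: it is the conventional one (the member's offset plus its size), which matches how the kernel later defined `offsetofend()`. A typical use in an ioctl handler is computing the minimum `argsz` a caller must pass, so that newer versions of a structure can grow at the end without breaking older callers. The two structs are hypothetical examples and not the VFIO ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past MEMBER within TYPE (assumed definition). */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical versioned ioctl argument: v2 appends a field to v1. */
struct example_info_v1 {
    uint32_t argsz; /* size of the buffer the caller passed */
    uint32_t flags;
};

struct example_info_v2 {
    uint32_t argsz;
    uint32_t flags;
    uint64_t new_field;
};

/* v2 must extend v1 without moving the fields v1 callers rely on. */
static_assert(offsetofend(struct example_info_v2, flags) ==
              offsetofend(struct example_info_v1, flags),
              "v2 keeps the v1 prefix");

/* Accept any buffer large enough for the fields the handler must read
 * (through flags), whether the caller was built against v1 or v2. */
static int example_check_argsz(uint32_t argsz)
{
    size_t minsz = offsetofend(struct example_info_v1, flags);

    return argsz >= minsz ? 0 : -EINVAL;
}
```

Using the end of the last required member, rather than `sizeof` of the newest struct, is what lets an old binary with a shorter buffer keep working.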
[PATCH v2 3/5] vfio: VFIO core group interface
This provides the base group management with conduits to the IOMMU driver and VFIO bus drivers. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/vfio/vfio_main.c| 1248 +++ drivers/vfio/vfio_private.h | 36 + 2 files changed, 1284 insertions(+), 0 deletions(-) create mode 100644 drivers/vfio/vfio_main.c create mode 100644 drivers/vfio/vfio_private.h diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c new file mode 100644 index 000..fcd6476 --- /dev/null +++ b/drivers/vfio/vfio_main.c @@ -0,0 +1,1248 @@ +/* + * VFIO framework + * + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson alex.william...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. + * Author: Tom Lyon, p...@cisco.com + */ + +#include <linux/cdev.h> +#include <linux/compat.h> +#include <linux/device.h> +#include <linux/file.h> +#include <linux/anon_inodes.h> +#include <linux/fs.h> +#include <linux/idr.h> +#include <linux/iommu.h> +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/string.h> +#include <linux/uaccess.h> +#include <linux/vfio.h> +#include <linux/wait.h> + +#include "vfio_private.h" + +#define DRIVER_VERSION "0.2" +#define DRIVER_AUTHOR "Alex Williamson alex.william...@redhat.com" +#define DRIVER_DESC "VFIO - User Level meta-driver" + +static struct vfio { + dev_t devt; + struct cdev cdev; + struct list_head group_list; + struct mutex lock; + struct kref kref; + struct class *class; + struct idr idr; + wait_queue_head_t release_q; +} vfio; + +static const struct file_operations vfio_group_fops; + +struct vfio_group { + dev_t devt; + unsigned int groupid; + struct bus_type *bus; + struct vfio_iommu *iommu; + struct list_head device_list; + struct list_head iommu_next; + struct 
list_headgroup_next; + struct device *dev; + struct kobject *devices_kobj; + int refcnt; + booltainted; +}; + +struct vfio_device { + struct device *dev; + const struct vfio_device_ops*ops; + struct vfio_group *group; + struct list_headdevice_next; + boolattached; + booldeleteme; + int refcnt; + void*device_data; +}; + +/* + * Helper functions called under vfio.lock + */ + +/* Return true if any devices within a group are opened */ +static bool __vfio_group_devs_inuse(struct vfio_group *group) +{ + struct list_head *pos; + + list_for_each(pos, group-device_list) { + struct vfio_device *device; + + device = list_entry(pos, struct vfio_device, device_next); + if (device-refcnt) + return true; + } + return false; +} + +/* + * Return true if any of the groups attached to an iommu are opened. + * We can only tear apart merged groups when nothing is left open. + */ +static bool __vfio_iommu_groups_inuse(struct vfio_iommu *iommu) +{ + struct list_head *pos; + + list_for_each(pos, iommu-group_list) { + struct vfio_group *group; + + group = list_entry(pos, struct vfio_group, iommu_next); + if (group-refcnt) + return true; + } + return false; +} + +/* + * An iommu is in use if it has a file descriptor open or if any of + * the groups assigned to the iommu have devices open. + */ +static bool __vfio_iommu_inuse(struct vfio_iommu *iommu) +{ + struct list_head *pos; + + if (iommu-refcnt) + return true; + + list_for_each(pos, iommu-group_list) { + struct vfio_group *group; + + group = list_entry(pos, struct vfio_group, iommu_next); + + if (__vfio_group_devs_inuse(group)) + return true; + } + return false; +} + +static void __vfio_group_set_iommu(struct vfio_group *group, + struct vfio_iommu *iommu) +{ + if (group-iommu) + list_del(group-iommu_next); + if (iommu) + list_add(group-iommu_next, iommu-group_list); + + group-iommu = iommu; +} + +static void __vfio_iommu_detach_dev(struct vfio_iommu *iommu, + struct vfio_device *device) +{ + if (WARN_ON(!iommu-domain
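The helpers above answer "is anything still open?" at three levels: any open device in a group, any open group on an IOMMU (the condition the comment gives for tearing apart merged groups), and an IOMMU that is in use either through its own file descriptor or through an open device in one of its groups. A rough user-space model of the same logic follows. It replaces the kernel's `list_head` lists with plain arrays, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, array-based stand-ins for vfio_device/group/iommu. */
struct model_device {
    int refcnt;                 /* open count, as in the patch's refcnt */
};

struct model_group {
    struct model_device *devices;
    size_t ndevices;
    int refcnt;                 /* open count of the group itself */
};

struct model_iommu {
    struct model_group *groups;
    size_t ngroups;
    int refcnt;                 /* open count of the iommu fd */
};

/* Like __vfio_group_devs_inuse(): is any device in the group open? */
static bool model_group_devs_inuse(const struct model_group *g)
{
    for (size_t i = 0; i < g->ndevices; i++)
        if (g->devices[i].refcnt)
            return true;
    return false;
}

/* Like __vfio_iommu_groups_inuse(): is any attached group open? */
static bool model_iommu_groups_inuse(const struct model_iommu *iommu)
{
    for (size_t i = 0; i < iommu->ngroups; i++)
        if (iommu->groups[i].refcnt)
            return true;
    return false;
}

/* Like __vfio_iommu_inuse(): the iommu fd itself, or any open device. */
static bool model_iommu_inuse(const struct model_iommu *iommu)
{
    assert(iommu != NULL);
    if (iommu->refcnt)
        return true;
    for (size_t i = 0; i < iommu->ngroups; i++)
        if (model_group_devs_inuse(&iommu->groups[i]))
            return true;
    return false;
}
```

Note that an open group alone does not make the IOMMU "in use" in this model, mirroring the patch, where `__vfio_iommu_inuse()` checks device counts but not group counts.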
[PATCH v2 4/5] vfio: VFIO core IOMMU mapping support
Backing for operations on the IOMMU object, including DMA mapping and unmapping. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/vfio/vfio_iommu.c | 611 + 1 files changed, 611 insertions(+), 0 deletions(-) create mode 100644 drivers/vfio/vfio_iommu.c diff --git a/drivers/vfio/vfio_iommu.c b/drivers/vfio/vfio_iommu.c new file mode 100644 index 000..49e6b2d --- /dev/null +++ b/drivers/vfio/vfio_iommu.c @@ -0,0 +1,611 @@ +/* + * VFIO: IOMMU DMA mapping support + * + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson alex.william...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. + * Author: Tom Lyon, p...@cisco.com + */ + +#include linux/compat.h +#include linux/device.h +#include linux/fs.h +#include linux/iommu.h +#include linux/module.h +#include linux/mm.h +#include linux/sched.h +#include linux/slab.h +#include linux/uaccess.h +#include linux/vfio.h +#include linux/workqueue.h + +#include vfio_private.h + +struct vfio_dma_map_entry { + struct list_headlist; + dma_addr_t iova; /* Device address */ + unsigned long vaddr; /* Process virtual addr */ + longnpage; /* Number of pages */ + int prot; /* IOMMU_READ/WRITE */ +}; + +/* + * This code handles mapping and unmapping of user data buffers + * into DMA'ble space using the IOMMU + */ + +#define NPAGE_TO_SIZE(npage) ((size_t)(npage) PAGE_SHIFT) + +struct vwork { + struct mm_struct*mm; + longnpage; + struct work_struct work; +}; + +/* delayed decrement/increment for locked_vm */ +static void vfio_lock_acct_bg(struct work_struct *work) +{ + struct vwork *vwork = container_of(work, struct vwork, work); + struct mm_struct *mm; + + mm = vwork-mm; + down_write(mm-mmap_sem); + mm-locked_vm += vwork-npage; + 
up_write(mm-mmap_sem); + mmput(mm); + kfree(vwork); +} + +static void vfio_lock_acct(long npage) +{ + struct vwork *vwork; + struct mm_struct *mm; + + if (!current-mm) + return; /* process exited */ + + if (down_write_trylock(current-mm-mmap_sem)) { + current-mm-locked_vm += npage; + up_write(current-mm-mmap_sem); + return; + } + + /* +* Couldn't get mmap_sem lock, so must setup to update +* mm-locked_vm later. If locked_vm were atomic, we +* wouldn't need this silliness +*/ + vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL); + if (!vwork) + return; + mm = get_task_mm(current); + if (!mm) { + kfree(vwork); + return; + } + INIT_WORK(vwork-work, vfio_lock_acct_bg); + vwork-mm = mm; + vwork-npage = npage; + schedule_work(vwork-work); +} + +/* + * Some mappings aren't backed by a struct page, for example an mmap'd + * MMIO range for our own or another device. These use a different + * pfn conversion and shouldn't be tracked as locked pages. + */ +static bool is_invalid_reserved_pfn(unsigned long pfn) +{ + if (pfn_valid(pfn)) { + bool reserved; + struct page *tail = pfn_to_page(pfn); + struct page *head = compound_trans_head(tail); + reserved = !!(PageReserved(head)); + if (head != tail) { + /* +* head is not a dangling pointer +* (compound_trans_head takes care of that) +* but the hugepage may have been split +* from under us (and we may not hold a +* reference count on the head page so it can +* be reused before we run PageReferenced), so +* we've to check PageTail before returning +* what we just read. +*/ + smp_rmb(); + if (PageTail(tail)) + return reserved; + } + return PageReserved(tail); + } + + return true; +} + +static int put_pfn(unsigned long pfn, int prot) +{ + if (!is_invalid_reserved_pfn(pfn)) { + struct page *page = pfn_to_page(pfn); + if (prot IOMMU_WRITE) + SetPageDirty(page); + put_page(page); + return 1; + } + return 0; +} + +/* Unmap DMA region */ +static long __vfio_dma_do_unmap(struct vfio_iommu *iommu, dma_addr_t
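`NPAGE_TO_SIZE()` converts a page count to bytes by shifting left by `PAGE_SHIFT`. `put_pfn()` releases one pinned page, marks it dirty when the mapping allowed device writes, and returns 1 only for pages that are not reserved, since reserved or MMIO pfns are not tracked as locked pages. The sketch below models that bookkeeping in user space. It assumes 4 KiB pages, uses made-up `MODEL_IOMMU_*` values, and replaces `struct page` with a small hypothetical struct. It is not the patch's unmap path, whose body is cut off here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define NPAGE_TO_SIZE(npage) ((size_t)(npage) << MODEL_PAGE_SHIFT)

/* Hypothetical protection bits for this sketch only. */
#define MODEL_IOMMU_READ  0x1
#define MODEL_IOMMU_WRITE 0x2

/* Hypothetical stand-in for the page state put_pfn() touches. */
struct model_page {
    bool reserved; /* e.g. MMIO or PageReserved: not refcounted or accounted */
    bool dirty;
    int refcount;
};

/* Mirrors the shape of put_pfn(): returns 1 if the page was accounted. */
static int model_put_page(struct model_page *pg, int prot)
{
    assert(pg != NULL);
    if (pg->reserved)
        return 0;
    if (prot & MODEL_IOMMU_WRITE)
        pg->dirty = true; /* the device may have written to it */
    pg->refcount--;       /* drop the pin taken at map time */
    return 1;
}

/* Release npage pages and return how many were accounted, i.e. how many
 * pages a caller would take back out of locked_vm (for example by passing
 * the negated count to an accounting helper like vfio_lock_acct()). */
static long model_unpin_pages(struct model_page *pages, long npage, int prot)
{
    long unlocked = 0;

    for (long i = 0; i < npage; i++)
        unlocked += model_put_page(&pages[i], prot);
    return unlocked;
}
```

Dirtying pages only for writable mappings avoids needless writeback of pages the device could only read.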
[PATCH v2 5/5] vfio: VFIO core Kconfig and Makefile
Enable the base code. Signed-off-by: Alex Williamson alex.william...@redhat.com --- MAINTAINERS |8 drivers/Kconfig |2 ++ drivers/Makefile |1 + drivers/vfio/Kconfig |8 drivers/vfio/Makefile |3 +++ 5 files changed, 22 insertions(+), 0 deletions(-) create mode 100644 drivers/vfio/Kconfig create mode 100644 drivers/vfio/Makefile diff --git a/MAINTAINERS b/MAINTAINERS index df8cb66..2f3a5c8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7129,6 +7129,14 @@ S: Maintained F: Documentation/filesystems/vfat.txt F: fs/fat/ +VFIO DRIVER +M: Alex Williamson alex.william...@redhat.com +L: kvm@vger.kernel.org +S: Maintained +F: Documentation/vfio.txt +F: drivers/vfio/ +F: include/linux/vfio.h + VIDEOBUF2 FRAMEWORK M: Pawel Osciak pa...@osciak.com M: Marek Szyprowski m.szyprow...@samsung.com diff --git a/drivers/Kconfig b/drivers/Kconfig index d5138e6..f168bf3 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -114,6 +114,8 @@ source drivers/auxdisplay/Kconfig source drivers/uio/Kconfig +source drivers/vfio/Kconfig + source drivers/vlynq/Kconfig source drivers/virtio/Kconfig diff --git a/drivers/Makefile b/drivers/Makefile index 71a1f16..6be03a1 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_ATM) += atm/ obj-$(CONFIG_FUSION) += message/ obj-y += firewire/ obj-$(CONFIG_UIO) += uio/ +obj-$(CONFIG_VFIO) += vfio/ obj-y += cdrom/ obj-y += auxdisplay/ obj-$(CONFIG_PCCARD) += pcmcia/ diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig new file mode 100644 index 000..9acb1e7 --- /dev/null +++ b/drivers/vfio/Kconfig @@ -0,0 +1,8 @@ +menuconfig VFIO + tristate VFIO Non-Privileged userspace driver framework + depends on IOMMU_API + help + VFIO provides a framework for secure userspace device drivers. + See Documentation/vfio.txt for more details. + + If you don't know what to do here, say N. 
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile new file mode 100644 index 000..088faf1 --- /dev/null +++ b/drivers/vfio/Makefile @@ -0,0 +1,3 @@ +vfio-y := vfio_main.o vfio_iommu.o + +obj-$(CONFIG_VFIO) := vfio.o
KVM call agenda for Tuesday 24
Please send in any agenda items you are interested in covering. Cheers, Markus
Re: [Qemu-devel] [PATCH 00/20] [PULL] qemu-kvm.git uq/master queue
On 01/20/2012 11:26 AM, Marcelo Tosatti wrote: The following changes since commit 8c4ec5c0269bda18bb777a64b2008088d1c632dc: pxa2xx_keypad: fix unbalanced parenthesis. (2012-01-17 02:14:42 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master Applied. Thanks. Regards, Anthony Liguori Jan Kiszka (18): msi: Generalize msix_supported to msi_supported kvm: Move kvmclock into hw/kvm folder apic: Stop timer on reset apic: Inject external NMI events via LINT1 apic: Introduce apic_report_irq_delivered apic: Factor out base class for KVM reuse apic: Open-code timer save/restore i8259: Completely privatize PicState i8259: Factor out base class for KVM reuse ioapic: Drop post-load irr initialization ioapic: Factor out base class for KVM reuse memory: Introduce memory_region_init_reservation kvm: Introduce core services for in-kernel irqchip support kvm: x86: Establish IRQ0 override control kvm: x86: Add user space part for in-kernel APIC kvm: x86: Add user space part for in-kernel i8259 kvm: x86: Add user space part for in-kernel IOAPIC kvm: Activate in-kernel irqchip support Vadim Rozenfeld (2): hyper-v: introduce Hyper-V support infrastructure. hyper-v: initialize Hyper-V CPUID leaves. 
Makefile.objs |2 +- Makefile.target|8 +- configure |1 + cpus.c |6 +- hw/apic.c | 356 ++-- hw/apic.h |1 + hw/apic_common.c | 302 ++ hw/apic_internal.h | 115 + hw/i8259.c | 163 -- hw/i8259_common.c | 147 + hw/i8259_internal.h| 76 + hw/ioapic.c| 142 ++-- hw/ioapic_common.c | 104 hw/ioapic_internal.h | 97 +++ hw/kvm/apic.c | 138 hw/{kvmclock.c = kvm/clock.c} |4 +- hw/{kvmclock.h = kvm/clock.h} |0 hw/kvm/i8259.c | 128 ++ hw/kvm/ioapic.c| 114 + hw/msi.c |8 + hw/msi.h |2 + hw/msix.c |9 +- hw/msix.h |2 - hw/pc.c| 20 ++- hw/pc.h|8 +- hw/pc_piix.c | 69 +++- kvm-all.c | 154 + kvm-stub.c |5 + kvm.h | 14 ++ memory.c | 36 memory.h | 16 ++ qemu-config.c |4 + qemu-options.hx|5 +- sysemu.h |1 - target-i386/cpuid.c| 14 ++ target-i386/hyperv.c | 64 +++ target-i386/hyperv.h | 43 + target-i386/kvm.c | 114 +- trace-events |2 +- vl.c |1 - 40 files changed, 1902 insertions(+), 593 deletions(-) create mode 100644 hw/apic_common.c create mode 100644 hw/apic_internal.h create mode 100644 hw/i8259_common.c create mode 100644 hw/i8259_internal.h create mode 100644 hw/ioapic_common.c create mode 100644 hw/ioapic_internal.h create mode 100644 hw/kvm/apic.c rename hw/{kvmclock.c = kvm/clock.c} (98%) rename hw/{kvmclock.h = kvm/clock.h} (100%) create mode 100644 hw/kvm/i8259.c create mode 100644 hw/kvm/ioapic.c create mode 100644 target-i386/hyperv.c create mode 100644 target-i386/hyperv.h -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Factor out kvm_vcpu_kick to arch-generic code
On Thu, Jan 19, 2012 at 10:22:41PM -0500, Christoffer Dall wrote: The kvm_vcpu_kick function performs roughly the same functionality on almost all architectures, so we shouldn't have separate copies. PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch structure, and to accommodate this special need a __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function kvm_arch_vcpu_wq have been defined. For all other architectures this is a generic inline that just returns &vcpu->wq; This patch applies to Linus' tree on the Linux 3.3-rc1 tag. Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com --- arch/ia64/include/asm/kvm_host.h|1 + arch/ia64/kvm/kvm-ia64.c| 15 --- arch/powerpc/include/asm/kvm_host.h |6 ++ arch/powerpc/kvm/powerpc.c | 12 ++-- arch/x86/kvm/x86.c | 17 - include/linux/kvm_host.h|8 virt/kvm/kvm_main.c | 23 +++ 7 files changed, 40 insertions(+), 42 deletions(-) diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h index 2689ee5..06a5e91 100644 --- a/arch/ia64/include/asm/kvm_host.h +++ b/arch/ia64/include/asm/kvm_host.h @@ -365,6 +365,7 @@ struct thash_cb { }; struct kvm_vcpu_stat { + u32 halt_wakeup; }; struct kvm_vcpu_arch { diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 43f4c92..f22ffb6 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1851,21 +1851,6 @@ void kvm_arch_hardware_unsetup(void) { } -void kvm_vcpu_kick(struct kvm_vcpu *vcpu) -{ - int me; - int cpu = vcpu->cpu; - - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); - - me = get_cpu(); - if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu)) - if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)) - smp_send_reschedule(cpu); - put_cpu(); -} - int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) { return __apic_accept_irq(vcpu, irq->vector); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index bf8af5d..b687444 100644 --- 
a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -438,4 +438,10 @@ struct kvm_vcpu_arch { #define KVMPPC_VCPU_BUSY_IN_HOST 1 #define KVMPPC_VCPU_RUNNABLE 2 +#define __KVM_HAVE_ARCH_VCPU_GET_WQ 1 +static inline wait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.wqp; +} + #endif /* __POWERPC_KVM_HOST_H__ */ diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 607fbdf..30cd621 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -311,10 +311,7 @@ static void kvmppc_decrementer_func(unsigned long data) kvmppc_core_queue_dec(vcpu); - if (waitqueue_active(vcpu-arch.wqp)) { - wake_up_interruptible(vcpu-arch.wqp); - vcpu-stat.halt_wakeup++; - } + kvm_vcpu_kick(vcpu); } /* @@ -572,12 +569,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) kvmppc_core_queue_external(vcpu, irq); - if (waitqueue_active(vcpu-arch.wqp)) { - wake_up_interruptible(vcpu-arch.wqp); - vcpu-stat.halt_wakeup++; - } else if (vcpu-cpu != -1) { - smp_send_reschedule(vcpu-cpu); - } + kvm_vcpu_kick(vcpu); return 0; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c38efd7..6de0af8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6688,23 +6688,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) kvm_cpu_has_interrupt(vcpu)); } -void kvm_vcpu_kick(struct kvm_vcpu *vcpu) -{ - int me; - int cpu = vcpu-cpu; - - if (waitqueue_active(vcpu-wq)) { - wake_up_interruptible(vcpu-wq); - ++vcpu-stat.halt_wakeup; - } - - me = get_cpu(); - if (cpu != me (unsigned)cpu nr_cpu_ids cpu_online(cpu)) - if (kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE) - smp_send_reschedule(cpu); - put_cpu(); -} - int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu) { return kvm_x86_ops-interrupt_allowed(vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d526231..301ae34 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -407,6 
+407,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn); void kvm_vcpu_block(struct kvm_vcpu *vcpu); +void kvm_vcpu_kick(struct kvm_vcpu *vcpu); void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); void kvm_resched(struct kvm_vcpu *vcpu); void kvm_load_guest_fpu(struct
Re: [PATCH 0/4] KVM: Decouple rmap_pde from lpage_info write_count
On Mon, Jan 23, 2012 at 07:42:04PM +0900, Takuya Yoshikawa wrote: The last one is an RFC patch: I think it is better to refactor the rmap things, if needed, before other architectures than x86 starts large pages support. Takuya Looks good to me.
Re: Continuous reboots on qemu-kvm master
On Mon, Jan 23, 2012 at 05:12:07PM +0100, erik.r...@rdsoftware.de wrote: Hi all, I get continuous reboots on my guest system, including these dmesg entries: [ 31.770538] device tap0 entered promiscuous mode [ 31.770554] br0: port 2(tap0) entering learning state [ 39.259921] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74 [ 39.259936] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0 [ 39.259946] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0 [ 44.870691] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74 [ 44.870801] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0 [ 44.870901] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0 [ 46.727081] br0: port 2(tap0) entering forwarding state [ 50.481469] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 400080532d74 [ 50.481583] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0 [ 50.481685] kvm: 1517: cpu0 unhandled wrmsr: 0x0 data 0 [ 55.827950] br0: port 2(tap0) entering disabled state [ 55.828110] device tap0 left promiscuous mode [ 55.828200] br0: port 2(tap0) entering disabled state My ./configure is: ./configure --prefix= --target-list=x86_64-softmmu --disable-vnc-png --disable-vnc-jpeg --disable-vnc-tls --disable-vnc-sasl --audio-card-list= --audio-drv-list= --enable-sdl --disable-xen --disable-brlapi --disable-bluez --disable-nptl --disable-curl --disable-guest-agent --disable-guest-base --disable-werror --disable-attr My qemu cmdline is: /usr/X11R6/bin/qemu-system-x86_64 -serial /dev/ttyS2 -readconfig /etc/ich9-ehci-uhci.cfg -device usb-host,bus=ehci.0 -device usb-tablet -drive file=/dev/sda2,cache=off -m 1536 -net nic -net tap,script=/etc/qemu-ifup -no-acpi -monitor stdio -L /usr/X11R6/share/qemu -boot c -localtime -enable-kvm Was fine with qemu-kvm-1.0 and the same options! Best regards, Erik Erik, Can you bisect to find the culprit, please?
Re: Continuous reboots on qemu-kvm master
Marcelo Tosatti wrote: On Mon, Jan 23, 2012 at 05:12:07PM +0100, erik.r...@rdsoftware.de wrote: Hi all, I get continuous reboots on my guest system, [...] Was fine with qemu-kvm-1.0 and the same options! Best regards, Erik Erik, Can you bisect to find the culprit, please? I will try to do that.
First I have to track down another issue between 0.15.0 and 1.0 :-) Once I have found that, I will continue bisecting here :-)

Best regards,
Erik
[PATCH] qemu-kvm: Resolve unneeded diffs to upstream in pc-bios
None of those files have any meaning for today's qemu-kvm.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Note that I removed the binary patch to delete pc-bios/openbios-sparc. It seems to have caused trouble getting this on the list.

 pc-bios/bios-vista.diff |   17 -
 pc-bios/bochs-manifest  |   24
 pc-bios/openbios-sparc  |  Bin 506966 -> 0 bytes
 3 files changed, 0 insertions(+), 41 deletions(-)
 delete mode 100644 pc-bios/bios-vista.diff
 delete mode 100644 pc-bios/bochs-manifest
 delete mode 100644 pc-bios/openbios-sparc

diff --git a/pc-bios/bios-vista.diff b/pc-bios/bios-vista.diff
deleted file mode 100644
index 684a310..000
--- a/pc-bios/bios-vista.diff
+++ /dev/null
@@ -1,17 +0,0 @@
-Index: rombios32.c
-===
-RCS file: /cvsroot/bochs/bochs/bios/rombios32.c,v
-retrieving revision 1.9
-diff -u -w -r1.9 rombios32.c
-rombios32.c 20 Feb 2007 09:36:55 - 1.9
-+++ rombios32.c 2 May 2007 06:07:31 -
-@@ -1191,7 +1191,7 @@
- {
- memcpy(h->signature, sig, 4);
- h->length = cpu_to_le32(len);
--h->revision = 0;
-+h->revision = 1;
- #ifdef BX_QEMU
- memcpy(h->oem_id, "QEMU  ", 6);
- memcpy(h->oem_table_id, "QEMU", 4);
diff --git a/pc-bios/bochs-manifest b/pc-bios/bochs-manifest
deleted file mode 100644
index 1b25aa4..000
--- a/pc-bios/bochs-manifest
+++ /dev/null
@@ -1,24 +0,0 @@
-.cvsignore 1.2
-BIOS-bochs-latest 1.145
-BIOS-bochs-legacy 1.9
-Makefile.in 1.26
-VGABIOS-elpin-2.40 1.4
-VGABIOS-elpin-LICENSE 1.3
-VGABIOS-lgpl-README 1.9
-VGABIOS-lgpl-latest 1.13
-VGABIOS-lgpl-latest-cirrus 1.5
-VGABIOS-lgpl-latest-cirrus-debug 1.5
-VGABIOS-lgpl-latest-debug 1.9
-acpi-dsdt.dsl 1.1
-acpi-dsdt.hex 1.1
-apmbios.S 1.5
-bios_usage 1.1
-biossums.c 1.3
-makesym.perl 1.1
-notes 1.1
-rombios.c 1.178
-rombios.h 1.4
-rombios32.c 1.9
-rombios32.ld 1.1
-rombios32start.S 1.3
-usage.cc 1.4
diff --git a/pc-bios/openbios-sparc b/pc-bios/openbios-sparc
deleted file mode 100644
index 7a729aa81ba39b3ed037ac7fad1db4616818738b..
GIT binary patch
[ removed to shrink posting size ]

-- 
1.7.3.4
[no subject]
subscribe kvm
Re: [PATCH v2] qemu-kvm: Remove icache flush from cpu_physical_memory_rw
On 01/19/2012 12:04 PM, Jan Kiszka wrote:
> On 2012-01-19 18:54, Marcelo Tosatti wrote:
>> On Thu, Jan 19, 2012 at 01:39:24PM +0100, Jan Kiszka wrote:
>>> This is at best a PPC topic, but according to [1] even there unneeded.
>>> In any case, remove this diff to upstream; it should be handled there
>>> if actually needed.
>>
>> [1] ?
>
> Oops.
>
> 8<-
>
> This is at best a PPC topic, but according to [1] even there unneeded.
> In any case, remove this diff to upstream; it should be handled there if
> actually needed.
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/119022/focus=119086

That says that it's unneeded on (some?) IBM Power systems. We need it on Freescale chips. I submitted an upstream QEMU patch to do this flush (referenced in that thread, still not applied) because I was seeing cache problems when loading images.

-Scott
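Background on why such a flush matters: on CPUs whose instruction cache is not kept coherent with data stores (Scott mentions Freescale chips), bytes written through the data side must be made visible to instruction fetch before that memory is executed. A minimal sketch, assuming GCC or Clang; `load_image()` is a hypothetical helper, not QEMU's `cpu_physical_memory_rw()`, and it runs in a single address space rather than writing into guest RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a QEMU API): copy an image, e.g. a firmware
 * blob, into memory that will later be executed, then make the copy
 * visible to instruction fetch. */
static void load_image(unsigned char *dst, const unsigned char *src, size_t len)
{
	/* Data-side write: on non-coherent CPUs this may linger in the D-cache
	 * while the I-cache still holds stale lines for the same addresses. */
	memcpy(dst, src, len);

	/* GCC/Clang builtin documented to flush the processor's instruction
	 * cache for [dst, dst + len) on targets that require it; it has no
	 * effect where no flush is needed, as on x86. */
	__builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

The builtin keeps the sketch portable: the same call is free on x86 and does the architecture-specific work elsewhere, which is why a flush like this can live in generic code without penalizing hosts that do not need it.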
Videos for kvm forum 2010
Hello all,

This is a non-development question, so apologies if I am posting to the wrong list, but I cannot seem to find the KVM Forum 2010 videos at the following link: http://www.linux-kvm.org/page/KVM_Forum_2010

Is there somewhere else where they might be available?

Nick
[no subject]
Hi,

I think I've tracked down the bug that causes "KVM_GET_SUPPORTED_CPUID failed: Argument list too long" errors when using the kvm tool. Basically, this (possibly squished) code seems to be to blame:

	case 0xd: {
		int i;

		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
		for (i = 1; *nent < maxnent && i < 64; ++i) {
			if (entry[i].eax == 0)
				continue;
			do_cpuid_1_ent(&entry[i], function, i);
			entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
			++*nent;
		}
		break;
	}

You can see there's a check whether entry[i].eax is 0, but entry[i] isn't actually filled in until the next line. That means that whether or not an entry is filled in for the 0xd function is essentially random, which can lead to the loss of valid entries. It also means that nent may be incremented too often, and since all 64 entries are iterated over, that can fill up the available storage and cause that error.

I tested my theory by commenting out the if (100% failure rate) and by moving it after do_cpuid_1_ent (100% success rate). Since this is a non-deterministic failure, that isn't really conclusive, but I'm fairly confident my fix is correct.

I don't know exactly what your procedure is for submitting patches, but one is attached.

Gabe

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77c9d86..35d7ae0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2414,9 +2414,9 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		for (i = 1; *nent < maxnent && i < 64; ++i) {
+			do_cpuid_1_ent(&entry[i], function, i);
 			if (entry[i].eax == 0)
 				continue;
-			do_cpuid_1_ent(&entry[i], function, i);
 			entry[i].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 			++*nent;
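To see why the order matters, here is a small user-space simulation of the loop. It is a sketch, not kernel code: `struct cpuid_entry` and `fill_subleaf()` are hypothetical stand-ins for `struct kvm_cpuid_entry2` and `do_cpuid_1_ent()`, the stand-in pretends that only sub-leaves 2 to 4 report a non-zero EAX, and the counter starts at 1 for the leaf itself instead of being shared across leaves as `*nent` is in the kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm_cpuid_entry2 (only eax matters). */
struct cpuid_entry {
	unsigned int eax;
	unsigned int index;
};

/* Hypothetical stand-in for do_cpuid_1_ent(): pretend that only
 * sub-leaves 2..4 of leaf 0xd report a non-zero EAX. */
static void fill_subleaf(struct cpuid_entry *e, unsigned int index)
{
	e->index = index;
	e->eax = (index >= 2 && index <= 4) ? 0x100u * index : 0;
}

/* Original order: eax is tested before the entry is filled in, so the
 * result depends on whatever the buffer happened to contain. */
static int count_subleaves_buggy(struct cpuid_entry *entry, int maxnent)
{
	int nent = 1; /* simplified: entry[0] is the leaf itself */
	for (int i = 1; nent < maxnent && i < 64; ++i) {
		if (entry[i].eax == 0)
			continue;
		fill_subleaf(&entry[i], (unsigned int)i);
		++nent;
	}
	return nent;
}

/* Patched order: fill the entry first, then test it. */
static int count_subleaves_fixed(struct cpuid_entry *entry, int maxnent)
{
	int nent = 1;
	for (int i = 1; nent < maxnent && i < 64; ++i) {
		fill_subleaf(&entry[i], (unsigned int)i);
		if (entry[i].eax == 0)
			continue;
		++nent;
	}
	return nent;
}
```

With a buffer full of stale non-zero bytes, the buggy loop counts every slot and uses up all of `maxnent`, which matches Gabe's explanation of the "Argument list too long" error; with a zeroed buffer it drops the valid sub-leaves instead. The patched order gives the same count in both cases.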
Fix for bug that causes KVM_GET_SUPPORTED_CPUID failed errors.
Sorry, forgot to add a subject.

Gabe

On Mon, Jan 23, 2012 at 9:18 PM, Gabe Black gabebl...@google.com wrote:
> Hi,
>
> I think I've tracked down the bug that causes KVM_GET_SUPPORTED_CPUID failed: Argument list too long errors when using the kvm tool.
> [...]
Re: Fix for bug that causes KVM_GET_SUPPORTED_CPUID failed errors.
The GET_SUPPORTED_CPUID bug has been fixed and shouldn't be happening from v3.2 onwards. Do you still see the issue in older versions?

On Mon, 2012-01-23 at 21:20 -0800, Gabe Black wrote:
> Sorry, forgot to add a subject.
>
> Gabe
>
> On Mon, Jan 23, 2012 at 9:18 PM, Gabe Black gabebl...@google.com wrote:
>> [...]

-- 
Sasha.
[PATCH 0/4] KVM: Decouple rmap_pde from lpage_info write_count
The last one is an RFC patch: I think it is better to refactor the rmap code, if needed, before architectures other than x86 start supporting large pages.

Takuya

 arch/ia64/kvm/kvm-ia64.c            |    8
 arch/powerpc/kvm/book3s_64_mmu_hv.c |    6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |    4 ++--
 arch/x86/kvm/mmu.c                  |   24 ++--
 arch/x86/kvm/mmu_audit.c            |    4 +---
 arch/x86/kvm/x86.c                  |    4 ++--
 include/linux/kvm_host.h            |   10 --
 virt/kvm/kvm_main.c                 |   29 +
 8 files changed, 47 insertions(+), 42 deletions(-)

-- 
1.7.5.4

-- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] KVM: MMU: Use gfn_to_rmap() in audit_write_protection()
We want to eliminate direct access to the rmap array.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu_audit.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 6eabae3..e62fa4f 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -190,15 +190,13 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
 	u64 *spte;
 
 	if (sp->role.direct || sp->unsync || sp->role.invalid)
 		return;
 
-	slot = gfn_to_memslot(kvm, sp->gfn);
-	rmapp = &slot->rmap[sp->gfn - slot->base_gfn];
+	rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
 
 	spte = rmap_next(rmapp, NULL);
 	while (spte) {
-- 
1.7.5.4
[PATCH 2/4] KVM: MMU: Use __gfn_to_rmap() in kvm_handle_hva()
We can hide the implementation details and treat every level uniformly.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 844fcce..0e82d9d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1133,14 +1133,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 			gfn_t gfn = memslot->base_gfn + gfn_offset;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset], data);
+			ret = 0;
 
-			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
-				struct kvm_lpage_info *linfo;
+			for (j = PT_PAGE_TABLE_LEVEL;
+			     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+				unsigned long *rmapp;
 
-				linfo = lpage_info_slot(gfn, memslot,
-							PT_DIRECTORY_LEVEL + j);
-				ret |= handler(kvm, &linfo->rmap_pde, data);
+				rmapp = __gfn_to_rmap(gfn, j, memslot);
+				ret |= handler(kvm, rmapp, data);
 			}
 			trace_kvm_age_page(hva, memslot, ret);
 			retval |= ret;
-- 
1.7.5.4
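Treating every level uniformly can be illustrated with a simplified model. Everything below is hypothetical: `struct memslot`, `gfn_to_rmap_sketch()`, `handle_gfn_all_levels()` and `rmap_nonempty()` are stand-ins rather than the kernel's functions. The slot is assumed to already hold one rmap array per level (the layout proposed in RFC patch 4/4), and the index arithmetic assumes x86's 9 bits per paging level (4K, 2M and 1G pages), modelled on `KVM_HPAGE_GFN_SHIFT()`.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_TABLE_LEVEL 1
#define NR_PAGE_SIZES 3 /* assumed: 4K, 2M, 1G on x86 */
/* Each level up covers 2^9 = 512 times more guest frames. */
#define HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

typedef uint64_t gfn_t;

/* Hypothetical, simplified memory slot: rmap[level - 1][index]. */
struct memslot {
	gfn_t base_gfn;
	unsigned long *rmap[NR_PAGE_SIZES];
};

/* Index of gfn's entry within the slot's array for this level. */
static unsigned long gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
	return (unsigned long)((gfn >> HPAGE_GFN_SHIFT(level)) -
			       (base_gfn >> HPAGE_GFN_SHIFT(level)));
}

/* Same lookup for every level: no special case for 4K pages. */
static unsigned long *gfn_to_rmap_sketch(const struct memslot *slot,
					 gfn_t gfn, int level)
{
	return &slot->rmap[level - PT_PAGE_TABLE_LEVEL]
			  [gfn_to_index(gfn, slot->base_gfn, level)];
}

/* Hypothetical handler shape, loosely mirroring kvm_handle_hva(). */
typedef int (*rmap_handler)(unsigned long *rmapp);

static int rmap_nonempty(unsigned long *rmapp)
{
	return *rmapp != 0;
}

/* Visit the rmap head of gfn at every level and OR the results. */
static int handle_gfn_all_levels(const struct memslot *slot, gfn_t gfn,
				 rmap_handler handler)
{
	int ret = 0;

	for (int level = PT_PAGE_TABLE_LEVEL;
	     level < PT_PAGE_TABLE_LEVEL + NR_PAGE_SIZES; ++level)
		ret |= handler(gfn_to_rmap_sketch(slot, gfn, level));
	return ret;
}
```

For a slot starting at gfn 0x200, gfn 0x200 + 700 maps to index 700 at the 4K level, index 1 at the 2M level and index 0 at the 1G level, all through the same helper.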
[RFC PATCH 4/4] KVM: Decouple rmap_pde from lpage_info write_count
Although we have one rmap array for every level, the arrays for large pages, called rmap_pde, are coupled with write_count information to form the lpage_info arrays.

To hide these implementation details, we currently use __gfn_to_rmap(), which includes a likely(level == PT_PAGE_TABLE_LEVEL) hint; this is not good because we know that the hint is always wrong for higher levels.

Furthermore, when we traverse rmap arrays to write protect pages during dirty logging, the current layout reduces the locality of their elements by placing write_count next to rmap_pde in lpage_info.

This patch mitigates the problem by decoupling rmap_pde from lpage_info write_count and making the rmap array two-dimensional so that it also holds the former rmap_pde elements.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/ia64/kvm/kvm-ia64.c            |    8
 arch/powerpc/kvm/book3s_64_mmu_hv.c |    6 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |    4 ++--
 arch/x86/kvm/mmu.c                  |    9 +++--
 arch/x86/kvm/x86.c                  |    4 ++--
 include/linux/kvm_host.h            |    3 +--
 virt/kvm/kvm_main.c                 |   25 -
 7 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8ca7261..b17eaa1 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1376,8 +1376,8 @@ static void kvm_release_vm_pages(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, slots) {
 		base_gfn = memslot->base_gfn;
 		for (j = 0; j < memslot->npages; j++) {
-			if (memslot->rmap[j])
-				put_page((struct page *)memslot->rmap[j]);
+			if (memslot->rmap[0][j])
+				put_page((struct page *)memslot->rmap[0][j]);
 		}
 	}
 }
@@ -1591,12 +1591,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 			kvm_set_pmt_entry(kvm, base_gfn + i,
 					pfn << PAGE_SHIFT,
 				_PAGE_AR_RWX | _PAGE_MA_WB);
-			memslot->rmap[i] = (unsigned long)pfn_to_page(pfn);
+			memslot->rmap[0][i] = (unsigned long)pfn_to_page(pfn);
 		} else {
 			kvm_set_pmt_entry(kvm, base_gfn + i,
 					GPFN_PHYS_MMIO | (pfn << PAGE_SHIFT),
 					_PAGE_MA_UC);
-			memslot->rmap[i] = 0;
+			memslot->rmap[0][i] = 0;
 		}
 	}
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 783cd35..81f9036 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -631,7 +631,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		goto out_unlock;
 	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 
-	rmap = &memslot->rmap[gfn - memslot->base_gfn];
+	rmap = &memslot->rmap[0][gfn - memslot->base_gfn];
 	lock_rmap(rmap);
 
 	/* Check if we might have been invalidated; let the guest retry if so */
@@ -693,7 +693,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset],
+			ret = handler(kvm, &memslot->rmap[0][gfn_offset],
 				      memslot->base_gfn + gfn_offset);
 			retval |= ret;
 		}
@@ -928,7 +928,7 @@ long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 	unsigned long *rmapp, *map;
 
 	preempt_disable();
-	rmapp = memslot->rmap;
+	rmapp = memslot->rmap[0];
 	map = memslot->dirty_bitmap;
 	for (i = 0; i < memslot->npages; ++i) {
 		if (kvm_test_clear_dirty(kvm, rmapp))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5f3c60b..4df9b4a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -103,7 +103,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
 		return;
 
-	rmap = real_vmalloc_addr(&memslot->rmap[gfn - memslot->base_gfn]);
+	rmap = real_vmalloc_addr(&memslot->rmap[0][gfn - memslot->base_gfn]);
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
@@ -199,7 +199,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	if (!slot_is_aligned(memslot, psize))
 		return H_PARAMETER;
 	slot_fn = gfn - memslot->base_gfn;
-	rmap = &memslot->rmap[slot_fn];
+	rmap = &memslot->rmap[0][slot_fn];
 	if
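The locality argument can be made concrete with a simplified model of the two layouts. The struct and function names below are hypothetical and omit the kernel's other fields; the sketch only shows that, in a dirty-logging style walk over rmap heads, the coupled layout steps over a whole `lpage_info`-like struct per entry while the decoupled layout steps over a single `unsigned long`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the layout before patch 4/4: each large-page
 * entry keeps its rmap head and write_count side by side. */
struct lpage_info_before {
	unsigned long rmap_pde;
	int write_count;
};

/* Hypothetical model of the layout after patch 4/4: lpage_info keeps only
 * write_count, and the rmap heads live in their own dense per-level array. */
struct lpage_info_after {
	int write_count;
};

/* Walk the coupled layout: the stride is sizeof(struct lpage_info_before). */
static size_t count_mapped_before(const struct lpage_info_before *linfo, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		count += linfo[i].rmap_pde != 0;
	return count;
}

/* Walk the decoupled layout: the stride is sizeof(unsigned long). */
static size_t count_mapped_after(const unsigned long *rmap, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		count += rmap[i] != 0;
	return count;
}
```

Both walks produce the same answer; the difference is only in how many rmap heads fit in each cache line, since the coupled struct is always larger than one `unsigned long`.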
Re: [PATCH] KVM: Factor out kvm_vcpu_kick to arch-generic code
On Thu, Jan 19, 2012 at 10:22:41PM -0500, Christoffer Dall wrote:
> The kvm_vcpu_kick function performs roughly the same functionality on
> almost all architectures, so we shouldn't have separate copies.
>
> PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
> structure, and to accommodate this special need a
> __KVM_HAVE_ARCH_VCPU_GET_WQ define and an accompanying function
> kvm_arch_vcpu_wq have been defined. For all other architectures this is
> a generic inline that just returns &vcpu->wq.
>
> This patch applies to Linus' tree on the Linux 3.3-rc1 tag.
>
> Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
> ---
>  arch/ia64/include/asm/kvm_host.h    |    1 +
>  arch/ia64/kvm/kvm-ia64.c            |   15 ---
>  arch/powerpc/include/asm/kvm_host.h |    6 ++
>  arch/powerpc/kvm/powerpc.c          |   12 ++--
>  arch/x86/kvm/x86.c                  |   17 -
>  include/linux/kvm_host.h            |    8
>  virt/kvm/kvm_main.c                 |   23 +++
>  7 files changed, 40 insertions(+), 42 deletions(-)
>
> diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
> index 2689ee5..06a5e91 100644
> --- a/arch/ia64/include/asm/kvm_host.h
> +++ b/arch/ia64/include/asm/kvm_host.h
> @@ -365,6 +365,7 @@ struct thash_cb {
>  };
>  
>  struct kvm_vcpu_stat {
> +	u32 halt_wakeup;
>  };
>  
>  struct kvm_vcpu_arch {
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 43f4c92..f22ffb6 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1851,21 +1851,6 @@ void kvm_arch_hardware_unsetup(void)
>  {
>  }
>  
> -void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> -{
> -	int me;
> -	int cpu = vcpu->cpu;
> -
> -	if (waitqueue_active(&vcpu->wq))
> -		wake_up_interruptible(&vcpu->wq);
> -
> -	me = get_cpu();
> -	if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu))
> -		if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
> -			smp_send_reschedule(cpu);
> -	put_cpu();
> -}
> -
>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
>  {
>  	return __apic_accept_irq(vcpu, irq->vector);
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index bf8af5d..b687444 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -438,4 +438,10 @@ struct kvm_vcpu_arch {
>  #define KVMPPC_VCPU_BUSY_IN_HOST	1
>  #define KVMPPC_VCPU_RUNNABLE	2
>  
> +#define __KVM_HAVE_ARCH_VCPU_GET_WQ 1
> +static inline wait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.wqp;
> +}
> +
>  #endif /* __POWERPC_KVM_HOST_H__ */
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 607fbdf..30cd621 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -311,10 +311,7 @@ static void kvmppc_decrementer_func(unsigned long data)
>  
>  	kvmppc_core_queue_dec(vcpu);
>  
> -	if (waitqueue_active(vcpu->arch.wqp)) {
> -		wake_up_interruptible(vcpu->arch.wqp);
> -		vcpu->stat.halt_wakeup++;
> -	}
> +	kvm_vcpu_kick(vcpu);
>  }
>  
>  /*
> @@ -572,12 +569,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
>  
>  	kvmppc_core_queue_external(vcpu, irq);
>  
> -	if (waitqueue_active(vcpu->arch.wqp)) {
> -		wake_up_interruptible(vcpu->arch.wqp);
> -		vcpu->stat.halt_wakeup++;
> -	} else if (vcpu->cpu != -1) {
> -		smp_send_reschedule(vcpu->cpu);
> -	}
> +	kvm_vcpu_kick(vcpu);
>  
>  	return 0;
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..6de0af8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6688,23 +6688,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  		kvm_cpu_has_interrupt(vcpu));
>  }
>  
> -void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> -{
> -	int me;
> -	int cpu = vcpu->cpu;
> -
> -	if (waitqueue_active(&vcpu->wq)) {
> -		wake_up_interruptible(&vcpu->wq);
> -		++vcpu->stat.halt_wakeup;
> -	}
> -
> -	me = get_cpu();
> -	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> -		if (kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE)
> -			smp_send_reschedule(cpu);
> -	put_cpu();
> -}
> -
>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
>  {
>  	return kvm_x86_ops->interrupt_allowed(vcpu);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d526231..301ae34 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -407,6 +407,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			     gfn_t gfn);
>  
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
> +void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
>  void kvm_resched(struct kvm_vcpu *vcpu);
>  void kvm_load_guest_fpu(struct
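The `__KVM_HAVE_ARCH_VCPU_GET_WQ` mechanism described in the cover text is the common "guard macro overrides a generic inline" pattern. Below is a minimal user-space sketch of it. The types are simplified, hypothetical stand-ins (there is no real wait queue), `SKETCH_ARCH_PPC` is an invented switch for illustration, and `sketch_vcpu_wake()` models only the wake-up half of `kvm_vcpu_kick()`, not the IPI half.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
typedef struct {
	int waiters;
} wait_queue_head_t;

struct kvm_vcpu_arch {
	wait_queue_head_t *wqp; /* PowerPC-style pointer to a shared queue */
};

struct kvm_vcpu {
	wait_queue_head_t wq;
	struct kvm_vcpu_arch arch;
};

/* An architecture that needs its own queue defines the guard macro and
 * supplies the accessor (the patch does this in PowerPC's kvm_host.h). */
#ifdef SKETCH_ARCH_PPC
#define __KVM_HAVE_ARCH_VCPU_GET_WQ 1
static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.wqp;
}
#endif

/* Generic fallback used by every architecture that did not override it. */
#ifndef __KVM_HAVE_ARCH_VCPU_GET_WQ
static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
{
	return &vcpu->wq;
}
#endif

/* Generic wake-up path: whichever queue the accessor returns is woken.
 * Returns 1 if a waiter was woken, 0 otherwise. */
static int sketch_vcpu_wake(struct kvm_vcpu *vcpu)
{
	wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);

	if (wq->waiters > 0) {  /* stands in for waitqueue_active() */
		wq->waiters = 0;    /* stands in for wake_up_interruptible() */
		return 1;
	}
	return 0;
}
```

Because the override is chosen at preprocessing time, the generic kick code carries no runtime branch on the architecture; only the accessor differs.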
Re: [PATCH 0/4] KVM: Decouple rmap_pde from lpage_info write_count
On Mon, Jan 23, 2012 at 07:42:04PM +0900, Takuya Yoshikawa wrote:
> The last one is an RFC patch: I think it is better to refactor the rmap code, if needed, before architectures other than x86 start supporting large pages.
>
> Takuya

Looks good to me.