Re: [PATCH] KVM: VMX: Translate interrupt shadow when waiting on NMI window
On Wed, Apr 21, 2010 at 05:14:04PM +0200, Jan Kiszka wrote: No you don't. I was told that software should be prepared to handle NMI after MOV SS. What part of the SDM does this contradict? I found nothing in the latest SDM. [ updated to March 2010 version ]

To sum up the scenario again, I think it started with:

• If the “NMI-window exiting” VM-execution control is 1, a VM exit occurs before execution of any instruction if there is no virtual-NMI blocking and there is no blocking of events by MOV SS (see Table 21-3). (A logical processor may also prevent such a VM exit if there is blocking of events by STI.) Such a VM exit occurs immediately after VM entry if the above conditions are true (see Section 23.6.6).

We included STI in the NMI shadow, but we /may/ get early exits on some processors according to the statement above. According to your latest info, we can also get that when the MOV SS shadow is on!? But simply allowing NMI injection under MOV SS is not possible:

23.3 CHECKING AND LOADING GUEST STATE
23.3.1.5 Checks on Guest Non-Register State
• Interruptibility state. ...
— Bit 1 (blocking by MOV-SS) must be 0 if the valid bit (bit 31) in the VM-entry interruption-information field is 1 and the interruption type (bits 10:8) in that field has value 2, indicating non-maskable interrupt (NMI).

And doing this for STI sounds risky too:

— A processor may require bit 0 (blocking by STI) to be 0 if the valid bit (bit 31) in the VM-entry interruption-information field is 1 and the interruption type (bits 10:8) in that field has value 2, indicating NMI. Other processors may not make this requirement.

Should we start stepping over the shadow like we do for svm?

Intel's answer is that the text above describes model-specific behaviour on bare metal, which can block NMI for one instruction after STI/MOV SS; since software should not rely on model-specific behaviour, we can safely inject NMI after STI/MOV SS (clearing the blocked-by-STI/MOV-SS bit before injecting).

-- Gleb.
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm.0.12.2 aborts on linux
On Sun, May 02, 2010 at 08:09:56AM -0700, K D wrote: After I added code to raise ulimits for qemu, I don't see any memory related issue. I'm trying to spawn a VM on an embedded linux with no window manager etc. With '-curses' it goes into 'VGA Blank Mode' and it stops there. Not sure what it is doing. Any clues? Haven't used the '-curses' option for a long time. Have you provided a bootable disk? Does your guest boot into graphical mode or text mode? thanks for help.

From: Gleb Natapov g...@redhat.com To: K D kdca...@yahoo.com Cc: a...@redhat.com; mtosa...@redhat.com; kvm@vger.kernel.org Sent: Sat, May 1, 2010 10:23:16 PM Subject: Re: qemu-kvm.0.12.2 aborts on linux

On Thu, Apr 29, 2010 at 07:35:02AM -0700, K D wrote: Here are my ulimits:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 71680
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) 204800

Your virtual memory is limited to 200M. Make it unlimited.

file locks                      (-x) unlimited
$

I'm using the MontaVista distribution. I went through qemu code and there is no place to raise rlimits. Didn't want to touch it. Thanks for looking.

From: Gleb Natapov g...@redhat.com To: K D kdca...@yahoo.com Cc: a...@redhat.com; mtosa...@redhat.com; kvm@vger.kernel.org Sent: Thu, April 29, 2010 3:17:06 AM Subject: Re: qemu-kvm.0.12.2 aborts on linux

On Wed, Apr 28, 2010 at 10:19:37AM -0700, K D wrote: Am using yahoo mail and my mails to this mailer get rejected every time saying the message has HTML content etc. Should I use some other mail tool? Below is my issue. I am trying to get KVM/qemu running on linux.
I compiled 2.6.27.10 enabling the KVM and KVM-for-Intel options at configure time. My box is running with this KVM enabled inside the kernel. I also built qemu-kvm-0.12.2 using the above kernel headers etc. I enabled virtualization in the BIOS. I didn't try to install any guest from CD etc. I made a hard disk image, installed grub on it, and copied a kernel and initrd onto it. Now when I try to create a VM as below, it crashes with the following backtrace. What could be going wrong? Also when I try to say -m 256, malloc (or posix_memalign) fails with ENOMEM. So right now -m 128, which is the default, works. Why is that? I have 4G RAM in my setup and my native linux is using less than 1G. Is there some rlimit for qemu that I need to raise? Sounds like I'm doing some basic stuff wrong. I'm using the bios, vapic, pxe-rtl binaries straight from the qemu-kvm dir. If I don't do '-nographic' it runs into some malloc failure inside some vga routine. I pasted that backtrace too below. What's your Linux distribution? The first trace below shows that pthread_create() failed, which is strange. What does ulimit -a show? Appreciate your help.
thanks

qemu-system-x86_64 -hda /dev/shm/vmhd.img -bios ./bios.bin --option-rom ./vapic.bin -curses -nographic -vga none -option-rom ./pxe-rtl8139.bin

#0  0x414875a6 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x414875a6 in raise () from /lib/libc.so.6
#1  0x4148ad18 in abort () from /lib/libc.so.6
#2  0x080b4cb3 in die2 (err=<value optimized out>, what=0x81f2662 "pthread_create") at posix-aio-compat.c:80
#3  0x080b5682 in thread_create (arg=<value optimized out>, start_routine=<value optimized out>, attr=<value optimized out>, thread=<value optimized out>) at posix-aio-compat.c:118
#4  spawn_thread () at posix-aio-compat.c:379
#5  qemu_paio_submit (aiocb=0x846b550) at posix-aio-compat.c:390
#6  0x080b57cb in paio_submit (bs=0x843c008, fd=5, sector_num=0, qiov=0x84cefb8, nb_sectors=512, cb=0x81cb950 <dma_bdrv_cb>, opaque=0x84cef80, type=1) at posix-aio-compat.c:584
#7  0x080cc7b8 in raw_aio_submit (type=<value optimized out>, opaque=<value optimized out>, cb=<value optimized out>, nb_sectors=<value optimized out>, qiov=<value optimized out>, sector_num=<value optimized out>, bs=<value optimized out>) at block/raw-posix.c:562
#8  raw_aio_readv (bs=0x843c008, sector_num=0, qiov=0x84cefb8, nb_sectors=1, cb=0x81cb950 <dma_bdrv_cb>, opaque=0x84cef80) at block/raw-posix.c:570
#9  0x080b0593 in bdrv_aio_readv (bs=0x843c008, sector_num=0, qiov=0x84cefb8, nb_sectors=1, cb=0x81cb950 <dma_bdrv_cb>,
Re: Booting/installing WindowsNT
Michael Tokarev wrote: 02.05.2010 14:04, Avi Kivity wrote: On 05/01/2010 12:40 AM, Michael Tokarev wrote: 01.05.2010 00:59, Michael Tokarev wrote: Apparently with current kvm stable (0.12.3) Windows NT 4.0 does not install anymore. With the default -cpu, it boots, displays the Inspecting your hardware configuration message and BSODs with a STOP: 0x003E error as shown here: http://www.corpit.ru/mjt/winnt4_1.gif With -cpu pentium the situation is a bit better, it displays: Microsoft (R) Windows NT (TM) Version 4.0 (Build 1381). 1 System Processor [512 MB Memory] Multiprocessor Kernel

Michael, can you try -cpu kvm64? This should be somewhat in between -cpu host and -cpu qemu64. Also look in dmesg for uncaught rd/wrmsrs. In case you find something there, please try:

# modprobe kvm ignore_msrs=1

(You have to unload the modules first)

Regards, Andre.

-- Andre Przywara AMD-OSRC (Dresden) Tel: x29712
Re: Booting/installing WindowsNT
On 05/03/2010 11:24 AM, Andre Przywara wrote: can you try -cpu kvm64? This should be somewhat in between -cpu host and -cpu qemu64. Also look in dmesg for uncatched rd/wrmsrs. In case you find something there, please try: # modprobe kvm ignore_msrs=1 (You have to unload the modules first) It's unlikely NT 4 will touch msrs, it's used to running on very old hardware (pre-msr, even). I expect it's a problem with the bios or vendor/model/blah. (worth trying anyway) -- error compiling committee.c: too many arguments to function
Re: Booting/installing WindowsNT
Avi Kivity wrote: On 05/03/2010 11:24 AM, Andre Przywara wrote: can you try -cpu kvm64? This should be somewhat in between -cpu host and -cpu qemu64. Also look in dmesg for uncaught rd/wrmsrs. In case you find something there, please try: # modprobe kvm ignore_msrs=1 (You have to unload the modules first) It's unlikely NT 4 will touch msrs, it's used to running on very old hardware (pre-msr, even). I expect it's a problem with the bios or vendor/model/blah.

That is probably right, although it could be that it uses some MSRs if it finds the CPUID bit set. But I just wanted to make sure that's not the issue. I made a diff of the CPUID flags between qemu64 and Michael's host; the difference is:

only on qemu64: up hypervisor

up is a Linux pseudo flag for an SMP kernel running on a UP host; the hypervisor bit has been introduced much later than even SP6. I guess that is OK, but note that Linux supposedly recognizes the single-VCPU guest with -cpu host as SMP. But I guess that is not an issue for NT4, since it does not know about multi-core CPUs and CPUID-based topology detection.

only on host: vme ht mmxext fxsr_opt rdtscp 3dnowext 3dnow rep_good extd_apicid cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv

Most of the features are far younger than NT4; only vme sticks out. This is a pretty old flag, but I am not sure if that matters. My documentation reads: VME: virtual-mode enhancements. CR4.VME, CR4.PVI, software interrupt indirection, expansion of the TSS with the software interrupt indirection bitmap, EFLAGS.VIF, EFLAGS.VIP. Let's just rule that out: Michael, can you try to use -cpu host,-vme and see if that makes a difference?

Thanks, Andre.

-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12
Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().
2010/4/23 Avi Kivity a...@redhat.com: On 04/23/2010 04:22 PM, Anthony Liguori wrote: I currently don't have data, but I'll prepare it. There were two things I wanted to avoid. 1. Pages to be copied to the QEMUFile buf through qemu_put_buffer. 2. Calling write() every time even when we want to send multiple pages at once. I think 2 may be negligible. But 1 seems to be problematic if we want to make the latency as small as possible, no? Copying often has strange CPU characteristics depending on whether the data is already in cache. It's better to drive these sort of optimizations through performance measurement because changes are not always obvious. Copying always introduces more cache pollution, so even if the data is in the cache, it is worthwhile (not disagreeing with the need to measure). Anthony, I measured how long it takes to send all guest pages during migration, and I would like to share the information in this message. For convenience, I modified the code to do migration, not live migration, which means the buffered file is not used here. In summary, the performance improvement using writev instead of write/send when we used GbE seems to be negligible; however, when the underlying network was fast (InfiniBand with IPoIB in this case), writev performed 17% faster than write/send, and therefore it may be worthwhile to introduce vectors. Since QEMU compresses pages, I copied a junk file to tmpfs to dirty pages to let QEMU transfer a fair number of pages. After setting up the guest, I used cpu_get_real_ticks() to measure the time during the while loop calling ram_save_block() in ram_save_live(). I removed the qemu_file_rate_limit() to disable the function of the buffered file, and all of the pages would be transferred at the first round. I measured 10 times for each, and took the average and standard deviation. Considering the results, I think the trial number was enough.
In addition to time duration, the number of writev/write calls and the number of pages which were compressed (dup)/not compressed (nodup) are shown.

Test Environment:
CPU: 2x Intel Xeon Dual Core 3GHz
Mem size: 6GB
Network: GbE, InfiniBand (IPoIB)
Host OS: Fedora 11 (kernel 2.6.34-rc1)
Guest OS: Fedora 11 (kernel 2.6.33)
Guest Mem size: 512MB

* GbE writev
time (sec): 35.732 (std 0.002)
write count: 4 (std 0)
writev count: 8269 (std 1)
dup count: 36157 (std 124)
nodup count: 1016808 (std 147)

* GbE write
time (sec): 35.780 (std 0.164)
write count: 127367 (std 21)
writev count: 0 (std 0)
dup count: 36134 (std 108)
nodup count: 1016853 (std 165)

* IPoIB writev
time (sec): 13.889 (std 0.155)
write count: 4 (std 0)
writev count: 8267 (std 1)
dup count: 36147 (std 105)
nodup count: 1016838 (std 111)

* IPoIB write
time (sec): 16.777 (std 0.239)
write count: 127364 (std 24)
writev count: 0 (std 0)
dup count: 36173 (std 169)
nodup count: 1016840 (std 190)

Although the improvement wasn't obvious when the network was GbE, introducing writev may be worthwhile when we focus on faster networks like InfiniBand/10GE. I agree with separating this optimization from the main logic of Kemari, since this modification must be done widely and carefully at the same time.

Thanks, Yoshi
Re: [PATCHv7] add mergeable buffers support to vhost_net
On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com

I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest to host traffic when this is applied but mergeable buffers are not in use by userspace (existing qemu-kvm userspace). This is both with and without my patch on top.

Without patch:

[...@tuck ~]$ sh runtest 21 | tee ser-meregeable-disabled-kernel-only-tun-only.log
Starting netserver at port 12865
set_up_server could not establish a listen endpoint for port 12865 with family AF_UNSPEC
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00    9107.26     89.20    33.85    0.802   2.436

With patch:

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00    35.00       2.21     0.62     5.181   11.575

For ease of testing, I put this on my tree git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-broken

Please take a look. Thanks!

-- MST
[qemu-kvm tests PATCH] qemu-kvm tests: fix linker script problem
This is a fix to a previous patch of mine. It's on the 'next' branch, as of now. Commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script itself (flat.lds) to become part of the linked objects, which messed up the output file; one such problem is that the symbol edata is not the last symbol anymore.

Signed-off-by: Naphtali Sprei nsp...@redhat.com
---
 kvm/user/config-x86-common.mak | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index 61cc2f0..ad7aeac 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -19,7 +19,7 @@ CFLAGS += -m$(bits)
 libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)
 FLATLIBS = test/lib/libcflat.a $(libgcc)
-%.flat: %.o $(FLATLIBS) flat.lds
+%.flat: %.o $(FLATLIBS)
 	$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
 
 tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
--
1.6.3.3
Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().
On 05/03/2010 04:32 AM, Yoshiaki Tamura wrote: 2010/4/23 Avi Kivitya...@redhat.com: On 04/23/2010 04:22 PM, Anthony Liguori wrote: I currently don't have data, but I'll prepare it. There were two things I wanted to avoid. 1. Pages to be copied to QEMUFile buf through qemu_put_buffer. 2. Calling write() everytime even when we want to send multiple pages at once. I think 2 may be neglectable. But 1 seems to be problematic if we want make to the latency as small as possible, no? Copying often has strange CPU characteristics depending on whether the data is already in cache. It's better to drive these sort of optimizations through performance measurement because changes are not always obvious. Copying always introduces more cache pollution, so even if the data is in the cache, it is worthwhile (not disagreeing with the need to measure). Anthony, I measure how long it takes to send all guest pages during migration, and I would like to share the information in this message. For convenience, I modified the code to do migration not live migration which means buffered file is not used here. In summary, the performance improvement using writev instead of write/send when we used GbE seems to be neglectable, however, when the underlying network was fast (InfiniBand with IPoIB in this case), writev performed 17% faster than write/send, and therefore, it may be worthwhile to introduce vectors. Since QEMU compresses pages, I copied a junk file to tmpfs to dirty pages to let QEMU to transfer fine number of pages. After setting up the guest, I used cpu_get_real_ticks() to measure the time during the while loop calling ram_save_block() in ram_save_live(). I removed the qemu_file_rate_limit() to disable the function of buffered file, and all of the pages would be transfered at the first round. I measure 10 times for each, and took average and standard deviation. Considering the results, I think the trial number was enough. 
In addition to time duration, number of writev/write and number of pages which were compressed (dup)/not compressed (nodup) are demonstrated. Test Environment: CPU: 2x Intel Xeon Dual Core 3GHz Mem size: 6GB Network: GbE, InfiniBand (IPoIB) Host OS: Fedora 11 (kernel 2.6.34-rc1) Guest OS: Fedora 11 (kernel 2.6.33) Guest Mem size: 512MB * GbE writev time (sec): 35.732 (std 0.002) write count: 4 (std 0) writev count: 8269 (std 1) dup count: 36157 (std 124) nodup count: 1016808 (std 147) * GbE write time (sec): 35.780 (std 0.164) write count: 127367 (21) writev count: 0 (std 0) dup count: 36134 (std 108) nodup count: 1016853 (std 165) * IPoIB writev time (sec): 13.889 (std 0.155) write count: 4 (std 0) writev count: 8267 (std 1) dup count: 36147 (std 105) nodup count: 1016838 (std 111) * IPoIB write time (sec): 16.777 (std 0.239) write count: 127364 (24) writev count: 0 (std 0) dup count: 36173 (std 169) nodup count: 1016840 (std 190) Although the improvement wasn't obvious when the network wan GbE, introducing writev may be worthwhile when we focus on faster networks like InfiniBand/10GE. I agree that separating this optimization from the main logic of Kemari since this modification must be done widely and carefully at the same time. Okay. It looks like it's clear that it's a win so let's split it out of the main series and we'll treat it separately. I imagine we'll see even more positive results on 10 gbit and particularly if we move migration out into a separate thread. Regards, Anthony Liguori Thanks, Yoshi
Re: [PATCH 1/3 v2] KVM MMU: make kvm_mmu_zap_page() return the number of zapped sp in total.
Marcelo Tosatti wrote: On Fri, Apr 23, 2010 at 01:58:22PM +0800, Gui Jianfeng wrote: Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is performed after calling kvm_mmu_zap_page() regardless of whether the page is actually reclaimed. Because a root sp won't be reclaimed by kvm_mmu_zap_page(), making kvm_mmu_zap_page() return the total number of reclaimed sps makes more sense. A new flag is put into kvm_mmu_zap_page() to indicate whether the top page is reclaimed. Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 53 +++ 1 files changed, 36 insertions(+), 17 deletions(-)

Gui, There will be only a few pinned roots, and there is no need for kvm_mmu_change_mmu_pages to be precise at that level (pages will be reclaimed through kvm_unmap_hva eventually).

Hi Marcelo, Actually, it doesn't only affect kvm_mmu_change_mmu_pages() but also kvm_mmu_remove_some_alloc_mmu_pages(), which is called by the mmu shrink routine. This will cause the upper layer to get a wrong number, so I think this should be fixed. Here is an updated version.

---
From: Gui Jianfeng guijianf...@cn.fujitsu.com

Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is performed after calling kvm_mmu_zap_page() regardless of whether the page is actually reclaimed. Because a root sp won't be reclaimed by kvm_mmu_zap_page(), making kvm_mmu_zap_page() return the total number of reclaimed sps makes more sense. A new flag is put into kvm_mmu_zap_page() to indicate whether the top page is reclaimed. kvm_mmu_remove_some_alloc_mmu_pages() also relies on kvm_mmu_zap_page() to return the total reclaimed number.
Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 53 +++
 1 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 51eb6d6..e545da8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1194,12 +1194,13 @@ static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	--kvm->stat.mmu_unsync;
 }
 
-static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			    int *self_deleted);
 
 static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	if (sp->role.cr4_pae != !!is_pae(vcpu)) {
-		kvm_mmu_zap_page(vcpu->kvm, sp);
+		kvm_mmu_zap_page(vcpu->kvm, sp, NULL);
 		return 1;
 	}
 
@@ -1207,7 +1208,7 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	kvm_flush_remote_tlbs(vcpu->kvm);
 	kvm_unlink_unsync_page(vcpu->kvm, sp);
 	if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
-		kvm_mmu_zap_page(vcpu->kvm, sp);
+		kvm_mmu_zap_page(vcpu->kvm, sp, NULL);
 		return 1;
 	}
 
@@ -1478,7 +1479,7 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 		struct kvm_mmu_page *sp;
 
 		for_each_sp(pages, sp, parents, i) {
-			kvm_mmu_zap_page(kvm, sp);
+			kvm_mmu_zap_page(kvm, sp, NULL);
 			mmu_pages_clear_parents(&parents);
 			zapped++;
 		}
@@ -1488,7 +1489,8 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 	return zapped;
 }
 
-static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			    int *self_deleted)
 {
 	int ret;
 
@@ -1505,11 +1507,16 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	if (!sp->root_count) {
 		hlist_del(&sp->hash_link);
 		kvm_mmu_free_page(kvm, sp);
+		/* Count self */
+		ret++;
+		if (self_deleted)
+			*self_deleted = 1;
 	} else {
 		sp->role.invalid = 1;
 		list_move(&sp->link, &kvm->arch.active_mmu_pages);
 		kvm_reload_remote_mmus(kvm);
 	}
+	kvm_mmu_reset_last_pte_updated(kvm);
 
 	return ret;
 }
@@ -1538,8 +1545,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages)
 			page = container_of(kvm->arch.active_mmu_pages.prev,
 					    struct kvm_mmu_page, link);
-			used_pages -= kvm_mmu_zap_page(kvm, page);
-			used_pages--;
+			used_pages -= kvm_mmu_zap_page(kvm, page, NULL);
 		}
 		kvm_nr_mmu_pages = used_pages;
 		kvm->arch.n_free_mmu_pages = 0;
@@ -1558,6 +1564,8 @@ static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 	struct kvm_mmu_page *sp;
 	struct hlist_node
[PATCH] KVM: Get rid of KVM_REQ_KICK
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal operation. This causes the fast path check for a clear vcpu->requests to fail all the time, triggering tons of atomic operations. Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c       | 17 ++++++++++---------
 include/linux/kvm_host.h |  1 +
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..307094a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4499,13 +4499,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (vcpu->fpu_active)
 		kvm_load_guest_fpu(vcpu);
 
-	local_irq_disable();
+	atomic_set(&vcpu->guest_mode, 1);
+	smp_wmb();
 
-	clear_bit(KVM_REQ_KICK, &vcpu->requests);
-	smp_mb__after_clear_bit();
+	local_irq_disable();
 
-	if (vcpu->requests || need_resched() || signal_pending(current)) {
-		set_bit(KVM_REQ_KICK, &vcpu->requests);
+	if (!atomic_read(&vcpu->guest_mode) || vcpu->requests
+	    || need_resched() || signal_pending(current)) {
+		atomic_set(&vcpu->guest_mode, 0);
+		smp_wmb();
 		local_irq_enable();
 		preempt_enable();
 		r = 1;
@@ -4550,7 +4552,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (hw_breakpoint_active())
 		hw_breakpoint_restore();
 
-	set_bit(KVM_REQ_KICK, &vcpu->requests);
+	atomic_set(&vcpu->guest_mode, 0);
+	smp_wmb();
 	local_irq_enable();
 
 	++vcpu->stat.exits;
@@ -5470,7 +5473,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 
 	me = get_cpu();
 	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-		if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
+		if (atomic_xchg(&vcpu->guest_mode, 0))
 			smp_send_reschedule(cpu);
 	put_cpu();
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce027d5..a020fa2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -81,6 +81,7 @@ struct kvm_vcpu {
 	int vcpu_id;
 	struct mutex mutex;
 	int cpu;
+	atomic_t guest_mode;
 	struct kvm_run *run;
 	unsigned long requests;
 	unsigned long guest_debug;
--
1.7.0.4
[PATCH] KVM: VMX: Avoid writing HOST_CR0 every entry
cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each entry. That is slow, so instead, set HOST_CR0 to have TS set unconditionally (which is a safe value), and issue a clts() just before exiting vcpu context if the task indeed owns the fpu. Saves ~50 cycles/exit.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/vmx.c | 9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..9cfdc0e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -753,6 +753,8 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 	}
 #endif
+	if (current_thread_info()->status & TS_USEDFPU)
+		clts();
 }
 
 static void vmx_load_host_state(struct vcpu_vmx *vmx)
@@ -2441,7 +2443,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
 
-	vmcs_writel(HOST_CR0, read_cr0());  /* 22.2.3 */
+	vmcs_writel(HOST_CR0, read_cr0() | X86_CR0_TS);  /* 22.2.3 */
 	vmcs_writel(HOST_CR4, read_cr4());  /* 22.2.3, 22.2.5 */
 	vmcs_writel(HOST_CR3, read_cr3());  /* 22.2.3  FIXME: shadow tables */
 
@@ -3794,11 +3796,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
 		vmx_set_interrupt_shadow(vcpu, 0);
 
-	/*
-	 * Loading guest fpu may have cleared host cr0.ts
-	 */
-	vmcs_writel(HOST_CR0, read_cr0());
-
 	asm(
 		/* Store host registers */
 		"push %%"R"dx; push %%"R"bp;"
--
1.7.0.4
Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().
2010/5/3 Anthony Liguori aligu...@linux.vnet.ibm.com: On 05/03/2010 04:32 AM, Yoshiaki Tamura wrote: 2010/4/23 Avi Kivitya...@redhat.com: On 04/23/2010 04:22 PM, Anthony Liguori wrote: I currently don't have data, but I'll prepare it. There were two things I wanted to avoid. 1. Pages to be copied to QEMUFile buf through qemu_put_buffer. 2. Calling write() everytime even when we want to send multiple pages at once. I think 2 may be neglectable. But 1 seems to be problematic if we want make to the latency as small as possible, no? Copying often has strange CPU characteristics depending on whether the data is already in cache. It's better to drive these sort of optimizations through performance measurement because changes are not always obvious. Copying always introduces more cache pollution, so even if the data is in the cache, it is worthwhile (not disagreeing with the need to measure). Anthony, I measure how long it takes to send all guest pages during migration, and I would like to share the information in this message. For convenience, I modified the code to do migration not live migration which means buffered file is not used here. In summary, the performance improvement using writev instead of write/send when we used GbE seems to be neglectable, however, when the underlying network was fast (InfiniBand with IPoIB in this case), writev performed 17% faster than write/send, and therefore, it may be worthwhile to introduce vectors. Since QEMU compresses pages, I copied a junk file to tmpfs to dirty pages to let QEMU to transfer fine number of pages. After setting up the guest, I used cpu_get_real_ticks() to measure the time during the while loop calling ram_save_block() in ram_save_live(). I removed the qemu_file_rate_limit() to disable the function of buffered file, and all of the pages would be transfered at the first round. I measure 10 times for each, and took average and standard deviation. Considering the results, I think the trial number was enough. 
In addition to time duration, number of writev/write and number of pages which were compressed (dup)/not compressed (nodup) are demonstrated. Test Environment: CPU: 2x Intel Xeon Dual Core 3GHz Mem size: 6GB Network: GbE, InfiniBand (IPoIB) Host OS: Fedora 11 (kernel 2.6.34-rc1) Guest OS: Fedora 11 (kernel 2.6.33) Guest Mem size: 512MB * GbE writev time (sec): 35.732 (std 0.002) write count: 4 (std 0) writev count: 8269 (std 1) dup count: 36157 (std 124) nodup count: 1016808 (std 147) * GbE write time (sec): 35.780 (std 0.164) write count: 127367 (21) writev count: 0 (std 0) dup count: 36134 (std 108) nodup count: 1016853 (std 165) * IPoIB writev time (sec): 13.889 (std 0.155) write count: 4 (std 0) writev count: 8267 (std 1) dup count: 36147 (std 105) nodup count: 1016838 (std 111) * IPoIB write time (sec): 16.777 (std 0.239) write count: 127364 (24) writev count: 0 (std 0) dup count: 36173 (std 169) nodup count: 1016840 (std 190) Although the improvement wasn't obvious when the network wan GbE, introducing writev may be worthwhile when we focus on faster networks like InfiniBand/10GE. I agree that separating this optimization from the main logic of Kemari since this modification must be done widely and carefully at the same time. Okay. It looks like it's clear that it's a win so let's split it out of the main series and we'll treat it separately. I imagine we'll see even more positive results on 10 gbit and particularly if we move migration out into a separate thread. Great! I also wanted to test with 10GE but I'm physically away from my office now, and can't set up the test environment. I'll measure the numbers w/ 10GE next week. BTW, I was thinking of writing a patch to use separate threads for both the sender and receiver of migration. Kemari especially needs a separate receiver thread, so that the monitor can accept commands from other HA tools. Is someone already working on this?
If not, I would add it to my task list :-) Thanks, Yoshi Regards, Anthony Liguori Thanks, Yoshi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
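For reference, the batching the thread discusses can be sketched with plain POSIX vectored I/O: accumulate page pointers in an iovec array and flush with a single writev() instead of one write() per page. This is a minimal illustrative sketch, not QEMU's code; all names (page_batch, batch_add, batch_flush) are made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_IOV 64

struct page_batch {
    struct iovec iov[MAX_IOV];
    int cnt;
};

/* Flush all queued pages with a single writev(): one syscall, and the
 * kernel gathers the pages directly, with no intermediate copy. */
static ssize_t batch_flush(int fd, struct page_batch *b)
{
    ssize_t n = 0;
    if (b->cnt) {
        n = writev(fd, b->iov, b->cnt);
        b->cnt = 0;
    }
    return n;
}

/* Queue one guest page; flush first if the iovec array is full. */
static ssize_t batch_add(int fd, struct page_batch *b, void *page, size_t len)
{
    ssize_t n = 0;
    if (b->cnt == MAX_IOV)
        n = batch_flush(fd, b);
    b->iov[b->cnt].iov_base = page;
    b->iov[b->cnt].iov_len = len;
    b->cnt++;
    return n;
}

/* Round-trip self-check over a pipe. Returns 0 on success. */
static int batch_demo(void)
{
    int fds[2];
    char p1[4] = "abc", p2[4] = "def", buf[8];
    struct page_batch b = { .cnt = 0 };
    if (pipe(fds))
        return 1;
    batch_add(fds[1], &b, p1, 4);
    batch_add(fds[1], &b, p2, 4);
    if (batch_flush(fds[1], &b) != 8)
        return 2;
    if (read(fds[0], buf, 8) != 8 || memcmp(buf, "abc\0def", 8))
        return 3;
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

The trade-off measured above follows directly: with write() the syscall count scales with the page count (~127k calls), while with writev() it scales with the batch count (~8k calls).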
Re: [PATCHv7] add mergeable buffers support to vhost_net
Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM: On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com

I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest-to-host traffic when this is applied but mergeable buffers are not in use by userspace (existing qemu-kvm userspace).

Actually, I wouldn't expect it to work at all; the qemu-kvm patch (particularly the feature-bit-setting bug fix) is required. Without it, I think the existing code will tell the guest to use mergeable buffers while turning it off in vhost. That was completely non-functional for me -- what version of qemu-kvm are you using?

What I did to test w/o mergeable buffers is turn off the bit in VHOST_FEATURES.

I'll recheck these, but qemu-kvm definitely must be updated; the original doesn't correctly handle feature bits.

+-DLS
[qemu-kvm tests PATCH] qemu-kvm tests: merged stringio into emulator
based on 'next' branch. Changed test-case stringio into C code and merged into emulator test-case. Removed traces of stringio test-case. Signed-off-by: Naphtali Sprei nsp...@redhat.com --- kvm/user/config-x86-common.mak |2 -- kvm/user/config-x86_64.mak |4 ++-- kvm/user/test/x86/README |1 - kvm/user/test/x86/emulator.c | 31 --- kvm/user/test/x86/stringio.S | 36 5 files changed, 30 insertions(+), 44 deletions(-) delete mode 100644 kvm/user/test/x86/stringio.S diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak index 61cc2f0..0ff9425 100644 --- a/kvm/user/config-x86-common.mak +++ b/kvm/user/config-x86-common.mak @@ -56,8 +56,6 @@ $(TEST_DIR)/realmode.flat: $(TEST_DIR)/realmode.o $(TEST_DIR)/realmode.o: bits = 32 -$(TEST_DIR)/stringio.flat: $(cstart.o) $(TEST_DIR)/stringio.o - $(TEST_DIR)/msr.flat: $(cstart.o) $(TEST_DIR)/msr.o arch_clean: diff --git a/kvm/user/config-x86_64.mak b/kvm/user/config-x86_64.mak index 03c91f2..3403dc3 100644 --- a/kvm/user/config-x86_64.mak +++ b/kvm/user/config-x86_64.mak @@ -5,7 +5,7 @@ ldarch = elf64-x86-64 CFLAGS += -D__x86_64__ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/sieve.flat \ -$(TEST_DIR)/stringio.flat $(TEST_DIR)/emulator.flat \ -$(TEST_DIR)/hypercall.flat $(TEST_DIR)/apic.flat + $(TEST_DIR)/emulator.flat $(TEST_DIR)/hypercall.flat \ + $(TEST_DIR)/apic.flat include config-x86-common.mak diff --git a/kvm/user/test/x86/README b/kvm/user/test/x86/README index e2ede5c..ab5a2ae 100644 --- a/kvm/user/test/x86/README +++ b/kvm/user/test/x86/README @@ -10,6 +10,5 @@ realmode: goes back to realmode, shld, push/pop, mov immediate, cmp immediate, a io, eflags instructions (clc, cli, etc.), jcc short, jcc near, call, long jmp, xchg sieve: heavy memory access with no paging and with paging static and with paging vmalloc'ed smptest: run smp_id() on every cpu and compares return value to number -stringio: outs forward and backward tsc: write to tsc(0) and write to tsc(1000) and read it back vmexit: long loops for 
each: cpuid, vmcall, mov_from_cr8, mov_to_cr8, inl_pmtimer, ipi, ipi+halt diff --git a/kvm/user/test/x86/emulator.c b/kvm/user/test/x86/emulator.c index 4967d1f..5406062 100644 --- a/kvm/user/test/x86/emulator.c +++ b/kvm/user/test/x86/emulator.c @@ -3,6 +3,7 @@ #include libcflat.h #define memset __builtin_memset +#define TESTDEV_IO_PORT 0xe0 int fails, tests; @@ -17,6 +18,29 @@ void report(const char *name, int result) } } +static char str[] = abcdefghijklmnop; + +void test_stringio() +{ + unsigned char r = 0; + asm volatile(cld \n\t +movw %0, %%dx \n\t +rep outsb \n\t +: : i((short)TESTDEV_IO_PORT), + S(str), c(sizeof(str) - 1)); + asm volatile(inb %1, %0\n\t : =a(r) : i((short)TESTDEV_IO_PORT)); + report(outsb up, r == str[sizeof(str) - 2]); /* last char */ + + asm volatile(std \n\t +movw %0, %%dx \n\t +rep outsb \n\t +: : i((short)TESTDEV_IO_PORT), + S(str + sizeof(str) - 2), c(sizeof(str) - 1)); + asm volatile(cld \n\t : : ); + asm volatile(in %1, %0\n\t : =a(r) : i((short)TESTDEV_IO_PORT)); + report(outsb down, r == str[0]); +} + void test_cmps_one(unsigned char *m1, unsigned char *m3) { void *rsi, *rdi; @@ -93,8 +117,8 @@ void test_cmps(void *mem) m1[i] = m2[i] = m3[i] = i; for (int i = 100; i 200; ++i) m1[i] = (m3[i] = m2[i] = i) + 1; -test_cmps_one(m1, m3); -test_cmps_one(m1, m2); + test_cmps_one(m1, m3); + test_cmps_one(m1, m2); } void test_cr8(void) @@ -271,7 +295,8 @@ int main() test_smsw(); test_lmsw(); -test_ljmp(mem); + test_ljmp(mem); + test_stringio(); printf(\nSUMMARY: %d tests, %d failures\n, tests, fails); return fails ? 
1 : 0; diff --git a/kvm/user/test/x86/stringio.S b/kvm/user/test/x86/stringio.S deleted file mode 100644 index 461621c..000 --- a/kvm/user/test/x86/stringio.S +++ /dev/null @@ -1,36 +0,0 @@ - -.data - -.macro str name, value - -\name :.long 1f-2f -2: .ascii \value -1: -.endm - -TESTDEV_PORT = 0xf1 - - str forward, forward - str backward, backward - -.text - -.global main -main: - cld - movl forward, %ecx - lea 4+forward, %rsi - movw $TESTDEV_PORT, %dx - rep outsb - - std - movl backward, %ecx - lea 4+backward-1(%rcx), %rsi - movw $TESTDEV_PORT, %dx - rep outsb - - mov $0, %rsi - call exit - - - -- 1.6.3.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at
[PATCH v2 0/7] Pvclock fixes, version two
Here is a new version of this series. I am definitely leaving any warp calculations out, as Jeremy wisely points out that Chuck Norris should perish before we warp.

Also, in this series, I am using KVM_GET_SUPPORTED_CPUID to export our features to userspace, as Avi suggests (patch 4/7), and starting to use the flag that indicates possible tsc stability (patch 7/7), although it is still off by default.

Glauber Costa (7):
  Enable pvclock flags in vcpu_time_info structure
  Add a global synchronization point for pvclock
  change msr numbers for kvmclock
  export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  Try using new kvm clock msrs
  don't compute pvclock adjustments if we trust the tsc
  Tell the guest we'll warn it about tsc stability

 arch/x86/include/asm/kvm_para.h    | 13
 arch/x86/include/asm/pvclock-abi.h |  4 ++-
 arch/x86/include/asm/pvclock.h     |  1 +
 arch/x86/kernel/kvmclock.c         | 56 ++-
 arch/x86/kernel/pvclock.c          | 37 +++
 arch/x86/kvm/x86.c                 | 37 +++-
 6 files changed, 125 insertions(+), 23 deletions(-)
[PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
Right now, we are using individual KVM_CAP entities to tell userspace which cpuids we support. This is suboptimal, since it generates a delay between a feature arriving in the host and becoming available to the guest: every new cpuid bit needs yet another KVM_CAP, which creates complexity and delays feature adoption. A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID. This makes userspace automatically aware of what we provide.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |  4 ++++
 arch/x86/kvm/x86.c              | 27 +++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE2	3

 #define MSR_KVM_WALL_CLOCK	0x11
 #define MSR_KVM_SYSTEM_TIME	0x12
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb84947..8a7cdda 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1971,6 +1971,20 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		}
 		break;
 	}
+	case 0x40000000: {
+		char signature[] = "KVMKVMKVM";
+		u32 *sigptr = (u32 *)signature;
+		entry->eax = 1;
+		entry->ebx = sigptr[0];
+		entry->ecx = sigptr[1];
+		entry->edx = sigptr[2];
+		break;
+	}
+	case 0x40000001:
+		entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
+			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
+			     (1 << KVM_FEATURE_CLOCKSOURCE2);
+		break;
 	case 0x80000000:
 		entry->eax = min(entry->eax, 0x8000001a);
 		break;
@@ -2017,6 +2031,19 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
 	for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func)
 		do_cpuid_ent(&cpuid_entries[nent], func, 0,
 			     &nent, cpuid->nent);
+
+	r = -E2BIG;
+	if (nent >= cpuid->nent)
+		goto out_free;
+
+	do_cpuid_ent(&cpuid_entries[nent], 0x40000000, 0, &nent, cpuid->nent);
+	limit = cpuid_entries[nent - 1].eax;
+	for (func = 0x40000001; func <= limit && nent < cpuid->nent; ++func)
+		do_cpuid_ent(&cpuid_entries[nent], func, 0,
+			     &nent, cpuid->nent);
+
 	r = -E2BIG;
 	if (nent >= cpuid->nent)
 		goto out_free;
--
1.6.2.2
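The signature leaf works by packing the 12-byte hypervisor string into the EBX/ECX/EDX registers, four bytes each. A user-space sketch of that mapping (not KVM source; helper names here are made up, and it assumes a little-endian host like x86):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* How a 12-byte hypervisor signature maps onto the three CPUID output
 * registers: bytes 0-3 land in EBX, 4-7 in ECX, 8-11 in EDX, in host
 * (little-endian) byte order. */
static uint32_t sig_word(const char *sig, int i)
{
    uint32_t w;
    memcpy(&w, sig + 4 * i, 4);   /* little-endian byte order on x86 */
    return w;
}

/* A guest detects KVM by comparing leaf 0x40000000's EBX/ECX/EDX
 * against "KVMKVMKVM" padded with NULs to 12 bytes. */
static int is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    const char kvm[12] = "KVMKVMKVM";   /* zero-padded to 12 bytes */
    return ebx == sig_word(kvm, 0) &&
           ecx == sig_word(kvm, 1) &&
           edx == sig_word(kvm, 2);
}
```

For example, the first word is 0x4b4d564b, which reads "KVMK" when its bytes are taken in little-endian order ('K'=0x4b, 'V'=0x56, 'M'=0x4d).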
[PATCH v2 7/7] Tell the guest we'll warn it about tsc stability
This patch sets the flag that tells the guest that we'll warn it about the tsc being trustworthy or not. For now, we also say it is not.
---
 arch/x86/kvm/x86.c | 5 +-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8a7cdda..63a2acc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -855,6 +855,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
 	vcpu->hv_clock.system_time = ts.tv_nsec +
 				     (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;

+	vcpu->hv_clock.flags = 0;
+
 	/*
 	 * The interface expects us to write an even number signaling that the
 	 * update is finished. Since the guest won't see the intermediate
@@ -1983,7 +1985,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	case 0x40000001:
 		entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE2);
+			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_TSC);
 		break;
 	case 0x80000000:
 		entry->eax = min(entry->eax, 0x8000001a);
--
1.6.2.2
[PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc
If the HV told us we can fully trust the TSC, skip any correction Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/include/asm/kvm_para.h|5 + arch/x86/include/asm/pvclock-abi.h |1 + arch/x86/kernel/kvmclock.c |3 +++ arch/x86/kernel/pvclock.c |4 4 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index f019f8c..6f1b878 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -21,6 +21,11 @@ */ #define KVM_FEATURE_CLOCKSOURCE23 +/* The last 8 bits are used to indicate how to interpret the flags field + * in pvclock structure. If no bits are set, all flags are ignored. + */ +#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC 24 + #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index ec5c41a..b123bd7 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -39,5 +39,6 @@ struct pvclock_wall_clock { u32 nsec; } __attribute__((__packed__)); +#define PVCLOCK_STABLE_TSC (1 0) #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_PVCLOCK_ABI_H */ diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 59c740f..518de01 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -215,4 +215,7 @@ void __init kvmclock_init(void) clocksource_register(kvm_clock); pv_info.paravirt_enabled = 1; pv_info.name = KVM; + + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_TSC)) + pvclock_set_flags(PVCLOCK_STABLE_TSC); } diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 3dff5fa..0d4fe0c 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) barrier(); } while (version != src-version); + if ((valid_flags PVCLOCK_STABLE_TSC) + (shadow.flags PVCLOCK_STABLE_TSC)) + return ret; + /* * 
Assumption here is that last_value, a global accumulator, always goes * forward. If we are less than that, we should not be much smaller. -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
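The double gate in the patch above — a flag only takes effect when it is set both in the per-vcpu time info written by the host and in the guest's valid_flags mask — can be sketched in user space like this (a mirror of the idea, not the kernel code; pvclock_set_flags and tsc_is_stable are illustrative names):

```c
#include <assert.h>
#include <stdint.h>

#define PVCLOCK_STABLE_TSC (1u << 0)

/* Set once at guest init, after the host advertises
 * KVM_FEATURE_CLOCKSOURCE_STABLE_TSC via CPUID.  Flags the guest has
 * not opted into are ignored entirely. */
static uint8_t valid_flags;

static void pvclock_set_flags(uint8_t flags)
{
    valid_flags = flags;
}

/* shadow_flags is the flags byte copied from the vcpu's time info.
 * Both gates must agree: the guest opted in (valid_flags) AND the host
 * set the bit for this reading (shadow_flags).  Only then can the
 * expensive monotonicity correction be skipped. */
static int tsc_is_stable(uint8_t shadow_flags)
{
    return (valid_flags & PVCLOCK_STABLE_TSC) &&
           (shadow_flags & PVCLOCK_STABLE_TSC);
}
```

This two-level check is what lets the feature default to off: an old guest never calls the setter, so a host that sets the bit changes nothing for it.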
[PATCH v2 5/7] Try using new kvm clock msrs
We now added a new set of clock-related msrs in replacement of the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens very early, and if we ever try to write to a non-existant MSR, we raise a lethal #GP, since our idt handlers are not in place yet. So this patch tests for a cpuid feature exported by the host to decide which set of msrs are supported. Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/kernel/kvmclock.c | 53 ++- 1 files changed, 32 insertions(+), 21 deletions(-) diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index feaeb0d..59c740f 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -29,6 +29,8 @@ #define KVM_SCALE 22 static int kvmclock = 1; +static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME; +static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK; static int parse_no_kvmclock(char *arg) { @@ -54,7 +56,8 @@ static unsigned long kvm_get_wallclock(void) low = (int)__pa_symbol(wall_clock); high = ((u64)__pa_symbol(wall_clock) 32); - native_write_msr(MSR_KVM_WALL_CLOCK, low, high); + + native_write_msr(msr_kvm_wall_clock, low, high); vcpu_time = get_cpu_var(hv_clock); pvclock_read_wallclock(wall_clock, vcpu_time, ts); @@ -130,7 +133,8 @@ static int kvm_register_clock(char *txt) high = ((u64)__pa(per_cpu(hv_clock, cpu)) 32); printk(KERN_INFO kvm-clock: cpu %d, msr %x:%x, %s\n, cpu, high, low, txt); - return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high); + + return native_write_msr_safe(msr_kvm_system_time, low, high); } #ifdef CONFIG_X86_LOCAL_APIC @@ -165,14 +169,14 @@ static void __init kvm_smp_prepare_boot_cpu(void) #ifdef CONFIG_KEXEC static void kvm_crash_shutdown(struct pt_regs *regs) { - native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0); + native_write_msr(msr_kvm_system_time, 0, 0); native_machine_crash_shutdown(regs); } #endif static void kvm_shutdown(void) 
{ - native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0); + native_write_msr(msr_kvm_system_time, 0, 0); native_machine_shutdown(); } @@ -181,27 +185,34 @@ void __init kvmclock_init(void) if (!kvm_para_available()) return; - if (kvmclock kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) { - if (kvm_register_clock(boot clock)) - return; - pv_time_ops.sched_clock = kvm_clock_read; - x86_platform.calibrate_tsc = kvm_get_tsc_khz; - x86_platform.get_wallclock = kvm_get_wallclock; - x86_platform.set_wallclock = kvm_set_wallclock; + if (kvmclock kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { + msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; + msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; + } else if (!(kvmclock kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))) + return; + + printk(KERN_INFO kvm-clock: Using msrs %x and %x, + msr_kvm_system_time, msr_kvm_wall_clock); + + if (kvm_register_clock(boot clock)) + return; + pv_time_ops.sched_clock = kvm_clock_read; + x86_platform.calibrate_tsc = kvm_get_tsc_khz; + x86_platform.get_wallclock = kvm_get_wallclock; + x86_platform.set_wallclock = kvm_set_wallclock; #ifdef CONFIG_X86_LOCAL_APIC - x86_cpuinit.setup_percpu_clockev = - kvm_setup_secondary_clock; + x86_cpuinit.setup_percpu_clockev = + kvm_setup_secondary_clock; #endif #ifdef CONFIG_SMP - smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; + smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; #endif - machine_ops.shutdown = kvm_shutdown; + machine_ops.shutdown = kvm_shutdown; #ifdef CONFIG_KEXEC - machine_ops.crash_shutdown = kvm_crash_shutdown; + machine_ops.crash_shutdown = kvm_crash_shutdown; #endif - kvm_get_preset_lpj(); - clocksource_register(kvm_clock); - pv_info.paravirt_enabled = 1; - pv_info.name = KVM; - } + kvm_get_preset_lpj(); + clocksource_register(kvm_clock); + pv_info.paravirt_enabled = 1; + pv_info.name = KVM; } -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo 
info at http://vger.kernel.org/majordomo-info.html
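The MSR-selection logic the patch describes — probe a CPUID feature bit first, because a blind wrmsr to a missing MSR would #GP before the IDT is up — reduces to a small decision function. This sketch uses simplified names and flat constants rather than the kernel's (FEAT_* and pick_system_time_msr are made up; the MSR values match the ones quoted in this series):

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits as advertised in CPUID leaf 0x40000001 EAX. */
#define FEAT_CLOCKSOURCE   (1u << 0)   /* legacy kvmclock msrs */
#define FEAT_CLOCKSOURCE2  (1u << 3)   /* new kvmclock msrs */

#define MSR_SYSTEM_TIME_OLD 0x12u
#define MSR_SYSTEM_TIME_NEW 0x4b564d01u

/* Prefer the new MSR when the host promised it exists; fall back to
 * the legacy number; return 0 when kvmclock is absent entirely. */
static uint32_t pick_system_time_msr(uint32_t features)
{
    if (features & FEAT_CLOCKSOURCE2)
        return MSR_SYSTEM_TIME_NEW;
    if (features & FEAT_CLOCKSOURCE)
        return MSR_SYSTEM_TIME_OLD;
    return 0;
}
```

Because the decision is made once at kvmclock_init time, the rest of the code can use plain native_write_msr on the chosen number, exactly as the patch converts the shutdown paths to do.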
[PATCH v2 3/7] change msr numbers for kvmclock
Avi pointed out a while ago that those MSRs falls into the pentium PMU range. So the idea here is to add new ones, and after a while, deprecate the old ones. Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/include/asm/kvm_para.h |4 arch/x86/kvm/x86.c |7 ++- 2 files changed, 10 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index ffae142..9734808 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -20,6 +20,10 @@ #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 +/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */ +#define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00 +#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01 + #define KVM_MAX_MMU_OP_BATCH 32 /* Operations for KVM_HC_MMU_OP */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6b2ce1d..eb84947 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -664,9 +664,10 @@ static inline u32 bit(int bitno) * kvm-specific. Those are put in the beginning of the list. 
*/ -#define KVM_SAVE_MSRS_BEGIN5 +#define KVM_SAVE_MSRS_BEGIN7 static u32 msrs_to_save[] = { MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, + MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_APIC_ASSIST_PAGE, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, @@ -1192,10 +1193,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) case MSR_IA32_MISC_ENABLE: vcpu-arch.ia32_misc_enable_msr = data; break; + case MSR_KVM_WALL_CLOCK_NEW: case MSR_KVM_WALL_CLOCK: vcpu-kvm-arch.wall_clock = data; kvm_write_wall_clock(vcpu-kvm, data); break; + case MSR_KVM_SYSTEM_TIME_NEW: case MSR_KVM_SYSTEM_TIME: { if (vcpu-arch.time_page) { kvm_release_page_dirty(vcpu-arch.time_page); @@ -1467,9 +1470,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) data = vcpu-arch.efer; break; case MSR_KVM_WALL_CLOCK: + case MSR_KVM_WALL_CLOCK_NEW: data = vcpu-kvm-arch.wall_clock; break; case MSR_KVM_SYSTEM_TIME: + case MSR_KVM_SYSTEM_TIME_NEW: data = vcpu-arch.time; break; case MSR_IA32_P5_MC_ADDR: -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
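A note on the new range chosen here: 0x4b564d00-0x4b564dff encodes the ASCII string "KVM" in the top three bytes ('K'=0x4b, 'V'=0x56, 'M'=0x4d), keeping the custom MSRs well away from the Pentium PMU range. A quick check of the encoding (kvm_msr_base is an illustrative name, not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/* Build the base of the vendor-custom MSR range from the ASCII
 * characters 'K', 'V', 'M' in the three high bytes. */
static uint32_t kvm_msr_base(void)
{
    return ((uint32_t)'K' << 24) |
           ((uint32_t)'V' << 16) |
           ((uint32_t)'M' << 8);
}
```

The individual MSRs are then just low-byte offsets into that range, e.g. base | 0x00 for the wall clock and base | 0x01 for system time.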
[PATCH v2 2/7] Add a global synchronization point for pvclock
In recent stress tests, it was found that pvclock-based systems could seriously warp on smp systems. Using Ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5 million warps a minute on some systems (to be fair, it wasn't that bad on most of them). Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in their tsc flags, especially on multi-socket ones. Two reads of the same kernel timestamp at approximately the same time will likely have been tsc-timestamped on different occasions too. This means the delta we calculate is unpredictable at best, and can end up smaller on a cpu that is legitimately reading the clock at a later occasion. Some adjustments on the host could make this window less likely to happen, but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I thought about using a shared variable anyway, to hold the clock's last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detect that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new reading. This does cause contention on big boxes (but big here means *BIG*), but it seems like a good trade-off, since it provides us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days of execution, on any of the machines I saw them on before.
Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
CC: Avi Kivity a...@redhat.com
CC: Marcelo Tosatti mtosa...@redhat.com
CC: Zachary Amsden zams...@redhat.com
---
 arch/x86/kernel/pvclock.c | 24
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index aa2262b..3dff5fa 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 	return pv_tsc_khz;
 }

+static atomic64_t last_value = ATOMIC64_INIT(0);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
 	unsigned version;
 	cycle_t ret, offset;
+	u64 last;

 	do {
 		version = pvclock_get_time_values(&shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 		barrier();
 	} while (version != src->version);

+	/*
+	 * Assumption here is that last_value, a global accumulator, always goes
+	 * forward. If we are less than that, we should not be much smaller.
+	 * We assume there is an error margin we're inside, and then the correction
+	 * does not sacrifice accuracy.
+	 *
+	 * For reads: global may have changed between test and return,
+	 * but this means someone else poked the clock at a later time.
+	 * We just need to make sure we are not seeing a backwards event.
+	 *
+	 * For updates: last_value = ret is not enough, since two vcpus could be
+	 * updating at the same time, and one of them could be slightly behind,
+	 * making the assumption that last_value always goes forward fail to hold.
+	 */
+	last = atomic64_read(&last_value);
+	do {
+		if (ret < last)
+			return last;
+		last = atomic64_cmpxchg(&last_value, last, ret);
+	} while (unlikely(last != ret));
+
 	return ret;
 }
--
1.6.2.2
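The lock-free clamp described above can be modeled in user space with C11 atomics — a sketch of the same compare-and-swap retry loop as the kernel's atomic64_cmpxchg sequence, not the kernel code itself (monotonic_read is an illustrative name):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Global maximum published by any vcpu so far.  Only ever advances. */
static _Atomic uint64_t last_value;

/* Return a value that never moves backwards across readers: if the
 * local reading `ret` is behind the global maximum, return the global
 * instead; otherwise publish `ret` via a CAS retry loop.  No lock is
 * taken, so contention only costs retries, not serialization. */
static uint64_t monotonic_read(uint64_t ret)
{
    uint64_t last = atomic_load(&last_value);
    do {
        if (ret < last)
            return last;       /* someone else saw a later time */
        /* on CAS failure, `last` is reloaded with the current value */
    } while (!atomic_compare_exchange_weak(&last_value, &last, ret));
    return ret;
}
```

This captures both cases from the commit message: a reader behind the accumulator is bumped forward, and a reader ahead of it advances the accumulator atomically, so two vcpus racing to update cannot make time appear to run backwards.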
[PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure
This patch removes one padding byte and transform it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid are usually devised via HV negotiation. Signed-off-by: Glauber Costa glom...@redhat.com CC: Jeremy Fitzhardinge jer...@goop.org --- arch/x86/include/asm/pvclock-abi.h |3 ++- arch/x86/include/asm/pvclock.h |1 + arch/x86/kernel/pvclock.c |9 + 3 files changed, 12 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index 6d93508..ec5c41a 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info { u64 system_time; u32 tsc_to_system_mul; s8tsc_shift; - u8pad[3]; + u8flags; + u8pad[2]; } __attribute__((__packed__)); /* 32 bytes */ struct pvclock_wall_clock { diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h index 53235fd..cd02f32 100644 --- a/arch/x86/include/asm/pvclock.h +++ b/arch/x86/include/asm/pvclock.h @@ -6,6 +6,7 @@ /* some helper functions for xen and kvm pv clock sources */ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src); +void pvclock_set_flags(u8 flags); unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src); void pvclock_read_wallclock(struct pvclock_wall_clock *wall, struct pvclock_vcpu_time_info *vcpu, diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 03801f2..aa2262b 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -31,8 +31,16 @@ struct pvclock_shadow_time { u32 tsc_to_nsec_mul; int tsc_shift; u32 version; + u8 flags; }; +static u8 valid_flags = 0; + +void pvclock_set_flags(u8 flags) +{ + valid_flags = flags; +} + /* * Scale a 64-bit delta by scaling and 
multiplying by a 32-bit fraction, * yielding a 64-bit result. @@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst, dst-system_timestamp = src-system_time; dst-tsc_to_nsec_mul = src-tsc_to_system_mul; dst-tsc_shift = src-tsc_shift; + dst-flags = src-flags; rmb(); /* test version after fetching data */ } while ((src-version 1) || (dst-version != src-version)); -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
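The ABI constraint the patch preserves — the structure must stay exactly 32 bytes, with the new flags byte carved out of the former 3-byte pad — can be checked with a user-space mirror of the struct. The fields before system_time are reconstructed from the upstream pvclock ABI header, since the diff above only shows the tail of the structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct pvclock_vcpu_time_info after the patch:
 * one pad byte becomes `flags`, leaving the total size at 32 bytes so
 * existing guests and hosts keep agreeing on the shared-page layout. */
struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;          /* was the first byte of pad[3] */
    uint8_t  pad[2];
} __attribute__((__packed__));

_Static_assert(sizeof(struct pvclock_vcpu_time_info) == 32,
               "pvclock ABI struct must stay 32 bytes");
```

Keeping the size and every existing field offset unchanged is what makes the flags byte safe to add retroactively: an old guest simply never reads it.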
Re: [PATCHv7] add mergeable buffers support to vhost_net
On Mon, May 03, 2010 at 08:39:08AM -0700, David Stevens wrote: Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM: On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com

I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest-to-host traffic when this is applied but mergeable buffers are not in use by userspace (existing qemu-kvm userspace).

Actually, I wouldn't expect it to work at all; the qemu-kvm patch (particularly the feature bit setting bug fix) is required.

Which bugfix is that?

Without it, I think the existing code will tell the guest to use mergeable buffers while turning it off in vhost.

It should not do that; specifically, vhost_net_get_features does: features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF); unconditionally. This was added with: 5751995a20e77cd9d61d00f7390401895fa172a6. I forced mergeable buffers off with -global virtio-net-pci.mrg_rxbuf=off and it did not seem to help either.

That was completely non-functional for me -- what version of qemu-kvm are you using?

992cc816c42f2e93db033919a9ddbfcd1da4 or later should work well AFAIK.

What I did to test w/o mergeable buffers is turn off the bit in VHOST_FEATURES.

It should be enough to force mergeable buffers off on the qemu command line: -global virtio-net-pci.mrg_rxbuf=off

I'll recheck these, but qemu-kvm definitely must be updated; the original doesn't correctly handle feature bits.

+-DLS

Hmm, I don't see the bug.

-- MST
Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().
On 05/03/2010 10:36 AM, Yoshiaki Tamura wrote: Great! I also wanted to test with 10GE but I'm physically away from my office now, and can't set up the test environment. I'll measure the numbers w/ 10GE next week. BTW, I was thinking to write a patch to separate threads for both sender and receiver of migration. Kemari especially needs a separate thread receiver, so that monitor can accepts commands from other HA tools. Is someone already working on this? If not, I would add it to my task list :-)

So far, no one (to my knowledge at least), is working on this.

Regards,

Anthony Liguori
Re: [PATCHv7] add mergeable buffers support to vhost_net
Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 08:56:14 AM: On Mon, May 03, 2010 at 08:39:08AM -0700, David Stevens wrote: Michael S. Tsirkin m...@redhat.com wrote on 05/03/2010 03:34:11 AM: On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote: This patch adds mergeable receive buffer support to vhost_net. Signed-off-by: David L Stevens dlstev...@us.ibm.com I've been doing some more testing before sending out a pull request, and I see a drastic performance degradation in guest to host traffic when this is applied but mergeable buffers are not in used by userspace (existing qemu-kvm userspace). Actually, I wouldn't expect it to work at all; the qemu-kvm patch (particularly the feature bit setting bug fix) is required. Which bugfix is that?

Actually, I see you put that upstream already-- commit dc14a397812b91dd0d48b03d1b8f66a251542369 in Avi's tree is the one I was talking about. I'll look further.

+-DLS
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote: So we probably need to upgrade gva_t to a u64. Please send this as a separate patch, and test on i386 hosts.

Are there _any_ regular tests of KVM on i386 hosts? For me this is terribly broken (also after I fixed the issue which gave me a VMEXIT_INVALID at the first vmrun).

Joerg
Re: VIA Nano support
On Sun, May 2, 2010 at 1:21 AM, Yuhong Bao yuhongbao_...@hotmail.com wrote: What about details?

This old thread seems to have more details. http://www.mail-archive.com/kvm@vger.kernel.org/msg11704.html Hopefully they are of more use to you than me ;-)

Thanks, Rusty
Re: Booting/installing WindowsNT
03.05.2010 12:24, Andre Przywara wrote: Michael Tokarev wrote: 02.05.2010 14:04, Avi Kivity wrote: On 05/01/2010 12:40 AM, Michael Tokarev wrote: 01.05.2010 00:59, Michael Tokarev wrote: Apparently with current kvm stable (0.12.3) Windows NT 4.0 does not install anymore. With the default -cpu, it boots, displays the Inspecting your hardware configuration message and BSODs with a STOP: 0x003E error as shown here: http://www.corpit.ru/mjt/winnt4_1.gif With -cpu pentium the situation is a bit better; it displays: Microsoft (R) Windows NT (TM) Version 4.0 (Build 1381). 1 System Processor [512 MB Memory] Multiprocessor Kernel

Michael, can you try -cpu kvm64? This should be somewhat in between -cpu host and -cpu qemu64.

It stops during boot with the same STOP 0x003E message.

Also look in dmesg for uncaught rd/wrmsrs. In case you find something there, please try:

Nothing in host dmesg, nothing at all.

Thanks!

/mjt
Re: Booting/installing WindowsNT
03.05.2010 13:28, Andre Przywara wrote: Avi Kivity wrote: On 05/03/2010 11:24 AM, Andre Przywara wrote: can you try -cpu kvm64? This should be somewhat in between -cpu host and -cpu qemu64. Also look in dmesg for uncaught rd/wrmsrs. In case you find something there, please try: # modprobe kvm ignore_msrs=1 (You have to unload the modules first) It's unlikely NT 4 will touch msrs, it's used to running on very old hardware (pre-msr, even). I expect it's a problem with the bios or vendor/model/blah. That is probably right, although it could be that it uses some MSRs if it finds the CPUID bit set. But I just wanted to make sure that's not the issue. I made a diff of the CPUID flags between qemu64 and Michael's host, the difference is: only on qemu64: up hypervisor. up is a Linux pseudo flag for an SMP kernel running on a UP host; the hypervisor bit was introduced much later than even SP6. I guess that is OK, but note that Linux supposedly recognizes the single-VCPU guest with -cpu host as SMP. But I guess that is not an issue for NT4, since it does not know about multi-core CPUs and CPUID-based topology detection. only on host: vme ht mmxext fxsr_opt rdtscp 3dnowext 3dnow rep_good extd_apicid cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv. Most of these features are far younger than NT4; only vme sticks out. This is a pretty old flag, but I am not sure if that matters. My documentation reads: VME: virtual-mode enhancements. CR4.VME, CR4.PVI, software interrupt indirection, expansion of the TSS with the software interrupt indirection bitmap, EFLAGS.VIF, EFLAGS.VIP. Let's just rule that out: Michael, can you try to use -cpu host,-vme and see if that makes a difference? With -cpu host,-vme winNT boots just fine as with just -cpu host. I also tried with -cpu qemu64 and kvm64, with +vme and -vme (4 combinations in total) - in all cases winNT crashes with the same 0x003E error. So it appears that vme makes no difference.
While trying some other combinations, I noticed that with -cpu pentium, while the install CD stops after loading kernel (no BSoD as with other cases, it just stalls), it stops... differently than in other cases. It sits there doing exactly nothing, all host CPUs are idle, there's no progress of any sort. In all other cases the guest cpu is working at full speed all the time. Just.. random observation, yet another one... ;) Thanks! /mjt
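The flag comparison Andre describes above is easy to reproduce mechanically. A minimal sketch (an illustration, not code from this thread), assuming the flag lists are whitespace-separated strings as printed in the `flags:` line of /proc/cpuinfo; the flag strings below are shortened, hypothetical examples:

```python
def diff_cpu_flags(flags_a, flags_b):
    """Return (only_in_a, only_in_b) for two CPU flag strings."""
    set_a, set_b = set(flags_a.split()), set(flags_b.split())
    return sorted(set_a - set_b), sorted(set_b - set_a)

# Shortened, hypothetical flag strings for illustration only.
qemu64_flags = "fpu msr cx8 up hypervisor"
host_flags = "fpu msr cx8 vme ht svm"
only_qemu64, only_host = diff_cpu_flags(qemu64_flags, host_flags)
# 'vme' shows up on the host side only, as in Andre's diff.
```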
Re: [LKML] Re: [PATCH V3] drivers/uio/uio_pci_generic.c: allow access for non-privileged processes
On Thu, Apr 29, 2010 at 12:29:40PM -0700, Tom Lyon wrote: Michael, et al - sorry for the delay, but I've been digesting the comments and researching new approaches. I think the plan for V4 will be to take things entirely out of the UIO framework, and instead have a driver which supports user mode use of well-behaved PCI devices. I would like to use read and write to support access to memory regions, IO regions, or PCI config space. Config space is a bitch because not everything is safe to read or write, but I've come up with a table-driven approach which can be run-time extended for non-compliant devices (under root control) which could then enable non-privileged users. For instance, OHCI 1394 devices use a dword in config space which is not formatted as a PCI capability; root can use sysfs to enable access: echo offset readbits writebits > /sys/dev/pci/devices/:xx:xx.x/yyy/config_permit A well-behaved PCI device must have memory BARs >= 4K for mmapping, must have separate memory space for MSI-X that does not need mmapping by the user driver, must support the PCI 2.3 interrupt masking, and must not go totally crazy with PCI config space (tg3 is real ugly, e1000 is fine). How about page aligned BARs?
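The table-driven config-space filtering Tom describes could look roughly like the following sketch. All names here are hypothetical (the actual V4 driver would be in C and is not shown in this thread); the point is only the mechanism: per-offset read/write bitmasks that default to deny:

```python
class ConfigPermitTable:
    """Per-dword permission masks for PCI config space accesses.

    Hypothetical model of the 'echo offset readbits writebits' sysfs
    interface described above; offsets with no entry are denied entirely.
    """

    def __init__(self):
        self.entries = {}  # dword offset -> (readmask, writemask)

    def permit(self, offset, readmask, writemask):
        self.entries[offset] = (readmask, writemask)

    def read(self, offset, raw):
        # Expose only the bits root marked readable; all others read as 0.
        readmask, _ = self.entries.get(offset, (0, 0))
        return raw & readmask

    def write(self, offset, current, requested):
        # Merge only the writable bits into the current value.
        _, writemask = self.entries.get(offset, (0, 0))
        return (current & ~writemask) | (requested & writemask)
```

Root extending the table at run time for a non-compliant device then amounts to one `permit()` call per dword, e.g. for the OHCI 1394 dword mentioned above.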
Fix -mem-path with hugetlbfs
Avi, please apply to both master and uq/master. --- Fallback to qemu_vmalloc in case file_ram_alloc fails. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/exec.c b/exec.c index 467a0e7..7125a93 100644 --- a/exec.c +++ b/exec.c @@ -2821,8 +2821,12 @@ ram_addr_t qemu_ram_alloc(ram_addr_t size) if (mem_path) { #if defined (__linux__) && !defined(TARGET_S390X) new_block->host = file_ram_alloc(size, mem_path); -if (!new_block->host) -exit(1); +if (!new_block->host) { +new_block->host = qemu_vmalloc(size); +#ifdef MADV_MERGEABLE +madvise(new_block->host, size, MADV_MERGEABLE); +#endif +} #else fprintf(stderr, "-mem-path option unsupported\n"); exit(1);
[GIT PULL] first round of vhost-net enhancements for net-next
David, The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Thanks! The following changes since commit 7ef527377b88ff05fb122a47619ea506c631c914: Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (2010-05-02 22:02:06 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost Michael S. Tsirkin (2): tun: add ioctl to modify vnet header size macvtap: add ioctl to modify vnet header size drivers/net/macvtap.c | 31 +++ drivers/net/tun.c | 32 include/linux/if_tun.h |2 ++ 3 files changed, 57 insertions(+), 8 deletions(-) -- MST
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
On 05/02/2010 10:44 AM, Avi Kivity wrote: On 05/02/2010 08:38 PM, Brian Gerst wrote: On Sun, May 2, 2010 at 10:53 AM, Avi Kivity a...@redhat.com wrote: The fpu code currently uses current->thread_info->status & TS_XSAVE as a way to distinguish between XSAVE capable processors and older processors. The decision is not really task specific; instead we use the task status to avoid a global memory reference - the value should be the same across all threads. Eliminate this tie-in into the task structure by using an alternative instruction keyed off the XSAVE cpu feature; this results in shorter and faster code, without introducing a global memory reference. I think you should either just use cpu_has_xsave, or extend this use of alternatives to all cpu features. It doesn't make sense to only do it for xsave. I was trying to avoid a performance regression relative to the current code, as it appears that some care was taken to avoid the memory reference. I agree that it's probably negligible compared to the save/restore code. If the x86 maintainers agree as well, I'll replace it with cpu_has_xsave. I asked Suresh to comment on this, since he wrote the original code. He did confirm that the intent was to avoid a global memory reference. -hpa
[PATCH] IOzone test: Introduce postprocessing module v2
This module contains code to postprocess IOzone data in a convenient way so we can generate performance graphs and condensed data. The graph generation part depends on gnuplot, but if the utility is not present, functionality will gracefully degrade. Use the postprocessing module introduced on the previous patch, use it to analyze results and write performance graphs and performance tables. Also, in order for other tests to be able to use the postprocessing code, added the right __init__.py files, so a simple from autotest_lib.client.tests.iozone import postprocessing will work Note: Martin, as patch will ignore and not create the zero-sized files (high time we move to git), if the changes look good to you I can commit them all at once, making sure all files are created. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/iozone/common.py |8 + client/tests/iozone/iozone.py | 25 ++- client/tests/iozone/postprocessing.py | 487 + 3 files changed, 515 insertions(+), 5 deletions(-) create mode 100644 client/tests/__init__.py create mode 100644 client/tests/iozone/__init__.py create mode 100644 client/tests/iozone/common.py create mode 100755 client/tests/iozone/postprocessing.py diff --git a/client/tests/__init__.py b/client/tests/__init__.py new file mode 100644 index 000..e69de29 diff --git a/client/tests/iozone/__init__.py b/client/tests/iozone/__init__.py new file mode 100644 index 000..e69de29 diff --git a/client/tests/iozone/common.py b/client/tests/iozone/common.py new file mode 100644 index 000..ce78b85 --- /dev/null +++ b/client/tests/iozone/common.py @@ -0,0 +1,8 @@ +import os, sys +dirname = os.path.dirname(sys.modules[__name__].__file__) +client_dir = os.path.abspath(os.path.join(dirname, .., ..)) +sys.path.insert(0, client_dir) +import setup_modules +sys.path.pop(0) +setup_modules.setup(base_path=client_dir, +root_module_name=autotest_lib.client) diff --git a/client/tests/iozone/iozone.py b/client/tests/iozone/iozone.py index 
fa3fba4..03c2c04 100755 --- a/client/tests/iozone/iozone.py +++ b/client/tests/iozone/iozone.py @@ -1,5 +1,6 @@ import os, re from autotest_lib.client.bin import test, utils +import postprocessing class iozone(test.test): @@ -63,17 +64,19 @@ class iozone(test.test): self.results = utils.system_output('%s %s' % (cmd, args)) self.auto_mode = (-a in args) -path = os.path.join(self.resultsdir, 'raw_output_%s' % self.iteration) -raw_output_file = open(path, 'w') -raw_output_file.write(self.results) -raw_output_file.close() +self.results_path = os.path.join(self.resultsdir, + 'raw_output_%s' % self.iteration) +self.analysisdir = os.path.join(self.resultsdir, +'analysis_%s' % self.iteration) + +utils.open_write_close(self.results_path, self.results) def __get_section_name(self, desc): return desc.strip().replace(' ', '_') -def postprocess_iteration(self): +def generate_keyval(self): keylist = {} if self.auto_mode: @@ -150,3 +153,15 @@ class iozone(test.test): keylist[key_name] = result self.write_perf_keyval(keylist) + + +def postprocess_iteration(self): +self.generate_keyval() +if self.auto_mode: +a = postprocessing.IOzoneAnalyzer(list_files=[self.results_path], + output_dir=self.analysisdir) +a.analyze() +p = postprocessing.IOzonePlotter(results_file=self.results_path, + output_dir=self.analysisdir) +p.plot_all() + diff --git a/client/tests/iozone/postprocessing.py b/client/tests/iozone/postprocessing.py new file mode 100755 index 000..c995aea --- /dev/null +++ b/client/tests/iozone/postprocessing.py @@ -0,0 +1,487 @@ +#!/usr/bin/python + +Postprocessing module for IOzone. It is capable to pick results from an +IOzone run, calculate the geometric mean for all throughput results for +a given file size or record size, and then generate a series of 2D and 3D +graphs. The graph generation functionality depends on gnuplot, and if it +is not present, functionality degrates gracefully. 
+ +...@copyright: Red Hat 2010 + +import os, sys, optparse, logging, math, time +import common +from autotest_lib.client.common_lib import logging_config, logging_manager +from autotest_lib.client.common_lib import error +from autotest_lib.client.bin import utils, os_dep + + +_LABELS = ['file_size', 'record_size', 'write', 'rewrite', 'read', 'reread', + 'randread', 'randwrite', 'bkwdread', 'recordrewrite', 'strideread', + 'fwrite', 'frewrite', 'fread', 'freread'] + + +def unique(list): + +Return a list of the elements in list, but without duplicates.
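The core reduction the postprocessing module performs, a geometric mean over all throughput samples for a given file or record size, is simple. A standalone sketch of that reduction (assuming positive KB/s samples; this is an illustration, not the module's actual code):

```python
import math

def geometric_mean(samples):
    """Geometric mean of positive throughput samples (e.g. KB/s).

    Computed in log space, which keeps the product from overflowing
    when there are many samples spanning a large value range.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return math.exp(sum(math.log(s) for s in samples) / len(samples))
```

For example, `geometric_mean([2, 8])` yields 4.0, whereas the arithmetic mean would be 5.0; the geometric mean is less dominated by a few very large throughput outliers.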
Qemu-KVM 0.12.3 and Multipath - Assertion
Hi Qemu/KVM Devel Team, I'm using qemu-kvm 0.12.3 with latest kernel 2.6.33.3. As backend we use open-iSCSI with dm-multipath. Multipath is configured to queue I/O if no path is available. If we create a failure on all paths, qemu starts to consume 100% CPU due to I/O waits, which is OK so far. One odd thing: the monitor interface is not responding any more ... What is a real blocker is that KVM crashes with: kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed. after multipath has reestablished at least one path. Any ideas? I remember this was working with earlier kernel/kvm/qemu versions. Thanks, Peter
[PATCH] KVM test: Add new subtest iozone_windows
Following the new IOzone postprocessing changes, add a new KVM subtest iozone_windows, which takes advantage of the fact that there's a windows build for the test, so we can ship it on winutils.iso and run it, providing this way the ability to track IO performance for windows guests also. The new test imports the postprocessing library directly from iozone, so it can postprocess the results right after the benchmark is finished on the windows guest. I'll update winutils.iso on the download page soon. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/tests/iozone_windows.py | 40 ++ client/tests/kvm/tests_base.cfg.sample |7 - 2 files changed, 46 insertions(+), 1 deletions(-) create mode 100644 client/tests/kvm/tests/iozone_windows.py diff --git a/client/tests/kvm/tests/iozone_windows.py b/client/tests/kvm/tests/iozone_windows.py new file mode 100644 index 000..86ec2c4 --- /dev/null +++ b/client/tests/kvm/tests/iozone_windows.py @@ -0,0 +1,40 @@ +import logging, time, os +from autotest_lib.client.common_lib import error +from autotest_lib.client.bin import utils +from autotest_lib.client.tests.iozone import postprocessing +import kvm_subprocess, kvm_test_utils, kvm_utils + + +def run_iozone_windows(test, params, env): + +Run IOzone for windows on a windows guest: +1) Log into a guest +2) Execute the IOzone test contained in the winutils.iso +3) Get results +4) Postprocess it with the IOzone postprocessing module + +@param test: kvm test object +@param params: Dictionary with the test parameters +@param env: Dictionary with test environment. 
+ +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm")) +session = kvm_test_utils.wait_for_login(vm) +results_path = os.path.join(test.resultsdir, +'raw_output_%s' % test.iteration) +analysisdir = os.path.join(test.resultsdir, 'analysis_%s' % test.iteration) + +# Run IOzone and record its results +c = command = params.get("iozone_cmd") +t = int(params.get("iozone_timeout")) +logging.info("Running IOzone command on guest, timeout %ss", t) +results = session.get_command_output(command=c, timeout=t) +utils.open_write_close(results_path, results) + +# Postprocess the results using the IOzone postprocessing module +logging.info("Iteration succeed, postprocessing") +a = postprocessing.IOzoneAnalyzer(list_files=[results_path], + output_dir=analysisdir) +a.analyze() +p = postprocessing.IOzonePlotter(results_file=results_path, + output_dir=analysisdir) +p.plot_all() diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index f1104ed..e21f3c8 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -214,6 +214,11 @@ variants: dst_rsc_dir = C:\ autoit_entry = C:\autoit\stub\stub.au3 +- iozone_windows: unattended_install +type = iozone_windows +iozone_cmd = D:\IOzone\iozone.exe -a +iozone_timeout = 3600 + - guest_s4: install setup unattended_install type = guest_s4 check_s4_support_cmd = grep -q disk /sys/power/state @@ -417,7 +422,7 @@ variants: variants: # Linux section - @Linux: -no autoit +no autoit iozone_windows shutdown_command = shutdown -h now reboot_command = shutdown -r now status_test_command = echo $? -- 1.7.0.1
Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks.
Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: David Miller da...@davemloft.net Date: Mon, 03 May 2010 15:07:29 -0700 (PDT) From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks. Nevermind, reverted. Do you even compile test what you send to people? drivers/net/macvtap.c: In function ‘macvtap_ioctl’: drivers/net/macvtap.c:713: warning: control reaches end of non-void function You're really batting 1000 today Michael...
Re: [GIT PULL] first round of vhost-net enhancements for net-next
On Mon, May 03, 2010 at 03:08:29PM -0700, David Miller wrote: From: David Miller da...@davemloft.net Date: Mon, 03 May 2010 15:07:29 -0700 (PDT) From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks. Nevermind, reverted. Do you even compile test what you send to people? drivers/net/macvtap.c: In function ‘macvtap_ioctl’: drivers/net/macvtap.c:713: warning: control reaches end of non-void function You're really batting 1000 today Michael... Ouch. Should teach me not to send out stuff after midnight. Sorry about the noise. -- MST
[patch 2/6] remove support for !KVM_CAP_SET_TSS_ADDR
Kernels < 2.6.24 hardcoded slot 0 for the location of VMX rmode TSS pages. Remove support for it. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm-memslot/qemu-kvm.c === --- qemu-kvm-memslot.orig/qemu-kvm.c +++ qemu-kvm-memslot/qemu-kvm.c @@ -157,24 +157,7 @@ static void init_slots(void) static int get_free_slot(kvm_context_t kvm) { -int i; -int tss_ext; - -#if defined(KVM_CAP_SET_TSS_ADDR) && !defined(__s390__) -tss_ext = kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_SET_TSS_ADDR); -#else -tss_ext = 0; -#endif - -/* - * on older kernels where the set tss ioctl is not supported we must save - * slot 0 to hold the extended memory, as the vmx will use the last 3 - * pages of this slot. - */ -if (tss_ext > 0) -i = 0; -else -i = 1; +int i = 0; for (; i < KVM_MAX_NUM_MEM_REGIONS; ++i) if (!slots[i].len) Index: qemu-kvm-memslot/qemu-kvm-x86.c === --- qemu-kvm-memslot.orig/qemu-kvm-x86.c +++ qemu-kvm-memslot/qemu-kvm-x86.c @@ -35,7 +35,6 @@ static int lm_capable_kernel; int kvm_set_tss_addr(kvm_context_t kvm, unsigned long addr) { -#ifdef KVM_CAP_SET_TSS_ADDR int r; /* * Tell fw_cfg to notify the BIOS to reserve the range.
@@ -45,22 +44,16 @@ int kvm_set_tss_addr(kvm_context_t kvm, exit(1); } - r = kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_SET_TSS_ADDR); - if (r > 0) { - r = kvm_vm_ioctl(kvm_state, KVM_SET_TSS_ADDR, addr); - if (r < 0) { - fprintf(stderr, "kvm_set_tss_addr: %m\n"); - return r; - } - return 0; + r = kvm_vm_ioctl(kvm_state, KVM_SET_TSS_ADDR, addr); + if (r < 0) { + fprintf(stderr, "kvm_set_tss_addr: %m\n"); + return r; } -#endif - return -ENOSYS; + return 0; } static int kvm_init_tss(kvm_context_t kvm) { -#ifdef KVM_CAP_SET_TSS_ADDR int r; r = kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_SET_TSS_ADDR); @@ -74,9 +67,9 @@ static int kvm_init_tss(kvm_context_t kv fprintf(stderr, "kvm_init_tss: unable to set tss addr\n"); return r; } - + } else { + fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n"); } -#endif return 0; }
[patch 4/6] remove unused kvm_dirty_bitmap array
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm-memslot/qemu-kvm.c === --- qemu-kvm-memslot.orig/qemu-kvm.c +++ qemu-kvm-memslot/qemu-kvm.c @@ -2154,7 +2154,6 @@ void kvm_set_phys_mem(target_phys_addr_t * dirty pages logging */ /* FIXME: use unsigned long pointer instead of unsigned char */ -unsigned char *kvm_dirty_bitmap = NULL; int kvm_physical_memory_set_dirty_tracking(int enable) { int r = 0; @@ -2163,17 +2162,9 @@ int kvm_physical_memory_set_dirty_tracki return 0; if (enable) { -if (!kvm_dirty_bitmap) { -unsigned bitmap_size = BITMAP_SIZE(phys_ram_size); -kvm_dirty_bitmap = qemu_malloc(bitmap_size); -r = kvm_dirty_pages_log_enable_all(kvm_context); -} +r = kvm_dirty_pages_log_enable_all(kvm_context); } else { -if (kvm_dirty_bitmap) { -r = kvm_dirty_pages_log_reset(kvm_context); -qemu_free(kvm_dirty_bitmap); -kvm_dirty_bitmap = NULL; -} +r = kvm_dirty_pages_log_reset(kvm_context); } return r; }
[patch 5/6] introduce qemu_ram_map
Which allows drivers to register an mmap region into ram block mappings. To be used by device assignment driver. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm/cpu-common.h === --- qemu-kvm.orig/cpu-common.h +++ qemu-kvm/cpu-common.h @@ -40,6 +40,7 @@ static inline void cpu_register_physical } ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr); +ram_addr_t qemu_ram_map(ram_addr_t size, void *host); ram_addr_t qemu_ram_alloc(ram_addr_t); void qemu_ram_free(ram_addr_t addr); /* This should only be used for ram local to a device. */ Index: qemu-kvm/exec.c === --- qemu-kvm.orig/exec.c +++ qemu-kvm/exec.c @@ -2805,6 +2805,34 @@ static void *file_ram_alloc(ram_addr_t m } #endif +ram_addr_t qemu_ram_map(ram_addr_t size, void *host) +{ +RAMBlock *new_block; + +size = TARGET_PAGE_ALIGN(size); +new_block = qemu_malloc(sizeof(*new_block)); + +new_block->host = host; + +new_block->offset = last_ram_offset; +new_block->length = size; + +new_block->next = ram_blocks; +ram_blocks = new_block; + +phys_ram_dirty = qemu_realloc(phys_ram_dirty, +(last_ram_offset + size) >> TARGET_PAGE_BITS); +memset(phys_ram_dirty + (last_ram_offset >> TARGET_PAGE_BITS), + 0xff, size >> TARGET_PAGE_BITS); + +last_ram_offset += size; + +if (kvm_enabled()) +kvm_setup_guest_memory(new_block->host, size); + +return new_block->offset; +} + ram_addr_t qemu_ram_alloc(ram_addr_t size) { RAMBlock *new_block;
[patch 1/6] remove alias support
Aliases were added to workaround kvm's inability to destroy memory regions. This was fixed in 2.6.29, and advertised via KVM_CAP_DESTROY_MEMORY_REGION_WORKS. Also, alias support will be removed from the kernel in July 2010. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm/qemu-kvm.c === --- qemu-kvm.orig/qemu-kvm.c +++ qemu-kvm/qemu-kvm.c @@ -1076,20 +1076,6 @@ int kvm_deassign_pci_device(kvm_context_ } #endif -int kvm_destroy_memory_region_works(kvm_context_t kvm) -{ -int ret = 0; - -#ifdef KVM_CAP_DESTROY_MEMORY_REGION_WORKS -ret = -kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, - KVM_CAP_DESTROY_MEMORY_REGION_WORKS); -if (ret = 0) -ret = 0; -#endif -return ret; -} - int kvm_reinject_control(kvm_context_t kvm, int pit_reinject) { #ifdef KVM_CAP_REINJECT_CONTROL @@ -2055,11 +2041,6 @@ int kvm_main_loop(void) return 0; } -#ifdef TARGET_I386 -static int destroy_region_works = 0; -#endif - - #if !defined(TARGET_I386) int kvm_arch_init_irq_routing(void) { @@ -2071,6 +2052,10 @@ extern int no_hpet; static int kvm_create_context(void) { +static const char upgrade_note[] = +Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n +(see http://sourceforge.net/projects/kvm).\n; + int r; if (!kvm_irqchip) { @@ -2094,9 +2079,16 @@ static int kvm_create_context(void) return -1; } } -#ifdef TARGET_I386 -destroy_region_works = kvm_destroy_memory_region_works(kvm_context); -#endif + +/* There was a nasty bug in kvm-80 that prevents memory slots from being + * destroyed properly. Since we rely on this capability, refuse to work + * with any kernel without this capability. 
*/ +if (!kvm_check_extension(kvm_state, KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) { +fprintf(stderr, +KVM kernel module broken (DESTROY_MEMORY_REGION).\n%s, +upgrade_note); +return -EINVAL; +} r = kvm_arch_init_irq_routing(); if (r 0) { @@ -2134,73 +2126,11 @@ static int kvm_create_context(void) return 0; } -#ifdef TARGET_I386 -static int must_use_aliases_source(target_phys_addr_t addr) -{ -if (destroy_region_works) -return false; -if (addr == 0xa || addr == 0xa8000) -return true; -return false; -} - -static int must_use_aliases_target(target_phys_addr_t addr) -{ -if (destroy_region_works) -return false; -if (addr = 0xe000 addr 0x1ull) -return true; -return false; -} - -static struct mapping { -target_phys_addr_t phys; -ram_addr_t ram; -ram_addr_t len; -} mappings[50]; -static int nr_mappings; - -static struct mapping *find_ram_mapping(ram_addr_t ram_addr) -{ -struct mapping *p; - -for (p = mappings; p mappings + nr_mappings; ++p) { -if (p-ram = ram_addr ram_addr p-ram + p-len) { -return p; -} -} -return NULL; -} - -static struct mapping *find_mapping(target_phys_addr_t start_addr) -{ -struct mapping *p; - -for (p = mappings; p mappings + nr_mappings; ++p) { -if (p-phys = start_addr start_addr p-phys + p-len) { -return p; -} -} -return NULL; -} - -static void drop_mapping(target_phys_addr_t start_addr) -{ -struct mapping *p = find_mapping(start_addr); - -if (p) -*p = mappings[--nr_mappings]; -} -#endif - void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size, ram_addr_t phys_offset) { int r = 0; unsigned long area_flags; -#ifdef TARGET_I386 -struct mapping *p; -#endif if (start_addr + size phys_ram_size) { phys_ram_size = start_addr + size; @@ -2210,19 +2140,13 @@ void kvm_set_phys_mem(target_phys_addr_t area_flags = phys_offset ~TARGET_PAGE_MASK; if (area_flags != IO_MEM_RAM) { -#ifdef TARGET_I386 -if (must_use_aliases_source(start_addr)) { -kvm_destroy_memory_alias(kvm_context, start_addr); -return; -} -if (must_use_aliases_target(start_addr)) -return; 
-#endif while (size 0) { -p = find_mapping(start_addr); -if (p) { -kvm_unregister_memory_area(kvm_context, p-phys, p-len); -drop_mapping(p-phys); +int slot; + +slot = get_slot(start_addr); +if (slot != -1) { +kvm_unregister_memory_area(kvm_context, slots[slot].phys_addr, + slots[slot].len); } start_addr += TARGET_PAGE_SIZE; if (size TARGET_PAGE_SIZE) { @@ -2241,30 +2165,12 @@ void kvm_set_phys_mem(target_phys_addr_t if (area_flags = TLB_MMIO) return; -#ifdef TARGET_I386 -if (must_use_aliases_source(start_addr)) { -p = find_ram_mapping(phys_offset); -
[patch 3/6] remove unused kvm_get_dirty_pages
And make kvm_unregister_memory_area static. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu-kvm-memslot/qemu-kvm.c === --- qemu-kvm-memslot.orig/qemu-kvm.c +++ qemu-kvm-memslot/qemu-kvm.c @@ -639,8 +639,8 @@ void kvm_destroy_phys_mem(kvm_context_t free_slot(memory.slot); } -void kvm_unregister_memory_area(kvm_context_t kvm, uint64_t phys_addr, -unsigned long size) +static void kvm_unregister_memory_area(kvm_context_t kvm, uint64_t phys_addr, + unsigned long size) { int slot = get_container_slot(phys_addr, size); @@ -667,14 +667,6 @@ static int kvm_get_map(kvm_context_t kvm return 0; } -int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf) -{ -int slot; - -slot = get_slot(phys_addr); -return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf); -} - int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr, unsigned long len, void *opaque, int (*cb)(unsigned long start, Index: qemu-kvm-memslot/qemu-kvm.h === --- qemu-kvm-memslot.orig/qemu-kvm.h +++ qemu-kvm-memslot/qemu-kvm.h @@ -381,14 +381,11 @@ void *kvm_create_phys_mem(kvm_context_t, unsigned long len, int log, int writable); void kvm_destroy_phys_mem(kvm_context_t, unsigned long phys_start, unsigned long len); -void kvm_unregister_memory_area(kvm_context_t, uint64_t phys_start, -unsigned long len); int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_start, unsigned long size); int kvm_register_phys_mem(kvm_context_t kvm, unsigned long phys_start, void *userspace_addr, unsigned long len, int log); -int kvm_get_dirty_pages(kvm_context_t, unsigned long phys_addr, void *buf); int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr, unsigned long end_addr, void *opaque, int (*cb)(unsigned long start,
[patch 6/6] use upstream memslot management code
Drop qemu-kvm's implementation in favour of qemu's; they are functionally equivalent.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-kvm/qemu-kvm.c
===
--- qemu-kvm.orig/qemu-kvm.c
+++ qemu-kvm/qemu-kvm.c
@@ -76,21 +76,11 @@ static int io_thread_sigfd = -1;
 
 static CPUState *kvm_debug_cpu_requested;
 
-static uint64_t phys_ram_size;
-
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 /* The list of ioperm_data */
 static QLIST_HEAD(, ioperm_data) ioperm_head;
 #endif
 
-//#define DEBUG_MEMREG
-#ifdef DEBUG_MEMREG
-#define DPRINTF(fmt, args...) \
-    do { fprintf(stderr, "%s:%d " fmt, __func__, __LINE__, ##args); } while (0)
-#else
-#define DPRINTF(fmt, args...) do {} while (0)
-#endif
-
 #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
 
 int kvm_abi = EXPECTED_KVM_API_VERSION;
@@ -137,198 +127,6 @@ static inline void clear_gsi(kvm_context
         DPRINTF("Invalid GSI %u\n", gsi);
 }
 
-struct slot_info {
-    unsigned long phys_addr;
-    unsigned long len;
-    unsigned long userspace_addr;
-    unsigned flags;
-    int logging_count;
-};
-
-struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
-
-static void init_slots(void)
-{
-    int i;
-
-    for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i)
-        slots[i].len = 0;
-}
-
-static int get_free_slot(kvm_context_t kvm)
-{
-    int i = 0;
-
-    for (; i < KVM_MAX_NUM_MEM_REGIONS; ++i)
-        if (!slots[i].len)
-            return i;
-    return -1;
-}
-
-static void register_slot(int slot, unsigned long phys_addr,
-                          unsigned long len, unsigned long userspace_addr,
-                          unsigned flags)
-{
-    slots[slot].phys_addr = phys_addr;
-    slots[slot].len = len;
-    slots[slot].userspace_addr = userspace_addr;
-    slots[slot].flags = flags;
-}
-
-static void free_slot(int slot)
-{
-    slots[slot].len = 0;
-    slots[slot].logging_count = 0;
-}
-
-static int get_slot(unsigned long phys_addr)
-{
-    int i;
-
-    for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) {
-        if (slots[i].len && slots[i].phys_addr <= phys_addr &&
-            (slots[i].phys_addr + slots[i].len - 1) >= phys_addr)
-            return i;
-    }
-    return -1;
-}
-
-/* Returns -1 if this slot is not totally contained on any other,
- * and the number of the slot otherwise */
-static int get_container_slot(uint64_t phys_addr, unsigned long size)
-{
-    int i;
-
-    for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i)
-        if (slots[i].len && slots[i].phys_addr <= phys_addr &&
-            (slots[i].phys_addr + slots[i].len) >= phys_addr + size)
-            return i;
-    return -1;
-}
-
-int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr,
-                             unsigned long size)
-{
-    int slot = get_container_slot(phys_addr, size);
-    if (slot == -1)
-        return 0;
-    return 1;
-}
-
-/*
- * dirty pages logging control
- */
-static int kvm_dirty_pages_log_change(kvm_context_t kvm,
-                                      unsigned long phys_addr, unsigned flags,
-                                      unsigned mask)
-{
-    int r = -1;
-    int slot = get_slot(phys_addr);
-
-    if (slot == -1) {
-        fprintf(stderr, "BUG: %s: invalid parameters\n", __FUNCTION__);
-        return 1;
-    }
-
-    flags = (slots[slot].flags & ~mask) | flags;
-    if (flags == slots[slot].flags)
-        return 0;
-    slots[slot].flags = flags;
-
-    {
-        struct kvm_userspace_memory_region mem = {
-            .slot = slot,
-            .memory_size = slots[slot].len,
-            .guest_phys_addr = slots[slot].phys_addr,
-            .userspace_addr = slots[slot].userspace_addr,
-            .flags = slots[slot].flags,
-        };
-
-
-        DPRINTF("slot %d start %llx len %llx flags %x\n",
-                mem.slot, mem.guest_phys_addr, mem.memory_size, mem.flags);
-        r = kvm_vm_ioctl(kvm_state, KVM_SET_USER_MEMORY_REGION, &mem);
-        if (r < 0)
-            fprintf(stderr, "%s: %m\n", __FUNCTION__);
-    }
-    return r;
-}
-
-static int kvm_dirty_pages_log_change_all(kvm_context_t kvm,
-                                          int (*change)(kvm_context_t kvm,
-                                                        uint64_t start,
-                                                        uint64_t len))
-{
-    int i, r;
-
-    for (i = r = 0; i < KVM_MAX_NUM_MEM_REGIONS && r == 0; i++) {
-        if (slots[i].len)
-            r = change(kvm, slots[i].phys_addr, slots[i].len);
-    }
-    return r;
-}
-
-int kvm_dirty_pages_log_enable_slot(kvm_context_t kvm, uint64_t phys_addr,
-                                    uint64_t len)
-{
-    int slot = get_slot(phys_addr);
-
-    DPRINTF("start %" PRIx64 " len %" PRIx64 "\n", phys_addr, len);
-    if (slot == -1) {
-        fprintf(stderr, "BUG: %s: invalid parameters\n", __func__);
-        return -EINVAL;
-    }
-
-    if (slots[slot].logging_count++)
-        return 0;
-
-    return kvm_dirty_pages_log_change(kvm,
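[Editor's aside: the ALIGN macro retained by the patch above uses the standard power-of-two round-up bit trick. A minimal illustrative sketch (in Python, not part of the patch) of what `ALIGN(x, y) = (((x)+(y)-1) & ~((y)-1))` computes:]

```python
def align(x, y):
    """Round x up to the next multiple of y, where y is a power of two.

    Mirrors the C macro ALIGN(x, y) = (((x) + (y) - 1) & ~((y) - 1)):
    adding y - 1 pushes x past the next boundary, and masking with
    ~(y - 1) clears the low bits, snapping down to that boundary.
    """
    return (x + y - 1) & ~(y - 1)

print(align(5, 4))        # 8: 5 rounded up to the next multiple of 4
print(align(4096, 4096))  # 4096: already-aligned values are unchanged
```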
[patch 0/6] qemu-kvm: use upstream memslot code
See individual patches for details. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IOzone test: Introduce postprocessing module v2
only thing that strikes me is whether the gnuplot support should be abstracted out a bit. See tko/plotgraph.py?

On Mon, May 3, 2010 at 2:52 PM, Lucas Meneghel Rodrigues l...@redhat.com wrote:

This module contains code to postprocess IOzone data in a convenient way so we can generate performance graphs and condensed data. The graph generation part depends on gnuplot, but if the utility is not present, functionality will gracefully degrade.

Use the postprocessing module introduced on the previous patch to analyze results and write performance graphs and performance tables. Also, in order for other tests to be able to use the postprocessing code, added the right __init__.py files, so a simple

    from autotest_lib.client.tests.iozone import postprocessing

will work.

Note: Martin, as patch will ignore and not create the zero-sized files (high time we move to git), if the changes look good to you I can commit them all at once, making sure all files are created.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/iozone/common.py         |    8 +
 client/tests/iozone/iozone.py         |   25 ++-
 client/tests/iozone/postprocessing.py |  487 +
 3 files changed, 515 insertions(+), 5 deletions(-)
 create mode 100644 client/tests/__init__.py
 create mode 100644 client/tests/iozone/__init__.py
 create mode 100644 client/tests/iozone/common.py
 create mode 100755 client/tests/iozone/postprocessing.py

diff --git a/client/tests/__init__.py b/client/tests/__init__.py
new file mode 100644
index 000..e69de29
diff --git a/client/tests/iozone/__init__.py b/client/tests/iozone/__init__.py
new file mode 100644
index 000..e69de29
diff --git a/client/tests/iozone/common.py b/client/tests/iozone/common.py
new file mode 100644
index 000..ce78b85
--- /dev/null
+++ b/client/tests/iozone/common.py
@@ -0,0 +1,8 @@
+import os, sys
+dirname = os.path.dirname(sys.modules[__name__].__file__)
+client_dir = os.path.abspath(os.path.join(dirname, "..", ".."))
+sys.path.insert(0, client_dir)
+import setup_modules
+sys.path.pop(0)
+setup_modules.setup(base_path=client_dir,
+                    root_module_name="autotest_lib.client")
diff --git a/client/tests/iozone/iozone.py b/client/tests/iozone/iozone.py
index fa3fba4..03c2c04 100755
--- a/client/tests/iozone/iozone.py
+++ b/client/tests/iozone/iozone.py
@@ -1,5 +1,6 @@
 import os, re
 from autotest_lib.client.bin import test, utils
+import postprocessing
 
 class iozone(test.test):
@@ -63,17 +64,19 @@ class iozone(test.test):
         self.results = utils.system_output('%s %s' % (cmd, args))
         self.auto_mode = ('-a' in args)
 
-        path = os.path.join(self.resultsdir, 'raw_output_%s' % self.iteration)
-        raw_output_file = open(path, 'w')
-        raw_output_file.write(self.results)
-        raw_output_file.close()
+        self.results_path = os.path.join(self.resultsdir,
+                                         'raw_output_%s' % self.iteration)
+        self.analysisdir = os.path.join(self.resultsdir,
+                                        'analysis_%s' % self.iteration)
+
+        utils.open_write_close(self.results_path, self.results)
 
     def __get_section_name(self, desc):
         return desc.strip().replace(' ', '_')
 
-    def postprocess_iteration(self):
+    def generate_keyval(self):
         keylist = {}
 
         if self.auto_mode:
@@ -150,3 +153,15 @@ class iozone(test.test):
                 keylist[key_name] = result
 
         self.write_perf_keyval(keylist)
+
+
+    def postprocess_iteration(self):
+        self.generate_keyval()
+        if self.auto_mode:
+            a = postprocessing.IOzoneAnalyzer(list_files=[self.results_path],
+                                              output_dir=self.analysisdir)
+            a.analyze()
+            p = postprocessing.IOzonePlotter(results_file=self.results_path,
+                                             output_dir=self.analysisdir)
+            p.plot_all()
+
diff --git a/client/tests/iozone/postprocessing.py b/client/tests/iozone/postprocessing.py
new file mode 100755
index 000..c995aea
--- /dev/null
+++ b/client/tests/iozone/postprocessing.py
@@ -0,0 +1,487 @@
+#!/usr/bin/python
+"""
+Postprocessing module for IOzone. It is capable of picking results from an
+IOzone run, calculating the geometric mean of all throughput results for
+a given file size or record size, and then generating a series of 2D and 3D
+graphs. The graph generation functionality depends on gnuplot; if it
+is not present, functionality degrades gracefully.
+
+@copyright: Red Hat 2010
+"""
+
+import os, sys, optparse, logging, math, time
+import common
+from autotest_lib.client.common_lib import logging_config, logging_manager
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.bin import utils, os_dep
+
+
+_LABELS =
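[Editor's aside: the condensed-data step described above rests on the geometric mean of throughput samples. The full analyzer is not shown in this excerpt; a minimal illustrative sketch of the core calculation (not the module's actual code) might look like:]

```python
import math

def geometric_mean(values):
    """Geometric mean of positive throughput samples.

    Computed as exp(mean(log(v))), which is numerically safer than
    multiplying many large numbers and taking an n-th root. Returns
    None for empty input or any non-positive sample, since log() is
    undefined there.
    """
    if not values or any(v <= 0 for v in values):
        return None
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Throughput samples (e.g. KB/s for one record size across file sizes):
# the geometric mean damps the influence of a single outlier.
print(geometric_mean([100.0, 1000.0]))  # ~316.23, vs arithmetic mean 550
```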
Re: [PATCH] IOzone test: Introduce postprocessing module v2
On Mon, 2010-05-03 at 16:52 -0700, Martin Bligh wrote:

> only thing that strikes me is whether the gnuplot support should be
> abstracted out a bit. See tko/plotgraph.py?

I thought about it. Ideally, we would do all the plotting using a python library, such as matplotlib, which has a decent API. However, I spent quite some time trying to figure out how to draw the surface graphs using matplotlib and in the end, I gave up (3D support in that lib is just starting). There are some other libs, such as mayavi (http://mayavi.sourceforge.net), that I would like to try out in the near future.

Your code in plotgraph.py is aimed at 2D graphs, a good candidate for replacement using matplotlib (their 2D support is excellent). So, instead of spending much time encapsulating gnuplot in a nice API, I'd prefer to have this intermediate work (anyway, it does the job) and, when possible, get back to this subject. What do you think?

[snip: quoted patch]
KVM: x86: properly update ready_for_interrupt_injection
The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt, when irqchips are emulated in qemu. Fix by always updating state before returning to userspace.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..dff08e5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4653,7 +4653,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 	}
 
 	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
-	post_kvm_run_save(vcpu);
 
 	vapic_exit(vcpu);
@@ -4703,6 +4702,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	r = __vcpu_run(vcpu);
 
 out:
+	post_kvm_run_save(vcpu);
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:

> I took a stab at documenting CMD and FLUSH request types in virtio block.
> Christoph, could you look over this please? I note that the interface
> seems full of warts to me, this might be a first step to cleaning them.

ISTR Christoph had withdrawn some patches in this area, and was waiting for him to resubmit?

I've given up on figuring out the block device. What seem to me to be sane semantics along the lines of memory barriers are foreign to disk people: they want (and depend on) flushing everywhere.

For example, tdb transactions do not require a flush, they only require what I would call a barrier: that prior data be written out before any future data. Surely that would be more efficient in general than a flush! In fact, TDB wants only writes to *that file* (and metadata) written out first; it has no ordering issues with other I/O on the same device.

A generic I/O interface would allow you to specify "this request depends on these outstanding requests" and leave it at that. It might have some sync flush command for dumb applications and OSes. The userspace API might not be as precise and only allow such a barrier against all prior writes on this fd.

ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects...

Cheers,
Rusty.
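[Editor's aside: to make the barrier-vs-flush distinction above concrete, with today's POSIX interfaces the only portable way to get the ordering "A must hit disk before B" is a full fsync(), which is strictly stronger (and slower) than the ordering actually needed. A hedged sketch — fsync is real, the barrier primitive being argued for does not exist:]

```python
import os

def write_ordered(path_a, data_a, path_b, data_b):
    """Write data_a durably before data_b is written at all.

    The application's real requirement is only ordering: data_a must
    reach stable storage before data_b. Lacking a "barrier against my
    prior writes" primitive, we fall back to fsync(), which flushes
    everything on the device and blocks until the disk acknowledges.
    """
    fd = os.open(path_a, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data_a)
        os.fsync(fd)  # full flush: stronger than the ordering we need
    finally:
        os.close(fd)
    # Only now may B be issued; with a true barrier primitive this
    # write could have been queued immediately, merely tagged as
    # depending on the write of A.
    with open(path_b, "wb") as f:
        f.write(data_b)
```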
Re: [PATCH] IOzone test: Introduce postprocessing module v2
yup, fair enough. Go ahead and check it in. If we end up doing this in another test, we should make an abstraction.

On Mon, May 3, 2010 at 5:39 PM, Lucas Meneghel Rodrigues l...@redhat.com wrote:

> I thought about it. Ideally, we would do all the plotting using a python
> library, such as matplotlib, which has a decent API. [...] So, instead of
> spending much time encapsulating gnuplot in a nice API, I'd prefer to have
> this intermediate work (anyway, it does the job) and, when possible, get
> back to this subject. What do you think?

[snip: quoted patch]
KVM call agenda for May 4
Please send in any agenda items you are interested in covering. If we have a lack of agenda items I'll cancel the week's call.

thanks,
-chris
Re: Qemu-KVM 0.12.3 and Multipath - Assertion
Hi Peter,

On 03.05.2010 23:26, Peter Lieven wrote:

> Hi Qemu/KVM Devel Team,
>
> i'm using qemu-kvm 0.12.3 with latest Kernel 2.6.33.3. As backend we use
> open-iSCSI with dm-multipath. Multipath is configured to queue i/o if no
> path is available. If we create a failure on all paths, qemu starts to
> consume 100% CPU due to i/o waits, which is ok so far. One odd thing: the
> monitor interface is not responding any more...
>
> What is a real blocker is that KVM crashes with:
>
> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
> Assertion `bmdma->unit != (uint8_t)-1' failed.
>
> after multipath has reestablished at least one path. Any ideas? I remember
> this was working with earlier kernel/kvm/qemu versions.

I have the same issue on my machine, although I am using local storage (LVM or a physical disk) to write my data to. I reported the "Assertion failed" on March 17th to the list. Marcelo and Avi had asked some questions back then, but I don't know if they have come up with a fix for it.

Regards,
André