Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT
On 11/18/10 01:41, Hidetoshi Seto wrote:
> This patch introduce a fallback mechanism for old systems that do not
> support utimensat().  This fix build failure with following warnings:
>
> hw/virtio-9p-local.c: In function 'local_utimensat':
> hw/virtio-9p-local.c:479: warning: implicit declaration of function 'utimensat'
> hw/virtio-9p-local.c:479: warning: nested extern declaration of 'utimensat'
>
> and:
>
> hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
> hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this function)
> hw/virtio-9p.c:1410: error: (Each undeclared identifier is reported only once
> hw/virtio-9p.c:1410: error: for each function it appears in.)
> hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this function)
> hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
> hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this function)
>
> v4:
>  - Use tv_now.tv_usec
>  - Rebased on latest qemu.git
> v3:
>  - Use better alternative handling for UTIME_NOW/OMIT
>  - Move qemu_utimensat() to cutils.c
> V2:
>  - Introduce qemu_utimensat()
>
> Acked-by: Chris Wright chr...@sous-sol.org
> Acked-by: M. Mohan Kumar mo...@in.ibm.com
> Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com

Hi Hidetoshi,

I think the idea of the patch is good, but please move qemu_utimensat()
to oslib-posix.c and provide a wrapper for oslib-win32.c. It is
emulation for a system library function, so it doesn't belong in
cutils.c, but rather in the oslib group.

Thanks,
Jes

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: clear vapic after reset
Clear the vapic address immediately after reset.  This allows dual-boot
guests to work efficiently, and more importantly, works around the bios
using 'rep insb' to read in the option rom and confusing the vapic
machinery.

Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/kvm.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index b7b2430..95e5d02 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -439,8 +439,20 @@ int kvm_arch_init_vcpu(CPUState *env)
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
+static void kvm_clear_vapic(CPUState *env)
+{
+#ifdef KVM_SET_VAPIC_ADDR
+    struct kvm_vapic_addr va = {
+        .vapic_addr = 0,
+    };
+
+    kvm_vcpu_ioctl(env, KVM_SET_VAPIC_ADDR, &va);
+#endif
+}
+
 void kvm_arch_reset_vcpu(CPUState *env)
 {
+    kvm_clear_vapic(env);
     env->exception_injected = -1;
     env->interrupt_injected = -1;
     env->nmi_injected = 0;
-- 
1.7.3.1
Re: [PATCH 1/2] KVM: take kvm_lock for hardware_disable() during cpu hotplug
(2010/11/18 14:45), Zachary Amsden wrote:
> > > No, I believe your patch is correct and the lock should be there.
> > > Did you test with spinlock debugging just to be sure?
> >
> > Sorry but no. I have no experience with cpu hotplug.
> >
> > So I thought it would take too much time to do real test by myself
> > and reported like this this time.
> >
> > Any easy way to test?
>
> Yes, quite easy. Some systems may not let cpu0 go offline, but you can
> manually disable and re-enable the other processors:
>
> [r...@mysore ~]# echo 0 > /sys/devices/system/cpu/cpu1/online
> [r...@mysore ~]# echo 1 > /sys/devices/system/cpu/cpu1/online
>
> Cheers,
> Zach

Thanks a lot!

I tried and got a log like this:

kernel: [ 422.084620] kvm: disabling virtualization on CPU1
kernel: [ 422.085757] CPU 1 is now offline
kernel: [ 422.085766] lockdep: fixing up alternatives.
kernel: [ 422.085780] SMP alternatives: switching to UP code
kernel: [ 472.081069] lockdep: fixing up alternatives.
kernel: [ 472.081080] SMP alternatives: switching to SMP code
kernel: [ 472.099182] Booting Node 0 Processor 1 APIC 0x1
kernel: [ 422.104799] kvm: enabling virtualization on CPU1

Working correctly, I think.

Takuya
Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT
Hello,

On Thursday 18 November 2010 01:41:39, Hidetoshi Seto wrote:
> This patch introduce a fallback mechanism for old systems that do not
> support utimensat().  This fix build failure with following warnings:
>
> +#ifdef CONFIG_UTIMENSAT
> +return utimensat(dirfd, path, times, flags);
> +#else
> +/* Fallback: use utimes() instead of utimensat() */

Since we also had a problem with utimensat() some time ago with Samba
http://lists.samba.org/archive/samba-technical/2010-November/074613.html
I'd like to comment on that:

You have to be careful about compile-time detection and run-time
detection: if you later run your utimensat()-enabled binary on an older
kernel that does not support that syscall, you get -1 as the return
value and errno=ENOSYS.

So even if you detected utimensat() at compile time, please always
provide a fallback for run time.

This is less important for people compiling their own version of kvm,
but is essential for Linux distributions, since people often run newer
kvm versions on older kernels.

Sincerely
Philipp Hahn
-- 
Philipp Hahn           Open Source Software Engineer   h...@univention.de
Univention GmbH        Linux for Your Business     fon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen                fax: +49 421 22 232-99
http://www.univention.de
Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT
(2010/11/18 17:02), Jes Sorensen wrote:
> On 11/18/10 01:41, Hidetoshi Seto wrote:
> > This patch introduce a fallback mechanism for old systems that do not
> > support utimensat().  This fix build failure with following warnings:
> >
> > hw/virtio-9p-local.c: In function 'local_utimensat':
> > hw/virtio-9p-local.c:479: warning: implicit declaration of function 'utimensat'
> > hw/virtio-9p-local.c:479: warning: nested extern declaration of 'utimensat'
> >
> > and:
> >
> > hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
> > hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this function)
> > hw/virtio-9p.c:1410: error: (Each undeclared identifier is reported only once
> > hw/virtio-9p.c:1410: error: for each function it appears in.)
> > hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this function)
> > hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
> > hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this function)
> >
> > v4:
> >  - Use tv_now.tv_usec
> >  - Rebased on latest qemu.git
> > v3:
> >  - Use better alternative handling for UTIME_NOW/OMIT
> >  - Move qemu_utimensat() to cutils.c
> > V2:
> >  - Introduce qemu_utimensat()
> >
> > Acked-by: Chris Wright chr...@sous-sol.org
> > Acked-by: M. Mohan Kumar mo...@in.ibm.com
> > Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
>
> Hi Hidetoshi,
>
> I think the idea of the patch is good, but please move qemu_utimensat()
> to oslib-posix.c and provide a wrapper for oslib-win32.c. It is
> emulation for a system library function, so it doesn't belong in
> cutils.c, but rather in the oslib group.

Unfortunately, I'm not familiar with win32 code, so I have no idea
what the wrapper for win32 should look like...

If someone could kindly tell me about the win32 part, I could update
this patch to v5, but even then I have no test environment for the
new part :-

Could we wait for an incremental patch on top of this v4?
Can somebody help me? Volunteers?

Thanks,
H.Seto
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
> Store irq routing table pointer in the irqfd object,
> and use that to inject MSI directly without bouncing out to
> a kernel thread.
>
> While we touch this structure, rearrange irqfd fields to make fastpath
> better packed for better cache utilization.
>
> Some notes on the design:
> - Use pointer into the rt instead of copying an entry,
>   to make it possible to use rcu, thus side-stepping
>   locking complexities.  We also save some memory this way.

What locking complexity is there with copying entry approach?

> - Old workqueue code is still used for level irqs.
>   I don't think we DTRT with level anyway, however,
>   it seems easier to keep the code around as
>   it has been thought through and debugged, and fix level later than
>   rip out and re-instate it later.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---
>
> The below is compile tested only.  Sending out for early
> flames/feedback.  Please review!
>
>  include/linux/kvm_host.h |    4 ++
>  virt/kvm/eventfd.c       |   81 +++--
>  virt/kvm/irq_comm.c      |    6 ++-
>  3 files changed, 78 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a055742..b6f7047 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
>  				   unsigned long *deliver_bitmask);
>  #endif
>  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
> +		int irq_source_id, int level);
>  void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
>  				   struct kvm_irq_ack_notifier *kian);
> @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
>  void kvm_eventfd_init(struct kvm *kvm);
>  int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
>  void kvm_irqfd_release(struct kvm *kvm);
> +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt);
>  int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
>  
>  #else
> @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
>  }
>  
>  static inline void kvm_irqfd_release(struct kvm *kvm) {}
> +static inline void kvm_irqfd_update(struct kvm *kvm) {}
>  static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  {
>  	return -ENOSYS;
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index c1f1e3c..49c1864 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -44,14 +44,18 @@
>   */
>  
>  struct _irqfd {
> -	struct kvm *kvm;
> -	struct eventfd_ctx *eventfd;
> -	int gsi;
> -	struct list_head list;
> -	poll_table pt;
> -	wait_queue_t wait;
> -	struct work_struct inject;
> -	struct work_struct shutdown;
> +	/* Used for MSI fast-path */
> +	struct kvm *kvm;
> +	wait_queue_t wait;
> +	struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
> +	/* Used for level IRQ fast-path */
> +	int gsi;
> +	struct work_struct inject;
> +	/* Used for setup/shutdown */
> +	struct eventfd_ctx *eventfd;
> +	struct list_head list;
> +	poll_table pt;
> +	struct work_struct shutdown;
>  };
>  
>  static struct workqueue_struct *irqfd_cleanup_wq;
> @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>  {
>  	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
>  	unsigned long flags = (unsigned long)key;
> +	struct kvm_kernel_irq_routing_entry *irq;
>  
> -	if (flags & POLLIN)
> +	if (flags & POLLIN) {
> +		rcu_read_lock();
> +		irq = irqfd->irq_entry;

Why not rcu_dereference()? And why it can't be zero here?

>  		/* An event has been signaled, inject an interrupt */
> -		schedule_work(&irqfd->inject);
> +		if (irq)
> +			kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> +		else
> +			schedule_work(&irqfd->inject);
> +		rcu_read_unlock();
> +	}
>  
>  	if (flags & POLLHUP) {
>  		/* The eventfd is closing, detach from KVM */
> @@ -166,6 +178,7 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
>  static int
>  kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
>  {
> +	struct kvm_irq_routing_table *irq_rt;
>  	struct _irqfd *irqfd, *tmp;
>  	struct file *file = NULL;
>  	struct eventfd_ctx *eventfd = NULL;
> @@ -215,6 +228,10 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
>  		goto fail;
>  	}
>  
> +	rcu_read_lock();
> +	irqfd_update(kvm, irqfd,
Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT
(2010/11/18 17:28), Philipp Hahn wrote:
> Hello,
>
> On Thursday 18 November 2010 01:41:39, Hidetoshi Seto wrote:
> > This patch introduce a fallback mechanism for old systems that do not
> > support utimensat().  This fix build failure with following warnings:
> >
> > +#ifdef CONFIG_UTIMENSAT
> > +return utimensat(dirfd, path, times, flags);
> > +#else
> > +/* Fallback: use utimes() instead of utimensat() */
>
> Since we also had a problem with utimensat() some time ago with Samba
> http://lists.samba.org/archive/samba-technical/2010-November/074613.html
> I'd like to comment on that:
>
> You have to be careful about compile-time detection and run-time
> detection: if you later run your utimensat()-enabled binary on an older
> kernel that does not support that syscall, you get -1 as the return
> value and errno=ENOSYS.
>
> So even if you detected utimensat() at compile time, please always
> provide a fallback for run time.
>
> This is less important for people compiling their own version of kvm,
> but is essential for Linux distributions, since people often run newer
> kvm versions on older kernels.

Hum, you have a good point.
Well, then I'll change it like:

-#ifdef CONFIG_UTIMENSAT
-return utimensat(dirfd, path, times, flags);
-#else
+{
+int ret = utimensat(dirfd, path, times, flags);
+if (ret != -1 || errno != ENOSYS) {
+return ret;
+}
+}

Thanks,
H.Seto
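The run-time fallback discussed in this exchange can be sketched as a
small stand-alone helper. This is only an illustration, not the code
from the patch: sketch_utimens() is a hypothetical name, the sketch
assumes the libc headers declare utimensat() (so a compile-time check
would still be needed on older systems), it always resolves the path
relative to the current directory, and it only handles explicit
timestamps (no UTIME_NOW/UTIME_OMIT handling).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <stddef.h>
#include <sys/stat.h>   /* utimensat() */
#include <sys/time.h>   /* utimes(), struct timeval */
#include <time.h>       /* struct timespec */

/*
 * Hypothetical helper (not from the patch): try utimensat() first and
 * fall back to utimes() only when the running kernel reports ENOSYS.
 * Assumption: 'times' holds explicit timestamps or is NULL.
 */
static int sketch_utimens(const char *path, const struct timespec times[2])
{
    struct timeval tv[2];
    int ret = utimensat(AT_FDCWD, path, times, 0);

    /* Success, or a real error other than "syscall not implemented". */
    if (ret != -1 || errno != ENOSYS) {
        return ret;
    }

    /* Run-time fallback: utimes() only has microsecond resolution. */
    if (times == NULL) {
        return utimes(path, NULL);
    }
    tv[0].tv_sec = times[0].tv_sec;
    tv[0].tv_usec = times[0].tv_nsec / 1000;
    tv[1].tv_sec = times[1].tv_sec;
    tv[1].tv_usec = times[1].tv_nsec / 1000;
    return utimes(path, tv);
}
```

On a kernel that implements the syscall the fallback branch is never
taken, so the extra cost is one errno comparison on the error path.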
Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT
On 11/18/10 09:48, Hidetoshi Seto wrote:
> (2010/11/18 17:02), Jes Sorensen wrote:
> > Hi Hidetoshi,
> >
> > I think the idea of the patch is good, but please move qemu_utimensat()
> > to oslib-posix.c and provide a wrapper for oslib-win32.c. It is
> > emulation for a system library function, so it doesn't belong in
> > cutils.c, but rather in the oslib group.
>
> Unfortunately one fact is that I'm not familiar with win32 codes so I
> don't have any idea how the wrapper for win32 will be...
>
> If someone could kindly tell me about the win32 part, I could update
> this patch to v5, but even though I have no test environment for the
> new part :-
>
> Could we wait an incremental patch on this v4?
> Can somebody help me? Volunteers?

Hi Hidetoshi,

I don't actually know much about win32 myself, the only thing I do is to
try and cross-compile for it using mingw32 to make sure the build
doesn't break.

One option is to leave it open, or put in a dummy wrapper which asserts
in the win32 part of the code, so that someone who is interested in
win32 can fix it up. That should be pretty easy to do, and I think
that's a fine starting point.

Cheers,
Jes
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 11:05:22AM +0200, Gleb Natapov wrote:
> On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
> > Store irq routing table pointer in the irqfd object,
> > and use that to inject MSI directly without bouncing out to
> > a kernel thread.
> >
> > While we touch this structure, rearrange irqfd fields to make fastpath
> > better packed for better cache utilization.
> >
> > Some notes on the design:
> > - Use pointer into the rt instead of copying an entry,
> >   to make it possible to use rcu, thus side-stepping
> >   locking complexities.  We also save some memory this way.
>
> What locking complexity is there with copying entry approach?

Without RCU, we need two locks:
	- irqfd lock to scan the list of irqfds
	- eventfd wqh lock in the irqfd to update the entry

To update all irqfds on the list, the wqh lock would be nested within
the irqfd lock:

	lock(kvm->irqfds.lock)
	list_for_each(irqfd, kvm->irqfds.list)
		lock(irqfd->wqh)
		update(irqfd)
		unlock(irqfd->wqh)
	unlock(kvm->irqfds.lock)

The problem is that the irqfd lock is nested within the wqh lock on the
cleanup (POLLHUP) path, which is the opposite order.

With RCU we just assign and let sync take care of flushing old entries
out.

> > - Old workqueue code is still used for level irqs.
> >   I don't think we DTRT with level anyway, however,
> >   it seems easier to keep the code around as
> >   it has been thought through and debugged, and fix level later than
> >   rip out and re-instate it later.
> >
> > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > ---
> >
> > The below is compile tested only.  Sending out for early
> > flames/feedback.  Please review!
include/linux/kvm_host.h |4 ++ virt/kvm/eventfd.c | 81 +++-- virt/kvm/irq_comm.c |6 ++- 3 files changed, 78 insertions(+), 13 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a055742..b6f7047 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, unsigned long *deliver_bitmask); #endif int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, + int irq_source_id, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} void kvm_eventfd_init(struct kvm *kvm); int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags); void kvm_irqfd_release(struct kvm *kvm); +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt); int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args); #else @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) } static inline void kvm_irqfd_release(struct kvm *kvm) {} +static inline void kvm_irqfd_update(struct kvm *kvm) {} static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index c1f1e3c..49c1864 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -44,14 +44,18 @@ */ struct _irqfd { - struct kvm *kvm; - struct eventfd_ctx *eventfd; - int gsi; - struct list_head list; - poll_tablept; - wait_queue_t wait; - struct work_structinject; - struct work_structshutdown; + /* Used for MSI fast-path */ + struct kvm *kvm; + wait_queue_t wait; + struct kvm_kernel_irq_routing_entry __rcu *irq_entry; + /* Used for level IRQ fast-path */ + int gsi; + struct work_struct 
inject; + /* Used for setup/shutdown */ + struct eventfd_ctx *eventfd; + struct list_head list; + poll_table pt; + struct work_struct shutdown; }; static struct workqueue_struct *irqfd_cleanup_wq; @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) { struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); unsigned long flags = (unsigned long)key; + struct kvm_kernel_irq_routing_entry *irq; - if (flags POLLIN) + if (flags POLLIN) { + rcu_read_lock(); + irq = irqfd-irq_entry; Why not rcu_dereference()? And why it can't be zero here? /* An event has been signaled, inject an interrupt */ - schedule_work(irqfd-inject); + if (irq) + kvm_set_msi(irq, irqfd-kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); + else +
Re: 2.6.37-rc2 after KVM shutdown - unregister_netdevice: waiting for vmtst01eth0 to become free. Usage count = 1
On Thursday 18 November 2010 at 07:28 +0100, Nikola Ciprich wrote:
> > Yep, this is a known problem, thanks !
> >
> > fix is there : http://patchwork.ozlabs.org/patch/71354/
>
> Thanks Eric, this indeed fixes the problem..
> I noticed the fix didn't make it to 2.6.37-rc2-git3 though, maybe it
> just got omitted?
> anyways, thanks for help!
> n.

It's in David Miller's net-2.6 tree (all pending network patches for
the current linux-2.6 version), so it'll be included the next time David
pushes his tree to Linus, don't worry ;)
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 11:16:02AM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 18, 2010 at 11:05:22AM +0200, Gleb Natapov wrote:
> > On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
> > > Store irq routing table pointer in the irqfd object,
> > > and use that to inject MSI directly without bouncing out to
> > > a kernel thread.
> > >
> > > While we touch this structure, rearrange irqfd fields to make fastpath
> > > better packed for better cache utilization.
> > >
> > > Some notes on the design:
> > > - Use pointer into the rt instead of copying an entry,
> > >   to make it possible to use rcu, thus side-stepping
> > >   locking complexities.  We also save some memory this way.
> >
> > What locking complexity is there with copying entry approach?
>
> Without RCU, we need two locks:
> 	- irqfd lock to scan the list of irqfds
> 	- eventfd wqh lock in the irqfd to update the entry
>
> To update all irqfds on list, wqh lock would be nested within irqfd lock.
>
> 	lock(kvm->irqfds.lock)
> 	list_for_each(irqfd, kvm->irqfds.list)
> 		lock(irqfd->wqh)
> 		update(irqfd)
> 		unlock(irqfd->wqh)
> 	unlock(kvm->irqfds.lock)
>
> Problem is, irqfd is nested within wqh for cleanup (POLLHUP) path.
>
> With RCU we do assign and let sync take care of flushing old entries out.

Makes sense. What about the other comments? :)

> > > - Old workqueue code is still used for level irqs.
> > >   I don't think we DTRT with level anyway, however,
> > >   it seems easier to keep the code around as
> > >   it has been thought through and debugged, and fix level later than
> > >   rip out and re-instate it later.
> > >
> > > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > > ---
> > >
> > > The below is compile tested only.  Sending out for early
> > > flames/feedback.  Please review!
include/linux/kvm_host.h |4 ++ virt/kvm/eventfd.c | 81 +++-- virt/kvm/irq_comm.c |6 ++- 3 files changed, 78 insertions(+), 13 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a055742..b6f7047 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, unsigned long *deliver_bitmask); #endif int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, + int irq_source_id, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} void kvm_eventfd_init(struct kvm *kvm); int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags); void kvm_irqfd_release(struct kvm *kvm); +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt); int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args); #else @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) } static inline void kvm_irqfd_release(struct kvm *kvm) {} +static inline void kvm_irqfd_update(struct kvm *kvm) {} static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index c1f1e3c..49c1864 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -44,14 +44,18 @@ */ struct _irqfd { - struct kvm *kvm; - struct eventfd_ctx *eventfd; - int gsi; - struct list_head list; - poll_tablept; - wait_queue_t wait; - struct work_structinject; - struct work_structshutdown; + /* Used for MSI fast-path */ + struct kvm *kvm; + wait_queue_t wait; + struct kvm_kernel_irq_routing_entry __rcu *irq_entry; + /* Used for level IRQ fast-path */ + int gsi; + struct work_struct 
inject; + /* Used for setup/shutdown */ + struct eventfd_ctx *eventfd; + struct list_head list; + poll_table pt; + struct work_struct shutdown; }; static struct workqueue_struct *irqfd_cleanup_wq; @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) { struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); unsigned long flags = (unsigned long)key; + struct kvm_kernel_irq_routing_entry *irq; - if (flags POLLIN) + if (flags POLLIN) { + rcu_read_lock(); + irq = irqfd-irq_entry; Why not rcu_dereference()? And why it can't be zero here? /* An event has been signaled, inject an interrupt */ -
Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
On 11/18/2010 03:58 AM, Sheng Yang wrote:
> On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
> > On 11/15/2010 11:15 AM, Sheng Yang wrote:
> > > This patch enable per-vector mask for assigned devices using MSI-X.
> > >
> > > This patch provided two new APIs: one is for guest to specific device's
> > > MSI-X table address in MMIO, the other is for userspace to get
> > > information about mask bit.
> > >
> > > All the mask bit operation are kept in kernel, in order to accelerate.
> > > Userspace shouldn't access the device MMIO directly for the information,
> > > instead it should uses provided API to do so.
> > >
> > > Signed-off-by: Sheng Yang sh...@linux.intel.com
> > > ---
> > >  arch/x86/kvm/x86.c       |    1 +
> > >  include/linux/kvm.h      |   32 +
> > >  include/linux/kvm_host.h |    5 +
> > >  virt/kvm/assigned-dev.c  |  318 +-
> > >  4 files changed, 355 insertions(+), 1 deletions(-)
> >
> > Documentation?
>
> For we are keeping changing the API for last several versions, I'd like
> to settle down the API first. Would bring back the document after API
> was agreed.

Maybe for APIs we should start with only the documentation patch, agree
on that, and move on to the implementation.

> > What if it's a 64-bit write on a 32-bit host?
>
> In fact we haven't support QWORD(64bit) accessing now. The reason is we
> haven't seen any OS is using it in this way now, so I think we can leave
> it later.
>
> Also seems QEmu doesn't got the way to handle 64bit MMIO.

There's a difference, if the API doesn't support it, we can't add it
later without changing both kernel and userspace.

> > That's not very good.  We should do the entire thing in the kernel or
> > in userspace.  We can have a new EXIT_REASON to let userspace know an
> > msix entry changed, and it should read it from the kernel.
>
> If you look it in this way:
> 1. Mask bit owned by kernel.
> 2. Routing owned by userspace.
> 3. Read the routing in kernel is an speed up for normal operation -
>    because kernel can read from them.
>
> So I think the logic here is clear to understand.

Still, it's complicated and the state is split across multiple
components.

> But if we can modify the routing in kernel, it would be raise some sync
> issues due to both kernel and userspace own routing. So maybe the better
> solution is move the routing to kernel.

That may work, but I don't think we can do this for vfio.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
On 11/18/2010 04:22 AM, Sheng Yang wrote:
> On Wednesday 17 November 2010 22:01:41 Avi Kivity wrote:
> > On 11/15/2010 11:15 AM, Sheng Yang wrote:
> > > We need to query the entry later.
> > >
> > > +int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
> > > +		struct kvm_kernel_irq_routing_entry *entry)
> > > +{
> > > +	int count = 0;
> > > +	struct kvm_kernel_irq_routing_entry *ei = NULL;
> > > +	struct kvm_irq_routing_table *irq_rt;
> > > +	struct hlist_node *n;
> > > +
> > > +	rcu_read_lock();
> > > +	irq_rt = rcu_dereference(kvm->irq_routing);
> > > +	if (gsi < irq_rt->nr_rt_entries)
> > > +		hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
> > > +			count++;
> > > +	if (count == 1)
> > > +		*entry = *ei;
> > > +	rcu_read_unlock();
> > > +
> > > +	return (count != 1);
> > > +}
> > > +
> >
> > Not good form to rely on ei being valid after the loop.
> >
> > I guess this is only useful for msi?  Need to document it.
>
> May can be used for others later, it's somehow generic. Where should I
> document it?

Non-msi interrupts (wires) can be wired to more than one interrupt line
(and often are - pic/ioapic).

You can document it by adding _msi to the name.

> > *entry may be stale after rcu_read_unlock().  Is this a problem?
>
> I suppose not. All MSI-X MMIO accessing would be executed without delay,
> so no re-order issue would happen. If the guest is reading and writing
> the field at the same time(from two cpus), it should got some kinds of
> sync method for itself - or it may not care what's the reading
> result(like the one after msix_mask_irq()).

I guess so.  Michael/Alex?

-- 
error compiling committee.c: too many arguments to function
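The remark about not relying on the loop cursor after the loop can be
illustrated outside the kernel. The following is a minimal user-space
sketch, assuming a plain singly linked list instead of the kernel hlist;
struct route_entry and get_unique_entry() are hypothetical names used
only for this illustration. The match is remembered in a separate
pointer inside the loop, so nothing depends on where the cursor ends up.

```c
#include <stddef.h>

/* Hypothetical stand-in for a routing entry; not the kernel structure. */
struct route_entry {
    int gsi;
    int vector;
    struct route_entry *next;
};

/*
 * Copy the entry for 'gsi' into *out if exactly one entry matches.
 * Returns 0 on success and -1 if there are zero or several matches.
 * The match is stored in 'found' while iterating, so the loop cursor
 * is never used after the loop ends.
 */
static int get_unique_entry(const struct route_entry *head, int gsi,
                            struct route_entry *out)
{
    const struct route_entry *found = NULL;
    const struct route_entry *e;
    int count = 0;

    for (e = head; e != NULL; e = e->next) {
        if (e->gsi == gsi) {
            found = e;
            count++;
        }
    }
    if (count != 1) {
        return -1;
    }
    *out = *found;
    out->next = NULL;   /* the copy must not point into the list */
    return 0;
}
```

As in the discussion, the copy is only a snapshot: it can be stale as
soon as the list changes, and whether that matters depends on the
caller.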
Re: [PATCH v2 1/6] KVM: MMU: fix forgot flush vcpu tlbs
On 11/18/2010 09:17 AM, Xiao Guangrong wrote:
> On 11/18/2010 01:36 AM, Marcelo Tosatti wrote:
> > > I don't think we need to flush immediately; set a tlb dirty bit
> > > somewhere that is cleared when we flush the tlb.
> > > kvm_mmu_notifier_invalidate_page() can consult the bit and force a
> > > flush if set.
> >
> > Yep.
>
> Great, i'll do it in the v3.
>
> Do we need a simple bug fix patch(which immediately flush tlbs) for
> backport first?

Oh yes.  Simple fix first, clever ideas later (which will likely need
to be fixed anyway).

-- 
error compiling committee.c: too many arguments to function
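The deferred-flush idea quoted above (set a dirty bit instead of
flushing at once, clear it on flush, and let the invalidate path force
a flush when the bit is set) can be modelled in a few lines. This is a
single-threaded toy model, not KVM code: struct mmu_sketch and all
function names are hypothetical, and a real implementation would need
proper synchronization between vcpus and the notifier.

```c
#include <stdbool.h>

/* Toy model of the state involved; all names are hypothetical. */
struct mmu_sketch {
    bool tlb_dirty;           /* a flush was skipped and is still owed */
    unsigned int flush_count; /* how many flushes were actually issued */
};

/* Called where the simple fix would flush immediately. */
static void mark_tlb_dirty(struct mmu_sketch *m)
{
    m->tlb_dirty = true;
}

/* Any real flush also pays off the owed one. */
static void flush_tlb(struct mmu_sketch *m)
{
    m->flush_count++;
    m->tlb_dirty = false;
}

/* Invalidate path: flush if needed for this page or if one is owed. */
static void on_invalidate_page(struct mmu_sketch *m, bool need_flush)
{
    if (need_flush || m->tlb_dirty) {
        flush_tlb(m);
    }
}
```

The point of the pattern is that several deferred requests collapse
into one flush, at the price of having to check the bit on every path
that depends on the TLB being clean.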
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 11:05:22AM +0200, Gleb Natapov wrote:
> On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
> > Store irq routing table pointer in the irqfd object,
> > and use that to inject MSI directly without bouncing out to
> > a kernel thread.
> >
> > While we touch this structure, rearrange irqfd fields to make fastpath
> > better packed for better cache utilization.
> >
> > Some notes on the design:
> > - Use pointer into the rt instead of copying an entry,
> >   to make it possible to use rcu, thus side-stepping
> >   locking complexities.  We also save some memory this way.
>
> What locking complexity is there with copying entry approach?
>
> > - Old workqueue code is still used for level irqs.
> >   I don't think we DTRT with level anyway, however,
> >   it seems easier to keep the code around as
> >   it has been thought through and debugged, and fix level later than
> >   rip out and re-instate it later.
> >
> > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > ---
> >
> > The below is compile tested only.  Sending out for early
> > flames/feedback.  Please review!
include/linux/kvm_host.h |4 ++ virt/kvm/eventfd.c | 81 +++-- virt/kvm/irq_comm.c |6 ++- 3 files changed, 78 insertions(+), 13 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a055742..b6f7047 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, unsigned long *deliver_bitmask); #endif int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, + int irq_source_id, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} void kvm_eventfd_init(struct kvm *kvm); int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags); void kvm_irqfd_release(struct kvm *kvm); +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt); int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args); #else @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) } static inline void kvm_irqfd_release(struct kvm *kvm) {} +static inline void kvm_irqfd_update(struct kvm *kvm) {} static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index c1f1e3c..49c1864 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -44,14 +44,18 @@ */ struct _irqfd { - struct kvm *kvm; - struct eventfd_ctx *eventfd; - int gsi; - struct list_head list; - poll_tablept; - wait_queue_t wait; - struct work_structinject; - struct work_structshutdown; + /* Used for MSI fast-path */ + struct kvm *kvm; + wait_queue_t wait; + struct kvm_kernel_irq_routing_entry __rcu *irq_entry; + /* Used for level IRQ fast-path */ + int gsi; + struct work_struct 
inject; + /* Used for setup/shutdown */ + struct eventfd_ctx *eventfd; + struct list_head list; + poll_table pt; + struct work_struct shutdown; }; static struct workqueue_struct *irqfd_cleanup_wq; @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) { struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); unsigned long flags = (unsigned long)key; + struct kvm_kernel_irq_routing_entry *irq; - if (flags POLLIN) + if (flags POLLIN) { + rcu_read_lock(); + irq = irqfd-irq_entry; Why not rcu_dereference()? Of course. Good catch, thanks. And why it can't be zero here? It can, I check below. /* An event has been signaled, inject an interrupt */ - schedule_work(irqfd-inject); + if (irq) + kvm_set_msi(irq, irqfd-kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1); + else + schedule_work(irqfd-inject); + rcu_read_unlock(); + } if (flags POLLHUP) { /* The eventfd is closing, detach from KVM */ @@ -166,6 +178,7 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh, static int kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi) { + struct kvm_irq_routing_table *irq_rt; struct _irqfd *irqfd, *tmp; struct file *file = NULL; struct eventfd_ctx *eventfd = NULL; @@ -215,6 +228,10 @@
Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
On Thu, Nov 18, 2010 at 11:28:02AM +0200, Avi Kivity wrote: On 11/18/2010 03:58 AM, Sheng Yang wrote: On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote: On 11/15/2010 11:15 AM, Sheng Yang wrote: This patch enable per-vector mask for assigned devices using MSI-X. This patch provided two new APIs: one is for guest to specific device's MSI-X table address in MMIO, the other is for userspace to get information about mask bit. All the mask bit operation are kept in kernel, in order to accelerate. Userspace shouldn't access the device MMIO directly for the information, instead it should uses provided API to do so. Signed-off-by: Sheng Yangsh...@linux.intel.com --- arch/x86/kvm/x86.c |1 + include/linux/kvm.h | 32 + include/linux/kvm_host.h |5 + virt/kvm/assigned-dev.c | 318 +- 4 files changed, 355 insertions(+), 1 deletions(-) Documentation? For we are keeping changing the API for last several versions, I'd like to settle down the API first. Would bring back the document after API was agreed. Maybe for APIs we should start with only the documentation patch, agree on that, and move on to the implementation. What if it's a 64-bit write on a 32-bit host? In fact we haven't support QWORD(64bit) accessing now. The reason is we haven't seen any OS is using it in this way now, so I think we can leave it later. Also seems QEmu doesn't got the way to handle 64bit MMIO. There's a difference, if the API doesn't support it, we can't add it later without changing both kernel and userspace. That's not very good. We should do the entire thing in the kernel or in userspace. We can have a new EXIT_REASON to let userspace know an msix entry changed, and it should read it from the kernel. If you look it in this way: 1. Mask bit owned by kernel. 2. Routing owned by userspace. 3. Read the routing in kernel is an speed up for normal operation - because kernel can read from them. So I think the logic here is clear to understand. 
Still, it's complicated and the state is split across multiple components. But if we can modify the routing in kernel, it would raise some sync issues because both kernel and userspace would own the routing. So maybe the better solution is to move the routing to the kernel. That may work, but I don't think we can do this for vfio. Actually, if done right it might work for VFIO: we would need 2 eventfds to notify it that it has to mask/unmask entries. The interface would need to be careful to keep programming of the guest side emulation and the masking in the backend device completely separate. -- error compiling committee.c: too many arguments to function
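For illustration only, the "2 eventfds" idea could look roughly like the userspace sketch below: one eventfd signals "mask", the other "unmask". Only eventfd(2), read(2), write(2) and close(2) are real interfaces here; struct mask_channels and the helper names are hypothetical and are not part of any posted patch or of VFIO.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of notification channels: one for "mask", one for "unmask". */
struct mask_channels {
        int mask_fd;
        int unmask_fd;
};

/* Create both eventfds in non-blocking mode; returns 0 on success, -1 on error. */
static int channels_open(struct mask_channels *c)
{
        c->mask_fd = eventfd(0, EFD_NONBLOCK);
        if (c->mask_fd < 0)
                return -1;
        c->unmask_fd = eventfd(0, EFD_NONBLOCK);
        if (c->unmask_fd < 0) {
                close(c->mask_fd);
                return -1;
        }
        return 0;
}

/* Producer side: add 1 to the eventfd counter. */
static int channel_signal(int fd)
{
        uint64_t one = 1;

        return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consumer side: return the number of pending signals and reset the
 * counter; returns 0 when nothing is pending (read fails with EAGAIN). */
static uint64_t channel_drain(int fd)
{
        uint64_t n = 0;

        if (read(fd, &n, sizeof(n)) != (ssize_t)sizeof(n))
                return 0;
        return n;
}
```

A caller would run channels_open() once, let the emulation side call channel_signal() on the matching fd, and let the backend poll both fds and call channel_drain(). Keeping two separate fds is one way to keep the mask and unmask paths independent, as the message above asks for.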
Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote: *entry may be stale after rcu_read_unlock(). Is this a problem? I suppose not. All MSI-X MMIO accesses would be executed without delay, so no reordering issue would happen. If the guest is reading and writing the field at the same time (from two cpus), it should have some kind of sync method of its own - or it may not care what the read result is (like the one after msix_mask_irq()). I guess so. Michael/Alex? This is kvm_get_irq_routing_entry which is used for table reads, correct? Actually, the pci read *is* the sync method that guests use, they rely on reads to flush out all previous writes. -- error compiling committee.c: too many arguments to function
Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
On Wed, Nov 17, 2010 at 09:29:22AM +0800, Sheng Yang wrote:
+#define KVM_MSIX_TYPE_ASSIGNED_DEV 1
+
+#define KVM_MSIX_FLAG_MASKBIT (1 << 0)
+#define KVM_MSIX_FLAG_QUERY_MASKBIT (1 << 0)
+
+struct kvm_msix_entry {
+        __u32 id;
+        __u32 type;

Is type really necessary? Will it ever differ from KVM_MSIX_TYPE_ASSIGNED_DEV?

This is the suggestion from Michael. He wants it to be reused by emulated/pv devices. So I added the type field here.

Maybe id field can be reused for this somehow.

-- MST
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On 11/18/2010 12:12 AM, Michael S. Tsirkin wrote:
Store irq routing table pointer in the irqfd object, and use that to inject MSI directly without bouncing out to a kernel thread. While we touch this structure, rearrange irqfd fields to make fastpath better packed for better cache utilization.

Some notes on the design:
- Use pointer into the rt instead of copying an entry, to make it possible to use rcu, thus side-stepping locking complexities. We also save some memory this way.
- Old workqueue code is still used for level irqs. I don't think we DTRT with level anyway, however, it seems easier to keep the code around as it has been thought through and debugged, and fix level later than rip out and re-instate it later.

@@ -166,6 +178,7 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 static int
 kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
+        struct kvm_irq_routing_table *irq_rt;
         struct _irqfd *irqfd, *tmp;
         struct file *file = NULL;
         struct eventfd_ctx *eventfd = NULL;
@@ -215,6 +228,10 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
                 goto fail;
         }
+        rcu_read_lock();
+        irqfd_update(kvm, irqfd, rcu_dereference(kvm->irq_routing));
+        rcu_read_unlock();

Wow, complicated. rcu_read_lock() protects kvm->irq_routing, while we're in the update side of rcu-protected irqfd->irq_entry. A comment please.

The rest looks good, it's nice we finally got the irq injection path so streamlined.

-- error compiling committee.c: too many arguments to function
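The reader/updater split pointed out above can be sketched in userspace with C11 atomics. This is only an analogy under stated assumptions: a release store stands in for rcu_assign_pointer() and an acquire load for rcu_dereference(). There is no grace period or reclamation here, so it is not a replacement for kernel RCU. struct route_entry, struct fake_irqfd and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing entry; not the kernel structure. */
struct route_entry {
        int is_msi;
        int vector;
};

/* Hypothetical stand-in for struct _irqfd: holds one published pointer. */
struct fake_irqfd {
        _Atomic(struct route_entry *) entry;
};

/* Update side: publish a new entry (or NULL). The release store makes
 * the entry's contents visible before the pointer itself, which is the
 * role rcu_assign_pointer() plays in the patch. */
static void publish_entry(struct fake_irqfd *fd, struct route_entry *e)
{
        atomic_store_explicit(&fd->entry, e, memory_order_release);
}

/* Read side: load the pointer once, then test it. Returns the vector on
 * the fast path, or -1 when the caller must fall back to the slow path
 * (the schedule_work() branch in the patch). */
static int inject(struct fake_irqfd *fd)
{
        struct route_entry *e =
                atomic_load_explicit(&fd->entry, memory_order_acquire);

        if (e)
                return e->vector;
        return -1;
}
```

The point of loading the pointer exactly once is the same as in the review comment: the reader must test and use the same snapshot, because the updater may swap the pointer at any time.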
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On 11/01/2010 10:14 AM, Alex Williamson wrote:
Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/pc.c | 12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..0ea6d10 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(NULL, "pc.ram",
                               below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa0000, ram_addr);
-    cpu_register_physical_memory(0x100000,
-                 below_4g_mem_size - 0x100000,
-                 ram_addr + 0x100000);
+
+    qemu_ram_register(0, 0xa0000, ram_addr);
+    qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
+                      ram_addr + 0x100000);
 #if TARGET_PHYS_ADDR_BITS > 32
     if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
+        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
+                          ram_addr + below_4g_mem_size);
     }

Take a look at the memory shadowing in the i440fx. The regions of memory in the BIOS area can temporarily become RAM. That's because there is normally RAM backing this space but the memory controller redirects writes to the ROM space.

Not sure the best way to handle this, but the basic concept is, RAM always exists but if a device tries to access it, it may or may not be accessible as RAM at any given point in time.

Regards,

Anthony Liguori

 #endif
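A toy model of the shadowing concept described above, for illustration only: the RAM backing always exists, and two flags decide whether reads and writes reach it. The structure and helper names are hypothetical and are not QEMU code. The separate read/write enables are an assumption loosely modelled on how chipset shadow controls are usually described, and dropping writes that miss RAM is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SHADOW_SIZE = 16 };      /* tiny region, enough for a demo */

/* Hypothetical shadowed region: RAM is always allocated, ROM sits on top. */
struct shadow_region {
        uint8_t ram[SHADOW_SIZE];
        uint8_t rom[SHADOW_SIZE];
        int read_ram;           /* non-zero: reads are served from RAM */
        int write_ram;          /* non-zero: writes land in RAM */
};

/* Reads come from RAM or ROM depending on the current routing.
 * The caller keeps off below SHADOW_SIZE. */
static uint8_t region_read(const struct shadow_region *r, size_t off)
{
        return r->read_ram ? r->ram[off] : r->rom[off];
}

/* Writes reach RAM only while write_ram is set; otherwise this model
 * discards them, as a write aimed at ROM would have no effect. */
static void region_write(struct shadow_region *r, size_t off, uint8_t v)
{
        if (r->write_ram)
                r->ram[off] = v;
}
```

Firmware-style shadowing in this model is: set write_ram, copy each byte with region_write(r, i, region_read(r, i)), then set read_ram and clear write_ram. Any RAM registration API that treats the region as plain RAM would need some way to express these intermediate states.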
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 11:34:26AM +0200, Michael S. Tsirkin wrote:
@@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
         struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
         unsigned long flags = (unsigned long)key;
+        struct kvm_kernel_irq_routing_entry *irq;
-        if (flags & POLLIN)
+        if (flags & POLLIN) {
+                rcu_read_lock();
+                irq = irqfd->irq_entry;

Why not rcu_dereference()?

Of course. Good catch, thanks.

And why can't it be zero here?

It can, I check below.

Yeah, missed that. Thanks.

                 /* An event has been signaled, inject an interrupt */
-                schedule_work(&irqfd->inject);
+                if (irq)
+                        kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+                else
+                        schedule_work(&irqfd->inject);
+                rcu_read_unlock();
+        }

-- Gleb.
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote: 2010/11/16 Gleb Natapov g...@redhat.com: On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote: Perhaps the FW path should use device class names if no name is specified. What do you mean by device class name. We can do something like this: if (dev-child_bus.lh_first) return dev-child_bus.lh_first-info-name; i.e if there is child bus use its bus name as fw name. This will make all pci devices to have pci as fw name automatically. The problem is that theoretically same device can provide different buses. I meant PCI class name, like display for display controllers, network for NICs etc. That is what my pci bus related patch is doing already. I'll try Sparc32 to see how this fits there. Except bootindex is not implemented for SCSI. Will look into adding it. Thanks. The bootindex on Sparc32 looks like this: bootindex /e...@7880/d...@1,0 /ether...@/ethernet-...@0 For arches other then x86 there is a lot of work left to be done :) For starter exotic sparc buses should get their own get_fw_dev_path() implementation. I don't think I got Lance setup right. OF paths for the devices would be: /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0 If qdev hierarchy does not correspond to real HW there is no much we can do expect for fixing qdev. The logic for ESP is that ESP (registers at 0x7880, slot offset 0x88) is handled by the DMA controller (registers at 0x7840, slot offset 0x84), they are in a SBus slot #5, and SBus (registers at 0x10001000) is in turn handled by IOMMU (registers at 0x1000). Lance should be handled the same way. This hierarchy is partly known by QEMU because DMA accesses use this flow, but not otherwise. There is no concept of SBus slots, DMA talks to IOMMU directly. Though in this case both ESP, Lance and their DMA controllers are on board devices in a MACIO chip. It may be possible to add the hierarchy information at each stage. 
It should also be possible for BIOS to determine the device just from the physical address if we ignored OF compatibility. It would be nice to be OF compatible at least at some level. Of course the OF spec is not strict enough to have two different implementations produce exactly the same device path that can be compared by strcmp. Can we apply the series now? At least for x86 it provides useful paths and work can continue for other arches by interested parties. -- Gleb.
[ kvm-Bugs-1841658 ] OpenSolaris 64bit panic with kvm-54
Bugs item #1841658, was opened at 2007-11-30 13:11 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1841658group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Closed Resolution: Works For Me Priority: 3 Private: No Submitted By: Carlo Marcelo Arenas Belon (carenas) Assigned to: Nobody/Anonymous (nobody) Summary: OpenSolaris 64bit panic with kvm-54 Initial Comment: Wouldn't mark it as a regression per-se as vanilla kvm-53 wouldn't work (because of the need for IDE patches to get it to run/install), but vanilla kvm-54 or kvm-54 + the same patches added to kvm-53 and including pre-kvm-55 patches like 71be592a14aa8d127315b2c47bf83cc0d810a341 wouldn't work. The panic is observed in kvm-54 (--no-kvm runs ok, and --no-kvm-irqchip doesn't help) while running nexenta OpenSolaris alpha 7 or beta 1 (other OpenSolaris distributions most likely affected as well) and with the following trace : panic[cpu0]/thread=fffec2de2260: BAD TRAP: type=e (#pf Page fault) rp=ff0001735f30 addr=0 occurred in module unix due to a NULL pointer dereference dbus: #pf Page fault Bad kernel fault at addr=0x0 pid=278, pc=0xfb83c189, sp=0xff0001736028, eflags=0x10246 cr0: 80050033pg,wp,ne,et,mp,pe cr4: 6b8xmme,fxsr,pge,pae,pse,de cr2: 0 cr3: 7dc4000 cr8: 0 rdi:0 rsi: fffec0025630 rdx: fffec2de2260 rcx:1 r8: fffec0025630 r9:3 rax:0 rbx:0 rbp: ff0001736080 r10:1 r11: fffec1ad31e0 r12:0 r13: fffec0025680 r14: c0025488 r15:0 fsb:0 gsb: fbc26ef0 ds: 4b es: 4b fs:0 gs: 1c3 trp:e err:0 rip: fb83c189 cs: 30 rfl:10246 rsp: ff0001736028 ss: 38 ff0001735e10 unix:die+c8 () ff0001735f20 unix:trap+135b () ff0001735f30 unix:cmntrap+e9 () ff0001736080 unix:mutex_exit+9 () ff00017360c0 genunix:kmem_alloc+88 () ff0001736110 zfs:zio_push_transform+3a () 
ff0001736190 zfs:zio_create+256 () ff0001736240 zfs:zio_vdev_child_io+97 () ff0001736320 zfs:vdev_cache_read+182 () ff0001736370 zfs:vdev_disk_io_start+41 () ff0001736390 zfs:vdev_io_start+1d () ff00017363d0 zfs:zio_vdev_io_start+123 () ff00017363f0 zfs:zio_next_stage_async+bb () ff0001736410 zfs:zio_nowait+11 () ff0001736450 zfs:vdev_mirror_io_start+18f () ff0001736490 zfs:zio_vdev_io_start+131 () ff00017364b0 zfs:zio_next_stage+b3 () ff00017364e0 zfs:zio_ready+10e () ff0001736500 zfs:zio_next_stage+b3 () ff0001736550 zfs:zio_wait_for_children+5d () ff0001736570 zfs:zio_wait_children_ready+20 () ff0001736590 zfs:zio_next_stage_async+bb () ff00017365b0 zfs:zio_nowait+11 () ff0001736660 zfs:arc_read+4e8 () ff0001736700 zfs:dbuf_read_impl+129 () ff0001736760 zfs:dbuf_read+c5 () ff0001736810 zfs:dmu_buf_hold_array_by_dnode+1c4 () ff00017368a0 zfs:dmu_buf_hold_array+74 () ff0001736930 zfs:dmu_read_uio+4d () ff00017369c0 zfs:zfs_read+15e () ff0001736a30 genunix:fop_read+69 () ff0001736af0 genunix:vn_rdwr+161 () ff0001736c70 genunix:gexec+11c () ff0001736e90 genunix:exec_common+41d () ff0001736ec0 genunix:exece+1b () ff0001736f10 unix:brand_sys_sysenter+1f2 () while running in a Gentoo Linux 2007.0 host with Intel(R) Core(TM)2 CPU 6320. 32bit OpenSolaris works fine -- Comment By: Jes Sorensen (jessorensen) Date: 2010-11-18 11:23 Message: Works for me per previous comments. If you see this again, please open a new bug in Launchpad. Thanks, Jes -- Comment By: Jes Sorensen (jessorensen) Date: 2010-08-19 12:09 Message: Hi, I didn't see any replies to my question as of June 21, are you still seeing this? I wasn't able to reproduce this in my testing. Regards, Jes -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-21 14:55 Message: Hi, I pulled down the iso image you mentioned, and it seems to boot fine for me here. I was able to run the install to a local disk image and boot it again afterwards. This is using a 64 bit guest CPU on a Fedora 12 system. 
What flags are you using to launch it when you see the crash? Are you running on an Intel or an AMD system and did you specify SMP by any chance? Cheers, Jes
Re: [PATCH] ceph/rbd block driver for qemu-kvm (v8)
Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Re: HAL type for Win2003 Server on recent KVM versions?
On 11/18/2010 12:58 AM, Kenni Lund wrote: Hi I'm about to move a couple of virtual machines from a Fedora 11 system to a new server with a more recent operating system and newer version of KVM, etc. One of the guests is a Windows Server 2003 Standard SP2, which is currently running with the ACPI Multiprocessor PC HAL. Considering moving to RHEL, I've been reading the virtualization documentation for RHEL 6.0, which says that I need to set HAL to Standard PC when installing a new Win2003 guest. Since my current guest has been running perfectly fine for a long time with its current HAL, I was wondering if the system will become unstable, unbootable or what the disadvantage will be, if I move the guest to for example RHEL 6.0, without reinstalling or upgrading the guest to select another HAL mode? On the other hand, it seems like I can upgrade from the current ACPI Multiprocessor PC into Standard PC, but I'm not sure if I'll gain anything by trying this. I suggest using the default HAL, whatever it is. That's what everyone else is using so you get the best tested configuration. -- error compiling committee.c: too many arguments to function
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
So the following on top will fix it all. Any more comments before I bundle it up, test and report?

kvm: fix up msi fastpath

This will be folded into the msi fastpath patch. Changes:
- simplify irq_entry/irq_routing update rules: simply do it all under irqfds.lock
- document locking for rcu update side
- rcu_dereference for rcu pointer access

Still compile-tested only.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b6f7047..d13ced3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/rcupdate.h>
 #include <asm/signal.h>
 #include <linux/kvm.h>
@@ -206,6 +207,8 @@ struct kvm {
         struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+        /* Update side is protected by irq_lock and,
+         * if configured, irqfds.lock. */
         struct kvm_irq_routing_table __rcu *irq_routing;
         struct hlist_head mask_notifier_list;
         struct hlist_head irq_ack_notifier_list;
@@ -605,7 +608,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
-void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt);
+void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 #else
@@ -617,7 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 }
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
-static inline void kvm_irqfd_update(struct kvm *kvm) {}
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+                                          struct kvm_irq_routing_table *irq_rt)
+{
+        rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
         return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 49c1864..b0cfae7 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -47,6 +47,7 @@ struct _irqfd {
         /* Used for MSI fast-path */
         struct kvm *kvm;
         wait_queue_t wait;
+        /* Update side is protected by irqfds.lock */
         struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
         /* Used for level IRQ fast-path */
         int gsi;
@@ -133,7 +134,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
         if (flags & POLLIN) {
                 rcu_read_lock();
-                irq = irqfd->irq_entry;
+                irq = rcu_dereference(irqfd->irq_entry);
                 /* An event has been signaled, inject an interrupt */
                 if (irq)
                         kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
@@ -175,6 +176,27 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
         add_wait_queue(wqh, &irqfd->wait);
 }
+/* Must be called under irqfds.lock */
+static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
+                         struct kvm_irq_routing_table *irq_rt)
+{
+        struct kvm_kernel_irq_routing_entry *e;
+        struct hlist_node *n;
+
+        if (irqfd->gsi >= irq_rt->nr_rt_entries) {
+                rcu_assign_pointer(irqfd->irq_entry, NULL);
+                return;
+        }
+
+        hlist_for_each_entry(e, n, &irq_rt->map[irqfd->gsi], link) {
+                /* Only fast-path MSI. */
+                if (e->type == KVM_IRQ_ROUTING_MSI)
+                        rcu_assign_pointer(irqfd->irq_entry, e);
+                else
+                        rcu_assign_pointer(irqfd->irq_entry, NULL);
+        }
+}
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
@@ -228,9 +250,9 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
                 goto fail;
         }
-        rcu_read_lock();
-        irqfd_update(kvm, irqfd, rcu_dereference(kvm->irq_routing));
-        rcu_read_unlock();
+        irq_rt = rcu_dereference_protected(kvm->irq_routing,
+                                           lockdep_is_held(&kvm->irqfds.lock));
+        irqfd_update(kvm, irqfd, irq_rt);
         events = file->f_op->poll(file, &irqfd->pt);
@@ -345,35 +367,17 @@ kvm_irqfd_release(struct kvm *kvm)
 }
-/* Must be called under irqfds.lock */
-static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
-                         struct kvm_irq_routing_table *irq_rt)
-{
-        struct kvm_kernel_irq_routing_entry *e;
-        struct hlist_node *n;
-
-        if (irqfd->gsi >= irq_rt->nr_rt_entries) {
-                rcu_assign_pointer(irqfd->irq_entry, NULL);
-                return;
-        }
-
-        hlist_for_each_entry(e, n, &irq_rt->map[irqfd->gsi], link) {
-                /* Only fast-path MSI. */
-                if (e->type == KVM_IRQ_ROUTING_MSI)
-                        rcu_assign_pointer(irqfd->irq_entry, e);
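As a reading aid, the selection rule in irqfd_update() can be modelled in plain C without kernel headers. This is a sketch with hypothetical types (struct route, struct route_table, pick_fast_path()), not the kernel code. It mirrors the patch as posted: an out-of-range GSI yields NULL, and when several entries share a GSI the last one visited decides the result.

```c
#include <assert.h>
#include <stddef.h>

enum route_type { ROUTE_IRQCHIP, ROUTE_MSI };

/* Hypothetical routing entry; entries sharing one GSI are chained. */
struct route {
        enum route_type type;
        struct route *next;
};

/* Hypothetical routing table: map[gsi] is a list head or NULL. */
struct route_table {
        size_t nr_gsi;
        struct route **map;
};

/* Return the entry to cache for the fast path, or NULL to use the slow
 * path. As in the posted loop, every entry overwrites the previous
 * choice, so the last entry on the list wins. */
static struct route *pick_fast_path(const struct route_table *rt, size_t gsi)
{
        struct route *picked = NULL;

        if (gsi >= rt->nr_gsi)
                return NULL;

        for (struct route *e = rt->map[gsi]; e; e = e->next)
                picked = (e->type == ROUTE_MSI) ? e : NULL;

        return picked;
}
```

One consequence worth noting when reviewing: a GSI that maps to both an MSI entry and an irqchip entry gets the fast path or not depending purely on list order.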
[ kvm-Bugs-1998355 ] IO Performance
Bugs item #1998355, was opened at 2008-06-20 02:11 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1998355group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Closed Resolution: Out of Date Priority: 5 Private: No Submitted By: Joshua Rosen (bjrosen) Assigned to: Nobody/Anonymous (nobody) Summary: IO Performance Initial Comment: Is there any way of mapping a host's directory into a KVM VM similar to VMware's Shared Folder feature? I've been benchmarking the performance of NCVerilog under various VMs. The performance of KVM when using a virtual disk is excellent, in fact it's better than VMware Server or VMware Workstation, however if you use an NFS mounted host directory the performance is unspeakably awful. An NFS mounted directory under VMware Server 2.0 (Beta 2) is also slow but it's still significantly better than KVM. Using a Shared Folder with VMware Workstation eliminates the IO bottleneck, the performance there is about the same as accessing a virtual disk. The system that I did these benchmarks on is a 3GHz Core2 with 8G of RAM. VMware was running under CentOS5.1 with a 2.6.23.7 kernel. KVM is running on Fedora 9 with a 2.6.25.xx kernel. The Verilog simulation times for my test suite are as follows, Native 06:34 VM Server 2, virtual disk 08:05 VM Server 2, NFS18:37 VM Workstation, shared folder 08:14 KVM, Virtual disk 07:42 KVM, NFS38:36 -- Comment By: Jes Sorensen (jessorensen) Date: 2010-11-18 11:16 Message: Per previous comment, bug is out of date. There are a number of solutions in upstream and better performance. If you still feel this is an issue, please feel free to open a new bug in Launchpad. 
Thanks, Jes -- Comment By: Jes Sorensen (jessorensen) Date: 2010-08-19 12:18 Message: Hi, Did this get resolved? If so would you mind closing this bug. If I don't hear back, I will assume it is fixed and close it at some point :) Thanks, Jes -- Comment By: Joshua Rosen (bjrosen) Date: 2008-06-22 15:54 Message: Logged In: YES user_id=39829 Originator: YES I've tried the following command line, it brings up the VM but I can't configure the NIC, there is an error message about vlan 0. qemu-kvm -M pc -m 512 -smp 1 -monitor pty -net nic,macaddr=a0:1e:37:84:b1:da,model=virtio -boot c -hda /home/xen/panther Warning: vlan 0 is not connected to host network I've also tried the following, but the qemu-ifup command is missing an argument qemu-kvm -M pc -m 512 -smp 1 -monitor pty -net nic,macaddr=a0:1e:37:84:b1:da,model=virtio -net tap,script=/etc/xen/qemu-ifup -boot c -hda /home/xen/panther config qemu network with xen bridge for tap0 Incorrect number of arguments for command Usage: brctl addif bridge deviceadd interface to bridge char device redirected to /dev/pts/3 -- Comment By: Dor Laor (thekozmo) Date: 2008-06-22 08:08 Message: Logged In: YES user_id=2124464 Originator: NO Example cmdline: ./qemu/x86_64-softmmu/qemu-system-x86_64 -boot c -drive file=/images/xpbase.qcow2,if=ide,cache=on,format=qcow2,boot=on -m 384 -net nic,macaddr=a0:1e:37:84:b1:da,model=virtio -net tap,script=/etc/kvm/qemu-ifup -snapshot btw: you can use 'ps' to discover libvirt cmdline -- Comment By: Joshua Rosen (bjrosen) Date: 2008-06-22 03:18 Message: Logged In: YES user_id=39829 Originator: YES How do I use virtio-net instead of virtio-blk? I've been launching the VM using virt-manager which has no options. KVM doesn't have a MAN page so I have no idea about how to launch the VM using the CLI. Would you please give me specific step by step instructions. 
Thanks, -- Comment By: Joshua Rosen (bjrosen) Date: 2008-06-21 16:28 Message: Logged In: YES user_id=39829 Originator: YES How do I use virtio-net instead of virtio-blk? I've been launching the VM using virt-manager which has no options. KVM doesn't have a MAN page so I have no idea about how to launch the VM using the CLI. Would you please give me specific step by step instructions. Thanks, -- Comment By: Dor Laor (thekozmo) Date: 2008-06-21 15:45 Message: Logged In: YES user_id=2124464 Originator: NO If you don't boot from virtio-blk there is no need for this configuration. Just use virtio-net in
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On 11/18/2010 12:57 PM, Michael S. Tsirkin wrote: So the following on top will fix it all. Any more comments before I bundle it up, test and report? Nope (not that I can comment on an incremental). I guess I should create an empty Documentation/kvm/locking.txt and force everyone else to update it. -- error compiling committee.c: too many arguments to function
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 01:03:44PM +0200, Avi Kivity wrote:
> On 11/18/2010 12:57 PM, Michael S. Tsirkin wrote:
>> So the following on top will fix it all. Any more comments before I
>> bundle it up, test and report?
>
> Nope (not that I can comment on an incremental).

Here it is rolled up.

> I guess I should create an empty Documentation/kvm/locking.txt and force
> everyone else to update it.

Comments near the relevant fields not better?

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a055742..d13ced3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/rcupdate.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -206,6 +207,8 @@ struct kvm {
 
 	struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+	/* Update side is protected by irq_lock and,
+	 * if configured, irqfds.lock. */
 	struct kvm_irq_routing_table __rcu *irq_routing;
 	struct hlist_head mask_notifier_list;
 	struct hlist_head irq_ack_notifier_list;
@@ -462,6 +465,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 				   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
+		int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
@@ -603,6 +608,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
+void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 #else
@@ -614,6 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+					  struct kvm_irq_routing_table *irq_rt)
+{
+	rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
 	return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..b0cfae7 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -44,14 +44,19 @@
  */
 
 struct _irqfd {
-	struct kvm *kvm;
-	struct eventfd_ctx *eventfd;
-	int gsi;
-	struct list_head list;
-	poll_table pt;
-	wait_queue_t wait;
-	struct work_struct inject;
-	struct work_struct shutdown;
+	/* Used for MSI fast-path */
+	struct kvm *kvm;
+	wait_queue_t wait;
+	/* Update side is protected by irqfds.lock */
+	struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
+	/* Used for level IRQ fast-path */
+	int gsi;
+	struct work_struct inject;
+	/* Used for setup/shutdown */
+	struct eventfd_ctx *eventfd;
+	struct list_head list;
+	poll_table pt;
+	struct work_struct shutdown;
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -125,10 +130,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
 	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
 	unsigned long flags = (unsigned long)key;
+	struct kvm_kernel_irq_routing_entry *irq;
 
-	if (flags & POLLIN)
+	if (flags & POLLIN) {
+		rcu_read_lock();
+		irq = rcu_dereference(irqfd->irq_entry);
 		/* An event has been signaled, inject an interrupt */
-		schedule_work(&irqfd->inject);
+		if (irq)
+			kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+		else
+			schedule_work(&irqfd->inject);
+		rcu_read_unlock();
+	}
 
 	if (flags & POLLHUP) {
 		/* The eventfd is closing, detach from KVM */
@@ -163,9 +176,31 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 	add_wait_queue(wqh, &irqfd->wait);
 }
 
+/* Must be called under irqfds.lock */
+static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
+			 struct kvm_irq_routing_table *irq_rt)
+{
+	struct kvm_kernel_irq_routing_entry *e;
+	struct hlist_node *n;
+
+	if (irqfd->gsi >= irq_rt->nr_rt_entries) {
+		rcu_assign_pointer(irqfd->irq_entry, NULL);
+		return;
+	}
+
+	hlist_for_each_entry(e, n, &irq_rt->map[irqfd->gsi], link) {
+		/*
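For readers following along without a kernel tree, the publish/read pattern used by the hunks above can be approximated in userspace. This is only a rough sketch, not the patch's code: C11 atomics stand in for rcu_assign_pointer()/rcu_dereference(), there is no grace-period handling at all, and every name in it (route_entry, fast_irqfd, the is_msi filter) is hypothetical. The rule that only MSI routes get the direct path is an assumption taken from the "Used for MSI fast-path" comment, since the quoted diff is cut off before that logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_kernel_irq_routing_entry. */
struct route_entry {
	int gsi;
	int is_msi;
};

/* Hypothetical stand-in for struct _irqfd: one published pointer. */
struct fast_irqfd {
	_Atomic(struct route_entry *) entry;	/* NULL: no direct path */
	int direct;	/* injections done directly from the wakeup path */
	int deferred;	/* injections handed off, like schedule_work() */
};

static void fast_irqfd_init(struct fast_irqfd *f)
{
	atomic_init(&f->entry, NULL);
	f->direct = 0;
	f->deferred = 0;
}

/*
 * Update side. The caller is assumed to hold whatever lock serializes
 * updates (irqfds.lock in the patch). The release store plays the role
 * of rcu_assign_pointer().
 */
static void fast_irqfd_update(struct fast_irqfd *f, struct route_entry *e)
{
	struct route_entry *pub = (e && e->is_msi) ? e : NULL;

	atomic_store_explicit(&f->entry, pub, memory_order_release);
}

/*
 * Read side. The acquire load plays the role of rcu_dereference().
 * Returns 1 if the direct path was taken, 0 if the work was deferred.
 */
static int fast_irqfd_signal(struct fast_irqfd *f)
{
	struct route_entry *e =
		atomic_load_explicit(&f->entry, memory_order_acquire);

	if (e) {
		f->direct++;
		return 1;
	}
	f->deferred++;
	return 0;
}
```

What the sketch leaves out is exactly what RCU provides in the kernel: the old routing table may only be freed once every reader that could still hold a pointer into it has left its read-side section.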
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote: On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote: 2010/11/16 Gleb Natapov g...@redhat.com: On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote: Perhaps the FW path should use device class names if no name is specified. What do you mean by device class name. We can do something like this: if (dev-child_bus.lh_first) return dev-child_bus.lh_first-info-name; i.e if there is child bus use its bus name as fw name. This will make all pci devices to have pci as fw name automatically. The problem is that theoretically same device can provide different buses. I meant PCI class name, like display for display controllers, network for NICs etc. That is what my pci bus related patch is doing already. I'll try Sparc32 to see how this fits there. Except bootindex is not implemented for SCSI. Will look into adding it. Thanks. The bootindex on Sparc32 looks like this: bootindex /e...@7880/d...@1,0 /ether...@/ethernet-...@0 For arches other then x86 there is a lot of work left to be done :) For starter exotic sparc buses should get their own get_fw_dev_path() implementation. I don't think I got Lance setup right. OF paths for the devices would be: /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0 If qdev hierarchy does not correspond to real HW there is no much we can do expect for fixing qdev. That's bad. This raises a concern: if these paths expose qdev internals, any attempt to fix this will break migration. The logic for ESP is that ESP (registers at 0x7880, slot offset 0x88) is handled by the DMA controller (registers at 0x7840, slot offset 0x84), they are in a SBus slot #5, and SBus (registers at 0x10001000) is in turn handled by IOMMU (registers at 0x1000). Lance should be handled the same way. This hierarchy is partly known by QEMU because DMA accesses use this flow, but not otherwise. 
There is no concept of SBus slots, DMA talks to IOMMU directly. Though in this case both ESP, Lance and their DMA controllers are on board devices in a MACIO chip. It may be possible to add the hierarchy information at each stage. It should also be possible for BIOS to determine the device just from the physical address if we ignored OF compatibility. It would be nice to be OF compatible at least at some level. Of course OF spec is not strict enough to have two different implementations produce exactly same device path that can be compared by strcpy. Can we apply the series now? At least for x86 it provides useful paths and work can be continue for other arches by interested parties. -- Gleb. Something I only now realized is that we commit to never changing the paths for any architecture that supports migration. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote: On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote: 2010/11/16 Gleb Natapov g...@redhat.com: On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote: Perhaps the FW path should use device class names if no name is specified. What do you mean by device class name. We can do something like this: if (dev-child_bus.lh_first) return dev-child_bus.lh_first-info-name; i.e if there is child bus use its bus name as fw name. This will make all pci devices to have pci as fw name automatically. The problem is that theoretically same device can provide different buses. I meant PCI class name, like display for display controllers, network for NICs etc. That is what my pci bus related patch is doing already. I'll try Sparc32 to see how this fits there. Except bootindex is not implemented for SCSI. Will look into adding it. Thanks. The bootindex on Sparc32 looks like this: bootindex /e...@7880/d...@1,0 /ether...@/ethernet-...@0 For arches other then x86 there is a lot of work left to be done :) For starter exotic sparc buses should get their own get_fw_dev_path() implementation. I don't think I got Lance setup right. OF paths for the devices would be: /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0 If qdev hierarchy does not correspond to real HW there is no much we can do expect for fixing qdev. That's bad. This raises a concern: if these paths expose qdev internals, any attempt to fix this will break migration. The path expose internal HW hierarchy. It is designed to do so. Qdev designed to do the same: describe HW hierarchy. If qdev fails to do so it is broken. I do not see connection to migration at all since the path is not used in migration code. 
The logic for ESP is that ESP (registers at 0x7880, slot offset 0x88) is handled by the DMA controller (registers at 0x7840, slot offset 0x84), they are in a SBus slot #5, and SBus (registers at 0x10001000) is in turn handled by IOMMU (registers at 0x1000). Lance should be handled the same way. This hierarchy is partly known by QEMU because DMA accesses use this flow, but not otherwise. There is no concept of SBus slots, DMA talks to IOMMU directly. Though in this case both ESP, Lance and their DMA controllers are on board devices in a MACIO chip. It may be possible to add the hierarchy information at each stage. It should also be possible for BIOS to determine the device just from the physical address if we ignored OF compatibility. It would be nice to be OF compatible at least at some level. Of course OF spec is not strict enough to have two different implementations produce exactly same device path that can be compared by strcpy. Can we apply the series now? At least for x86 it provides useful paths and work can be continue for other arches by interested parties. -- Gleb. Something I only now realized is that we commit to never changing the paths for any architecture that supports migration. No connection to migration whatsoever. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote: On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote: 2010/11/16 Gleb Natapov g...@redhat.com: On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote: Perhaps the FW path should use device class names if no name is specified. What do you mean by device class name. We can do something like this: if (dev-child_bus.lh_first) return dev-child_bus.lh_first-info-name; i.e if there is child bus use its bus name as fw name. This will make all pci devices to have pci as fw name automatically. The problem is that theoretically same device can provide different buses. I meant PCI class name, like display for display controllers, network for NICs etc. That is what my pci bus related patch is doing already. I'll try Sparc32 to see how this fits there. Except bootindex is not implemented for SCSI. Will look into adding it. Thanks. The bootindex on Sparc32 looks like this: bootindex /e...@7880/d...@1,0 /ether...@/ethernet-...@0 For arches other then x86 there is a lot of work left to be done :) For starter exotic sparc buses should get their own get_fw_dev_path() implementation. I don't think I got Lance setup right. OF paths for the devices would be: /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0 If qdev hierarchy does not correspond to real HW there is no much we can do expect for fixing qdev. That's bad. This raises a concern: if these paths expose qdev internals, any attempt to fix this will break migration. The path expose internal HW hierarchy. It is designed to do so. Qdev designed to do the same: describe HW hierarchy. If qdev fails to do so it is broken. Yes. But since you use qdev to build up the path, a broken qdev will give you a broken path. 
I do not see connection to migration at all since the path is not used in migration code. The connection is that if we pass the list with path 1 which you define as broken to BIOS, then migrate to a machine with an updated qemu which has a correct path, BIOS won't be able to complete the boot. Right? Same in reverse direction. As solution could be a fuzzy matching of paths that wiull let us recover. The logic for ESP is that ESP (registers at 0x7880, slot offset 0x88) is handled by the DMA controller (registers at 0x7840, slot offset 0x84), they are in a SBus slot #5, and SBus (registers at 0x10001000) is in turn handled by IOMMU (registers at 0x1000). Lance should be handled the same way. This hierarchy is partly known by QEMU because DMA accesses use this flow, but not otherwise. There is no concept of SBus slots, DMA talks to IOMMU directly. Though in this case both ESP, Lance and their DMA controllers are on board devices in a MACIO chip. It may be possible to add the hierarchy information at each stage. It should also be possible for BIOS to determine the device just from the physical address if we ignored OF compatibility. It would be nice to be OF compatible at least at some level. Of course OF spec is not strict enough to have two different implementations produce exactly same device path that can be compared by strcpy. Can we apply the series now? At least for x86 it provides useful paths and work can be continue for other arches by interested parties. -- Gleb. Something I only now realized is that we commit to never changing the paths for any architecture that supports migration. No connection to migration whatsoever. It just seems silly to use different paths for the same thing. Besides the connection above, I was hoping to use these paths for section names in migration. 
If we can't guarantee they are stable, we'll have to roll our own, and if we do this, with stability guarantees required for migration format, maybe use it for other things like BIOS as well? -- Gleb.
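To make the "fuzzy matching of paths" idea a bit more concrete, here is one possible shape for it, written as a standalone C helper. It is purely an illustration under stated assumptions: the function name fw_path_fuzzy_equal() is hypothetical, nothing like it exists in the series, and the policy it implements (compare node names only, ignore everything after '@' in each component) is just one guess at what "fuzzy" could mean. An exact comparison would simply be strcmp() on the two strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if both firmware paths consist of the
 * same sequence of node names, ignoring any "@unit-address" suffix on
 * each component, and 0 otherwise.
 */
static int fw_path_fuzzy_equal(const char *a, const char *b)
{
	while (*a == '/' && *b == '/') {
		size_t la, lb;

		a++;
		b++;

		/* Node name ends at '@', '/' or end of string. */
		la = strcspn(a, "@/");
		lb = strcspn(b, "@/");
		if (la != lb || strncmp(a, b, la) != 0)
			return 0;

		/* Skip the rest of the component, unit address included. */
		a += strcspn(a, "/");
		b += strcspn(b, "/");
	}

	/* Equal only if both paths were consumed completely. */
	return *a == '\0' && *b == '\0';
}
```

The obvious weakness is that two devices with the same node name under the same parent become indistinguishable, so a real implementation would need some tie-breaking rule, which is part of why nothing can be guaranteed here.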
Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
>>>> *entry may be stale after rcu_read_unlock(). Is this a problem?
>>>
>>> I suppose not. All MSI-X MMIO accesses are executed without delay, so
>>> no reordering issue would happen. If the guest is reading and writing
>>> the field at the same time (from two cpus), it should have some kind of
>>> sync method of its own - or it may not care what the result of the read
>>> is (like the one after msix_mask_irq()).
>>
>> I guess so. Michael/Alex?
>
> This is kvm_get_irq_routing_entry which is used for table reads,
> correct? Actually, the pci read *is* the sync method that guests use,
> they rely on reads to flush out all previous writes.

Michael, I think the *sync* you are talking about is not the one I meant. I was talking about the two-cpu case: one is reading and the other is writing, and the order can't be determined if the guest doesn't use a lock or some other synchronization method. You're talking about flushing out all previous writes of a single CPU...

--
regards,
Yang, Sheng
Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support
On Thu, Nov 18, 2010 at 5:28 PM, Avi Kivity a...@redhat.com wrote:
> On 11/18/2010 03:58 AM, Sheng Yang wrote:
>> On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
>>> On 11/15/2010 11:15 AM, Sheng Yang wrote:
>>>> This patch enables per-vector mask for assigned devices using MSI-X.
>>>>
>>>> This patch provides two new APIs: one is for the guest to specify the
>>>> device's MSI-X table address in MMIO, the other is for userspace to
>>>> get information about the mask bit.
>>>>
>>>> All the mask bit operations are kept in the kernel, in order to
>>>> accelerate them. Userspace shouldn't access the device MMIO directly
>>>> for the information, instead it should use the provided API to do so.
>>>>
>>>> Signed-off-by: Sheng Yang sh...@linux.intel.com
>>>> ---
>>>>  arch/x86/kvm/x86.c       |    1 +
>>>>  include/linux/kvm.h      |   32 +
>>>>  include/linux/kvm_host.h |    5 +
>>>>  virt/kvm/assigned-dev.c  |  318 +-
>>>>  4 files changed, 355 insertions(+), 1 deletions(-)
>>>
>>> Documentation?
>>
>> Since we have kept changing the API over the last several versions, I'd
>> like to settle the API first. I would bring back the document after the
>> API is agreed.
>
> Maybe for APIs we should start with only the documentation patch, agree
> on that, and move on to the implementation.

Yes, I will follow that next time. And I will bring back the documentation in the next version, since Michael and I have reached agreement on the API.

>>> What if it's a 64-bit write on a 32-bit host?
>>
>> In fact we don't support QWORD (64-bit) accesses yet. The reason is that
>> we haven't seen any OS using it this way so far, so I think we can leave
>> it for later.
>>
>> Also it seems QEmu has no way to handle 64-bit MMIO.
>
> There's a difference, if the API doesn't support it, we can't add it
> later without changing both kernel and userspace.

Um... Which API are you talking about? I think the userspace API (set msix mmio, and get mask bit status) is unrelated here?

>>> That's not very good. We should do the entire thing in the kernel or
>>> in userspace. We can have a new EXIT_REASON to let userspace know an
>>> msix entry changed, and it should read it from the kernel.
>>
>> If you look at it this way:
>> 1. Mask bit owned by kernel.
>> 2. Routing owned by userspace.
>> 3. Reading the routing in the kernel is a speedup for normal operation -
>>    because the kernel can read from them.
>>
>> So I think the logic here is clear to understand.
>
> Still, it's complicated and the state is split across multiple
> components.

So how about removing the read acceleration part from the patch temporarily? The kernel owns the mask bit and userspace owns the rest. That should be better. I can add the read part later when we can find an elegant way to do so.

>> But if we can modify the routing in the kernel, it would raise some
>> sync issues because both kernel and userspace own the routing. So maybe
>> the better solution is to move the routing to the kernel.
>
> That may work, but I don't think we can do this for vfio.

--
regards,
Yang, Sheng

> --
> error compiling committee.c: too many arguments to function
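As a way to picture the ownership split being discussed (mask bit in the kernel, everything else in userspace), here is a small C sketch. It relies on the MSI-X table layout from the PCI specification: 16-byte entries, with the Vector Control dword at offset 12 and the mask in its bit 0. The enum, the function names, and the choice to reject anything other than aligned 4-byte accesses (mirroring the "no QWORD yet" statement above) are hypothetical and are not the patch's interface.

```c
#include <assert.h>
#include <stdint.h>

/* MSI-X table layout per the PCI spec; the macro names are local. */
#define MSIX_ENTRY_SIZE		16u
#define MSIX_VECTOR_CTRL_OFF	12u
#define MSIX_CTRL_MASKBIT	0x1u

/* Hypothetical classification of who handles a table write. */
enum msix_owner {
	MSIX_OWNER_KERNEL,	/* Vector Control dword, holds the mask bit */
	MSIX_OWNER_USERSPACE,	/* address/data dwords, i.e. routing */
	MSIX_UNSUPPORTED	/* e.g. 64-bit or unaligned accesses */
};

/* table_off is the byte offset of the access within the MSI-X table. */
static enum msix_owner msix_write_owner(uint32_t table_off, unsigned int len)
{
	if (len != 4 || (table_off & 3u))
		return MSIX_UNSUPPORTED;
	if (table_off % MSIX_ENTRY_SIZE == MSIX_VECTOR_CTRL_OFF)
		return MSIX_OWNER_KERNEL;
	return MSIX_OWNER_USERSPACE;
}

/* Returns nonzero if the Vector Control value has the mask bit set. */
static int msix_entry_masked(uint32_t vector_ctrl)
{
	return (vector_ctrl & MSIX_CTRL_MASKBIT) != 0;
}
```

With this split, a write at table offset 28 (entry 1, Vector Control) would stay in the kernel, while a write at offset 16 (entry 1, Message Address) would go to userspace.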
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 01:52:30PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote: On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote: 2010/11/16 Gleb Natapov g...@redhat.com: On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote: Perhaps the FW path should use device class names if no name is specified. What do you mean by device class name. We can do something like this: if (dev-child_bus.lh_first) return dev-child_bus.lh_first-info-name; i.e if there is child bus use its bus name as fw name. This will make all pci devices to have pci as fw name automatically. The problem is that theoretically same device can provide different buses. I meant PCI class name, like display for display controllers, network for NICs etc. That is what my pci bus related patch is doing already. I'll try Sparc32 to see how this fits there. Except bootindex is not implemented for SCSI. Will look into adding it. Thanks. The bootindex on Sparc32 looks like this: bootindex /e...@7880/d...@1,0 /ether...@/ethernet-...@0 For arches other then x86 there is a lot of work left to be done :) For starter exotic sparc buses should get their own get_fw_dev_path() implementation. I don't think I got Lance setup right. OF paths for the devices would be: /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0 If qdev hierarchy does not correspond to real HW there is no much we can do expect for fixing qdev. That's bad. This raises a concern: if these paths expose qdev internals, any attempt to fix this will break migration. The path expose internal HW hierarchy. It is designed to do so. Qdev designed to do the same: describe HW hierarchy. If qdev fails to do so it is broken. Yes. 
But since you use qdev to build up the path, a broken qdev will give you a broken path. Qdev bug. Fix it like any other bug. The nice is that when you compare device path produced by qdev and real HW you can see when qdev is wrong. I do not see connection to migration at all since the path is not used in migration code. The connection is that if we pass the list with path 1 which you define as broken to BIOS, then migrate to a machine with an updated qemu which has a correct path, BIOS won't be able to complete the boot. You solve it like you solve all such issue with -M machine type. But the problem exists only if migration happens in a short window between start of the boot process and BIOS reading boot order string. After reboot new qemu should have new BIOS. Right? Same in reverse direction. Reverse direction is not and never was supported. As solution could be a fuzzy matching of paths that wiull let us recover. Firmware can try its best of course, but nothing is guarantied. The logic for ESP is that ESP (registers at 0x7880, slot offset 0x88) is handled by the DMA controller (registers at 0x7840, slot offset 0x84), they are in a SBus slot #5, and SBus (registers at 0x10001000) is in turn handled by IOMMU (registers at 0x1000). Lance should be handled the same way. This hierarchy is partly known by QEMU because DMA accesses use this flow, but not otherwise. There is no concept of SBus slots, DMA talks to IOMMU directly. Though in this case both ESP, Lance and their DMA controllers are on board devices in a MACIO chip. It may be possible to add the hierarchy information at each stage. It should also be possible for BIOS to determine the device just from the physical address if we ignored OF compatibility. It would be nice to be OF compatible at least at some level. Of course OF spec is not strict enough to have two different implementations produce exactly same device path that can be compared by strcpy. Can we apply the series now? 
At least for x86 it provides useful paths and work can be continue for other arches by interested parties. -- Gleb. Something I only now realized is that we commit to never changing the paths for any architecture that supports migration. No connection to migration whatsoever. It just seems silly to use different paths for the same thing. Besides the connection above, I was hoping to use these paths for section names in migration. If we can't guarantee they are stable, we'll have to roll our own, and if we do this, with stability guarantees required for
[PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()
cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label).

This patch folds them back together again and adds the __noclone attribute to
prevent the label from being duplicated.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/vmx.c |   63 ---
 1 files changed, 25 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9367abc..b4b66a8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3902,17 +3902,33 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 #define Q "l"
 #endif
 
-/*
- * We put this into a separate noinline function to prevent the compiler
- * from duplicating the code. This is needed because this code
- * uses non local labels that cannot be duplicated.
- * Do not put any flow control into this function.
- * Better would be to put this whole monstrosity into a .S file.
- */
-static void noinline do_vmx_vcpu_run(struct kvm_vcpu *vcpu)
+static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	asm volatile(
+
+	/* Record the guest's net vcpu time for enforced NMI injections. */
+	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
+		vmx->entry_time = ktime_get();
+
+	/* Don't enter VMX if guest state is invalid, let the exit handler
+	   start emulation until we arrive back to a valid state */
+	if (vmx->emulation_required && emulate_invalid_guest_state)
+		return;
+
+	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
+		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
+	if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
+		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
+
+	/* When single-stepping over STI and MOV SS, we must clear the
+	 * corresponding interruptibility bits in the guest state. Otherwise
+	 * vmentry fails as it then expects bit 14 (BS) in pending debug
+	 * exceptions being set, but that's not correct for the guest debugging
+	 * case. */
+	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+		vmx_set_interrupt_shadow(vcpu, 0);
+
+	asm(
 		/* Store host registers */
 		"push %%"R"dx; push %%"R"bp;"
 		"push %%"R"cx \n\t"
@@ -4007,35 +4023,6 @@ static void noinline do_vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		, "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
 #endif
 	      );
-}
-
-static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	/* Record the guest's net vcpu time for enforced NMI injections. */
-	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
-		vmx->entry_time = ktime_get();
-
-	/* Don't enter VMX if guest state is invalid, let the exit handler
-	   start emulation until we arrive back to a valid state */
-	if (vmx->emulation_required && emulate_invalid_guest_state)
-		return;
-
-	if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
-		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-	if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
-		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
-
-	/* When single-stepping over STI and MOV SS, we must clear the
-	 * corresponding interruptibility bits in the guest state. Otherwise
-	 * vmentry fails as it then expects bit 14 (BS) in pending debug
-	 * exceptions being set, but that's not correct for the guest debugging
-	 * case. */
-	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-		vmx_set_interrupt_shadow(vcpu, 0);
-
-	do_vmx_vcpu_run(vcpu);
 
 	vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP)
 				  | (1 << VCPU_EXREG_PDPTR));
-- 
1.7.3.1
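For anyone unfamiliar with the attribute the commit message mentions, the following standalone sketch shows the general idea outside the kernel. It is an assumption-laden illustration, not the vmx.c code: the macro and function names are made up, the asm body is reduced to a single global label with a hypothetical name, and the noclone attribute is only applied when building with GCC, since other compilers may not recognize it. The point is that a function whose inline asm defines a non-local label must be emitted exactly once, so the compiler has to be told neither to inline nor to clone it.

```c
#include <assert.h>

/* Local macro names, hypothetical; the kernel has its own equivalents. */
#if defined(__GNUC__) && !defined(__clang__)
#define sketch_noclone	__attribute__((__noclone__))
#else
#define sketch_noclone	/* attribute assumed unavailable here */
#endif
#define sketch_noinline	__attribute__((__noinline__))

/*
 * If the compiler emitted a second copy of this function, the label
 * below would be defined twice and the assembler would reject the
 * output. noinline and noclone together ask for a single copy.
 */
static sketch_noinline sketch_noclone int enter_guest_sketch(int x)
{
	__asm__ volatile("gr_sketch_single_copy_label:");
	return x + 1;
}
```

GCC also offers the %= escape to generate a unique number per asm instance, which avoids duplicate local labels but does not help when a single globally visible label is required.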
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 02:16:05PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 01:52:30PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote: On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote: 2010/11/16 Gleb Natapov g...@redhat.com: On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote: Perhaps the FW path should use device class names if no name is specified. What do you mean by device class name. We can do something like this: if (dev-child_bus.lh_first) return dev-child_bus.lh_first-info-name; i.e if there is child bus use its bus name as fw name. This will make all pci devices to have pci as fw name automatically. The problem is that theoretically same device can provide different buses. I meant PCI class name, like display for display controllers, network for NICs etc. That is what my pci bus related patch is doing already. I'll try Sparc32 to see how this fits there. Except bootindex is not implemented for SCSI. Will look into adding it. Thanks. The bootindex on Sparc32 looks like this: bootindex /e...@7880/d...@1,0 /ether...@/ethernet-...@0 For arches other then x86 there is a lot of work left to be done :) For starter exotic sparc buses should get their own get_fw_dev_path() implementation. I don't think I got Lance setup right. OF paths for the devices would be: /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0 If qdev hierarchy does not correspond to real HW there is no much we can do expect for fixing qdev. That's bad. This raises a concern: if these paths expose qdev internals, any attempt to fix this will break migration. The path expose internal HW hierarchy. It is designed to do so. Qdev designed to do the same: describe HW hierarchy. If qdev fails to do so it is broken. Yes. 
But since you use qdev to build up the path, a broken qdev will give you a broken path. Qdev bug. Fix it like any other bug. The nice is that when you compare device path produced by qdev and real HW you can see when qdev is wrong. I do not see connection to migration at all since the path is not used in migration code. The connection is that if we pass the list with path 1 which you define as broken to BIOS, then migrate to a machine with an updated qemu which has a correct path, BIOS won't be able to complete the boot. You solve it like you solve all such issue with -M machine type. So that's unavoidable if we think paths are correct. But if we know they are wrong, we are better off correcting them first IMO. But the problem exists only if migration happens in a short window between start of the boot process and BIOS reading boot order string. After reboot new qemu should have new BIOS. That makes it even more nasty, doesn't it? Right? Same in reverse direction. Reverse direction is not and never was supported. As solution could be a fuzzy matching of paths that wiull let us recover. Firmware can try its best of course, but nothing is guarantied. No I mean qemu could do matching fuzzily. This way if we get a path from the old BIOS we can survive. The logic for ESP is that ESP (registers at 0x7880, slot offset 0x88) is handled by the DMA controller (registers at 0x7840, slot offset 0x84), they are in a SBus slot #5, and SBus (registers at 0x10001000) is in turn handled by IOMMU (registers at 0x1000). Lance should be handled the same way. This hierarchy is partly known by QEMU because DMA accesses use this flow, but not otherwise. There is no concept of SBus slots, DMA talks to IOMMU directly. Though in this case both ESP, Lance and their DMA controllers are on board devices in a MACIO chip. It may be possible to add the hierarchy information at each stage. 
It should also be possible for BIOS to determine the device just from the physical address if we ignored OF compatibility. It would be nice to be OF compatible at least at some level. Of course OF spec is not strict enough to have two different implementations produce exactly same device path that can be compared by strcpy. Can we apply the series now? At least for x86 it provides useful paths and work can be continue for other arches by interested parties. -- Gleb.
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On 11/18/2010 01:10 PM, Michael S. Tsirkin wrote:
>> I guess I should create an empty Documentation/kvm/locking.txt and force
>> everyone else to update it.
>
> Comments near the relevant fields not better?

Not an either/or. You can't understand the system from random source comments.

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a055742..d13ced3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -16,6 +16,7 @@
>  #include <linux/mm.h>
>  #include <linux/preempt.h>
>  #include <linux/msi.h>
> +#include <linux/rcupdate.h>
>  #include <asm/signal.h>
>
>  #include <linux/kvm.h>
> @@ -206,6 +207,8 @@ struct kvm {
>
>  	struct mutex irq_lock;
>  #ifdef CONFIG_HAVE_KVM_IRQCHIP
> +	/* Update side is protected by irq_lock and,
> +	 * if configured, irqfds.lock. */

/*
 * kernel style comment
 * here and elsewhere
 */

>  	struct kvm_irq_routing_table __rcu *irq_routing;
>  	struct hlist_head mask_notifier_list;
>  	struct hlist_head irq_ack_notifier_list;
> @@ -462,6 +465,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
>  				   unsigned long *deliver_bitmask);
>  #endif
>  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
> +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
> +		int irq_source_id, int level);

No point in the level argument for an msi specific function.

>  #else
> @@ -614,6 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
>  }
>
>  static inline void kvm_irqfd_release(struct kvm *kvm) {}

blank line

> +static inline void kvm_irq_routing_update(struct kvm *kvm,
> +					  struct kvm_irq_routing_table *irq_rt)
> +{
> +	rcu_assign_pointer(kvm->irq_routing, irq_rt);
> +}
> +
>  static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  {
>  	return -ENOSYS;

Apart from these minor issues, looks good.
--
error compiling committee.c: too many arguments to function
Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
On Thu, Nov 18, 2010 at 07:59:10PM +0800, Sheng Yang wrote:
> On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote:
>> On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
>>>>> *entry may be stale after rcu_read_unlock(). Is this a problem?
>>>>
>>>> I suppose not. All MSI-X MMIO accesses are executed without delay, so
>>>> no reordering issue would happen. If the guest is reading and writing
>>>> the field at the same time (from two cpus), it should have some kind
>>>> of sync method of its own - or it may not care what the result of the
>>>> read is (like the one after msix_mask_irq()).
>>>
>>> I guess so. Michael/Alex?
>>
>> This is kvm_get_irq_routing_entry which is used for table reads,
>> correct? Actually, the pci read *is* the sync method that guests use,
>> they rely on reads to flush out all previous writes.
>
> Michael, I think the *sync* you are talking about is not the one I
> meant. I was talking about the two-cpu case: one is reading and the
> other is writing, and the order can't be determined if the guest doesn't
> use a lock or some other synchronization method. You're talking about
> flushing out all previous writes of a single CPU...

Yes, but you don't seem to flush out writes on a read, either.

> --
> regards,
> Yang, Sheng
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 02:23:20PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 02:16:05PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 01:52:30PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote: On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote: 2010/11/16 Gleb Natapov g...@redhat.com: On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote: Perhaps the FW path should use device class names if no name is specified. What do you mean by device class name. We can do something like this: if (dev-child_bus.lh_first) return dev-child_bus.lh_first-info-name; i.e if there is child bus use its bus name as fw name. This will make all pci devices to have pci as fw name automatically. The problem is that theoretically same device can provide different buses. I meant PCI class name, like display for display controllers, network for NICs etc. That is what my pci bus related patch is doing already. I'll try Sparc32 to see how this fits there. Except bootindex is not implemented for SCSI. Will look into adding it. Thanks. The bootindex on Sparc32 looks like this: bootindex /e...@7880/d...@1,0 /ether...@/ethernet-...@0 For arches other then x86 there is a lot of work left to be done :) For starter exotic sparc buses should get their own get_fw_dev_path() implementation. I don't think I got Lance setup right. OF paths for the devices would be: /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0 If qdev hierarchy does not correspond to real HW there is no much we can do expect for fixing qdev. That's bad. This raises a concern: if these paths expose qdev internals, any attempt to fix this will break migration. The path expose internal HW hierarchy. It is designed to do so. 
Qdev designed to do the same: describe HW hierarchy. If qdev fails to do so it is broken. Yes. But since you use qdev to build up the path, a broken qdev will give you a broken path. Qdev bug. Fix it like any other bug. The nice is that when you compare device path produced by qdev and real HW you can see when qdev is wrong. I do not see connection to migration at all since the path is not used in migration code. The connection is that if we pass the list with path 1 which you define as broken to BIOS, then migrate to a machine with an updated qemu which has a correct path, BIOS won't be able to complete the boot. You solve it like you solve all such issue with -M machine type. So that's unavoidable if we think paths are correct. But if we know they are wrong, we are better off correcting them first IMO. They are correct for x86. My patch set does not even tries to cover all HW. If sparc want to use them to it better be fixed. Or if there is enough info in the path to determine device it may choose to use it as is. But the problem exists only if migration happens in a short window between start of the boot process and BIOS reading boot order string. After reboot new qemu should have new BIOS. That makes it even more nasty, doesn't it? No. Right? Same in reverse direction. Reverse direction is not and never was supported. As solution could be a fuzzy matching of paths that wiull let us recover. Firmware can try its best of course, but nothing is guarantied. No I mean qemu could do matching fuzzily. This way if we get a path from the old BIOS we can survive. Qemu does not take paths from BIOS so I don't know what are you talking about here. The logic for ESP is that ESP (registers at 0x7880, slot offset 0x88) is handled by the DMA controller (registers at 0x7840, slot offset 0x84), they are in a SBus slot #5, and SBus (registers at 0x10001000) is in turn handled by IOMMU (registers at 0x1000). Lance should be handled the same way. 
This hierarchy is partly known by QEMU because DMA accesses use this flow, but not otherwise. There is no concept of SBus slots, DMA talks to IOMMU directly. Though in this case both ESP, Lance and their DMA controllers are on board devices in a MACIO chip. It may be possible to add the hierarchy information at each stage. It should also be possible
Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func
On Thu, Nov 18, 2010 at 8:33 PM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, Nov 18, 2010 at 07:59:10PM +0800, Sheng Yang wrote: On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote: *entry may be stale after rcu_read_unlock(). Is this a problem? I suppose not. All MSI-X MMIO accessing would be executed without delay, so no re- order issue would happen. If the guest is reading and writing the field at the same time(from two cpus), it should got some kinds of sync method for itself - or it may not care what's the reading result(like the one after msix_mask_irq()). I guess so. Michael/Alex? This is kvm_get_irq_routing_entry which is used for table reads, correct? Actually, the pci read *is* the sync method that guests use, they rely on reads to flush out all previous writes. Michael, I think the *sync* you are talking about is not the one I meant. I was talking about two cpus case, one is reading and the other is writing, the order can't be determined if guest doesn't use lock or some other synchronize methods; and you're talking about to flush out all previous writes of the only one CPU... Yes, but you don't seem to flush out writes on a read, either. ... I don't understand... We are emulating the writing operation using software and make it in effect immediately... What should we supposed to do with this flush? -- regards, Yang, Sheng -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot
On 11/18/2010 07:14 AM, Takuya Yoshikawa wrote:
> This patch introduces the counter to hold the number of dirty bits in each
> memslot. We will use this to optimize dirty logging later.
>
> @@ -3217,11 +3216,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>      n = kvm_dirty_bitmap_bytes(memslot);
>
> -    for (i = 0; !is_dirty && i < n/sizeof(long); i++)
> -        is_dirty = memslot->dirty_bitmap[i];
> -

This can already be an improvement.

> @@ -152,6 +152,7 @@ struct kvm_memory_slot {
>      unsigned long *rmap;
>      unsigned long *dirty_bitmap;
>      unsigned long *dirty_bitmap_head;
> +    unsigned long num_dirty_bits;

The bits themselves are not dirty; only the pages are dirty.
(+ we usually use 'nr' for 'number')

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 02:29:11PM +0200, Avi Kivity wrote: On 11/18/2010 01:10 PM, Michael S. Tsirkin wrote: I guess I should create an empty Documentation/kvm/locking.txt and force everyone else to update it. Comments near the relevant fields not better? Not an either/or. You can't understand the system from random source comments. diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a055742..d13ced3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -16,6 +16,7 @@ #includelinux/mm.h #includelinux/preempt.h #includelinux/msi.h +#includelinux/rcupdate.h #includeasm/signal.h #includelinux/kvm.h @@ -206,6 +207,8 @@ struct kvm { struct mutex irq_lock; #ifdef CONFIG_HAVE_KVM_IRQCHIP +/* Update side is protected by irq_lock and, + * if configured, irqfds.lock. */ /* * kernel style comment * here and elsewhere */ struct kvm_irq_routing_table __rcu *irq_routing; struct hlist_head mask_notifier_list; struct hlist_head irq_ack_notifier_list; @@ -462,6 +465,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, unsigned long *deliver_bitmask); #endif int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, +int irq_source_id, int level); No point in the level argument for an msi specific function. This is an existing function I made non-static. We have per-gsi callbacks so level is required there to match. I could add a wrapper I guess: int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, int irq_source_id, int level) { if (!level) return -1; return kvm_send_msi(irq_entry, kvm, irq_source_id); } This results in less code for irqfd but more code for ioctl injection ... is it worth it? #else @@ -614,6 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags) } static inline void kvm_irqfd_release(struct kvm *kvm) {} blank line There's no line before kvm_eventfd_init either ... 
I added one. +static inline void kvm_irq_routing_update(struct kvm *kvm, + struct kvm_irq_routing_table *irq_rt) +{ +rcu_assign_pointer(kvm-irq_routing, irq_rt); +} + static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; Apart from these minor issues, looks good. Something we should consider improving is the loop over all VCPUs that kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast interrupts) it should be possible to precompute an store the CPU in question as part of the routing entry. Something for a separate patch ... comments? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap
On 11/18/2010 07:15 AM, Takuya Yoshikawa wrote:
> Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using
> rmap: "kvm: rework remove-write-access for a slot"
>   http://www.spinics.net/lists/kvm/msg35871.html
>
> One problem pointed out there was that this approach might hurt cache
> locality and make things slow down.
>
> But if we restrict the story to dirty logging, we notice that only small
> portion of pages are actually needed to be write protected.
>
> For example, I have confirmed that even when we are playing with tools like
> x11perf, dirty ratio of the frame buffer bitmap is almost always less than 10%.
>
> In the case of live-migration, we will see more sparseness in the usual
> workload because the RAM size is really big.
>
> So this patch uses his approach with small modification to use switched out
> dirty bitmap as a hint to restrict the rmap travel.
>
> We can also use this to selectively write protect pages to reduce unwanted
> page faults in the future.

Looks like a good approach. Any measurements?

> +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
> +{
> +    u64 *spte = rmap_next(kvm, rmapp, NULL);
> +
> +    while (spte) {
> +        /* avoid RMW */
> +        if (is_writable_pte(*spte))
> +            *spte &= ~PT_WRITABLE_MASK;

This is racy, *spte can be modified concurrently by hardware.
update_spte() can be used for this.

> +        spte = rmap_next(kvm, rmapp, spte);
> +    }
> +}
> +
> +/*
> + * Write protect the pages set dirty in a given bitmap.
> + */
> +void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
> +                                           struct kvm_memory_slot *memslot,
> +                                           unsigned long *dirty_bitmap)
> +{
> +    int i;
> +    unsigned long gfn_offset;
> +
> +    for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
> +        rmapp_remove_write_access(kvm, &memslot->rmap[gfn_offset]);
> +
> +        for (i = 0; i < KVM_NR_PAGE_SIZES - 1; i++) {
> +            unsigned long gfn = memslot->base_gfn + gfn_offset;
> +            unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
> +            int idx = gfn / huge - memslot->base_gfn / huge;

Better to use a shift than a division here.

> +
> +            if (!(gfn_offset || (gfn % huge)))
> +                break;

Why?

> +            rmapp_remove_write_access(kvm,
> +                    &memslot->lpage_info[i][idx].rmap_pde);
> +        }
> +    }
> +    kvm_flush_remote_tlbs(kvm);
> +}
> +
>  void kvm_mmu_zap_all(struct kvm *kvm)
>  {
>      struct kvm_mmu_page *sp, *node;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 038d719..3556b4d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>  }
>
>  /*
> + * Check the dirty bit ratio of a given memslot.
> + *   0: clean
> + *   1: sparse
> + *   2: dense
> + */
> +static int dirty_bitmap_density(struct kvm_memory_slot *memslot)
> +{
> +    if (!memslot->num_dirty_bits)
> +        return 0;
> +    if (memslot->num_dirty_bits < memslot->npages / 128)
> +        return 1;
> +    return 2;
> +}

Use an enum please.

> +
> +/*
>   * Get (and clear) the dirty memory log for a memory slot.
>   */
>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>                                 struct kvm_dirty_log *log)
>  {
> -    int r;
> +    int r, density;
>      struct kvm_memory_slot *memslot;
>      unsigned long n;
>
> @@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>      n = kvm_dirty_bitmap_bytes(memslot);
>
>      /* If nothing is dirty, don't bother messing with page tables. */
> -    if (memslot->num_dirty_bits) {
> +    density = dirty_bitmap_density(memslot);
> +    if (density) {
>          struct kvm_memslots *slots, *old_slots;
>          unsigned long *dirty_bitmap;
>
> @@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>          kfree(old_slots);
>
>          spin_lock(&kvm->mmu_lock);
> -        kvm_mmu_slot_remove_write_access(kvm, log->slot);
> +        if (density == 2)
> +            kvm_mmu_slot_remove_write_access(kvm, log->slot);
> +        else
> +            kvm_mmu_slot_remove_write_access_mask(kvm,
> +                    &slots->memslots[log->slot],
> +                    dirty_bitmap);
>          spin_unlock(&kvm->mmu_lock);

wrt. O(1) write protection: hard to tell if the two methods can coexist.
For direct mapped shadow pages (i.e. ept/npt) I think we can use the mask
to speed up clearing of an individual sp's sptes.
-- 
error compiling committee.c: too many arguments to function
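Archive note: the two review requests above ("Use an enum please" and "Better to use a shift than a division here") can be illustrated with a small standalone sketch. This is not the follow-up patch. The struct, the enum names and the helper hpage_index() are hypothetical, the counter is renamed nr_dirty_pages following the naming comment in the 1/2 review, and the sparse threshold assumes the comparison lost from the archived diff was "<".

```c
/* Hypothetical stand-in for the few kvm_memory_slot fields used here. */
struct sketch_memslot {
    unsigned long npages;
    unsigned long nr_dirty_pages;
};

/* Named states instead of the magic values 0, 1 and 2. */
enum dirty_density {
    DIRTY_CLEAN,
    DIRTY_SPARSE,
    DIRTY_DENSE,
};

static enum dirty_density dirty_bitmap_density(const struct sketch_memslot *slot)
{
    if (!slot->nr_dirty_pages)
        return DIRTY_CLEAN;
    /* Assumed threshold: fewer than 1/128 of the pages are dirty. */
    if (slot->nr_dirty_pages < slot->npages / 128)
        return DIRTY_SPARSE;
    return DIRTY_DENSE;
}

/*
 * Index of the huge page containing gfn, relative to the slot base.
 * Huge page sizes are powers of two, so a right shift by log2(pages per
 * huge page) gives the same result as the division in the patch.
 */
static unsigned long hpage_index(unsigned long gfn, unsigned long base_gfn,
                                 unsigned int hpage_gfn_shift)
{
    return (gfn >> hpage_gfn_shift) - (base_gfn >> hpage_gfn_shift);
}
```

With an enum the caller can test for DIRTY_DENSE by name rather than comparing against 2, which is the readability gain the review asks for.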
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On 11/18/2010 03:03 PM, Michael S. Tsirkin wrote:
> > >  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
> > > +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
> > > +                int irq_source_id, int level);
> >
> > No point in the level argument for an msi specific function.
>
> This is an existing function I made non-static. We have per-gsi
> callbacks so level is required there to match.

Right.

> I could add a wrapper I guess:
>
> int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
>                 int irq_source_id, int level)
> {
>     if (!level)
>         return -1;
>     return kvm_send_msi(irq_entry, kvm, irq_source_id);
> }
>
> This results in less code for irqfd but more code for ioctl injection
> ... is it worth it?

IMO not.

> > Apart from these minor issues, looks good.
>
> Something we should consider improving is the loop over all VCPUs that
> kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast
> interrupts) it should be possible to precompute and store the CPU in
> question as part of the routing entry.
>
> Something for a separate patch ... comments?

Yes. Either precompute, or compute on first use and cache. Precompute is
more realtime-friendly so I prefer it.

-- 
error compiling committee.c: too many arguments to function
Re: [RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot
On 18.11.2010 13:54, Avi Kivity wrote:
> On 11/18/2010 07:14 AM, Takuya Yoshikawa wrote:
> > This patch introduces the counter to hold the number of dirty bits in each
> > memslot. We will use this to optimize dirty logging later.
> >
> > @@ -3217,11 +3216,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> >      n = kvm_dirty_bitmap_bytes(memslot);
> >
> > -    for (i = 0; !is_dirty && i < n/sizeof(long); i++)
> > -        is_dirty = memslot->dirty_bitmap[i];
> > -
>
> This can already be an improvement.

/Me wonders if it wouldn't make sense to expand this optimization to the
user space interface as well, i.e. signaling there are no dirty pages
via some flag instead of writing zeros in a bitmap.

Of course, this means supporting both interfaces for a longer period.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 02:37:08PM +0200, Gleb Natapov wrote:
> > So that's unavoidable if we think paths are correct.
> > But if we know they are wrong, we are better off
> > correcting them first IMO.
>
> They are correct for x86. My patch set does not even tries to cover all
> HW. If sparc want to use them to it better be fixed. Or if there is
> enough info in the path to determine device it may choose to use it as
> is.

Fair enough I guess.

> > > But the problem exists only if migration happens in a short window
> > > between start of the boot process and BIOS reading boot order
> > > string. After reboot new qemu should have new BIOS.
> >
> > That makes it even more nasty, doesn't it?
>
> No.

Nasty as in hard to reproduce.

-- 
MST
Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT
Hello,

On Thursday 18 November 2010 10:05:59, Hidetoshi Seto wrote:

You have to be careful about compile-time detection and runtime detection:

> ...
> -#ifdef CONFIG_UTIMENSAT
> -    return utimensat(dirfd, path, times, flags);
> -#else
> +    {
> +        int ret = utimensat(dirfd, path, times, flags);
> +        if (ret != -1 || errno != ENOSYS) {
> +            return ret;
> +        }
> +    }

You might still want to do the compile-time check, something like:

#ifdef CONFIG_UTIMENSAT
    {
        int ret = utimensat(dirfd, path, times, flags);
        if (ret != -1 || errno != ENOSYS) {
            return ret;
        }
    }
#endif
    // fallback

Sincerely
Philipp Hahn

-- 
Philipp Hahn           Open Source Software Engineer      h...@univention.de
Univention GmbH        Linux for Your Business        fon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen                   fax: +49 421 22 232-99
                                                    http://www.univention.de
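Archive note: the combination suggested above (compile-time guard around the real call, runtime ENOSYS check, then a fallback) might look roughly like the sketch below. It is not the code from the patch. The name sketch_utimensat() is hypothetical, and the fallback is deliberately minimal: it only handles AT_FDCWD with flags of 0 via utimes(), and it reports ENOSYS for any tv_nsec outside the normal range, which covers UTIME_NOW and UTIME_OMIT without naming macros that old headers may lack.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>

static int sketch_utimensat(int dirfd, const char *path,
                            const struct timespec times[2], int flags)
{
    struct timeval tv[2];
    int i;

#ifdef CONFIG_UTIMENSAT
    {
        /* Built with utimensat(): still check that the running kernel has it. */
        int ret = utimensat(dirfd, path, times, flags);
        if (ret != -1 || errno != ENOSYS) {
            return ret;
        }
    }
#endif
    /* Fallback: only the simple case is emulated in this sketch. */
    if (dirfd != AT_FDCWD || flags != 0) {
        errno = ENOSYS;
        return -1;
    }
    if (times == NULL) {
        return utimes(path, NULL);  /* set both timestamps to "now" */
    }
    for (i = 0; i < 2; i++) {
        /* Special values such as UTIME_NOW/UTIME_OMIT fall outside this range. */
        if (times[i].tv_nsec < 0 || times[i].tv_nsec >= 1000000000L) {
            errno = ENOSYS;
            return -1;
        }
        tv[i].tv_sec = times[i].tv_sec;
        tv[i].tv_usec = times[i].tv_nsec / 1000;  /* nanoseconds to microseconds */
    }
    return utimes(path, tv);
}
```

The compile-time guard keeps the file building on systems whose libc has no utimensat() at all, while the ENOSYS check covers a binary built on a new system and run on an old kernel.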
Re: [RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot
On 11/18/2010 03:10 PM, Jan Kiszka wrote:
> On 18.11.2010 13:54, Avi Kivity wrote:
> > On 11/18/2010 07:14 AM, Takuya Yoshikawa wrote:
> > > This patch introduces the counter to hold the number of dirty bits in each
> > > memslot. We will use this to optimize dirty logging later.
> > >
> > > @@ -3217,11 +3216,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> > >      n = kvm_dirty_bitmap_bytes(memslot);
> > >
> > > -    for (i = 0; !is_dirty && i < n/sizeof(long); i++)
> > > -        is_dirty = memslot->dirty_bitmap[i];
> > > -
> >
> > This can already be an improvement.
>
> /Me wonders if it wouldn't make sense to expand this optimization to the
> user space interface as well, i.e. signaling there are no dirty pages
> via some flag instead of writing zeros in a bitmap.
>
> Of course, this means supporting both interfaces for a longer period.

An 8MB framebuffer is 2K bits, or 256 bytes wide. Comparing 256 cache
hot bytes against zero is not worth a new interface. Larger memory
slots are very unlikely to be always unmodified.

-- 
error compiling committee.c: too many arguments to function
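Archive note: the arithmetic behind "2K bits, or 256 bytes" can be checked with a few lines of C. This is only a sketch: it assumes 4 KiB pages (as on x86) and one dirty bit per page, and the function names are hypothetical rather than the kernel's.

```c
/* Assumed guest page size: 4 KiB. */
#define SKETCH_PAGE_SIZE 4096UL

/* One dirty bit per page in the memory slot. */
static unsigned long sketch_dirty_bitmap_bits(unsigned long slot_bytes)
{
    return slot_bytes / SKETCH_PAGE_SIZE;
}

/* Bitmap size in bytes, rounded up to a whole number of longs. */
static unsigned long sketch_dirty_bitmap_bytes(unsigned long slot_bytes)
{
    unsigned long bits_per_long = 8 * sizeof(unsigned long);
    unsigned long bits = sketch_dirty_bitmap_bits(slot_bytes);
    unsigned long longs = (bits + bits_per_long - 1) / bits_per_long;

    return longs * sizeof(unsigned long);
}
```

For an 8 MiB slot this gives 8388608 / 4096 = 2048 bits, and 2048 / 8 = 256 bytes, matching the figures in the reply.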
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
> > > +static inline void kvm_irq_routing_update(struct kvm *kvm,
> > > +                                          struct kvm_irq_routing_table *irq_rt)
> > > +{
> > > +    rcu_assign_pointer(kvm->irq_routing, irq_rt);
> > > +}
> > > +
> > >  static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> > >  {
> > >      return -ENOSYS;
> >
> > Apart from these minor issues, looks good.
>
> Something we should consider improving is the loop over all VCPUs that
> kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast
> interrupts) it should be possible to precompute and store the CPU in
> question as part of the routing entry.
>
> Something for a separate patch ... comments?

I do not think this info should be part of routing entry. Routing entry
is more about describing wires on the board. Other than that this is a
good idea that, IIRC, we already discussed once.

-- 
Gleb.
Re: [PATCHv4 15/15] Pass boot device list to firmware.
On Thu, Nov 18, 2010 at 03:12:02PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 18, 2010 at 02:37:08PM +0200, Gleb Natapov wrote:
> > > So that's unavoidable if we think paths are correct.
> > > But if we know they are wrong, we are better off
> > > correcting them first IMO.
> >
> > They are correct for x86. My patch set does not even tries to cover all
> > HW. If sparc want to use them to it better be fixed. Or if there is
> > enough info in the path to determine device it may choose to use it as
> > is.
>
> Fair enough I guess.
>
> > > > But the problem exists only if migration happens in a short window
> > > > between start of the boot process and BIOS reading boot order
> > > > string. After reboot new qemu should have new BIOS.
> > >
> > > That makes it even more nasty, doesn't it?
> >
> > No.
>
> Nasty as in hard to reproduce.

It is very easy to reproduce if you know what you are looking for :).
Just stick sleep() in correct place in the BIOS.

-- 
Gleb.
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On 11/18/2010 03:14 PM, Gleb Natapov wrote:
> On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
> > > > +static inline void kvm_irq_routing_update(struct kvm *kvm,
> > > > +                                          struct kvm_irq_routing_table *irq_rt)
> > > > +{
> > > > +    rcu_assign_pointer(kvm->irq_routing, irq_rt);
> > > > +}
> > > > +
> > > >  static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> > > >  {
> > > >      return -ENOSYS;
> > >
> > > Apart from these minor issues, looks good.
> >
> > Something we should consider improving is the loop over all VCPUs that
> > kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast
> > interrupts) it should be possible to precompute and store the CPU in
> > question as part of the routing entry.
> >
> > Something for a separate patch ... comments?
>
> I do not think this info should be part of routing entry. Routing entry
> is more about describing wires on the board. Other than that this is a
> good idea that, IIRC, we already discussed once.

Not as part of the routing entry exposed to userspace. But as a private
kernel field, why not?

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote:
> On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
> > > > +static inline void kvm_irq_routing_update(struct kvm *kvm,
> > > > +                                          struct kvm_irq_routing_table *irq_rt)
> > > > +{
> > > > +    rcu_assign_pointer(kvm->irq_routing, irq_rt);
> > > > +}
> > > > +
> > > >  static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> > > >  {
> > > >      return -ENOSYS;
> > >
> > > Apart from these minor issues, looks good.
> >
> > Something we should consider improving is the loop over all VCPUs that
> > kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast
> > interrupts) it should be possible to precompute and store the CPU in
> > question as part of the routing entry.
> >
> > Something for a separate patch ... comments?
>
> I do not think this info should be part of routing entry. Routing entry
> is more about describing wires on the board.

Not for msi. kvm_kernel_irq_routing_entry seems to just keep an
address/data pair in that case. So

    union {
        struct {
            unsigned irqchip;
            unsigned pin;
        } irqchip;
        struct msi_msg msi;
    };

would become

    union {
        struct {
            unsigned irqchip;
            unsigned pin;
        } irqchip;
        struct {
            struct msi_msg msi;
            struct kvm_vcpu *dest;
        } msi;
    };

or something like this.

> Other than that this is a good idea that, IIRC, we already discussed once.
>
> -- 
> Gleb.
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 03:20:27PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote: +static inline void kvm_irq_routing_update(struct kvm *kvm, +struct kvm_irq_routing_table *irq_rt) +{ + rcu_assign_pointer(kvm-irq_routing, irq_rt); +} + static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; Apart from these minor issues, looks good. Something we should consider improving is the loop over all VCPUs that kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast interrupts) it should be possible to precompute an store the CPU in question as part of the routing entry. Something for a separate patch ... comments? I do not think this info should be part of routing entry. Routing entry is more about describing wires on the board. Not for msi. kvm_kernel_irq_routing_entry seems to just keep an address/data pair in that case. So Yeah. Using routing_entry for MSI was miss design. We discussed that too :) union { struct { unsigned irqchip; unsigned pin; } irqchip; struct msi_msg msi; }; would become union { struct { unsigned irqchip; unsigned pin; } irqchip; struct { struct msi_msg msi; struct kvm_vpcu *dest; } msi; }; or something like this. Ah so you want to do it only for MSI? For MSI it makes sense. Remember though that sometimes destination depend on message itself (specifically on delivery mode). Other then that this is a good idea that, IIRC, we already discussed once. -- Gleb. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On 11/18/2010 03:35 PM, Gleb Natapov wrote:
> > or something like this.
>
> Ah so you want to do it only for MSI? For MSI it makes sense. Remember
> though that sometimes destination depend on message itself (specifically
> on delivery mode).

Yes, broadcast or multicast or lowest priority wouldn't get this treatment.

-- 
error compiling committee.c: too many arguments to function
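Archive note: the precompute idea discussed in this thread can be modelled outside the kernel with a few lines of C. Everything below is a hypothetical sketch, not KVM code: the sketch_* types stand in for struct kvm_vcpu and the MSI routing entry, physical destination mode is assumed, and a NULL cached pointer is used to mean "fall back to the full VCPU scan at injection time".

```c
#include <stddef.h>

/* Assumed xAPIC broadcast destination ID in physical mode. */
#define SKETCH_BROADCAST_ID 0xffu

enum sketch_delivery {
    SKETCH_FIXED,
    SKETCH_LOWEST_PRIO,
};

struct sketch_vcpu {
    unsigned apic_id;
};

struct sketch_msi_route {
    unsigned dest_id;             /* destination decoded from the MSI address */
    enum sketch_delivery mode;    /* delivery mode decoded from the MSI data */
    struct sketch_vcpu *dest;     /* cached target; NULL means "scan at injection" */
};

/*
 * Called when the routing table is updated, so the injection path
 * never has to loop over all VCPUs for the common fixed, unicast case.
 */
static void sketch_route_precompute(struct sketch_msi_route *route,
                                    struct sketch_vcpu *vcpus, size_t nr_vcpus)
{
    size_t i;

    route->dest = NULL;
    /* Broadcast and lowest-priority delivery cannot be resolved ahead of time. */
    if (route->mode != SKETCH_FIXED || route->dest_id == SKETCH_BROADCAST_ID)
        return;

    for (i = 0; i < nr_vcpus; i++) {
        if (vcpus[i].apic_id == route->dest_id) {
            route->dest = &vcpus[i];
            return;
        }
    }
}
```

Doing the lookup at update time rather than on first use keeps the injection path free of the scan even for the first interrupt, which is the realtime argument made earlier in the thread.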
Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()
On 11/18/2010 1:17 PM, Avi Kivity wrote:
> cea15c2 (KVM: Move KVM context switch into own function) split
> vmx_vcpu_run() to prevent multiple copies of the context switch from
> being generated (causing problems due to a label). This patch folds them
> back together again and adds the __noclone attribute to prevent the
> label from being duplicated.

That won't work on gcc versions that didn't have __noclone yet. Noclone
is fairly recent (4.5 or 4.4)

-Andi
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 03:35:01PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 03:20:27PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote: +static inline void kvm_irq_routing_update(struct kvm *kvm, + struct kvm_irq_routing_table *irq_rt) +{ +rcu_assign_pointer(kvm-irq_routing, irq_rt); +} + static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; Apart from these minor issues, looks good. Something we should consider improving is the loop over all VCPUs that kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast interrupts) it should be possible to precompute an store the CPU in question as part of the routing entry. Something for a separate patch ... comments? I do not think this info should be part of routing entry. Routing entry is more about describing wires on the board. Not for msi. kvm_kernel_irq_routing_entry seems to just keep an address/data pair in that case. So Yeah. Using routing_entry for MSI was miss design. We discussed that too :) union { struct { unsigned irqchip; unsigned pin; } irqchip; struct msi_msg msi; }; would become union { struct { unsigned irqchip; unsigned pin; } irqchip; struct { struct msi_msg msi; struct kvm_vpcu *dest; } msi; }; or something like this. Ah so you want to do it only for MSI? For MSI it makes sense. Remember though that sometimes destination depend on message itself (specifically on delivery mode). Of course. We'll take message/data and precompute destination. Set to NULL for e.g. broadcast and recompute at injection time in that case. BTW SELF doesn't work for MSI at the moment, not sure whether it's relevant or when is it used. Other then that this is a good idea that, IIRC, we already discussed once. -- Gleb. -- Gleb. 
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 03:39:15PM +0200, Avi Kivity wrote:
> On 11/18/2010 03:35 PM, Gleb Natapov wrote:
> > > or something like this.
> >
> > Ah so you want to do it only for MSI? For MSI it makes sense. Remember
> > though that sometimes destination depend on message itself (specifically
> > on delivery mode).
>
> Yes, broadcast or multicast or lowest priority wouldn't get this treatment.

Unless there's a single online VCPU :)

> -- 
> error compiling committee.c: too many arguments to function
Re: HAL type for Win2003 Server on recent KVM versions?
2010/11/18 Avi Kivity a...@redhat.com: On 11/18/2010 12:58 AM, Kenni Lund wrote: Hi I'm about to move a couple of virtual machines from a Fedora 11 system to a new server with a more recent operating system and newer version of KVM, etc. One of the guests is a Windows Server 2003 Standard SP2, which is currently running with the ACPI Multiprocessor PC HAL. Considering moving to RHEL, I've been reading the virtualization documentation for RHEL 6.0, which says that I need to set HAL to Standard PC when installing a new Win2003 guest. Since my current guest has been running perfectly fine for a long time with its current HAL, I was wondering if the system will become unstable, unbootable or what the disadvantage will be, if I move the guest to for example RHEL 6.0, without reinstalling or upgrading the guest to select another HAL mode? On the other hand, it seems like I can upgrade from the current ACPI Multiprocessor PC into Standard PC, but I'm not sure if I'll gain anything by trying this. I suggest using the default HAL, whatever it is. That's what everyone else is using so you get the best tested configuration. Thanks Avi, ACPI Multiprocessor PC was/is the default HAL, I didn't change anything when the system was originally installed. I'm curious why the RHEL 6 documentation claims that you actively need to select the Standard PC HAL on installation, if it's not even the recommended/preferred HAL...(?): Windows 2003 requires a specific computer type in order to install properly on a fully-virtualized guest. This needs to be specified at the beginning of the installation process.[1] [1] http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization_Windows2003.html Best regards Kenni -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()
On 11/18/2010 03:48 PM, Andi Kleen wrote:
> On 11/18/2010 1:17 PM, Avi Kivity wrote:
> > cea15c2 (KVM: Move KVM context switch into own function) split
> > vmx_vcpu_run() to prevent multiple copies of the context switch from
> > being generated (causing problems due to a label). This patch folds them
> > back together again and adds the __noclone attribute to prevent the
> > label from being duplicated.
>
> That won't work on gcc versions that didn't have __noclone yet. Noclone
> is fairly recent (4.5 or 4.4)

Are the gcc versions that don't have noclone susceptible to cloning?

-- 
error compiling committee.c: too many arguments to function
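Archive note: one way to make such an attribute safe on older compilers is to hide it behind a macro that expands to nothing when the compiler is too old. The sketch below is an illustration only. SKETCH_NOCLONE and sketch_run() are hypothetical names, and the version test assumes the attribute first shipped in GCC 4.5, which the thread itself leaves open ("4.5 or 4.4").

```c
/*
 * Expand to the real attribute only on GCC versions assumed to support it.
 * Clang also defines __GNUC__, so it is excluded explicitly.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define SKETCH_NOCLONE __attribute__((__noclone__))
#else
#define SKETCH_NOCLONE
#endif

/* Stand-in for a function whose body must not be duplicated by the compiler. */
static SKETCH_NOCLONE int sketch_run(int x)
{
    return x + 1;
}
```

Whether compilers that lack the attribute can clone the function in the first place is exactly the open question in the reply above; the macro only guarantees that the source still builds there.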
Re: [PATCH RFC] kvm: fast-path msi injection with irqfd
On Thu, Nov 18, 2010 at 03:48:43PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 03:35:01PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 03:20:27PM +0200, Michael S. Tsirkin wrote: On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote: On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote: +static inline void kvm_irq_routing_update(struct kvm *kvm, +struct kvm_irq_routing_table *irq_rt) +{ + rcu_assign_pointer(kvm-irq_routing, irq_rt); +} + static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) { return -ENOSYS; Apart from these minor issues, looks good. Something we should consider improving is the loop over all VCPUs that kvm_irq_delivery_to_apic invokes. I think that (for non-broadcast interrupts) it should be possible to precompute an store the CPU in question as part of the routing entry. Something for a separate patch ... comments? I do not think this info should be part of routing entry. Routing entry is more about describing wires on the board. Not for msi. kvm_kernel_irq_routing_entry seems to just keep an address/data pair in that case. So Yeah. Using routing_entry for MSI was miss design. We discussed that too :) union { struct { unsigned irqchip; unsigned pin; } irqchip; struct msi_msg msi; }; would become union { struct { unsigned irqchip; unsigned pin; } irqchip; struct { struct msi_msg msi; struct kvm_vpcu *dest; } msi; }; or something like this. Ah so you want to do it only for MSI? For MSI it makes sense. Remember though that sometimes destination depend on message itself (specifically on delivery mode). Of course. We'll take message/data and precompute destination. Set to NULL for e.g. broadcast and recompute at injection time in that case. BTW SELF doesn't work for MSI at the moment, not sure whether it's relevant or when is it used. Yes, only lowest prio is defined for MSI. Self or all but self has not meaning for MSI. -- Gleb. 
Re: HAL type for Win2003 Server on recent KVM versions?
On 11/18/2010 09:05 AM, Kenni Lund wrote:
> 2010/11/18 Avi Kivity a...@redhat.com:
>> On 11/18/2010 12:58 AM, Kenni Lund wrote:
>>> Hi
>>>
>>> I'm about to move a couple of virtual machines from a Fedora 11
>>> system to a new server with a more recent operating system and newer
>>> version of KVM, etc. One of the guests is a Windows Server 2003
>>> Standard SP2, which is currently running with the ACPI Multiprocessor
>>> PC HAL.
>>>
>>> Considering moving to RHEL, I've been reading the virtualization
>>> documentation for RHEL 6.0, which says that I need to set HAL to
>>> Standard PC when installing a new Win2003 guest. Since my current
>>> guest has been running perfectly fine for a long time with its
>>> current HAL, I was wondering if the system will become unstable,
>>> unbootable or what the disadvantage will be, if I move the guest to
>>> for example RHEL 6.0, without reinstalling or upgrading the guest to
>>> select another HAL mode?
>>>
>>> On the other hand, it seems like I can upgrade from the current ACPI
>>> Multiprocessor PC into Standard PC, but I'm not sure if I'll gain
>>> anything by trying this.
>>
>> I suggest using the default HAL, whatever it is. That's what everyone
>> else is using so you get the best tested configuration.
>
> Thanks Avi,
>
> ACPI Multiprocessor PC was/is the default HAL, I didn't change
> anything when the system was originally installed. I'm curious why the
> RHEL 6 documentation claims that you actively need to select the
> Standard PC HAL on installation, if it's not even the
> recommended/preferred HAL...(?):
>
>   Windows 2003 requires a specific computer type in order to install
>   properly on a fully-virtualized guest. This needs to be specified at
>   the beginning of the installation process.[1]
>
> [1] http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization_Windows2003.html

I'm pretty sure that was incorrectly copied over from the RHEL5 xen
documentation. The docs people have been informed so it should be fixed
soon-ish.
- Cole
Re: [PATCH] KVM test: unattended installation cleanup
On Thu, 2010-11-18 at 12:44 -0200, Lucas Meneghel Rodrigues wrote:
> From: Jason Wang jasow...@redhat.com
>
> This patch does the following things:
>
> - Drop the built-in tftp/dhcp based unattended installation for the
>   following reasons:
>   1. It's based on slirp and was not supported by major distributions.
>      It's only used for Linux guest installation and we can simply
>      replace it with the -kernel method used by network installation.
>   2. The configuration was complex and hard to share with network
>      based installation. After using the -kernel method, most of the
>      configuration can be shared and is easy to configure.
>
>   In order to achieve this:
>   1. A new option 'boot_path' is used to specify the path of the
>      kernel/initrd from the medium.
>   2. The autoyast file is detected through extra_params instead of
>      kernel_args (which is dropped with the tftp option).
> - Restructure the unattended installation related configuration and
>   make it easy to use and configure.
> - Move cdrom related params into unattended_install.cdrom variants, as
>   there's no need to launch with cdrom when testing a guest installed
>   from network.
>
> Changes from v1:
> - Make it possible to execute parallel guest installations (the 1st
>   version of the patch was using a unique path for initrd.img and
>   vmlinuz inside the host filesystem).
> - Reorganize some of the KVM autotest defaults
>
> Tested with RHEL/Fedora/Windows installation. OpenSUSE/SLES is
> untested.

Oh, by the way, I've tested the patch with OpenSUSE 11.3 and it works
like a charm. I am really really happy to get rid of slirp code
dependency, thank you very much Jason!

Lucas
[GIT PULL] KVM updates for 2.6.37-rc2
Linus, please pull from

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git kvm-updates/2.6.37

To receive the following updates:

Avi Kivity (2):
      KVM: Correct ordering of ldt reload wrt fs/gs reload
      KVM: VMX: Fix host userspace gsbase corruption

 arch/x86/kvm/svm.c |    2 +-
 arch/x86/kvm/vmx.c |   19 +--
 2 files changed, 10 insertions(+), 11 deletions(-)
Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()
On 11/18/2010 3:32 PM, Avi Kivity wrote:
> On 11/18/2010 03:48 PM, Andi Kleen wrote:
>> On 11/18/2010 1:17 PM, Avi Kivity wrote:
>>> cea15c2 (KVM: Move KVM context switch into own function) split
>>> vmx_vcpu_run() to prevent multiple copies of the context switch from
>>> being generated (causing problems due to a label).
>>>
>>> This patch folds them back together again and adds the __noclone
>>> attribute to prevent the label from being duplicated.
>>
>> That won't work on gcc versions that didn't have __noclone yet.
>> Noclone is fairly recent (4.5 or 4.4)
>
> Are the gcc versions that don't have noclone susceptible to cloning?

I believe the problem can happen due to inlining already

-Andi
Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()
On 11/18/2010 05:00 PM, Andi Kleen wrote:
> On 11/18/2010 3:32 PM, Avi Kivity wrote:
>> On 11/18/2010 03:48 PM, Andi Kleen wrote:
>>> On 11/18/2010 1:17 PM, Avi Kivity wrote:
>>>> cea15c2 (KVM: Move KVM context switch into own function) split
>>>> vmx_vcpu_run() to prevent multiple copies of the context switch
>>>> from being generated (causing problems due to a label).
>>>>
>>>> This patch folds them back together again and adds the __noclone
>>>> attribute to prevent the label from being duplicated.
>>>
>>> That won't work on gcc versions that didn't have __noclone yet.
>>> Noclone is fairly recent (4.5 or 4.4)
>>
>> Are the gcc versions that don't have noclone susceptible to cloning?
>
> I believe the problem can happen due to inlining already

vmx_vcpu_run() cannot be inlined (it is only called via a function
pointer; call site is in a different module)

--
error compiling committee.c: too many arguments to function
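One way the compiler-version concern raised in this thread could be handled is a macro that expands to nothing where the attribute is unavailable. This is a sketch only: `NOCLONE`, `NOINLINE` and `vcpu_run_sketch` are made-up names, and the 4.5 cutoff is an assumption (the thread itself is unsure between 4.4 and 4.5).

```c
/*
 * Expand to the noclone attribute only on GCC versions assumed to
 * support it (4.5 and later here); elsewhere it expands to nothing.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define NOCLONE __attribute__((__noclone__))
#else
#define NOCLONE
#endif

#if defined(__GNUC__)
#define NOINLINE __attribute__((__noinline__))
#else
#define NOINLINE
#endif

static int run_count;

/*
 * Hypothetical stand-in for a function whose body must be emitted
 * exactly once (for example because it contains an asm label).
 */
NOCLONE NOINLINE int vcpu_run_sketch(int exit_reason)
{
    run_count++;
    return exit_reason;
}
```

On compilers where `NOCLONE` is empty the function is simply compiled as usual, which matches the question asked above: whether such compilers would duplicate the body at all.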
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On 11/18/2010 01:42 AM, Anthony Liguori wrote: Gack. For the benefit of those that want to join the fun without digging up the spec, these magic flippable segments the i440fx can toggle are 12 fixed 16k segments from 0xc to 0xe and a single 64k segment from 0xf to 0xf. There are read-enable and write-enable bits for each, so the chipset can be configured to read from the bios and write to memory (to setup BIOS-RAM caching), and read from memory and write to the bios (to enable BIOS-RAM caching). The other bit combinations are also available. Yup. As Gleb mentions, there's the SDRAM register which controls whether 0xa is mapped to PCI or whether it's mapped to RAM (but KVM explicitly disabled SMM support). KVM not supporting SMM is a bug (albeit one that is likely to remain unresolved for a while). Let's pretend that kvm smm support is not an issue. IIUC, SMM means that there two memory maps when the cpu accesses memory, one for SMM, one for non-SMM. For my purpose in using this to program the IOMMU with guest physical to host virtual addresses for device assignment, it doesn't really matter since there should never be a DMA in this range of memory. But for a general RAM API, I'm not sure either. I'm tempted to say that while this is in fact a use of RAM, the RAM is never presented to the guest as usable system memory (E820_RAM for x86), and should therefore be excluded from the RAM API if we're using it only to track regions that are actual guest usable physical memory. We had talked on irc that pc.c should be registering 0x0 to below_4g_mem_size as ram, but now I tend to disagree with that. The memory backing 0xa-0x10 is present, but it's not presented to the guest as usable RAM. What's your strict definition of what the RAM API includes? Is it only what the guest could consider usable RAM or does it also include quirky chipset accelerator features like this (everything with a guest physical address)? 
Thanks, Today we model on flat space that's a mixed of device memory, RAM, or ROM. This is not how machines work and the limitations of this model is holding us back. IRL, there's a block of RAM that's connected to a memory controller. The CPU is also connected to the memory controller. Devices are connected to another controller which is in turn connected to the memory controller. There may, in fact, be more than one controller between a device and the memory controller. A controller may change the way a device sees memory in arbitrary ways. In fact, two controllers accessing the same page might see something totally different. The idea behind the RAM API is to begin to establish this hierarchy. RAM is not what any particular device sees--it's actual RAM. IOW, the RAM API should represent what address mapping I would get if I talked directly to DIMMs. This is not what RamBlock is even though the name would suggest otherwise. RamBlocks are anything that qemu represents as cache consistency directly accessable memory. Device ROMs and areas of device RAM are all allocated from the RamBlock space. So the very first task of a RAM API is to simplify differentiate these two things. Once we have the base RAM API, we can start adding the proper APIs that sit on top of it (like a PCI memory API). Things aren't that bad - a ram_addr_t and a physical address are already different things, so we already have one level of translation. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
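To make the read-enable/write-enable description above concrete, here is a small decode sketch. It is not QEMU code: the names are hypothetical, and the address ranges (0xc0000-0xeffff for the twelve 16k segments, 0xf0000-0xfffff for the 64k segment) are inferred from the segment sizes given in the message.

```c
#include <stdbool.h>
#include <stdint.h>

enum pam_target { TARGET_NOT_PAM, TARGET_PCI, TARGET_RAM };

/* One pair of enable bits per segment (hypothetical layout). */
struct pam_seg {
    bool read_enable;   /* reads go to RAM when set, else to PCI */
    bool write_enable;  /* writes go to RAM when set, else to PCI */
};

#define PAM_NR_SEGS 13  /* 12 x 16k segments plus 1 x 64k segment */

/* Map an address to its segment index, or -1 if outside the PAM area. */
static int pam_index(uint32_t addr)
{
    if (addr >= 0xc0000 && addr <= 0xeffff)
        return (int)((addr - 0xc0000) >> 14);   /* 16k granularity */
    if (addr >= 0xf0000 && addr <= 0xfffff)
        return 12;                              /* single 64k segment */
    return -1;
}

/* Decide where a CPU access to the PAM area is routed. */
static enum pam_target pam_decode(const struct pam_seg *segs, uint32_t addr,
                                  bool is_write)
{
    int i = pam_index(addr);

    if (i < 0)
        return TARGET_NOT_PAM;
    if (is_write)
        return segs[i].write_enable ? TARGET_RAM : TARGET_PCI;
    return segs[i].read_enable ? TARGET_RAM : TARGET_PCI;
}
```

With `read_enable` clear and `write_enable` set, reads come from the BIOS ROM while writes land in RAM, which is the "setup" phase of BIOS shadowing described in the message.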
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On 11/18/2010 09:22 AM, Avi Kivity wrote: On 11/18/2010 01:42 AM, Anthony Liguori wrote: Gack. For the benefit of those that want to join the fun without digging up the spec, these magic flippable segments the i440fx can toggle are 12 fixed 16k segments from 0xc to 0xe and a single 64k segment from 0xf to 0xf. There are read-enable and write-enable bits for each, so the chipset can be configured to read from the bios and write to memory (to setup BIOS-RAM caching), and read from memory and write to the bios (to enable BIOS-RAM caching). The other bit combinations are also available. Yup. As Gleb mentions, there's the SDRAM register which controls whether 0xa is mapped to PCI or whether it's mapped to RAM (but KVM explicitly disabled SMM support). KVM not supporting SMM is a bug (albeit one that is likely to remain unresolved for a while). Let's pretend that kvm smm support is not an issue. IIUC, SMM means that there two memory maps when the cpu accesses memory, one for SMM, one for non-SMM. No. That's not what it means. With the i440fx, when the CPU accesses 0xa, it gets forwarded to the PCI bus no different than an access to 0xe. If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU accesses to 0xa to RAM instead of the PCI bus. Alternatively, if the SMRAM register is activated, then the i440fx will redirect 0xa to RAM regardless of whether the CPU asserts that signal. That means that even without KVM supporting SMM, this mode can happen. In general, the memory controller can redirect IO accesses to RAM or to the PCI bus. The PCI bus may redirect the access to the ISA bus. For my purpose in using this to program the IOMMU with guest physical to host virtual addresses for device assignment, it doesn't really matter since there should never be a DMA in this range of memory. But for a general RAM API, I'm not sure either. 
I'm tempted to say that while this is in fact a use of RAM, the RAM is never presented to the guest as usable system memory (E820_RAM for x86), and should therefore be excluded from the RAM API if we're using it only to track regions that are actual guest usable physical memory. We had talked on irc that pc.c should be registering 0x0 to below_4g_mem_size as ram, but now I tend to disagree with that. The memory backing 0xa-0x10 is present, but it's not presented to the guest as usable RAM. What's your strict definition of what the RAM API includes? Is it only what the guest could consider usable RAM or does it also include quirky chipset accelerator features like this (everything with a guest physical address)? Thanks, Today we model on flat space that's a mixed of device memory, RAM, or ROM. This is not how machines work and the limitations of this model is holding us back. IRL, there's a block of RAM that's connected to a memory controller. The CPU is also connected to the memory controller. Devices are connected to another controller which is in turn connected to the memory controller. There may, in fact, be more than one controller between a device and the memory controller. A controller may change the way a device sees memory in arbitrary ways. In fact, two controllers accessing the same page might see something totally different. The idea behind the RAM API is to begin to establish this hierarchy. RAM is not what any particular device sees--it's actual RAM. IOW, the RAM API should represent what address mapping I would get if I talked directly to DIMMs. This is not what RamBlock is even though the name would suggest otherwise. RamBlocks are anything that qemu represents as cache consistency directly accessable memory. Device ROMs and areas of device RAM are all allocated from the RamBlock space. So the very first task of a RAM API is to simplify differentiate these two things. 
Once we have the base RAM API, we can start adding the proper APIs that sit on top of it (like a PCI memory API). Things aren't that bad - a ram_addr_t and a physical address are already different things, so we already have one level of translation. Yeah, but ram_addr_t doesn't model anything meaningful IRL. It's an internal implementation detail. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On Wed, Nov 17, 2010 at 05:42:28PM -0600, Anthony Liguori wrote: For my purpose in using this to program the IOMMU with guest physical to host virtual addresses for device assignment, it doesn't really matter since there should never be a DMA in this range of memory. But for a general RAM API, I'm not sure either. I'm tempted to say that while this is in fact a use of RAM, the RAM is never presented to the guest as usable system memory (E820_RAM for x86), and should therefore be excluded from the RAM API if we're using it only to track regions that are actual guest usable physical memory. We had talked on irc that pc.c should be registering 0x0 to below_4g_mem_size as ram, but now I tend to disagree with that. The memory backing 0xa-0x10 is present, but it's not presented to the guest as usable RAM. What's your strict definition of what the RAM API includes? Is it only what the guest could consider usable RAM or does it also include quirky chipset accelerator features like this (everything with a guest physical address)? Thanks, Today we model on flat space that's a mixed of device memory, RAM, or ROM. This is not how machines work and the limitations of this model is holding us back. IRL, there's a block of RAM that's connected to a memory controller. The CPU is also connected to the memory controller. Devices are connected to another controller which is in turn connected to the memory controller. There may, in fact, be more than one controller between a device and the memory controller. A controller may change the way a device sees memory in arbitrary ways. In fact, two controllers accessing the same page might see something totally different. The idea behind the RAM API is to begin to establish this hierarchy. RAM is not what any particular device sees--it's actual RAM. IOW, the RAM API should represent what address mapping I would get if I talked directly to DIMMs. This is not what RamBlock is even though the name would suggest otherwise. 
RamBlocks are anything that qemu represents as cache consistency directly accessable memory. Device ROMs and areas of device RAM are all allocated from the RamBlock space. So the very first task of a RAM API is to simplify differentiate these two things. Once we have the base RAM API, we can start adding the proper APIs that sit on top of it (like a PCI memory API). +1 for all above. What happens when device access some address is completely different from what happens when CPU access the same address (or even another device on another bus). For instance how MSI is implemented now CPU can send MSI by writing to 0xfee0 memory range. I do not think you can do that on real HW. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On 11/18/2010 05:46 PM, Anthony Liguori wrote: On 11/18/2010 09:22 AM, Avi Kivity wrote: On 11/18/2010 01:42 AM, Anthony Liguori wrote: Gack. For the benefit of those that want to join the fun without digging up the spec, these magic flippable segments the i440fx can toggle are 12 fixed 16k segments from 0xc to 0xe and a single 64k segment from 0xf to 0xf. There are read-enable and write-enable bits for each, so the chipset can be configured to read from the bios and write to memory (to setup BIOS-RAM caching), and read from memory and write to the bios (to enable BIOS-RAM caching). The other bit combinations are also available. Yup. As Gleb mentions, there's the SDRAM register which controls whether 0xa is mapped to PCI or whether it's mapped to RAM (but KVM explicitly disabled SMM support). KVM not supporting SMM is a bug (albeit one that is likely to remain unresolved for a while). Let's pretend that kvm smm support is not an issue. IIUC, SMM means that there two memory maps when the cpu accesses memory, one for SMM, one for non-SMM. No. That's not what it means. With the i440fx, when the CPU accesses 0xa, it gets forwarded to the PCI bus no different than an access to 0xe. If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU accesses to 0xa to RAM instead of the PCI bus. That's what two memory maps mean. If you have one cpu in SMM and another outside SMM, then those two maps are active simultaneously. Alternatively, if the SMRAM register is activated, then the i440fx will redirect 0xa to RAM regardless of whether the CPU asserts that signal. That means that even without KVM supporting SMM, this mode can happen. That's a single memory map that is modified under hardware control, it's no different than BARs and such. Things aren't that bad - a ram_addr_t and a physical address are already different things, so we already have one level of translation. Yeah, but ram_addr_t doesn't model anything meaningful IRL. 
It's an internal implementation detail. Does it matter? We can say those are addresses on the memory bus. Since they are not observable anyway, who cares if the correspond with reality or not? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On 11/18/2010 09:57 AM, Avi Kivity wrote: On 11/18/2010 05:46 PM, Anthony Liguori wrote: On 11/18/2010 09:22 AM, Avi Kivity wrote: On 11/18/2010 01:42 AM, Anthony Liguori wrote: Gack. For the benefit of those that want to join the fun without digging up the spec, these magic flippable segments the i440fx can toggle are 12 fixed 16k segments from 0xc to 0xe and a single 64k segment from 0xf to 0xf. There are read-enable and write-enable bits for each, so the chipset can be configured to read from the bios and write to memory (to setup BIOS-RAM caching), and read from memory and write to the bios (to enable BIOS-RAM caching). The other bit combinations are also available. Yup. As Gleb mentions, there's the SDRAM register which controls whether 0xa is mapped to PCI or whether it's mapped to RAM (but KVM explicitly disabled SMM support). KVM not supporting SMM is a bug (albeit one that is likely to remain unresolved for a while). Let's pretend that kvm smm support is not an issue. IIUC, SMM means that there two memory maps when the cpu accesses memory, one for SMM, one for non-SMM. No. That's not what it means. With the i440fx, when the CPU accesses 0xa, it gets forwarded to the PCI bus no different than an access to 0xe. If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU accesses to 0xa to RAM instead of the PCI bus. That's what two memory maps mean. If you have one cpu in SMM and another outside SMM, then those two maps are active simultaneously. I'm not sure if more modern memory controllers do special things here, but for the i440fx, if any CPU asserts SMM mode, then any memory access to that space is going to access SMRAM. Alternatively, if the SMRAM register is activated, then the i440fx will redirect 0xa to RAM regardless of whether the CPU asserts that signal. That means that even without KVM supporting SMM, this mode can happen. That's a single memory map that is modified under hardware control, it's no different than BARs and such. 
There is a single block of RAM. The memory controller may either forward an address unmodified to the RAM block or it may forward the address to the PCI bus[1]. A non CPU access goes through a controller hierarchy and may be modified while it transverses the hierarchy. So really, we should have a big chunk of RAM that we associate with a guest, with a list of intercepts that changes as the devices are modified. Instead of having that list dispatch directly to a device, we should send all intercepted accesses to the memory controller and let the memory controller propagate out the access to the appropriate device. [1] The except is access to the local APIC. That's handled directly by the CPU (or immediately outside of the CPU before the access gets to the memory controller if the local APIC is external to the CPU). Things aren't that bad - a ram_addr_t and a physical address are already different things, so we already have one level of translation. Yeah, but ram_addr_t doesn't model anything meaningful IRL. It's an internal implementation detail. Does it matter? We can say those are addresses on the memory bus. Since they are not observable anyway, who cares if the correspond with reality or not? It matters a lot because the life cycle of RAM is different from the life cycle of ROM. For instance, the original goal was to madvise(MADV_DONTNEED) RAM on reboot. You can't do that to ROM because the contents matter. But for PV devices, we can be loose in how we define the way the devices interact with the rest of the system. For instance, we can say that virtio-pci devices are directly connected to RAM and do not go through the memory controllers. That means we could get stable mappings of the virtio ring. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
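The "one block of RAM plus a list of intercepts, all routed through the memory controller" model proposed above could look roughly like this. It is a toy sketch with invented names (`guest_mem`, `mem_ctrl`, `guest_read8`, `demo_mc_read`), not a proposal for the actual RAM API.

```c
#include <stddef.h>
#include <stdint.h>

/* An address range that must not be served directly from RAM. */
struct intercept {
    uint64_t start;
    uint64_t len;
};

/* The memory controller decides which device finally sees the access. */
struct mem_ctrl {
    uint8_t (*read)(struct mem_ctrl *mc, uint64_t addr);
};

struct guest_mem {
    uint8_t *ram;                       /* the one block of guest RAM */
    uint64_t ram_size;
    const struct intercept *intercepts; /* changes as devices are reconfigured */
    size_t nr_intercepts;
    struct mem_ctrl *mc;
};

/* Placeholder controller for the example: always returns 0xab. */
static uint8_t demo_mc_read(struct mem_ctrl *mc, uint64_t addr)
{
    (void)mc;
    (void)addr;
    return 0xab;
}

/* Intercepted accesses go to the controller, everything else hits RAM. */
static uint8_t guest_read8(struct guest_mem *m, uint64_t addr)
{
    size_t i;

    for (i = 0; i < m->nr_intercepts; i++) {
        const struct intercept *ic = &m->intercepts[i];

        /* unsigned wrap-around makes this false when addr < start */
        if (addr - ic->start < ic->len)
            return m->mc->read(m->mc, addr);
    }
    return addr < m->ram_size ? m->ram[addr] : 0xff;
}
```

The point of the indirection is that the intercept list never dispatches to a device by itself; only the controller knows how the access propagates further down the hierarchy.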
Re: [PATCH 0/2] Minor emulator cleanups
On Wed, Nov 17, 2010 at 01:40:49PM +0200, Avi Kivity wrote:
> A couple of trivial patches that clean up a bit of cruft from the
> emulator.
>
> Avi Kivity (2):
>   KVM: x86 emulator: drop unused #ifndef __KERNEL__
>   KVM: x86 emulator: drop DPRINTF()
>
>  arch/x86/kvm/emulate.c |   14 +-
>  1 files changed, 1 insertions(+), 13 deletions(-)

Applied, thanks.
Re: [PATCH v2 2/6] KVM: MMU: don't drop spte if overwrite it from W to RO
On Thu, Nov 18, 2010 at 03:12:56PM +0800, Xiao Guangrong wrote:
> On 11/17/2010 11:57 PM, Avi Kivity wrote:
>  set_pte:
>          update_spte(sptep, spte);
> +        /*
> +         * If we overwrite a writable spte with a read-only one we
> +         * should flush remote TLBs. Otherwise rmap_write_protect
> +         * will find a read-only spte, even though the writable spte
> +         * might be cached on a CPU's TLB.
> +         */
> +        if (is_writable_pte(entry) && !is_writable_pte(*sptep))
> +                kvm_flush_remote_tlbs(vcpu->kvm);
>
> There is no need to flush on sync_page path since the guest is
> responsible for it.
>
> If we don't, the next rmap_write_protect() will incorrectly decide
> that there's no need to flush tlbs.
>
> Maybe it's not a problem if guest can flush all tlbs after overwrite
> it? Marcelo, what's your comment about this?

It can, but there is no guarantee. Your patch is correct.
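The condition under discussion boils down to "the old value was writable and the new one is not". A standalone sketch of that check follows. The helper names are invented, the writable bit is assumed to be bit 1 as in x86 page table entries, and the archived hunk appears to have lost the operator between the two tests, so `&&` is assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 1 is the R/W bit, as in x86 page table entries. */
#define SKETCH_PT_WRITABLE_MASK (1ull << 1)

static bool sketch_is_writable_pte(uint64_t pte)
{
    return (pte & SKETCH_PT_WRITABLE_MASK) != 0;
}

/*
 * A remote TLB flush is needed when a writable entry is replaced by a
 * read-only one: another CPU may still hold the writable translation.
 */
static bool spte_update_needs_flush(uint64_t old_spte, uint64_t new_spte)
{
    return sketch_is_writable_pte(old_spte) &&
           !sketch_is_writable_pte(new_spte);
}
```

Only the writable to read-only transition triggers the flush; the other three combinations leave no stale writable translation behind.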
Re: [PATCH 0/2] KVM: fix and cleanup: kvm_lock and hardware_disable
On Tue, Nov 16, 2010 at 05:32:44PM +0900, Takuya Yoshikawa wrote:
> Hello!
>
> During investigating kvm's mutual exclusions, starting from checking
> kvm's srcu grace periods, I could not understand some of the locking
> rules.
>
> This one is an example which I doubt. But I'm not so sure.
> Please check!
>
> Thanks,
>   Takuya

Applied, thanks.
Re: [PATCH v2 0/2] Introduce segmented addresses to the emulator
On Wed, Nov 17, 2010 at 03:28:20PM +0200, Avi Kivity wrote:
> Currently we lose segment information associated with memory operands.
> This prevents us from doing proper segment checks. This patchset
> prepares the way by remembering which segment is associated with a
> memory operand.
>
> Avi Kivity (2):
>   KVM: x86 emulator: preserve an operand's segment identity
>     v2: truncate linear address to 32 bits if not in long mode
>         (thanks Gleb)
>   KVM: x86 emulator: do not perform address calculations on linear
>     addresses
>     v2: fix typo
>
>  arch/x86/include/asm/kvm_emulate.h |    5 +-
>  arch/x86/kvm/emulate.c             |  107 +++-
>  2 files changed, 60 insertions(+), 52 deletions(-)

Applied, thanks.
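A sketch of what "remember the segment, linearize late" might mean in practice, including the v2 note about truncating to 32 bits outside long mode. The names (`seg_addr_sketch`, `linearize_sketch`) and the flat table of segment bases are assumptions for illustration, not the emulator's actual interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep the effective address and its segment together until needed. */
struct seg_addr_sketch {
    uint64_t ea;    /* effective address (offset within the segment) */
    unsigned seg;   /* index into the segment base table */
};

/*
 * Turn a segmented address into a linear one. Outside long mode the
 * result wraps at 4 GiB, so it is truncated to 32 bits.
 */
static uint64_t linearize_sketch(const struct seg_addr_sketch *addr,
                                 const uint64_t *seg_base, bool long_mode)
{
    uint64_t la = seg_base[addr->seg] + addr->ea;

    if (!long_mode)
        la &= 0xffffffffu;
    return la;
}
```

Because arithmetic is done on `ea` and the segment base is added only at the end, segment limit checks remain possible right up to the point of access.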
Re: [PATCH]KVM: VMX: Inform user about INTEL_TXT dependency
On Wed, Nov 17, 2010 at 11:40:17AM +0800, Shane Wang wrote:
> Inform user to either disable TXT in the BIOS or do TXT launch with
> tboot before enabling KVM since some BIOSes do not set
> FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled.
>
> Signed-off-by: Shane Wang shane.w...@intel.com
> ---
>  arch/x86/kvm/vmx.c |    5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

Applied, thanks.
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On 11/18/2010 06:09 PM, Anthony Liguori wrote: That's what two memory maps mean. If you have one cpu in SMM and another outside SMM, then those two maps are active simultaneously. I'm not sure if more modern memory controllers do special things here, but for the i440fx, if any CPU asserts SMM mode, then any memory access to that space is going to access SMRAM. How does SMP work then? SMM Space Open (DOPEN). When DOPEN=1 and DLCK=0, SMM space DRAM is made visible even when CPU cycle does not indicate SMM mode access via EXF4#/Ab7# signal. This is intended to help BIOS initialize SMM space. Software should ensure that DOPEN=1 is mutually exclusive with DCLS=1. When DLCK is set to a 1, DOPEN is set to 0 and becomes read only. The words cpu cycle does not indicate SMM mode seem to say that SMM accesses are made on a per-transaction basis, or so my lawyers tell me. Alternatively, if the SMRAM register is activated, then the i440fx will redirect 0xa to RAM regardless of whether the CPU asserts that signal. That means that even without KVM supporting SMM, this mode can happen. That's a single memory map that is modified under hardware control, it's no different than BARs and such. There is a single block of RAM. The memory controller may either forward an address unmodified to the RAM block or it may forward the address to the PCI bus[1]. A non CPU access goes through a controller hierarchy and may be modified while it transverses the hierarchy. So really, we should have a big chunk of RAM that we associate with a guest, with a list of intercepts that changes as the devices are modified. Instead of having that list dispatch directly to a device, we should send all intercepted accesses to the memory controller and let the memory controller propagate out the access to the appropriate device. [1] The except is access to the local APIC. 
That's handled directly by the CPU (or immediately outside of the CPU before the access gets to the memory controller if the local APIC is external to the CPU). Agree. However the point with SMM is that the dispatch is made not only based on the address, but also based on SMM mode (and, unfortunately, can also be different based on read vs write). Things aren't that bad - a ram_addr_t and a physical address are already different things, so we already have one level of translation. Yeah, but ram_addr_t doesn't model anything meaningful IRL. It's an internal implementation detail. Does it matter? We can say those are addresses on the memory bus. Since they are not observable anyway, who cares if the correspond with reality or not? It matters a lot because the life cycle of RAM is different from the life cycle of ROM. For instance, the original goal was to madvise(MADV_DONTNEED) RAM on reboot. You can't do that to ROM because the contents matter. I don't think you can do that to RAM either. But for PV devices, we can be loose in how we define the way the devices interact with the rest of the system. For instance, we can say that virtio-pci devices are directly connected to RAM and do not go through the memory controllers. That means we could get stable mappings of the virtio ring. That wouldn't work once we have an iommu and start to assign them to nested guests. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
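The point that dispatch depends on the SMM state of each individual transaction, not just on the address, can be sketched as follows. This covers only the DOPEN and DLCK bits quoted from the datasheet above and assumes the SMM window is 0xa0000-0xbffff. All names are hypothetical and the remaining SMRAM control bits are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the two SMRAM control bits quoted in the thread are modeled. */
struct smram_reg_sketch {
    bool dopen;   /* SMM space open to non-SMM cycles */
    bool dlck;    /* lock bit; once set, DOPEN no longer applies */
};

/*
 * Returns true if this access reaches SMM DRAM, false if it is
 * forwarded elsewhere (for example to the PCI bus). The decision takes
 * the per-cycle SMM indication as an input, so two CPUs can see
 * different contents at the same address at the same time.
 */
static bool smram_hits_dram(const struct smram_reg_sketch *r, uint32_t addr,
                            bool smm_cycle)
{
    if (addr < 0xa0000 || addr > 0xbffff)
        return false;   /* outside the assumed SMM window */
    return smm_cycle || (r->dopen && !r->dlck);
}
```

A lookup keyed on the address alone cannot express this, which is why the thread talks about two memory maps being active simultaneously.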
Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC
On Thu, Nov 18, 2010 at 06:18:06PM +0200, Avi Kivity wrote:
>> But for PV devices, we can be loose in how we define the way the
>> devices interact with the rest of the system. For instance, we can
>> say that virtio-pci devices are directly connected to RAM and do not
>> go through the memory controllers. That means we could get stable
>> mappings of the virtio ring.
>
> That wouldn't work once we have an iommu and start to assign them to
> nested guests.

Yea. Not sure whether I'm worried about that though. Mixing in all the
problems inherent in nested virt, PV and assigned devices seems
especially masochistic.

> --
> error compiling committee.c: too many arguments to function
Re: [PATCH v2 2/6] KVM: MMU: don't drop spte if overwrite it from W to RO
On 11/18/2010 05:32 PM, Marcelo Tosatti wrote: There is no need to flush on sync_page path since the guest is responsible for it. If we don't, the next rmap_write_protect() will incorrectly decide that there's no need to flush tlbs. Maybe it's not a problem if guest can flush all tlbs after overwrite it? Marcelo, what's your comment about this? It can, but there is no guarantee. Your patch is correct. We keep tripping on the same problem again and again. spte.w (and tlb.pte.w) is multiplexed between guest and host, hence we cannot trust the guest regarding its consistency. I wish we had a systematic way of dealing with this.

-- 
error compiling committee.c: too many arguments to function
[PATCH] kvm: fast-path msi injection with irqfd
Store irq routing table pointer in the irqfd object,
and use that to inject MSI directly without bouncing out to
a kernel thread.

While we touch this structure, rearrange irqfd fields to make fastpath
better packed for better cache utilization.

This also adds some comments about locking rules and rcu usage in code.

Some notes on the design:
- Use pointer into the rt instead of copying an entry,
  to make it possible to use rcu, thus side-stepping
  locking complexities.  We also save some memory this way.
- Old workqueue code is still used for level irqs.
  I don't think we DTRT with level anyway, however,
  it seems easier to keep the code around as
  it has been thought through and debugged, and fix level later than
  rip out and re-instate it later.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---

OK, this seems to work fine for me. Tested with virtio-net in guest
with and without vhost-net. Pls review/apply if appropriate.

 include/linux/kvm_host.h |   16 
 virt/kvm/eventfd.c       |   91 --
 virt/kvm/irq_comm.c      |    7 ++--
 3 files changed, 99 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a055742..4393c1b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/rcupdate.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -206,6 +207,10 @@ struct kvm {
 
 	struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+	/*
+	 * Update side is protected by irq_lock and,
+	 * if configured, irqfds.lock.
+	 */
 	struct kvm_irq_routing_table __rcu *irq_routing;
 	struct hlist_head mask_notifier_list;
 	struct hlist_head irq_ack_notifier_list;
@@ -462,6 +467,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 				   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
+		int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
@@ -603,17 +610,26 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
+void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 #else
 
 static inline void kvm_eventfd_init(struct kvm *kvm) {}
+
 static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 {
 	return -EINVAL;
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+					  struct kvm_irq_routing_table *irq_rt)
+{
+	rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
 	return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..2ca4535 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -44,14 +44,19 @@
  */
 
 struct _irqfd {
-	struct kvm *kvm;
-	struct eventfd_ctx *eventfd;
-	int gsi;
-	struct list_head list;
-	poll_table pt;
-	wait_queue_t wait;
-	struct work_struct inject;
-	struct work_struct shutdown;
+	/* Used for MSI fast-path */
+	struct kvm *kvm;
+	wait_queue_t wait;
+	/* Update side is protected by irqfds.lock */
+	struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
+	/* Used for level IRQ fast-path */
+	int gsi;
+	struct work_struct inject;
+	/* Used for setup/shutdown */
+	struct eventfd_ctx *eventfd;
+	struct list_head list;
+	poll_table pt;
+	struct work_struct shutdown;
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -125,14 +130,22 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
 	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
 	unsigned long flags = (unsigned long)key;
+	struct kvm_kernel_irq_routing_entry *irq;
+	struct kvm *kvm = irqfd->kvm;
 
-	if (flags & POLLIN)
+	if (flags & POLLIN) {
+		rcu_read_lock();
+		irq = rcu_dereference(irqfd->irq_entry);
 		/* An event has been signaled, inject an interrupt */
-		schedule_work(&irqfd->inject);
+		if (irq)
+			kvm_set_msi(irq, kvm,
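To illustrate the fast-path idea from the commit message above, here is a small userspace sketch. It is not kernel code and not part of the patch. It only shows the publish/read pattern: a writer publishes a pointer to a routing entry, and the wakeup path reads it once and takes the direct route when it is non-NULL, otherwise it falls back to the slow path. C11 atomics stand in for rcu_assign_pointer() and rcu_dereference(), and there is no grace-period handling at all, so freeing an old entry safely is out of scope here. All names (struct route, struct irqfd_sketch, irqfd_sketch_init(), publish_route(), signal_irq()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached MSI routing entry. */
struct route {
    int gsi;
    int vector;
};

/* Hypothetical stand-in for the irqfd object. */
struct irqfd_sketch {
    _Atomic(struct route *) entry; /* NULL means "no MSI fast path" */
    int fast_hits;                 /* injections done directly */
    int slow_hits;                 /* injections deferred to the slow path */
};

void irqfd_sketch_init(struct irqfd_sketch *fd)
{
    atomic_init(&fd->entry, NULL);
    fd->fast_hits = 0;
    fd->slow_hits = 0;
}

/*
 * Update side: publish a new entry, or NULL to disable the fast path.
 * The release store makes the entry's contents visible before the pointer.
 */
void publish_route(struct irqfd_sketch *fd, struct route *r)
{
    atomic_store_explicit(&fd->entry, r, memory_order_release);
}

/*
 * Read side: load the pointer once and use only that snapshot.
 * Returns the vector that was "injected" directly, or -1 when the
 * caller would have to defer to the slow path instead.
 */
int signal_irq(struct irqfd_sketch *fd)
{
    struct route *r = atomic_load_explicit(&fd->entry, memory_order_acquire);

    if (r) {
        fd->fast_hits++;
        return r->vector;
    }
    fd->slow_hits++;
    return -1;
}
```

A caller would run irqfd_sketch_init() once, call publish_route() whenever the routing changes, and call signal_irq() from the event path. In the kernel the read side would additionally sit inside rcu_read_lock()/rcu_read_unlock(), and the updater would wait for a grace period before freeing the old table.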
Re: HAL type for Win2003 Server on recent KVM versions?
2010/11/18 Cole Robinson <crobi...@redhat.com>:
> On 11/18/2010 09:05 AM, Kenni Lund wrote:
>> I'm curious why the RHEL 6 documentation claims that you actively need
>> to select the Standard PC HAL on installation, if it's not even the
>> recommended/preferred HAL...(?):
>> "Windows 2003 requires a specific computer type in order to install
>> properly on a fully-virtualized guest. This needs to be specified at
>> the beginning of the installation process."[1]
>>
>> [1] http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization_Windows2003.html
>
> I'm pretty sure that was incorrectly copied over from the RHEL5 xen
> documentation. The docs people have been informed so it should be fixed
> soon-ish.

Perfect, thanks! :)

Best regards
Kenni
[PATCH v3 0/2] Minimal RAM API support
v3:

 - Address review comments
 - pc registers all memory below 4G in one chunk

Let me know if there are any further issues.

Thanks,
Alex

v2:

 - Move to Makefile.objs
 - Move structures to memory.c and create a callback function
 - Fix memory leak

I haven't moved to the state parameter because there should only be
a single instance of this per VM.  The state parameter seems like
it would add complications in setup and function calling, but maybe
point me to an example if I'm off base.

Thanks,
Alex

v1:

For VFIO based device assignment, we need to know what guest memory
areas are actual RAM.  RAMBlocks have long since become a grab bag
of misc allocations, so aren't effective for this.  Anthony has had
a RAM API in mind for a while now that addresses this problem.  This
implements just enough of it so that we have an interface to get
actual guest memory physical addresses to set up the host IOMMU.  We
can continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on
your initial implementation.  I had to add something since the file
in your branch just copies a header with Fabrice's copyright.

Thanks,
Alex
---

Alex Williamson (2):
      RAM API: Make use of it for x86 PC
      Minimal RAM API support

 Makefile.objs |    1 +
 cpu-common.h  |    2 +
 hw/pc.c       |    9 ++---
 memory.c      |   97 +
 memory.h      |   44 ++
 5 files changed, 147 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h
[PATCH v3 1/2] Minimal RAM API support
This adds a minimum chunk of Anthony's RAM API support so that we
can identify actual VM RAM versus all the other things that make
use of qemu_ram_alloc.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 Makefile.objs |    1 +
 cpu-common.h  |    2 +
 memory.c      |   97 +
 memory.h      |   44 ++
 4 files changed, 144 insertions(+), 0 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h

diff --git a/Makefile.objs b/Makefile.objs
index f07fb01..33fae0b 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -154,6 +154,7 @@ hw-obj-y += vl.o loader.o
 hw-obj-y += virtio.o virtio-console.o
 hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o
 hw-obj-y += watchdog.o
+hw-obj-y += memory.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
 hw-obj-$(CONFIG_NAND) += nand.o
diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..6aa2738 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -23,6 +23,8 @@
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;
 
+#include "memory.h"
+
 /* memory API */
 
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
diff --git a/memory.c b/memory.c
new file mode 100644
index 000..742776f
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,97 @@
+/*
+ * RAM API
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson <alex.william...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#include "memory.h"
+#include "range.h"
+
+typedef struct ram_slot {
+    target_phys_addr_t start_addr;
+    ram_addr_t size;
+    ram_addr_t offset;
+    QLIST_ENTRY(ram_slot) next;
+} ram_slot;
+
+static QLIST_HEAD(ram_slots, ram_slot) ram_slots =
+    QLIST_HEAD_INITIALIZER(ram_slots);
+
+static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+                                    ram_addr_t size)
+{
+    ram_slot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots, next) {
+        if (slot->start_addr == start_addr && slot->size == size) {
+            return slot;
+        }
+
+        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+            hw_error("Ram range overlaps existing slot\n");
+        }
+    }
+
+    return NULL;
+}
+
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                      ram_addr_t phys_offset)
+{
+    ram_slot *slot;
+
+    if (!size) {
+        return -EINVAL;
+    }
+
+    assert(!qemu_ram_find_slot(start_addr, size));
+
+    slot = qemu_mallocz(sizeof(ram_slot));
+
+    slot->start_addr = start_addr;
+    slot->size = size;
+    slot->offset = phys_offset;
+
+    QLIST_INSERT_HEAD(&ram_slots, slot, next);
+
+    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+
+    return 0;
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+    ram_slot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    slot = qemu_ram_find_slot(start_addr, size);
+    assert(slot != NULL);
+
+    QLIST_REMOVE(slot, next);
+    qemu_free(slot);
+    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+    return;
+}
+
+int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn)
+{
+    ram_slot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots, next) {
+        int ret = fn(opaque, slot->start_addr, slot->size, slot->offset);
+        if (ret) {
+            return ret;
+        }
+    }
+    return 0;
+}
diff --git a/memory.h b/memory.h
new file mode 100644
index 000..e7aa5cb
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,44 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+/*
+ * RAM API
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson <alex.william...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef int (*qemu_ram_for_each_slot_fn)(void *opaque,
+                                         target_phys_addr_t start_addr,
+                                         ram_addr_t size,
+                                         ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_register() : Register a region of guest physical memory
+ *
+ * The new region must not overlap an existing region.
+ */
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                      ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_unregister() : Unregister a region of guest physical memory
+ */
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size);
+
+/**
+ * qemu_ram_for_each_slot() : Call fn() on each registered region
+ *
+ * Stop on non-zero return from fn().
+ */
+int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn);
+
+#endif /* QEMU_MEMORY_H */
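As a rough illustration of the interface in the patch above, here is a standalone sketch, not QEMU code, of a slot registry with the same shape: register a region, refuse overlaps, unregister by exact match, and walk the regions with a callback that can stop the walk. The names slot_register(), slot_unregister(), slot_for_each() and sum_sizes() are made up for this example, plain uint64_t stands in for target_phys_addr_t and ram_addr_t, and a singly linked list replaces the QLIST macros. One deliberate difference: an overlap returns -1 here, whereas the patch calls hw_error().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t phys_addr_t; /* stand-in for target_phys_addr_t */
typedef uint64_t ram_off_t;   /* stand-in for ram_addr_t */

struct slot {
    phys_addr_t start;
    ram_off_t size;
    ram_off_t offset;
    struct slot *next;
};

static struct slot *slots; /* head of the registry, newest first */

typedef int (*slot_fn)(void *opaque, phys_addr_t start,
                       ram_off_t size, ram_off_t offset);

/*
 * Two ranges overlap when each one starts at or before the other's last
 * byte. Assumes non-zero lengths and ranges that do not wrap around.
 */
static int overlap(phys_addr_t a, ram_off_t alen,
                   phys_addr_t b, ram_off_t blen)
{
    return a <= b + (blen - 1) && b <= a + (alen - 1);
}

/* Register a region; fails on zero size, overlap, or allocation failure. */
int slot_register(phys_addr_t start, ram_off_t size, ram_off_t offset)
{
    struct slot *s;

    if (!size) {
        return -1;
    }
    for (s = slots; s; s = s->next) {
        if (overlap(start, size, s->start, s->size)) {
            return -1;
        }
    }
    s = calloc(1, sizeof(*s));
    if (!s) {
        return -1;
    }
    s->start = start;
    s->size = size;
    s->offset = offset;
    s->next = slots;
    slots = s;
    return 0;
}

/* Unregister the region that matches start and size exactly. */
int slot_unregister(phys_addr_t start, ram_off_t size)
{
    struct slot **pp;

    for (pp = &slots; *pp; pp = &(*pp)->next) {
        if ((*pp)->start == start && (*pp)->size == size) {
            struct slot *s = *pp;
            *pp = s->next;
            free(s);
            return 0;
        }
    }
    return -1;
}

/* Call fn() on each region; stop on the first non-zero return value. */
int slot_for_each(void *opaque, slot_fn fn)
{
    struct slot *s;

    for (s = slots; s; s = s->next) {
        int ret = fn(opaque, s->start, s->size, s->offset);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: add each region's size to a uint64_t total. */
int sum_sizes(void *opaque, phys_addr_t start, ram_off_t size,
              ram_off_t offset)
{
    (void)start;
    (void)offset;
    *(uint64_t *)opaque += size;
    return 0;
}
```

A consumer such as the IOMMU setup mentioned in the cover letter would pass its own callback in place of sum_sizes() and map each (start, size, offset) triple as it is visited.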
[PATCH v3 2/2] RAM API: Make use of it for x86 PC
Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---

 hw/pc.c |    9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..fb7ee21 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,11 @@ void pc_memory_init(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(NULL, "pc.ram",
                               below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa0000, ram_addr);
-    cpu_register_physical_memory(0x100000,
-                                 below_4g_mem_size - 0x100000,
-                                 ram_addr + 0x100000);
+    qemu_ram_register(0, below_4g_mem_size, ram_addr);
 #if TARGET_PHYS_ADDR_BITS > 32
     if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
+        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
+                          ram_addr + below_4g_mem_size);
     }
 #endif
 
ppc32 build failed
Hi,

I searched the archive and found some discussions about this. Is it not fixed yet?
Could someone tell me whether KVM on G4 is available now?

The powerpc G4 build failed (host kernel 2.6.37-rc2):

  CC    ppc-softmmu/kvm.o
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_init_vcpu':
/home/dave/qemu-kvm/target-ppc/kvm.c:81: error: 'struct kvm_sregs' has no member named 'pvr'
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_get_registers':
/home/dave/qemu-kvm/target-ppc/kvm.c:168: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:180: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:185: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:186: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:187: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:188: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c: At top level:
/home/dave/qemu-kvm/target-ppc/kvm.c:261: error: conflicting types for 'kvm_arch_process_irqchip_events'
/home/dave/qemu-kvm/qemu-kvm.h:692: error: previous declaration of 'kvm_arch_process_irqchip_events' was here
make[1]: *** [kvm.o] Error 1
make: *** [subdir-ppc-softmmu] Error 2

-- 
To adhere means to yield. To yield means to adhere.
Re: ppc32 build failed
(2010/11/19 15:01), Yang Rui Rui wrote:
> Hi,
>
> I searched the archive and found some discussions about this. Is it not fixed yet?
> Could someone tell me whether KVM on G4 is available now?

Hi, (added kvm-ppc to Cc)

I'm using a G4 (Mac mini box) to run KVM, though I have not tried 2.6.37-rc2 yet.

Aren't you using upstream qemu? IIRC, ppc kvm needs to use upstream qemu.

You can find useful information on the KVM PowerPC port page:
  http://www.linux-kvm.org/page/PowerPC
There is no G4 example there, but it has enough information.

BTW, why is the "Book3S PPC32" entry in the processor support table still "NO", anyone?
  http://www.linux-kvm.org/page/Processor_support

I don't know PPC well, so you will need to ask Alex about the more technical issues.

  Takuya

> The powerpc G4 build failed (host kernel 2.6.37-rc2):
> [...]
Re: ppc32 build failed
On 11/19/2010 02:24 PM, Takuya Yoshikawa wrote:
> Hi, (added kvm-ppc to Cc)
>
> I'm using a G4 (Mac mini box) to run KVM, though I have not tried 2.6.37-rc2 yet.
>
> Aren't you using upstream qemu? IIRC, ppc kvm needs to use upstream qemu.

I use the qemu-kvm git version. Do you mean qemu instead of qemu-kvm?

> You can find useful information on the KVM PowerPC port page:
>   http://www.linux-kvm.org/page/PowerPC
> There is no G4 example there, but it has enough information.
>
> BTW, why is the "Book3S PPC32" entry in the processor support table still "NO", anyone?
>   http://www.linux-kvm.org/page/Processor_support

Thanks a lot

> I don't know PPC well, so you will need to ask Alex about the more technical issues.
>
>   Takuya
> [...]

-- 
To adhere means to yield. To yield means to adhere.
Re: ppc32 build failed
On 11/19/2010 02:37 PM, Yang Ruirui R wrote:
> On 11/19/2010 02:24 PM, Takuya Yoshikawa wrote:
>> Aren't you using upstream qemu? IIRC, ppc kvm needs to use upstream qemu.
>
> I use the qemu-kvm git version. Do you mean qemu instead of qemu-kvm?

Hi, the qemu 0.13.0 build passed.

> [...]

-- 
To adhere means to yield. To yield means to adhere.
Re: ppc32 build failed
>>> Aren't you using upstream qemu? IIRC, ppc kvm needs to use upstream qemu.
>>
>> I use the qemu-kvm git version. Do you mean qemu instead of qemu-kvm?
>
> Hi, the qemu 0.13.0 build passed.

Yes, that's what I meant!

  Takuya