Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-18 Thread Jes Sorensen
On 11/18/10 01:41, Hidetoshi Seto wrote:
> This patch introduce a fallback mechanism for old systems that do not
> support utimensat().  This fix build failure with following warnings:
>
> hw/virtio-9p-local.c: In function 'local_utimensat':
> hw/virtio-9p-local.c:479: warning: implicit declaration of function 'utimensat'
> hw/virtio-9p-local.c:479: warning: nested extern declaration of 'utimensat'
>
> and:
>
> hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
> hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this function)
> hw/virtio-9p.c:1410: error: (Each undeclared identifier is reported only once
> hw/virtio-9p.c:1410: error: for each function it appears in.)
> hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this function)
> hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
> hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this function)
>
> v4:
>   - Use tv_now.tv_usec
>   - Rebased on latest qemu.git
> v3:
>   - Use better alternative handling for UTIME_NOW/OMIT
>   - Move qemu_utimensat() to cutils.c
> V2:
>   - Introduce qemu_utimensat()
>
> Acked-by: Chris Wright chr...@sous-sol.org
> Acked-by: M. Mohan Kumar mo...@in.ibm.com
> Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com

Hi Hidetoshi,

I think the idea of the patch is good, but please move qemu_utimensat()
to oslib-posix.c and provide a wrapper for oslib-win32.c. It is
emulation for a system library function, so it doesn't belong in
cutils.c, but rather in the oslib group.

Thanks,
Jes
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: clear vapic after reset

2010-11-18 Thread Avi Kivity
Clear the vapic address immediately after reset.  This allows dual-boot guests
to work efficiently, and more importantly, works around the bios using
'rep insb' to read in the option rom and confusing the vapic machinery.

Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/kvm.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index b7b2430..95e5d02 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -439,8 +439,20 @@ int kvm_arch_init_vcpu(CPUState *env)
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
+static void kvm_clear_vapic(CPUState *env)
+{
+#ifdef KVM_SET_VAPIC_ADDR
+    struct kvm_vapic_addr va = {
+        .vapic_addr = 0,
+    };
+
+    kvm_vcpu_ioctl(env, KVM_SET_VAPIC_ADDR, &va);
+#endif
+}
+
 void kvm_arch_reset_vcpu(CPUState *env)
 {
+    kvm_clear_vapic(env);
     env->exception_injected = -1;
     env->interrupt_injected = -1;
     env->nmi_injected = 0;
-- 
1.7.3.1



Re: [PATCH 1/2] KVM: take kvm_lock for hardware_disable() during cpu hotplug

2010-11-18 Thread Takuya Yoshikawa

(2010/11/18 14:45), Zachary Amsden wrote:
>>> No, I believe your patch is correct and the lock should be there. Did you
>>> test with spinlock debugging just to be sure?
>>
>> Sorry but no.
>>
>> I have no experience with cpu hotplug.
>>
>> So I thought it would take too much time to do real test by myself and
>> reported like this this time.
>>
>> Any easy way to test?
>
> Yes, quite easy. Some systems may not let cpu0 go offline, but you can
> manually disable and re-enable the other processors:
>
> [r...@mysore ~]# echo 0 > /sys/devices/system/cpu/cpu1/online
> [r...@mysore ~]# echo 1 > /sys/devices/system/cpu/cpu1/online
>
> Cheers,
>
> Zach



Thanks a lot!

I tried and got a log like this:

kernel: [  422.084620] kvm: disabling virtualization on CPU1
kernel: [  422.085757] CPU 1 is now offline
kernel: [  422.085766] lockdep: fixing up alternatives.
kernel: [  422.085780] SMP alternatives: switching to UP code
kernel: [  472.081069] lockdep: fixing up alternatives.
kernel: [  472.081080] SMP alternatives: switching to SMP code
kernel: [  472.099182] Booting Node 0 Processor 1 APIC 0x1
kernel: [  422.104799] kvm: enabling virtualization on CPU1

Working correctly, I think.


  Takuya


Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-18 Thread Philipp Hahn
Hello,

On Thursday, 18 November 2010 01:41:39, Hidetoshi Seto wrote:
> This patch introduce a fallback mechanism for old systems that do not
> support utimensat().  This fix build failure with following warnings:
>
> +#ifdef CONFIG_UTIMENSAT
> +    return utimensat(dirfd, path, times, flags);
> +#else
> +    /* Fallback: use utimes() instead of utimensat() */

Since we also had a problem with utimensat() in Samba some time ago
http://lists.samba.org/archive/samba-technical/2010-November/074613.html
I'd like to comment on that:

You have to be careful about compile-time detection versus run-time
detection: if you later run your utimensat()-enabled binary on an older
kernel that does not support that syscall, you get -1 as the return value and
errno=ENOSYS. So even if you detected utimensat() at compile time, please
always provide a fallback for run time.
This is less important for people compiling their own version of kvm, but it
is essential for Linux distributions, since people often run newer kvm
versions on older kernels.

Sincerely
Philipp Hahn
-- 
Philipp Hahn           Open Source Software Engineer      h...@univention.de
Univention GmbH        Linux for Your Business        fon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen                   fax: +49 421 22 232-99
http://www.univention.de




Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-18 Thread Hidetoshi Seto
(2010/11/18 17:02), Jes Sorensen wrote:
> On 11/18/10 01:41, Hidetoshi Seto wrote:
>> This patch introduce a fallback mechanism for old systems that do not
>> support utimensat().  This fix build failure with following warnings:
>>
>> hw/virtio-9p-local.c: In function 'local_utimensat':
>> hw/virtio-9p-local.c:479: warning: implicit declaration of function 'utimensat'
>> hw/virtio-9p-local.c:479: warning: nested extern declaration of 'utimensat'
>>
>> and:
>>
>> hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod':
>> hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this function)
>> hw/virtio-9p.c:1410: error: (Each undeclared identifier is reported only once
>> hw/virtio-9p.c:1410: error: for each function it appears in.)
>> hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this function)
>> hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod':
>> hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this function)
>>
>> v4:
>>   - Use tv_now.tv_usec
>>   - Rebased on latest qemu.git
>> v3:
>>   - Use better alternative handling for UTIME_NOW/OMIT
>>   - Move qemu_utimensat() to cutils.c
>> V2:
>>   - Introduce qemu_utimensat()
>>
>> Acked-by: Chris Wright chr...@sous-sol.org
>> Acked-by: M. Mohan Kumar mo...@in.ibm.com
>> Signed-off-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
>
> Hi Hidetoshi,
>
> I think the idea of the patch is good, but please move qemu_utimensat()
> to oslib-posix.c and provide a wrapper for oslib-win32.c. It is
> emulation for a system library function, so it doesn't belong in
> cutils.c, but rather in the oslib group.

Unfortunately, I'm not familiar with win32 code, so I have no idea what
the wrapper for win32 should look like...
If someone could kindly tell me about the win32 part, I could update this
patch to v5, but even then I have no test environment for the new part :-<

Could we leave that for an incremental patch on top of this v4?
Can somebody help me?  Volunteers?


Thanks,
H.Seto




Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
 Store irq routing table pointer in the irqfd object,
 and use that to inject MSI directly without bouncing out to
 a kernel thread.
 
 While we touch this structure, rearrange irqfd fields to make fastpath
 better packed for better cache utilization.
 
 Some notes on the design:
 - Use pointer into the rt instead of copying an entry,
   to make it possible to use rcu, thus side-stepping
   locking complexities.  We also save some memory this way.
What locking complexity is there with the approach of copying the entry?

 - Old workqueue code is still used for level irqs.
   I don't think we DTRT with level anyway, however,
   it seems easier to keep the code around as
   it has been thought through and debugged, and fix level later than
   rip out and re-instate it later.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 The below is compile tested only.  Sending out for early
 flames/feedback.  Please review!
 
  include/linux/kvm_host.h |4 ++
  virt/kvm/eventfd.c   |   81 +++--
  virt/kvm/irq_comm.c  |6 ++-
  3 files changed, 78 insertions(+), 13 deletions(-)
 
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index a055742..b6f7047 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
 *ioapic,
  unsigned long *deliver_bitmask);
  #endif
  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
 *kvm,
 + int irq_source_id, int level);
  void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
  void kvm_register_irq_ack_notifier(struct kvm *kvm,
  struct kvm_irq_ack_notifier *kian);
 @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) 
 {}
  void kvm_eventfd_init(struct kvm *kvm);
  int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
  void kvm_irqfd_release(struct kvm *kvm);
 +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt);
  int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
  
  #else
 @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int 
 gsi, int flags)
  }
  
  static inline void kvm_irqfd_release(struct kvm *kvm) {}
 +static inline void kvm_irqfd_update(struct kvm *kvm) {}
  static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
  {
   return -ENOSYS;
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index c1f1e3c..49c1864 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -44,14 +44,18 @@
   */
  
  struct _irqfd {
 - struct kvm *kvm;
 - struct eventfd_ctx *eventfd;
 - int gsi;
 - struct list_head list;
 - poll_table pt;
 - wait_queue_t wait;
 - struct work_struct inject;
 - struct work_struct shutdown;
 + /* Used for MSI fast-path */
 + struct kvm *kvm;
 + wait_queue_t wait;
 + struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
 + /* Used for level IRQ fast-path */
 + int gsi;
 + struct work_struct inject;
 + /* Used for setup/shutdown */
 + struct eventfd_ctx *eventfd;
 + struct list_head list;
 + poll_table pt;
 + struct work_struct shutdown;
  };
  
  static struct workqueue_struct *irqfd_cleanup_wq;
 @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int 
 sync, void *key)
  {
   struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
   unsigned long flags = (unsigned long)key;
 + struct kvm_kernel_irq_routing_entry *irq;
  
 - if (flags & POLLIN)
 + if (flags & POLLIN) {
 + rcu_read_lock();
 + irq = irqfd->irq_entry;
Why not rcu_dereference()? And why can't it be zero here?

   /* An event has been signaled, inject an interrupt */
 - schedule_work(&irqfd->inject);
 + if (irq)
 + kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
 + else
 + schedule_work(&irqfd->inject);
 + rcu_read_unlock();
 + }
  
   if (flags & POLLHUP) {
   /* The eventfd is closing, detach from KVM */
 @@ -166,6 +178,7 @@ irqfd_ptable_queue_proc(struct file *file, 
 wait_queue_head_t *wqh,
  static int
  kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
  {
 + struct kvm_irq_routing_table *irq_rt;
   struct _irqfd *irqfd, *tmp;
   struct file *file = NULL;
   struct eventfd_ctx *eventfd = NULL;
 @@ -215,6 +228,10 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
   goto fail;
   }
  
 + rcu_read_lock();
 + irqfd_update(kvm, irqfd, 

Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-18 Thread Hidetoshi Seto
(2010/11/18 17:28), Philipp Hahn wrote:
> Hello,
>
> On Thursday, 18 November 2010 01:41:39, Hidetoshi Seto wrote:
>> This patch introduce a fallback mechanism for old systems that do not
>> support utimensat().  This fix build failure with following warnings:
>>
>> +#ifdef CONFIG_UTIMENSAT
>> +    return utimensat(dirfd, path, times, flags);
>> +#else
>> +    /* Fallback: use utimes() instead of utimensat() */
>
> Since we also had a problem with utimensat() in Samba some time ago
> http://lists.samba.org/archive/samba-technical/2010-November/074613.html
> I'd like to comment on that:
>
> You have to be careful about compile-time detection versus run-time
> detection: if you later run your utimensat()-enabled binary on an older
> kernel that does not support that syscall, you get -1 as the return value and
> errno=ENOSYS. So even if you detected utimensat() at compile time, please
> always provide a fallback for run time.
> This is less important for people compiling their own version of kvm, but it
> is essential for Linux distributions, since people often run newer kvm
> versions on older kernels.

Hum, you have a good point.

Well, then I'll change it like:

-#ifdef CONFIG_UTIMENSAT
-    return utimensat(dirfd, path, times, flags);
-#else
+    {
+        int ret = utimensat(dirfd, path, times, flags);
+        if (ret != -1 || errno != ENOSYS) {
+            return ret;
+        }
+    }
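
For illustration only, here is a minimal self-contained sketch of the
run-time fallback idea. It is not the patch itself: the helper name
utimensat_compat() is hypothetical, it assumes utimensat() is at least
declared at build time, and the fallback covers only the simple case
(AT_FDCWD, no flags, explicit timestamps or NULL) rather than the
UTIME_NOW/UTIME_OMIT handling mentioned in the changelog.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD, open() */
#include <stddef.h>
#include <sys/stat.h>   /* utimensat(), UTIME_NOW, UTIME_OMIT, stat() */
#include <sys/time.h>   /* utimes(), struct timeval */

/*
 * Hypothetical helper (the name is an assumption, not QEMU's API).
 * Try utimensat() first; only if the running kernel reports ENOSYS,
 * fall back to utimes() with microsecond resolution.
 */
static int utimensat_compat(int dirfd, const char *path,
                            const struct timespec times[2], int flags)
{
    struct timeval tv[2];
    int i, ret;

    assert(path != NULL);

    ret = utimensat(dirfd, path, times, flags);
    if (ret != -1 || errno != ENOSYS) {
        return ret;             /* success, or a real error to report */
    }

    /* Simplification: the fallback cannot emulate dirfd or flags. */
    if (dirfd != AT_FDCWD || flags != 0) {
        errno = ENOSYS;
        return -1;
    }
    if (times == NULL) {
        return utimes(path, NULL);      /* both timestamps set to "now" */
    }
    for (i = 0; i < 2; i++) {
        /* Simplification: special values are not emulated here. */
        if (times[i].tv_nsec == UTIME_NOW || times[i].tv_nsec == UTIME_OMIT) {
            errno = ENOSYS;
            return -1;
        }
        tv[i].tv_sec = times[i].tv_sec;
        tv[i].tv_usec = times[i].tv_nsec / 1000;
    }
    return utimes(path, tv);
}
```

A caller would use it exactly like utimensat(); on a kernel that has the
syscall the fallback branch is never reached.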


Thanks,
H.Seto



Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-18 Thread Jes Sorensen
On 11/18/10 09:48, Hidetoshi Seto wrote:
> (2010/11/18 17:02), Jes Sorensen wrote:
>> Hi Hidetoshi,
>>
>> I think the idea of the patch is good, but please move qemu_utimensat()
>> to oslib-posix.c and provide a wrapper for oslib-win32.c. It is
>> emulation for a system library function, so it doesn't belong in
>> cutils.c, but rather in the oslib group.
>
> Unfortunately one fact is that I'm not familiar with win32 codes so I don't
> have any idea how the wrapper for win32 will be...
> If someone could kindly tell me about the win32 part, I could update this
> patch to v5, but even though I have no test environment for the new part :-<
>
> Could we wait an incremental patch on this v4?
> Can somebody help me?  Volunteers?

Hi Hidetoshi,

I don't actually know much about win32 myself, the only thing I do is to
try and cross-compile for it using mingw32 to make sure the build
doesn't break. One option is to leave it open, or put in a dummy wrapper
which asserts in the win32 part of the code, so that someone who is
interested in win32 can fix it up.

That should be pretty easy to do, and I think that's a fine starting point.

Cheers,
Jes



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 11:05:22AM +0200, Gleb Natapov wrote:
 On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
  Store irq routing table pointer in the irqfd object,
  and use that to inject MSI directly without bouncing out to
  a kernel thread.
  
  While we touch this structure, rearrange irqfd fields to make fastpath
  better packed for better cache utilization.
  
  Some notes on the design:
  - Use pointer into the rt instead of copying an entry,
to make it possible to use rcu, thus side-stepping
locking complexities.  We also save some memory this way.
 What locking complexity is there with copying entry approach?

Without RCU, we need two locks:
- irqfd lock to scan the list of irqfds
- eventfd wqh lock in the irqfd to update the entry
To update all irqfds on the list, the wqh lock would be nested within the
irqfd lock:
	lock(kvm->irqfds.lock)
	list_for_each(irqfd, kvm->irqfds.list)
		lock(irqfd->wqh)
		update(irqfd)
		unlock(irqfd->wqh)
	unlock(kvm->irqfds.lock)
The problem is that the irqfd lock is nested within the wqh lock on the
cleanup (POLLHUP) path, i.e. in the opposite order.

With RCU we just assign and let sync take care of flushing old entries out.
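
As a rough userspace analogy of the "just assign the pointer" idea (not
kernel code), the sketch below publishes a routing entry with C11 atomics.
All struct and function names are hypothetical. In the kernel the
corresponding primitives would be rcu_assign_pointer(), rcu_dereference()
and synchronize_rcu(); this sketch has no grace period, so the caller must
keep the old entry alive until no reader can still be using it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing table entry. */
struct route_entry {
    int gsi;
    unsigned int msi_data;
};

/* Hypothetical stand-in for the irqfd object holding the cached pointer. */
struct irqfd_like {
    _Atomic(struct route_entry *) entry;
};

/*
 * Updater side: swap in the new entry with a single atomic exchange.
 * No per-irqfd lock is taken, so there is no lock nesting to get wrong.
 * Returns the previous entry so the caller can retire it later.
 */
static struct route_entry *irqfd_like_assign(struct irqfd_like *fd,
                                             struct route_entry *new_entry)
{
    return atomic_exchange_explicit(&fd->entry, new_entry,
                                    memory_order_acq_rel);
}

/*
 * Reader side (the fast path): load the pointer once and use that snapshot.
 * Returns the MSI data on the fast path, or -1 when no entry is cached and
 * the slow path (the workqueue in the patch) would be needed.
 */
static int irqfd_like_signal(struct irqfd_like *fd)
{
    struct route_entry *e =
        atomic_load_explicit(&fd->entry, memory_order_acquire);

    if (e == NULL) {
        return -1;
    }
    return (int)e->msi_data;
}
```

The point of the design is that the reader never blocks and the updater
never needs the wqh lock; the cost is that old entries must be retired only
after all readers are done.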

  - Old workqueue code is still used for level irqs.
I don't think we DTRT with level anyway, however,
it seems easier to keep the code around as
it has been thought through and debugged, and fix level later than
rip out and re-instate it later.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  The below is compile tested only.  Sending out for early
  flames/feedback.  Please review!
  
   include/linux/kvm_host.h |4 ++
   virt/kvm/eventfd.c   |   81 
  +++--
   virt/kvm/irq_comm.c  |6 ++-
   3 files changed, 78 insertions(+), 13 deletions(-)
  
  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index a055742..b6f7047 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
  *ioapic,
 unsigned long *deliver_bitmask);
   #endif
   int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
  +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
  *kvm,
  +   int irq_source_id, int level);
   void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
   void kvm_register_irq_ack_notifier(struct kvm *kvm,
 struct kvm_irq_ack_notifier *kian);
  @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm 
  *kvm) {}
   void kvm_eventfd_init(struct kvm *kvm);
   int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
   void kvm_irqfd_release(struct kvm *kvm);
  +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table 
  *irq_rt);
   int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
   
   #else
  @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, 
  int gsi, int flags)
   }
   
   static inline void kvm_irqfd_release(struct kvm *kvm) {}
  +static inline void kvm_irqfd_update(struct kvm *kvm) {}
   static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
  *args)
   {
  return -ENOSYS;
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index c1f1e3c..49c1864 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -44,14 +44,18 @@
*/
   
   struct _irqfd {
  -   struct kvm *kvm;
  -   struct eventfd_ctx *eventfd;
  -   int gsi;
  -   struct list_head list;
  -   poll_table pt;
  -   wait_queue_t wait;
  -   struct work_struct inject;
  -   struct work_struct shutdown;
  +   /* Used for MSI fast-path */
  +   struct kvm *kvm;
  +   wait_queue_t wait;
  +   struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
  +   /* Used for level IRQ fast-path */
  +   int gsi;
  +   struct work_struct inject;
  +   /* Used for setup/shutdown */
  +   struct eventfd_ctx *eventfd;
  +   struct list_head list;
  +   poll_table pt;
  +   struct work_struct shutdown;
   };
   
   static struct workqueue_struct *irqfd_cleanup_wq;
  @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int 
  sync, void *key)
   {
  struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
  unsigned long flags = (unsigned long)key;
  +   struct kvm_kernel_irq_routing_entry *irq;
   
  -   if (flags & POLLIN)
  +   if (flags & POLLIN) {
  +   rcu_read_lock();
  +   irq = irqfd->irq_entry;
 Why not rcu_dereference()? And why it can't be zero here?
 
  /* An event has been signaled, inject an interrupt */
  -   schedule_work(&irqfd->inject);
  +   if (irq)
  +   kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
  +   else
  +  

Re: 2.6.37-rc2 after KVM shutdown - unregister_netdevice: waiting for vmtst01eth0 to become free. Usage count = 1

2010-11-18 Thread Eric Dumazet
On Thursday, 18 November 2010 at 07:28 +0100, Nikola Ciprich wrote:
  Yep, this is a known problem, thanks !
  
  fix is there : 
  
  http://patchwork.ozlabs.org/patch/71354/
 Thanks Eric, this indeed fixes the problem..
 I noticed the fix didn't make it to 2.6.37-rc2-git3 though,
 maybe it just got omited?
 anyways, thanks for help!
 n.

It's in David Miller's net-2.6 tree (all pending network patches for the
current linux-2.6 version), so it'll be included the next time David pushes
his tree to Linus, don't worry ;)





Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 11:16:02AM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 11:05:22AM +0200, Gleb Natapov wrote:
  On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
   Store irq routing table pointer in the irqfd object,
   and use that to inject MSI directly without bouncing out to
   a kernel thread.
   
   While we touch this structure, rearrange irqfd fields to make fastpath
   better packed for better cache utilization.
   
   Some notes on the design:
   - Use pointer into the rt instead of copying an entry,
 to make it possible to use rcu, thus side-stepping
 locking complexities.  We also save some memory this way.
  What locking complexity is there with copying entry approach?
 
 Without RCU, we need two locks:
   - irqfd lock to scan the list of irqfds
   - eventfd wqh lock in the irqfd to update the entry
 To update all irqfds on list, wqh lock would be nested within irqfd lock.
   lock(kvm->irqfds.lock)
   list_for_each(irqfd, kvm->irqfds.list)
   lock(irqfd->wqh)
   update(irqfd)
   unlock(irqfd->wqh)
   unlock(kvm->irqfds.lock)
 Problem is, irqfd is nested within wqh for cleanup (POLLHUP) path.
 
 With RCU we do assign and let sync take care of flushing old entries out.
 
Makes sense. What about the other comments? :)

   - Old workqueue code is still used for level irqs.
 I don't think we DTRT with level anyway, however,
 it seems easier to keep the code around as
 it has been thought through and debugged, and fix level later than
 rip out and re-instate it later.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
   
   The below is compile tested only.  Sending out for early
   flames/feedback.  Please review!
   
include/linux/kvm_host.h |4 ++
virt/kvm/eventfd.c   |   81 
   +++--
virt/kvm/irq_comm.c  |6 ++-
3 files changed, 78 insertions(+), 13 deletions(-)
   
   diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
   index a055742..b6f7047 100644
   --- a/include/linux/kvm_host.h
   +++ b/include/linux/kvm_host.h
   @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
   *ioapic,
unsigned long *deliver_bitmask);
#endif
int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
   +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct 
   kvm *kvm,
   + int irq_source_id, int level);
void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned 
   pin);
void kvm_register_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian);
   @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm 
   *kvm) {}
void kvm_eventfd_init(struct kvm *kvm);
int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
void kvm_irqfd_release(struct kvm *kvm);
   +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table 
   *irq_rt);
int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);

#else
   @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, 
   int gsi, int flags)
}

static inline void kvm_irqfd_release(struct kvm *kvm) {}
   +static inline void kvm_irqfd_update(struct kvm *kvm) {}
static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
   *args)
{
 return -ENOSYS;
   diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
   index c1f1e3c..49c1864 100644
   --- a/virt/kvm/eventfd.c
   +++ b/virt/kvm/eventfd.c
   @@ -44,14 +44,18 @@
 */

struct _irqfd {
   - struct kvm *kvm;
   - struct eventfd_ctx *eventfd;
   - int gsi;
   - struct list_head list;
   - poll_table pt;
   - wait_queue_t wait;
   - struct work_struct inject;
   - struct work_struct shutdown;
   + /* Used for MSI fast-path */
   + struct kvm *kvm;
   + wait_queue_t wait;
   + struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
   + /* Used for level IRQ fast-path */
   + int gsi;
   + struct work_struct inject;
   + /* Used for setup/shutdown */
   + struct eventfd_ctx *eventfd;
   + struct list_head list;
   + poll_table pt;
   + struct work_struct shutdown;
};

static struct workqueue_struct *irqfd_cleanup_wq;
   @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int 
   sync, void *key)
{
 struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
 unsigned long flags = (unsigned long)key;
   + struct kvm_kernel_irq_routing_entry *irq;

   - if (flags & POLLIN)
   + if (flags & POLLIN) {
   + rcu_read_lock();
   + irq = irqfd->irq_entry;
  Why not rcu_dereference()? And why it can't be zero here?
  
 /* An event has been signaled, inject an interrupt */
   - 

Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-18 Thread Avi Kivity

On 11/18/2010 03:58 AM, Sheng Yang wrote:
> On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
>> On 11/15/2010 11:15 AM, Sheng Yang wrote:
>>> This patch enable per-vector mask for assigned devices using MSI-X.
>>>
>>> This patch provided two new APIs: one is for guest to specific device's
>>> MSI-X table address in MMIO, the other is for userspace to get
>>> information about mask bit.
>>>
>>> All the mask bit operation are kept in kernel, in order to accelerate.
>>> Userspace shouldn't access the device MMIO directly for the information,
>>> instead it should uses provided API to do so.
>>>
>>> Signed-off-by: Sheng Yang sh...@linux.intel.com
>>> ---
>>>
>>>   arch/x86/kvm/x86.c       |    1 +
>>>   include/linux/kvm.h      |   32 +
>>>   include/linux/kvm_host.h |    5 +
>>>   virt/kvm/assigned-dev.c  |  318 +-
>>>   4 files changed, 355 insertions(+), 1 deletions(-)
>>
>> Documentation?
>
> For we are keeping changing the API for last several versions, I'd like to
> settle down the API first. Would bring back the document after API was agreed.

Maybe for APIs we should start with only the documentation patch, agree
on that, and move on to the implementation.

>> What if it's a 64-bit write on a 32-bit host?
>
> In fact we haven't support QWORD(64bit) accessing now. The reason is we haven't
> seen any OS is using it in this way now, so I think we can leave it later.
>
> Also seems QEmu doesn't got the way to handle 64bit MMIO.

There's a difference, if the API doesn't support it, we can't add it
later without changing both kernel and userspace.

>> That's not very good.  We should do the entire thing in the kernel or in
>> userspace.  We can have a new EXIT_REASON to let userspace know an msix
>> entry changed, and it should read it from the kernel.
>
> If you look it in this way:
> 1. Mask bit owned by kernel.
> 2. Routing owned by userspace.
> 3. Read the routing in kernel is an speed up for normal operation - because
> kernel can read from them.
>
> So I think the logic here is clear to understand.

Still, it's complicated and the state is split across multiple components.

> But if we can modify the routing in kernel, it would be raise some sync issues
> due to both kernel and userspace own routing. So maybe the better solution is
> move the routing to kernel.

That may work, but I don't think we can do this for vfio.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-18 Thread Avi Kivity

On 11/18/2010 04:22 AM, Sheng Yang wrote:
> On Wednesday 17 November 2010 22:01:41 Avi Kivity wrote:
>> On 11/15/2010 11:15 AM, Sheng Yang wrote:
>>> We need to query the entry later.
>>>
>>> +int kvm_get_irq_routing_entry(struct kvm *kvm, int gsi,
>>> +		struct kvm_kernel_irq_routing_entry *entry)
>>> +{
>>> +	int count = 0;
>>> +	struct kvm_kernel_irq_routing_entry *ei = NULL;
>>> +	struct kvm_irq_routing_table *irq_rt;
>>> +	struct hlist_node *n;
>>> +
>>> +	rcu_read_lock();
>>> +	irq_rt = rcu_dereference(kvm->irq_routing);
>>> +	if (gsi < irq_rt->nr_rt_entries)
>>> +		hlist_for_each_entry(ei, n, &irq_rt->map[gsi], link)
>>> +			count++;
>>> +	if (count == 1)
>>> +		*entry = *ei;
>>> +	rcu_read_unlock();
>>> +
>>> +	return (count != 1);
>>> +}
>>> +
>>
>> Not good form to rely on ei being valid after the loop.
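
One way to avoid depending on the loop cursor, sketched here on a plain
singly linked list rather than the kernel hlist (all names below are
hypothetical), is to remember the matching entry explicitly inside the loop
and copy it out only when exactly one match was seen:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified routing entry on a singly linked list. */
struct entry {
    int gsi;
    int type;
    struct entry *next;
};

/*
 * Copy the entry for 'gsi' into *out only if exactly one entry matches.
 * Returns 0 on success and 1 otherwise, mirroring the "count != 1"
 * convention of the quoted function. The match is stored in 'found'
 * inside the loop, so nothing relies on the cursor after the loop ends.
 */
static int get_unique_entry(const struct entry *head, int gsi,
                            struct entry *out)
{
    const struct entry *e;
    const struct entry *found = NULL;
    int count = 0;

    assert(out != NULL);

    for (e = head; e != NULL; e = e->next) {
        if (e->gsi != gsi) {
            continue;
        }
        found = e;
        if (++count > 1) {
            return 1;           /* ambiguous: more than one route */
        }
    }
    if (count != 1) {
        return 1;               /* no route at all */
    }
    *out = *found;
    return 0;
}
```

On failure *out is left untouched, so the caller never sees a half-valid
result.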

>> I guess this is only useful for msi?  Need to document it.
>
> May can be used for others later, it's somehow generic. Where should I document
> it?

Non-msi interrupts (wires) can be wired to more than one interrupt line
(and often are - pic/ioapic).

You can document it by adding _msi to the name.

>> *entry may be stale after rcu_read_unlock().  Is this a problem?
>
> I suppose not. All MSI-X MMIO accessing would be executed without delay, so no
> re-order issue would happen. If the guest is reading and writing the field at
> the same time(from two cpus), it should got some kinds of sync method for
> itself - or it may not care what's the reading result(like the one after
> msix_mask_irq()).

I guess so.  Michael/Alex?

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 1/6] KVM: MMU: fix forgot flush vcpu tlbs

2010-11-18 Thread Avi Kivity

On 11/18/2010 09:17 AM, Xiao Guangrong wrote:
> On 11/18/2010 01:36 AM, Marcelo Tosatti wrote:
>>> I don't think we need to flush immediately; set a tlb dirty bit
>>> somewhere that is cleared when we flush the tlb.
>>> kvm_mmu_notifier_invalidate_page() can consult the bit and force a
>>> flush if set.
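
A minimal single-threaded model of that "tlb dirty bit" idea might look like
the sketch below. Every name in it is hypothetical, and a real
implementation would need proper atomics or locking around the flag; it only
shows the bookkeeping: mark dirty instead of flushing, clear on flush, and
force a flush from the invalidate path when the bit is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical MMU bookkeeping; not the KVM data structures. */
struct mmu_state {
    bool tlb_dirty;             /* a flush was deferred earlier */
    unsigned long flush_count;  /* number of flushes issued, for the demo */
};

/* Flushing clears the dirty bit, as suggested above. */
static void mmu_flush_tlbs(struct mmu_state *s)
{
    s->flush_count++;
    s->tlb_dirty = false;
}

/* Instead of flushing immediately, just record that a flush is owed. */
static void mmu_zap_pte(struct mmu_state *s)
{
    s->tlb_dirty = true;
}

/*
 * Models the invalidate-page path: flush if this invalidation needs it,
 * or if an earlier deferred flush is still pending.
 */
static void mmu_invalidate_page(struct mmu_state *s, bool need_flush)
{
    if (need_flush || s->tlb_dirty) {
        mmu_flush_tlbs(s);
    }
}
```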

>> Yep.
>
> Great, i'll do it in the v3.
>
> Do we need a simple bug fix patch(which immediately flush tlbs) for
> backport first?


Oh yes.  Simple fix first, clever ideas later (which will likely need to 
be fixed anyway).



--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 11:05:22AM +0200, Gleb Natapov wrote:
 On Thu, Nov 18, 2010 at 12:12:54AM +0200, Michael S. Tsirkin wrote:
  Store irq routing table pointer in the irqfd object,
  and use that to inject MSI directly without bouncing out to
  a kernel thread.
  
  While we touch this structure, rearrange irqfd fields to make fastpath
  better packed for better cache utilization.
  
  Some notes on the design:
  - Use pointer into the rt instead of copying an entry,
to make it possible to use rcu, thus side-stepping
locking complexities.  We also save some memory this way.
 What locking complexity is there with copying entry approach?
 
  - Old workqueue code is still used for level irqs.
I don't think we DTRT with level anyway, however,
it seems easier to keep the code around as
it has been thought through and debugged, and fix level later than
rip out and re-instate it later.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  The below is compile tested only.  Sending out for early
  flames/feedback.  Please review!
  
   include/linux/kvm_host.h |4 ++
   virt/kvm/eventfd.c   |   81 
  +++--
   virt/kvm/irq_comm.c  |6 ++-
   3 files changed, 78 insertions(+), 13 deletions(-)
  
  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index a055742..b6f7047 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -462,6 +462,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
  *ioapic,
 unsigned long *deliver_bitmask);
   #endif
   int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
  +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
  *kvm,
  +   int irq_source_id, int level);
   void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
   void kvm_register_irq_ack_notifier(struct kvm *kvm,
 struct kvm_irq_ack_notifier *kian);
  @@ -603,6 +605,7 @@ static inline void kvm_free_irq_routing(struct kvm 
  *kvm) {}
   void kvm_eventfd_init(struct kvm *kvm);
   int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
   void kvm_irqfd_release(struct kvm *kvm);
  +void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table 
  *irq_rt);
   int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
   
   #else
  @@ -614,6 +617,7 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, 
  int gsi, int flags)
   }
   
   static inline void kvm_irqfd_release(struct kvm *kvm) {}
  +static inline void kvm_irqfd_update(struct kvm *kvm) {}
   static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
  *args)
   {
  return -ENOSYS;
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index c1f1e3c..49c1864 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -44,14 +44,18 @@
*/
   
   struct _irqfd {
  -   struct kvm   *kvm;
  -   struct eventfd_ctx   *eventfd;
  -   int   gsi;
  -   struct list_head  list;
  -   poll_tablept;
  -   wait_queue_t  wait;
  -   struct work_structinject;
  -   struct work_structshutdown;
  +   /* Used for MSI fast-path */
  +   struct kvm *kvm;
  +   wait_queue_t wait;
  +   struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
  +   /* Used for level IRQ fast-path */
  +   int gsi;
  +   struct work_struct inject;
  +   /* Used for setup/shutdown */
  +   struct eventfd_ctx *eventfd;
  +   struct list_head list;
  +   poll_table pt;
  +   struct work_struct shutdown;
   };
   
   static struct workqueue_struct *irqfd_cleanup_wq;
  @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int 
  sync, void *key)
   {
  struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
  unsigned long flags = (unsigned long)key;
  +   struct kvm_kernel_irq_routing_entry *irq;
   
  -   if (flags & POLLIN)
  +   if (flags & POLLIN) {
  +   rcu_read_lock();
  +   irq = irqfd->irq_entry;
 Why not rcu_dereference()?

Of course. Good catch, thanks.

 And why it can't be zero here?

It can, I check below.

  /* An event has been signaled, inject an interrupt */
  -   schedule_work(&irqfd->inject);
  +   if (irq)
  +   kvm_set_msi(irq, irqfd->kvm, 
  KVM_USERSPACE_IRQ_SOURCE_ID, 1);
  +   else
  +   schedule_work(&irqfd->inject);
  +   rcu_read_unlock();
  +   }
   
  if (flags & POLLHUP) {
  /* The eventfd is closing, detach from KVM */
  @@ -166,6 +178,7 @@ irqfd_ptable_queue_proc(struct file *file, 
  wait_queue_head_t *wqh,
   static int
   kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
   {
  +   struct kvm_irq_routing_table *irq_rt;
  struct _irqfd *irqfd, *tmp;
  struct file *file = NULL;
  struct eventfd_ctx *eventfd = NULL;
  @@ -215,6 +228,10 @@ 

Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 11:28:02AM +0200, Avi Kivity wrote:
 On 11/18/2010 03:58 AM, Sheng Yang wrote:
 On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
   On 11/15/2010 11:15 AM, Sheng Yang wrote:
 This patch enables per-vector masking for assigned devices using MSI-X.
   
 This patch provides two new APIs: one for the guest to specify the device's
 MSI-X table address in MMIO, the other for userspace to get
 information about the mask bit.
   
 All mask bit operations are kept in the kernel, in order to accelerate them.
 Userspace shouldn't access the device MMIO directly for this information;
 instead it should use the provided API to do so.
   
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
   
   arch/x86/kvm/x86.c   |1 +
   include/linux/kvm.h  |   32 +
   include/linux/kvm_host.h |5 +
   virt/kvm/assigned-dev.c  |  318
   +- 4 files changed,
 355
   insertions(+), 1 deletions(-)
 
   Documentation?
 
 Since we have kept changing the API over the last several versions, I'd like to
 settle the API first. I will bring back the documentation once the API is agreed.
 
 Maybe for APIs we should start with only the documentation patch,
 agree on that, and move on to the implementation.
 
   What if it's a 64-bit write on a 32-bit host?
 
 In fact we don't support QWORD (64-bit) accesses yet. The reason is that we haven't
 seen any OS use it this way, so I think we can leave it for later.
 
 Also, it seems QEMU has no way to handle 64-bit MMIO.
 
 There's a difference, if the API doesn't support it, we can't add it
 later without changing both kernel and userspace.
 
 
   That's not very good.  We should do the entire thing in the kernel or in
   userspace.  We can have a new EXIT_REASON to let userspace know an msix
   entry changed, and it should read it from the kernel.
 
 If you look at it this way:
 1. The mask bit is owned by the kernel.
 2. Routing is owned by userspace.
 3. Reading the routing in the kernel is a speed-up for normal operation, because
 the kernel can read it directly.
 
 So I think the logic here is easy to understand.
 
 Still, it's complicated and the state is split across multiple components.
 
 But if we can modify the routing in the kernel, it would raise some sync issues,
 since both kernel and userspace would own the routing. So maybe the better
 solution is to move the routing to the kernel.
 
 That may work, but I don't think we can do this for vfio.

Actually, if done right it might work for VFIO: we would need
2 eventfds to notify it that it has to mask/unmask entries.
The interface would need to be careful to keep programming of the guest
side emulation and the masking in the backend device completely
separate.
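
The two-eventfd signalling pattern mentioned above can be sketched from
userspace on Linux. This only illustrates the notification mechanics with
eventfd(2); how VFIO or KVM would consume these events is not specified in the
thread, and struct mask_notifiers, notifiers_init(), notify() and drain() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of notification channels, one per request type. */
struct mask_notifiers {
    int mask_fd;    /* signalled when entries must be masked */
    int unmask_fd;  /* signalled when entries must be unmasked */
};

/* Create both eventfds; returns 0 on success, -1 on failure. */
static int notifiers_init(struct mask_notifiers *n)
{
    n->mask_fd = eventfd(0, EFD_NONBLOCK);
    if (n->mask_fd < 0)
        return -1;
    n->unmask_fd = eventfd(0, EFD_NONBLOCK);
    if (n->unmask_fd < 0) {
        close(n->mask_fd);
        return -1;
    }
    return 0;
}

/* Producer side: add one event to the eventfd counter. */
static int notify(int fd)
{
    uint64_t one = 1;
    return write(fd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Consumer side: return the number of pending events and reset the
 * counter; returns 0 when nothing is pending. */
static uint64_t drain(int fd)
{
    uint64_t count = 0;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        return 0;
    return count;
}
```

Keeping mask and unmask on separate descriptors means a consumer never has to
decode which operation was requested; it only has to know which fd fired.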

 -- 
 error compiling committee.c: too many arguments to function


Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
 
   *entry may be stale after rcu_read_unlock().  Is this a problem?
 
 I suppose not. All MSI-X MMIO accesses are executed without delay, so no
 reordering issue can happen. If the guest is reading and writing the field at the same
 time (from two CPUs), it should have some kind of sync method of its own, or it
 may not care what the read returns (like the read after msix_mask_irq()).
 
 I guess so.  Michael/Alex?

This is kvm_get_irq_routing_entry which is used for table reads,
correct?  Actually, the pci read *is* the sync method that guests use,
they rely on reads to flush out all previous writes.

 -- 
 error compiling committee.c: too many arguments to function


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-18 Thread Michael S. Tsirkin
On Wed, Nov 17, 2010 at 09:29:22AM +0800, Sheng Yang wrote:
   +#define KVM_MSIX_TYPE_ASSIGNED_DEV   1
   +
   +#define KVM_MSIX_FLAG_MASKBIT        (1 << 0)
   +#define KVM_MSIX_FLAG_QUERY_MASKBIT  (1 << 0)
   +
   +struct kvm_msix_entry {
   + __u32 id;
   + __u32 type;
  
  Is type really necessary? Will it ever differ from
  KVM_MSIX_TYPE_ASSIGNED_DEV?
 
 This was Michael's suggestion. He wants it to be reusable by emulated/pv
 devices, so I added the type field here.

Maybe id field can be reused for this somehow.

-- 
MST


Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Avi Kivity

On 11/18/2010 12:12 AM, Michael S. Tsirkin wrote:

Store irq routing table pointer in the irqfd object,
and use that to inject MSI directly without bouncing out to
a kernel thread.

While we touch this structure, rearrange irqfd fields to make fastpath
better packed for better cache utilization.

Some notes on the design:
- Use pointer into the rt instead of copying an entry,
   to make it possible to use rcu, thus side-stepping
   locking complexities.  We also save some memory this way.
- Old workqueue code is still used for level irqs.
   I don't think we DTRT with level anyway, however,
   it seems easier to keep the code around as
   it has been thought through and debugged, and fix level later than
   rip out and re-instate it later.





@@ -166,6 +178,7 @@ irqfd_ptable_queue_proc(struct file *file, 
wait_queue_head_t *wqh,
  static int
  kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
  {
+   struct kvm_irq_routing_table *irq_rt;
struct _irqfd *irqfd, *tmp;
struct file *file = NULL;
struct eventfd_ctx *eventfd = NULL;
@@ -215,6 +228,10 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
goto fail;
}

+   rcu_read_lock();
+   irqfd_update(kvm, irqfd, rcu_dereference(kvm->irq_routing));
+   rcu_read_unlock();


Wow, complicated.  rcu_read_lock() protects kvm->irq_routing, while 
we're in the update side of rcu-protected irqfd->irq_entry.


A comment please.
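
The read side / update side split discussed here can be shown with a
userspace analogue. This is a sketch under stated assumptions, not kernel
code: a C11 release store stands in for rcu_assign_pointer(), an acquire load
stands in for rcu_dereference(), and grace periods (when an old entry may be
freed) are not modelled at all. struct route_entry, struct mini_irqfd,
publish_entry() and try_fast_inject() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical miniature of a routing entry; not the kernel structure. */
struct route_entry {
    int gsi;
    int is_msi;
};

/* Hypothetical miniature of the irqfd object holding a published pointer. */
struct mini_irqfd {
    _Atomic(struct route_entry *) entry;
};

/* Update side: publish a new entry (or NULL). Callers are assumed to hold
 * whatever lock serializes updaters; that lock is not modelled here. */
static void publish_entry(struct mini_irqfd *fd, struct route_entry *e)
{
    atomic_store_explicit(&fd->entry, e, memory_order_release);
}

/* Read side: load the pointer once and use only the local copy.
 * Returns 1 for the fast path, 0 when the caller must fall back. */
static int try_fast_inject(struct mini_irqfd *fd)
{
    struct route_entry *e =
        atomic_load_explicit(&fd->entry, memory_order_acquire);

    if (e == NULL || !e->is_msi)
        return 0;
    return 1;
}
```

The point of loading the pointer once into a local is the same as in the
review comment on the patch: the reader must not re-read the shared field
between the NULL check and the use.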

The rest looks good, it's nice we finally got the irq injection path so 
streamlined.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Anthony Liguori

On 11/01/2010 10:14 AM, Alex Williamson wrote:

Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

  hw/pc.c |   12 ++--
  1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..0ea6d10 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
  /* allocate RAM */
  ram_addr = qemu_ram_alloc(NULL, "pc.ram",
below_4g_mem_size + above_4g_mem_size);
-cpu_register_physical_memory(0, 0xa0000, ram_addr);
-cpu_register_physical_memory(0x100000,
- below_4g_mem_size - 0x100000,
- ram_addr + 0x100000);
+
+qemu_ram_register(0, 0xa0000, ram_addr);
+qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
+  ram_addr + 0x100000);
  #if TARGET_PHYS_ADDR_BITS > 32
  if (above_4g_mem_size > 0) {
-cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
- ram_addr + below_4g_mem_size);
+qemu_ram_register(0x100000000ULL, above_4g_mem_size,
+  ram_addr + below_4g_mem_size);
  }
   


Take a look at the memory shadowing in the i440fx.  The regions of 
memory in the BIOS area can temporarily become RAM.


That's because there is normally RAM backing this space but the memory 
controller redirects writes to the ROM space.


Not sure the best way to handle this, but the basic concept is, RAM 
always exists but if a device tries to access it, it may or may not be 
accessible as RAM at any given point in time.
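
The "RAM always exists, but may or may not be accessible" concept can be
sketched as a region with separate read and write switches. This is a toy
model, loosely inspired by the read-enable/write-enable idea of memory
shadowing; it does not reproduce the i440fx register semantics or any QEMU
API, and struct shadow_region, region_read() and region_write() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REGION_SIZE 16  /* tiny illustrative size, not a real BIOS window */

/* Hypothetical model of one shadowable region: RAM is always allocated,
 * and two switches decide whether accesses reach it. */
struct shadow_region {
    uint8_t ram[REGION_SIZE];
    uint8_t rom[REGION_SIZE];
    bool read_ram;    /* reads go to RAM when set, to ROM otherwise */
    bool write_ram;   /* writes reach RAM when set, are dropped otherwise */
};

static uint8_t region_read(const struct shadow_region *r, unsigned off)
{
    assert(off < REGION_SIZE);
    return r->read_ram ? r->ram[off] : r->rom[off];
}

static void region_write(struct shadow_region *r, unsigned off, uint8_t val)
{
    assert(off < REGION_SIZE);
    if (r->write_ram)
        r->ram[off] = val;
    /* otherwise the write is discarded, as a write to ROM would be */
}
```

In this model a RAM registration API would have to record the backing RAM
once and let the accessibility switches change later, which is the difficulty
raised above.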


Regards,

Anthony Liguori


  #endif




   




Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 11:34:26AM +0200, Michael S. Tsirkin wrote:
   @@ -125,10 +129,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int 
   sync, void *key)
{
 struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
 unsigned long flags = (unsigned long)key;
   + struct kvm_kernel_irq_routing_entry *irq;

   - if (flags & POLLIN)
   + if (flags & POLLIN) {
   + rcu_read_lock();
   + irq = irqfd->irq_entry;
  Why not rcu_dereference()?
 
 Of course. Good catch, thanks.
 
  And why it can't be zero here?
 
 It can, I check below.
 
Yeah, missed that. Thanks.

 /* An event has been signaled, inject an interrupt */
   - schedule_work(&irqfd->inject);
   + if (irq)
   + kvm_set_msi(irq, irqfd->kvm, 
   KVM_USERSPACE_IRQ_SOURCE_ID, 1);
   + else
   + schedule_work(&irqfd->inject);
   + rcu_read_unlock();
   + }


--
Gleb.


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Gleb Natapov
On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote:
 2010/11/16 Gleb Natapov g...@redhat.com:
  On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
   Perhaps the FW path should use device class names if no name is 
   specified.
   What do you mean by device class name. We can do something like this:
   if (dev->child_bus.lh_first)
          return dev->child_bus.lh_first->info->name;
  
   i.e if there is child bus use its bus name as fw name. This will make
   all pci devices to have pci as fw name automatically. The problem is
   that theoretically same device can provide different buses.
 
  I meant PCI class name, like display for display controllers,
  network for NICs etc.
 
  That is what my pci bus related patch is doing already.
 
   I'll try Sparc32 to see how this fits there.
 
  Except bootindex is not implemented for SCSI.
  Will look into adding it.
 
 Thanks. The bootindex on Sparc32 looks like this:
 bootindex /e...@7880/d...@1,0
 /ether...@/ethernet-...@0
 
For arches other than x86 there is a lot of work left to be done :)
For starters, exotic sparc buses should get their own get_fw_dev_path()
implementation.

 I don't think I got Lance setup right.
 
 OF paths for the devices would be:
 /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0
If qdev hierarchy does not correspond to real HW there is not much we can
do except fix qdev.

 
 The logic for ESP is that ESP (registers at 0x7880, slot offset
 0x88) is handled by the DMA controller (registers at 0x7840,
 slot offset 0x84), they are in a SBus slot #5, and SBus (registers
 at 0x10001000) is in turn handled by IOMMU (registers at 0x1000).
 Lance should be handled the same way.
 
 This hierarchy is partly known by QEMU because DMA accesses use this
 flow, but not otherwise. There is no concept of SBus slots, DMA talks
 to IOMMU directly. Though in this case both ESP, Lance and their DMA
 controllers are on board devices in a MACIO chip. It may be possible
 to add the hierarchy information at each stage.
 
 It should also be possible for BIOS to determine the device just from
 the physical address if we ignored OF compatibility.
It would be nice to be OF compatible at least at some level. Of course the OF
spec is not strict enough for two different implementations to produce
exactly the same device path that can be compared with strcmp.  Can we apply
the series now? At least for x86 it provides useful paths, and work can
continue for other arches by interested parties.
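
As an illustration of how a firmware path can be composed from a device
hierarchy and then compared as a string, here is a small standalone sketch.
It assumes a simple parent-pointer tree; struct fw_node and fw_path() are
hypothetical and do not correspond to qdev structures or to the
get_fw_dev_path() callbacks in the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device node; QEMU's qdev structures look different. */
struct fw_node {
    const char *fw_name;          /* e.g. a bus or device class name */
    const char *unit_addr;        /* text after '@', may be NULL */
    const struct fw_node *parent; /* NULL for the root */
};

/* Write the path of 'n' into buf, root first. Returns the length written,
 * or -1 if the buffer is too small. */
static int fw_path(const struct fw_node *n, char *buf, size_t len)
{
    int used = 0;
    int w;

    if (n->parent) {
        used = fw_path(n->parent, buf, len);
        if (used < 0)
            return -1;
    }
    if (n->unit_addr)
        w = snprintf(buf + used, len - used, "/%s@%s",
                     n->fw_name, n->unit_addr);
    else
        w = snprintf(buf + used, len - used, "/%s", n->fw_name);
    if (w < 0 || (size_t)w >= len - used)
        return -1;
    return used + w;
}
```

Each level contributes one "/name@address" component, so two implementations
only produce comparable strings if they agree on both the hierarchy and the
naming at every level.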

--
Gleb.


[ kvm-Bugs-1841658 ] OpenSolaris 64bit panic with kvm-54

2010-11-18 Thread SourceForge.net
Bugs item #1841658, was opened at 2007-11-30 13:11
Message generated for change (Comment added) made by jessorensen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1841658&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: Works For Me
Priority: 3
Private: No
Submitted By: Carlo Marcelo Arenas Belon (carenas)
Assigned to: Nobody/Anonymous (nobody)
Summary: OpenSolaris 64bit panic with kvm-54

Initial Comment:
Wouldn't mark it as a regression per-se as vanilla kvm-53 wouldn't work 
(because of the need for IDE patches to get it to run/install), but vanilla 
kvm-54 or kvm-54 + the same patches added to kvm-53 and including pre-kvm-55 
patches like 71be592a14aa8d127315b2c47bf83cc0d810a341 wouldn't work.

The panic is observed in kvm-54 (--no-kvm runs ok, and --no-kvm-irqchip doesn't 
help) while running nexenta OpenSolaris alpha 7 or beta 1 (other OpenSolaris 
distributions most likely affected as well) and with the following trace :

panic[cpu0]/thread=fffec2de2260: BAD TRAP: type=e (#pf Page fault) 
rp=ff0001735f30 addr=0 occurred in module unix due to a NULL pointer 
dereference

dbus: #pf Page fault
Bad kernel fault at addr=0x0
pid=278, pc=0xfb83c189, sp=0xff0001736028, eflags=0x10246
cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 6b8<xmme,fxsr,pge,pae,pse,de>
cr2: 0 cr3: 7dc4000 cr8: 0
rdi:0 rsi: fffec0025630 rdx: fffec2de2260
rcx:1  r8: fffec0025630  r9:3
rax:0 rbx:0 rbp: ff0001736080
r10:1 r11: fffec1ad31e0 r12:0
r13: fffec0025680 r14: c0025488 r15:0
fsb:0 gsb: fbc26ef0  ds:   4b
 es:   4b  fs:0  gs:  1c3
trp:e err:0 rip: fb83c189
 cs:   30 rfl:10246 rsp: ff0001736028
 ss:   38

ff0001735e10 unix:die+c8 ()
ff0001735f20 unix:trap+135b ()
ff0001735f30 unix:cmntrap+e9 ()
ff0001736080 unix:mutex_exit+9 ()
ff00017360c0 genunix:kmem_alloc+88 ()
ff0001736110 zfs:zio_push_transform+3a ()
ff0001736190 zfs:zio_create+256 ()
ff0001736240 zfs:zio_vdev_child_io+97 ()
ff0001736320 zfs:vdev_cache_read+182 ()
ff0001736370 zfs:vdev_disk_io_start+41 ()
ff0001736390 zfs:vdev_io_start+1d ()
ff00017363d0 zfs:zio_vdev_io_start+123 ()
ff00017363f0 zfs:zio_next_stage_async+bb ()
ff0001736410 zfs:zio_nowait+11 ()
ff0001736450 zfs:vdev_mirror_io_start+18f ()
ff0001736490 zfs:zio_vdev_io_start+131 ()
ff00017364b0 zfs:zio_next_stage+b3 ()
ff00017364e0 zfs:zio_ready+10e ()
ff0001736500 zfs:zio_next_stage+b3 ()
ff0001736550 zfs:zio_wait_for_children+5d ()
ff0001736570 zfs:zio_wait_children_ready+20 ()
ff0001736590 zfs:zio_next_stage_async+bb ()
ff00017365b0 zfs:zio_nowait+11 ()
ff0001736660 zfs:arc_read+4e8 ()
ff0001736700 zfs:dbuf_read_impl+129 ()
ff0001736760 zfs:dbuf_read+c5 ()
ff0001736810 zfs:dmu_buf_hold_array_by_dnode+1c4 ()
ff00017368a0 zfs:dmu_buf_hold_array+74 ()
ff0001736930 zfs:dmu_read_uio+4d ()
ff00017369c0 zfs:zfs_read+15e ()
ff0001736a30 genunix:fop_read+69 ()
ff0001736af0 genunix:vn_rdwr+161 ()
ff0001736c70 genunix:gexec+11c ()
ff0001736e90 genunix:exec_common+41d ()
ff0001736ec0 genunix:exece+1b ()
ff0001736f10 unix:brand_sys_sysenter+1f2 ()

while running in a Gentoo Linux 2007.0 host with Intel(R) Core(TM)2 CPU 6320.

32bit OpenSolaris works fine

--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-11-18 11:23

Message:
Works for me per previous comments. If you see this again, please open a
new bug in Launchpad.

Thanks,
Jes


--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-08-19 12:09

Message:
Hi,

I didn't see any replies to my question as of June 21, are you still
seeing this? I wasn't able 
to reproduce this in my testing.

Regards,
Jes


--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-21 14:55

Message:
Hi,

I pulled down the iso image you mentioned, and it seems to boot fine for
me here. I was able to run the install to a local disk image and boot it
again afterwards. This is using a 64 bit guest CPU on a Fedora 12 system.

What flags are you using to launch it when you see the crash? Are you
running on an Intel or an AMD system and did you specify SMP by any
chance?

Cheers,
Jes



Re: [PATCH] ceph/rbd block driver for qemu-kvm (v8)

2010-11-18 Thread Stefan Hajnoczi
Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com


Re: HAL type for Win2003 Server on recent KVM versions?

2010-11-18 Thread Avi Kivity

On 11/18/2010 12:58 AM, Kenni Lund wrote:

Hi

I'm about to move a couple of virtual machines from a Fedora 11 system
to a new server with a more recent operating system and newer version
of KVM, etc.

One of the guests is a Windows Server 2003 Standard SP2, which is
currently running with the ACPI Multiprocessor PC HAL.

Considering moving to RHEL, I've been reading the virtualization
documentation for RHEL 6.0, which says that I need to set HAL to
Standard PC when installing a new Win2003 guest.

Since my current guest has been running perfectly fine for a long time
with its current HAL, I was wondering if the system will become
unstable, unbootable or what the disadvantage will be, if I move the
guest to for example RHEL 6.0, without reinstalling or upgrading the
guest to select another HAL mode?

On the other hand, it seems like I can upgrade from the current
ACPI Multiprocessor PC into Standard PC, but I'm not sure if I'll
gain anything by trying this.



I suggest using the default HAL, whatever it is.  That's what everyone 
else is using so you get the best tested configuration.


--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin

So the following on top will fix it all.
Any more comments before I bundle it up,
test and report?

kvm: fix up msi fastpath

This will be folded into the msi fastpath patch.
Changes:
- simplify irq_entry/irq_routing update rules:
  simply do it all under irqfds.lock
- document locking for rcu update side
- rcu_dereference for rcu pointer access

Still compile-tested only.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b6f7047..d13ced3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/rcupdate.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -206,6 +207,8 @@ struct kvm {
 
struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+   /* Update side is protected by irq_lock and,
+* if configured, irqfds.lock. */
struct kvm_irq_routing_table __rcu *irq_routing;
struct hlist_head mask_notifier_list;
struct hlist_head irq_ack_notifier_list;
@@ -605,7 +608,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
-void kvm_irqfd_update(struct kvm *kvm, struct kvm_irq_routing_table *irq_rt);
+void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 #else
@@ -617,7 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int 
gsi, int flags)
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
-static inline void kvm_irqfd_update(struct kvm *kvm) {}
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+ struct kvm_irq_routing_table *irq_rt)
+{
+   rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 49c1864..b0cfae7 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -47,6 +47,7 @@ struct _irqfd {
/* Used for MSI fast-path */
struct kvm *kvm;
wait_queue_t wait;
+   /* Update side is protected by irqfds.lock */
struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
/* Used for level IRQ fast-path */
int gsi;
@@ -133,7 +134,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
void *key)
 
if (flags & POLLIN) {
rcu_read_lock();
-   irq = irqfd->irq_entry;
+   irq = rcu_dereference(irqfd->irq_entry);
/* An event has been signaled, inject an interrupt */
if (irq)
kvm_set_msi(irq, irqfd->kvm, 
KVM_USERSPACE_IRQ_SOURCE_ID, 1);
@@ -175,6 +176,27 @@ irqfd_ptable_queue_proc(struct file *file, 
wait_queue_head_t *wqh,
add_wait_queue(wqh, &irqfd->wait);
 }
 
+/* Must be called under irqfds.lock */
+static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
+struct kvm_irq_routing_table *irq_rt)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   struct hlist_node *n;
+
+   if (irqfd->gsi >= irq_rt->nr_rt_entries) {
+   rcu_assign_pointer(irqfd->irq_entry, NULL);
+   return;
+   }
+
+   hlist_for_each_entry(e, n, &irq_rt->map[irqfd->gsi], link) {
+   /* Only fast-path MSI. */
+   if (e->type == KVM_IRQ_ROUTING_MSI)
+   rcu_assign_pointer(irqfd->irq_entry, e);
+   else
+   rcu_assign_pointer(irqfd->irq_entry, NULL);
+   }
+}
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
@@ -228,9 +250,9 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
goto fail;
}
 
-   rcu_read_lock();
-   irqfd_update(kvm, irqfd, rcu_dereference(kvm->irq_routing));
-   rcu_read_unlock();
+   irq_rt = rcu_dereference_protected(kvm->irq_routing,
+  lockdep_is_held(&kvm->irqfds.lock));
+   irqfd_update(kvm, irqfd, irq_rt);
 
events = file->f_op->poll(file, &irqfd->pt);
 
@@ -345,35 +367,17 @@ kvm_irqfd_release(struct kvm *kvm)
 
 }
 
-/* Must be called under irqfds.lock */
-static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
-struct kvm_irq_routing_table *irq_rt)
-{
-   struct kvm_kernel_irq_routing_entry *e;
-   struct hlist_node *n;
-
-   if (irqfd->gsi >= irq_rt->nr_rt_entries) {
-   rcu_assign_pointer(irqfd->irq_entry, NULL);
-   return;
-   }
-
-   hlist_for_each_entry(e, n, &irq_rt->map[irqfd->gsi], link) {
-   /* Only fast-path MSI. */
-   if (e->type == KVM_IRQ_ROUTING_MSI)
-   rcu_assign_pointer(irqfd->irq_entry, e);

[ kvm-Bugs-1998355 ] IO Performance

2010-11-18 Thread SourceForge.net
Bugs item #1998355, was opened at 2008-06-20 02:11
Message generated for change (Comment added) made by jessorensen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1998355&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: Out of Date
Priority: 5
Private: No
Submitted By: Joshua Rosen (bjrosen)
Assigned to: Nobody/Anonymous (nobody)
Summary: IO Performance 

Initial Comment:
Is there any way of mapping a host's directory into a KVM VM similar to 
VMware's Shared Folder feature?

I've been benchmarking the performance of NCVerilog under various VMs. The 
performance of KVM when using a virtual disk is excellent, in fact it's better 
than VMware Server or VMware Workstation, however if you use
an NFS mounted host directory the performance is unspeakably awful. An NFS 
mounted directory under VMware Server 2.0 (Beta 2) is also slow but it's still 
significantly better than KVM. Using a Shared Folder with VMware Workstation 
eliminates the IO bottleneck, the performance there is about the same as 
accessing a virtual disk.

The system that I did these benchmarks on is a 3GHz Core2 with 8G of RAM. 
VMware was running under CentOS5.1 with a 2.6.23.7 kernel. KVM is running on 
Fedora 9 with a 2.6.25.xx kernel. The Verilog simulation times for my test 
suite are as follows,

Configuration                    Time (mm:ss)
Native                           06:34
VM Server 2, virtual disk        08:05
VM Server 2, NFS                 18:37
VM Workstation, shared folder    08:14
KVM, virtual disk                07:42
KVM, NFS                         38:36
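
To put the timings above in relative terms, the following sketch converts a
mm:ss value to seconds and expresses a run as a percentage of the native
time. The helper names to_seconds() and slowdown_percent() are made up for
this note; only the timings themselves come from the report.

```c
#include <assert.h>

/* Convert a mm:ss timing from the table into seconds. */
static int to_seconds(int minutes, int seconds)
{
    assert(seconds >= 0 && seconds < 60);
    return minutes * 60 + seconds;
}

/* Run time relative to the native run, in percent (100 = same speed),
 * rounded to the nearest integer without using floating point. */
static int slowdown_percent(int run_s, int native_s)
{
    assert(native_s > 0);
    return (run_s * 100 + native_s / 2) / native_s;
}
```

With native at 06:34 (394 s), KVM on a virtual disk comes out at about 117%
of native, VMware Server over NFS at about 284%, and KVM over NFS at about
588%.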


--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-11-18 11:16

Message:
Per previous comment, bug is out of date.

There are a number of solutions in upstream and better performance. If you
still feel this is an issue, please feel free to open a new bug in
Launchpad.

Thanks,
Jes


--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-08-19 12:18

Message:
Hi,

Did this get resolved? If so would you mind closing this bug. If I don't
hear back, I
will assume it is fixed and close it at some point :)

Thanks,
Jes


--

Comment By: Joshua Rosen (bjrosen)
Date: 2008-06-22 15:54

Message:
Logged In: YES 
user_id=39829
Originator: YES

I've tried the following command line, it brings up the VM but I can't
configure the NIC, there is an error message about vlan 0.

qemu-kvm -M pc -m 512 -smp 1 -monitor pty -net
nic,macaddr=a0:1e:37:84:b1:da,model=virtio  -boot c -hda /home/xen/panther
Warning: vlan 0 is not connected to host network

I've also tried the following, but the qemu-ifup command is missing an
argument

qemu-kvm -M pc -m 512 -smp 1 -monitor pty -net
nic,macaddr=a0:1e:37:84:b1:da,model=virtio -net
tap,script=/etc/xen/qemu-ifup -boot c -hda /home/xen/panther
config qemu network with xen bridge for  tap0
Incorrect number of arguments for command
Usage: brctl addif <bridge> <device>    add interface to bridge
char device redirected to /dev/pts/3

--

Comment By: Dor Laor (thekozmo)
Date: 2008-06-22 08:08

Message:
Logged In: YES 
user_id=2124464
Originator: NO

Example cmdline:
./qemu/x86_64-softmmu/qemu-system-x86_64 -boot c -drive
file=/images/xpbase.qcow2,if=ide,cache=on,format=qcow2,boot=on -m 384  -net
nic,macaddr=a0:1e:37:84:b1:da,model=virtio -net
tap,script=/etc/kvm/qemu-ifup -snapshot

btw: you can use 'ps' to discover libvirt cmdline

--

Comment By: Joshua Rosen (bjrosen)
Date: 2008-06-22 03:18

Message:
Logged In: YES 
user_id=39829
Originator: YES

How do I use virtio-net instead of virtio-blk?

I've been launching the VM using virt-manager which has no options. KVM
doesn't have a MAN page so I have no idea about how to launch the VM using
the CLI. Would you please give me specific step by step instructions.

Thanks,


--

Comment By: Joshua Rosen (bjrosen)
Date: 2008-06-21 16:28

Message:
Logged In: YES 
user_id=39829
Originator: YES

How do I use virtio-net instead of virtio-blk?

I've been launching the VM using virt-manager which has no options. KVM
doesn't have a MAN page so I have no idea about how to launch the VM using
the CLI. Would you please give me specific step by step instructions.

Thanks,


--

Comment By: Dor Laor (thekozmo)
Date: 2008-06-21 15:45

Message:
Logged In: YES 
user_id=2124464
Originator: NO

If you don't boot from virtio-blk there is no need for this
configuration.
Just use virtio-net in 

Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Avi Kivity

On 11/18/2010 12:57 PM, Michael S. Tsirkin wrote:

So the following on top will fix it all.
Any more comments before I bundle it up,
test and report?



Nope (not that I can comment on an incremental).

I guess I should create an empty Documentation/kvm/locking.txt and force 
everyone else to update it.


--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 01:03:44PM +0200, Avi Kivity wrote:
 On 11/18/2010 12:57 PM, Michael S. Tsirkin wrote:
 So the following on top will fix it all.
 Any more comments before I bundle it up,
 test and report?
 
 
 Nope (not that I can comment on an incremental).

Here it is rolled up.

 I guess I should create an empty Documentation/kvm/locking.txt and
 force everyone else to update it.

Comments near the relevant fields not better?

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a055742..d13ced3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/rcupdate.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -206,6 +207,8 @@ struct kvm {
 
struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+   /* Update side is protected by irq_lock and,
+* if configured, irqfds.lock. */
struct kvm_irq_routing_table __rcu *irq_routing;
struct hlist_head mask_notifier_list;
struct hlist_head irq_ack_notifier_list;
@@ -462,6 +465,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic,
   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
+   int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
@@ -603,6 +608,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
+void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 #else
@@ -614,6 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int 
gsi, int flags)
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+ struct kvm_irq_routing_table *irq_rt)
+{
+   rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..b0cfae7 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -44,14 +44,19 @@
  */
 
 struct _irqfd {
-   struct kvm               *kvm;
-   struct eventfd_ctx       *eventfd;
-   int                       gsi;
-   struct list_head          list;
-   poll_table                pt;
-   wait_queue_t              wait;
-   struct work_struct        inject;
-   struct work_struct        shutdown;
+   /* Used for MSI fast-path */
+   struct kvm *kvm;
+   wait_queue_t wait;
+   /* Update side is protected by irqfds.lock */
+   struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
+   /* Used for level IRQ fast-path */
+   int gsi;
+   struct work_struct inject;
+   /* Used for setup/shutdown */
+   struct eventfd_ctx *eventfd;
+   struct list_head list;
+   poll_table pt;
+   struct work_struct shutdown;
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -125,10 +130,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
void *key)
 {
struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
unsigned long flags = (unsigned long)key;
+   struct kvm_kernel_irq_routing_entry *irq;
 
-   if (flags & POLLIN)
+   if (flags & POLLIN) {
+   rcu_read_lock();
+   irq = rcu_dereference(irqfd->irq_entry);
/* An event has been signaled, inject an interrupt */
-   schedule_work(&irqfd->inject);
+   if (irq)
+   kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
+   else
+   schedule_work(&irqfd->inject);
+   rcu_read_unlock();
+   }
 
if (flags  POLLHUP) {
/* The eventfd is closing, detach from KVM */
@@ -163,9 +176,31 @@ irqfd_ptable_queue_proc(struct file *file, 
wait_queue_head_t *wqh,
add_wait_queue(wqh, &irqfd->wait);
 }
 
+/* Must be called under irqfds.lock */
+static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
+struct kvm_irq_routing_table *irq_rt)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   struct hlist_node *n;
+
+   if (irqfd->gsi >= irq_rt->nr_rt_entries) {
+   rcu_assign_pointer(irqfd->irq_entry, NULL);
+   return;
+   }
+
+   hlist_for_each_entry(e, n, &irq_rt->map[irqfd->gsi], link) {
+   /* 
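
(The archived copy of this patch is cut off here.)

The control flow of the fast path in irqfd_wakeup() above can be sketched in
userspace C. This is only an illustration under stated assumptions: the types
and function names are hypothetical stand-ins, C11 atomics take the place of
rcu_assign_pointer()/rcu_dereference(), counters take the place of
kvm_set_msi() and schedule_work(), and the RCU grace period that protects a
replaced entry from being freed is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_kernel_irq_routing_entry. */
struct route_entry {
    unsigned int gsi;
};

/* Hypothetical stand-in for struct _irqfd. The counters replace the real
 * injection calls so that the chosen path can be observed. */
struct irqfd_sketch {
    _Atomic(struct route_entry *) irq_entry; /* NULL: no MSI fast path */
    int fast_injections;                     /* would be kvm_set_msi() */
    int slow_injections;                     /* would be schedule_work() */
};

/* Update side: publish a new entry, or NULL to disable the fast path.
 * In the patch this runs under irqfds.lock. */
void irqfd_sketch_update(struct irqfd_sketch *fd, struct route_entry *e)
{
    atomic_store_explicit(&fd->irq_entry, e, memory_order_release);
}

/* Read side: pick the direct path when an entry is published, otherwise
 * fall back to the deferred (workqueue) path. */
void irqfd_sketch_wakeup(struct irqfd_sketch *fd)
{
    struct route_entry *e =
        atomic_load_explicit(&fd->irq_entry, memory_order_acquire);

    if (e)
        fd->fast_injections++;
    else
        fd->slow_injections++;
}
```

The point of the design is that the reader never takes a lock: it either sees
a published entry and injects directly, or sees NULL and defers.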

Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote:
 On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote:
  2010/11/16 Gleb Natapov g...@redhat.com:
   On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
Perhaps the FW path should use device class names if no name is 
specified.
What do you mean by device class name. We can do something like this:
if (dev->child_bus.lh_first)
       return dev->child_bus.lh_first->info->name;
   
i.e if there is child bus use its bus name as fw name. This will make
all pci devices to have pci as fw name automatically. The problem is
that theoretically same device can provide different buses.
  
   I meant PCI class name, like display for display controllers,
   network for NICs etc.
  
   That is what my pci bus related patch is doing already.
  
I'll try Sparc32 to see how this fits there.
  
   Except bootindex is not implemented for SCSI.
   Will look into adding it.
  
  Thanks. The bootindex on Sparc32 looks like this:
  bootindex /e...@7880/d...@1,0
  /ether...@/ethernet-...@0
  
 For arches other than x86 there is a lot of work left to be done :)
 For starters, exotic sparc buses should get their own get_fw_dev_path()
 implementation.
 
  I don't think I got Lance setup right.
  
  OF paths for the devices would be:
  /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
  /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0
 If qdev hierarchy does not correspond to real HW there is not much we can
 do except fix qdev.

That's bad.  This raises a concern: if these paths expose qdev
internals, any attempt to fix this will break migration.

  
  The logic for ESP is that ESP (registers at 0x7880, slot offset
  0x88) is handled by the DMA controller (registers at 0x7840,
  slot offset 0x84), they are in a SBus slot #5, and SBus (registers
  at 0x10001000) is in turn handled by IOMMU (registers at 0x1000).
  Lance should be handled the same way.
  
  This hierarchy is partly known by QEMU because DMA accesses use this
  flow, but not otherwise. There is no concept of SBus slots, DMA talks
  to IOMMU directly. Though in this case both ESP, Lance and their DMA
  controllers are on board devices in a MACIO chip. It may be possible
  to add the hierarchy information at each stage.
  
  It should also be possible for BIOS to determine the device just from
  the physical address if we ignored OF compatibility.
 It would be nice to be OF compatible at least at some level. Of course OF
 spec is not strict enough to have two different implementations produce
 exactly the same device path that can be compared by strcmp.  Can we apply
 the series now? At least for x86 it provides useful paths and work can
 continue for other arches by interested parties.
 
 --
   Gleb.

Something I only now realized is that we commit
to never changing the paths for any architecture
that supports migration.

-- 
MST
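
The "PCI class name" idea quoted above (display for display controllers,
network for NICs) could be sketched as a lookup on the base class byte of the
PCI class code. This is a minimal sketch, not the code from the patch series:
the function name is hypothetical, only the two classes named in the thread
are mapped, and returning NULL to mean "fall back to the device's own name" is
an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a 24-bit PCI class code to a generic firmware
 * node name. The base class is the top byte of the class code. */
const char *pci_class_fw_name(uint32_t class_code)
{
    switch ((class_code >> 16) & 0xff) {
    case 0x02:              /* network controller */
        return "network";
    case 0x03:              /* display controller */
        return "display";
    default:                /* assumption: caller uses the device name */
        return NULL;
    }
}
```

A caller would try this first and only use a per-device name when it returns
NULL.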


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote:
  On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote:
   2010/11/16 Gleb Natapov g...@redhat.com:
On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
 Perhaps the FW path should use device class names if no name is 
 specified.
 What do you mean by device class name. We can do something like 
 this:
 if (dev->child_bus.lh_first)
        return dev->child_bus.lh_first->info->name;

 i.e if there is child bus use its bus name as fw name. This will make
 all pci devices to have pci as fw name automatically. The problem 
 is
 that theoretically same device can provide different buses.
   
I meant PCI class name, like display for display controllers,
network for NICs etc.
   
That is what my pci bus related patch is doing already.
   
 I'll try Sparc32 to see how this fits there.
   
Except bootindex is not implemented for SCSI.
Will look into adding it.
   
   Thanks. The bootindex on Sparc32 looks like this:
   bootindex /e...@7880/d...@1,0
   /ether...@/ethernet-...@0
   
  For arches other than x86 there is a lot of work left to be done :)
  For starters, exotic sparc buses should get their own get_fw_dev_path()
  implementation.
  
   I don't think I got Lance setup right.
   
   OF paths for the devices would be:
   /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
   /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0
  If qdev hierarchy does not correspond to real HW there is not much we can
  do except fix qdev.
 
 That's bad.  This raises a concern: if these paths expose qdev
 internals, any attempt to fix this will break migration.
 
The path exposes the internal HW hierarchy. It is designed to do so. Qdev is
designed to do the same: describe the HW hierarchy. If qdev fails to do so, it
is broken. I do not see a connection to migration at all, since the path is
not used in the migration code.

   
   The logic for ESP is that ESP (registers at 0x7880, slot offset
   0x88) is handled by the DMA controller (registers at 0x7840,
   slot offset 0x84), they are in a SBus slot #5, and SBus (registers
   at 0x10001000) is in turn handled by IOMMU (registers at 0x1000).
   Lance should be handled the same way.
   
   This hierarchy is partly known by QEMU because DMA accesses use this
   flow, but not otherwise. There is no concept of SBus slots, DMA talks
   to IOMMU directly. Though in this case both ESP, Lance and their DMA
   controllers are on board devices in a MACIO chip. It may be possible
   to add the hierarchy information at each stage.
   
   It should also be possible for BIOS to determine the device just from
   the physical address if we ignored OF compatibility.
  It would be nice to be OF compatible at least at some level. Of course OF
  spec is not strict enough to have two different implementations produce
  exactly the same device path that can be compared by strcmp.  Can we apply
  the series now? At least for x86 it provides useful paths and work can
  continue for other arches by interested parties.
  
  --
  Gleb.
 
 Something I only now realized is that we commit
 to never changing the paths for any architecture
 that supports migration.
 
No connection to migration whatsoever.

--
Gleb.


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote:
 On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote:
  On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote:
   On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote:
2010/11/16 Gleb Natapov g...@redhat.com:
 On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
  Perhaps the FW path should use device class names if no name is 
  specified.
  What do you mean by device class name. We can do something like 
  this:
  if (dev->child_bus.lh_first)
         return dev->child_bus.lh_first->info->name;
 
  i.e if there is child bus use its bus name as fw name. This will 
  make
  all pci devices to have pci as fw name automatically. The 
  problem is
  that theoretically same device can provide different buses.

 I meant PCI class name, like display for display controllers,
 network for NICs etc.

 That is what my pci bus related patch is doing already.

  I'll try Sparc32 to see how this fits there.

 Except bootindex is not implemented for SCSI.
 Will look into adding it.

Thanks. The bootindex on Sparc32 looks like this:
bootindex /e...@7880/d...@1,0
/ether...@/ethernet-...@0

   For arches other than x86 there is a lot of work left to be done :)
   For starters, exotic sparc buses should get their own get_fw_dev_path()
   implementation.
   
I don't think I got Lance setup right.

OF paths for the devices would be:
/io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
/io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0
   If qdev hierarchy does not correspond to real HW there is not much we can
   do except fix qdev.
  
  That's bad.  This raises a concern: if these paths expose qdev
  internals, any attempt to fix this will break migration.
  
 The path expose internal HW hierarchy. It is designed to do so. Qdev
 designed to do the same: describe HW hierarchy. If qdev fails to do so it
 is broken.

Yes. But since you use qdev to build up the path, a broken
qdev will give you a broken path.

 I do not see connection to migration at all since the path is
 not used in migration code.

The connection is that if we pass the list with path 1, which you define
as broken, to the BIOS, then migrate to a machine with an updated qemu
which has a correct path, the BIOS won't be able to complete the boot.
Right? Same in the reverse direction.
A solution could be fuzzy matching of paths that will let us recover.


The logic for ESP is that ESP (registers at 0x7880, slot offset
0x88) is handled by the DMA controller (registers at 0x7840,
slot offset 0x84), they are in a SBus slot #5, and SBus (registers
at 0x10001000) is in turn handled by IOMMU (registers at 0x1000).
Lance should be handled the same way.

This hierarchy is partly known by QEMU because DMA accesses use this
flow, but not otherwise. There is no concept of SBus slots, DMA talks
to IOMMU directly. Though in this case both ESP, Lance and their DMA
controllers are on board devices in a MACIO chip. It may be possible
to add the hierarchy information at each stage.

It should also be possible for BIOS to determine the device just from
the physical address if we ignored OF compatibility.
   It would be nice to be OF compatible at least at some level. Of course OF
   spec is not strict enough to have two different implementations produce
   exactly the same device path that can be compared by strcmp.  Can we apply
   the series now? At least for x86 it provides useful paths and work can
   continue for other arches by interested parties.
   
   --
 Gleb.
  
  Something I only now realized is that we commit
  to never changing the paths for any architecture
  that supports migration.
  
 No connection to migration whatsoever.

It just seems silly to use different paths for the same thing.

Besides the connection above, I was hoping to use these paths
for section names in migration. If we can't guarantee they are
stable, we'll have to roll our own, and if we do this,
with stability guarantees required for migration format,
maybe use it for other things like BIOS as well?
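
One possible reading of the "fuzzy matching of paths" suggested earlier in
this message is to compare two device paths node by node on the node name
only, ignoring the unit address after '@'. The sketch below is an assumption
about what such matching could mean, not anything proposed as code in the
thread; the function name and the example paths in the usage note are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true when both paths have the same sequence
 * of node names. The unit address (the part after '@') is ignored, so a
 * path survives a change in how addresses are formatted or assigned. */
bool fw_path_fuzzy_match(const char *a, const char *b)
{
    while (*a && *b) {
        /* Skip the '/' separators in front of the next node. */
        while (*a == '/')
            a++;
        while (*b == '/')
            b++;
        if (!*a || !*b)
            break;

        /* Compare node names, which end at '@', '/' or end of string. */
        size_t la = strcspn(a, "@/");
        size_t lb = strcspn(b, "@/");
        if (la != lb || strncmp(a, b, la) != 0)
            return false;

        /* Advance past the rest of this node, unit address included. */
        a += strcspn(a, "/");
        b += strcspn(b, "/");
    }

    /* Both paths must be exhausted, apart from trailing separators. */
    while (*a == '/')
        a++;
    while (*b == '/')
        b++;
    return *a == '\0' && *b == '\0';
}
```

With this rule, "/pci@0/ethernet@3" matches "/pci@0/ethernet@4" but not
"/pci@0/scsi@3". The trade-off is ambiguity: two devices of the same kind on
the same bus become indistinguishable, so real matching would need a
tie-breaker.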

 --
   Gleb.


Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-18 Thread Sheng Yang
On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
 
   *entry may be stale after rcu_read_unlock().  Is this a problem?
 
 I suppose not. All MSI-X MMIO accesses are executed without delay, so no
 reordering issue would happen. If the guest is reading and writing the field
 at the same time (from two CPUs), it should have some kind of sync method of
 its own, or it may not care what the read returns (like the one after
 msix_mask_irq()).

 I guess so.  Michael/Alex?

 This is kvm_get_irq_routing_entry which is used for table reads,
 correct?  Actually, the pci read *is* the sync method that guests use,
 they rely on reads to flush out all previous writes.

Michael, I think the *sync* you are talking about is not the one I
meant. I was talking about the two-CPU case: one is reading and the other
is writing, and the order can't be determined unless the guest uses a lock
or some other synchronization method. You're talking about flushing out
all previous writes of a single CPU...

-- 
regards,
Yang, Sheng


Re: [PATCH 6/6] KVM: assigned dev: MSI-X mask support

2010-11-18 Thread Sheng Yang
On Thu, Nov 18, 2010 at 5:28 PM, Avi Kivity a...@redhat.com wrote:
 On 11/18/2010 03:58 AM, Sheng Yang wrote:

 On Wednesday 17 November 2010 21:58:00 Avi Kivity wrote:
   On 11/15/2010 11:15 AM, Sheng Yang wrote:
     This patch enables per-vector masking for assigned devices using MSI-X.
   
     This patch provides two new APIs: one is for guest to specify the
     device's MSI-X table address in MMIO, the other is for userspace to get
     information about the mask bit.
   
     All mask bit operations are kept in the kernel, in order to accelerate
     them. Userspace shouldn't access the device MMIO directly for the
     information; instead it should use the provided API to do so.
   
     Signed-off-by: Sheng Yang sh...@linux.intel.com
     ---
   
       arch/x86/kvm/x86.c       |    1 +
       include/linux/kvm.h      |   32 +
       include/linux/kvm_host.h |    5 +
       virt/kvm/assigned-dev.c  |  318 +-
       4 files changed, 355 insertions(+), 1 deletions(-)
 
   Documentation?

 Since we kept changing the API over the last several versions, I'd like to
 settle the API first. I will bring back the documentation once the API is
 agreed.

 Maybe for APIs we should start with only the documentation patch, agree on
 that, and move on to the implementation.

Yes, I will follow that next time. And I will bring back the documentation
in the next revision, since Michael and I have reached agreement on the API.

   What if it's a 64-bit write on a 32-bit host?

 In fact we don't support QWORD (64-bit) accesses yet. The reason is that we
 haven't seen any OS use it this way, so I think we can leave it for later.

 Also, it seems QEMU has no way to handle 64-bit MMIO.

 There's a difference, if the API doesn't support it, we can't add it later
 without changing both kernel and userspace.

Um... Which API are you talking about? I think the userspace API (set MSI-X
MMIO, and get mask bit status) is unrelated here?

 
   That's not very good.  We should do the entire thing in the kernel or in
   userspace.  We can have a new EXIT_REASON to let userspace know an msix
   entry changed, and it should read it from the kernel.

 If you look it in this way:
 1. Mask bit owned by kernel.
 2. Routing owned by userspace.
 3. Reading the routing in the kernel is a speedup for normal operation,
    because the kernel can read from it.

 So I think the logic here is clear to understand.

 Still, it's complicated and the state is split across multiple components.

So how about removing the read acceleration part from the patch for now?
The kernel owns the mask bit and userspace owns everything else. That
should be better. I can add the read part later, when we find an
elegant way to do so.


 But if we can modify the routing in the kernel, it would raise some sync
 issues, because both the kernel and userspace would own the routing. So maybe
 the better solution is to move the routing to the kernel.

 That may work, but I don't think we can do this for vfio.

-- 
regards,
Yang, Sheng
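
The question above about a 64-bit write can be made concrete with a small
sketch of the MSI-X table layout: each entry is 16 bytes (message address
low, message address high, message data, vector control), and bit 0 of vector
control is the per-vector mask bit. The helper below is hypothetical and is
not the code from this patch set; it shows one way a QWORD write could be
handled by splitting it into two DWORD writes, assuming a little-endian
guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE      16u  /* bytes per MSI-X table entry */
#define MSIX_VECTOR_CTRL_DW  3u   /* DWORD index of Vector Control */
#define MSIX_CTRL_MASKBIT    0x1u /* bit 0 of Vector Control */

/* One table entry as four DWORDs: addr lo, addr hi, data, vector control. */
struct msix_entry_sketch {
    uint32_t dword[4];
};

/* Hypothetical helper: apply a 4- or 8-byte guest write at a byte offset
 * into the table. An 8-byte write is split into two DWORD writes, low half
 * first. Returns false for a misaligned or out-of-range access. */
bool msix_table_write(struct msix_entry_sketch *table, unsigned int nr_entries,
                      uint64_t offset, uint64_t val, unsigned int len)
{
    uint64_t size = (uint64_t)nr_entries * MSIX_ENTRY_SIZE;

    if ((len != 4 && len != 8) || offset % len != 0)
        return false;
    if (size < len || offset > size - len)
        return false;

    for (unsigned int i = 0; i < len / 4; i++) {
        uint64_t off = offset + 4u * i;

        table[off / MSIX_ENTRY_SIZE].dword[(off % MSIX_ENTRY_SIZE) / 4] =
            (uint32_t)(val >> (32 * i));
    }
    return true;
}

/* True when the per-vector mask bit of this entry is set. */
bool msix_entry_masked(const struct msix_entry_sketch *e)
{
    return (e->dword[MSIX_VECTOR_CTRL_DW] & MSIX_CTRL_MASKBIT) != 0;
}
```

A QWORD write at offset 8 of an entry covers both the message data and the
vector control DWORD, which is why rejecting 64-bit accesses in the interface
would be hard to undo later.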

 --
 error compiling committee.c: too many arguments to function



Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 01:52:30PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote:
  On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote:
   On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote:
On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote:
 2010/11/16 Gleb Natapov g...@redhat.com:
  On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
   Perhaps the FW path should use device class names if no name is 
   specified.
   What do you mean by device class name. We can do something 
   like this:
   if (dev->child_bus.lh_first)
          return dev->child_bus.lh_first->info->name;
  
   i.e if there is child bus use its bus name as fw name. This will 
   make
   all pci devices to have pci as fw name automatically. The 
   problem is
   that theoretically same device can provide different buses.
 
  I meant PCI class name, like display for display controllers,
  network for NICs etc.
 
  That is what my pci bus related patch is doing already.
 
   I'll try Sparc32 to see how this fits there.
 
  Except bootindex is not implemented for SCSI.
  Will look into adding it.
 
 Thanks. The bootindex on Sparc32 looks like this:
 bootindex /e...@7880/d...@1,0
 /ether...@/ethernet-...@0
 
For arches other than x86 there is a lot of work left to be done :)
For starters, exotic sparc buses should get their own get_fw_dev_path()
implementation.

 I don't think I got Lance setup right.
 
 OF paths for the devices would be:
 /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
 /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0
If qdev hierarchy does not correspond to real HW there is not much we can
do except fix qdev.
   
   That's bad.  This raises a concern: if these paths expose qdev
   internals, any attempt to fix this will break migration.
   
  The path expose internal HW hierarchy. It is designed to do so. Qdev
  designed to do the same: describe HW hierarchy. If qdev fails to do so it
  is broken.
 
 Yes. But since you use qdev to build up the path, a broken
 qdev will give you a broken path.
 
Qdev bug. Fix it like any other bug. The nice thing is that when you compare
the device path produced by qdev with real HW, you can see when qdev is wrong.

  I do not see connection to migration at all since the path is
  not used in migration code.
 
 The connection is that if we pass the list with path 1 which you define
 as broken to BIOS, then migrate to a machine with an updated qemu
 which has a correct path, BIOS won't be able to complete the boot.
You solve it like you solve all such issues, with the -M machine type.
But the problem exists only if migration happens in the short window
between the start of the boot process and the BIOS reading the boot order
string. After reboot, the new qemu should have the new BIOS.

 Right? Same in reverse direction.
The reverse direction is not and never was supported.

 A solution could be fuzzy matching
 of paths that will let us recover.
 
Firmware can try its best of course, but nothing is guaranteed.

 
 The logic for ESP is that ESP (registers at 0x7880, slot offset
 0x88) is handled by the DMA controller (registers at 0x7840,
 slot offset 0x84), they are in a SBus slot #5, and SBus (registers
 at 0x10001000) is in turn handled by IOMMU (registers at 0x1000).
 Lance should be handled the same way.
 
 This hierarchy is partly known by QEMU because DMA accesses use this
 flow, but not otherwise. There is no concept of SBus slots, DMA talks
 to IOMMU directly. Though in this case both ESP, Lance and their DMA
 controllers are on board devices in a MACIO chip. It may be possible
 to add the hierarchy information at each stage.
 
 It should also be possible for BIOS to determine the device just from
 the physical address if we ignored OF compatibility.
It would be nice to be OF compatible at least at some level. Of course OF
spec is not strict enough to have two different implementations produce
exactly the same device path that can be compared by strcmp.  Can we apply
the series now? At least for x86 it provides useful paths and work can
continue for other arches by interested parties.

--
Gleb.
   
   Something I only now realized is that we commit
   to never changing the paths for any architecture
   that supports migration.
   
  No connection to migration whatsoever.
 
 It just seems silly to use different paths for the same thing.
 
 Besides the connection above, I was hoping to use these paths
 for section names in migration. If we can't guarantee they are
 stable, we'll have to roll our own, and if we do this,
 with stability guarantees required for 

[PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()

2010-11-18 Thread Avi Kivity
cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label).  This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/vmx.c |   63 ---
 1 files changed, 25 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9367abc..b4b66a8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3902,17 +3902,33 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 #define Q "l"
 #endif
 
-/*
- * We put this into a separate noinline function to prevent the compiler
- * from duplicating the code. This is needed because this code
- * uses non local labels that cannot be duplicated.
- * Do not put any flow control into this function.
- * Better would be to put this whole monstrosity into a .S file.
- */
-static void noinline do_vmx_vcpu_run(struct kvm_vcpu *vcpu)
+static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   asm volatile(
+
+   /* Record the guest's net vcpu time for enforced NMI injections. */
+   if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
+   vmx->entry_time = ktime_get();
+
+   /* Don't enter VMX if guest state is invalid, let the exit handler
+  start emulation until we arrive back to a valid state */
+   if (vmx->emulation_required && emulate_invalid_guest_state)
+   return;
+
+   if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
+   vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
+   if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
+   vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
+
+   /* When single-stepping over STI and MOV SS, we must clear the
+* corresponding interruptibility bits in the guest state. Otherwise
+* vmentry fails as it then expects bit 14 (BS) in pending debug
+* exceptions being set, but that's not correct for the guest debugging
+* case. */
+   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+   vmx_set_interrupt_shadow(vcpu, 0);
+
+   asm(
/* Store host registers */
"push %%"R"dx; push %%"R"bp;"
"push %%"R"cx \n\t"
@@ -4007,35 +4023,6 @@ static void noinline do_vmx_vcpu_run(struct kvm_vcpu 
*vcpu)
, "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
 #endif
  );
-}
-
-static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
-{
-   struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-   /* Record the guest's net vcpu time for enforced NMI injections. */
-   if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
-   vmx->entry_time = ktime_get();
-
-   /* Don't enter VMX if guest state is invalid, let the exit handler
-  start emulation until we arrive back to a valid state */
-   if (vmx->emulation_required && emulate_invalid_guest_state)
-   return;
-
-   if (test_bit(VCPU_REGS_RSP, (unsigned long *)&vcpu->arch.regs_dirty))
-   vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-   if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
-   vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
-
-   /* When single-stepping over STI and MOV SS, we must clear the
-* corresponding interruptibility bits in the guest state. Otherwise
-* vmentry fails as it then expects bit 14 (BS) in pending debug
-* exceptions being set, but that's not correct for the guest debugging
-* case. */
-   if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
-   vmx_set_interrupt_shadow(vcpu, 0);
-
-   do_vmx_vcpu_run(vcpu);
 
vcpu->arch.regs_avail = ~((1 << VCPU_REGS_RIP) | (1 << VCPU_REGS_RSP)
  | (1 << VCPU_EXREG_PDPTR));
-- 
1.7.3.1
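
The attribute named in the commit message can be tried outside the kernel. As
far as I know, the kernel's __noclone is a thin macro over GCC's noclone
function attribute, which stops the compiler from emitting specialized copies
of a function; a second copy would duplicate any non-local asm label inside
it. The sketch below is an assumption-laden illustration: the macro and
function names are hypothetical, the empty asm statement is only a placeholder
for the real context-switch code, and the attribute is compiled out on
compilers other than GCC.

```c
#include <assert.h>

/* Hypothetical macro: ask GCC not to clone the function. Other compilers
 * may not know the attribute, so it is dropped there. */
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_NOCLONE __attribute__((__noclone__))
#else
#define SKETCH_NOCLONE
#endif

/* Placeholder for a function whose inline asm must be emitted only once. */
SKETCH_NOCLONE int run_guest_once(int x)
{
    int result = x;

    /* Empty asm that touches 'result'; the real code would define
     * labels here that cannot appear twice in the object file. */
    __asm__ volatile("" : "+r"(result));
    return result + 1;
}
```

The attribute changes only how many copies of the function the compiler may
generate, not what the function computes.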



Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 02:16:05PM +0200, Gleb Natapov wrote:
 On Thu, Nov 18, 2010 at 01:52:30PM +0200, Michael S. Tsirkin wrote:
  On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote:
   On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote:
On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote:
 On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote:
  2010/11/16 Gleb Natapov g...@redhat.com:
   On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
Perhaps the FW path should use device class names if no name 
is specified.
What do you mean by device class name. We can do something 
like this:
if (dev->child_bus.lh_first)
       return dev->child_bus.lh_first->info->name;
   
i.e if there is child bus use its bus name as fw name. This 
will make
all pci devices to have pci as fw name automatically. The 
problem is
that theoretically same device can provide different buses.
  
   I meant PCI class name, like display for display controllers,
   network for NICs etc.
  
   That is what my pci bus related patch is doing already.
  
I'll try Sparc32 to see how this fits there.
  
   Except bootindex is not implemented for SCSI.
   Will look into adding it.
  
  Thanks. The bootindex on Sparc32 looks like this:
  bootindex /e...@7880/d...@1,0
  /ether...@/ethernet-...@0
  
 For arches other than x86 there is a lot of work left to be done :)
 For starters, exotic sparc buses should get their own get_fw_dev_path()
 implementation.
 
  I don't think I got Lance setup right.
  
  OF paths for the devices would be:
  /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
  /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0
 If qdev hierarchy does not correspond to real HW there is not much we can
 do except fix qdev.

That's bad.  This raises a concern: if these paths expose qdev
internals, any attempt to fix this will break migration.

   The path expose internal HW hierarchy. It is designed to do so. Qdev
   designed to do the same: describe HW hierarchy. If qdev fails to do so it
   is broken.
  
  Yes. But since you use qdev to build up the path, a broken
  qdev will give you a broken path.
  
 Qdev bug. Fix it like any other bug. The nice is that when you compare
 device path produced by qdev and real HW you can see when qdev is wrong.
 
   I do not see connection to migration at all since the path is
   not used in migration code.
  
  The connection is that if we pass the list with path 1 which you define
  as broken to BIOS, then migrate to a machine with an updated qemu
  which has a correct path, BIOS won't be able to complete the boot.
 You solve it like you solve all such issue with -M machine type.

So that's unavoidable if we think paths are correct.
But if we know they are wrong, we are better off
correcting them first IMO.

 But the problem exists only if migration happens in a short window
 between start of the boot process and BIOS reading boot order string.
 After reboot new qemu should have new BIOS.

That makes it even more nasty, doesn't it?

  Right? Same in reverse direction.
 Reverse direction is not and never was supported.
 
  A solution could be fuzzy matching
  of paths that will let us recover.
  
 Firmware can try its best of course, but nothing is guaranteed.

No I mean qemu could do matching fuzzily.
This way if we get a path from the old BIOS we can
survive.

  
  The logic for ESP is that ESP (registers at 0x7880, slot offset
  0x88) is handled by the DMA controller (registers at 0x7840,
  slot offset 0x84), they are in a SBus slot #5, and SBus (registers
  at 0x10001000) is in turn handled by IOMMU (registers at 0x1000).
  Lance should be handled the same way.
  
  This hierarchy is partly known by QEMU because DMA accesses use this
  flow, but not otherwise. There is no concept of SBus slots, DMA talks
  to IOMMU directly. Though in this case both ESP, Lance and their DMA
  controllers are on board devices in a MACIO chip. It may be possible
  to add the hierarchy information at each stage.
  
  It should also be possible for BIOS to determine the device just from
  the physical address if we ignored OF compatibility.
 It would be nice to be OF compatible at least at some level. Of course the OF
 spec is not strict enough to have two different implementations produce
 exactly the same device path that can be compared by strcmp.  Can we apply
 the series now? At least for x86 it provides useful paths and work can
 be continued for other arches by interested parties.
 
 --
   Gleb.
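
For readers following the path discussion: a minimal sketch of how a firmware-style path can be assembled by walking a device hierarchy from the root bus down. This is an illustration only, not QEMU's get_fw_dev_path(); struct dev_node, build_fw_path() and the node names used with it are hypothetical.

```c
#include <stdio.h>

/* Hypothetical device node; not QEMU's DeviceState. */
struct dev_node {
    const struct dev_node *parent;  /* NULL for the root */
    const char *name;               /* node name, e.g. "bus" */
    const char *unit_addr;          /* unit address, NULL if none */
};

/*
 * Write the path of 'dev' into buf, root component first.
 * Returns the path length, or -1 if buf is too small.
 */
static int build_fw_path(const struct dev_node *dev, char *buf, size_t size)
{
    int len = 0;
    int n;

    if (dev->parent) {
        len = build_fw_path(dev->parent, buf, size);
        if (len < 0)
            return -1;
    }
    if (dev->unit_addr)
        n = snprintf(buf + len, size - len, "/%s@%s",
                     dev->name, dev->unit_addr);
    else
        n = snprintf(buf + len, size - len, "/%s", dev->name);
    if (n < 0 || (size_t)n >= size - len)
        return -1;
    return len + n;
}
```

Because every component comes from the hierarchy itself, a node that sits in the wrong place in the hierarchy yields a wrong path, which is the point made above about qdev.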
  

Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Avi Kivity

On 11/18/2010 01:10 PM, Michael S. Tsirkin wrote:

  I guess I should create an empty Documentation/kvm/locking.txt and
  force everyone else to update it.

Comments near the relevant fields not better?



Not an either/or.  You can't understand the system from random source 
comments.



diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a055742..d13ced3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
  #include <linux/mm.h>
  #include <linux/preempt.h>
  #include <linux/msi.h>
+#include <linux/rcupdate.h>
  #include <asm/signal.h>

  #include <linux/kvm.h>
@@ -206,6 +207,8 @@ struct kvm {

struct mutex irq_lock;
  #ifdef CONFIG_HAVE_KVM_IRQCHIP
+   /* Update side is protected by irq_lock and,
+* if configured, irqfds.lock. */


/*
 * kernel style comment
 * here and elsewhere
 */




struct kvm_irq_routing_table __rcu *irq_routing;
struct hlist_head mask_notifier_list;
struct hlist_head irq_ack_notifier_list;
@@ -462,6 +465,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic,
   unsigned long *deliver_bitmask);
  #endif
  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
+   int irq_source_id, int level);


No point in the level argument for an msi specific function.



  #else
@@ -614,6 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, int 
gsi, int flags)
  }

  static inline void kvm_irqfd_release(struct kvm *kvm) {}


blank line


+static inline void kvm_irq_routing_update(struct kvm *kvm,
+ struct kvm_irq_routing_table *irq_rt)
+{
+   rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
  static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
  {
return -ENOSYS;


Apart from these minor issues, looks good.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 07:59:10PM +0800, Sheng Yang wrote:
 On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
  
    *entry may be stale after rcu_read_unlock().  Is this a problem?
  
  I suppose not. All MSI-X MMIO accesses would be executed without delay, so no
  reordering issue would happen. If the guest is reading and writing the field
  at the same time (from two cpus), it should have some kind of sync method of
  its own - or it may not care what the read result is (like the one after
  msix_mask_irq()).
 
  I guess so.  Michael/Alex?
 
  This is kvm_get_irq_routing_entry which is used for table reads,
  correct?  Actually, the pci read *is* the sync method that guests use,
  they rely on reads to flush out all previous writes.
 
 Michael, I think the *sync* you are talking about is not the one I
 meant. I was talking about the two-cpu case, where one is reading and the
 other is writing: the order can't be determined if the guest doesn't use a
 lock or some other synchronization method. You're talking about flushing out
 all previous writes of a single CPU...

Yes, but you don't seem to flush out writes on a read, either.

 -- 
 regards,
 Yang, Sheng


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 02:23:20PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 02:16:05PM +0200, Gleb Natapov wrote:
  On Thu, Nov 18, 2010 at 01:52:30PM +0200, Michael S. Tsirkin wrote:
   On Thu, Nov 18, 2010 at 01:45:04PM +0200, Gleb Natapov wrote:
On Thu, Nov 18, 2010 at 01:38:31PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 12:18:27PM +0200, Gleb Natapov wrote:
  On Wed, Nov 17, 2010 at 09:54:27PM +, Blue Swirl wrote:
   2010/11/16 Gleb Natapov g...@redhat.com:
On Tue, Nov 16, 2010 at 06:30:19PM +, Blue Swirl wrote:
 Perhaps the FW path should use device class names if no 
 name is specified.
 What do you mean by device class name. We can do something 
 like this:
 if (dev-child_bus.lh_first)
        return dev-child_bus.lh_first-info-name;

 i.e if there is child bus use its bus name as fw name. This 
 will make
 all pci devices to have pci as fw name automatically. The 
 problem is
 that theoretically same device can provide different buses.
   
I meant PCI class name, like display for display controllers,
network for NICs etc.
   
That is what my pci bus related patch is doing already.
   
 I'll try Sparc32 to see how this fits there.
   
Except bootindex is not implemented for SCSI.
Will look into adding it.
   
   Thanks. The bootindex on Sparc32 looks like this:
   bootindex /e...@7880/d...@1,0
   /ether...@/ethernet-...@0
   
  For arches other than x86 there is a lot of work left to be done :)
  For starters, exotic sparc buses should get their own get_fw_dev_path()
  implementation.
  
   I don't think I got Lance setup right.
   
   OF paths for the devices would be:
   /io...@0,1000/s...@0,10001000/esp...@5,840/e...@5,880/s...@1,0
   /io...@0,1000/s...@0,10001000/le...@5,8400010/l...@5,8c0
  If the qdev hierarchy does not correspond to real HW there is not much we can
  do except fix qdev.
 
 That's bad.  This raises a concern: if these paths expose qdev
 internals, any attempt to fix this will break migration.
 
The path expose internal HW hierarchy. It is designed to do so. Qdev
designed to do the same: describe HW hierarchy. If qdev fails to do so 
it
is broken.
   
   Yes. But since you use qdev to build up the path, a broken
   qdev will give you a broken path.
   
  Qdev bug. Fix it like any other bug. The nice thing is that when you compare
  the device path produced by qdev with real HW you can see when qdev is wrong.
  
I do not see connection to migration at all since the path is
not used in migration code.
   
   The connection is that if we pass the list with path 1 which you define
   as broken to BIOS, then migrate to a machine with an updated qemu
   which has a correct path, BIOS won't be able to complete the boot.
  You solve it like you solve all such issues, with -M machine type.
 
 So that's unavoidable if we think paths are correct.
 But if we know they are wrong, we are better off
 correcting them first IMO.
 
They are correct for x86. My patch set does not even try to cover all
HW. If sparc wants to use them too, it had better be fixed. Or if there is
enough info in the path to determine the device, it may choose to use it as is.

  But the problem exists only if migration happens in a short window
  between start of the boot process and BIOS reading boot order string.
  After reboot new qemu should have new BIOS.
 
 That makes it even more nasty, doesn't it?
No.

 
   Right? Same in reverse direction.
  Reverse direction is not and never was supported.
  
   A solution could be fuzzy matching
   of paths that will let us recover.
   
  Firmware can try its best of course, but nothing is guaranteed.
 
 No I mean qemu could do matching fuzzily.
 This way if we get a path from the old BIOS we can
 survive.
Qemu does not take paths from BIOS so I don't know what you are talking
about here.

 
   
   The logic for ESP is that ESP (registers at 0x7880, slot 
   offset
   0x88) is handled by the DMA controller (registers at 
   0x7840,
   slot offset 0x84), they are in a SBus slot #5, and SBus 
   (registers
   at 0x10001000) is in turn handled by IOMMU (registers at 
   0x1000).
   Lance should be handled the same way.
   
   This hierarchy is partly known by QEMU because DMA accesses use 
   this
   flow, but not otherwise. There is no concept of SBus slots, DMA 
   talks
   to IOMMU directly. Though in this case both ESP, Lance and their 
   DMA
   controllers are on board devices in a MACIO chip. It may be 
   possible
   to add the hierarchy information at each stage.
   
   It should also be possible 

Re: [PATCH 4/6] KVM: Add kvm_get_irq_routing_entry() func

2010-11-18 Thread Sheng Yang
On Thu, Nov 18, 2010 at 8:33 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Nov 18, 2010 at 07:59:10PM +0800, Sheng Yang wrote:
 On Thu, Nov 18, 2010 at 5:41 PM, Michael S. Tsirkin m...@redhat.com wrote:
  On Thu, Nov 18, 2010 at 11:30:47AM +0200, Avi Kivity wrote:
  
    *entry may be stale after rcu_read_unlock().  Is this a problem?
  
  I suppose not. All MSI-X MMIO accesses would be executed without delay, so no
  reordering issue would happen. If the guest is reading and writing the field
  at the same time (from two cpus), it should have some kind of sync method of
  its own - or it may not care what the read result is (like the one after
  msix_mask_irq()).
 
  I guess so.  Michael/Alex?
 
  This is kvm_get_irq_routing_entry which is used for table reads,
  correct?  Actually, the pci read *is* the sync method that guests use,
  they rely on reads to flush out all previous writes.

 Michael, I think the *sync* you are talking about is not the one I
 meant. I was talking about the two-cpu case, where one is reading and the
 other is writing: the order can't be determined if the guest doesn't use a
 lock or some other synchronization method. You're talking about flushing out
 all previous writes of a single CPU...

 Yes, but you don't seem to flush out writes on a read, either.

... I don't understand... We are emulating the write operation in
software and making it take effect immediately... What are we supposed
to do with this flush?

-- 
regards,
Yang, Sheng


Re: [RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot

2010-11-18 Thread Avi Kivity

On 11/18/2010 07:14 AM, Takuya Yoshikawa wrote:

This patch introduces the counter to hold the number of dirty bits in each
memslot. We will use this to optimize dirty logging later.


@@ -3217,11 +3216,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,

n = kvm_dirty_bitmap_bytes(memslot);

-   for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-   is_dirty = memslot->dirty_bitmap[i];
-


This can already be an improvement.

@@ -152,6 +152,7 @@ struct kvm_memory_slot {
unsigned long *rmap;
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_head;
+   unsigned long num_dirty_bits;


The bits themselves are not dirty; only the pages are dirty.

(+ we usually use 'nr' for 'number')



--
error compiling committee.c: too many arguments to function
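
To make the idea under review concrete: a minimal userspace sketch of keeping a running count next to a dirty bitmap, so that "is anything dirty?" no longer needs a scan over the bitmap words. This is not the patch itself; struct slot, slot_mark_dirty() and slot_is_dirty() are hypothetical names, and the counter is called nr_dirty_pages following the naming remarks above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for a memslot; not the kernel struct. */
struct slot {
    unsigned long *dirty_bitmap;   /* one bit per page */
    unsigned long npages;
    unsigned long nr_dirty_pages;  /* pages currently marked dirty */
};

/* Mark a page dirty, counting it only on the 0 -> 1 transition. */
static void slot_mark_dirty(struct slot *s, unsigned long page)
{
    unsigned long *word = &s->dirty_bitmap[page / BITS_PER_LONG];
    unsigned long bit = 1UL << (page % BITS_PER_LONG);

    if (!(*word & bit)) {
        *word |= bit;
        s->nr_dirty_pages++;
    }
}

/* O(1) replacement for scanning the bitmap for a nonzero word. */
static bool slot_is_dirty(const struct slot *s)
{
    return s->nr_dirty_pages != 0;
}
```

The counter has to be reset whenever the bitmap is swapped out or cleared; the sketch leaves that out.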



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 02:29:11PM +0200, Avi Kivity wrote:
 On 11/18/2010 01:10 PM, Michael S. Tsirkin wrote:
   I guess I should create an empty Documentation/kvm/locking.txt and
   force everyone else to update it.
 
 Comments near the relevant fields not better?
 
 
 Not an either/or.  You can't understand the system from random
 source comments.
 
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index a055742..d13ced3 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -16,6 +16,7 @@
   #include <linux/mm.h>
   #include <linux/preempt.h>
   #include <linux/msi.h>
 +#include <linux/rcupdate.h>
   #include <asm/signal.h>
 
   #include <linux/kvm.h>
 @@ -206,6 +207,8 @@ struct kvm {
 
  struct mutex irq_lock;
   #ifdef CONFIG_HAVE_KVM_IRQCHIP
 +/* Update side is protected by irq_lock and,
 + * if configured, irqfds.lock. */
 
 /*
  * kernel style comment
  * here and elsewhere
  */
 
 
 
  struct kvm_irq_routing_table __rcu *irq_routing;
  struct hlist_head mask_notifier_list;
  struct hlist_head irq_ack_notifier_list;
 @@ -462,6 +465,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
 *ioapic,
 unsigned long *deliver_bitmask);
   #endif
   int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
 +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
 *kvm,
 +int irq_source_id, int level);
 
 No point in the level argument for an msi specific function.

This is an existing function I made non-static.
We have per-gsi callbacks so level is required there to match.
I could add a wrapper I guess:

int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
		int irq_source_id, int level)
{
	if (!level)
		return -1;
	return kvm_send_msi(irq_entry, kvm, irq_source_id);
}

This results in less code for irqfd but more code for ioctl injection
... is it worth it?

 
   #else
 @@ -614,6 +620,12 @@ static inline int kvm_irqfd(struct kvm *kvm, int fd, 
 int gsi, int flags)
   }
 
   static inline void kvm_irqfd_release(struct kvm *kvm) {}
 
 blank line
 

There's no line before kvm_eventfd_init either ...
I added one.

 +static inline void kvm_irq_routing_update(struct kvm *kvm,
 +  struct kvm_irq_routing_table *irq_rt)
 +{
 +rcu_assign_pointer(kvm->irq_routing, irq_rt);
 +}
 +
   static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
  *args)
   {
  return -ENOSYS;
 
 Apart from these minor issues, looks good.


Something we should consider improving is the loop over all VCPUs that
kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
interrupts) it should be possible to precompute and store the CPU
in question as part of the routing entry.

Something for a separate patch ... comments?

 -- 
 error compiling committee.c: too many arguments to function


Re: [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap

2010-11-18 Thread Avi Kivity

On 11/18/2010 07:15 AM, Takuya Yoshikawa wrote:

Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using
rmap: kvm: rework remove-write-access for a slot
   http://www.spinics.net/lists/kvm/msg35871.html

One problem pointed out there was that this approach might hurt cache locality
and make things slow down.

But if we restrict the story to dirty logging, we notice that only a small
portion of pages actually need to be write protected.

For example, I have confirmed that even when we are playing with tools like
x11perf, dirty ratio of the frame buffer bitmap is almost always less than 10%.

In the case of live-migration, we will see more sparseness in the usual
workload because the RAM size is really big.

So this patch uses his approach with a small modification: it uses the
switched out dirty bitmap as a hint to restrict the rmap traversal.

We can also use this to selectively write protect pages to reduce unwanted page
faults in the future.



Looks like a good approach.  Any measurements?



+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+   while (spte) {
+   /* avoid RMW */
+   if (is_writable_pte(*spte))
+   *spte &= ~PT_WRITABLE_MASK;


This is racy, *spte can be modified concurrently by hardware.

update_spte() can be used for this.


+   spte = rmap_next(kvm, rmapp, spte);
+   }
+}
+
+/*
+ * Write protect the pages set dirty in a given bitmap.
+ */
+void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
+  struct kvm_memory_slot *memslot,
+  unsigned long *dirty_bitmap)
+{
+   int i;
+   unsigned long gfn_offset;
+
+   for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
+   rmapp_remove_write_access(kvm, &memslot->rmap[gfn_offset]);
+
+   for (i = 0; i < KVM_NR_PAGE_SIZES - 1; i++) {
+   unsigned long gfn = memslot->base_gfn + gfn_offset;
+   unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
+   int idx = gfn / huge - memslot->base_gfn / huge;


Better to use a shift than a division here.


+
+   if (!(gfn_offset || (gfn % huge)))
+   break;


Why?


+   rmapp_remove_write_access(kvm,
+   &memslot->lpage_info[i][idx].rmap_pde);
+   }
+   }
+   kvm_flush_remote_tlbs(kvm);
+}
+
  void kvm_mmu_zap_all(struct kvm *kvm)
  {
struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 038d719..3556b4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
  }

  /*
+ * Check the dirty bit ratio of a given memslot.
+ *   0: clean
+ *   1: sparse
+ *   2: dense
+ */
+static int dirty_bitmap_density(struct kvm_memory_slot *memslot)
+{
+   if (!memslot->num_dirty_bits)
+   return 0;
+   if (memslot->num_dirty_bits < memslot->npages / 128)
+   return 1;
+   return 2;
+}


Use an enum please.


+
+/*
   * Get (and clear) the dirty memory log for a memory slot.
   */
  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  struct kvm_dirty_log *log)
  {
-   int r;
+   int r, density;
struct kvm_memory_slot *memslot;
unsigned long n;

@@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
n = kvm_dirty_bitmap_bytes(memslot);

/* If nothing is dirty, don't bother messing with page tables. */
-   if (memslot->num_dirty_bits) {
+   density = dirty_bitmap_density(memslot);
+   if (density) {
struct kvm_memslots *slots, *old_slots;
unsigned long *dirty_bitmap;

@@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
kfree(old_slots);

spin_lock(&kvm->mmu_lock);
-   kvm_mmu_slot_remove_write_access(kvm, log->slot);
+   if (density == 2)
+   kvm_mmu_slot_remove_write_access(kvm, log->slot);
+   else
+   kvm_mmu_slot_remove_write_access_mask(kvm,
+   &slots->memslots[log->slot],
+   dirty_bitmap);
spin_unlock(&kvm->mmu_lock);


wrt. O(1) write protection: hard to tell if the two methods can 
coexist.  For direct mapped shadow pages (i.e. ept/npt) I think we can 
use the mask to speed up clearing of an individual sp's sptes.


--
error compiling committee.c: too many arguments to function
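
Two of the review remarks above ("use an enum", "use a shift rather than a division") can be sketched together. This is only an illustration under assumptions: the enum and helper names are hypothetical, the 1/128 threshold is copied from the quoted patch, and 'shift' is assumed to be log2 of the number of small pages per large page.

```c
/* Hypothetical names; the threshold mirrors the quoted patch. */
enum dirty_density {
    DIRTY_CLEAN,
    DIRTY_SPARSE,
    DIRTY_DENSE,
};

static enum dirty_density classify_density(unsigned long nr_dirty,
                                           unsigned long npages)
{
    if (!nr_dirty)
        return DIRTY_CLEAN;
    if (nr_dirty < npages / 128)
        return DIRTY_SPARSE;
    return DIRTY_DENSE;
}

/*
 * Index of the large page containing gfn, relative to the slot base.
 * The two shifts replace the two divisions by the large page size.
 */
static unsigned long hpage_index(unsigned long gfn, unsigned long base_gfn,
                                 unsigned int shift)
{
    return (gfn >> shift) - (base_gfn >> shift);
}
```

With named values the caller can test for DIRTY_DENSE instead of comparing against a bare 2.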



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Avi Kivity

On 11/18/2010 03:03 PM, Michael S. Tsirkin wrote:

 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
  +int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
  + int irq_source_id, int level);

  No point in the level argument for an msi specific function.

This is an existing function I made non-static.
We have per-gsi callbacks so level is required there to match.


Right.


I could add a wrapper I guess:

int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm,
		int irq_source_id, int level)
{
	if (!level)
		return -1;
	return kvm_send_msi(irq_entry, kvm, irq_source_id);
}

This results in less code for irqfd but more code for ioctl injection
... is it worth it?


IMO not.



  Apart from these minor issues, looks good.


Something we should consider improving is the loop over all VCPUs that
kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
interrupts) it should be possible to precompute and store the CPU
in question as part of the routing entry.



Something for a separate patch ... comments?


Yes.  Either precompute, or compute on first use and cache.  Precompute 
is more realtime-friendly so I prefer it.


--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot

2010-11-18 Thread Jan Kiszka
Am 18.11.2010 13:54, Avi Kivity wrote:
 On 11/18/2010 07:14 AM, Takuya Yoshikawa wrote:
 This patch introduces the counter to hold the number of dirty bits in
 each
 memslot. We will use this to optimize dirty logging later.


 @@ -3217,11 +3216,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,

   n = kvm_dirty_bitmap_bytes(memslot);

 -for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 -is_dirty = memslot->dirty_bitmap[i];
 -
 
 This can already be an improvement.

/Me wonders if it wouldn't make sense to expand this optimization to the
user space interface as well, i.e. signaling there are no dirty pages
via some flag instead of writing zeros in a bitmap. Of course, this
means supporting both interfaces for a longer period.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 02:37:08PM +0200, Gleb Natapov wrote:
  So that's unavoidable if we think paths are correct.
  But if we know they are wrong, we are better off
  correcting them first IMO.
  
 They are correct for x86. My patch set does not even try to cover all
 HW. If sparc wants to use them too, it had better be fixed. Or if there is
 enough info in the path to determine the device, it may choose to use it as is.

Fair enough I guess.

   But the problem exists only if migration happens in a short window
   between start of the boot process and BIOS reading boot order string.
   After reboot new qemu should have new BIOS.
  
  That makes it even more nasty, doesn't it?
 No.

Nasty as in hard to reproduce.

-- 
MST


Re: [PATCH v4] virtio-9p: fix build on !CONFIG_UTIMENSAT

2010-11-18 Thread Philipp Hahn
Hello,

Am Donnerstag 18 November 2010 10:05:59 schrieb Hidetoshi Seto:
  You have to be careful about compile-time-detection and
  runtime-detection:
...
 -#ifdef CONFIG_UTIMENSAT
 -return utimensat(dirfd, path, times, flags);
 -#else
 +{
 +int ret = utimensat(dirfd, path, times, flags);
 +if (ret != -1 || errno != ENOSYS) {
 +return ret;
 +}
 +}

You might still want to do the compile-time-check, something like:
#ifdef CONFIG_UTIMENSAT
{
int ret = utimensat(dirfd, path, times, flags);
if (ret != -1 || errno != ENOSYS) {
return ret;
}
}
#endif
// fallback

Sincerely
Philipp Hahn
-- 
Philipp Hahn   Open Source Software Engineer  h...@univention.de   
Univention GmbHLinux for Your Businessfon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen   fax: +49 421 22 232-99
http://www.univention.de




Re: [RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot

2010-11-18 Thread Avi Kivity

On 11/18/2010 03:10 PM, Jan Kiszka wrote:

Am 18.11.2010 13:54, Avi Kivity wrote:
  On 11/18/2010 07:14 AM, Takuya Yoshikawa wrote:
  This patch introduces the counter to hold the number of dirty bits in
  each
  memslot. We will use this to optimize dirty logging later.


  @@ -3217,11 +3216,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,

n = kvm_dirty_bitmap_bytes(memslot);

  -for (i = 0; !is_dirty && i < n/sizeof(long); i++)
  -is_dirty = memslot->dirty_bitmap[i];
  -

  This can already be an improvement.

/Me wonders if it wouldn't make sense to expand this optimization to the
user space interface as well, i.e. signaling there are no dirty pages
via some flag instead of writing zeros in a bitmap. Of course, this
means supporting both interfaces for a longer period.


An 8MB framebuffer is 2K bits, or 256 bytes wide.  Comparing 256 cache 
hot bytes against zero is not worth a new interface.


Larger memory slots are very unlikely to be always unmodified.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
  +static inline void kvm_irq_routing_update(struct kvm *kvm,
  +struct kvm_irq_routing_table *irq_rt)
  +{
  +  rcu_assign_pointer(kvm->irq_routing, irq_rt);
  +}
  +
static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
   *args)
{
 return -ENOSYS;
  
  Apart from these minor issues, looks good.
 
 
 Something we should consider improving is the loop over all VCPUs that
 kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
 interrupts) it should be possible to precompute and store the CPU
 in question as part of the routing entry.
 
 Something for a separate patch ... comments?
 
I do not think this info should be part of routing entry. Routing entry
is more about describing wires on the board. Other than that
this is a good idea that, IIRC, we already discussed once.

--
Gleb.


Re: [PATCHv4 15/15] Pass boot device list to firmware.

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 03:12:02PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 02:37:08PM +0200, Gleb Natapov wrote:
   So that's unavoidable if we think paths are correct.
   But if we know they are wrong, we are better off
   correcting them first IMO.
   
  They are correct for x86. My patch set does not even try to cover all
  HW. If sparc wants to use them too, it had better be fixed. Or if there is
  enough info in the path to determine the device, it may choose to use it as is.
 
 Fair enough I guess.
 
But the problem exists only if migration happens in a short window
between start of the boot process and BIOS reading boot order string.
After reboot new qemu should have new BIOS.
   
   That makes it even more nasty, doesn't it?
  No.
 
 Nasty as in hard to reproduce.
 
It is very easy to reproduce if you know what you are looking for :).
Just stick sleep() in correct place in the BIOS.
 
--
Gleb.


Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Avi Kivity

On 11/18/2010 03:14 PM, Gleb Natapov wrote:

On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+ struct kvm_irq_routing_table 
*irq_rt)
+{
+   rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
   static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
*args)
   {
return -ENOSYS;
  
Apart from these minor issues, looks good.


  Something we should consider improving is the loop over all VCPUs that
  kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
  interrupts) it should be possible to precompute and store the CPU
  in question as part of the routing entry.

  Something for a separate patch ... comments?

I do not think this info should be part of routing entry. Routing entry
is more about describing wires on the board. Other than that
this is a good idea that, IIRC, we already discussed once.



Not as part of the routing entry exposed to userspace.  But as a private 
kernel field, why not?


--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote:
 On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
   +static inline void kvm_irq_routing_update(struct kvm *kvm,
   +  struct kvm_irq_routing_table 
   *irq_rt)
   +{
   +rcu_assign_pointer(kvm->irq_routing, irq_rt);
   +}
   +
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
*args)
 {
return -ENOSYS;
   
   Apart from these minor issues, looks good.
  
  
  Something we should consider improving is the loop over all VCPUs that
  kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
  interrupts) it should be possible to precompute and store the CPU
  in question as part of the routing entry.
  
  Something for a separate patch ... comments?
  
 I do not think this info should be part of routing entry. Routing entry
 is more about describing wires on the board.

Not for msi. kvm_kernel_irq_routing_entry seems to just keep an
address/data pair in that case. So

	union {
		struct {
			unsigned irqchip;
			unsigned pin;
		} irqchip;
		struct msi_msg msi;
	};

would become

	union {
		struct {
			unsigned irqchip;
			unsigned pin;
		} irqchip;
		struct {
			struct msi_msg msi;
			struct kvm_vcpu *dest;
		} msi;
	};

or something like this.

 Other than that
 this is a good idea that, IIRC, we already discussed once.
 
 --
   Gleb.


Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote:
  On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+struct kvm_irq_routing_table 
*irq_rt)
+{
+  rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
  static inline int kvm_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
  {
   return -ENOSYS;

Apart from these minor issues, looks good.
   
   
   Something we should consider improving is the loop over all VCPUs that
   kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
   interrupts) it should be possible to precompute and store the CPU
   in question as part of the routing entry.
   
   Something for a separate patch ... comments?
   
  I do not think this info should be part of routing entry. Routing entry
  is more about describing wires on the board.
 
 Not for msi. kvm_kernel_irq_routing_entry seems to just keep an
 address/data pair in that case. So
 
Yeah. Using routing_entry for MSI was a misdesign. We discussed that too :)

   union {
   struct {
   unsigned irqchip;
   unsigned pin;
   } irqchip;
   struct msi_msg msi;
   };
 
 would become
 
   union {
   struct {
   unsigned irqchip;
   unsigned pin;
   } irqchip;
   struct {
   struct msi_msg msi;
   struct kvm_vcpu *dest;
   } msi;
   };
 
 or something like this.
Ah, so you want to do it only for MSI? For MSI it makes sense. Remember,
though, that sometimes the destination depends on the message itself
(specifically on the delivery mode).

 
  Other than that
  this is a good idea that, IIRC, we already discussed once.
  
  --
  Gleb.

--
Gleb.


Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Avi Kivity

On 11/18/2010 03:35 PM, Gleb Natapov wrote:


  or something like this.
Ah, so you want to do it only for MSI? For MSI it makes sense. Remember,
though, that sometimes the destination depends on the message itself
(specifically on the delivery mode).


Yes, broadcast or multicast or lowest priority wouldn't get this treatment.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()

2010-11-18 Thread Andi Kleen

On 11/18/2010 1:17 PM, Avi Kivity wrote:

cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label).  This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.


That won't work on gcc versions that don't have __noclone yet. noclone is
fairly recent (4.5 or 4.4).
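
One way to cope with that, sketched here under assumptions rather than as
what the tree does: define the attribute through a wrapper macro that
expands to nothing on compilers that lack it. The macro name
noclone_compat, the demo function run_once() and the 4.5 cutoff are all
assumptions for illustration; the cutoff in particular should be checked
against the gcc release notes.

```c
/*
 * Hypothetical compatibility wrapper. Assumes noclone arrived in gcc 4.5;
 * on anything older (or on a non-gcc compiler) the macro expands to
 * nothing, so the build still succeeds but cloning is not prevented.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define noclone_compat __attribute__((__noclone__))
#else
#define noclone_compat
#endif

/* Illustrative function only; it stands in for the real entry point. */
static noclone_compat int run_once(int x)
{
        return x + 1;
}
```

Whether an empty fallback is acceptable depends on whether those older
compilers can duplicate the function body at all.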

-Andi



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 03:35:01PM +0200, Gleb Natapov wrote:
 On Thu, Nov 18, 2010 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
  On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote:
   On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
     +static inline void kvm_irq_routing_update(struct kvm *kvm,
     +                                          struct kvm_irq_routing_table *irq_rt)
     +{
     +        rcu_assign_pointer(kvm->irq_routing, irq_rt);
     +}
     +
      static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
      {
              return -ENOSYS;
 
 Apart from these minor issues, looks good.


Something we should consider improving is the loop over all VCPUs that
kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
interrupts) it should be possible to precompute and store the CPU
in question as part of the routing entry.

Something for a separate patch ... comments?

   I do not think this info should be part of routing entry. Routing entry
   is more about describing wires on the board.
  
  Not for msi. kvm_kernel_irq_routing_entry seems to just keep an
  address/data pair in that case. So
  
 Yeah. Using routing_entry for MSI was a misdesign. We discussed that too :)
 
  union {
  struct {
  unsigned irqchip;
  unsigned pin;
  } irqchip;
  struct msi_msg msi;
  };
  
  would become
  
  union {
  struct {
  unsigned irqchip;
  unsigned pin;
  } irqchip;
  struct {
  struct msi_msg msi;
  struct kvm_vcpu *dest;
  } msi;
  };
  
  or something like this.
 Ah, so you want to do it only for MSI? For MSI it makes sense. Remember,
 though, that sometimes the destination depends on the message itself
 (specifically on the delivery mode).

Of course. We'll take the address/data pair and precompute the destination,
setting it to NULL for e.g. broadcast and recomputing at injection time
in that case.  BTW, SELF doesn't work for MSI at the moment; I'm not sure
whether it's relevant or when it is used.
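
A rough sketch of that precomputation, assuming the standard x86 MSI
layout (destination ID in address bits 19:12, destination mode in address
bit 2, delivery mode in data bits 10:8). The function name and the -1
convention are made up for illustration; real code would store a vcpu
pointer or NULL instead. Logical-mode messages are conservatively left to
injection time here, although one that names a single CPU could be cached
as well.

```c
#include <stdint.h>

#define MSI_ADDR_DEST_ID(a)     (((a) >> 12) & 0xffu)  /* address bits 19:12 */
#define MSI_ADDR_DEST_LOGICAL   (1u << 2)              /* address bit 2 */
#define MSI_DATA_DELIVERY(d)    (((d) >> 8) & 0x7u)    /* data bits 10:8 */
#define MSI_DELIVERY_FIXED      0u
#define APIC_ID_BROADCAST       0xffu

/*
 * Hypothetical helper: return the APIC ID of the single destination, or
 * -1 when the target has to be resolved at injection time (broadcast,
 * logical mode, lowest priority and other non-fixed delivery modes).
 */
static int msi_precompute_dest(uint32_t addr_lo, uint32_t data)
{
        unsigned id = MSI_ADDR_DEST_ID(addr_lo);

        if (MSI_DATA_DELIVERY(data) != MSI_DELIVERY_FIXED)
                return -1;
        if (addr_lo & MSI_ADDR_DEST_LOGICAL)
                return -1;
        if (id == APIC_ID_BROADCAST)
                return -1;
        return (int)id;
}
```

The cached value would have to be dropped whenever the routing table or
the set of online VCPUs changes.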

  
   Other than that
   this is a good idea that, IIRC, we already discussed once.
   
   --
 Gleb.
 
 --
   Gleb.


Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 03:39:15PM +0200, Avi Kivity wrote:
 On 11/18/2010 03:35 PM, Gleb Natapov wrote:
 
   or something like this.
 Ah, so you want to do it only for MSI? For MSI it makes sense. Remember,
 though, that sometimes the destination depends on the message itself
 (specifically on the delivery mode).
 
 Yes, broadcast or multicast or lowest priority wouldn't get this treatment.

Unless there's a single online VCPU :)

 -- 
 error compiling committee.c: too many arguments to function


Re: HAL type for Win2003 Server on recent KVM versions?

2010-11-18 Thread Kenni Lund
2010/11/18 Avi Kivity a...@redhat.com:
 On 11/18/2010 12:58 AM, Kenni Lund wrote:

 Hi

 I'm about to move a couple of virtual machines from a Fedora 11 system
 to a new server with a more recent operating system and newer version
 of KVM, etc.

 One of the guests is a Windows Server 2003 Standard SP2, which is
 currently running with the ACPI Multiprocessor PC HAL.

 Considering moving to RHEL, I've been reading the virtualization
 documentation for RHEL 6.0, which says that I need to set HAL to
 Standard PC when installing a new Win2003 guest.

 Since my current guest has been running perfectly fine for a long time
 with its current HAL, I was wondering if the system will become
 unstable, unbootable or what the disadvantage will be, if I move the
 guest to for example RHEL 6.0, without reinstalling or upgrading the
 guest to select another HAL mode?

 On the other hand, it seems like I can upgrade from the current
 ACPI Multiprocessor PC into Standard PC, but I'm not sure if I'll
 gain anything by trying this.


 I suggest using the default HAL, whatever it is.  That's what everyone else
 is using so you get the best tested configuration.

Thanks Avi, ACPI Multiprocessor PC was/is the default HAL, I didn't
change anything when the system was originally installed.

I'm curious why the RHEL 6 documentation claims that you actively need
to select the Standard PC HAL on installation, if it's not even the
recommended/preferred HAL...(?):
Windows 2003 requires a specific computer type in order to install
properly on a fully-virtualized guest. This needs to be specified at
the beginning of the installation process.[1]

[1] 
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization_Windows2003.html

Best regards
Kenni


Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()

2010-11-18 Thread Avi Kivity

On 11/18/2010 03:48 PM, Andi Kleen wrote:

On 11/18/2010 1:17 PM, Avi Kivity wrote:
cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label).  This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.


That won't work on gcc versions that didn't have __noclone yet. Noclone is
fairly recent (4.5 or 4.4)


Are the gcc versions that don't have noclone susceptible to cloning?

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Gleb Natapov
On Thu, Nov 18, 2010 at 03:48:43PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 18, 2010 at 03:35:01PM +0200, Gleb Natapov wrote:
  On Thu, Nov 18, 2010 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
   On Thu, Nov 18, 2010 at 03:14:53PM +0200, Gleb Natapov wrote:
On Thu, Nov 18, 2010 at 03:03:37PM +0200, Michael S. Tsirkin wrote:
      +static inline void kvm_irq_routing_update(struct kvm *kvm,
      +                                          struct kvm_irq_routing_table *irq_rt)
      +{
      +        rcu_assign_pointer(kvm->irq_routing, irq_rt);
      +}
      +
       static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
       {
               return -ENOSYS;
  
  Apart from these minor issues, looks good.
 
 
 Something we should consider improving is the loop over all VCPUs that
 kvm_irq_delivery_to_apic invokes.  I think that (for non-broadcast
 interrupts) it should be possible to precompute and store the CPU
 in question as part of the routing entry.
 
 Something for a separate patch ... comments?
 
I do not think this info should be part of routing entry. Routing entry
is more about describing wires on the board.
   
   Not for msi. kvm_kernel_irq_routing_entry seems to just keep an
   address/data pair in that case. So
   
  Yeah. Using routing_entry for MSI was a misdesign. We discussed that too :)
  
 union {
 struct {
 unsigned irqchip;
 unsigned pin;
 } irqchip;
 struct msi_msg msi;
 };
   
   would become
   
 union {
 struct {
 unsigned irqchip;
 unsigned pin;
 } irqchip;
 struct {
 struct msi_msg msi;
 struct kvm_vcpu *dest;
 } msi;
 };
   
   or something like this.
  Ah, so you want to do it only for MSI? For MSI it makes sense. Remember,
  though, that sometimes the destination depends on the message itself
  (specifically on the delivery mode).
 
 Of course. We'll take message/data and precompute destination.
 Set to NULL for e.g. broadcast and recompute at injection time
 in that case.  BTW SELF doesn't work for MSI at the moment, not sure
 whether it's relevant or when is it used.
 
Yes, only lowest prio is defined for MSI. Self or all-but-self has
no meaning for MSI.

--
Gleb.


Re: HAL type for Win2003 Server on recent KVM versions?

2010-11-18 Thread Cole Robinson
On 11/18/2010 09:05 AM, Kenni Lund wrote:
 2010/11/18 Avi Kivity a...@redhat.com:
 On 11/18/2010 12:58 AM, Kenni Lund wrote:

 Hi

 I'm about to move a couple of virtual machines from a Fedora 11 system
 to a new server with a more recent operating system and newer version
 of KVM, etc.

 One of the guests is a Windows Server 2003 Standard SP2, which is
 currently running with the ACPI Multiprocessor PC HAL.

 Considering moving to RHEL, I've been reading the virtualization
 documentation for RHEL 6.0, which says that I need to set HAL to
 Standard PC when installing a new Win2003 guest.

 Since my current guest has been running perfectly fine for a long time
 with its current HAL, I was wondering if the system will become
 unstable, unbootable or what the disadvantage will be, if I move the
 guest to for example RHEL 6.0, without reinstalling or upgrading the
 guest to select another HAL mode?

 On the other hand, it seems like I can upgrade from the current
 ACPI Multiprocessor PC into Standard PC, but I'm not sure if I'll
 gain anything by trying this.


 I suggest using the default HAL, whatever it is.  That's what everyone else
 is using so you get the best tested configuration.
 
 Thanks Avi, ACPI Multiprocessor PC was/is the default HAL, I didn't
 change anything when the system was originally installed.
 
 I'm curious why the RHEL 6 documentation claims that you actively need
 to select the Standard PC HAL on installation, if it's not even the
 recommended/preferred HAL...(?):
 Windows 2003 requires a specific computer type in order to install
 properly on a fully-virtualized guest. This needs to be specified at
 the beginning of the installation process.[1]
 
 [1] 
 http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization_Windows2003.html
 

I'm pretty sure that was incorrectly copied over from the RHEL5 xen
documentation. The docs people have been informed so it should be fixed
soon-ish.

- Cole



Re: [PATCH] KVM test: unattended installation cleanup

2010-11-18 Thread Lucas Meneghel Rodrigues
On Thu, 2010-11-18 at 12:44 -0200, Lucas Meneghel Rodrigues wrote:
 From: Jason Wang jasow...@redhat.com
 
 This patch does the following things:
 
 - Drop the built-in tftp/dhcp based unattended installation for the
 following reasons:
 
   1 It's based on slirp and was not supported by major
   distributions. It's only used for linux guest installation and we
   can simply replace it with the -kernel method used by network
   installation.
   2 The configuration was complex and hard to share with network
   based installation. After switching to the -kernel method, most of
   the configuration can be shared and is easy to set up.
 
   In order to achieve this:
 
   1 a new option 'boot_path' is used to specify the path of the
   kernel/initrd from the medium.
   2 autoyast file is detected through the extra_params instead of
   kernel_args (which is dropped with tftp option).
 
 - Restructure the unattended installation related configurations and
   make them easy to use and configure.
 
 - Move cdrom related params into unattended_install.cdrom variants,
   as there's no need to launch with cdrom when testing a guest
   installed from network.
 
 Changes from v1:
 - Make possible to execute parallel guest installation (the 1st version
 of the patch was using a unique path for initrd.img and vmlinuz inside
 the host filesystem).
 - Reorganize some of the KVM autotest defaults
 
 Tested with RHEL/Fedora/Windows installation. OpenSUSE/SLES is untested.

Oh, by the way, I've tested the patch with OpenSUSE 11.3 and it works
like a charm. I am really really happy to get rid of slirp code
dependency, thank you very much Jason!

Lucas



[GIT PULL] KVM updates for 2.6.37-rc2

2010-11-18 Thread Marcelo Tosatti

Linus, please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git kvm-updates/2.6.37

To receive the following updates:

Avi Kivity (2):
  KVM: Correct ordering of ldt reload wrt fs/gs reload
  KVM: VMX: Fix host userspace gsbase corruption

 arch/x86/kvm/svm.c |2 +-
 arch/x86/kvm/vmx.c |   19 +--
 2 files changed, 10 insertions(+), 11 deletions(-)



Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()

2010-11-18 Thread Andi Kleen

On 11/18/2010 3:32 PM, Avi Kivity wrote:

On 11/18/2010 03:48 PM, Andi Kleen wrote:

On 11/18/2010 1:17 PM, Avi Kivity wrote:
cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label).  This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.


That won't work on gcc versions that didn't have __noclone yet. Noclone is
fairly recent (4.5 or 4.4)


Are the gcc versions that don't have noclone susceptible to cloning?


I believe the problem can happen due to inlining already

-Andi




Re: [PATCH] KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()

2010-11-18 Thread Avi Kivity

On 11/18/2010 05:00 PM, Andi Kleen wrote:

On 11/18/2010 3:32 PM, Avi Kivity wrote:

On 11/18/2010 03:48 PM, Andi Kleen wrote:

On 11/18/2010 1:17 PM, Avi Kivity wrote:
cea15c2 (KVM: Move KVM context switch into own function) split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label).  This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.


That won't work on gcc versions that didn't have __noclone yet. Noclone is
fairly recent (4.5 or 4.4)


Are the gcc versions that don't have noclone susceptible to cloning?


I believe the problem can happen due to inlining already


vmx_vcpu_run() cannot be inlined (it is only called via a function 
pointer; call site is in a different module)


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Avi Kivity

On 11/18/2010 01:42 AM, Anthony Liguori wrote:

Gack.  For the benefit of those that want to join the fun without
digging up the spec, these magic flippable segments the i440fx can
toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
64k segment from 0xf0000 to 0xfffff.  There are read-enable and
write-enable bits for each, so the chipset can be configured to read
from the bios and write to memory (to setup BIOS-RAM caching), and read
from memory and write to the bios (to enable BIOS-RAM caching).  The
other bit combinations are also available.


Yup.  As Gleb mentions, there's the SMRAM register which controls 
whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but 
KVM explicitly disabled SMM support).


KVM not supporting SMM is a bug (albeit one that is likely to remain 
unresolved for a while).  Let's pretend that kvm smm support is not an 
issue.


IIUC, SMM means that there are two memory maps when the cpu accesses memory, 
one for SMM, one for non-SMM.





For my purpose in using this to program the IOMMU with guest physical to
host virtual addresses for device assignment, it doesn't really matter
since there should never be a DMA in this range of memory.  But for a
general RAM API, I'm not sure either.  I'm tempted to say that while
this is in fact a use of RAM, the RAM is never presented to the guest as
usable system memory (E820_RAM for x86), and should therefore be
excluded from the RAM API if we're using it only to track regions that
are actual guest usable physical memory.

We had talked on irc that pc.c should be registering 0x0 to
below_4g_mem_size as ram, but now I tend to disagree with that.  The
memory backing 0xa0000-0x100000 is present, but it's not presented to
the guest as usable RAM.  What's your strict definition of what the RAM
API includes?  Is it only what the guest could consider usable RAM or
does it also include quirky chipset accelerator features like this
(everything with a guest physical address)?  Thanks,


Today we model one flat space that's a mix of device memory, RAM, and 
ROM.  This is not how machines work, and the limitations of this model 
are holding us back.


IRL, there's a block of RAM that's connected to a memory controller.  
The CPU is also connected to the memory controller.  Devices are 
connected to another controller which is in turn connected to the 
memory controller.  There may, in fact, be more than one controller 
between a device and the memory controller.


A controller may change the way a device sees memory in arbitrary 
ways.  In fact, two controllers accessing the same page might see 
something totally different.


The idea behind the RAM API is to begin to establish this hierarchy.  
RAM is not what any particular device sees--it's actual RAM.  IOW, the 
RAM API should represent what address mapping I would get if I talked 
directly to DIMMs.


This is not what RamBlock is even though the name would suggest 
otherwise.  RamBlocks are anything that qemu represents as 
cache-consistent, directly accessible memory.  Device ROMs and areas of 
device RAM are all allocated from the RamBlock space.


So the very first task of a RAM API is to simply differentiate these 
two things.  Once we have the base RAM API, we can start adding the 
proper APIs that sit on top of it (like a PCI memory API).


Things aren't that bad - a ram_addr_t and a physical address are already 
different things, so we already have one level of translation.
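
For illustration only, that existing level of translation can be pictured
as a table lookup from guest physical address to RAM offset. This is not
qemu's code: the struct and function names below are invented, and the
table contents are whatever the caller registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: a guest-physical range backed by RAM. */
struct phys_mapping {
        uint64_t phys_start;    /* first guest physical address */
        uint64_t size;          /* length of the range in bytes */
        uint64_t ram_offset;    /* where the range starts in the RAM space */
};

/*
 * Translate a guest physical address into a RAM offset.
 * Returns 0 and fills *ram on success, -1 if no RAM backs the address.
 */
static int phys_to_ram(const struct phys_mapping *map, size_t n,
                       uint64_t phys, uint64_t *ram)
{
        for (size_t i = 0; i < n; i++) {
                if (phys >= map[i].phys_start &&
                    phys - map[i].phys_start < map[i].size) {
                        *ram = map[i].ram_offset + (phys - map[i].phys_start);
                        return 0;
                }
        }
        return -1;
}
```

Nothing here is visible to the guest; only the physical side of the table
is.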


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Anthony Liguori

On 11/18/2010 09:22 AM, Avi Kivity wrote:

On 11/18/2010 01:42 AM, Anthony Liguori wrote:

Gack.  For the benefit of those that want to join the fun without
digging up the spec, these magic flippable segments the i440fx can
toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
64k segment from 0xf0000 to 0xfffff.  There are read-enable and
write-enable bits for each, so the chipset can be configured to read
from the bios and write to memory (to setup BIOS-RAM caching), and read
from memory and write to the bios (to enable BIOS-RAM caching).  The
other bit combinations are also available.


Yup.  As Gleb mentions, there's the SMRAM register which controls 
whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but 
KVM explicitly disabled SMM support).


KVM not supporting SMM is a bug (albeit one that is likely to remain 
unresolved for a while).  Let's pretend that kvm smm support is not an 
issue.


IIUC, SMM means that there are two memory maps when the cpu accesses 
memory, one for SMM, one for non-SMM.


No.  That's not what it means.  With the i440fx, when the CPU accesses 
0xa0000, it gets forwarded to the PCI bus no different than an access to 
0xe0000.


If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU 
accesses to 0xa0000 to RAM instead of the PCI bus.


Alternatively, if the SMRAM register is activated, then the i440fx will 
redirect 0xa0000 to RAM regardless of whether the CPU asserts that 
signal.  That means that even without KVM supporting SMM, this mode can 
happen.


In general, the memory controller can redirect IO accesses to RAM or to 
the PCI bus.  The PCI bus may redirect the access to the ISA bus.
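
As a sketch of the routing rule just described, and nothing more than
that: the window bounds below assume the 128k compatible SMRAM range at
0xa0000-0xbffff, and the names (smram_route, route_cpu_access, the
smm_signal and smram_open flags) are invented for this example rather
than taken from the i440fx spec or from qemu.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMRAM_BASE 0xa0000u     /* assumed start of the window */
#define SMRAM_SIZE 0x20000u     /* assumed 128k, i.e. up to 0xbffff */

enum smram_route {
        ROUTE_RAM,              /* access lands in RAM behind the window */
        ROUTE_PCI,              /* access is forwarded to the PCI bus */
        ROUTE_NOT_SMRAM,        /* address is outside the window */
};

/*
 * smm_signal: the CPU asserts its SMM signal for this access.
 * smram_open: the chipset's SMRAM register forces the window to RAM.
 */
static enum smram_route route_cpu_access(uint32_t addr, bool smm_signal,
                                         bool smram_open)
{
        if (addr < SMRAM_BASE || addr - SMRAM_BASE >= SMRAM_SIZE)
                return ROUTE_NOT_SMRAM;
        if (smm_signal || smram_open)
                return ROUTE_RAM;
        return ROUTE_PCI;
}
```

The second flag is what makes the RAM mapping reachable even when SMM
itself is never entered.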


For my purpose in using this to program the IOMMU with guest physical to
host virtual addresses for device assignment, it doesn't really matter
since there should never be a DMA in this range of memory.  But for a
general RAM API, I'm not sure either.  I'm tempted to say that while
this is in fact a use of RAM, the RAM is never presented to the guest as
usable system memory (E820_RAM for x86), and should therefore be
excluded from the RAM API if we're using it only to track regions that
are actual guest usable physical memory.

We had talked on irc that pc.c should be registering 0x0 to
below_4g_mem_size as ram, but now I tend to disagree with that.  The
memory backing 0xa0000-0x100000 is present, but it's not presented to
the guest as usable RAM.  What's your strict definition of what the RAM
API includes?  Is it only what the guest could consider usable RAM or
does it also include quirky chipset accelerator features like this
(everything with a guest physical address)?  Thanks,


Today we model one flat space that's a mix of device memory, RAM, and 
ROM.  This is not how machines work, and the limitations of this model 
are holding us back.


IRL, there's a block of RAM that's connected to a memory controller.  
The CPU is also connected to the memory controller.  Devices are 
connected to another controller which is in turn connected to the 
memory controller.  There may, in fact, be more than one controller 
between a device and the memory controller.


A controller may change the way a device sees memory in arbitrary 
ways.  In fact, two controllers accessing the same page might see 
something totally different.


The idea behind the RAM API is to begin to establish this hierarchy.  
RAM is not what any particular device sees--it's actual RAM.  IOW, 
the RAM API should represent what address mapping I would get if I 
talked directly to DIMMs.


This is not what RamBlock is even though the name would suggest 
otherwise.  RamBlocks are anything that qemu represents as 
cache-consistent, directly accessible memory.  Device ROMs and areas of 
device RAM are all allocated from the RamBlock space.


So the very first task of a RAM API is to simply differentiate 
these two things.  Once we have the base RAM API, we can start adding 
the proper APIs that sit on top of it (like a PCI memory API).


Things aren't that bad - a ram_addr_t and a physical address are 
already different things, so we already have one level of translation.


Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
internal implementation detail.


Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Gleb Natapov
On Wed, Nov 17, 2010 at 05:42:28PM -0600, Anthony Liguori wrote:
 For my purpose in using this to program the IOMMU with guest physical to
 host virtual addresses for device assignment, it doesn't really matter
 since there should never be a DMA in this range of memory.  But for a
 general RAM API, I'm not sure either.  I'm tempted to say that while
 this is in fact a use of RAM, the RAM is never presented to the guest as
 usable system memory (E820_RAM for x86), and should therefore be
 excluded from the RAM API if we're using it only to track regions that
 are actual guest usable physical memory.
 
 We had talked on irc that pc.c should be registering 0x0 to
 below_4g_mem_size as ram, but now I tend to disagree with that.  The
 memory backing 0xa0000-0x100000 is present, but it's not presented to
 the guest as usable RAM.  What's your strict definition of what the RAM
 API includes?  Is it only what the guest could consider usable RAM or
 does it also include quirky chipset accelerator features like this
 (everything with a guest physical address)?  Thanks,
 
 Today we model one flat space that's a mix of device memory, RAM,
 and ROM.  This is not how machines work, and the limitations of this
 model are holding us back.
 
 IRL, there's a block of RAM that's connected to a memory controller.
 The CPU is also connected to the memory controller.  Devices are
 connected to another controller which is in turn connected to the
 memory controller.  There may, in fact, be more than one controller
 between a device and the memory controller.
 
 A controller may change the way a device sees memory in arbitrary
 ways.  In fact, two controllers accessing the same page might see
 something totally different.
 
 The idea behind the RAM API is to begin to establish this hierarchy.
 RAM is not what any particular device sees--it's actual RAM.  IOW,
 the RAM API should represent what address mapping I would get if I
 talked directly to DIMMs.
 
 This is not what RamBlock is even though the name would suggest
 otherwise.  RamBlocks are anything that qemu represents as
 cache-consistent, directly accessible memory.  Device ROMs and areas of
 device RAM are all allocated from the RamBlock space.
 
 So the very first task of a RAM API is to simply differentiate
 these two things.  Once we have the base RAM API, we can start
 adding the proper APIs that sit on top of it (like a PCI memory
 API).
 
+1 for all of the above. What happens when a device accesses some address
is completely different from what happens when a CPU accesses the same
address (or even when another device on another bus does). For instance,
the way MSI is implemented now, a CPU can send an MSI by writing to the
0xfee00000 memory range. I do not think you can do that on real HW.
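
A simplified sketch of what initiator-dependent decoding could look like.
The enum and function names are invented, the MSI window is assumed to be
0xfee00000-0xfeefffff, and the CPU side only models the local APIC page at
its default base; everything else is lumped into "other".

```c
#include <stdint.h>

#define APIC_DEFAULT_BASE 0xfee00000u
#define APIC_MMIO_SIZE    0x1000u       /* one 4k page of APIC registers */
#define MSI_WINDOW_SIZE   0x100000u     /* assumed: up to 0xfeefffff */

enum initiator { INIT_CPU, INIT_DEVICE };
enum decode { DECODE_LOCAL_APIC, DECODE_MSI, DECODE_OTHER };

/*
 * Decide what a write means based on who issued it. The unsigned
 * subtraction wraps for addresses below the base, so those never match.
 */
static enum decode decode_write(enum initiator who, uint32_t addr)
{
        if (who == INIT_CPU) {
                if (addr - APIC_DEFAULT_BASE < APIC_MMIO_SIZE)
                        return DECODE_LOCAL_APIC;
                return DECODE_OTHER;
        }
        if (addr - APIC_DEFAULT_BASE < MSI_WINDOW_SIZE)
                return DECODE_MSI;
        return DECODE_OTHER;
}
```

With a single flat map there is no "who" argument, which is why the same
write from a CPU currently ends up treated as an MSI.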

--
Gleb.


Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Avi Kivity

On 11/18/2010 05:46 PM, Anthony Liguori wrote:

On 11/18/2010 09:22 AM, Avi Kivity wrote:

On 11/18/2010 01:42 AM, Anthony Liguori wrote:

Gack.  For the benefit of those that want to join the fun without
digging up the spec, these magic flippable segments the i440fx can
toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
64k segment from 0xf0000 to 0xfffff.  There are read-enable and
write-enable bits for each, so the chipset can be configured to read
from the bios and write to memory (to setup BIOS-RAM caching), and read
from memory and write to the bios (to enable BIOS-RAM caching).  The
other bit combinations are also available.


Yup.  As Gleb mentions, there's the SMRAM register which controls 
whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but 
KVM explicitly disabled SMM support).


KVM not supporting SMM is a bug (albeit one that is likely to remain 
unresolved for a while).  Let's pretend that kvm smm support is not 
an issue.


IIUC, SMM means that there are two memory maps when the cpu accesses 
memory, one for SMM, one for non-SMM.


No.  That's not what it means.  With the i440fx, when the CPU accesses 
0xa0000, it gets forwarded to the PCI bus no different than an access 
to 0xe0000.


If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU 
accesses to 0xa0000 to RAM instead of the PCI bus.


That's what two memory maps mean.  If you have one cpu in SMM and 
another outside SMM, then those two maps are active simultaneously.




Alternatively, if the SMRAM register is activated, then the i440fx 
will redirect 0xa0000 to RAM regardless of whether the CPU asserts 
that signal.  That means that even without KVM supporting SMM, this 
mode can happen.


That's a single memory map that is modified under hardware control, it's 
no different than BARs and such.


Things aren't that bad - a ram_addr_t and a physical address are 
already different things, so we already have one level of translation.


Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
internal implementation detail.




Does it matter?  We can say those are addresses on the memory bus.  
Since they are not observable anyway, who cares if they correspond with 
reality or not?


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Anthony Liguori

On 11/18/2010 09:57 AM, Avi Kivity wrote:

On 11/18/2010 05:46 PM, Anthony Liguori wrote:

On 11/18/2010 09:22 AM, Avi Kivity wrote:

On 11/18/2010 01:42 AM, Anthony Liguori wrote:

Gack.  For the benefit of those that want to join the fun without
digging up the spec, these magic flippable segments the i440fx can
toggle are 12 fixed 16k segments from 0xc0000 to 0xeffff and a single
64k segment from 0xf0000 to 0xfffff.  There are read-enable and
write-enable bits for each, so the chipset can be configured to read
from the bios and write to memory (to setup BIOS-RAM caching), and
read from memory and write to the bios (to enable BIOS-RAM caching).  The
other bit combinations are also available.


Yup.  As Gleb mentions, there's the SMRAM register which controls
whether 0xa0000 is mapped to PCI or whether it's mapped to RAM (but
KVM explicitly disabled SMM support).


KVM not supporting SMM is a bug (albeit one that is likely to remain 
unresolved for a while).  Let's pretend that kvm smm support is not 
an issue.


IIUC, SMM means that there are two memory maps when the cpu accesses
memory, one for SMM, one for non-SMM.


No.  That's not what it means.  With the i440fx, when the CPU
accesses 0xa0000, it gets forwarded to the PCI bus no different than
an access to 0xe0000.


If the CPU asserts the EXF4#/Ab7# signal, then the i440fx directs CPU
accesses to 0xa0000 to RAM instead of the PCI bus.


That's what two memory maps mean.  If you have one cpu in SMM and 
another outside SMM, then those two maps are active simultaneously.


I'm not sure if more modern memory controllers do special things here, 
but for the i440fx, if any CPU asserts SMM mode, then any memory access 
to that space is going to access SMRAM.




Alternatively, if the SMRAM register is activated, then the i440fx
will redirect 0xa0000 to RAM regardless of whether the CPU asserts
that signal.  That means that even without KVM supporting SMM, this
mode can happen.


That's a single memory map that is modified under hardware control, 
it's no different than BARs and such.


There is a single block of RAM.

The memory controller may either forward an address unmodified to the
RAM block or it may forward the address to the PCI bus[1].  A non-CPU
access goes through a controller hierarchy and may be modified while it
traverses the hierarchy.


So really, we should have a big chunk of RAM that we associate with a 
guest, with a list of intercepts that changes as the devices are 
modified.  Instead of having that list dispatch directly to a device, we 
should send all intercepted accesses to the memory controller and let 
the memory controller propagate out the access to the appropriate device.


[1] The exception is access to the local APIC.  That's handled directly by
the CPU (or immediately outside of the CPU before the access gets to the 
memory controller if the local APIC is external to the CPU).
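
A minimal sketch of the proposal above, assuming one flat RAM block plus a list of intercepted ranges that the memory controller consults on every CPU access. Every name here (`MemoryController`, `memctl_add_intercept`, `memctl_read`, `stub_device_read`) is hypothetical and only illustrates the shape of the idea; it is not QEMU code and it ignores writes, access sizes, and the local APIC exception from the footnote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type for an intercepted read. */
typedef uint8_t (*intercept_read_fn)(void *opaque, uint64_t addr);

typedef struct Intercept {
    uint64_t start;
    uint64_t size;
    intercept_read_fn read;
    void *opaque;
} Intercept;

#define MAX_INTERCEPTS 8

typedef struct MemoryController {
    uint8_t *ram;                         /* the single block of guest RAM */
    uint64_t ram_size;
    Intercept intercepts[MAX_INTERCEPTS]; /* changes as devices are modified */
    size_t nr_intercepts;
} MemoryController;

/* Example device: every read returns a fixed pattern. */
static uint8_t stub_device_read(void *opaque, uint64_t addr)
{
    (void)opaque;
    (void)addr;
    return 0xaa;
}

/* Register a range that is routed away from RAM. */
static int memctl_add_intercept(MemoryController *mc, uint64_t start,
                                uint64_t size, intercept_read_fn read,
                                void *opaque)
{
    assert(mc != NULL && read != NULL);
    if (mc->nr_intercepts == MAX_INTERCEPTS || size == 0) {
        return -1;
    }
    mc->intercepts[mc->nr_intercepts++] =
        (Intercept){ start, size, read, opaque };
    return 0;
}

/* All reads enter here; the controller decides where they end up. */
static uint8_t memctl_read(MemoryController *mc, uint64_t addr)
{
    for (size_t i = 0; i < mc->nr_intercepts; i++) {
        const Intercept *ic = &mc->intercepts[i];
        if (addr >= ic->start && addr - ic->start < ic->size) {
            return ic->read(ic->opaque, addr); /* e.g. forwarded to PCI */
        }
    }
    if (addr < mc->ram_size) {
        return mc->ram[addr];                  /* unmodified RAM access */
    }
    return 0xff;                               /* nobody claimed the access */
}
```

The point of routing through `memctl_read()` rather than dispatching straight to a device is that the controller stays the only place that knows the current map.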


Things aren't that bad - a ram_addr_t and a physical address are 
already different things, so we already have one level of translation.


Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
internal implementation detail.




Does it matter?  We can say those are addresses on the memory bus.
Since they are not observable anyway, who cares if they correspond with
reality or not?


It matters a lot because the life cycle of RAM is different from the 
life cycle of ROM.


For instance, the original goal was to madvise(MADV_DONTNEED) RAM on 
reboot.  You can't do that to ROM because the contents matter.
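
For readers unfamiliar with the call, a small Linux-specific sketch of what discarding RAM contents with madvise(MADV_DONTNEED) looks like for a private anonymous mapping. The helper names (`guest_ram_alloc`, `guest_ram_discard`) are made up for illustration; whether doing this to real guest RAM on reboot is acceptable is exactly what is being debated in this thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Allocate private anonymous memory standing in for guest RAM. */
static void *guest_ram_alloc(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Tell the kernel the contents are no longer needed.  On Linux, the next
 * touch of a private anonymous page after this reads back as zero, which
 * is why the same trick cannot be applied to ROM contents. */
static int guest_ram_discard(void *ram, size_t size)
{
    return madvise(ram, size, MADV_DONTNEED);
}
```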


But for PV devices, we can be loose in how we define the way the devices 
interact with the rest of the system.  For instance, we can say that 
virtio-pci devices are directly connected to RAM and do not go through 
the memory controllers.  That means we could get stable mappings of the 
virtio ring.


Regards,

Anthony Liguori




Re: [PATCH 0/2] Minor emulator cleanups

2010-11-18 Thread Marcelo Tosatti
On Wed, Nov 17, 2010 at 01:40:49PM +0200, Avi Kivity wrote:
 A couple of trivial patches that clean up a bit of cruft from the emulator.
 
 Avi Kivity (2):
   KVM: x86 emulator: drop unused #ifndef __KERNEL__
   KVM: x86 emulator: drop DPRINTF()
 
  arch/x86/kvm/emulate.c |   14 +-
  1 files changed, 1 insertions(+), 13 deletions(-)

Applied, thanks.



Re: [PATCH v2 2/6] KVM: MMU: don't drop spte if overwrite it from W to RO

2010-11-18 Thread Marcelo Tosatti
On Thu, Nov 18, 2010 at 03:12:56PM +0800, Xiao Guangrong wrote:
 On 11/17/2010 11:57 PM, Avi Kivity wrote:
 
set_pte:
update_spte(sptep, spte);
  +/*
  + * If we overwrite a writable spte with a read-only one we
  + * should flush remote TLBs. Otherwise rmap_write_protect
  + * will find a read-only spte, even though the writable spte
  + * might be cached on a CPU's TLB.
  + */
  +if (is_writable_pte(entry) && !is_writable_pte(*sptep))
  +kvm_flush_remote_tlbs(vcpu->kvm);
  There is no need to flush on sync_page path since the guest is
  responsible for it.
 
  
   If we don't, the next rmap_write_protect() will incorrectly decide that
  there's no need to flush tlbs.
  
 
 Maybe it's not a problem if guest can flush all tlbs after overwrite it?
 Marcelo, what's your comment about this?

It can, but there is no guarantee. Your patch is correct.


Re: [PATCH 0/2] KVM: fix and cleanup: kvm_lock and hardware_disable

2010-11-18 Thread Marcelo Tosatti
On Tue, Nov 16, 2010 at 05:32:44PM +0900, Takuya Yoshikawa wrote:
 Hello!
 
 While investigating kvm's mutual exclusions, starting from checking
 kvm's srcu grace periods, I could not understand some of the locking rules.
 
 This one is an example which I doubt.
 But I'm not so sure. Please check!
 
 Thanks,
   Takuya

Applied, thanks.



Re: [PATCH v2 0/2] Introduce segmented addresses to the emulator

2010-11-18 Thread Marcelo Tosatti
On Wed, Nov 17, 2010 at 03:28:20PM +0200, Avi Kivity wrote:
 Currently we lose segment information associated with memory operands.  This
 prevents us from doing proper segment checks.
 
 This patchset prepares the way by remembering which segment is associated
 with a memory operand.
 
 Avi Kivity (2):
   KVM: x86 emulator: preserve an operand's segment identity
   v2: truncate linear address to 32 bits if not in long mode (thanks Gleb)
   KVM: x86 emulator: do not perform address calculations on linear
 addresses
   v2: fix typo
 
  arch/x86/include/asm/kvm_emulate.h |5 +-
  arch/x86/kvm/emulate.c |  107 
 +++-
  2 files changed, 60 insertions(+), 52 deletions(-)

Applied, thanks.



Re: [PATCH]KVM: VMX: Inform user about INTEL_TXT dependency

2010-11-18 Thread Marcelo Tosatti
On Wed, Nov 17, 2010 at 11:40:17AM +0800, Shane Wang wrote:
 Inform user to either disable TXT in the BIOS or do TXT launch with tboot 
 before enabling KVM since some BIOSes do not set 
 FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled.
 
 Signed-off-by: Shane Wang shane.w...@intel.com
 ---
  arch/x86/kvm/vmx.c |5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

Applied, thanks.



Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Avi Kivity

On 11/18/2010 06:09 PM, Anthony Liguori wrote:
That's what two memory maps mean.  If you have one cpu in SMM and 
another outside SMM, then those two maps are active simultaneously.



I'm not sure if more modern memory controllers do special things here, 
but for the i440fx, if any CPU asserts SMM mode, then any memory 
access to that space is going to access SMRAM.


How does SMP work then?

SMM Space Open (DOPEN). When DOPEN=1 and DLCK=0, SMM space DRAM is
made visible even when CPU cycle does not indicate SMM mode access via
EXF4#/Ab7# signal. This is intended to help BIOS initialize SMM space.
Software should ensure that DOPEN=1 is mutually exclusive with DCLS=1.
When DLCK is set to a 1, DOPEN is set to 0 and becomes read only.


The words "cpu cycle does not indicate SMM mode" seem to say that SMM
accesses are made on a per-transaction basis, or so my lawyers tell me.







Alternatively, if the SMRAM register is activated, then the i440fx
will redirect 0xa0000 to RAM regardless of whether the CPU asserts
that signal.  That means that even without KVM supporting SMM, this
mode can happen.


That's a single memory map that is modified under hardware control, 
it's no different than BARs and such.


There is a single block of RAM.

The memory controller may either forward an address unmodified to the
RAM block or it may forward the address to the PCI bus[1].  A non-CPU
access goes through a controller hierarchy and may be modified while
it traverses the hierarchy.


So really, we should have a big chunk of RAM that we associate with a 
guest, with a list of intercepts that changes as the devices are 
modified.  Instead of having that list dispatch directly to a device, 
we should send all intercepted accesses to the memory controller and 
let the memory controller propagate out the access to the appropriate 
device.


[1] The exception is access to the local APIC.  That's handled directly
by the CPU (or immediately outside of the CPU before the access gets 
to the memory controller if the local APIC is external to the CPU).




Agree.  However the point with SMM is that the dispatch is made not only 
based on the address, but also based on SMM mode (and, unfortunately, 
can also be different based on read vs write).


Things aren't that bad - a ram_addr_t and a physical address are 
already different things, so we already have one level of translation.


Yeah, but ram_addr_t doesn't model anything meaningful IRL.  It's an 
internal implementation detail.




Does it matter?  We can say those are addresses on the memory bus.
Since they are not observable anyway, who cares if they correspond
with reality or not?


It matters a lot because the life cycle of RAM is different from the 
life cycle of ROM.


For instance, the original goal was to madvise(MADV_DONTNEED) RAM on 
reboot.  You can't do that to ROM because the contents matter.


I don't think you can do that to RAM either.



But for PV devices, we can be loose in how we define the way the 
devices interact with the rest of the system.  For instance, we can 
say that virtio-pci devices are directly connected to RAM and do not 
go through the memory controllers.  That means we could get stable 
mappings of the virtio ring.


That wouldn't work once we have an iommu and start to assign them to 
nested guests.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH v2 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Michael S. Tsirkin
On Thu, Nov 18, 2010 at 06:18:06PM +0200, Avi Kivity wrote:
 But for PV devices, we can be loose in how we define the way the
 devices interact with the rest of the system.  For instance, we
 can say that virtio-pci devices are directly connected to RAM and
 do not go through the memory controllers.  That means we could get
 stable mappings of the virtio ring.
 
 That wouldn't work once we have an iommu and start to assign them to
 nested guests.

Yea. Not sure whether I'm worried about that though.
Mixing in all the problems inherent in nested virt, PV and assigned
devices seems especially masochistic.

 -- 
 error compiling committee.c: too many arguments to function


Re: [PATCH v2 2/6] KVM: MMU: don't drop spte if overwrite it from W to RO

2010-11-18 Thread Avi Kivity

On 11/18/2010 05:32 PM, Marcelo Tosatti wrote:

There is no need to flush on sync_page path since the guest is
responsible for it.
  
  
 If we don't, the next rmap_write_protect() will incorrectly decide that
there's no need to flush tlbs.
  

  Maybe it's not a problem if guest can flush all tlbs after overwrite it?
  Marcelo, what's your comment about this?

It can, but there is no guarantee. Your patch is correct.


We keep tripping on the same problem again and again.  spte.w (and 
tlb.pte.w) is multiplexed between guest and host, hence we cannot trust 
the guest regarding its consistency.


I wish we had a systematic way of dealing with this.

--
error compiling committee.c: too many arguments to function



[PATCH] kvm: fast-path msi injection with irqfd

2010-11-18 Thread Michael S. Tsirkin
Store irq routing table pointer in the irqfd object,
and use that to inject MSI directly without bouncing out to
a kernel thread.

While we touch this structure, rearrange irqfd fields to make fastpath
better packed for better cache utilization.

This also adds some comments about locking rules and rcu usage in code.

Some notes on the design:
- Use pointer into the rt instead of copying an entry,
  to make it possible to use rcu, thus side-stepping
  locking complexities.  We also save some memory this way.
- Old workqueue code is still used for level irqs.
  I don't think we DTRT with level anyway, however,
  it seems easier to keep the code around as
  it has been thought through and debugged, and fix level later than
  rip out and re-instate it later.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

OK, this seems to work fine for me. Tested with virtio-net in guest
with and without vhost-net.  Pls review/apply if appropriate.

 include/linux/kvm_host.h |   16 
 virt/kvm/eventfd.c   |   91 --
 virt/kvm/irq_comm.c  |7 ++--
 3 files changed, 99 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a055742..4393c1b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/preempt.h>
 #include <linux/msi.h>
+#include <linux/rcupdate.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -206,6 +207,10 @@ struct kvm {
 
struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+   /*
+* Update side is protected by irq_lock and,
+* if configured, irqfds.lock.
+*/
struct kvm_irq_routing_table __rcu *irq_routing;
struct hlist_head mask_notifier_list;
struct hlist_head irq_ack_notifier_list;
@@ -462,6 +467,8 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic,
   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
+   int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
@@ -603,17 +610,26 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) 
{}
 void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
+void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 #else
 
 static inline void kvm_eventfd_init(struct kvm *kvm) {}
+
 static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
 {
return -EINVAL;
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+
+static inline void kvm_irq_routing_update(struct kvm *kvm,
+ struct kvm_irq_routing_table *irq_rt)
+{
+   rcu_assign_pointer(kvm->irq_routing, irq_rt);
+}
+
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
return -ENOSYS;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..2ca4535 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -44,14 +44,19 @@
  */
 
 struct _irqfd {
-   struct kvm               *kvm;
-   struct eventfd_ctx       *eventfd;
-   int                       gsi;
-   struct list_head          list;
-   poll_table                pt;
-   wait_queue_t              wait;
-   struct work_struct        inject;
-   struct work_struct        shutdown;
+   /* Used for MSI fast-path */
+   struct kvm *kvm;
+   wait_queue_t wait;
+   /* Update side is protected by irqfds.lock */
+   struct kvm_kernel_irq_routing_entry __rcu *irq_entry;
+   /* Used for level IRQ fast-path */
+   int gsi;
+   struct work_struct inject;
+   /* Used for setup/shutdown */
+   struct eventfd_ctx *eventfd;
+   struct list_head list;
+   poll_table pt;
+   struct work_struct shutdown;
 };
 
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -125,14 +130,22 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
void *key)
 {
struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
unsigned long flags = (unsigned long)key;
+   struct kvm_kernel_irq_routing_entry *irq;
+   struct kvm *kvm = irqfd->kvm;
 
-   if (flags & POLLIN)
+   if (flags & POLLIN) {
+   rcu_read_lock();
+   irq = rcu_dereference(irqfd->irq_entry);
 /* An event has been signaled, inject an interrupt */
-   schedule_work(&irqfd->inject);
+   if (irq)
+   kvm_set_msi(irq, kvm, 

Re: HAL type for Win2003 Server on recent KVM versions?

2010-11-18 Thread Kenni Lund
2010/11/18 Cole Robinson crobi...@redhat.com:
 On 11/18/2010 09:05 AM, Kenni Lund wrote:
 I'm curious why the RHEL 6 documentation claims that you actively need
 to select the Standard PC HAL on installation, if it's not even the
 recommended/preferred HAL...(?):
 Windows 2003 requires a specific computer type in order to install
 properly on a fully-virtualized guest. This needs to be specified at
 the beginning of the installation process.[1]

 [1] 
 http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization_Windows2003.html


 I'm pretty sure that was incorrectly copied over from the RHEL5 xen
 documentation. The docs people have been informed so it should be fixed
 soon-ish.

Perfect, thanks! :)

Best regards
Kenni


[PATCH v3 0/2] Minimal RAM API support

2010-11-18 Thread Alex Williamson
v3:

 - Address review comments
 - pc registers all memory below 4G in one chunk

Let me know if there are any further issues.  Thanks,

Alex

v2:

 - Move to Makefile.objs
 - Move structures to memory.c and create a callback function
 - Fix memory leak

I haven't moved to the state parameter because there should only
be a single instance of this per VM.  The state parameter seems
like it would add complications in setup and function calling, but
maybe point me to an example if I'm off base.  Thanks,

Alex

v1:

For VFIO based device assignment, we need to know what guest memory
areas are actual RAM.  RAMBlocks have long since become a grab bag
of misc allocations, so aren't effective for this.  Anthony has had
a RAM API in mind for a while now that addresses this problem.  This
implements just enough of it so that we have an interface to get
actual guest memory physical addresses to setup the host IOMMU.  We
can continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on
your initial implementation.  I had to add something since the file
in your branch just copies a header with Fabrice's copyright.
Thanks,

Alex

---

Alex Williamson (2):
  RAM API: Make use of it for x86 PC
  Minimal RAM API support


 Makefile.objs |1 +
 cpu-common.h  |2 +
 hw/pc.c   |9 ++---
 memory.c  |   97 +
 memory.h  |   44 ++
 5 files changed, 147 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h


[PATCH v3 1/2] Minimal RAM API support

2010-11-18 Thread Alex Williamson
This adds a minimum chunk of Anthony's RAM API support so that we
can identify actual VM RAM versus all the other things that make
use of qemu_ram_alloc.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Makefile.objs |1 +
 cpu-common.h  |2 +
 memory.c  |   97 +
 memory.h  |   44 ++
 4 files changed, 144 insertions(+), 0 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h

diff --git a/Makefile.objs b/Makefile.objs
index f07fb01..33fae0b 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -154,6 +154,7 @@ hw-obj-y += vl.o loader.o
 hw-obj-y += virtio.o virtio-console.o
 hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o
 hw-obj-y += watchdog.o
+hw-obj-y += memory.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 hw-obj-$(CONFIG_ECC) += ecc.o
 hw-obj-$(CONFIG_NAND) += nand.o
diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..6aa2738 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -23,6 +23,8 @@
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;
 
+#include "memory.h"
+
 /* memory API */
 
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, 
uint32_t value);
diff --git a/memory.c b/memory.c
new file mode 100644
index 0000000..742776f
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,97 @@
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson alex.william...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#include "memory.h"
+#include "range.h"
+
+typedef struct ram_slot {
+    target_phys_addr_t start_addr;
+    ram_addr_t size;
+    ram_addr_t offset;
+    QLIST_ENTRY(ram_slot) next;
+} ram_slot;
+
+static QLIST_HEAD(ram_slots, ram_slot) ram_slots =
+    QLIST_HEAD_INITIALIZER(ram_slots);
+
+static ram_slot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+                                    ram_addr_t size)
+{
+    ram_slot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots, next) {
+        if (slot->start_addr == start_addr && slot->size == size) {
+            return slot;
+        }
+
+        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+            hw_error("Ram range overlaps existing slot\n");
+        }
+    }
+
+    return NULL;
+}
+
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                      ram_addr_t phys_offset)
+{
+    ram_slot *slot;
+
+    if (!size) {
+        return -EINVAL;
+    }
+
+    assert(!qemu_ram_find_slot(start_addr, size));
+
+    slot = qemu_mallocz(sizeof(ram_slot));
+
+    slot->start_addr = start_addr;
+    slot->size = size;
+    slot->offset = phys_offset;
+
+    QLIST_INSERT_HEAD(&ram_slots, slot, next);
+
+    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+
+    return 0;
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+    ram_slot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    slot = qemu_ram_find_slot(start_addr, size);
+    assert(slot != NULL);
+
+    QLIST_REMOVE(slot, next);
+    qemu_free(slot);
+    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+    return;
+}
+
+int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn)
+{
+    ram_slot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots, next) {
+        int ret = fn(opaque, slot->start_addr, slot->size, slot->offset);
+        if (ret) {
+            return ret;
+        }
+    }
+    return 0;
+}
diff --git a/memory.h b/memory.h
new file mode 100644
index 0000000..e7aa5cb
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,44 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson alex.william...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef int (*qemu_ram_for_each_slot_fn)(void *opaque,
+                                         target_phys_addr_t start_addr,
+                                         ram_addr_t size,
+                                         ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_register() : Register a region of guest physical memory
+ *
+ * The new region must not overlap an existing region.
+ */
+int qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+  ram_addr_t phys_offset);
+
+/**
+ * qemu_ram_unregister() : Unregister a region of guest physical memory
+ */
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size);
+
+/**
+ * qemu_ram_for_each_slot() : Call fn() on each registered region
+ *
+ * Stop on non-zero return from fn().
+ */
+int qemu_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn);
+
+#endif /* QEMU_MEMORY_H */
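
To show how a consumer such as the VFIO code might use the iterator, here is a self-contained sketch. The callback type mirrors the `qemu_ram_for_each_slot_fn` signature from the patch, but everything else is a stand-in: the typedefs replace QEMU's real types, `mock_ram_register()` and `mock_ram_for_each_slot()` imitate the patch with a fixed array instead of a QLIST, and `iommu_state` / `iommu_map_slot()` are hypothetical names for whatever would program the host IOMMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for QEMU's types so the example compiles on its own. */
typedef uint64_t target_phys_addr_t;
typedef uint64_t ram_addr_t;

/* Same shape as the callback type introduced by the patch. */
typedef int (*qemu_ram_for_each_slot_fn)(void *opaque,
                                         target_phys_addr_t start_addr,
                                         ram_addr_t size,
                                         ram_addr_t phys_offset);

/* Mock slot table (the patch uses a QLIST). */
typedef struct mock_slot {
    target_phys_addr_t start_addr;
    ram_addr_t size;
    ram_addr_t offset;
} mock_slot;

#define MOCK_MAX_SLOTS 4
static mock_slot mock_slots[MOCK_MAX_SLOTS];
static size_t mock_nr_slots;

static int mock_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
                             ram_addr_t phys_offset)
{
    if (!size || mock_nr_slots == MOCK_MAX_SLOTS) {
        return -1;
    }
    mock_slots[mock_nr_slots++] = (mock_slot){ start_addr, size, phys_offset };
    return 0;
}

/* Call fn() on each slot and stop on the first non-zero return. */
static int mock_ram_for_each_slot(void *opaque, qemu_ram_for_each_slot_fn fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < mock_nr_slots; i++) {
        int ret = fn(opaque, mock_slots[i].start_addr, mock_slots[i].size,
                     mock_slots[i].offset);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Hypothetical consumer: records what would be mapped for DMA. */
typedef struct iommu_state {
    uint64_t mapped_bytes;
    unsigned nr_maps;
} iommu_state;

static int iommu_map_slot(void *opaque, target_phys_addr_t start_addr,
                          ram_addr_t size, ram_addr_t phys_offset)
{
    iommu_state *s = opaque;

    (void)start_addr;   /* a real mapper would use this as the IOVA */
    (void)phys_offset;  /* and this to locate the host memory */
    s->mapped_bytes += size;
    s->nr_maps++;
    return 0;
}
```

A caller would register the below-4G and above-4G regions once and then run `mock_ram_for_each_slot(&state, iommu_map_slot)` when the device is attached.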


[PATCH v3 2/2] RAM API: Make use of it for x86 PC

2010-11-18 Thread Alex Williamson
Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/pc.c |9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..fb7ee21 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,11 @@ void pc_memory_init(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(NULL, "pc.ram",
                               below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa0000, ram_addr);
-    cpu_register_physical_memory(0x100000,
-                                 below_4g_mem_size - 0x100000,
-                                 ram_addr + 0x100000);
+    qemu_ram_register(0, below_4g_mem_size, ram_addr);
 #if TARGET_PHYS_ADDR_BITS > 32
     if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
+        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
+                          ram_addr + below_4g_mem_size);
     }
 #endif
 #endif
 



ppc32 build failed

2010-11-18 Thread Yang Rui Rui

Hi,

I searched the archive and found some discussions about this; is it not
fixed yet?  Could someone tell me whether g4 kvm is available now?

powerpc g4 build failed (host kernel 2.6.37-rc2):

  CC    ppc-softmmu/kvm.o
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_init_vcpu':
/home/dave/qemu-kvm/target-ppc/kvm.c:81: error: 'struct kvm_sregs' has no 
member named 'pvr'
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_get_registers':
/home/dave/qemu-kvm/target-ppc/kvm.c:168: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:180: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:185: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:186: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:187: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:188: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c: At top level:
/home/dave/qemu-kvm/target-ppc/kvm.c:261: error: conflicting types for 
'kvm_arch_process_irqchip_events'
/home/dave/qemu-kvm/qemu-kvm.h:692: error: previous declaration of 
'kvm_arch_process_irqchip_events' was here
make[1]: *** [kvm.o] Error 1
make: *** [subdir-ppc-softmmu] Error 2

--
To adhere means to yield. To yield means to adhere.


Re: ppc32 build failed

2010-11-18 Thread Takuya Yoshikawa

(2010/11/19 15:01), Yang Rui Rui wrote:

Hi,

I searched the archive and found some discussions about this; is it not
fixed yet?  Could someone tell me whether g4 kvm is available now?


Hi, (added kvm-ppc to Cc)

I'm using g4 (Mac mini box) to run KVM.
  - though not tried 2.6.37-rc2 yet.

Aren't you using upstream qemu?
IIRC, ppc kvm needs to use upstream qemu.


You can see useful information on KVM PowerPC port page.
  http://www.linux-kvm.org/page/PowerPC

  - no g4 example but we can find enough information.


BTW, why is the entry "Book3S PPC32" in the processor support table
still "NO", anyone?
  http://www.linux-kvm.org/page/Processor_support


I don't know much about PPC, so you need to ask Alex about
more technical issues.


Takuya




powerpc g4 build failed (host kernel 2.6.37-rc2):

CC ppc-softmmu/kvm.o
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_init_vcpu':
/home/dave/qemu-kvm/target-ppc/kvm.c:81: error: 'struct kvm_sregs' has no 
member named 'pvr'
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_get_registers':
/home/dave/qemu-kvm/target-ppc/kvm.c:168: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:180: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:185: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:186: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:187: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:188: error: 'struct kvm_sregs' has no 
member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c: At top level:
/home/dave/qemu-kvm/target-ppc/kvm.c:261: error: conflicting types for 
'kvm_arch_process_irqchip_events'
/home/dave/qemu-kvm/qemu-kvm.h:692: error: previous declaration of 
'kvm_arch_process_irqchip_events' was here
make[1]: *** [kvm.o] Error 1
make: *** [subdir-ppc-softmmu] Error 2





Re: ppc32 build failed

2010-11-18 Thread Yang Rui Rui

On 11/19/2010 02:24 PM, Takuya Yoshikawa wrote:

(2010/11/19 15:01), Yang Rui Rui wrote:

Hi,

I searched the archive and found some discussions about this; is it not
fixed yet?  Could someone tell me whether g4 kvm is available now?


Hi, (added kvm-ppc to Cc)

I'm using g4 (Mac mini box) to run KVM.
- though not tried 2.6.37-rc2 yet.

Aren't you using upstream qemu?
IIRC, ppc kvm needs to use upstream qemu.


I use the qemu-kvm git version. Do you mean qemu instead of qemu-kvm?




You can see useful information on KVM PowerPC port page.
http://www.linux-kvm.org/page/PowerPC

-  no g4 example but we can find enough information.


BTW, why the entry Book3S PPC32 in the processor support table
is still NO, anyone?
http://www.linux-kvm.org/page/Processor_support



Thanks a lot



I don't know well about PPC, so you need to ask Alex about
more technical issues.


Takuya




powerpc g4 build failed (host kernel 2.6.37-rc2):

CC ppc-softmmu/kvm.o
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_init_vcpu':
/home/dave/qemu-kvm/target-ppc/kvm.c:81: error: 'struct kvm_sregs' has no member named 'pvr'
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_get_registers':
/home/dave/qemu-kvm/target-ppc/kvm.c:168: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:180: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:185: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:186: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:187: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:188: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c: At top level:
/home/dave/qemu-kvm/target-ppc/kvm.c:261: error: conflicting types for 'kvm_arch_process_irqchip_events'
/home/dave/qemu-kvm/qemu-kvm.h:692: error: previous declaration of 'kvm_arch_process_irqchip_events' was here
make[1]: *** [kvm.o] Error 1
make: *** [subdir-ppc-softmmu] Error 2






--
To adhere means to yield. To yield means to adhere.


Re: ppc32 build failed

2010-11-18 Thread Yang Rui Rui

On 11/19/2010 02:37 PM, Yang Ruirui R wrote:

On 11/19/2010 02:24 PM, Takuya Yoshikawa wrote:

(2010/11/19 15:01), Yang Rui Rui wrote:

Hi,

I searched the archive and found some discussions about this; is it not fixed yet?
Could someone tell me whether g4 kvm is available now?


Hi, (added kvm-ppc to Cc)

I'm using g4 (Mac mini box) to run KVM.
 - though not tried 2.6.37-rc2 yet.

Aren't you using upstream qemu?
IIRC, ppc kvm needs to use upstream qemu.


I use the qemu-kvm git version. Do you mean qemu instead of qemu-kvm?


Hi, qemu 0.13.0 build passed






You can see useful information on KVM PowerPC port page.
 http://www.linux-kvm.org/page/PowerPC

 -   no g4 example but we can find enough information.


BTW, why is the Book3S PPC32 entry in the processor support table
still NO, anyone?
 http://www.linux-kvm.org/page/Processor_support



Thanks a lot



I don't know well about PPC, so you need to ask Alex about
more technical issues.


Takuya




powerpc g4 build failed (host kernel 2.6.37-rc2):

CC ppc-softmmu/kvm.o
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_init_vcpu':
/home/dave/qemu-kvm/target-ppc/kvm.c:81: error: 'struct kvm_sregs' has no member named 'pvr'
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_get_registers':
/home/dave/qemu-kvm/target-ppc/kvm.c:168: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:180: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:185: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:186: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:187: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:188: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c: At top level:
/home/dave/qemu-kvm/target-ppc/kvm.c:261: error: conflicting types for 'kvm_arch_process_irqchip_events'
/home/dave/qemu-kvm/qemu-kvm.h:692: error: previous declaration of 'kvm_arch_process_irqchip_events' was here
make[1]: *** [kvm.o] Error 1
make: *** [subdir-ppc-softmmu] Error 2









--
To adhere means to yield. To yield means to adhere.


Re: ppc32 build failed

2010-11-18 Thread Takuya Yoshikawa

Aren't you using upstream qemu?
IIRC, ppc kvm needs to use upstream qemu.


I use the qemu-kvm git version. Do you mean qemu instead of qemu-kvm?


Hi, qemu 0.13.0 build passed


Yes, that's what I meant!


Takuya



Re: ppc32 build failed

2010-11-18 Thread Takuya Yoshikawa

(2010/11/19 15:01), Yang Rui Rui wrote:

Hi,

I searched the archive and found some discussions about this; is it not fixed yet?
Could someone tell me whether g4 kvm is available now?


Hi, (added kvm-ppc to Cc)

I'm using g4 (Mac mini box) to run KVM.
  - though not tried 2.6.37-rc2 yet.

Aren't you using upstream qemu?
IIRC, ppc kvm needs to use upstream qemu.


You can see useful information on KVM PowerPC port page.
  http://www.linux-kvm.org/page/PowerPC

  - no g4 example but we can find enough information.


BTW, why is the Book3S PPC32 entry in the processor support table
still NO, anyone?
  http://www.linux-kvm.org/page/Processor_support


I don't know well about PPC, so you need to ask Alex about
more technical issues.


Takuya




powerpc g4 build failed (host kernel 2.6.37-rc2):

CC ppc-softmmu/kvm.o
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_init_vcpu':
/home/dave/qemu-kvm/target-ppc/kvm.c:81: error: 'struct kvm_sregs' has no member named 'pvr'
/home/dave/qemu-kvm/target-ppc/kvm.c: In function 'kvm_arch_get_registers':
/home/dave/qemu-kvm/target-ppc/kvm.c:168: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:180: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:185: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:186: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:187: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c:188: error: 'struct kvm_sregs' has no member named 'u'
/home/dave/qemu-kvm/target-ppc/kvm.c: At top level:
/home/dave/qemu-kvm/target-ppc/kvm.c:261: error: conflicting types for 'kvm_arch_process_irqchip_events'
/home/dave/qemu-kvm/qemu-kvm.h:692: error: previous declaration of 'kvm_arch_process_irqchip_events' was here
make[1]: *** [kvm.o] Error 1
make: *** [subdir-ppc-softmmu] Error 2
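
The "has no member named" errors in the log above mean that the definition of struct kvm_sregs seen by the compiler lacks the fields that target-ppc/kvm.c accesses. One plausible cause, consistent with the suggestion in this thread to use upstream qemu, is that the KVM headers being compiled against do not match the source file. The sketch below only illustrates that failure mode. The struct demo_sregs layout, the DEMO_SREGS_HAS_PVR macro, and demo_read_pvr are all hypothetical and are not the real kernel or qemu definitions.

```c
#include <stdint.h>

/* Hypothetical switch: 1 models newer headers, 0 models older ones. */
#define DEMO_SREGS_HAS_PVR 1

/* Simplified, made-up register block; NOT the real struct kvm_sregs. */
struct demo_sregs {
#if DEMO_SREGS_HAS_PVR
    uint32_t pvr;                 /* field present only in the "newer" layout */
    union {
        struct { uint64_t sdr1; } s;
    } u;
#else
    char pad[1024];               /* "older" layout: opaque padding only */
#endif
};

/*
 * Reads the pvr field when the layout provides it. Without the guard, and
 * with DEMO_SREGS_HAS_PVR set to 0, GCC would stop with
 * "'struct demo_sregs' has no member named 'pvr'".
 */
uint32_t demo_read_pvr(const struct demo_sregs *sregs)
{
#if DEMO_SREGS_HAS_PVR
    return sregs->pvr;
#else
    (void)sregs;
    return 0;                     /* fallback value when the field is absent */
#endif
}
```

Guarding the access is only one way to cope with differing layouts; making sure the headers and the source come from matching versions avoids the problem altogether.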



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html