[Xen-devel] [PATCH 3.19.y-ckt 53/66] x86/mm/xen: Suppress hugetlbfs in PV guests

2016-04-26 Thread Kamal Mostafa
3.19.8-ckt20 -stable review patch.  If anyone has any objections, please let me 
know.

---8<

From: Jan Beulich <jbeul...@suse.com>

commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode:  [#1] SMP
  ...
  RIP: e030:[]  [] 
remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [] hugetlbfs_evict_inode+0x15/0x40
   [] evict+0xbd/0x1b0
   [] __dentry_kill+0x19a/0x1f0
   [] dput+0x1fe/0x220
   [] __fput+0x155/0x200
   [] task_work_run+0x60/0xa0
   [] do_exit+0x160/0x400
   [] do_group_exit+0x3b/0xa0
   [] get_signal+0x1ed/0x470
   [] do_signal+0x14/0x110
   [] prepare_exit_to_usermode+0xe9/0xf0
   [] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Signed-off-by: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: David Vrabel <david.vra...@citrix.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Luis R. Rodriguez <mcg...@suse.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Toshi Kani <toshi.k...@hp.com>
Cc: xen-devel <xen-de...@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ed802000078000e4...@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/hugetlb.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 68c0539..7aadd3c 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 
+#define hugepages_supported() cpu_has_pse
 
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr,
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [3.19.y-ckt stable] Patch "x86/mm/xen: Suppress hugetlbfs in PV guests" has been added to the 3.19.y-ckt tree

2016-04-26 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/mm/xen: Suppress hugetlbfs in PV guests

to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue

This patch is scheduled to be released in version 3.19.8-ckt20.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.19.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

---8<

From d34f71f0ba7a50c24907c38797f54bc303b4ca44 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeul...@suse.com>
Date: Thu, 21 Apr 2016 00:27:04 -0600
Subject: x86/mm/xen: Suppress hugetlbfs in PV guests

commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode:  [#1] SMP
  ...
  RIP: e030:[]  [] 
remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [] hugetlbfs_evict_inode+0x15/0x40
   [] evict+0xbd/0x1b0
   [] __dentry_kill+0x19a/0x1f0
   [] dput+0x1fe/0x220
   [] __fput+0x155/0x200
   [] task_work_run+0x60/0xa0
   [] do_exit+0x160/0x400
   [] do_group_exit+0x3b/0xa0
   [] get_signal+0x1ed/0x470
   [] do_signal+0x14/0x110
   [] prepare_exit_to_usermode+0xe9/0xf0
   [] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Signed-off-by: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: David Vrabel <david.vra...@citrix.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Luis R. Rodriguez <mcg...@suse.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Toshi Kani <toshi.k...@hp.com>
Cc: xen-devel <xen-de...@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ed802000078000e4...@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/hugetlb.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 68c0539..7aadd3c 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -4,6 +4,7 @@
 #include 
 #include 

+#define hugepages_supported() cpu_has_pse

 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr,
--
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4.2.y-ckt 77/93] x86/mm/xen: Suppress hugetlbfs in PV guests

2016-04-26 Thread Kamal Mostafa
4.2.8-ckt9 -stable review patch.  If anyone has any objections, please let me 
know.

---8<

From: Jan Beulich <jbeul...@suse.com>

commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode:  [#1] SMP
  ...
  RIP: e030:[]  [] 
remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [] hugetlbfs_evict_inode+0x15/0x40
   [] evict+0xbd/0x1b0
   [] __dentry_kill+0x19a/0x1f0
   [] dput+0x1fe/0x220
   [] __fput+0x155/0x200
   [] task_work_run+0x60/0xa0
   [] do_exit+0x160/0x400
   [] do_group_exit+0x3b/0xa0
   [] get_signal+0x1ed/0x470
   [] do_signal+0x14/0x110
   [] prepare_exit_to_usermode+0xe9/0xf0
   [] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Signed-off-by: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: David Vrabel <david.vra...@citrix.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Luis R. Rodriguez <mcg...@suse.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Toshi Kani <toshi.k...@hp.com>
Cc: xen-devel <xen-de...@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ed802000078000e4...@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/hugetlb.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index f8a29d2..e6a8613 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 
+#define hugepages_supported() cpu_has_pse
 
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr,
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [4.2.y-ckt stable] Patch "x86/mm/xen: Suppress hugetlbfs in PV guests" has been added to the 4.2.y-ckt tree

2016-04-25 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/mm/xen: Suppress hugetlbfs in PV guests

to the linux-4.2.y-queue branch of the 4.2.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-4.2.y-queue

This patch is scheduled to be released in version 4.2.8-ckt9.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 4.2.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

---8<

From 6b573233de517e24ffd20d02374f0a5048285f5b Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeul...@suse.com>
Date: Thu, 21 Apr 2016 00:27:04 -0600
Subject: x86/mm/xen: Suppress hugetlbfs in PV guests

commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode:  [#1] SMP
  ...
  RIP: e030:[]  [] 
remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [] hugetlbfs_evict_inode+0x15/0x40
   [] evict+0xbd/0x1b0
   [] __dentry_kill+0x19a/0x1f0
   [] dput+0x1fe/0x220
   [] __fput+0x155/0x200
   [] task_work_run+0x60/0xa0
   [] do_exit+0x160/0x400
   [] do_group_exit+0x3b/0xa0
   [] get_signal+0x1ed/0x470
   [] do_signal+0x14/0x110
   [] prepare_exit_to_usermode+0xe9/0xf0
   [] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Signed-off-by: Jan Beulich <jbeul...@suse.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: David Vrabel <david.vra...@citrix.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Luis R. Rodriguez <mcg...@suse.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Toshi Kani <toshi.k...@hp.com>
Cc: xen-devel <xen-de...@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ed802000078000e4...@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/hugetlb.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index f8a29d2..e6a8613 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -4,6 +4,7 @@
 #include 
 #include 

+#define hugepages_supported() cpu_has_pse

 static inline int is_hugepage_only_range(struct mm_struct *mm,
 unsigned long addr,
--
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4.2.y-ckt 286/305] x86/paravirt: Prevent rtc_cmos platform device init on PV guests

2016-01-15 Thread Kamal Mostafa
4.2.8-ckt2 -stable review patch.  If anyone has any objections, please let me 
know.

---8<

From: David Vrabel <david.vra...@citrix.com>

commit d8c98a1d1488747625ad6044d423406e17e99b7a upstream.

Adding the rtc platform device in non-privileged Xen PV guests causes
an IRQ conflict because these guests do not have legacy PIC and may
allocate irqs in the legacy range.

In a single VCPU Xen PV guest we should have:

/proc/interrupts:
   CPU0
  0:   4934  xen-percpu-virq  timer0
  1:  0  xen-percpu-ipi   spinlock0
  2:  0  xen-percpu-ipi   resched0
  3:  0  xen-percpu-ipi   callfunc0
  4:  0  xen-percpu-virq  debug0
  5:  0  xen-percpu-ipi   callfuncsingle0
  6:  0  xen-percpu-ipi   irqwork0
  7:321   xen-dyn-event xenbus
  8: 90   xen-dyn-event hvc_console
  ...

But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.

  genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)

We can avoid this problem by realizing that unprivileged PV guests (both
Xen and lguests) are not supposed to have rtc_cmos device and so
adding it is not necessary.

Privileged guests (i.e. Xen's dom0) do use it but they should not have
irq conflicts since they allocate irqs above legacy range (above
gsi_top, in fact).

Instead of explicitly testing whether the guest is privileged we can
extend pv_info structure to include information about guest's RTC
support.

Reported-and-tested-by: Sander Eikelenboom <li...@eikelenboom.it>
Signed-off-by: David Vrabel <david.vra...@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: vkuzn...@redhat.com
Cc: xen-de...@lists.xenproject.org
Cc: konrad.w...@oracle.com
Link: 
http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrov...@oracle.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/paravirt.h   | 6 ++
 arch/x86/include/asm/paravirt_types.h | 5 +
 arch/x86/include/asm/processor.h  | 1 +
 arch/x86/kernel/rtc.c | 3 +++
 arch/x86/lguest/boot.c| 1 +
 arch/x86/xen/enlighten.c  | 4 +++-
 6 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d143bfa..d0791ac 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -19,6 +19,12 @@ static inline int paravirt_enabled(void)
return pv_info.paravirt_enabled;
 }
 
+static inline int paravirt_has_feature(unsigned int feature)
+{
+   WARN_ON_ONCE(!pv_info.paravirt_enabled);
+   return (pv_info.features & feature);
+}
+
 static inline void load_sp0(struct tss_struct *tss,
 struct thread_struct *thread)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index a6b8f9f..af3b3b7 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -70,9 +70,14 @@ struct pv_info {
 #endif
 
int paravirt_enabled;
+   unsigned int features;/* valid only if paravirt_enabled is set */
const char *name;
 };
 
+#define paravirt_has(x) paravirt_has_feature(PV_SUPPORTED_##x)
+/* Supported features */
+#define PV_SUPPORTED_RTC(1<<0)
+
 struct pv_init_ops {
/*
 * Patch may replace one of the defined code sequences with
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 944f178..9b3bb51 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -478,6 +478,7 @@ static inline unsigned long current_top_of_stack(void)
 #else
 #define __cpuidnative_cpuid
 #define paravirt_enabled() 0
+#define paravirt_has(x)0
 
 static inline void load_sp0(struct tss_struct *tss,
struct thread_struct *thread)
diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index cd96852..4af8d06 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -200,6 +200,9 @@ static __init int add_rtc_cmos(void)
}
 #endif
 
+   if (paravirt_enabled() && !paravirt_has(RTC))
+   return -ENODEV;
+
platform_device_register(_device);
dev_info(_device.dev,
 "registered platform RTC device (no PNP device found)\n");
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index f2dc08c..5c4792e 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1419,6 +1419,7 @@ __init void lguest_init(void)
pv_info.kernel_rpl = 1;
/* Everyone except Xen runs with this set. */
pv_info.shared_kernel_pmd = 1;
+   pv_info.features = 0;
 
   

[Xen-devel] [4.2.y-ckt stable] Patch "x86/paravirt: Prevent rtc_cmos platform device init on PV guests" has been added to the 4.2.y-ckt tree

2016-01-15 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/paravirt: Prevent rtc_cmos platform device init on PV guests

to the linux-4.2.y-queue branch of the 4.2.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-4.2.y-queue

This patch is scheduled to be released in version 4.2.8-ckt2.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 4.2.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

---8<

>From 0dd88836fb7f14a4d4a3d5e179e70f4aff8d9e6c Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vra...@citrix.com>
Date: Fri, 11 Dec 2015 09:07:53 -0500
Subject: x86/paravirt: Prevent rtc_cmos platform device init on PV guests

commit d8c98a1d1488747625ad6044d423406e17e99b7a upstream.

Adding the rtc platform device in non-privileged Xen PV guests causes
an IRQ conflict because these guests do not have legacy PIC and may
allocate irqs in the legacy range.

In a single VCPU Xen PV guest we should have:

/proc/interrupts:
   CPU0
  0:   4934  xen-percpu-virq  timer0
  1:  0  xen-percpu-ipi   spinlock0
  2:  0  xen-percpu-ipi   resched0
  3:  0  xen-percpu-ipi   callfunc0
  4:  0  xen-percpu-virq  debug0
  5:  0  xen-percpu-ipi   callfuncsingle0
  6:  0  xen-percpu-ipi   irqwork0
  7:321   xen-dyn-event xenbus
  8: 90   xen-dyn-event hvc_console
  ...

But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.

  genirq: Flags mismatch irq 8.  (hvc_console) vs.  (rtc0)

We can avoid this problem by realizing that unprivileged PV guests (both
Xen and lguests) are not supposed to have rtc_cmos device and so
adding it is not necessary.

Privileged guests (i.e. Xen's dom0) do use it but they should not have
irq conflicts since they allocate irqs above legacy range (above
gsi_top, in fact).

Instead of explicitly testing whether the guest is privileged we can
extend pv_info structure to include information about guest's RTC
support.

Reported-and-tested-by: Sander Eikelenboom <li...@eikelenboom.it>
Signed-off-by: David Vrabel <david.vra...@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: vkuzn...@redhat.com
Cc: xen-de...@lists.xenproject.org
Cc: konrad.w...@oracle.com
Link: 
http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrov...@oracle.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/paravirt.h   | 6 ++
 arch/x86/include/asm/paravirt_types.h | 5 +
 arch/x86/include/asm/processor.h  | 1 +
 arch/x86/kernel/rtc.c | 3 +++
 arch/x86/lguest/boot.c| 1 +
 arch/x86/xen/enlighten.c  | 4 +++-
 6 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d143bfa..d0791ac 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -19,6 +19,12 @@ static inline int paravirt_enabled(void)
return pv_info.paravirt_enabled;
 }

+static inline int paravirt_has_feature(unsigned int feature)
+{
+   WARN_ON_ONCE(!pv_info.paravirt_enabled);
+   return (pv_info.features & feature);
+}
+
 static inline void load_sp0(struct tss_struct *tss,
 struct thread_struct *thread)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index a6b8f9f..af3b3b7 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -70,9 +70,14 @@ struct pv_info {
 #endif

int paravirt_enabled;
+   unsigned int features;/* valid only if paravirt_enabled is set */
const char *name;
 };

+#define paravirt_has(x) paravirt_has_feature(PV_SUPPORTED_##x)
+/* Supported features */
+#define PV_SUPPORTED_RTC(1<<0)
+
 struct pv_init_ops {
/*
 * Patch may replace one of the defined code sequences with
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 944f178..9b3bb51 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -478,6 +478,7 @@ static inline unsigned long current_top_of_stack(void)
 #else
 #define __cpuidnative_cpuid
 #define paravirt_enabled() 0
+#define paravirt_has(x)0

 static inline void load_sp0(struct tss_struct *tss,
struct thread_struct *thread)
diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index cd96852..4af8d06 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ 

[Xen-devel] [PATCH 3.19.y-ckt 028/128] x86/cpu: Fix SMAP check in PVOPS environments

2015-12-16 Thread Kamal Mostafa
3.19.8-ckt12 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: Andrew Cooper <andrew.coop...@citrix.com>

commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream.

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: David Vrabel <david.vra...@citrix.com>
Tested-by: Rusty Russell <ru...@rustcorp.com.au>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: <lgu...@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.coop...@citrix.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/kernel/cpu/common.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 382a097..547ba45 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -290,10 +290,9 @@ __setup("nosmap", setup_disable_smap);
 
 static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 {
-   unsigned long eflags;
+   unsigned long eflags = native_save_fl();
 
/* This should have been cleared long ago */
-   raw_local_save_flags(eflags);
BUG_ON(eflags & X86_EFLAGS_AC);
 
if (cpu_has(c, X86_FEATURE_SMAP)) {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [3.13.y-ckt stable] Patch "x86/cpu: Fix SMAP check in PVOPS environments" has been added to staging queue

2015-12-16 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/cpu: Fix SMAP check in PVOPS environments

to the linux-3.13.y-queue branch of the 3.13.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.13.y-queue

This patch is scheduled to be released in version 3.13.11-ckt32.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.13.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

--

>From b42893a1c237d2422fe81303afc4e25001d2bfb5 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.coop...@citrix.com>
Date: Wed, 3 Jun 2015 10:31:14 +0100
Subject: x86/cpu: Fix SMAP check in PVOPS environments

commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream.

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: David Vrabel <david.vra...@citrix.com>
Tested-by: Rusty Russell <ru...@rustcorp.com.au>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: <lgu...@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.coop...@citrix.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/kernel/cpu/common.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 37c4f31..a1f7c91 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -280,10 +280,9 @@ __setup("nosmap", setup_disable_smap);

 static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 {
-   unsigned long eflags;
+   unsigned long eflags = native_save_fl();

/* This should have been cleared long ago */
-   raw_local_save_flags(eflags);
BUG_ON(eflags & X86_EFLAGS_AC);

if (cpu_has(c, X86_FEATURE_SMAP)) {
--
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3.13.y-ckt 24/78] x86/cpu: Fix SMAP check in PVOPS environments

2015-12-16 Thread Kamal Mostafa
3.13.11-ckt32 -stable review patch.  If anyone has any objections, please let 
me know.

--

From: Andrew Cooper <andrew.coop...@citrix.com>

commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream.

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: David Vrabel <david.vra...@citrix.com>
Tested-by: Rusty Russell <ru...@rustcorp.com.au>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: <lgu...@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.coop...@citrix.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/kernel/cpu/common.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 37c4f31..a1f7c91 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -280,10 +280,9 @@ __setup("nosmap", setup_disable_smap);
 
 static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 {
-   unsigned long eflags;
+   unsigned long eflags = native_save_fl();
 
/* This should have been cleared long ago */
-   raw_local_save_flags(eflags);
BUG_ON(eflags & X86_EFLAGS_AC);
 
if (cpu_has(c, X86_FEATURE_SMAP)) {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [3.19.y-ckt stable] Patch "x86/cpu: Fix SMAP check in PVOPS environments" has been added to staging queue

2015-12-14 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/cpu: Fix SMAP check in PVOPS environments

to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue

This patch is scheduled to be released in version 3.19.8-ckt12.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.19.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

--

>From 136faa8d9086a925df4b86ea9ba2f2b57e1a8179 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.coop...@citrix.com>
Date: Wed, 3 Jun 2015 10:31:14 +0100
Subject: x86/cpu: Fix SMAP check in PVOPS environments

commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream.

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: David Vrabel <david.vra...@citrix.com>
Tested-by: Rusty Russell <ru...@rustcorp.com.au>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: <lgu...@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.coop...@citrix.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/kernel/cpu/common.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 382a097..547ba45 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -290,10 +290,9 @@ __setup("nosmap", setup_disable_smap);

 static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 {
-   unsigned long eflags;
+   unsigned long eflags = native_save_fl();

/* This should have been cleared long ago */
-   raw_local_save_flags(eflags);
BUG_ON(eflags & X86_EFLAGS_AC);

if (cpu_has(c, X86_FEATURE_SMAP)) {
--
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3.19.y-ckt 002/102] x86/ldt: Make modify_ldt synchronous

2015-09-22 Thread Kamal Mostafa
3.19.8-ckt7 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: Andy Lutomirski <l...@kernel.org>

commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream.

modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <l...@kernel.org>
Reviewed-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Sasha Levin <sasha.le...@oracle.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: secur...@kernel.org <secur...@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
[ kamal: backport to 3.13-stable: dropped comment changes in switch_mm();
  included asm/mmu_context.h in perf_event.c; context ]
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/desc.h|  15 ---
 arch/x86/include/asm/mmu.h |   3 +-
 arch/x86/include/asm/mmu_context.h |  48 ++-
 arch/x86/kernel/cpu/common.c   |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  13 +-
 arch/x86/kernel/ldt.c  | 262 -
 arch/x86/kernel/process_64.c   |   4 +-
 arch/x86/kernel/step.c |   6 +-
 arch/x86/power/cpu.c   |   3 +-
 9 files changed, 208 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a94b82e..6912618 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-   set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-   preempt_disable();
-   load_LDT_nolock(pc);
-   preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) 
<< 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 876e74e..b6b7bc3 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-   void *ldt;
-   int size;
+   struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 4b75d59..1ab38a4 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -19,6 +19,50 @@ static inline void paravirt_activate_mm(struct mm_struct 
*prev,
 #endif /* !CONFIG_PARAVIRT */
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+   /*
+* Xen requires page-aligned LDTs with special permissions.  This is
+* needed to prevent us from installing evil descriptors such as
+* call gates.  On native, we could merge the ldt_struct and LDT
+* allocations, but it's not worth trying to optimize.
+*/
+   struct desc_struct *entries;
+   int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+   struct ldt_struct *ldt;
+
+   /* lockless_dereference synchronizes with smp_store_release */
+   ldt = lockless_dereference(mm->context.ldt);
+
+   /*
+* Any change to mm->context.ldt is followed by an IPI to all
+* CPUs with the mm active.  The LDT will not be freed until
+* after the IPI is handled by all such CPUs.  This means that,
+* if the ldt_struct changes before we return, the values we see
+* will be safe, and the new values will be loaded before we run
+* any user code.
+*
+* NB: don't try to convert this to us

[Xen-devel] [3.19.y-ckt stable] Patch "x86/ldt: Make modify_ldt synchronous" has been added to staging queue

2015-09-01 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/ldt: Make modify_ldt synchronous

to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue

This patch is scheduled to be released in version 3.19.8-ckt7.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.19.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

--

>From 5bd39ec190eb78679a13c12c3baa0e7a6263bfe2 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <l...@kernel.org>
Date: Thu, 30 Jul 2015 14:31:32 -0700
Subject: x86/ldt: Make modify_ldt synchronous

commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream.

modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <l...@kernel.org>
Reviewed-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Sasha Levin <sasha.le...@oracle.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: secur...@kernel.org <secur...@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
[ kamal: backport to 3.13-stable: dropped comment changes in switch_mm();
  context ]
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/include/asm/desc.h|  15 ---
 arch/x86/include/asm/mmu.h |   3 +-
 arch/x86/include/asm/mmu_context.h |  48 ++-
 arch/x86/kernel/cpu/common.c   |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  12 +-
 arch/x86/kernel/ldt.c  | 262 -
 arch/x86/kernel/process_64.c   |   4 +-
 arch/x86/kernel/step.c |   6 +-
 arch/x86/power/cpu.c   |   3 +-
 9 files changed, 207 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a94b82e..6912618 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
 }

-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-   set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-   preempt_disable();
-   load_LDT_nolock(pc);
-   preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) 
<< 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 876e74e..b6b7bc3 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-   void *ldt;
-   int size;
+   struct ldt_struct *ldt;

 #ifdef CONFIG_X86_64
/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 4b75d59..1ab38a4 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -19,6 +19,50 @@ static inline void paravirt_activate_mm(struct mm_struct 
*prev,
 #endif /* !CONFIG_PARAVIRT */

 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+   /*
+* Xen requires page-aligned LDTs with special permissions.  This is
+* needed to prevent us from installing evil descriptors such as
+* call gates.  On native, we could merge the ldt_struct and LDT
+* allocations, but it's not worth trying to optimize.
+*/
+   struct desc_struct *entries;
+   int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+   struct ldt_struct *ldt;
+
+ 

[Xen-devel] [PATCH 3.13.y-ckt 50/60] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

2015-09-01 Thread Kamal Mostafa
3.13.11-ckt26 -stable review patch.  If anyone has any objections, please let 
me know.

--

From: Andy Lutomirski <l...@kernel.org>

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <l...@kernel.org>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: David Vrabel <dvra...@cantab.net>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Sasha Levin <sasha.le...@oracle.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: secur...@kernel.org <secur...@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade7..2cbc2f2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -480,6 +480,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;
 
ptep = lookup_address((unsigned long)v, );
BUG_ON(ptep == NULL);
@@ -489,6 +490,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
pte = pfn_pte(pfn, prot);
 
+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();
 
@@ -500,6 +527,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -507,6 +536,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, 
unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;
 
+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases out the TLBs, not the
+* page tables, and Xen will flush the TLB for us if needed.
+*
+* To avoid confusing future readers: none of this is necessary
+* to load the LDT.  The hypervisor only checks this when the
+* LDT is faulted in due to subsequent descriptor access.
+*/
+
for(i = 0; i < entries; i += entries_per_page)
set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [3.13.y-ckt stable] Patch "x86/xen: Probe target addresses in set_aliased_prot() before the hypercall" has been added to staging queue

2015-09-01 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

to the linux-3.13.y-queue branch of the 3.13.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.13.y-queue

This patch is scheduled to be released in version 3.13.11-ckt26.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.13.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

--

>From f93fdf9dea4a12083cdd7a962de939933f901053 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <l...@kernel.org>
Date: Thu, 30 Jul 2015 14:31:31 -0700
Subject: x86/xen: Probe target addresses in set_aliased_prot() before the
 hypercall

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski <l...@kernel.org>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: David Vrabel <dvra...@cantab.net>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Sasha Levin <sasha.le...@oracle.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: secur...@kernel.org <secur...@kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Kamal Mostafa <ka...@canonical.com>
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade7..2cbc2f2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -480,6 +480,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;

ptep = lookup_address((unsigned long)v, );
BUG_ON(ptep == NULL);
@@ -489,6 +490,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)

pte = pfn_pte(pfn, prot);

+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();

@@ -500,6 +527,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }

 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -507,6 +536,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, 
unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;

+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases ou

[Xen-devel] [3.19.y-ckt stable] Patch x86/xen: Probe target addresses in set_aliased_prot() before the hypercall has been added to staging queue

2015-08-27 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

to the linux-3.19.y-queue branch of the 3.19.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y-queue

This patch is scheduled to be released in version 3.19.8-ckt6.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.19.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

--

From 922c946e2cad7cdaec82a9ae963c5eb47f7992b0 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski l...@kernel.org
Date: Thu, 30 Jul 2015 14:31:31 -0700
Subject: x86/xen: Probe target addresses in set_aliased_prot() before the
 hypercall

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski l...@kernel.org
Cc: Andrew Cooper andrew.coop...@citrix.com
Cc: Andy Lutomirski l...@amacapital.net
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Brian Gerst brge...@gmail.com
Cc: David Vrabel dvra...@cantab.net
Cc: Denys Vlasenko dvlas...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Sasha Levin sasha.le...@oracle.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Thomas Gleixner t...@linutronix.de
Cc: secur...@kernel.org secur...@kernel.org
Cc: xen-devel xen-devel@lists.xen.org
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Kamal Mostafa ka...@canonical.com
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 78a881b..f94ad30 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;

ptep = lookup_address((unsigned long)v, level);
BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)

pte = pfn_pte(pfn, prot);

+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();

@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }

 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, 
unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;

+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases out the TLBs, not the
+* page tables, and Xen will flush the TLB for us if needed.
+*
+* To avoid confusing future readers: none

[Xen-devel] [PATCH 3.19.y-ckt 111/130] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

2015-08-27 Thread Kamal Mostafa
3.19.8-ckt6 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: Andy Lutomirski l...@kernel.org

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski l...@kernel.org
Cc: Andrew Cooper andrew.coop...@citrix.com
Cc: Andy Lutomirski l...@amacapital.net
Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: Borislav Petkov b...@alien8.de
Cc: Brian Gerst brge...@gmail.com
Cc: David Vrabel dvra...@cantab.net
Cc: Denys Vlasenko dvlas...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Sasha Levin sasha.le...@oracle.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Thomas Gleixner t...@linutronix.de
Cc: secur...@kernel.org secur...@kernel.org
Cc: xen-devel xen-devel@lists.xen.org
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Kamal Mostafa ka...@canonical.com
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 78a881b..f94ad30 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -483,6 +483,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;
 
ptep = lookup_address((unsigned long)v, level);
BUG_ON(ptep == NULL);
@@ -492,6 +493,32 @@ static void set_aliased_prot(void *v, pgprot_t prot)
 
pte = pfn_pte(pfn, prot);
 
+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();
 
@@ -503,6 +530,8 @@ static void set_aliased_prot(void *v, pgprot_t prot)
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -510,6 +539,17 @@ static void xen_alloc_ldt(struct desc_struct *ldt, 
unsigned entries)
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;
 
+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases out the TLBs, not the
+* page tables, and Xen will flush the TLB for us if needed.
+*
+* To avoid confusing future readers: none of this is necessary
+* to load the LDT.  The hypervisor only checks this when the
+* LDT is faulted in due to subsequent descriptor access.
+*/
+
for(i = 0; i  entries; i += entries_per_page)
set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3.13.y-ckt 017/139] xen-netfront: Fix handling packets on compound pages with skb_linearize

2015-01-28 Thread Kamal Mostafa
3.13.11-ckt15 -stable review patch.  If anyone has any objections, please let 
me know.

--

From: Zoltan Kiss zoltan.k...@citrix.com

commit 97a6d1bb2b658ac85ed88205ccd1ab809899884d upstream.

There is a long known problem with the netfront/netback interface: if the guest
tries to send a packet which constitues more than MAX_SKB_FRAGS + 1 ring slots,
it gets dropped. The reason is that netback maps these slots to a frag in the
frags array, which is limited by size. Having so many slots can occur since
compound pages were introduced, as the ring protocol slice them up into
individual (non-compound) page aligned slots. The theoretical worst case
scenario looks like this (note, skbs are limited to 64 Kb here):
linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping page boundary,
using 2 slots
first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
end and the beginning of a page, therefore they use 3 * 15 = 45 slots
last 2 frags: 1 + 1 bytes, overlapping page boundary, 2 * 2 = 4 slots
Although I don't think this 51 slots skb can really happen, we need a solution
which can deal with every scenario. In real life there is only a few slots
overdue, but usually it causes the TCP stream to be blocked, as the retry will
most likely have the same buffer layout.
This patch solves this problem by linearizing the packet. This is not the
fastest way, and it can fail much easier as it tries to allocate a big linear
area for the whole packet, but probably easier by an order of magnitude than
anything else. Probably this code path is not touched very frequently anyway.

Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com
Cc: Wei Liu wei.l...@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: Paul Durrant paul.durr...@citrix.com
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
Signed-off-by: David S. Miller da...@davemloft.net
BugLink: http://bugs.launchpad.net/bugs/1317811
Signed-off-by: Stefan Bader stefan.ba...@canonical.com
Signed-off-by: Kamal Mostafa ka...@canonical.com
---
 drivers/net/xen-netfront.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d58830b..a3ed8de 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -568,9 +568,10 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
xennet_count_skb_frag_slots(skb);
if (unlikely(slots  MAX_SKB_FRAGS + 1)) {
-   net_alert_ratelimited(
-   xennet: skb rides the rocket: %d slots\n, slots);
-   goto drop;
+   net_dbg_ratelimited(xennet: skb rides the rocket: %d slots, %d 
bytes\n,
+   slots, skb-len);
+   if (skb_linearize(skb))
+   goto drop;
}
 
spin_lock_irqsave(np-tx_lock, flags);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [3.13.y-ckt stable] Patch xen-netfront: Fix handling packets on compound pages with skb_linearize has been added to staging queue

2015-01-21 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled

xen-netfront: Fix handling packets on compound pages with skb_linearize

to the linux-3.13.y-queue branch of the 3.13.y-ckt extended stable tree 
which can be found at:

 
http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.13.y-queue

This patch is scheduled to be released in version 3.13.11-ckt15.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.13.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

--

From 7d113a745b53b4ff8fe7eac931bbab376d66993e Mon Sep 17 00:00:00 2001
From: Zoltan Kiss zoltan.k...@citrix.com
Date: Mon, 11 Aug 2014 18:32:23 +0100
Subject: xen-netfront: Fix handling packets on compound pages with
 skb_linearize

commit 97a6d1bb2b658ac85ed88205ccd1ab809899884d upstream.

There is a long known problem with the netfront/netback interface: if the guest
tries to send a packet which constitues more than MAX_SKB_FRAGS + 1 ring slots,
it gets dropped. The reason is that netback maps these slots to a frag in the
frags array, which is limited by size. Having so many slots can occur since
compound pages were introduced, as the ring protocol slice them up into
individual (non-compound) page aligned slots. The theoretical worst case
scenario looks like this (note, skbs are limited to 64 Kb here):
linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping page boundary,
using 2 slots
first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
end and the beginning of a page, therefore they use 3 * 15 = 45 slots
last 2 frags: 1 + 1 bytes, overlapping page boundary, 2 * 2 = 4 slots
Although I don't think this 51 slots skb can really happen, we need a solution
which can deal with every scenario. In real life there is only a few slots
overdue, but usually it causes the TCP stream to be blocked, as the retry will
most likely have the same buffer layout.
This patch solves this problem by linearizing the packet. This is not the
fastest way, and it can fail much easier as it tries to allocate a big linear
area for the whole packet, but probably easier by an order of magnitude than
anything else. Probably this code path is not touched very frequently anyway.

Signed-off-by: Zoltan Kiss zoltan.k...@citrix.com
Cc: Wei Liu wei.l...@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: Paul Durrant paul.durr...@citrix.com
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
Signed-off-by: David S. Miller da...@davemloft.net
BugLink: http://bugs.launchpad.net/bugs/1317811
Signed-off-by: Stefan Bader stefan.ba...@canonical.com
Signed-off-by: Kamal Mostafa ka...@canonical.com
---
 drivers/net/xen-netfront.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d58830b..a3ed8de 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -568,9 +568,10 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
xennet_count_skb_frag_slots(skb);
if (unlikely(slots  MAX_SKB_FRAGS + 1)) {
-   net_alert_ratelimited(
-   xennet: skb rides the rocket: %d slots\n, slots);
-   goto drop;
+   net_dbg_ratelimited(xennet: skb rides the rocket: %d slots, %d 
bytes\n,
+   slots, skb-len);
+   if (skb_linearize(skb))
+   goto drop;
}

spin_lock_irqsave(np-tx_lock, flags);
--
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel