Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Julien Grall
Hi Luis,

On 22/01/15 02:17, Luis R. Rodriguez wrote:
 diff --git a/drivers/xen/events/events_base.c 
 b/drivers/xen/events/events_base.c
 index b4bca2d..23c526b 100644
 --- a/drivers/xen/events/events_base.c
 +++ b/drivers/xen/events/events_base.c
 @@ -32,6 +32,8 @@
  #include <linux/slab.h>
  #include <linux/irqnr.h>
  #include <linux/pci.h>
 +#include <linux/sched.h>
 +#include <linux/kprobes.h>
  
  #ifdef CONFIG_X86
  #include <asm/desc.h>
 @@ -1243,6 +1245,17 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
   set_irq_regs(old_regs);
  }
  
 +notrace void xen_end_upcall(struct pt_regs *regs)
 +{
 + if (!xen_is_preemptible_hypercall(regs) ||

I don't see any definition of xen_is_preemptible_hypercall for ARM32/ARM64.

As this function is called from the generic code, you have to at least
provide a stub for it on those architectures.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread David Vrabel
On 22/01/15 03:18, Andy Lutomirski wrote:
 --- a/drivers/xen/events/events_base.c
 +++ b/drivers/xen/events/events_base.c
 @@ -32,6 +32,8 @@
  #include <linux/slab.h>
  #include <linux/irqnr.h>
  #include <linux/pci.h>
 +#include <linux/sched.h>
 +#include <linux/kprobes.h>

  #ifdef CONFIG_X86
  #include <asm/desc.h>
 @@ -1243,6 +1245,17 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 set_irq_regs(old_regs);
  }

 +notrace void xen_end_upcall(struct pt_regs *regs)
 +{
 +   if (!xen_is_preemptible_hypercall(regs) ||
 +   __this_cpu_read(xed_nesting_count))
 +   return;
 
 What's xed_nesting_count?

It is used to prevent nested upcalls when a hypercall called from an upcall
triggers another upcall.

There's no way such a nested hypercall can be preemptible, so the check
for xed_nesting_count can be removed from here.
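
As a rough illustration of what dropping the check leaves behind, here is
a user-space sketch of the reduced xen_end_upcall(). It is not the patch
itself. Everything apart from the control flow is an assumption made for
the example: struct pt_regs is a one-field stand-in,
xen_is_preemptible_hypercall() simply reads that invented field, and
mock_cond_resched() replaces the kernel's _cond_resched() with a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct pt_regs {
    bool in_privcmd_hypercall;  /* invented flag, for illustration only */
};

/* Counts how often the mock scheduler hook ran. */
int resched_calls;

/*
 * Stand-in for the arch helper named in the patch; the real test it
 * performs is not shown in this thread.
 */
bool xen_is_preemptible_hypercall(const struct pt_regs *regs)
{
    return regs->in_privcmd_hypercall;
}

/* Stand-in for _cond_resched(): records the call instead of scheduling. */
int mock_cond_resched(void)
{
    resched_calls++;
    return 1;
}

/* Reduced form: the xed_nesting_count test is gone. */
void xen_end_upcall(struct pt_regs *regs)
{
    if (!xen_is_preemptible_hypercall(regs))
        return;

    mock_cond_resched();
}

/* Returns how many times the mock scheduler ran for one upcall return. */
int end_upcall_resched_count(bool preemptible)
{
    struct pt_regs regs = { .in_privcmd_hypercall = preemptible };

    resched_calls = 0;
    xen_end_upcall(&regs);
    return resched_calls;
}
```

With this model, end_upcall_resched_count(true) reaches the scheduler hook
once and end_upcall_resched_count(false) never does, which is the only
behaviour the remaining check is meant to select.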

David



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Andrew Cooper
On 22/01/15 02:17, Luis R. Rodriguez wrote:
 --- a/drivers/xen/events/events_base.c
 +++ b/drivers/xen/events/events_base.c
 @@ -32,6 +32,8 @@
  #include <linux/slab.h>
  #include <linux/irqnr.h>
  #include <linux/pci.h>
 +#include <linux/sched.h>
 +#include <linux/kprobes.h>
  
  #ifdef CONFIG_X86
  #include <asm/desc.h>
 @@ -1243,6 +1245,17 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
   set_irq_regs(old_regs);
  }
  
 +notrace void xen_end_upcall(struct pt_regs *regs)
 +{
 + if (!xen_is_preemptible_hypercall(regs) ||
 + __this_cpu_read(xed_nesting_count))
 + return;
 +
 + if (_cond_resched())
 + printk(KERN_DEBUG "xen hypercall preempted\n");

I wouldn't even put this at debug level.  On a large server with plenty
of domains being created/migrated/destroyed, it is quite likely that a
toolstack task might get preempted in this way.

I don't believe the message is of any practical use.

~Andrew



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Luis R. Rodriguez
On Thu, Jan 22, 2015 at 08:56:49AM -0500, Steven Rostedt wrote:
 On Thu, 22 Jan 2015 11:50:10 +
 Andrew Cooper andrew.coop...@citrix.com wrote:
 
  On 22/01/15 02:17, Luis R. Rodriguez wrote:
   --- a/drivers/xen/events/events_base.c
   +++ b/drivers/xen/events/events_base.c
   @@ -32,6 +32,8 @@
    #include <linux/slab.h>
    #include <linux/irqnr.h>
    #include <linux/pci.h>
   +#include <linux/sched.h>
   +#include <linux/kprobes.h>

    #ifdef CONFIG_X86
    #include <asm/desc.h>
   @@ -1243,6 +1245,17 @@ void xen_evtchn_do_upcall(struct pt_regs
   *regs) set_irq_regs(old_regs);
}

   +notrace void xen_end_upcall(struct pt_regs *regs)
   +{
   + if (!xen_is_preemptible_hypercall(regs) ||
   + __this_cpu_read(xed_nesting_count))
   + return;
   +
   + if (_cond_resched())
   + printk(KERN_DEBUG "xen hypercall preempted\n");
  
  I wouldn't even put this at debug level.  On a large server with
  plenty of domains being created/migrated/destroyed, it is quite
  likely that a toolstack task might get preempted in this way.
  
  I don't believe the message is of any practical use.
  
 
 Why not make this a tracepoint? Then you can enable it only when you
 want to. As tracepoints are also hooks, you could add your own code that
 hooks to it and does a printk as well. The advantage of doing it via a
 tracepoint is that you can turn it on and off regardless of what the
 loglevel is set at.

This uses NOKPROBE_SYMBOL and notrace since, based on Andy's advice,
we are not confident that tracing and kprobes are safe to use in what
might be an extended RCU quiescent state (i.e. where we're outside
irq_enter and irq_exit).

 That is, if there is any practical use for that message. Tracing just
 sched_switch will give you the same info.

IMHO it may be more useful if we knew exactly which hypercalls were
being preempted, but perhaps all that can be left as a secondary
exercise; for now I'll just nuke the print.

  Luis



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Luis R. Rodriguez
On Thu, Jan 22, 2015 at 11:50:10AM +, Andrew Cooper wrote:
 On 22/01/15 02:17, Luis R. Rodriguez wrote:
  --- a/drivers/xen/events/events_base.c
  +++ b/drivers/xen/events/events_base.c
  @@ -32,6 +32,8 @@
   #include <linux/slab.h>
   #include <linux/irqnr.h>
   #include <linux/pci.h>
  +#include <linux/sched.h>
  +#include <linux/kprobes.h>
   
   #ifdef CONFIG_X86
   #include <asm/desc.h>
  @@ -1243,6 +1245,17 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
  set_irq_regs(old_regs);
   }
   
  +notrace void xen_end_upcall(struct pt_regs *regs)
  +{
  +   if (!xen_is_preemptible_hypercall(regs) ||
  +   __this_cpu_read(xed_nesting_count))
  +   return;
  +
  +   if (_cond_resched())
  +   printk(KERN_DEBUG "xen hypercall preempted\n");
 
 I wouldn't even put this at debug level.  On a large server with plenty
 of domains being created/migrated/destroyed, it is quite likely that a
 toolstack task might get preempted in this way.
 
 I don't believe the message is of any practical use.

I'll just nuke it then.

 Luis



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Andrew Cooper
On 22/01/2015 20:58, Andy Lutomirski wrote:
 On Thu, Jan 22, 2015 at 12:37 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 22 Jan 2015 12:24:47 -0800
 Andy Lutomirski l...@amacapital.net wrote:

 Also, please remove the notrace, because function tracing goes an
 extra step to not require RCU being visible. The only thing you get
 with notrace is not being able to trace an otherwise traceable function.

 Is this also true for kprobes?  And can kprobes nest inside function
 tracing hooks?
 No, kprobes are a bit more fragile than function tracing or tracepoints.

 And nothing should nest inside a function hook (except for interrupts,
 they are fine).

 But kprobes do nest inside interrupts, right?

 The other issue, above and beyond RCU, is that we can't let kprobes
 run on the int3 stack.  If Xen upcalls can happen when interrupts are
 off, then we may need this protection to prevent that type of
 recursion.  (This will be much less scary in 3.20, because userspace
 int3 instructions will no longer execute on the int3 stack.)
 Does this execute between the start of the int3 interrupt handler and
 the call of do_int3()?
 I doubt it.

 The thing I worry about is that, if do_int3 nests inside itself by any
 means (e.g. int3 sends a signal, scheduling for whatever reason
 (really shouldn't happen, but I haven't looked that hard)), then we're
 completely hosed -- the inner int3 will overwrite the outer int3's
 stack frame.  Since I have no idea what Xen upcalls do, I don't know
 whether they can fire inside do_int3.

The upcall is the "you have a virtual interrupt pending" signal and
should behave exactly like an external interrupt.  The exception frame
will appear to have interrupted the correct vcpu context, despite the
actual trip via Xen.

Exceptions are handled as per native, with the xen_write_idt_entry()
PVOP taking care of registering the entry point with Xen, rather than
filling in a real IDT entry.

~Andrew



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Luis R. Rodriguez
On Wed, Jan 21, 2015 at 07:18:46PM -0800, Andy Lutomirski wrote:
 On Wed, Jan 21, 2015 at 6:17 PM, Luis R. Rodriguez
 mcg...@do-not-panic.com wrote:
  From: Luis R. Rodriguez mcg...@suse.com
 
  Xen has support for splitting heavy work into a series
  of hypercalls, called multicalls, and preempting them through
  what Xen calls continuation [0]. Despite this, without
  CONFIG_PREEMPT preemption won't happen, and without preemption
  a system can become pretty useless on heavy-handed hypercalls.
  Such is the case, for example, when creating a > 50 GiB HVM guest,
  where we can get soft lockups [1] with:
 
  kernel: [  802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
 
  The soft lockup triggers on the TASK_UNINTERRUPTIBLE hanger check
  (default 120 seconds), on the Xen side in this particular case
  this happens when the following Xen hypervisor code is used:
 
  xc_domain_set_pod_target() --
do_memory_op() --
  arch_memory_op() --
p2m_pod_set_mem_target()
  -- long delay (real or emulated) --
 
  This happens on arch_memory_op() on the XENMEM_set_pod_target memory
  op even though arch_memory_op() can handle continuation via
  hypercall_create_continuation() for example.
 
  Machines over 50 GiB of memory are on high demand and hard to come
  by so to help replicate this sort of issue long delays on select
  hypercalls have been emulated in order to be able to test this on
  smaller machines [2].
 
  On one hand this issue can be considered as expected given that
  CONFIG_PREEMPT=n is used however we have forced voluntary preemption
  precedent practices in the kernel even for CONFIG_PREEMPT=n through
  the usage of cond_resched() sprinkled in many places. To address
  this issue with Xen hypercalls, though, we need to find a way to aid
  the scheduler in the middle of hypercalls. We are motivated to
  address this issue on CONFIG_PREEMPT=n as otherwise the system becomes
  rather unresponsive for long periods of time; in the worst case, at least
  only currently by emulating long delays on select io disk bound
  hypercalls, this can lead to filesystem corruption if the delay happens
  for example on SCHEDOP_remote_shutdown (when we call 'xl domain 
  shutdown').
 
  We can address this problem by trying to check if we should schedule
  on the xen timer in the middle of a hypercall on the return from the
  timer interrupt. We want to be careful to not always force voluntary
  preemption though so to do this we only selectively enable preemption
  on very specific xen hypercalls.
 
  This enables hypercall preemption by selectively forcing checks for
  voluntary preempting only on ioctl initiated private hypercalls
  where we know some folks have run into reported issues [1].
 
  [0] 
  http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
  [1] https://bugzilla.novell.com/show_bug.cgi?id=861093
  [2] 
  http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch
 
  Based on original work by: David Vrabel david.vra...@citrix.com
  Suggested-by: Andy Lutomirski l...@amacapital.net
  Cc: Andy Lutomirski l...@amacapital.net
  Cc: Borislav Petkov b...@suse.de
  Cc: David Vrabel david.vra...@citrix.com
  Cc: Thomas Gleixner t...@linutronix.de
  Cc: Ingo Molnar mi...@redhat.com
  Cc: H. Peter Anvin h...@zytor.com
  Cc: x...@kernel.org
  Cc: Steven Rostedt rost...@goodmis.org
  Cc: Masami Hiramatsu masami.hiramatsu...@hitachi.com
  Cc: Jan Beulich jbeul...@suse.com
  Cc: linux-ker...@vger.kernel.org
  Signed-off-by: Luis R. Rodriguez mcg...@suse.com
  ---
   arch/x86/kernel/entry_32.S   |  2 ++
   arch/x86/kernel/entry_64.S   |  2 ++
   drivers/xen/events/events_base.c | 13 +
   include/xen/events.h |  1 +
   4 files changed, 18 insertions(+)
 
  diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
  index 000d419..b4b1f42 100644
  --- a/arch/x86/kernel/entry_32.S
  +++ b/arch/x86/kernel/entry_32.S
  @@ -982,6 +982,8 @@ ENTRY(xen_hypervisor_callback)
   ENTRY(xen_do_upcall)
   1: mov %esp, %eax
  call xen_evtchn_do_upcall
  +   movl %esp,%eax
  +   call xen_end_upcall
  jmp  ret_from_intr
  CFI_ENDPROC
   ENDPROC(xen_hypervisor_callback)
  diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
  index 9ebaf63..ee28733 100644
  --- a/arch/x86/kernel/entry_64.S
  +++ b/arch/x86/kernel/entry_64.S
  @@ -1198,6 +1198,8 @@ ENTRY(xen_do_hypervisor_callback)   # 
  do_hypervisor_callback(struct *pt_regs)
  popq %rsp
  CFI_DEF_CFA_REGISTER rsp
  decl PER_CPU_VAR(irq_count)
  +   movq %rsp, %rdi  /* pass pt_regs as first argument */
  +   call xen_end_upcall
  jmp  error_exit
  CFI_ENDPROC
   END(xen_do_hypervisor_callback)
  diff --git a/drivers/xen/events/events_base.c 
  b/drivers/xen/events/events_base.c
  index b4bca2d..23c526b 

Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Steven Rostedt

[ Added Paul McKenney ]

On Thu, 22 Jan 2015 19:39:13 +0100
Luis R. Rodriguez mcg...@suse.com wrote:

  Why not make this a tracepoint? Then you can enable it only when you
  want to. As tracepoints are also hooks, you could add your own code that
  hooks to it and does a printk as well. The advantage of doing it via a
  tracepoint is that you can turn it on and off regardless of what the
  loglevel is set at.
 
 This uses NOKPROBE_SYMBOL and notrace since based on Andy's advice
 we are not confident that tracing and kprobes are safe to use in what
 might be an extended RCU quiescent state (i.e. where we're outside
 irq_enter and irq_exit).

We have trace_*_rcuidle() for such cases.

That is, you create the tracepoint just the same, and instead of having
trace_foo(), if you are in a known area that is outside of rcu viewing,
you use trace_foo_rcuidle() and it will tell RCU "hey, there's something
here that may need RCU, so look at me!"
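
To make the difference concrete, here is a user-space model of the idea.
It is not kernel code and does not use the real tracepoint macros. It
assumes the _rcuidle variant simply brackets the probe call with "start
watching" and "stop watching" steps; rcu_watching, model_rcu_enter(),
model_rcu_exit() and the foo probe are all invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Model state: is RCU currently paying attention to this CPU? */
bool rcu_watching;

/* Total probe runs, and runs that happened while RCU was not watching. */
int probe_runs;
int unsafe_probe_runs;

/* Hypothetical probe attached to the tracepoint "foo". */
void foo_probe(void)
{
    probe_runs++;
    if (!rcu_watching)
        unsafe_probe_runs++;
}

/* Invented helpers standing in for whatever makes RCU start/stop watching. */
void model_rcu_enter(void) { rcu_watching = true; }
void model_rcu_exit(void)  { rcu_watching = false; }

/* Plain form: fine only when the caller already knows RCU is watching. */
void trace_foo(void)
{
    foo_probe();
}

/* _rcuidle form: gets RCU's attention around the probe, then lets go. */
void trace_foo_rcuidle(void)
{
    model_rcu_enter();
    foo_probe();
    model_rcu_exit();
}

/* Fires one tracepoint from an RCU-idle region; returns the unsafe-run count. */
int unsafe_runs_from_idle(bool use_rcuidle)
{
    rcu_watching = false;
    probe_runs = 0;
    unsafe_probe_runs = 0;

    if (use_rcuidle)
        trace_foo_rcuidle();
    else
        trace_foo();

    return unsafe_probe_runs;
}
```

In the model, the probe runs exactly once either way; only the plain form
lets it run while RCU is not watching.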

Also, please remove the notrace, because function tracing goes an
extra step to not require RCU being visible. The only thing you get
with notrace is not being able to trace an otherwise traceable function.

-- Steve

 
  That is, if there is any practical use for that message. Tracing just
  sched_switch will give you the same info.
 
 IMHO it may be more useful if we knew exactly what hypercalls were
 being preempted but perhaps all that can be left as a secondary
 exercise and for now I'll just nuke the print.



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Andy Lutomirski
On Thu, Jan 22, 2015 at 12:16 PM, Steven Rostedt rost...@goodmis.org wrote:

 [ Added Paul McKenney ]

 On Thu, 22 Jan 2015 19:39:13 +0100
 Luis R. Rodriguez mcg...@suse.com wrote:

  Why not make this a tracepoint? Then you can enable it only when you
  want to. As tracepoints are also hooks, you could add your own code that
  hooks to it and does a printk as well. The advantage of doing it via a
  tracepoint is that you can turn it on and off regardless of what the
  loglevel is set at.

 This uses NOKPROBE_SYMBOL and notrace since based on Andy's advice
 we are not confident that tracing and kprobes are safe to use in what
 might be an extended RCU quiescent state (i.e. where we're outside
 irq_enter and irq_exit).

 We have trace_*_rcuidle() for such cases.

 That is, you create the tracepoint just the same, and instead of having
 trace_foo(), if you are in a known area that is outside of rcu viewing,
 you use trace_foo_rcuidle() and it will tell RCU "hey, there's something
 here that may need RCU, so look at me!"

 Also, please remove the notrace, because function tracing goes an
 extra step to not require RCU being visible. The only thing you get
 with notrace is not being able to trace an otherwise traceable function.


Is this also true for kprobes?  And can kprobes nest inside function
tracing hooks?

The other issue, above and beyond RCU, is that we can't let kprobes
run on the int3 stack.  If Xen upcalls can happen when interrupts are
off, then we may need this protection to prevent that type of
recursion.  (This will be much less scary in 3.20, because userspace
int3 instructions will no longer execute on the int3 stack.)

--Andy

 -- Steve


  That is, if there is any practical use for that message. Tracing just
  sched_switch will give you the same info.

 IMHO it may be more useful if we knew exactly what hypercalls were
 being preempted but perhaps all that can be left as a secondary
 exercise and for now I'll just nuke the print.



-- 
Andy Lutomirski
AMA Capital Management, LLC



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Julien Grall
On 22/01/15 18:56, Luis R. Rodriguez wrote:
 On Thu, Jan 22, 2015 at 01:10:49PM +, Julien Grall wrote:
 Hi Luis,

 On 22/01/15 02:17, Luis R. Rodriguez wrote:
 diff --git a/drivers/xen/events/events_base.c 
 b/drivers/xen/events/events_base.c
 index b4bca2d..23c526b 100644
 --- a/drivers/xen/events/events_base.c
 +++ b/drivers/xen/events/events_base.c
 @@ -32,6 +32,8 @@
  #include <linux/slab.h>
  #include <linux/irqnr.h>
  #include <linux/pci.h>
 +#include <linux/sched.h>
 +#include <linux/kprobes.h>
  
  #ifdef CONFIG_X86
  #include <asm/desc.h>
 @@ -1243,6 +1245,17 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 set_irq_regs(old_regs);
  }
  
 +notrace void xen_end_upcall(struct pt_regs *regs)
 +{
 +   if (!xen_is_preemptible_hypercall(regs) ||

 I don't see any definition of xen_is_preemptible_hypercall for ARM32/ARM64.

 As this function is called from the generic code, you have at least to
 stub this function for those architectures.
 
 Will add as:
 
 diff --git a/arch/arm/include/asm/xen/hypercall.h 
 b/arch/arm/include/asm/xen/hypercall.h
 index 712b50e..4fc8395 100644
 --- a/arch/arm/include/asm/xen/hypercall.h
 +++ b/arch/arm/include/asm/xen/hypercall.h
 @@ -74,4 +74,9 @@ MULTI_mmu_update(struct multicall_entry *mcl, struct 
 mmu_update *req,
   BUG();
  }
  
 +static inline bool xen_is_preemptible_hypercall(struct pt_regs *regs)
 +{
 + return false;
 +}
 +
  #endif /* _ASM_ARM_XEN_HYPERCALL_H */
 
 This will cover both arm and arm64 as arm64 includes the arm header.

I'm fine with this solution.

Regards,

-- 
Julien Grall



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Luis R. Rodriguez
On Thu, Jan 22, 2015 at 12:55:17PM +, David Vrabel wrote:
 On 22/01/15 03:18, Andy Lutomirski wrote:
  --- a/drivers/xen/events/events_base.c
  +++ b/drivers/xen/events/events_base.c
  @@ -32,6 +32,8 @@
   #include <linux/slab.h>
   #include <linux/irqnr.h>
   #include <linux/pci.h>
  +#include <linux/sched.h>
  +#include <linux/kprobes.h>
 
   #ifdef CONFIG_X86
   #include <asm/desc.h>
  @@ -1243,6 +1245,17 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
  set_irq_regs(old_regs);
   }
 
  +notrace void xen_end_upcall(struct pt_regs *regs)
  +{
  +   if (!xen_is_preemptible_hypercall(regs) ||
  +   __this_cpu_read(xed_nesting_count))
  +   return;
  
  What's xed_nesting_count?
 
 It is used to prevent nested upcalls when a hypercall called from an upcall
 triggers another upcall.
 
 There's no way such a nested hypercall can be preemptible, so the check
 for xed_nesting_count can be removed from here.

Removed.

  Luis



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Andy Lutomirski
On Thu, Jan 22, 2015 at 12:37 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 22 Jan 2015 12:24:47 -0800
 Andy Lutomirski l...@amacapital.net wrote:

  Also, please remove the notrace, because function tracing goes an
  extra step to not require RCU being visible. The only thing you get
  with notrace is not being able to trace an otherwise traceable function.
 

 Is this also true for kprobes?  And can kprobes nest inside function
 tracing hooks?

 No, kprobes are a bit more fragile than function tracing or tracepoints.

 And nothing should nest inside a function hook (except for interrupts,
 they are fine).


But kprobes do nest inside interrupts, right?


 The other issue, above and beyond RCU, is that we can't let kprobes
 run on the int3 stack.  If Xen upcalls can happen when interrupts are
 off, then we may need this protection to prevent that type of
 recursion.  (This will be much less scary in 3.20, because userspace
 int3 instructions will no longer execute on the int3 stack.)

 Does this execute between the start of the int3 interrupt handler and
 the call of do_int3()?

I doubt it.

The thing I worry about is that, if do_int3 nests inside itself by any
means (e.g. int3 sends a signal, scheduling for whatever reason
(really shouldn't happen, but I haven't looked that hard)), then we're
completely hosed -- the inner int3 will overwrite the outer int3's
stack frame.  Since I have no idea what Xen upcalls do, I don't know
whether they can fire inside do_int3.
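
A toy model of that failure mode, for anyone not familiar with it. This is
a user-space sketch, not the real entry code: it assumes the int3 stack
behaves like a single fixed slot that every entry starts writing from, and
struct frame, model_int3_entry() and the addresses are invented for the
illustration.

```c
#include <assert.h>

/* Toy exception frame: only the interrupted instruction pointer. */
struct frame {
    unsigned long ip;
};

/* One fixed slot, modelling a dedicated stack that always starts at the same top. */
struct frame int3_stack_top;

/* Model of exception entry: the frame is always written to the same slot. */
struct frame *model_int3_entry(unsigned long interrupted_ip)
{
    int3_stack_top.ip = interrupted_ip;
    return &int3_stack_top;
}

/*
 * Enter once, optionally nest a second entry before the first returns,
 * and report the return address the outer handler would see.
 */
unsigned long outer_return_ip(int nest)
{
    struct frame *outer = model_int3_entry(0x1000);

    if (nest)
        model_int3_entry(0x2000);   /* inner entry reuses the slot */

    return outer->ip;
}
```

Without nesting the outer handler still sees 0x1000; with nesting it sees
the inner entry's 0x2000, i.e. its own frame has been overwritten.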

--Andy



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Paul E. McKenney
On Thu, Jan 22, 2015 at 03:16:57PM -0500, Steven Rostedt wrote:
 
 [ Added Paul McKenney ]
 
 On Thu, 22 Jan 2015 19:39:13 +0100
 Luis R. Rodriguez mcg...@suse.com wrote:
 
   Why not make this a tracepoint? Then you can enable it only when you
   want to. As tracepoints are also hooks, you could add your own code that
   hooks to it and does a printk as well. The advantage of doing it via a
   tracepoint is that you can turn it on and off regardless of what the
   loglevel is set at.
  
  This uses NOKPROBE_SYMBOL and notrace since based on Andy's advice
  we are not confident that tracing and kprobes are safe to use in what
  might be an extended RCU quiescent state (i.e. where we're outside
  irq_enter and irq_exit).
 
 We have trace_*_rcuidle() for such cases.
 
 That is, you create the tracepoint just the same, and instead of having
 trace_foo(), if you are in a known area that is outside of rcu viewing,
 you use trace_foo_rcuidle() and it will tell RCU "hey, there's something
 here that may need RCU, so look at me!"

What Steve said!

Also, there is an rcu_is_watching() API member that can tell you
whether or not RCU is paying attention at a given point.  Or test with
CONFIG_PROVE_RCU, in which case lockdep will yell at you if you should
have used the _rcuidle() form of the tracing hooks.  ;-)

Thanx, Paul

 Also, please remove the notrace, because function tracing goes an
 extra step to not require RCU being visible. The only thing you get
 with notrace is not being able to trace an otherwise traceable function.
 
 -- Steve
 
  
   That is, if there is any practical use for that message. Tracing just
   sched_switch will give you the same info.
  
  IMHO it may be more useful if we knew exactly what hypercalls were
  being preempted but perhaps all that can be left as a secondary
  exercise and for now I'll just nuke the print.
 




Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Steven Rostedt
On Thu, 22 Jan 2015 12:58:00 -0800
Andy Lutomirski l...@amacapital.net wrote:

 On Thu, Jan 22, 2015 at 12:37 PM, Steven Rostedt rost...@goodmis.org wrote:
  On Thu, 22 Jan 2015 12:24:47 -0800
  Andy Lutomirski l...@amacapital.net wrote:
 
   Also, please remove the notrace, because function tracing goes an
   extra step to not require RCU being visible. The only thing you get
   with notrace is not being able to trace an otherwise traceable function.
  
 
  Is this also true for kprobes?  And can kprobes nest inside function
  tracing hooks?
 
  No, kprobes are a bit more fragile than function tracing or tracepoints.
 
  And nothing should nest inside a function hook (except for interrupts,
  they are fine).
 
 
 But kprobes do nest inside interrupts, right?

A kprobe being called while a function trace is happening is fine, but
you should not have the kprobe set directly inside the function trace
callback code, because that means a kprobe could happen anywhere
function tracing is happening (for instance, in NMI context).

 
 
  The other issue, above and beyond RCU, is that we can't let kprobes
  run on the int3 stack.  If Xen upcalls can happen when interrupts are
  off, then we may need this protection to prevent that type of
  recursion.  (This will be much less scary in 3.20, because userspace
  int3 instructions will no longer execute on the int3 stack.)
 
  Does this execute between the start of the int3 interrupt handler and
  the call of do_int3()?
 
 I doubt it.
 
 The thing I worry about is that, if do_int3 nests inside itself by any
 means (e.g. int3 sends a signal, scheduling for whatever reason
 (really shouldn't happen, but I haven't looked that hard)), then we're
 completely hosed -- the inner int3 will overwrite the outer int3's
 stack frame.  Since I have no idea what Xen upcalls do, I don't know
 whether they can fire inside do_int3.

I thought there's logic in the do_int3 handler (in the assembly code)
that can handle nested int3s.

I'm not sure what xen does though.

-- Steve



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-22 Thread Andy Lutomirski
On Thu, Jan 22, 2015 at 1:16 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Thu, 22 Jan 2015 12:58:00 -0800
 Andy Lutomirski l...@amacapital.net wrote:

 On Thu, Jan 22, 2015 at 12:37 PM, Steven Rostedt rost...@goodmis.org wrote:
  On Thu, 22 Jan 2015 12:24:47 -0800
  Andy Lutomirski l...@amacapital.net wrote:

 
  The other issue, above and beyond RCU, is that we can't let kprobes
  run on the int3 stack.  If Xen upcalls can happen when interrupts are
  off, then we may need this protection to prevent that type of
  recursion.  (This will be much less scary in 3.20, because userspace
  int3 instructions will no longer execute on the int3 stack.)
 
  Does this execute between the start of the int3 interrupt handler and
  the call of do_int3()?

 I doubt it.

 The thing I worry about is that, if do_int3 nests inside itself by any
 means (e.g. int3 sends a signal, scheduling for whatever reason
 (really shouldn't happen, but I haven't looked that hard)), then we're
 completely hosed -- the inner int3 will overwrite the outer int3's
 stack frame.  Since I have no idea what Xen upcalls do, I don't know
 whether they can fire inside do_int3.

 I thought there's logic in the do_int3 handler (in the assembly code)
 that can handle nested int3s.

Nope :(

In 3.20, there's likely to be logic that can handle a single level of
nesting as long as the outer one came from user space.

--Andy



Re: [Xen-devel] [RFC v3 2/2] x86/xen: allow privcmd hypercalls to be preempted

2015-01-21 Thread Andy Lutomirski
On Wed, Jan 21, 2015 at 6:17 PM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:
 From: Luis R. Rodriguez mcg...@suse.com

 Xen has support for splitting heavy work into a series
 of hypercalls, called multicalls, and preempting them through
 what Xen calls continuation [0]. Despite this, without
 CONFIG_PREEMPT preemption won't happen, and without preemption
 a system can become pretty useless on heavy-handed hypercalls.
 Such is the case, for example, when creating a > 50 GiB HVM guest,
 where we can get soft lockups [1] with:

 kernel: [  802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]

 The soft lockup triggers on the TASK_UNINTERRUPTIBLE hanger check
 (default 120 seconds), on the Xen side in this particular case
 this happens when the following Xen hypervisor code is used:

 xc_domain_set_pod_target() --
   do_memory_op() --
 arch_memory_op() --
   p2m_pod_set_mem_target()
 -- long delay (real or emulated) --

 This happens on arch_memory_op() on the XENMEM_set_pod_target memory
 op even though arch_memory_op() can handle continuation via
 hypercall_create_continuation() for example.

 Machines over 50 GiB of memory are on high demand and hard to come
 by so to help replicate this sort of issue long delays on select
 hypercalls have been emulated in order to be able to test this on
 smaller machines [2].

 On one hand this issue can be considered as expected given that
 CONFIG_PREEMPT=n is used however we have forced voluntary preemption
 precedent practices in the kernel even for CONFIG_PREEMPT=n through
 the usage of cond_resched() sprinkled in many places. To address
 this issue with Xen hypercalls, though, we need to find a way to aid
 the scheduler in the middle of hypercalls. We are motivated to
 address this issue on CONFIG_PREEMPT=n as otherwise the system becomes
 rather unresponsive for long periods of time; in the worst case, at least
 only currently by emulating long delays on select io disk bound
 hypercalls, this can lead to filesystem corruption if the delay happens
 for example on SCHEDOP_remote_shutdown (when we call 'xl domain shutdown').

 We can address this problem by trying to check if we should schedule
 on the xen timer in the middle of a hypercall on the return from the
 timer interrupt. We want to be careful to not always force voluntary
 preemption though so to do this we only selectively enable preemption
 on very specific xen hypercalls.

 This enables hypercall preemption by selectively forcing checks for
 voluntary preempting only on ioctl initiated private hypercalls
 where we know some folks have run into reported issues [1].

 [0] 
 http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
 [1] https://bugzilla.novell.com/show_bug.cgi?id=861093
 [2] 
 http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch

 Based on original work by: David Vrabel david.vra...@citrix.com
 Suggested-by: Andy Lutomirski l...@amacapital.net
 Cc: Andy Lutomirski l...@amacapital.net
 Cc: Borislav Petkov b...@suse.de
 Cc: David Vrabel david.vra...@citrix.com
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@redhat.com
 Cc: H. Peter Anvin h...@zytor.com
 Cc: x...@kernel.org
 Cc: Steven Rostedt rost...@goodmis.org
 Cc: Masami Hiramatsu masami.hiramatsu...@hitachi.com
 Cc: Jan Beulich jbeul...@suse.com
 Cc: linux-ker...@vger.kernel.org
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  arch/x86/kernel/entry_32.S   |  2 ++
  arch/x86/kernel/entry_64.S   |  2 ++
  drivers/xen/events/events_base.c | 13 +
  include/xen/events.h |  1 +
  4 files changed, 18 insertions(+)

 diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
 index 000d419..b4b1f42 100644
 --- a/arch/x86/kernel/entry_32.S
 +++ b/arch/x86/kernel/entry_32.S
 @@ -982,6 +982,8 @@ ENTRY(xen_hypervisor_callback)
  ENTRY(xen_do_upcall)
  1: mov %esp, %eax
 call xen_evtchn_do_upcall
 +   movl %esp,%eax
 +   call xen_end_upcall
 jmp  ret_from_intr
 CFI_ENDPROC
  ENDPROC(xen_hypervisor_callback)
 diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
 index 9ebaf63..ee28733 100644
 --- a/arch/x86/kernel/entry_64.S
 +++ b/arch/x86/kernel/entry_64.S
 @@ -1198,6 +1198,8 @@ ENTRY(xen_do_hypervisor_callback)   # 
 do_hypervisor_callback(struct *pt_regs)
 popq %rsp
 CFI_DEF_CFA_REGISTER rsp
 decl PER_CPU_VAR(irq_count)
 +   movq %rsp, %rdi  /* pass pt_regs as first argument */
 +   call xen_end_upcall
 jmp  error_exit
 CFI_ENDPROC
  END(xen_do_hypervisor_callback)
 diff --git a/drivers/xen/events/events_base.c 
 b/drivers/xen/events/events_base.c
 index b4bca2d..23c526b 100644
 --- a/drivers/xen/events/events_base.c
 +++ b/drivers/xen/events/events_base.c
 @@ -32,6 +32,8 @@
  #include <linux/slab.h>
  #include <linux/irqnr.h>
  #include <linux/pci.h>