Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Thu, Dec 18, 2014 at 02:23:14PM -0500, Konrad Rzeszutek Wilk wrote: > > index 000..b5a3e98 > > --- /dev/null > > +++ b/drivers/xen/preempt.c > > @@ -0,0 +1,17 @@ > > +/* > > + * Preemptible hypercalls > > + * > > + * Copyright (C) 2014 Citrix Systems R&D ltd. > > + * > > + * This source code is free software; you can redistribute it and/or > > + * modify it under the terms of the GNU General Public License as > > + * published by the Free Software Foundation; either version 2 of the > > + * License, or (at your option) any later version. > > + */ > > + > > +#include > > + > > +#ifndef CONFIG_PREEMPT > > +DEFINE_PER_CPU(bool, xen_in_preemptible_hcall); > > +EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall); > > +#endif > > Please also add this in the patch: > > > diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c > index b5a3e98..5d773dc 100644 > --- a/drivers/xen/preempt.c > +++ b/drivers/xen/preempt.c > @@ -13,5 +13,5 @@ > > #ifndef CONFIG_PREEMPT > DEFINE_PER_CPU(bool, xen_in_preemptible_hcall); > -EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall); > +EXPORT_PER_CPU_SYMBOL_GPL(xen_in_preemptible_hcall); > #endif Amended, although I think we now want another approach based on Andy Lutomirski's recommendations. Luis ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
> index 000..b5a3e98 > --- /dev/null > +++ b/drivers/xen/preempt.c > @@ -0,0 +1,17 @@ > +/* > + * Preemptible hypercalls > + * > + * Copyright (C) 2014 Citrix Systems R&D ltd. > + * > + * This source code is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License as > + * published by the Free Software Foundation; either version 2 of the > + * License, or (at your option) any later version. > + */ > + > +#include > + > +#ifndef CONFIG_PREEMPT > +DEFINE_PER_CPU(bool, xen_in_preemptible_hcall); > +EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall); > +#endif Please also add this in the patch: diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c index b5a3e98..5d773dc 100644 --- a/drivers/xen/preempt.c +++ b/drivers/xen/preempt.c @@ -13,5 +13,5 @@ #ifndef CONFIG_PREEMPT DEFINE_PER_CPU(bool, xen_in_preemptible_hcall); -EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall); +EXPORT_PER_CPU_SYMBOL_GPL(xen_in_preemptible_hcall); #endif
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Thu, Dec 11, 2014 at 11:09:42AM +, David Vrabel wrote: > On 10/12/14 23:51, Andy Lutomirski wrote: > > On Wed, Dec 10, 2014 at 3:34 PM, Luis R. Rodriguez > > All that being said, this is IMO a bit gross. You've added a bunch of > > asm that's kind of like a parallel error_exit, and the error entry and > > exit code is hairy enough that this scares me. Can you do this mostly > > in C instead? This would look nicer if it could be: > > I abandoned my initial attempt that looked like this because I thought > it was gross too. > > > call xen_evtchn_do_upcall > > popq %rsp > > CFI_DEF_CFA_REGISTER rsp > > decl PER_CPU_VAR(irq_count) > > + call xen_end_upcall > > jmp error_exit > > > > Where xen_end_upcall would be written in C, nokprobes and notrace (if > > needed) and would check pt_regs and whatever else and just call > > schedule if needed? > > Oh that's a good idea, thanks! David, are you going to respin this yourself with the goal of getting it upstream? If so, I can move on to other matters. Luis
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Thu, Dec 11, 2014 at 10:47:44AM -0800, H. Peter Anvin wrote: > On 12/10/2014 05:03 PM, Luis R. Rodriguez wrote: > > > > This is an issue only for *non*-preemptive kernels. > > > > Some of Xen's hypercalls can take a long time and unfortunately for > > *non*-preemptive kernels this can be quite an issue. > > We've handled situations like this with cond_resched() before, which > > pushes even *non*-preemptive kernels to behave as voluntarily preemptive. > > I was not aware of the extent to which this was done and the precedents set, but > > it's pretty widespread now... This then just addresses one particular > > case where this is also an issue, but now in IRQ context. > > > > I agree it's a hack, but so are all the other cond_resched() calls then. > > I don't think it's a good idea to be spreading use of something like > > this everywhere, but after careful review and trying to avoid this > > exact code for a while I have not been able to find any other reasonable > > alternative. > > > > This sounds like a patch that is completely unrelated to the rest of the > patch. If you mean architecture and design, then yes; however, this patch tries to find a resolution within the existing architecture. Luis
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On 12/10/2014 05:03 PM, Luis R. Rodriguez wrote: > > This is an issue only for *non*-preemptive kernels. > > Some of Xen's hypercalls can take a long time and unfortunately for > *non*-preemptive kernels this can be quite an issue. > We've handled situations like this with cond_resched() before, which > pushes even *non*-preemptive kernels to behave as voluntarily preemptive. > I was not aware of the extent to which this was done and the precedents set, but > it's pretty widespread now... This then just addresses one particular > case where this is also an issue, but now in IRQ context. > > I agree it's a hack, but so are all the other cond_resched() calls then. > I don't think it's a good idea to be spreading use of something like > this everywhere, but after careful review and trying to avoid this > exact code for a while I have not been able to find any other reasonable > alternative. > This sounds like a patch that is completely unrelated to the rest of the patch. -hpa
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On 10/12/14 23:51, Andy Lutomirski wrote: > On Wed, Dec 10, 2014 at 3:34 PM, Luis R. Rodriguez >> --- a/arch/x86/kernel/entry_64.S >> +++ b/arch/x86/kernel/entry_64.S >> @@ -1170,7 +1170,23 @@ ENTRY(xen_do_hypervisor_callback) # >> do_hypervisor_callback(struct *pt_regs) >> popq %rsp >> CFI_DEF_CFA_REGISTER rsp >> decl PER_CPU_VAR(irq_count) >> +#ifdef CONFIG_PREEMPT >> jmp error_exit >> +#else >> + movl %ebx, %eax >> + RESTORE_REST >> + DISABLE_INTERRUPTS(CLBR_NONE) >> + TRACE_IRQS_OFF >> + GET_THREAD_INFO(%rcx) >> + testl %eax, %eax >> + je error_exit_user >> + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) >> + jz retint_kernel > > I think I understand this part. > >> + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > > Why? Is the issue that, if preemptible hypercalls nest, you don't > want to preempt again? We need to clear and reset this per-CPU variable around the schedule point since the current task may be rescheduled on a different CPU, or we may switch to a task that was previously preempted at this point. That this prevents nested preemption is fine because we only want hypercalls issued by the privcmd driver on behalf of userspace to be preemptible. >> + call cond_resched_irq > > On !CONFIG_PREEMPT, there's no preempt_disable, right? So how do you > guarantee that you don't preempt something you shouldn't? Is the idea > that these events will only fire nested *directly* inside a > preemptible hypercall? Also, should you check that IRQs were on when > the event fired? (Are they on in pt_regs?) Testing xen_in_preemptible_hcall is sufficient. We bracket the hypercalls we want to be preemptible like so: xen_preemptible_hcall_begin(); ret = privcmd_call(hypercall.op, hypercall.arg[0], hypercall.arg[1], hypercall.arg[2], hypercall.arg[3], hypercall.arg[4]); xen_preemptible_hcall_end(); begin() and end() are somewhat like a Xen-specific preempt_enable() and preempt_disable(), overriding the default no-preempt state.
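The bracketing David describes can be modelled in userspace. This is a sketch under stated assumptions, not the kernel code: a `_Thread_local` flag stands in for the per-CPU `xen_in_preemptible_hcall`, `fake_privcmd_call()` and `privcmd_ioctl_hypercall_model()` are hypothetical stand-ins for `privcmd_call()` and the privcmd ioctl path, and the `CONFIG_PREEMPT` switch mirrors the `#ifndef CONFIG_PREEMPT` guard in `drivers/xen/preempt.c` (on preemptible kernels the helpers are assumed here to be no-ops).

```c
#include <assert.h>
#include <stdbool.h>

#ifdef CONFIG_PREEMPT
/* Preemptible kernels need no flag; the helpers are assumed to be no-ops. */
static inline void xen_preemptible_hcall_begin(void) {}
static inline void xen_preemptible_hcall_end(void) {}
static inline bool xen_in_preemptible_hcall_now(void) { return false; }
#else
/* Model only: thread-local storage stands in for the per-CPU variable. */
static _Thread_local bool xen_in_preemptible_hcall;

static inline void xen_preemptible_hcall_begin(void)
{
	xen_in_preemptible_hcall = true;
}

static inline void xen_preemptible_hcall_end(void)
{
	xen_in_preemptible_hcall = false;
}

static inline bool xen_in_preemptible_hcall_now(void)
{
	return xen_in_preemptible_hcall;
}
#endif

/* Hypothetical stand-in for privcmd_call(): records whether the flag was
 * set while the "hypercall" ran and echoes its op back as the result. */
static bool seen_preemptible;

static long fake_privcmd_call(long op)
{
	seen_preemptible = xen_in_preemptible_hcall_now();
	return op;
}

/* Only this call site, issued on behalf of userspace, is marked preemptible. */
static long privcmd_ioctl_hypercall_model(long op)
{
	long ret;

	xen_preemptible_hcall_begin();
	ret = fake_privcmd_call(op);
	xen_preemptible_hcall_end();
	return ret;
}
```

Any hypercall issued outside the bracket sees the flag clear, so an interrupt-return check keyed on it leaves ordinary kernel code alone.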
>> + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall) >> + jmp retint_kernel >> +#endif /* CONFIG_PREEMPT */ >> CFI_ENDPROC > > All that being said, this is IMO a bit gross. You've added a bunch of > asm that's kind of like a parallel error_exit, and the error entry and > exit code is hairy enough that this scares me. Can you do this mostly > in C instead? This would look nicer if it could be: I abandoned my initial attempt that looked like this because I thought it was gross too. > call xen_evtchn_do_upcall > popq %rsp > CFI_DEF_CFA_REGISTER rsp > decl PER_CPU_VAR(irq_count) > + call xen_end_upcall > jmp error_exit > > Where xen_end_upcall would be written in C, nokprobes and notrace (if > needed) and would check pt_regs and whatever else and just call > schedule if needed? Oh that's a good idea, thanks! David
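Andy's suggestion moves the decision into a C function called just before `jmp error_exit`. A userspace model of what such a hook might check follows; it is a sketch, not the eventual kernel code. `struct model_regs`, `need_resched_flag`, and `model_reschedule()` are hypothetical stand-ins for `pt_regs`, the scheduler's need-resched state, and `cond_resched_irq()`/`schedule()`. The model also encodes David's point that the flag is cleared around the reschedule point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of pt_regs the hook would inspect. */
struct model_regs {
	bool user_mode;   /* in the kernel this would be derived from CS */
};

/* Per-CPU in the kernel; thread-local in this model. */
static _Thread_local bool in_preemptible_hcall;
static bool need_resched_flag;   /* set by a simulated timer tick */
static int resched_count;

/* Stand-in for cond_resched_irq()/schedule(). */
static void model_reschedule(void)
{
	/* The flag must be clear here: the task may resume on another CPU,
	 * or we may switch to a task that was preempted at this same point. */
	assert(!in_preemptible_hcall);
	resched_count++;
	need_resched_flag = false;
}

/* Sketch of an end-of-upcall hook written in C instead of asm. */
static void xen_end_upcall_model(const struct model_regs *regs)
{
	if (regs->user_mode)
		return;   /* the normal return-to-user path handles rescheduling */
	if (!in_preemptible_hcall || !need_resched_flag)
		return;   /* ordinary kernel code stays non-preemptible */

	in_preemptible_hcall = false;
	model_reschedule();
	in_preemptible_hcall = true;
}
```

Keeping this logic in C would leave the asm change at a single `call`, which is the readability benefit Andy is after.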
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Wed, Dec 10, 2014 at 4:55 PM, Luis R. Rodriguez wrote: > On Wed, Dec 10, 2014 at 03:51:48PM -0800, Andy Lutomirski wrote: >> On Wed, Dec 10, 2014 at 3:34 PM, Luis R. Rodriguez >> wrote: >> > From: "Luis R. Rodriguez" >> > >> > Xen has support for splitting heavy work into a series >> > of hypercalls, called multicalls, and preempting them through >> > what Xen calls continuation [0]. Despite this, without >> > CONFIG_PREEMPT preemption won't happen, and while enabling >> > CONFIG_RT_GROUP_SCHED can at times help, it's not enough to >> > make a system usable. Such is the case, for example, when >> > creating a > 50 GiB HVM guest, where we can get soft lockups [1] such as: >> > >> > kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351] >> > >> > The soft lockup triggers on the TASK_UNINTERRUPTIBLE hung task check >> > (default 120 seconds); on the Xen side, in this particular case, >> > this happens when the following Xen hypervisor code is used: >> > >> > xc_domain_set_pod_target() --> >> > do_memory_op() --> >> > arch_memory_op() --> >> > p2m_pod_set_mem_target() >> > -- long delay (real or emulated) -- >> > >> > This happens in arch_memory_op() on the XENMEM_set_pod_target memory >> > op, even though arch_memory_op() can handle continuation, for example >> > via hypercall_create_continuation(). >> > >> > Machines with over 50 GiB of memory are in high demand and hard to come >> > by, so to help replicate this sort of issue, long delays on select >> > hypercalls have been emulated in order to be able to test this on >> > smaller machines [2]. >> > >> > On one hand, this issue can be considered expected given that >> > CONFIG_PREEMPT=n is used; however, the kernel has precedent for forcing >> > voluntary preemption even with CONFIG_PREEMPT=n through >> > the usage of cond_resched() sprinkled in many places. To address >> > this issue with Xen hypercalls, though, we need to find a way to aid >> > the scheduler in the middle of hypercalls.
We are motivated to >> > address this issue on CONFIG_PREEMPT=n because otherwise the system becomes >> > rather unresponsive for long periods of time; in the worst case (so far >> > seen only when emulating long delays on select disk I/O-bound >> > hypercalls), this can lead to filesystem corruption if the delay happens, >> > for example, on SCHEDOP_remote_shutdown (when we call 'xl >> > shutdown'). >> > >> > We can address this problem by checking whether we should schedule >> > on the Xen timer in the middle of a hypercall, on return from the >> > timer interrupt. We want to be careful not to always force voluntary >> > preemption, though, so we only selectively enable preemption >> > on very specific Xen hypercalls. >> > >> > This enables hypercall preemption by selectively forcing checks for >> > voluntary preemption only on ioctl-initiated private hypercalls, >> > where we know some folks have run into the reported issues [1]. >> > >> > [0] >> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9 >> > [1] https://bugzilla.novell.com/show_bug.cgi?id=861093 >> > [2] >> > http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch >> > >> > Based on original work by: David Vrabel >> > Cc: Borislav Petkov >> > Cc: David Vrabel >> > Cc: Thomas Gleixner >> > Cc: Ingo Molnar >> > Cc: "H. Peter Anvin" >> > Cc: x...@kernel.org >> > Cc: Andy Lutomirski >> > Cc: Steven Rostedt >> > Cc: Masami Hiramatsu >> > Cc: Jan Beulich >> > Cc: linux-ker...@vger.kernel.org >> > Signed-off-by: Luis R.
Rodriguez >> > --- >> > arch/x86/kernel/entry_32.S | 21 + >> > arch/x86/kernel/entry_64.S | 17 + >> > drivers/xen/Makefile | 2 +- >> > drivers/xen/preempt.c | 17 + >> > drivers/xen/privcmd.c | 2 ++ >> > include/xen/xen-ops.h | 26 ++ >> > 6 files changed, 84 insertions(+), 1 deletion(-) >> > create mode 100644 drivers/xen/preempt.c >> > >> > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S >> > index 344b63f..40b5c0c 100644 >> > --- a/arch/x86/kernel/entry_32.S >> > +++ b/arch/x86/kernel/entry_32.S >> > @@ -982,7 +982,28 @@ ENTRY(xen_hypervisor_callback) >> > ENTRY(xen_do_upcall) >> > 1: mov %esp, %eax >> > call xen_evtchn_do_upcall >> > +#ifdef CONFIG_PREEMPT >> > jmp ret_from_intr >> > +#else >> > + GET_THREAD_INFO(%ebp) >> > +#ifdef CONFIG_VM86 >> > + movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS >> > + movb PT_CS(%esp), %al >> > + andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax >> > +#else >> > + movl PT_CS(%esp), %eax >> > + andl $SEGMENT_RPL_MASK, %eax >> > +#endif >> > + cmpl $USER_RPL, %eax >> > + jae resume_userspace# returning to v8086 or userspace >> > + DISABLE_INTERRUPTS(CLBR_ANY) >> > + cmpb $0,
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Wed, Dec 10, 2014 at 04:29:06PM -0800, H. Peter Anvin wrote: > On 12/10/2014 03:34 PM, Luis R. Rodriguez wrote: > > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S > > index 344b63f..40b5c0c 100644 > > --- a/arch/x86/kernel/entry_32.S > > +++ b/arch/x86/kernel/entry_32.S > > @@ -982,7 +982,28 @@ ENTRY(xen_hypervisor_callback) > > ENTRY(xen_do_upcall) > > 1: mov %esp, %eax > > call xen_evtchn_do_upcall > > +#ifdef CONFIG_PREEMPT > > jmp ret_from_intr > > +#else > > + GET_THREAD_INFO(%ebp) > > +#ifdef CONFIG_VM86 > > + movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS > > + movb PT_CS(%esp), %al > > + andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax > > +#else > > + movl PT_CS(%esp), %eax > > + andl $SEGMENT_RPL_MASK, %eax > > +#endif > > + cmpl $USER_RPL, %eax > > + jae resume_userspace# returning to v8086 or userspace > > + DISABLE_INTERRUPTS(CLBR_ANY) > > + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > > + jz resume_kernel > > + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > > + call cond_resched_irq > > + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall) > > + jmp resume_kernel > > +#endif /* CONFIG_PREEMPT */ > > CFI_ENDPROC > > ENDPROC(xen_hypervisor_callback) > > > > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S > > index c0226ab..0ccdd06 100644 > > --- a/arch/x86/kernel/entry_64.S > > +++ b/arch/x86/kernel/entry_64.S > > @@ -1170,7 +1170,23 @@ ENTRY(xen_do_hypervisor_callback) # > > do_hypervisor_callback(struct *pt_regs) > > popq %rsp > > CFI_DEF_CFA_REGISTER rsp > > decl PER_CPU_VAR(irq_count) > > +#ifdef CONFIG_PREEMPT > > jmp error_exit > > +#else > > + movl %ebx, %eax > > + RESTORE_REST > > + DISABLE_INTERRUPTS(CLBR_NONE) > > + TRACE_IRQS_OFF > > + GET_THREAD_INFO(%rcx) > > + testl %eax, %eax > > + je error_exit_user > > + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > > + jz retint_kernel > > + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > > + call cond_resched_irq > > + movb 
$1,PER_CPU_VAR(xen_in_preemptible_hcall) > > + jmp retint_kernel > > +#endif /* CONFIG_PREEMPT */ > > CFI_ENDPROC > > END(xen_do_hypervisor_callback) > > > > @@ -1398,6 +1414,7 @@ ENTRY(error_exit) > > GET_THREAD_INFO(%rcx) > > testl %eax,%eax > > jne retint_kernel > > +error_exit_user: > > LOCKDEP_SYS_EXIT_IRQ > > movl TI_flags(%rcx),%edx > > movl $_TIF_WORK_MASK,%edi > > You're adding a bunch of code for the *non*-preemptive case here... why? This is an issue only for *non*-preemptive kernels. Some of Xen's hypercalls can take a long time and unfortunately for *non*-preemptive kernels this can be quite an issue. We've handled situations like this with cond_resched() before, which pushes even *non*-preemptive kernels to behave as voluntarily preemptive. I was not aware of the extent to which this was done and the precedents set, but it's pretty widespread now... This then just addresses one particular case where this is also an issue, but now in IRQ context. I agree it's a hack, but so are all the other cond_resched() calls then. I don't think it's a good idea to be spreading use of something like this everywhere, but after careful review and trying to avoid this exact code for a while I have not been able to find any other reasonable alternative. Luis
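The `cond_resched()` precedent Luis cites works by giving long-running kernel loops explicit voluntary preemption points. The userspace model below sketches that pattern only; `model_need_resched` and `model_cond_resched()` are hypothetical stand-ins for the scheduler's need-resched state and `cond_resched()`, and the timer tick is simulated.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_need_resched;   /* set by a simulated timer tick */
static int yields;

/* Stand-in for cond_resched(): yield only if a reschedule is pending. */
static void model_cond_resched(void)
{
	if (model_need_resched) {
		model_need_resched = false;
		yields++;   /* the kernel would call into the scheduler here */
	}
}

/* A long-running loop with one voluntary preemption point per item.
 * tick_every simulates the timer requesting a reschedule periodically;
 * 0 means the timer never fires. */
static long process_items(long n, long tick_every)
{
	long sum = 0;

	assert(n >= 0 && tick_every >= 0);
	for (long i = 0; i < n; i++) {
		sum += i;   /* the real per-item work would happen here */
		if (tick_every > 0 && (i + 1) % tick_every == 0)
			model_need_resched = true;
		model_cond_resched();
	}
	return sum;
}
```

A single long hypercall offers no such guest-side loop to sprinkle calls into, which is why the patch instead places the check on the upcall return path.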
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Wed, Dec 10, 2014 at 03:51:48PM -0800, Andy Lutomirski wrote: > On Wed, Dec 10, 2014 at 3:34 PM, Luis R. Rodriguez > wrote: > > From: "Luis R. Rodriguez" > > > > Xen has support for splitting heavy work into a series > > of hypercalls, called multicalls, and preempting them through > > what Xen calls continuation [0]. Despite this, without > > CONFIG_PREEMPT preemption won't happen, and while enabling > > CONFIG_RT_GROUP_SCHED can at times help, it's not enough to > > make a system usable. Such is the case, for example, when > > creating a > 50 GiB HVM guest, where we can get soft lockups [1] such as: > > > > kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351] > > > > The soft lockup triggers on the TASK_UNINTERRUPTIBLE hung task check > > (default 120 seconds); on the Xen side, in this particular case, > > this happens when the following Xen hypervisor code is used: > > > > xc_domain_set_pod_target() --> > > do_memory_op() --> > > arch_memory_op() --> > > p2m_pod_set_mem_target() > > -- long delay (real or emulated) -- > > > > This happens in arch_memory_op() on the XENMEM_set_pod_target memory > > op, even though arch_memory_op() can handle continuation, for example > > via hypercall_create_continuation(). > > > > Machines with over 50 GiB of memory are in high demand and hard to come > > by, so to help replicate this sort of issue, long delays on select > > hypercalls have been emulated in order to be able to test this on > > smaller machines [2]. > > > > On one hand, this issue can be considered expected given that > > CONFIG_PREEMPT=n is used; however, the kernel has precedent for forcing > > voluntary preemption even with CONFIG_PREEMPT=n through > > the usage of cond_resched() sprinkled in many places. To address > > this issue with Xen hypercalls, though, we need to find a way to aid > > the scheduler in the middle of hypercalls.
We are motivated to > > address this issue on CONFIG_PREEMPT=n because otherwise the system becomes > > rather unresponsive for long periods of time; in the worst case (so far > > seen only when emulating long delays on select disk I/O-bound > > hypercalls), this can lead to filesystem corruption if the delay happens, > > for example, on SCHEDOP_remote_shutdown (when we call 'xl > > shutdown'). > > > > We can address this problem by checking whether we should schedule > > on the Xen timer in the middle of a hypercall, on return from the > > timer interrupt. We want to be careful not to always force voluntary > > preemption, though, so we only selectively enable preemption > > on very specific Xen hypercalls. > > > > This enables hypercall preemption by selectively forcing checks for > > voluntary preemption only on ioctl-initiated private hypercalls, > > where we know some folks have run into the reported issues [1]. > > > > [0] > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9 > > [1] https://bugzilla.novell.com/show_bug.cgi?id=861093 > > [2] > > http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch > > > > Based on original work by: David Vrabel > > Cc: Borislav Petkov > > Cc: David Vrabel > > Cc: Thomas Gleixner > > Cc: Ingo Molnar > > Cc: "H. Peter Anvin" > > Cc: x...@kernel.org > > Cc: Andy Lutomirski > > Cc: Steven Rostedt > > Cc: Masami Hiramatsu > > Cc: Jan Beulich > > Cc: linux-ker...@vger.kernel.org > > Signed-off-by: Luis R.
Rodriguez > > --- > > arch/x86/kernel/entry_32.S | 21 + > > arch/x86/kernel/entry_64.S | 17 + > > drivers/xen/Makefile | 2 +- > > drivers/xen/preempt.c | 17 + > > drivers/xen/privcmd.c | 2 ++ > > include/xen/xen-ops.h | 26 ++ > > 6 files changed, 84 insertions(+), 1 deletion(-) > > create mode 100644 drivers/xen/preempt.c > > > > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S > > index 344b63f..40b5c0c 100644 > > --- a/arch/x86/kernel/entry_32.S > > +++ b/arch/x86/kernel/entry_32.S > > @@ -982,7 +982,28 @@ ENTRY(xen_hypervisor_callback) > > ENTRY(xen_do_upcall) > > 1: mov %esp, %eax > > call xen_evtchn_do_upcall > > +#ifdef CONFIG_PREEMPT > > jmp ret_from_intr > > +#else > > + GET_THREAD_INFO(%ebp) > > +#ifdef CONFIG_VM86 > > + movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS > > + movb PT_CS(%esp), %al > > + andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax > > +#else > > + movl PT_CS(%esp), %eax > > + andl $SEGMENT_RPL_MASK, %eax > > +#endif > > + cmpl $USER_RPL, %eax > > + jae resume_userspace# returning to v8086 or userspace > > + DISABLE_INTERRUPTS(CLBR_ANY) > > + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > > + jz resume_kernel > > + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > > + call cond_resched_irq > > +
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On 12/10/2014 03:34 PM, Luis R. Rodriguez wrote: > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S > index 344b63f..40b5c0c 100644 > --- a/arch/x86/kernel/entry_32.S > +++ b/arch/x86/kernel/entry_32.S > @@ -982,7 +982,28 @@ ENTRY(xen_hypervisor_callback) > ENTRY(xen_do_upcall) > 1: mov %esp, %eax > call xen_evtchn_do_upcall > +#ifdef CONFIG_PREEMPT > jmp ret_from_intr > +#else > + GET_THREAD_INFO(%ebp) > +#ifdef CONFIG_VM86 > + movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS > + movb PT_CS(%esp), %al > + andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax > +#else > + movl PT_CS(%esp), %eax > + andl $SEGMENT_RPL_MASK, %eax > +#endif > + cmpl $USER_RPL, %eax > + jae resume_userspace# returning to v8086 or userspace > + DISABLE_INTERRUPTS(CLBR_ANY) > + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > + jz resume_kernel > + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > + call cond_resched_irq > + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall) > + jmp resume_kernel > +#endif /* CONFIG_PREEMPT */ > CFI_ENDPROC > ENDPROC(xen_hypervisor_callback) > > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S > index c0226ab..0ccdd06 100644 > --- a/arch/x86/kernel/entry_64.S > +++ b/arch/x86/kernel/entry_64.S > @@ -1170,7 +1170,23 @@ ENTRY(xen_do_hypervisor_callback) # > do_hypervisor_callback(struct *pt_regs) > popq %rsp > CFI_DEF_CFA_REGISTER rsp > decl PER_CPU_VAR(irq_count) > +#ifdef CONFIG_PREEMPT > jmp error_exit > +#else > + movl %ebx, %eax > + RESTORE_REST > + DISABLE_INTERRUPTS(CLBR_NONE) > + TRACE_IRQS_OFF > + GET_THREAD_INFO(%rcx) > + testl %eax, %eax > + je error_exit_user > + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > + jz retint_kernel > + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > + call cond_resched_irq > + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall) > + jmp retint_kernel > +#endif /* CONFIG_PREEMPT */ > CFI_ENDPROC > END(xen_do_hypervisor_callback) > > @@ -1398,6 +1414,7 @@ ENTRY(error_exit) > GET_THREAD_INFO(%rcx) > 
testl %eax,%eax > jne retint_kernel > +error_exit_user: > LOCKDEP_SYS_EXIT_IRQ > movl TI_flags(%rcx),%edx > movl $_TIF_WORK_MASK,%edi You're adding a bunch of code for the *non*-preemptive case here... why? -hpa
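For reference, the 32-bit `CONFIG_VM86` branch quoted above decides whether the upcall interrupted user mode by merging the saved EFLAGS and CS into one register. A C rendering of that check follows. The constant values are the x86 architectural ones (EFLAGS.VM is bit 17, a selector's RPL is its low two bits, user mode is RPL 3); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_VM    0x00020000u /* EFLAGS bit 17: virtual-8086 mode */
#define SEGMENT_RPL_MASK 0x3u        /* low two bits of a segment selector */
#define USER_RPL         0x3u        /* ring 3 */

/* CONFIG_VM86 variant: mirrors movl PT_EFLAGS / movb PT_CS / andl / cmpl. */
static bool returned_to_user_vm86(uint32_t eflags, uint16_t cs)
{
	/* movb PT_CS(%esp), %al replaces only the low byte of EFLAGS. */
	uint32_t mixed = (eflags & ~0xffu) | (uint32_t)(cs & 0xffu);

	mixed &= X86_EFLAGS_VM | SEGMENT_RPL_MASK;
	/* jae: taken for RPL 3, or when VM is set (0x20000 exceeds any RPL). */
	return mixed >= USER_RPL;
}

/* !CONFIG_VM86 variant: only the RPL of the saved CS matters. */
static bool returned_to_user(uint16_t cs)
{
	return (cs & SEGMENT_RPL_MASK) >= USER_RPL;
}
```

Merging the two fields lets a single unsigned comparison cover both the ring-3 case and the virtual-8086 case.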
Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Wed, Dec 10, 2014 at 3:34 PM, Luis R. Rodriguez wrote: > From: "Luis R. Rodriguez" > > Xen has support for splitting heavy work into a series > of hypercalls, called multicalls, and preempting them through > what Xen calls continuation [0]. Despite this, without > CONFIG_PREEMPT preemption won't happen, and while enabling > CONFIG_RT_GROUP_SCHED can at times help, it's not enough to > make a system usable. Such is the case, for example, when > creating a > 50 GiB HVM guest, where we can get soft lockups [1] such as: > > kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351] > > The soft lockup triggers on the TASK_UNINTERRUPTIBLE hung task check > (default 120 seconds); on the Xen side, in this particular case, > this happens when the following Xen hypervisor code is used: > > xc_domain_set_pod_target() --> > do_memory_op() --> > arch_memory_op() --> > p2m_pod_set_mem_target() > -- long delay (real or emulated) -- > > This happens in arch_memory_op() on the XENMEM_set_pod_target memory > op, even though arch_memory_op() can handle continuation, for example > via hypercall_create_continuation(). > > Machines with over 50 GiB of memory are in high demand and hard to come > by, so to help replicate this sort of issue, long delays on select > hypercalls have been emulated in order to be able to test this on > smaller machines [2]. > > On one hand, this issue can be considered expected given that > CONFIG_PREEMPT=n is used; however, the kernel has precedent for forcing > voluntary preemption even with CONFIG_PREEMPT=n through > the usage of cond_resched() sprinkled in many places. To address > this issue with Xen hypercalls, though, we need to find a way to aid > the scheduler in the middle of hypercalls.
We are motivated to > address this issue on CONFIG_PREEMPT=n because otherwise the system becomes > rather unresponsive for long periods of time; in the worst case (so far > seen only when emulating long delays on select disk I/O-bound > hypercalls), this can lead to filesystem corruption if the delay happens, > for example, on SCHEDOP_remote_shutdown (when we call 'xl shutdown'). > > We can address this problem by checking whether we should schedule > on the Xen timer in the middle of a hypercall, on return from the > timer interrupt. We want to be careful not to always force voluntary > preemption, though, so we only selectively enable preemption > on very specific Xen hypercalls. > > This enables hypercall preemption by selectively forcing checks for > voluntary preemption only on ioctl-initiated private hypercalls, > where we know some folks have run into the reported issues [1]. > > [0] > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9 > [1] https://bugzilla.novell.com/show_bug.cgi?id=861093 > [2] > http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch > > Based on original work by: David Vrabel > Cc: Borislav Petkov > Cc: David Vrabel > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: x...@kernel.org > Cc: Andy Lutomirski > Cc: Steven Rostedt > Cc: Masami Hiramatsu > Cc: Jan Beulich > Cc: linux-ker...@vger.kernel.org > Signed-off-by: Luis R.
Rodriguez > --- > arch/x86/kernel/entry_32.S | 21 + > arch/x86/kernel/entry_64.S | 17 + > drivers/xen/Makefile | 2 +- > drivers/xen/preempt.c | 17 + > drivers/xen/privcmd.c | 2 ++ > include/xen/xen-ops.h | 26 ++ > 6 files changed, 84 insertions(+), 1 deletion(-) > create mode 100644 drivers/xen/preempt.c > > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S > index 344b63f..40b5c0c 100644 > --- a/arch/x86/kernel/entry_32.S > +++ b/arch/x86/kernel/entry_32.S > @@ -982,7 +982,28 @@ ENTRY(xen_hypervisor_callback) > ENTRY(xen_do_upcall) > 1: mov %esp, %eax > call xen_evtchn_do_upcall > +#ifdef CONFIG_PREEMPT > jmp ret_from_intr > +#else > + GET_THREAD_INFO(%ebp) > +#ifdef CONFIG_VM86 > + movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS > + movb PT_CS(%esp), %al > + andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax > +#else > + movl PT_CS(%esp), %eax > + andl $SEGMENT_RPL_MASK, %eax > +#endif > + cmpl $USER_RPL, %eax > + jae resume_userspace# returning to v8086 or userspace > + DISABLE_INTERRUPTS(CLBR_ANY) > + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > + jz resume_kernel > + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) > + call cond_resched_irq > + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall) > + jmp resume_kernel > +#endif /* CONFIG_PREEMPT */ > CFI_ENDPROC > ENDPROC(xen_hypervisor_callback) > > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S > index c0226ab..0ccdd06 100644 > --- a/arch/x86/kernel/e
[Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted
From: "Luis R. Rodriguez" Xen has support for splitting heavy work into a series of hypercalls, called multicalls, and preempting them through what Xen calls continuation [0]. Despite this, without CONFIG_PREEMPT preemption won't happen, and while enabling CONFIG_RT_GROUP_SCHED can at times help, it's not enough to make a system usable. Such is the case, for example, when creating a > 50 GiB HVM guest, where we can get soft lockups [1] such as: kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351] The soft lockup triggers on the TASK_UNINTERRUPTIBLE hung task check (default 120 seconds); on the Xen side, in this particular case, this happens when the following Xen hypervisor code is used: xc_domain_set_pod_target() --> do_memory_op() --> arch_memory_op() --> p2m_pod_set_mem_target() -- long delay (real or emulated) -- This happens in arch_memory_op() on the XENMEM_set_pod_target memory op, even though arch_memory_op() can handle continuation, for example via hypercall_create_continuation(). Machines with over 50 GiB of memory are in high demand and hard to come by, so to help replicate this sort of issue, long delays on select hypercalls have been emulated in order to be able to test this on smaller machines [2]. On one hand, this issue can be considered expected given that CONFIG_PREEMPT=n is used; however, the kernel has precedent for forcing voluntary preemption even with CONFIG_PREEMPT=n through the usage of cond_resched() sprinkled in many places. To address this issue with Xen hypercalls, though, we need to find a way to aid the scheduler in the middle of hypercalls. We are motivated to address this issue on CONFIG_PREEMPT=n because otherwise the system becomes rather unresponsive for long periods of time; in the worst case (so far seen only when emulating long delays on select disk I/O-bound hypercalls), this can lead to filesystem corruption if the delay happens, for example, on SCHEDOP_remote_shutdown (when we call 'xl shutdown').
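The continuation mechanism referenced in [0] can be pictured as a long operation that processes work in chunks and, when preemption is pending, records its progress and returns so that the same call can be re-issued. The model below is a hypothetical userspace sketch of that idea, not Xen's `hypercall_create_continuation()` implementation; `struct pod_op`, `set_target_chunked()`, `MODEL_ERESTART`, and the chunk size are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ERESTART 1  /* hypothetical "re-issue this call" return code */

/* Hypothetical operation state; progress survives across re-issues. */
struct pod_op {
	long target;  /* units of work requested */
	long done;    /* units completed so far */
};

/* Do the work in chunks; if preemption is pending between chunks, return
 * MODEL_ERESTART so the caller re-issues the same call to continue. */
static int set_target_chunked(struct pod_op *op, long chunk,
			      bool (*preempt_pending)(void))
{
	assert(chunk > 0);
	while (op->done < op->target) {
		long step = op->target - op->done;

		if (step > chunk)
			step = chunk;
		op->done += step;  /* the real work would happen here */
		if (op->done < op->target && preempt_pending())
			return MODEL_ERESTART;
	}
	return 0;
}

static bool always_pending(void)
{
	return true;
}
```

Per the description above, continuation alone is not enough: without CONFIG_PREEMPT the guest kernel still never reschedules while the privcmd hypercall is in progress, however the hypervisor chunks the work.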
We can address this problem by checking whether we should schedule on the Xen timer in the middle of a hypercall, on return from the timer interrupt. We want to be careful not to always force voluntary preemption, though, so we only selectively enable preemption on very specific Xen hypercalls. This enables hypercall preemption by selectively forcing checks for voluntary preemption only on ioctl-initiated private hypercalls, where we know some folks have run into the reported issues [1]. [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9 [1] https://bugzilla.novell.com/show_bug.cgi?id=861093 [2] http://ftp.suse.com/pub/people/mcgrof/xen/emulate-long-xen-hypercalls.patch Based on original work by: David Vrabel Cc: Borislav Petkov Cc: David Vrabel Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: x...@kernel.org Cc: Andy Lutomirski Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Jan Beulich Cc: linux-ker...@vger.kernel.org Signed-off-by: Luis R.
Rodriguez --- arch/x86/kernel/entry_32.S | 21 + arch/x86/kernel/entry_64.S | 17 + drivers/xen/Makefile | 2 +- drivers/xen/preempt.c | 17 + drivers/xen/privcmd.c | 2 ++ include/xen/xen-ops.h | 26 ++ 6 files changed, 84 insertions(+), 1 deletion(-) create mode 100644 drivers/xen/preempt.c diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index 344b63f..40b5c0c 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -982,7 +982,28 @@ ENTRY(xen_hypervisor_callback) ENTRY(xen_do_upcall) 1: mov %esp, %eax call xen_evtchn_do_upcall +#ifdef CONFIG_PREEMPT jmp ret_from_intr +#else + GET_THREAD_INFO(%ebp) +#ifdef CONFIG_VM86 + movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS + movb PT_CS(%esp), %al + andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax +#else + movl PT_CS(%esp), %eax + andl $SEGMENT_RPL_MASK, %eax +#endif + cmpl $USER_RPL, %eax + jae resume_userspace# returning to v8086 or userspace + DISABLE_INTERRUPTS(CLBR_ANY) + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall) + jz resume_kernel + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall) + call cond_resched_irq + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall) + jmp resume_kernel +#endif /* CONFIG_PREEMPT */ CFI_ENDPROC ENDPROC(xen_hypervisor_callback) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index c0226ab..0ccdd06 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1170,7 +1170,23 @@ ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) popq %rsp CFI_DEF_CFA_REGISTER rsp decl PER_CPU_VAR(irq_count) +#ifdef CONFIG_PREEMPT jmp error_exit +#e