I got to the bottom of this now:

1. This has been around for a long time.

2. It was not showing up because:

     2.1. The deactivate_fd/reactivate_fd in the poll based IRQ 
controller double up as per-fd reentrancy guard for SIGIO. So while 
SIGIO handler was reentrant (potentially bad), the actual device reading 
and writing were not. This is why this was not blowing up regularly so 
far. Even if there are potential races in modifying the poll structures, 
etc, they are very difficult to trigger.

     2.2. The VTALRM based timer system produced timer interrupts only 
when UML was running or out of idle. This ensured that there is only one 
timer interrupt in flight at any given time.

The new timer subsystem + me trying to move to EPOLL brought the 
gremlins out of the woodwork (which is good, we know that they are there 
now).

I am going to reissue the timer errata + IRQ guard setting a guard only 
around the timer. The per-fd disable/reenable behaviour provides a 
sufficient guard for SIGIO so while you can crudely guard for both (as I 
have in the first version of the reentrancy patch), you do not need to 
do that. A guard for the timer is sufficient.

I now have a fixed version for the EPOLL patch which replicates the 
per-fd old reentrancy guard behavior and has a timer guard. That has 
killed all WARN() and BUG() in my testing so far.

I am going to give all of these some testing for 1-2 days and if I am 
happy with the results resubmit the errata patchset and the revised 
EPOLL IRQ controller patch (as an incremental on top of the errata).

The Epoll controller with the fixes still manages ~ 10%+ better 
performance than the old POLL based one and it will also allow further 
optimizations later on.

A.


On 20/11/15 12:05, Anton Ivanov wrote:
> I have gotten to the bottom of this.
>
> 1. The IRQ handler re-entrancy issue predates the timer patch. Adding a
> simple guard with a WARN_ON_ONCE around the device loop in the
> sig_io_handler catches it in plain 4.3
>
> diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
> index 23cb935..ac0bbce 100644
> --- a/arch/um/kernel/irq.c
> +++ b/arch/um/kernel/irq.c
> @@ -30,12 +30,17 @@ static struct irq_fd **last_irq_ptr = &active_fds;
>
>    extern void free_irqs(void);
>
> +static int in_poll_handler = 0;
> +
>    void sigio_handler(int sig, struct siginfo *unused_si, struct
> uml_pt_regs *regs)
>    {
>           struct irq_fd *irq_fd;
>           int n;
>
> +    WARN_ON_ONCE(in_poll_handler == 1);
> +
>           while (1) {
> +        in_poll_handler = 1;
>                   n = os_waiting_for_events(active_fds);
>                   if (n <= 0) {
>                           if (n == -EINTR)
> @@ -51,6 +56,7 @@ void sigio_handler(int sig, struct siginfo *unused_si,
> struct uml_pt_regs *regs)
>                           }
>                   }
>           }
> +    in_poll_handler = 0;
>
>           free_irqs();
>    }
>
> This is dangerously broken - you can under heavy IO exhaust the stack,
> you can get packets out of order, etc. Most IO is reasonably atomic so
> corruption is not likely, but not impossible (especially if one or more
> drivers are optimized to use multi-read/multi-write).
>
> 2. I cannot catch what is wrong with the current code in signal.c. When
> I read it, it should not produce re-entrancy. But it does.
>
> 3. I found 2-3 minor issues with signal handling and the timer patch
> which I will submit a hot-fix for, including a proper fix for the
> hang-in-sleep issue.
>
> 4. While I can propose a brutal patch for signal.c which sets guards
> against reentrancy which works fine, I suggest we actually get to the
> bottom of this. Why the code in unblock_signals() does not guard
> correctly against that?
>
> A.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>


------------------------------------------------------------------------------
Go from Idea to Many App Stores Faster with Intel(R) XDK
Give your users amazing mobile app experiences with Intel(R) XDK.
Use one codebase in this all-in-one HTML5 development environment.
Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
http://pubads.g.doubleclick.net/gampad/clk?id=254741551&iu=/4140
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

Reply via email to