Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-07 Thread Pavel Machek
Hi!

> On (04/07/17 09:21), Pavel Machek wrote:
> > > spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
> > > additional printk()-s so CPU0 has even more job to do in console_unlock(),
> > > while it still holds the contended spin_lock. and so on; there are
> > > many other examples.
> > > 
> > > so should we declare a "we can spend only 2 seconds in direct printk()
> > > and then must offload printing" rule? I don't think it's much better
> > > than a simpler "we always offload, as long as we think it's safe".
> > 
> > I believe we should do the 2 seconds rule. It allows us to print "some
> > messages delayed" message, so at least whoever is trying to debug the
> > crash will have the hints that he needs to look at the printk system.
> 
> do you mean panic()? in panic() we call console_flush_on_panic(),
> which immediately outputs all pending logbuf messages. printk()
> offloading does not happen there.

Not panic(). I have seen many crashes where we had printk(KERN_ERR)
and then hard hang. And the printk() was really important for debugging.

> > "we always offload, as long as we think it's safe" rule does not
> > really work, as printk() can not detect if it is safe or not.
> 
> but "2 seconds" rule has that "as long as we think it's safe" string
> attached as well. just because we do offloading. which is sometimes
> un-safe. so regardless the timeout value (0 seconds or 2 seconds) we
> still need some sort of a hint from the path that issues printk()
> because that path (panic, kexec, sysrq, etc.) knows for sure when
> things are abnormal. printk() is pretty clueless in this regard.
> /* well, I still think that EMERG loglevel thing is not completely
>  broken. */

Well, at least with my solution you know there are messages that were
not printed.

Yes, you'd still want to switch printk_now() for stuff like
panic(). But if you get it wrong (and you will), at least you will see
the "something is missing here" message in the log. 

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-07 Thread Pavel Machek
Hi!

> On (04/07/17 09:21), Pavel Machek wrote:
> > > spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
> > > additional printk()-s so CPU0 has even more job to do in console_unlock(),
> > > while it still holds the contended spin_lock. and so on; there are
> > > many other examples.
> > > 
> > > so should we declare a "we can spend only 2 seconds in direct printk()
> > > and then must offload printing" rule? I don't think it's much better
> > > than a simpler "we always offload, as long as we think it's safe".
> > 
> > I believe we should do the 2 seconds rule. It allows us to print "some
> > messages delayed" message, so at least whoever is trying to debug the
> > crash will have the hints that he needs to look at the printk system.
> 
> do you mean panic()? in panic() we call console_flush_on_panic(),
> which immediately outputs all pending logbuf messages. printk()
> offloading does not happen there.

Not panic(). I have seen many crashes where we had printk(KERN_ERR)
and then hard hang. And the printk() was really important for debugging.

> > "we always offload, as long as we think it's safe" rule does not
> > really work, as printk() can not detect if it is safe or not.
> 
> but "2 seconds" rule has that "as long as we think it's safe" string
> attached as well. just because we do offloading. which is sometimes
> un-safe. so regardless the timeout value (0 seconds or 2 seconds) we
> still need some sort of a hint from the path that issues printk()
> because that path (panic, kexec, sysrq, etc.) knows for sure when
> things are abnormal. printk() is pretty clueless in this regard.
> /* well, I still think that EMERG loglevel thing is not completely
>  broken. */

Well, at least with my solution you know there are messages that were
not printed.

Yes, you'd still want to switch printk_now() for stuff like
panic(). But if you get it wrong (and you will), at least you will see
the "something is missing here" message in the log. 

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-07 Thread Sergey Senozhatsky
Hello,

On (04/07/17 09:21), Pavel Machek wrote:
> > spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
> > additional printk()-s so CPU0 has even more job to do in console_unlock(),
> > while it still holds the contended spin_lock. and so on; there are
> > many other examples.
> > 
> > so should we declare a "we can spend only 2 seconds in direct printk()
> > and then must offload printing" rule? I don't think it's much better
> > than a simpler "we always offload, as long as we think it's safe".
> 
> I believe we should do the 2 seconds rule. It allows us to print "some
> messages delayed" message, so at least whoever is trying to debug the
> crash will have the hints that he needs to look at the printk system.

do you mean panic()? in panic() we call console_flush_on_panic(),
which immediately outputs all pending logbuf messages. printk()
offloading does not happen there.


> "we always offload, as long as we think it's safe" rule does not
> really work, as printk() can not detect if it is safe or not.

but "2 seconds" rule has that "as long as we think it's safe" string
attached as well. just because we do offloading. which is sometimes
un-safe. so regardless the timeout value (0 seconds or 2 seconds) we
still need some sort of a hint from the path that issues printk()
because that path (panic, kexec, sysrq, etc.) knows for sure when
things are abnormal. printk() is pretty clueless in this regard.
/* well, I still think that EMERG loglevel thing is not completely
 broken. */

-ss


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-07 Thread Sergey Senozhatsky
Hello,

On (04/07/17 09:21), Pavel Machek wrote:
> > spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
> > additional printk()-s so CPU0 has even more job to do in console_unlock(),
> > while it still holds the contended spin_lock. and so on; there are
> > many other examples.
> > 
> > so should we declare a "we can spend only 2 seconds in direct printk()
> > and then must offload printing" rule? I don't think it's much better
> > than a simpler "we always offload, as long as we think it's safe".
> 
> I believe we should do the 2 seconds rule. It allows us to print "some
> messages delayed" message, so at least whoever is trying to debug the
> crash will have the hints that he needs to look at the printk system.

do you mean panic()? in panic() we call console_flush_on_panic(),
which immediately outputs all pending logbuf messages. printk()
offloading does not happen there.


> "we always offload, as long as we think it's safe" rule does not
> really work, as printk() can not detect if it is safe or not.

but "2 seconds" rule has that "as long as we think it's safe" string
attached as well. just because we do offloading. which is sometimes
un-safe. so regardless the timeout value (0 seconds or 2 seconds) we
still need some sort of a hint from the path that issues printk()
because that path (panic, kexec, sysrq, etc.) knows for sure when
things are abnormal. printk() is pretty clueless in this regard.
/* well, I still think that EMERG loglevel thing is not completely
 broken. */

-ss


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-07 Thread Pavel Machek
Hi!

> spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
> additional printk()-s so CPU0 has even more job to do in console_unlock(),
> while it still holds the contended spin_lock. and so on; there are
> many other examples.
> 
> so should we declare a "we can spend only 2 seconds in direct printk()
> and then must offload printing" rule? I don't think it's much better
> than a simpler "we always offload, as long as we think it's safe".

I believe we should do the 2 seconds rule. It allows us to print "some
messages delayed" message, so at least whoever is trying to debug the
crash will have the hints that he needs to look at the printk system.

"we always offload, as long as we think it's safe" rule does not
really work, as printk() can not detect if it is safe or not.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-07 Thread Pavel Machek
Hi!

> spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
> additional printk()-s so CPU0 has even more job to do in console_unlock(),
> while it still holds the contended spin_lock. and so on; there are
> many other examples.
> 
> so should we declare a "we can spend only 2 seconds in direct printk()
> and then must offload printing" rule? I don't think it's much better
> than a simpler "we always offload, as long as we think it's safe".

I believe we should do the 2 seconds rule. It allows us to print "some
messages delayed" message, so at least whoever is trying to debug the
crash will have the hints that he needs to look at the printk system.

"we always offload, as long as we think it's safe" rule does not
really work, as printk() can not detect if it is safe or not.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-06 Thread Sergey Senozhatsky
Hello,

On (04/06/17 19:14), Pavel Machek wrote:
[..]
> > @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> > printed_len += log_output(facility, level, lflags, dict, dictlen, text, 
> > text_len);
> >  
> > +   /*
> > +* Emergency level indicates that the system is unstable and, thus,
> > +* we better stop relying on wake_up(printk_kthread) and try to do
> > +* a direct printing.
> > +*/
> > +   if (level == LOGLEVEL_EMERG)
> > +   printk_kthread_disabled = true;
> > +
> > +   set_bit(PRINTK_PENDING_OUTPUT, _pending);
> 
> Messages lower then _EMERG may be important, too.. and usually are,
> for debugging.
> 
> And you keep both code paths, anyway, so they have to work. So you did
> not really "fix" issues you are pointing out -- they still remain
> there for _EMERG and above.

we don't drop messages of lower levels. we just print then from a
schedulable context. once the things go off the rails, and EMERG
is a good hint, I think, we stop being optimismitcs and switch to
a "best effort" mode. that is sort of reasonable. if there is a
flood of EMERG messages that are not actually important and,
basically, are result of a coding error, then, I think, the we
must fix that coding error. I mean, there should be some common
sense, and doing
while (1)
printk(KERN_EMERG "hello\n");
is probably not.


> I agree that printing too much is a problem. Could you just print
> "(messages delayed)" in that case, then wake a kernel thread to [rint
> the rest?

sorry, but what difference would it make?

it's really unclear at what point we should offload printing if we begin
that "we will offload sometimes". for example, I've seen many spin-lock
lockups where printk was involved.

CPU0CPU1CPU2
CPU3

spin_lock()
spin_lock()spin_lock()
printk("foo") // grabs the console_sem
printk("bar")   printk("a")
printk("b")
printk("c")
...
printk("z")
spin_dump() 
spin_dump()
  call_console_drivers() printk()   
 printk()
   serial_driver_write() printk()   
 printk()
spin_lock_irqsave(port->lock)...
 ...
 uart_console_write(...) 
trigger_all_cpu_backtrace() trigger_all_cpu_backtrace()
  serial_driver_putchar()
   while (!txrdy(...))
 cpu_relax()


spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
additional printk()-s so CPU0 has even more job to do in console_unlock(),
while it still holds the contended spin_lock. and so on; there are
many other examples.

so should we declare a "we can spend only 2 seconds in direct printk()
and then must offload printing" rule? I don't think it's much better
than a simpler "we always offload, as long as we think it's safe".

-ss


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-06 Thread Sergey Senozhatsky
Hello,

On (04/06/17 19:14), Pavel Machek wrote:
[..]
> > @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> > printed_len += log_output(facility, level, lflags, dict, dictlen, text, 
> > text_len);
> >  
> > +   /*
> > +* Emergency level indicates that the system is unstable and, thus,
> > +* we better stop relying on wake_up(printk_kthread) and try to do
> > +* a direct printing.
> > +*/
> > +   if (level == LOGLEVEL_EMERG)
> > +   printk_kthread_disabled = true;
> > +
> > +   set_bit(PRINTK_PENDING_OUTPUT, _pending);
> 
> Messages lower then _EMERG may be important, too.. and usually are,
> for debugging.
> 
> And you keep both code paths, anyway, so they have to work. So you did
> not really "fix" issues you are pointing out -- they still remain
> there for _EMERG and above.

we don't drop messages of lower levels. we just print then from a
schedulable context. once the things go off the rails, and EMERG
is a good hint, I think, we stop being optimismitcs and switch to
a "best effort" mode. that is sort of reasonable. if there is a
flood of EMERG messages that are not actually important and,
basically, are result of a coding error, then, I think, the we
must fix that coding error. I mean, there should be some common
sense, and doing
while (1)
printk(KERN_EMERG "hello\n");
is probably not.


> I agree that printing too much is a problem. Could you just print
> "(messages delayed)" in that case, then wake a kernel thread to [rint
> the rest?

sorry, but what difference would it make?

it's really unclear at what point we should offload printing if we begin
that "we will offload sometimes". for example, I've seen many spin-lock
lockups where printk was involved.

CPU0CPU1CPU2
CPU3

spin_lock()
spin_lock()spin_lock()
printk("foo") // grabs the console_sem
printk("bar")   printk("a")
printk("b")
printk("c")
...
printk("z")
spin_dump() 
spin_dump()
  call_console_drivers() printk()   
 printk()
   serial_driver_write() printk()   
 printk()
spin_lock_irqsave(port->lock)...
 ...
 uart_console_write(...) 
trigger_all_cpu_backtrace() trigger_all_cpu_backtrace()
  serial_driver_putchar()
   while (!txrdy(...))
 cpu_relax()


spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
additional printk()-s so CPU0 has even more job to do in console_unlock(),
while it still holds the contended spin_lock. and so on; there are
many other examples.

so should we declare a "we can spend only 2 seconds in direct printk()
and then must offload printing" rule? I don't think it's much better
than a simpler "we always offload, as long as we think it's safe".

-ss


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-06 Thread Pavel Machek
Hi!

> printk() is quite complex internally and, basically, it does two
> slightly independent things:
>  a) adds a new message to a kernel log buffer (log_store())
>  b) prints kernel log messages to serial consoles (console_unlock())
> 
> while (a) is guaranteed to be executed by printk(), (b) is not, for a
> variety of reasons, and, unlike log_store(), it comes at a price:
> 
>  1) console_unlock() attempts to flush all pending kernel log messages
> to the console. Thus, it can loop indefinitely.
> 
>  2) while console_unlock() is executed on one particular CPU, printing
> pending kernel log messages, other CPUs can simultaneously append new
> messages to the kernel log buffer.
> 
>  3) the time it takes console_unlock() to print kernel messages also
> depends on the speed of the console -- which may not be fast at all.
> 
>  4) console_unlock() is executed in the same context as printk(), so
> it may be non-preemptible/atomic, which makes 1)-3) dangerous.
> 
> As a result, nobody knows how long a printk() call will take, so
> it's not really safe to call printk() in a number of situations,
> including atomic context, RCU critical sections, interrupt context,
> and more.

You have made good argumentation for not flushing unlimited ammount of
messages from printk() -- okay. But I don't think this is good idea:

> @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   printed_len += log_output(facility, level, lflags, dict, dictlen, text, 
> text_len);
>  
> + /*
> +  * Emergency level indicates that the system is unstable and, thus,
> +  * we better stop relying on wake_up(printk_kthread) and try to do
> +  * a direct printing.
> +  */
> + if (level == LOGLEVEL_EMERG)
> + printk_kthread_disabled = true;
> +
> + set_bit(PRINTK_PENDING_OUTPUT, _pending);

Messages lower then _EMERG may be important, too.. and usually are,
for debugging.

And you keep both code paths, anyway, so they have to work. So you did
not really "fix" issues you are pointing out -- they still remain
there for _EMERG and above.

I agree that printing too much is a problem. Could you just print
"(messages delayed)" in that case, then wake a kernel thread to [rint
the rest?

Pavel


>   logbuf_unlock_irqrestore(flags);
>  
>   /* If called from the scheduler, we can not call up(). */
>   if (!in_sched) {
>   /*
> -  * Try to acquire and then immediately release the console
> -  * semaphore.  The release will print out buffers and wake up
> -  * /dev/kmsg and syslog() users.
> +  * Under heavy printing load/slow serial console/etc
> +  * console_unlock() can stall CPUs, which can result in
> +  * soft/hard-lockups, lost interrupts, RCU stalls, etc.
> +  * Therefore we attempt to print the messages to console
> +  * from a dedicated printk_kthread, which always runs in
> +  * schedulable context.
>*/
> - if (console_trylock())
> - console_unlock();
> + if (printk_kthread_enabled()) {
> + printk_safe_enter_irqsave(flags);
> + wake_up_process(printk_kthread);
> + printk_safe_exit_irqrestore(flags);
> + } else {
> + /*
> +  * Try to acquire and then immediately release the
> +  * console semaphore. The release will print out
> +  * buffers and wake up /dev/kmsg and syslog() users.
> +  */
> + if (console_trylock())
> + console_unlock();
> + }
>   }
>


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-06 Thread Pavel Machek
Hi!

> printk() is quite complex internally and, basically, it does two
> slightly independent things:
>  a) adds a new message to a kernel log buffer (log_store())
>  b) prints kernel log messages to serial consoles (console_unlock())
> 
> while (a) is guaranteed to be executed by printk(), (b) is not, for a
> variety of reasons, and, unlike log_store(), it comes at a price:
> 
>  1) console_unlock() attempts to flush all pending kernel log messages
> to the console. Thus, it can loop indefinitely.
> 
>  2) while console_unlock() is executed on one particular CPU, printing
> pending kernel log messages, other CPUs can simultaneously append new
> messages to the kernel log buffer.
> 
>  3) the time it takes console_unlock() to print kernel messages also
> depends on the speed of the console -- which may not be fast at all.
> 
>  4) console_unlock() is executed in the same context as printk(), so
> it may be non-preemptible/atomic, which makes 1)-3) dangerous.
> 
> As a result, nobody knows how long a printk() call will take, so
> it's not really safe to call printk() in a number of situations,
> including atomic context, RCU critical sections, interrupt context,
> and more.

You have made good argumentation for not flushing unlimited ammount of
messages from printk() -- okay. But I don't think this is good idea:

> @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   printed_len += log_output(facility, level, lflags, dict, dictlen, text, 
> text_len);
>  
> + /*
> +  * Emergency level indicates that the system is unstable and, thus,
> +  * we better stop relying on wake_up(printk_kthread) and try to do
> +  * a direct printing.
> +  */
> + if (level == LOGLEVEL_EMERG)
> + printk_kthread_disabled = true;
> +
> + set_bit(PRINTK_PENDING_OUTPUT, _pending);

Messages lower then _EMERG may be important, too.. and usually are,
for debugging.

And you keep both code paths, anyway, so they have to work. So you did
not really "fix" issues you are pointing out -- they still remain
there for _EMERG and above.

I agree that printing too much is a problem. Could you just print
"(messages delayed)" in that case, then wake a kernel thread to [rint
the rest?

Pavel


>   logbuf_unlock_irqrestore(flags);
>  
>   /* If called from the scheduler, we can not call up(). */
>   if (!in_sched) {
>   /*
> -  * Try to acquire and then immediately release the console
> -  * semaphore.  The release will print out buffers and wake up
> -  * /dev/kmsg and syslog() users.
> +  * Under heavy printing load/slow serial console/etc
> +  * console_unlock() can stall CPUs, which can result in
> +  * soft/hard-lockups, lost interrupts, RCU stalls, etc.
> +  * Therefore we attempt to print the messages to console
> +  * from a dedicated printk_kthread, which always runs in
> +  * schedulable context.
>*/
> - if (console_trylock())
> - console_unlock();
> + if (printk_kthread_enabled()) {
> + printk_safe_enter_irqsave(flags);
> + wake_up_process(printk_kthread);
> + printk_safe_exit_irqrestore(flags);
> + } else {
> + /*
> +  * Try to acquire and then immediately release the
> +  * console semaphore. The release will print out
> +  * buffers and wake up /dev/kmsg and syslog() users.
> +  */
> + if (console_trylock())
> + console_unlock();
> + }
>   }
>


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-04 Thread Sergey Senozhatsky
On (04/04/17 11:01), Petr Mladek wrote:
[..]
> > +static atomic_t printk_emergency __read_mostly;
> > +/*
> > + * Disable printk_kthread permanently. Unlike `oops_in_progress'
> > + * it doesn't go back to 0.
> > + */
> 
> The comment is not valid once we allow to modify the variable using
> the sysfs knob.

it's updated in that patch (sysfs knob introduction).


[..]
> > @@ -2182,6 +2253,7 @@ void console_unlock(void)
> > console_may_schedule = 0;
> >  
> >  again:
> > +   clear_bit(PRINTK_PENDING_OUTPUT, _pending);
> 
> This will not help if new messages appear during
> call_console_drivers().

you are right. wouldn't do much harm (an extra console_unlock() from
printk_kthread in the worst case), but agree.

I added it there because of that "!can_use_console()" branch. not that
I expect printk_kthread being executed on !online CPU, but we might have
no callable consoles.

probably should have that clear_bit() before and after the loop.

-ss


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-04 Thread Sergey Senozhatsky
On (04/04/17 11:01), Petr Mladek wrote:
[..]
> > +static atomic_t printk_emergency __read_mostly;
> > +/*
> > + * Disable printk_kthread permanently. Unlike `oops_in_progress'
> > + * it doesn't go back to 0.
> > + */
> 
> The comment is not valid once we allow to modify the variable using
> the sysfs knob.

it's updated in that patch (sysfs knob introduction).


[..]
> > @@ -2182,6 +2253,7 @@ void console_unlock(void)
> > console_may_schedule = 0;
> >  
> >  again:
> > +   clear_bit(PRINTK_PENDING_OUTPUT, _pending);
> 
> This will not help if new messages appear during
> call_console_drivers().

you are right. wouldn't do much harm (an extra console_unlock() from
printk_kthread in the worst case), but agree.

I added it there because of that "!can_use_console()" branch. not that
I expect printk_kthread being executed on !online CPU, but we might have
no callable consoles.

probably should have that clear_bit() before and after the loop.

-ss


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-04 Thread Petr Mladek
On Wed 2017-03-29 18:25:05, Sergey Senozhatsky wrote:
> This patch introduces a dedicated printing kernel thread - printk_kthread.
> The main purpose of this kthread is to offload printing to a non-atomic
> and always scheduleable context, which eliminates 4) and makes 1)-3) less
> critical. printk() now just appends log messages to the kernel log buffer
> and wake_up()s printk_kthread instead of locking console_sem and calling
> into potentially unsafe console_unlock().
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2d07678e9ff9..ab6b3b2a68c6 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -445,6 +447,42 @@ static char __log_buf[__LOG_BUF_LEN] 
> __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +static struct task_struct *printk_kthread __read_mostly;
> +/*
> + * We can't call into the scheduler (wake_up() printk kthread) during
> + * suspend/kexec/etc. This temporarily switches printk to old behaviour.
> + */
> +static atomic_t printk_emergency __read_mostly;
> +/*
> + * Disable printk_kthread permanently. Unlike `oops_in_progress'
> + * it doesn't go back to 0.
> + */

The comment is not valid once we allow to modify the variable using
the sysfs knob.

> @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   printed_len += log_output(facility, level, lflags, dict, dictlen, text, 
> text_len);
>  
> + /*
> +  * Emergency level indicates that the system is unstable and, thus,
> +  * we better stop relying on wake_up(printk_kthread) and try to do
> +  * a direct printing.
> +  */
> + if (level == LOGLEVEL_EMERG)
> + printk_kthread_disabled = true;
> +
> + set_bit(PRINTK_PENDING_OUTPUT, _pending);
>   logbuf_unlock_irqrestore(flags);
>  
>   /* If called from the scheduler, we can not call up(). */
>   if (!in_sched) {
>   /*
> -  * Try to acquire and then immediately release the console
> -  * semaphore.  The release will print out buffers and wake up
> -  * /dev/kmsg and syslog() users.
> +  * Under heavy printing load/slow serial console/etc
> +  * console_unlock() can stall CPUs, which can result in
> +  * soft/hard-lockups, lost interrupts, RCU stalls, etc.
> +  * Therefore we attempt to print the messages to console
> +  * from a dedicated printk_kthread, which always runs in
> +  * schedulable context.
>*/
> - if (console_trylock())
> - console_unlock();
> + if (printk_kthread_enabled()) {
> + printk_safe_enter_irqsave(flags);
> + wake_up_process(printk_kthread);
> + printk_safe_exit_irqrestore(flags);

I am really happy that we have the printk_safe stuff available!

> + } else {
> + /*
> +  * Try to acquire and then immediately release the
> +  * console semaphore. The release will print out
> +  * buffers and wake up /dev/kmsg and syslog() users.
> +  */
> + if (console_trylock())
> + console_unlock();
> + }
>   }
>  
>   return printed_len;
> @@ -1882,6 +1943,9 @@ static size_t msg_print_text(const struct printk_log 
> *msg,
>bool syslog, char *buf, size_t size) { return 0; }
>  static bool suppress_message_printing(int level) { return false; }
>  
> +void printk_emergency_begin(void) {}
> +void printk_emergency_end(void) {}
> +
>  #endif /* CONFIG_PRINTK */
>  
>  #ifdef CONFIG_EARLY_PRINTK
> @@ -2164,6 +2228,13 @@ void console_unlock(void)
>   bool do_cond_resched, retry;
>  
>   if (console_suspended) {
> + /*
> +  * Avoid an infinite loop in printk_kthread function
> +  * when console_unlock() cannot flush messages because
> +  * we suspended consoles. Someone else will print the
> +  * messages from resume_console().
> +  */
> + clear_bit(PRINTK_PENDING_OUTPUT, _pending);

Great catch!

>   up_console_sem();
>   return;
>   }
> @@ -2182,6 +2253,7 @@ void console_unlock(void)
>   console_may_schedule = 0;
>  
>  again:
> + clear_bit(PRINTK_PENDING_OUTPUT, _pending);

This will not help if new messages appear during
call_console_drivers().

I would move this line after the for(;;) cycle. It will be
cleared when all messages are really handled.

Otherwise, it looks fine to me.

Best Regards,
Petr


Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-04-04 Thread Petr Mladek
On Wed 2017-03-29 18:25:05, Sergey Senozhatsky wrote:
> This patch introduces a dedicated printing kernel thread - printk_kthread.
> The main purpose of this kthread is to offload printing to a non-atomic
> and always scheduleable context, which eliminates 4) and makes 1)-3) less
> critical. printk() now just appends log messages to the kernel log buffer
> and wake_up()s printk_kthread instead of locking console_sem and calling
> into potentially unsafe console_unlock().
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2d07678e9ff9..ab6b3b2a68c6 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -445,6 +447,42 @@ static char __log_buf[__LOG_BUF_LEN] 
> __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +static struct task_struct *printk_kthread __read_mostly;
> +/*
> + * We can't call into the scheduler (wake_up() printk kthread) during
> + * suspend/kexec/etc. This temporarily switches printk to old behaviour.
> + */
> +static atomic_t printk_emergency __read_mostly;
> +/*
> + * Disable printk_kthread permanently. Unlike `oops_in_progress'
> + * it doesn't go back to 0.
> + */

The comment is not valid once we allow to modify the variable using
the sysfs knob.

> @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>   printed_len += log_output(facility, level, lflags, dict, dictlen, text, 
> text_len);
>  
> + /*
> +  * Emergency level indicates that the system is unstable and, thus,
> +  * we better stop relying on wake_up(printk_kthread) and try to do
> +  * a direct printing.
> +  */
> + if (level == LOGLEVEL_EMERG)
> + printk_kthread_disabled = true;
> +
> + set_bit(PRINTK_PENDING_OUTPUT, _pending);
>   logbuf_unlock_irqrestore(flags);
>  
>   /* If called from the scheduler, we can not call up(). */
>   if (!in_sched) {
>   /*
> -  * Try to acquire and then immediately release the console
> -  * semaphore.  The release will print out buffers and wake up
> -  * /dev/kmsg and syslog() users.
> +  * Under heavy printing load/slow serial console/etc
> +  * console_unlock() can stall CPUs, which can result in
> +  * soft/hard-lockups, lost interrupts, RCU stalls, etc.
> +  * Therefore we attempt to print the messages to console
> +  * from a dedicated printk_kthread, which always runs in
> +  * schedulable context.
>*/
> - if (console_trylock())
> - console_unlock();
> + if (printk_kthread_enabled()) {
> + printk_safe_enter_irqsave(flags);
> + wake_up_process(printk_kthread);
> + printk_safe_exit_irqrestore(flags);

I am really happy that we have the printk_safe stuff available!

> + } else {
> + /*
> +  * Try to acquire and then immediately release the
> +  * console semaphore. The release will print out
> +  * buffers and wake up /dev/kmsg and syslog() users.
> +  */
> + if (console_trylock())
> + console_unlock();
> + }
>   }
>  
>   return printed_len;
> @@ -1882,6 +1943,9 @@ static size_t msg_print_text(const struct printk_log 
> *msg,
>bool syslog, char *buf, size_t size) { return 0; }
>  static bool suppress_message_printing(int level) { return false; }
>  
> +void printk_emergency_begin(void) {}
> +void printk_emergency_end(void) {}
> +
>  #endif /* CONFIG_PRINTK */
>  
>  #ifdef CONFIG_EARLY_PRINTK
> @@ -2164,6 +2228,13 @@ void console_unlock(void)
>   bool do_cond_resched, retry;
>  
>   if (console_suspended) {
> + /*
> +  * Avoid an infinite loop in printk_kthread function
> +  * when console_unlock() cannot flush messages because
> +  * we suspended consoles. Someone else will print the
> +  * messages from resume_console().
> +  */
> + clear_bit(PRINTK_PENDING_OUTPUT, _pending);

Great catch!

>   up_console_sem();
>   return;
>   }
> @@ -2182,6 +2253,7 @@ void console_unlock(void)
>   console_may_schedule = 0;
>  
>  again:
> + clear_bit(PRINTK_PENDING_OUTPUT, _pending);

This will not help if new messages appear during
call_console_drivers().

I would move this line after the for(;;) cycle. It will be
cleared when all messages are really handled.

Otherwise, it looks fine to me.

Best Regards,
Petr


[RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-03-29 Thread Sergey Senozhatsky
printk() is quite complex internally and, basically, it does two
slightly independent things:
 a) adds a new message to a kernel log buffer (log_store())
 b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

 1) console_unlock() attempts to flush all pending kernel log messages
to the console. Thus, it can loop indefinitely.

 2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

 3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

 4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
and more.

This patch introduces a dedicated printing kernel thread - printk_kthread.
The main purpose of this kthread is to offload printing to a non-atomic
and always scheduleable context, which eliminates 4) and makes 1)-3) less
critical. printk() now just appends log messages to the kernel log buffer
and wake_up()s printk_kthread instead of locking console_sem and calling
into potentially unsafe console_unlock().

This, however, is not always safe on its own. For example, we can't call
into the scheduler from panic(), because this may cause deadlock. That's
why we introduce a concept of printk_emergency() mode, when printk()
switches back to the old behaviour (console_unlock() from vprintk_emit())
in those cases. In general, this switch happens automatically once a EMERG
log level message appears in the log buffer. Another cases when wake_up()
might not work as expected are suspend, hibernate, etc. For those situations
we provide two new functions:
 -- printk_emergency_begin()
Disables printk offloading. All printk() calls will attempt
to lock the console_sem and, if successful, flush kernel log
messages.

 -- printk_emergency_end()
Enables printk offloading.

Signed-off-by: Sergey Senozhatsky 
Signed-off-by: Jan Kara 
---
 include/linux/console.h |   3 ++
 kernel/printk/printk.c  | 107 
 2 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 5949d1855589..f1a86944072e 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
 extern void suspend_console(void);
 extern void resume_console(void);
 
+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
 int mda_console_init(void);
 void prom_con_init(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2d07678e9ff9..ab6b3b2a68c6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -402,7 +403,8 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
} while (0)
 
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
  */
 #define PRINTK_PENDING_WAKEUP  0x01
 #define PRINTK_PENDING_OUTPUT  0x02
@@ -445,6 +447,42 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static struct task_struct *printk_kthread __read_mostly;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool printk_kthread_disabled __read_mostly;
+
+static inline bool printk_kthread_enabled(void)
+{
+   return !printk_kthread_disabled &&
+   printk_kthread && atomic_read(_emergency) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not stop the printk_kthread if it's already
+ * printing logbuf messages.
+ */
+void printk_emergency_begin(void)
+{
+   atomic_inc(_emergency);
+}
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+   atomic_dec(_emergency);
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
 
printed_len += log_output(facility, level, lflags, dict, dictlen, text, 

[RFC][PATCHv2 2/8] printk: introduce printing kernel thread

2017-03-29 Thread Sergey Senozhatsky
printk() is quite complex internally and, basically, it does two
slightly independent things:
 a) adds a new message to a kernel log buffer (log_store())
 b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

 1) console_unlock() attempts to flush all pending kernel log messages
to the console. Thus, it can loop indefinitely.

 2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

 3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

 4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
and more.

This patch introduces a dedicated printing kernel thread - printk_kthread.
The main purpose of this kthread is to offload printing to a non-atomic
and always scheduleable context, which eliminates 4) and makes 1)-3) less
critical. printk() now just appends log messages to the kernel log buffer
and wake_up()s printk_kthread instead of locking console_sem and calling
into potentially unsafe console_unlock().

This, however, is not always safe on its own. For example, we can't call
into the scheduler from panic(), because this may cause deadlock. That's
why we introduce a concept of printk_emergency() mode, when printk()
switches back to the old behaviour (console_unlock() from vprintk_emit())
in those cases. In general, this switch happens automatically once a EMERG
log level message appears in the log buffer. Another cases when wake_up()
might not work as expected are suspend, hibernate, etc. For those situations
we provide two new functions:
 -- printk_emergency_begin()
Disables printk offloading. All printk() calls will attempt
to lock the console_sem and, if successful, flush kernel log
messages.

 -- printk_emergency_end()
Enables printk offloading.

Signed-off-by: Sergey Senozhatsky 
Signed-off-by: Jan Kara 
---
 include/linux/console.h |   3 ++
 kernel/printk/printk.c  | 107 
 2 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 5949d1855589..f1a86944072e 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
 extern void suspend_console(void);
 extern void resume_console(void);
 
+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
 int mda_console_init(void);
 void prom_con_init(void);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2d07678e9ff9..ab6b3b2a68c6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -402,7 +403,8 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
} while (0)
 
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
  */
 #define PRINTK_PENDING_WAKEUP  0x01
 #define PRINTK_PENDING_OUTPUT  0x02
@@ -445,6 +447,42 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static struct task_struct *printk_kthread __read_mostly;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool printk_kthread_disabled __read_mostly;
+
+static inline bool printk_kthread_enabled(void)
+{
+   return !printk_kthread_disabled &&
+   printk_kthread && atomic_read(_emergency) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not stop the printk_kthread if it's already
+ * printing logbuf messages.
+ */
+void printk_emergency_begin(void)
+{
+   atomic_inc(_emergency);
+}
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+   atomic_dec(_emergency);
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
 
printed_len += log_output(facility, level, lflags, dict, dictlen, text, 
text_len);
 
+   /*
+* Emergency