Re: [PATCH LINUX v5 2/2] xen: add support for initializing xenstore later as HVM domain

2023-07-21 Thread Petr Mladek
On Thu 2023-07-20 16:31:16, Stefano Stabellini wrote:
> On Thu, 20 Jul 2023, Petr Mladek wrote:
> > On Wed 2023-07-19 18:46:08, Stefano Stabellini wrote:
> > > On Wed, 19 Jul 2023, Petr Mladek wrote:
> > > > I see the following warning from free_irq() in 6.5-rc2 when running
> > > > livepatching selftests. It does not happen after reverting this patch.
> > > > 
> > > > [  352.168453] livepatch: signaling remaining tasks
> > > > [  352.173228] [ cut here ]
> > > > [  352.175563] Trying to free already-free IRQ 0
> > > > [  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 
> > > > free_irq+0xbf/0x350
> > > > [  352.179942] Modules linked in: test_klp_livepatch(EK)
> > > > [  352.181621] CPU: 1 PID: 88 Comm: xenbus_probe Kdump: loaded Tainted: 
> > > > GE K6.5.0-rc2-default+ #535
> > > > [  352.184754] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > > rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
> > > > [  352.188214] RIP: 0010:free_irq+0xbf/0x350
[...]
> > > > [  352.213951] Call Trace:
> > > > [  352.214390]  
> > > > [  352.214717]  ? __warn+0x81/0x170
> > > > [  352.215436]  ? free_irq+0xbf/0x350
> > > > [  352.215906]  ? report_bug+0x10b/0x200
> > > > [  352.216408]  ? prb_read_valid+0x17/0x20
> > > > [  352.216926]  ? handle_bug+0x44/0x80
> > > > [  352.217409]  ? exc_invalid_op+0x13/0x60
> > > > [  352.217932]  ? asm_exc_invalid_op+0x16/0x20
> > > > [  352.218497]  ? free_irq+0xbf/0x350
> > > > [  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
> > > > [  352.219600]  xenbus_probe+0x7a/0x80
> > > > [  352.221030]  xenbus_probe_thread+0x76/0xc0
> > > > [  352.221416]  ? __pfx_autoremove_wake_function+0x10/0x10
> > > > [  352.221882]  kthread+0xfd/0x130
> > > > [  352.222191]  ? __pfx_kthread+0x10/0x10
> > > > [  352.222544]  ret_from_fork+0x2d/0x50
> > > > [  352.222893]  ? __pfx_kthread+0x10/0x10
> > > > [  352.223260]  ret_from_fork_asm+0x1b/0x30
> > 
> > I do not know where exactly it triggers the XEN code.
> > 
> > > but it would seem
> > > that either:
> > > 1) free_irq is called twice
> > > 2) free_irq is called but xs_init_irq wasn't initialized before
> > > 
> > > For 2) I can see that xenbus_probe() is called in a few cases where
> > > xs_init_irq wasn't set. Something like the below would make the warning
> > > go away but it would be nice to figure out which one is the code path
> > > taken that originally triggered the warning.
> > 
> > I added some debugging messages and:
> > 
> >   + xenbus_probe() was called in xenbus_probe_thread().
> > 
> >   + xenbus_init() returned early after xen_domain() check so that
> > xs_init_irq was never initialized.
> > 
> > Note that I use KVM and not XEN:
> > 
> > [0.00] Hypervisor detected: KVM
> > [...]
> > [0.072150] Booting paravirtualized kernel on KVM
> 
> Ah! So the issue is that xenbus_init() returns early but
> xenbus_probe_initcall() doesn't. So maybe we just need a xen_domain
> check in xenbus_probe_initcall too.
> 
> diff --git a/drivers/xen/xenbus/xenbus_probe.c 
> b/drivers/xen/xenbus/xenbus_probe.c
> index 58b732dcbfb8..e9bd3ed70108 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -811,6 +812,9 @@ static int xenbus_probe_thread(void *unused)
>  
>  static int __init xenbus_probe_initcall(void)
>  {
> + if (!xen_domain())
> + return -ENODEV;
> +
>   /*
>* Probe XenBus here in the XS_PV case, and also XS_HVM unless we
>* need to wait for the platform PCI device to come up or

I confirm that this change cures the problem as well. Feel free
to add:

Tested-by: Petr Mladek 

Best Regards,
Petr



Re: [PATCH LINUX v5 2/2] xen: add support for initializing xenstore later as HVM domain

2023-07-20 Thread Petr Mladek
On Wed 2023-07-19 18:46:08, Stefano Stabellini wrote:
> On Wed, 19 Jul 2023, Petr Mladek wrote:
> > On Fri 2022-05-13 14:19:38, Stefano Stabellini wrote:
> > > From: Luca Miccio 
> > > 
> > > When running as dom0less guest (HVM domain on ARM) the xenstore event
> > > channel is available at domain creation but the shared xenstore
> > > interface page only becomes available later on.
> > > 
> > > In that case, wait for a notification on the xenstore event channel,
> > > then complete the xenstore initialization later, when the shared page
> > > is actually available.
> > > 
> > > The xenstore page has few extra field. Add them to the shared struct.
> > > One of the field is "connection", when the connection is ready, it is
> > > zero. If the connection is not-zero, wait for a notification.
> > 
> > I see the following warning from free_irq() in 6.5-rc2 when running
> > livepatching selftests. It does not happen after reverting this patch.
> > 
> > [  352.168453] livepatch: signaling remaining tasks
> > [  352.173228] [ cut here ]
> > [  352.175563] Trying to free already-free IRQ 0
> > [  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 
> > free_irq+0xbf/0x350
> > [  352.179942] Modules linked in: test_klp_livepatch(EK)
> > [  352.181621] CPU: 1 PID: 88 Comm: xenbus_probe Kdump: loaded Tainted: G   
> >  E K6.5.0-rc2-default+ #535
> > [  352.184754] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
> > [  352.188214] RIP: 0010:free_irq+0xbf/0x350
> > [  352.192211] Code: 7a 08 75 0e e9 36 02 00 00 4c 3b 7b 08 74 5a 48 89 da 
> > 48 8b 5a 18 48 85 db 75 ee 44 89 f6 48 c7 c7 58 b0 8b 86 e8 21 0a f5 ff 
> > <0f> 0b 48 8b 34 24 4c 89 ef e8 53 bb e3 00 
> > 48 8b 45 40 48 8b 40 78
> > [  352.200079] RSP: 0018:af0440b4be80 EFLAGS: 00010086
> > [  352.201465] RAX:  RBX: 99f105116c80 RCX: 
> > 0003
> > [  352.203324] RDX: 8003 RSI: 8691d4bc RDI: 
> > 
> > [  352.204989] RBP: 99f100052000 R08:  R09: 
> > c0007fff
> > [  352.206253] R10: af0440b4bd18 R11: af0440b4bd10 R12: 
> > 99f1000521e8
> > [  352.207451] R13: 99f1000520a8 R14:  R15: 
> > 86f42360
> > [  352.208787] FS:  () GS:99f15a40() 
> > knlGS:
> > [  352.210061] CS:  0010 DS:  ES:  CR0: 80050033
> > [  352.210815] CR2: 7f8415d56000 CR3: 000105e36003 CR4: 
> > 00370ee0
> > [  352.211867] DR0:  DR1:  DR2: 
> > 
> > [  352.212912] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [  352.213951] Call Trace:
> > [  352.214390]  
> > [  352.214717]  ? __warn+0x81/0x170
> > [  352.215436]  ? free_irq+0xbf/0x350
> > [  352.215906]  ? report_bug+0x10b/0x200
> > [  352.216408]  ? prb_read_valid+0x17/0x20
> > [  352.216926]  ? handle_bug+0x44/0x80
> > [  352.217409]  ? exc_invalid_op+0x13/0x60
> > [  352.217932]  ? asm_exc_invalid_op+0x16/0x20
> > [  352.218497]  ? free_irq+0xbf/0x350
> > [  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
> > [  352.219600]  xenbus_probe+0x7a/0x80
> > [  352.221030]  xenbus_probe_thread+0x76/0xc0
> > [  352.221416]  ? __pfx_autoremove_wake_function+0x10/0x10
> > [  352.221882]  kthread+0xfd/0x130
> > [  352.222191]  ? __pfx_kthread+0x10/0x10
> > [  352.222544]  ret_from_fork+0x2d/0x50
> > [  352.222893]  ? __pfx_kthread+0x10/0x10
> > [  352.223260]  ret_from_fork_asm+0x1b/0x30
> > [  352.223629] RIP: :0x0
> > [  352.223931] Code: Unable to access opcode bytes at 0xffd6.
> > [  352.224488] RSP: : EFLAGS:  ORIG_RAX: 
> > 
> > [  352.225044] RAX:  RBX:  RCX: 
> > 
> > [  352.225571] RDX:  RSI:  RDI: 
> > 
> > [  352.226106] RBP:  R08:  R09: 
> > 
> > [  352.226632] R10:  R11:  R12: 
> > 
> > [  352.227171] R13:  R14:  R15: 
> > 
> > [  352.227710]  
> > [  352.227917] irq event stamp: 22
> > [  352.228209] hardirqs last  enabled at (21): [] 
> > ___

Re: [PATCH LINUX v5 2/2] xen: add support for initializing xenstore later as HVM domain

2023-07-19 Thread Petr Mladek
On Fri 2022-05-13 14:19:38, Stefano Stabellini wrote:
> From: Luca Miccio 
> 
> When running as dom0less guest (HVM domain on ARM) the xenstore event
> channel is available at domain creation but the shared xenstore
> interface page only becomes available later on.
> 
> In that case, wait for a notification on the xenstore event channel,
> then complete the xenstore initialization later, when the shared page
> is actually available.
> 
> The xenstore page has few extra field. Add them to the shared struct.
> One of the field is "connection", when the connection is ready, it is
> zero. If the connection is not-zero, wait for a notification.

I see the following warning from free_irq() in 6.5-rc2 when running
livepatching selftests. It does not happen after reverting this patch.

[  352.168453] livepatch: signaling remaining tasks
[  352.173228] [ cut here ]
[  352.175563] Trying to free already-free IRQ 0
[  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 
free_irq+0xbf/0x350
[  352.179942] Modules linked in: test_klp_livepatch(EK)
[  352.181621] CPU: 1 PID: 88 Comm: xenbus_probe Kdump: loaded Tainted: G   
 E K6.5.0-rc2-default+ #535
[  352.184754] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[  352.188214] RIP: 0010:free_irq+0xbf/0x350
[  352.192211] Code: 7a 08 75 0e e9 36 02 00 00 4c 3b 7b 08 74 5a 48 89 da 48 
8b 5a 18 48 85 db 75 ee 44 89 f6 48 c7 c7 58 b0 8b 86 e8 21 0a f5 ff <0f> 0b 48 
8b 34 24 4c 89 ef e8 53 bb e3 00 
48 8b 45 40 48 8b 40 78
[  352.200079] RSP: 0018:af0440b4be80 EFLAGS: 00010086
[  352.201465] RAX:  RBX: 99f105116c80 RCX: 0003
[  352.203324] RDX: 8003 RSI: 8691d4bc RDI: 
[  352.204989] RBP: 99f100052000 R08:  R09: c0007fff
[  352.206253] R10: af0440b4bd18 R11: af0440b4bd10 R12: 99f1000521e8
[  352.207451] R13: 99f1000520a8 R14:  R15: 86f42360
[  352.208787] FS:  () GS:99f15a40() 
knlGS:
[  352.210061] CS:  0010 DS:  ES:  CR0: 80050033
[  352.210815] CR2: 7f8415d56000 CR3: 000105e36003 CR4: 00370ee0
[  352.211867] DR0:  DR1:  DR2: 
[  352.212912] DR3:  DR6: fffe0ff0 DR7: 0400
[  352.213951] Call Trace:
[  352.214390]  
[  352.214717]  ? __warn+0x81/0x170
[  352.215436]  ? free_irq+0xbf/0x350
[  352.215906]  ? report_bug+0x10b/0x200
[  352.216408]  ? prb_read_valid+0x17/0x20
[  352.216926]  ? handle_bug+0x44/0x80
[  352.217409]  ? exc_invalid_op+0x13/0x60
[  352.217932]  ? asm_exc_invalid_op+0x16/0x20
[  352.218497]  ? free_irq+0xbf/0x350
[  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
[  352.219600]  xenbus_probe+0x7a/0x80
[  352.221030]  xenbus_probe_thread+0x76/0xc0
[  352.221416]  ? __pfx_autoremove_wake_function+0x10/0x10
[  352.221882]  kthread+0xfd/0x130
[  352.222191]  ? __pfx_kthread+0x10/0x10
[  352.222544]  ret_from_fork+0x2d/0x50
[  352.222893]  ? __pfx_kthread+0x10/0x10
[  352.223260]  ret_from_fork_asm+0x1b/0x30
[  352.223629] RIP: :0x0
[  352.223931] Code: Unable to access opcode bytes at 0xffd6.
[  352.224488] RSP: : EFLAGS:  ORIG_RAX: 

[  352.225044] RAX:  RBX:  RCX: 
[  352.225571] RDX:  RSI:  RDI: 
[  352.226106] RBP:  R08:  R09: 
[  352.226632] R10:  R11:  R12: 
[  352.227171] R13:  R14:  R15: 
[  352.227710]  
[  352.227917] irq event stamp: 22
[  352.228209] hardirqs last  enabled at (21): [] 
___slab_alloc+0x68e/0xc80
[  352.228914] hardirqs last disabled at (22): [] 
_raw_spin_lock_irqsave+0x8d/0x90
[  352.229546] softirqs last  enabled at (0): [] 
copy_process+0xaae/0x1fd0
[  352.230079] softirqs last disabled at (0): [<>] 0x0
[  352.230503] ---[ end trace  ]---

, where the message "livepatch: signaling remaining tasks" means that
it might send fake signals to non-kthread tasks.

The aim is to force userspace tasks to enter and leave kernel space
so that they might start using the new patched code. It is done
this way:

/*
 * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
 * Kthreads with TIF_PATCH_PENDING set are woken up.
 */
static void klp_send_signals(void)
{
[...]

/*
 * Send fake signal to all non-kthread tasks which are
 * still not migrated.
 */
set_notify_signal(task);
[...]

The warning is most likely printed in this condition:

const void *free_irq(unsign

Re: [PATCH 08/11] sysctl: Add size to register_sysctl_init

2023-06-23 Thread Petr Mladek
On Thu 2023-06-22 16:00:21, Joel Granados wrote:
> On Thu, Jun 22, 2023 at 06:21:48AM +0200, Jiri Slaby wrote:
> > On 21. 06. 23, 15:15, Joel Granados wrote:
> > > On Wed, Jun 21, 2023 at 12:47:58PM +0200, Greg Kroah-Hartman wrote:
> > > > On Wed, Jun 21, 2023 at 11:09:57AM +0200, Joel Granados wrote:
> > > > >   static int __init random_sysctls_init(void)
> > > > >   {
> > > > > - register_sysctl_init("kernel/random", random_table);
> > > > > + register_sysctl_init("kernel/random", random_table,
> > > > > +  ARRAY_SIZE(random_table));
> > > > 
> > > > As mentioned before, why not just do:
> > > > 
> > > > #define register_sysctl_init(string, table) \
> > > > __register_sysctl_init(string, table, ARRAY_SIZE(table);
> > > Answered you in the original mail where you suggested it.
> > 
> > I am curious what that was, do you have a link?
> of course. I think you will find it here 
> https://lore.kernel.org/all/20230621123816.ufqbob6qthz4hujx@localhost/

Let me to copy the answer here:


I considered this at the outset, but it will not work with callers that
use a pointer instead of the actual array.
Additionally, we would not avoid big commits as we would have to go
looking in all the files where register is called directly or indirectly
and make sure the logic is sound.


For the callers using a pointer. A solution would be to create another
wrapper which would take the array size, e.g.

#define register_sysctl_init_limited(string, table, size)   \
__register_sysctl_init(string, table, size);


And ARRAY_SIZE() is defined in include/linux/kernel.h as:

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

It will create a compiler error when either an array[] or *array is
passed.

When using this:

1. The compiler will tell us where the other wrapper is needed.

2. Some locations might need the @size parameter even when a static
   array is passed. For example, neigh_sysctl_register() terminates
   the array early.

   But this will work when __register_sysctl_init() supports
   both ways.I mean that it will stop either on @size or empty
   element, as discussed in the other subthread.

   This should be caught when the final "empty" is removed
   from the particular caller.

Best Regards,
Petr



Re: [PATCH 08/11] sysctl: Add size to register_sysctl_init

2023-06-21 Thread Petr Mladek
On Wed 2023-06-21 11:09:57, Joel Granados wrote:
> In order to remove the end element from the ctl_table struct arrays, we
> explicitly define the size when registering the targes. We add a size
> argument to the register_sysctl_init call and pass an ARRAY_SIZE for all
> the callers.

This does not explain the motivatin why the end element is removed.

I agree with Jiri that saving 9k is a questionable gain. According to
the cover letter it saved 0,00%. It is because it saved 9k with allyes
config which produces huge kernel. IMHO, the 9k might be interesting
only for a tiny kernel. But I guess that it would safe much less
bytes there.

And the code with the added ARRAY_SIZE() parameter looks worse than before.

> diff --git a/kernel/printk/sysctl.c b/kernel/printk/sysctl.c
> index c228343eeb97..28f37b86414e 100644
> --- a/kernel/printk/sysctl.c
> +++ b/kernel/printk/sysctl.c
> @@ -81,5 +81,6 @@ static struct ctl_table printk_sysctls[] = {
>  
>  void __init printk_sysctl_init(void)
>  {
> - register_sysctl_init("kernel", printk_sysctls);
> + register_sysctl_init("kernel", printk_sysctls,
> +  ARRAY_SIZE(printk_sysctls));
>  }

Is register_sysctl_init() still ready to handle the last empty element,
please? I am not in Cc on the related patches.

Best Regards,
Petr



Re: [PATCH 24/30] panic: Refactor the panic path

2022-06-14 Thread Petr Mladek
On Thu 2022-05-26 13:25:57, Guilherme G. Piccoli wrote:
> OK, so it seems we have some points in which agreement exists, and some
> points that there is no agreement and instead, we have antagonistic /
> opposite views and needs. Let's start with the easier part heh
>
> It seems everybody agrees that *we shouldn't over-engineer things*, and
> as per Eric good words: making the panic path more feature-full or
> increasing flexibility isn't a good idea. So, as a "corollary": the
> panic level approach I'm proposing is not a good fit, I'll drop it and
> let's go with something simpler.

Makes sense.

> Another point of agreement seems to be that _notifier lists in the panic
> path are dangerous_, for *2 different reasons*:
> 
> (a) We cannot guarantee that people won't add crazy callbacks there, we
> can plan and document things the best as possible - it'll never be
> enough, somebody eventually would slip a nonsense callback that would
> break things and defeat the planned purpose of such a list;

It is true that notifier lists might allow to add crazy stuff
without proper review more easily. Things added into the core
code would most likely get better review.

But nothing is error-proof. And bugs will happen with any approach.


> (b) As per Eric point, in a panic/crash situation we might have memory
> corruption exactly in the list code / pointers, etc, so the notifier
> lists are, by nature, a bit fragile. But I think we shouldn't consider
> it completely "bollocks", since this approach has been used for a while
> with a good success rate. So, lists aren't perfect at all, but at the
> same time, they aren't completely useless.

I am not able to judge this. Of course, any extra step increases
the risk. I am just not sure how much more complicated it would
be to hardcode the calls. Most of them are architecture
and/or feature specific. And such code is often hard to
review and maintain.

> To avoid using a 4th list,

4th or 5th? We already have "hypervisor", "info", "pre-reboot", and "pre-loop".
The 5th might be pre-crash-exec.

> especially given the list nature is a bit
> fragile, I'd suggest one of the 3 following approaches - I *really
> appreciate feedbacks* on that so I can implement the best solution and
> avoid wasting time in some poor/disliked solution:

Honestly, I am not able to decide what might be better without seeing
the code.

Most things fits pretty well into the 4 proposed lists:
"hypervisor", "info", "pre-reboot", and "pre-loop". IMHO, the
only question is the code that needs to be always called
even before crash_dump.

I suggest that you solve the crash_dump callbacks the way that
looks best to you. Ideally do it in a separate patch so it can be
reviewed and reworked more easily.

I believe that a fresh code with an updated split and simplified
logic would help us to move forward.

Best Regards,
Petr



Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-09 Thread Petr Mladek
On Thu 2022-06-09 12:02:04, Peter Zijlstra wrote:
> On Thu, Jun 09, 2022 at 11:16:46AM +0200, Petr Mladek wrote:
> > On Wed 2022-06-08 16:27:47, Peter Zijlstra wrote:
> > > The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console
> > > tracepoint"), was printk usage from the cpuidle path where RCU was
> > > already disabled.
> > > 
> > Does this "prevent" calling printk() a safe way in code with
> > RCU disabled?
> 
> On x86_64, yes. Other architectures, less so.
> 
> Specifically, the objtool noinstr validation pass will warn at build
> time (DEBUG_ENTRY=y) if any noinstr/cpuidle code does a call to
> non-vetted code like printk().
> 
> At the same time; there's a few hacks that allow WARN to work, but
> mostly if you hit WARN in entry/noinstr you get to keep the pieces in
> any case.
> 
> On other architecture we'll need to rely on runtime coverage with
> PROVE_RCU. That is, if a splat like in the above mentioned commit
> happens again, we'll need to fix it by adjusting the callchain, not by
> mucking about with RCU state.

Makes sense. Feel free to use for this patch:

Acked-by: Petr Mladek 

> > Therefore if this patch allows to remove some tricky tracing
> > code then it might be worth it. But if trace_console_rcuidle()
> > variant is still going to be available then I would keep using it.
> 
> My ultimate goal is to delete trace_.*_rcuidle() and RCU_NONIDLE()
> entirely. We're close, but not quite there yet.

I keep my fingers crossed.

Best Regards,
Petr



Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-09 Thread Petr Mladek
On Thu 2022-06-09 20:30:58, Sergey Senozhatsky wrote:
> My emails are getting rejected... Let me try web-interface

Bad day for mail sending. I have problems as well ;-)

> Kudos to Petr for the questions and thanks to PeterZ for the answers.
> 
> On Thu, Jun 9, 2022 at 7:02 PM Peter Zijlstra  wrote:
> > This is the tracepoint used to spool all of printk into ftrace, I
> > suspect there's users, but I haven't used it myself.
> 
> I'm somewhat curious whether we can actually remove that trace event.

Good question.

Well, I think that it might be useful. It allows to see trace and
printk messages together.

It was ugly when it was in the console code. The new location
in vprintk_store() allows to have it even "correctly" sorted
(timestamp) against other tracing messages.

Best Regards,
Petr



Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-09 Thread Petr Mladek
Sending again. The previous attempt was rejected by several
recipients. It was caused by a mail server changes on my side.

I am sorry for spamming those who got the 1st mail already.

On Wed 2022-06-08 16:27:47, Peter Zijlstra wrote:
> The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console
> tracepoint"), was printk usage from the cpuidle path where RCU was
> already disabled.
> 
> Per the patches earlier in this series, this is no longer the case.

My understanding is that this series reduces a lot the amount
of code called with RCU disabled. As a result the particular printk()
call mentioned by commit fc98c3c8c9dc ("printk: use rcuidle console
tracepoint") is called with RCU enabled now. Hence this particular
problem is fixed better way now.

But is this true in general?
Does this "prevent" calling printk() a safe way in code with
RCU disabled?

I am not sure if anyone cares. printk() is the best effort
functionality because of the consoles code anyway. Also I wonder
if anyone uses this trace_console().

Therefore if this patch allows to remove some tricky tracing
code then it might be worth it. But if trace_console_rcuidle()
variant is still going to be available then I would keep using it.

Best Regards,
Petr

> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/printk/printk.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2238,7 +2238,7 @@ static u16 printk_sprint(char *text, u16
>   }
>   }
>  
> - trace_console_rcuidle(text, text_len);
> + trace_console(text, text_len);
>  
>   return text_len;
>  }
> 



Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-09 Thread Petr Mladek
On Wed 2022-06-08 16:27:47, Peter Zijlstra wrote:
> The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console
> tracepoint"), was printk usage from the cpuidle path where RCU was
> already disabled.
> 
> Per the patches earlier in this series, this is no longer the case.

My understanding is that this series reduces a lot the amount
of code called with RCU disabled. As a result the particular printk()
call mentioned by commit fc98c3c8c9dc ("printk: use rcuidle console
tracepoint") is called with RCU enabled now. Hence this particular
problem is fixed better way now.

But is this true in general?
Does this "prevent" calling printk() a safe way in code with
RCU disabled?

I am not sure if anyone cares. printk() is the best effort
functionality because of the consoles code anyway. Also I wonder
if anyone uses this trace_console().

Therefore if this patch allows to remove some tricky tracing
code then it might be worth it. But if trace_console_rcuidle()
variant is still going to be available then I would keep using it.

Best Regards,
Petr

> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/printk/printk.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2238,7 +2238,7 @@ static u16 printk_sprint(char *text, u16
>   }
>   }
>  
> - trace_console_rcuidle(text, text_len);
> + trace_console(text, text_len);
>  
>   return text_len;
>  }
> 



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-24 Thread Petr Mladek
On Mon 2022-05-23 11:56:12, Guilherme G. Piccoli wrote:
> On 19/05/2022 16:20, Scott Branden wrote:
> > [...] 
> >> Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
> >> designed to run in x86 only or you have other architectures' use cases?
> > The adapter may be used in any PCIe design that supports DMA.
> > So it may be possible to run in arm64 servers.
> >>
> >> [...]
> >> With that said, and given this is a lightweight notifier that ideally
> >> should run ASAP, I'd keep this one in the hypervisor list. We can
> >> "adjust" the semantic of this list to include lightweight notifiers that
> >> reset adapters.
> > Sounds the best to keep system operating as tested today.
> >>
> >> With that said, Petr has a point - not always such list is going to be
> >> called before kdump. So, that makes me think in another idea: what if we
> >> have another list, but not on panic path, but instead in the custom
> >> crash_shutdown()? Drivers could add callbacks there that must execute
> >> before kexec/kdump, no matter what.
> > It may be beneficial for some other drivers but for our use we would 
> > then need to register for the panic path and the crash_shutdown path. 
> > We notify the VK card for 2 purposes: one to stop DMA so memory stop 
> > changing during a kdump.  And also to get the card into a good state so 
> > resets happen cleanly.
> 
> Thanks Scott! With that, I guess it's really better to keep this
> notifier in this hypervisor/early list - I'm planning to do that for V2.
> Unless Petr or somebody has strong feelings against that, of course.

I am fine with it because we do not have a better solution at the
moment.

It might be a good candidate for the 5th notifier list mentioned
in the thread https://lore.kernel.org/r/YoyQyHHfhIIXSX0U@alley .
But I am not sure if the 5th list is worth the complexity.

Best Regards,
Petr



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-24 Thread Petr Mladek
On Fri 2022-05-20 08:23:33, Guilherme G. Piccoli wrote:
> On 19/05/2022 20:45, Baoquan He wrote:
> > [...]
> >> I really appreciate the summary skill you have, to convert complex
> >> problems in very clear and concise ideas. Thanks for that, very useful!
> >> I agree with what was summarized above.
> > 
> > I want to say the similar words to Petr's reviewing comment when I went
> > through the patches and traced each reviewing sub-thread to try to
> > catch up. Petr has reivewed this series so carefully and given many
> > comments I want to ack immediately.
> > 
> > I agree with most of the suggestions from Petr to this patch, except of
> > one tiny concern, please see below inline comment.
> 
> Hi Baoquan, thanks! I'm glad you're also reviewing that =)
> 
> 
> > [...]
> > 
> > I like the proposed skeleton of panic() and code style suggested by
> > Petr very much. About panic_prefer_crash_dump which might need be added,
> > I hope it has a default value true. This makes crash_dump execute at
> > first by default just as before, unless people specify
> > panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> > panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> > this is inconsistent with the old behaviour.
> 
> I'd like to understand better why the crash_kexec() must always be the
> first thing in your use case. If we keep that behavior, we'll see all
> sorts of workarounds - see the last patches of this series, Hyper-V and
> PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
> execution of their relevant notifiers (like the vmbus disconnect,
> specially in arm64 that has no custom machine_crash_shutdown, or the
> fadump case in ppc). This led to more risk in kdump.
> 
> The thing is: with the notifiers' split, we tried to keep only the most
> relevant/necessary stuff in this first list, things that ultimately
> should improve kdump reliability or if not, at least not break it. My
> feeling is that, with this series, we should change the idea/concept
> that kdump must run first nevertheless, not matter what. We're here
> trying to accommodate the antagonistic goals of hypervisors that need
> some clean-up (even for kdump to work) VS. kdump users, that wish a
> "pristine" system reboot ASAP after the crash.

Good question. I wonder if Baoquan knows about problems caused by the
particular notifiers that will end up in the hypervisor list. Note
that there will be some shuffles and the list will be slightly
different in V2.

Anyway, I see four possible solutions:

  1. The most conservative approach is to keep the current behavior
 and call kdump first by default.

  2. A medium conservative approach to change the default default
 behavior and call hypervisor and eventually the info notifiers
 before kdump. There still would be the possibility to call kdump
 first by the command line parameter.

  3. Remove the possibility to call kdump first completely. It would
 assume that all the notifiers in the info list are super safe
 or that they make kdump actually more safe.

  4. Create one more notifier list for operations that always should
 be called before crash_dump.

Regarding the extra notifier list (4th solution). It is not clear to
me whether it would be always called even before hypervisor list or
when kdump is not enabled. We must not over-engineer it.

2nd proposal looks like a good compromise. But maybe we could do
this change few releases later. The notifiers split is a big
change on its own.

Best Regards,
Petr



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-19 Thread Petr Mladek
On Wed 2022-05-18 10:16:20, Guilherme G. Piccoli wrote:
> On 18/05/2022 04:58, Petr Mladek wrote:
> > [...]
> >> I does similar things like kmsg_dump() so it should be called in
> >> the same location (after info notifier list and before kdump).
> >>
> >> A solution might be to put it at these notifiers at the very
> >> end of the "info" list or make extra "dump" notifier list.
> > 
> > I just want to point out that the above idea has problems.
> > Notifiers storing kernel log need to be treated as kmsg_dump().
> > In particular, we would  need to know if there are any.
> > We do not need to call "info" notifier list before kdump
> > when there is no kernel log dumper registered.
> > 
> 
> Notifiers respect the priority concept, which is just a number that
> orders the list addition (and the list is called in order).
> 
> I've used the last position to panic_print() [in patch 25] - one idea
> here is to "reserve" the last position (represented by INT_MIN) for
> notifiers that act like kmsg_dump(). I couldn't find any IIRC, but that
> doesn't prevent us to save this position and comment about that.

I would ignore it for now. If anyone would want to safe the log
then they would need to read it. They will most likely use
the existing kmsg_dump() infastructure. In fact, they should
use it to avoid a code duplication.

Best Regards,
Petr



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Petr Mladek
On Tue 2022-05-17 15:57:34, Petr Mladek wrote:
> On Mon 2022-05-16 12:06:17, Guilherme G. Piccoli wrote:
> > >> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> > >> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> > >> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device 
> > >> *pdev)
> > >>  goto out;
> > >>  }
> > >>  
> > >> -atomic_notifier_chain_register(&panic_notifier_list,
> > >> +atomic_notifier_chain_register(&panic_hypervisor_list,
> > >> &brcmstb_pm_panic_nb);
> > > 
> > > I am not sure about this one. It instruct some HW to preserve DRAM.
> > > IMHO, it better fits into pre_reboot category but I do not have
> > > strong opinion.
> > 
> > Disagree here, I'm CCing Florian for information.
> > 
> > This notifier preserves RAM so it's *very interesting* if we have
> > kmsg_dump() for example, but maybe might be also relevant in case kdump
> > kernel is configured to store something in a persistent RAM (then,
> > without this notifier, after kdump reboots the system data would be lost).
> 
> I see. It is actually similar problem as with
> drivers/firmware/google/gsmi.c.

As discussed in the other other reply, it seems that both affected
notifiers do not store kernel logs and should stay in the "hypervisor".

> I does similar things like kmsg_dump() so it should be called in
> the same location (after info notifier list and before kdump).
>
> A solution might be to put it at these notifiers at the very
> end of the "info" list or make extra "dump" notifier list.

I just want to point out that the above idea has problems.
Notifiers storing kernel log need to be treated as kmsg_dump().
In particular, we would  need to know if there are any.
We do not need to call "info" notifier list before kdump
when there is no kernel log dumper registered.

Best Regards,
Petr



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Petr Mladek
On Tue 2022-05-17 13:42:06, Guilherme G. Piccoli wrote:
> On 17/05/2022 10:57, Petr Mladek wrote:
> >> Disagree here, I'm CCing Florian for information.
> >>
> >> This notifier preserves RAM so it's *very interesting* if we have
> >> kmsg_dump() for example, but maybe might be also relevant in case kdump
> >> kernel is configured to store something in a persistent RAM (then,
> >> without this notifier, after kdump reboots the system data would be lost).
> > 
> > I see. It is actually similar problem as with
> > drivers/firmware/google/gsmi.c.
> > 
> > I does similar things like kmsg_dump() so it should be called in
> > the same location (after info notifier list and before kdump).
> > 
> > A solution might be to put it at these notifiers at the very
> > end of the "info" list or make extra "dump" notifier list.
> 
> Here I still disagree. I've commented in the other response thread
> (about Google gsmi) about the semantics of the hypervisor list, but
> again: this list should contain callbacks that
> 
> (a) Should run early, _by default_ before a kdump;
> (b) Communicate with the firmware/hypervisor in a "low-risk" way;
> 
> Imagine a scenario where users configure kdump kernel to save something
> in a persistent form in DRAM - it'd be like a late pstore, in the next
> kernel. This callback enables that, it's meant to inform FW "hey, panic
> happened, please from now on don't clear the RAM in the next FW-reboot".
> I don't see a reason to postpone that - let's see if the maintainers
> have an opinion.

I have answered this in more detail in the other reply, see
https://lore.kernel.org/r/YoShZVYNAdvvjb7z@alley

I agree that both notifiers in

drivers/soc/bcm/brcmstb/pm/pm-arm.c
drivers/firmware/google/gsmi.c

better fit into the hypervisor list after all.

Best Regards,
Petr



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Petr Mladek
On Tue 2022-05-17 13:37:58, Guilherme G. Piccoli wrote:
> On 17/05/2022 10:28, Petr Mladek wrote:
> > [...]
> >>> Disagree here. I'm looping Google maintainers, so they can comment.
> >>> (CCed Evan, David, Julius)
> >>>
> >>> This notifier is clearly a hypervisor notification mechanism. I've fixed
> >>> a locking stuff there (in previous patch), I feel it's low-risk but even
> >>> if it's mid-risk, the class of such callback remains a perfect fit with
> >>> the hypervisor list IMHO.
> >>
> >> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> >> for critical and power-related events. In some cases this ends up
 > >> being the only clue we get in a Chromebook feedback report that a
> >> panic occurred, so from my perspective moving it to the front of the
> >> line seems like a good idea.
> > 
> > IMHO, this would really better fit into the pre-reboot notifier list:
> > 
> >+ the callback stores the log so it is similar to kmsg_dump()
> >  or console_flush_on_panic()
> > 
> >+ the callback should be proceed after "info" notifiers
> >  that might add some other useful information.
> > 
> > Honestly, I am not sure what exactly hypervisor callbacks do. But I
> > think that they do not try to extract the kernel log because they
> > would need to handle the internal format.
> > 
> 
> I guess the main point in your response is : "I am not sure what exactly
> hypervisor callbacks do". We need to be sure about the semantics of such
> list, and agree on that.
> 
> So, my opinion about this first list, that we call "hypervisor list",
> is: it contains callbacks that
> 
> (1) should run early, preferably before kdump (or even if kdump isn't
> set, should run ASAP);
> 
> (2) these callbacks perform some communication with an abstraction that
> runs "below" the kernel, like a firmware or hypervisor. Classic example:
> pvpanic, that communicates with VMM (usually qemu) and allow such VMM to
> snapshot the full guest memory, for example.
> 
> (3) Should be low-risk. What defines risk is the level of reliability of
> subsequent operations - if the callback have 50% of chance of "bricking"
> the system totally and prevent kdump / kmsg_dump() / reboot , this is
> high risk one for example.
> 
> Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in
> powerpc, etc.
> 
> So, this is a good case for the Google notifier as well - it's not
> collecting data like the dmesg (hence your second bullet seems to not
> apply here, info notifiers won't add info to be collected by gsmi). It
> is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that
> tells such "lower" abstraction a panic occurred. It seems low risk and
> we want it to run ASAP, if possible.

" 
> >> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> >> for critical and power-related events. In some cases this ends up

I see. I somehow assumed that it was about the kernel log because
Evans wrote:

  "This logs a panic to our "eventlog", a tiny logging area in SPI flash
   for critical and power-related events. In some cases this ends up"


Anyway, I would distinguish it the following way.

  + If the notifier is preserving kernel log then it should be ideally
treated as kmsg_dump().

  + It the notifier is saving another debugging data then it better
fits into the "hypervisor" notifier list.


Regarding the reliability. From my POV, any panic notifier enabled
in a generic kernel should be reliable with more than 99,9%.
Otherwise, they should not be in the notifier list at all.

An exception would be a platform-specific notifier that is
called only on some specific platform and developers maintaining
this platform agree on this.

The value "99,9%" is arbitrary. I am not sure if it is realistic
even in the other code, for example, console_flush_on_panic()
or emergency_restart(). I just want to point out that the border
should be rather high. Otherwise we would back in the situation
where people would want to disable particular notifiers.

Best Regards,
Petr



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Petr Mladek
On Mon 2022-05-16 13:33:51, Guilherme G. Piccoli wrote:
> On 16/05/2022 13:18, Luck, Tony wrote:
> >> [...]
> > Would it be possible to have some global "kdump is configured + enabled" 
> > flag?
> > 
> > Then notifiers could make an informed choice on whether to deep dive to
> > get all the possible details (when there is no kdump) or just skim the high
> > level stuff (to maximize chance of getting a successful kdump).
> > 
> > -Tony
> 
> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?

I like this idea.

One small problem is that kexec_crash_loaded() has valid result
only under kexec_mutex. On the other hand, it should stay true
once loaded so that the small race window should be innocent.

> With that, are you/Petr/Dinh OK in moving it for the info list?

Sounds good to me.

Best Regards,
Petr



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Petr Mladek
On Mon 2022-05-16 12:06:17, Guilherme G. Piccoli wrote:
> Thanks for the review!
> 
> I agree with the blinking stuff, I can rework and add all LED/blinking
> stuff into the loop list, it does make sense. I'll comment a bit in the
> others below...
> 
> On 16/05/2022 11:01, Petr Mladek wrote:
> >> --- a/drivers/firmware/google/gsmi.c
> >> +++ b/drivers/firmware/google/gsmi.c
> >> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
> >>  
> >>register_reboot_notifier(&gsmi_reboot_notifier);
> >>register_die_notifier(&gsmi_die_notifier);
> >> -  atomic_notifier_chain_register(&panic_notifier_list,
> >> +  atomic_notifier_chain_register(&panic_hypervisor_list,
> >>   &gsmi_panic_notifier);
> > 
> > I am not sure about this one. It looks like some logging or
> > pre_reboot stuff.
> > 
> 
> Disagree here. I'm looping Google maintainers, so they can comment.
> (CCed Evan, David, Julius)
> 
> This notifier is clearly a hypervisor notification mechanism. I've fixed
> a locking stuff there (in previous patch), I feel it's low-risk but even
> if it's mid-risk, the class of such callback remains a perfect fit with
> the hypervisor list IMHO.

It is similar to drivers/soc/bcm/brcmstb/pm/pm-arm.c.
See below for another idea.

> >> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
> >> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
> >> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const 
> >> struct pci_device_id *ent)
> >>  
> >>/* register for panic notifier */
> >>vk->panic_nb.notifier_call = bcm_vk_on_panic;
> >> -  err = atomic_notifier_chain_register(&panic_notifier_list,
> >> +  err = atomic_notifier_chain_register(&panic_hypervisor_list,
> >> &vk->panic_nb);
> > 
> > It seems to reset some hardware or so. IMHO, it should go into the
> > pre-reboot list.
> 
> Mixed feelings here, I'm looping Broadcom maintainers to comment.
> (CC Scott and Broadcom list)
> 
> I'm afraid it breaks kdump if this device is not reset beforehand - it's
> a doorbell write, so not high risk I think...
> 
> But in case the not-reset device can be probed normally in kdump kernel,
> then I'm fine in moving this to the reboot list! I don't have the HW to
> test myself.

Good question. Well, it if has to be called before kdump then
even "hypervisor" list is a wrong place because is not always
called before kdump.


> >> --- a/drivers/power/reset/ltc2952-poweroff.c
> >> +++ b/drivers/power/reset/ltc2952-poweroff.c
> >> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct 
> >> platform_device *pdev)
> >>pm_power_off = ltc2952_poweroff_kill;
> >>  
> >>data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
> >> -  atomic_notifier_chain_register(&panic_notifier_list,
> >> +  atomic_notifier_chain_register(&panic_hypervisor_list,
> >>   &data->panic_notifier);
> > 
> > I looks like this somehow triggers the reboot. IMHO, it should go
> > into the pre_reboot list.
> 
> Mixed feeling again here - CCing the maintainers for comments (Sebastian
> / PM folks).
> 
> This is setting a variable only, and once it's set (data->kernel_panic
> is the bool's name), it just bails out the IRQ handler and a timer
> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
> IMHO.

IMHO, the timer informs the hardware that the system is still alive
in the middle of panic(). If the timer is not working then the
hardware (chip) will think that the system frozen in panic()
and will power off the system. See the comments in
drivers/power/reset/ltc2952-poweroff.c:

 * The following GPIOs are used:
 * - trigger (input)
 * A level change indicates the shut-down trigger. If it's state reverts
 * within the time-out defined by trigger_delay, the shut down is not
 * executed. If no pin is assigned to this input, the driver will start the
 * watchdog toggle immediately. The chip will only power off the system if
 * it is requested to do so through the kill line.
 *
 * - watchdog (output)
 * Once a shut down is triggered, the driver will toggle this signal,
 * with an internal (wde_interval) to stall the hardware shut down.

IMHO, we really have to keep it alive until we reach the reboot stage.

Another question is how it actually works when the interrupts are
disabled during panic() and the timer callbacks are not handled.


> &g

Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Petr Mladek
On Mon 2022-05-16 09:02:10, Evan Green wrote:
> On Mon, May 16, 2022 at 8:07 AM Guilherme G. Piccoli
>  wrote:
> >
> > Thanks for the review!
> >
> > I agree with the blinking stuff, I can rework and add all LED/blinking
> > stuff into the loop list, it does make sense. I'll comment a bit in the
> > others below...
> >
> > On 16/05/2022 11:01, Petr Mladek wrote:
> > > [...]
> > >> --- a/arch/mips/sgi-ip22/ip22-reset.c
> > >> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> > >> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
> > >>  }
> > >>
> > >>  timer_setup(&blink_timer, blink_timeout, 0);
> > >> -atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> > >> +atomic_notifier_chain_register(&panic_hypervisor_list, 
> > >> &panic_block);
> > >
> > > This notifier enables blinking. It is not much safe. It calls
> > > mod_timer() that takes a lock internally.
> > >
> > > This kind of functionality should go into the last list called
> > > before panic() enters the infinite loop. IMHO, all the blinking
> > > stuff should go there.
> > > [...]
> > >> --- a/arch/mips/sgi-ip32/ip32-reset.c
> > >> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> > >> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
> > >>  pm_power_off = ip32_machine_halt;
> > >>
> > >>  timer_setup(&blink_timer, blink_timeout, 0);
> > >> -atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> > >> +atomic_notifier_chain_register(&panic_hypervisor_list, 
> > >> &panic_block);
> > >
> > > Same here. Should be done only before the "loop".
> > > [...]
> >
> > Ack.
> >
> >
> > >> --- a/drivers/firmware/google/gsmi.c
> > >> +++ b/drivers/firmware/google/gsmi.c
> > >> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
> > >>
> > >>  register_reboot_notifier(&gsmi_reboot_notifier);
> > >>  register_die_notifier(&gsmi_die_notifier);
> > >> -atomic_notifier_chain_register(&panic_notifier_list,
> > >> +atomic_notifier_chain_register(&panic_hypervisor_list,
> > >> &gsmi_panic_notifier);
> > >
> > > I am not sure about this one. It looks like some logging or
> > > pre_reboot stuff.
> > >
> >
> > Disagree here. I'm looping Google maintainers, so they can comment.
> > (CCed Evan, David, Julius)
> >
> > This notifier is clearly a hypervisor notification mechanism. I've fixed
> > a locking stuff there (in previous patch), I feel it's low-risk but even
> > if it's mid-risk, the class of such callback remains a perfect fit with
> > the hypervisor list IMHO.
>
> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> for critical and power-related events. In some cases this ends up
> being the only clue we get in a Chromebook feedback report that a
> panic occurred, so from my perspective moving it to the front of the
> line seems like a good idea.

IMHO, this would really better fit into the pre-reboot notifier list:

   + the callback stores the log so it is similar to kmsg_dump()
 or console_flush_on_panic()

   + the callback should be proceed after "info" notifiers
 that might add some other useful information.

Honestly, I am not sure what exactly hypervisor callbacks do. But I
think that they do not try to extract the kernel log because they
would need to handle the internal format.

Best Regards,
Petr



Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-17 Thread Petr Mladek
On Tue 2022-05-10 13:16:54, Guilherme G. Piccoli wrote:
> On 10/05/2022 12:16, Petr Mladek wrote:
> > [...]
> > Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> > All notifiers will be always called with PANIC_NOTIFIER.
> > 
> > The @val parameter is normally used when the same notifier_list
> > is used in different situations.
> > 
> > But you are going to use it when the same notifier is used
> > in more lists. This is normally distinguished by the @nh
> > (atomic_notifier_head) parameter.
> > 
> > IMHO, it is a bad idea. First, it would confuse people because
> > it does not follow the original design of the parameters.
> > Second, the related code must be touched anyway when
> > the notifier is moved into another list so it does not
> > help much.
> > 
> > Or do I miss anything, please?
> > 
> > Best Regards,
> > Petr
> 
> Hi Petr, thanks for the review.
> 
> I'm not strong attached to this patch, so we could drop it and refactor
> the code of next patches to use the @nh as identification - but
> personally, I feel this parameter could be used to identify the list
> that called such function, in other words, what is the event that
> triggered the callback. Some notifiers are even declared with this
> parameter called "ev", like the event that triggers the notifier.
> 
> 
> You mentioned 2 cases:
> 
> (a) Same notifier_list used in different situations;
> 
> (b) Same *notifier callback* used in different lists;
> 
> Mine is case (b), right? Can you show me an example of case (a)?

There are many examples of case (a):

   + module_notify_list:
MODULE_STATE_LIVE,  /* Normal state. */
MODULE_STATE_COMING,/* Full formed, running module_init. */
MODULE_STATE_GOING, /* Going away. */
MODULE_STATE_UNFORMED,  /* Still setting it up. */


   + netdev_chain:

NETDEV_UP   = 1,/* For now you can't veto a device up/down */
NETDEV_DOWN,
NETDEV_REBOOT,  /* Tell a protocol stack a network interface
   detected a hardware crash and restarted
   - we can use this eg to kick tcp sessions
   once done */
NETDEV_CHANGE,  /* Notify device state change */
NETDEV_REGISTER,
NETDEV_UNREGISTER,
NETDEV_CHANGEMTU,   /* notify after mtu change happened */
NETDEV_CHANGEADDR,  /* notify after the address change */
NETDEV_PRE_CHANGEADDR,  /* notify before the address change */
NETDEV_GOING_DOWN,
...

+ vt_notifier_list:

#define VT_ALLOCATE 0x0001 /* Console got allocated */
#define VT_DEALLOCATE   0x0002 /* Console will be deallocated */
#define VT_WRITE0x0003 /* A char got output */
#define VT_UPDATE   0x0004 /* A bigger update occurred */
#define VT_PREWRITE 0x0005 /* A char is about to be written 
to the console */

+ die_chain:

DIE_OOPS = 1,
DIE_INT3,
DIE_DEBUG,
DIE_PANIC,
DIE_NMI,
DIE_DIE,
DIE_KERNELDEBUG,
...

These all call the same list/chain in different situations.
The situation is distinguished by @val.


> You can see in the following patches (or grep the kernel) that people are 
> using
> this identification parameter to determine which kind of OOPS trigger
> the callback to condition the execution of the function to specific
> cases.

Could you please show me some existing code for case (b)?
I am not able to find any except in your patches.

Anyway, the solution in 16th patch is bad, definitely.
hv_die_panic_notify_crash() uses "val" to disinguish
both:

 + "panic_notifier_list" vs "die_chain"
 + die_val when callen via "die_chain"

The API around "die_chain" API is not aware of enum panic_notifier_val
and the API using "panic_notifier_list" is not aware of enum die_val.
As I said, it is mixing apples and oranges and it is error prone.

Best Regards,
Petr



Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-17 Thread Petr Mladek
On Tue 2022-05-10 10:00:58, Guilherme G. Piccoli wrote:
> On 10/05/2022 09:14, Petr Mladek wrote:
> > [...]
> >> With that said, it's dangerous to use regular spinlocks in such path,
> >> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple 
> >> instances").
> >> This patch fixes that by replacing regular spinlocks with the trylock
> >> safer approach.
> > 
> > It seems that the lock is used just to manipulating a list. A super
> > safe solution would be to use the rcu API: rcu_add_rcu() and
> > list_del_rcu() under rcu_read_lock(). The spin lock will not be
> > needed and the list will always be valid.
> > 
> > The advantage would be that it will always call members that
> > were successfully added earlier. That said, I am not familiar
> > with pvpanic and am not sure if it is worth it.
> > 
> >> It also fixes an old comment (about a long gone framebuffer code) and
> >> the notifier priority - we should execute hypervisor notifiers early,
> >> deferring this way the panic action to the hypervisor, as expected by
> >> the users that are setting up pvpanic.
> > 
> > This should be done in a separate patch. It changes the behavior.
> > Also there might be a discussion whether it really should be
> > the maximal priority.
> > 
> > Best Regards,
> > Petr
> 
> Thanks for the review Petr. Patch was already merged - my goal was to be
> concise, i.e., a patch per driver / module, so the patch kinda fixes
> whatever I think is wrong with the driver with regards panic handling.
> 
> Do you think it worth to remove this patch from Greg's branch just to
> split it in 2? Personally I think it's not worth, but opinions are welcome.

No problem. It is not worth the effort.


> About the RCU part, this one really could be a new patch, a good
> improvement patch - it makes sense to me, we can think about that after
> the fixes I guess.

Yup.

Best Regards,
Petr



Re: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

2022-05-16 Thread Petr Mladek
On Wed 2022-04-27 19:49:19, Guilherme G. Piccoli wrote:
> Currently the parameter "panic_print" relies in a function called
> directly on panic path; one of the flags the users can set for
> panic_print triggers a console replay mechanism, to show the
> entire kernel log buffer (from the beginning) in a panic event.
> 
> Two problems with that: the dual nature of the panic_print
> isn't really appropriate, the function was originally meant
> to allow users dumping system information on panic events,
> and was "overridden" to also force a console flush of the full
> kernel log buffer. It also turns the code a bit more complex
> and duplicate than it needs to be.
> 
> This patch proposes 2 changes: first, we decouple panic_print
> from the console flushing mechanism, in the form of a new kernel
> core parameter (panic_console_replay); we kept the functionality
> on panic_print to avoid userspace breakage, although we comment
> in both code and documentation that this panic_print usage is
> deprecated.
> 
> We converted panic_print function to a panic notifier too, adding
> it on the panic informational notifier list, executed as the final
> callback. This allows a more clear code and makes sense, as
> panic_print_sys_info() is really a panic-time only function.
> We also moved its code to kernel/printk.c, it seems to make more
> sense given it's related to printing stuff.

I really like both changes. Just please split it them into two
patchset. I mean one patch for the new "panic_console_replay"
parameter and 2nd that moves "printk_info" into the notifier.

Best Regards,
Petr



Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-05-16 Thread Petr Mladek
On Wed 2022-05-11 17:03:51, Guilherme G. Piccoli wrote:
> On 10/05/2022 14:40, Steven Rostedt wrote:
> > On Wed, 27 Apr 2022 19:49:17 -0300
> > "Guilherme G. Piccoli"  wrote:
> > 
> >> Currently we don't have a way to check if there are dumpers set,
> >> except counting the list members maybe. This patch introduces a very
> >> simple helper to provide this information, by just keeping track of
> >> registered/unregistered kmsg dumpers. It's going to be used on the
> >> panic path in the subsequent patch.
> > 
> > FYI, it is considered "bad form" to reference in the change log "this
> > patch". We know this is a patch. The change log should just talk about what
> > is being done. So can you reword your change logs (you do this is almost
> > every patch). Here's what I would reword the above to be:
> > 
> >  Currently we don't have a way to check if there are dumpers set, except
> >  perhaps by counting the list members. Introduce a very simple helper to
> >  provide this information, by just keeping track of registered/unregistered
> >  kmsg dumpers. This will simplify the refactoring of the panic path.
> 
> Thanks for the hint, you're right - it's almost in all of my patches.
> I'll reword all of them (except the ones already merged) to remove this
> "bad form".

Shame on me that I do not care that much about the style of the commit
message :-)

Anyway, the code looks good to me. With the better commit message:

Reviewed-by: Petr Mladek 

Best Regards,
Petr



Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-05-16 Thread Petr Mladek
On Wed 2022-04-27 19:49:16, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
> 
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
> 
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.

The patch looks good to me. I would just suggest two changes.

1. I would rename the list to "panic_loop_list" instead of
   "panic_post_reboot_list".

   It will be more clear that it includes things that are
   needed before panic() enters the infinite loop.


2. I would move all the notifiers that enable blinking here.

   The blinking should be done only during the infinite
   loop when there is nothing else to do. If we enable
   earlier then it might disturb/break more important
   functionality (dumping information, reboot).

Best Regards,
Petr



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-16 Thread Petr Mladek
On Wed 2022-04-27 19:49:15, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
> 
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.
> 
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device 
> *pdev)
>   int dberror, err_addr;
>  
>   edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_pre_reboot_list,

My understanding is that this notifier first prints info about ECC
errors and then triggers reboot. It might make sense to split it
into two notifiers.


>  &edac->panic_notifier);
>  
>   /* Printout a message if uncorrectable error previously. */
> --- a/drivers/leds/trigger/ledtrig-panic.c
> +++ b/drivers/leds/trigger/ledtrig-panic.c
> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>  
>  static int __init ledtrig_panic_init(void)
>  {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
>  &led_trigger_panic_nb);

Blinking => should go to the last "post_reboot/loop" list.


>  
>   led_trigger_register_simple("panic", &trigger);
> --- a/drivers/misc/ibmasm/heartbeat.c
> +++ b/drivers/misc/ibmasm/heartbeat.c
> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
>  static int panic_happened(struct notifier_block *n, unsigned long val, void 
> *v)
>  {
>   suspend_heartbeats = 1;
> - return 0;
> + return NOTIFY_DONE;
>  }
>  
> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
> +static struct notifier_block panic_notifier = {
> + .notifier_call = panic_happened,
> +};
>  
>  void ibmasm_register_panic_notifier(void)
>  {
> - atomic_notifier_chain_register(&panic_notifier_list, &panic_notifier);
> + atomic_notifier_chain_register(&panic_pre_reboot_list,
> + &panic_notifier);

Same here. Blinking => should go to the last "post_reboot/loop" list.


>  }
>  
>  void ibmasm_unregister_panic_notifier(void)
>  {
> - atomic_notifier_chain_unregister(&panic_notifier_list,
> - &panic_notifier);
> + atomic_notifier_chain_unregister(&panic_pre_reboot_list,
> + &panic_notifier);
>  }


The rest of the moved notifiers seem to fit well this "pre_reboot"
list.

Best Regards,
Petr



Re: [PATCH 20/30] panic: Add the panic informational notifier list

2022-05-16 Thread Petr Mladek
On Wed 2022-04-27 19:49:14, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to
> register callbacks to run earlier in the panic path than they
> currently do. This aims at informational mechanisms, like dumping
> kernel offsets and showing device error data (in case it's simple
> registers reading, for example) as well as mechanisms to disable
> log flooding (like hung_task detector / RCU warnings) and the
> tracing dump_on_oops (when enabled).
> 
> Any (non-invasive) information that should be provided before
> kmsg_dump() as well as log flooding preventing code should fit
> here, as long it offers relatively low risk for kdump.
> 
> For now, the patch is almost a no-op, although it changes a bit
> the ordering in which some panic notifiers are executed - specially
> affected by this are the notifiers responsible for disabling the
> hung_task detector / RCU warnings, which now run first. In a
> subsequent patch, the panic path will be refactored, then the
> panic informational notifiers will effectively run earlier,
> before ksmg_dump() (and usually before kdump as well).
> 
> We also defer documenting it all properly in the subsequent
> refactor patch. Finally, while at it, we removed some useless
> header inclusions too.
> 
> Signed-off-by: Guilherme G. Piccoli 

All notifiers moved in this patch seems to fit well the "info"
notifier list. The patch looks good from this POV.

I still have to review the rest of the patches to see if it
is complete.

Best Regards,
Petr



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-16 Thread Petr Mladek
On Wed 2022-04-27 19:49:13, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to register
> callbacks to run very early in the panic path. This aims hypervisor/FW
> notification mechanisms as well as simple LED functions, and any other
> simple and safe mechanism that should run early in the panic path; more
> dangerous callbacks should execute later.
> 
> For now, the patch is almost a no-op (although it changes a bit the
> ordering in which some panic notifiers are executed). In a subsequent
> patch, the panic path will be refactored, then the panic hypervisor
> notifiers will effectively run very early in the panic path.
> 
> We also defer documenting it all properly in the subsequent refactor
> patch. While at it, we removed some useless header inclusions and
> fixed some notifiers return too (by using the standard NOTIFY_DONE).

> --- a/arch/mips/sgi-ip22/ip22-reset.c
> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
>   }
>  
>   timer_setup(&blink_timer, blink_timeout, 0);
> - atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> + atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);

This notifier enables blinking. It is not much safe. It calls
mod_timer() that takes a lock internally.

This kind of functionality should go into the last list called
before panic() enters the infinite loop. IMHO, all the blinking
stuff should go there.

>  
>   return 0;
>  }
> diff --git a/arch/mips/sgi-ip32/ip32-reset.c b/arch/mips/sgi-ip32/ip32-reset.c
> index 18d1c115cd53..9ee1302c9d13 100644
> --- a/arch/mips/sgi-ip32/ip32-reset.c
> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
>   pm_power_off = ip32_machine_halt;
>  
>   timer_setup(&blink_timer, blink_timeout, 0);
> - atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
> + atomic_notifier_chain_register(&panic_hypervisor_list, &panic_block);

Same here. Should be done only before the "loop".

>  
>   return 0;
>  }
> --- a/drivers/firmware/google/gsmi.c
> +++ b/drivers/firmware/google/gsmi.c
> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
>  
>   register_reboot_notifier(&gsmi_reboot_notifier);
>   register_die_notifier(&gsmi_die_notifier);
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
>  &gsmi_panic_notifier);

I am not sure about this one. It looks like some logging or
pre_reboot stuff.


>  
>   printk(KERN_INFO "gsmi version " DRIVER_VERSION " loaded\n");
> --- a/drivers/leds/trigger/ledtrig-activity.c
> +++ b/drivers/leds/trigger/ledtrig-activity.c
> @@ -247,7 +247,7 @@ static int __init activity_init(void)
>   int rc = led_trigger_register(&activity_led_trigger);
>  
>   if (!rc) {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
>  &activity_panic_nb);

The notifier is trivial. It just sets a variable.

But still, it is about blinking and should be done
in the last "loop" list.


>   register_reboot_notifier(&activity_reboot_nb);
>   }
> --- a/drivers/leds/trigger/ledtrig-heartbeat.c
> +++ b/drivers/leds/trigger/ledtrig-heartbeat.c
> @@ -190,7 +190,7 @@ static int __init heartbeat_trig_init(void)
>   int rc = led_trigger_register(&heartbeat_led_trigger);
>  
>   if (!rc) {
> - atomic_notifier_chain_register(&panic_notifier_list,
> + atomic_notifier_chain_register(&panic_hypervisor_list,
>  &heartbeat_panic_nb);

Same here. Blinking => loop list.

>   register_reboot_notifier(&heartbeat_reboot_nb);
>   }
> diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c 
> b/drivers/misc/bcm-vk/bcm_vk_dev.c
> index a16b99bdaa13..d9d5199cdb2b 100644
> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>  
>   /* register for panic notifier */
>   vk->panic_nb.notifier_call = bcm_vk_on_panic;
> - err = atomic_notifier_chain_register(&panic_notifier_list,
> + err = atomic_notifier_chain_register(&panic_hypervisor_list,
>&vk->panic_nb);

It seems to reset some hardware or so. IMHO, it should go into the
pre-reboot list.


>   if (err) {
>   dev_err(dev, "Fail to register panic notifier\n");
> --- a/drivers/power/reset/ltc2952-poweroff.c
> +++ b/drivers/power/reset/ltc2952-poweroff.c
> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device 
> *pdev)
>   pm_power_off = ltc2952_poweroff_kill;
>  
>   data->panic_notifier.notifier_call 

Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-16 Thread Petr Mladek
On Sun 2022-05-15 19:47:39, Guilherme G. Piccoli wrote:
> On 12/05/2022 11:03, Petr Mladek wrote:
> > This talks only about kdump. The reality is much more complicated.
> > The level affect the order of:
> > 
> > + notifiers vs. kdump
> > + notifiers vs. crash_dump
> > + crash_dump vs. kdump
> 
> First of all, I'd like to ask you please to clarify to me *exactly* what
> are the differences between "crash_dump" and "kdump". I'm sorry if
> that's a silly question, I need to be 100% sure I understand the
> concepts the same way you do.

Ah, it should have been:

 + notifiers vs. kmsg_dump
 + notifiers vs. crash_dump
 + crash_dump vs. kmsg_dump

I am sorry for the confusion. Even "crash_dump" is slightly
misleading because there is no function with this name.
But it seems to be easier to understand than __crash_kexec().


> > There might theoretically many variants of the ordering of kdump,
> > crash_dump, and the 4 notifier list. Some variants do not make
> > much sense. You choose 5 variants and tried to select them by
> > a level number.
> > 
> > The question is if we really could easily describe the meaning this
> > way. It is not only about a "level" of notifiers before kdump. It is
> > also about the ordering of crash_dump vs. kdump. IMHO, "level"
> > semantic does not fit there.
> > 
> > Maybe more parameters might be easier to understand the effect.
> > Anyway, we first need to agree on the chosen variants.
> > I am going to discuss it more in the code, see below.
> > 
> > 
> > [...] 
> > Here is the code using the above functions. It helps to discuss
> > the design and logic.
> > 
> > I have to say that the logic is very unclear. Almost all
> > functions are called twice:
> > 
> > The really used code path is defined by order_panic_notifiers_and_kdump()
> > that encodes "level" into "bits". The bits are then flipped in
> > panic_notifier_*_once() calls that either do something or not.
> > kmsg_dump() is called according to the bit flip.
> > 
> > Also I guess that it is good proof that "level" abstraction does
> > not fit here. Normal levels would not need this kind of magic.
> 
> Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
> disagree here - I'm much more fond to this kind of code than a bunch of
> if/else blocks that almost give headaches. Encoding such "level" logic
> in the if/else scheme is very convoluted, generates a very big code. And
> the functions aren't so black magic - they map a level in bits, and the
> functions _once() are called...once! Although we switch the position in
> the code, so there are 2 calls, one of them is called and the other not.

I see. Well, I would consider this as a warning that the approach is
too complex. If the code, using if/then/else, would cause headaches
then also understanding of the behavior would cause headaches for
both users and programmers.


> But that's totally fine to change - especially if we're moving away from
> the "level" logic. I see below you propose a much simpler approach - if
> we follow that, definitely we won't need the "black magic" approach heheh

I do not say that my proposal is fully correct. But we really need
this kind of simpler approach.


> > OK, the question is how to make it better.

> > One option "panic_prefer_crash_dump" should be enough.
> > And the code might look like:
> > 
> > void panic()
> > {
> > [...]
> > dump_stack();
> > kgdb_panic(buf);
> > 
> > < ---  here starts the reworked code --- >
> > 
> > /* crash dump is enough when enabled and preferred. */
> > if (panic_prefer_crash_dump)
> > __crash_kexec(NULL);
> > 
> > /* Stop other CPUs and focus on handling the panic state. */
> > if (has_kexec_crash_image)
> > crash_smp_send_stop();
> > else
> > smp_send_stop()
> > 
> 
> Here we have a very important point. Why do we need 2 variants of SMP
> CPU stopping functions? I disagree with that - my understanding of this
> after some study in architectures is that the crash_() variant is
> "stronger", should work in all cases and if not, we should fix that -
> that'd be a bug.
> 
> Such variant either maps to smp_send_stop() (in various architectures,
> including XEN/x86) or overrides the basic function with more proper
> handling for panic() case...I don't see why we still need such
> distinction, i

Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-12 Thread Petr Mladek
Hello,

first, I am sorry for stepping into the discussion so late.
I was busy with some other stuff and this patchset is far
from trivial.

Second, thanks a lot for putting so much effort into it.
Most of the changes look pretty good, especially all
the fixes of particular notifiers and split into
four lists.

Though this patch will need some more love. See below
for more details.


On Wed 2022-04-27 19:49:18, Guilherme G. Piccoli wrote:
> The panic() function is somewhat convoluted - a lot of changes were
> made over the years, adding comments that might be misleading/outdated
> now, it has a code structure that is a bit complex to follow, with
> lots of conditionals, for example. The panic notifier list is something
> else - a single list, with multiple callbacks of different purposes,
> that run in a non-deterministic order and may affect hardly kdump
> reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.
> 
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3784,6 +3791,33 @@
>   timeout < 0: reboot immediately
>   Format: 
>  
> + panic_notifiers_level=
> + [KNL] Set the panic notifiers execution order.
> + Format: 
> + We currently have 4 lists of panic notifiers; based
> + on the functionality and risk (for panic success) the
> + callbacks are added in a given list. The lists are:
> + - hypervisor/FW notification list (low risk);
> + - informational list (low/medium risk);
> + - pre_reboot list (higher risk);
> + - post_reboot list (only run late in panic and after
> + kdump, not configurable for now).
> + This parameter defines the ordering of the first 3
> + lists with regards to kdump; the levels determine
> + which set of notifiers execute before kdump. The
> + accepted levels are:

This talks only about kdump. The reality is much more complicated.
The level affect the order of:

+ notifiers vs. kdump
+ notifiers vs. crash_dump
+ crash_dump vs. kdump

There might theoretically many variants of the ordering of kdump,
crash_dump, and the 4 notifier list. Some variants do not make
much sense. You choose 5 variants and tried to select them by
a level number.

The question is if we really could easily describe the meaning this
way. It is not only about a "level" of notifiers before kdump. It is
also about the ordering of crash_dump vs. kdump. IMHO, "level"
semantic does not fit there.

Maybe more parameters might be easier to understand the effect.
Anyway, we first need to agree on the chosen variants.
I am going to discuss it more in the code, see below.



> + 0: kdump is the first thing to run, NO list is
> + executed before kdump.
> + 1: only the hypervisor list is executed before kdump.
> + 2 (default level): the hypervisor list and (*if*
> + there's any kmsg_dumper defined) the informational
> + list are executed before kdump.
> + 3: both the hypervisor and the informational lists
> + (always) execute before kdump.
> + 4: the 3 lists (hypervisor, info and pre_reboot)
> + execute before kdump - this behavior is analog to the
> + deprecated parameter "crash_kexec_post_notifiers".
> +
>   panic_print=Bitmask for printing system info when panic happens.
>   User can chose combination of the following bits:
>   bit 0: print all tasks info
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -183,6 +195,112 @@ static void panic_print_sys_info(bool console_flush)
>   ftrace_dump(DUMP_ALL);
>  }
>  
> +/*
> + * Helper that accumulates all console flushing routines executed on panic.
> + */
> +static void console_flushing(void)
> +{
> +#ifdef CONFIG_VT
> + unblank_screen();
> +#endif
> + console_unblank();
> +
> + /*
> +  * In this point, we may have disabled other CPUs, hence stopping the
> +  * CPU holding the lock while still having some valuable data in the
> +  * console buffer.
> +  *
> +  * Try to acquire the lock then release it regardless of the result.
> +  * The release will also print the buffers out. Locks debug should
> +  * be disabled to avoid reporting bad unlock balance when panic()
> +  * is not being called from OOPS.
> +  */
> + debug_locks_off();
> + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
> + panic_print_sys_info(true);
> +}
> +
> +#define PN_HYPERVISOR_BIT0
> +#define PN_INFO_BIT  1
> +#define PN_PRE_REBOOT_BIT

Re: [PATCH 17/30] tracing: Improve panic/die notifiers

2022-05-11 Thread Petr Mladek
On Wed 2022-04-27 19:49:11, Guilherme G. Piccoli wrote:
> Currently the tracing dump_on_oops feature is implemented
> through separate notifiers, one for die/oops and the other
> for panic. With the addition of panic notifier "id", this
> patch makes use of such "id" to unify both functions.
> 
> It also comments the function and changes the priority of the
> notifier blocks, in order they run early compared to other
> notifiers, to prevent useless trace data (like the callback
> names for the other notifiers). Finally, we also removed an
> unnecessary header inclusion.
> 
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -9767,38 +9766,46 @@ static __init int tracer_init_tracefs(void)
>  
>  fs_initcall(tracer_init_tracefs);
>  
> -static int trace_panic_handler(struct notifier_block *this,
> -unsigned long event, void *unused)
> +/*
> + * The idea is to execute the following die/panic callback early, in order
> + * to avoid showing irrelevant information in the trace (like other panic
> + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> + * warnings get disabled (to prevent potential log flooding).
> + */
> +static int trace_die_panic_handler(struct notifier_block *self,
> + unsigned long ev, void *unused)
>  {
> - if (ftrace_dump_on_oops)
> + int do_dump;
> +
> + if (!ftrace_dump_on_oops)
> + return NOTIFY_DONE;
> +
> + switch (ev) {
> + case DIE_OOPS:
> + do_dump = 1;
> + break;
> + case PANIC_NOTIFIER:
> + do_dump = 1;
> + break;

DIE_OOPS and PANIC_NOTIFIER are from different enum.
It feels like comparing apples with oranges here.

IMHO, the proper way to unify the two notifiers is
a check of the @self parameter. Something like:

static int trace_die_panic_handler(struct notifier_block *self,
unsigned long ev, void *unused)
{
if (self == trace_die_notifier && val != DIE_OOPS)
goto out;

ftrace_dump(ftrace_dump_on_oops);
out:
return NOTIFY_DONE;
}

Best Regards,
Petr



Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-11 Thread Petr Mladek
On Tue 2022-05-10 21:46:38, John Ogness wrote:
> On 2022-05-10, Steven Rostedt  wrote:
> >> As already mentioned in the other reply, panic() sometimes stops the
> >> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> >> 
> >> Another situation is when the CPU using the lock ends in some
> >> infinite loop because something went wrong. The system is in
> >> an unpredictable state during panic().
> >> 
> >> I am not sure if this is possible with the code under gsmi_dev.lock
> >> but such things really happen during panic() in other subsystems.
> >> Using trylock in the panic() code path is a good practice.
> >
> > I believe that Peter Zijlstra had a special spin lock for NMIs or
> > early printk, where it would not block if the lock was held on the
> > same CPU. That is, if an NMI happened and paniced while this lock was
> > held on the same CPU, it would not deadlock. But it would block if the
> > lock was held on another CPU.
> 
> Yes. And starting with 5.19 it will be carrying the name that _you_ came
> up with (cpu_sync):
> 
> printk_cpu_sync_get_irqsave()
> printk_cpu_sync_put_irqrestore()

There is a risk that this lock might become a big kernel lock.

This special lock would need to be used even during normal
system operation. It does not make sense to suddenly start using
another lock during panic.

So I think that we should think twice before using it.
I would prefer using trylock of the original lock when
possible during panic.

It is possible that I miss something.

Best Regards,
Petr



Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

2022-05-10 Thread Petr Mladek
On Wed 2022-04-27 19:49:09, Guilherme G. Piccoli wrote:
> This patch improves the panic/die notifiers in this driver by
> making use of a passed "id" instead of comparing pointer
> address; also, it removes an useless prototype declaration
> and unnecessary header inclusion.
> 
> This is part of a panic notifiers refactor - this notifier in
> the future will be moved to a new list, that encompass the
> information notifiers only.
> 
> --- a/drivers/bus/brcmstb_gisb.c
> +++ b/drivers/bus/brcmstb_gisb.c
> @@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, 
> void *dev_id)
>  /*
>   * Dump out gisb errors on die or panic.
>   */
> -static int dump_gisb_error(struct notifier_block *self, unsigned long v,
> -void *p);
> -
> -static struct notifier_block gisb_die_notifier = {
> - .notifier_call = dump_gisb_error,
> -};
> -
> -static struct notifier_block gisb_panic_notifier = {
> - .notifier_call = dump_gisb_error,
> -};
> -
>  static int dump_gisb_error(struct notifier_block *self, unsigned long v,
>  void *p)
>  {
>   struct brcmstb_gisb_arb_device *gdev;
> - const char *reason = "panic";
> + const char *reason = "die";
>  
> - if (self == &gisb_die_notifier)
> - reason = "die";
> + if (v == PANIC_NOTIFIER)
> + reason = "panic";

IMHO, the check of the @self parameter was the proper solution.

"gisb_die_notifier" list uses @val from enum die_val.
"gisb_panic_notifier" list uses @val from enum panic_notifier_val.

These are unrelated types. It might easily break when
someone defines the same constant also in enum die_val.

Best Regards,
Petr



Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-10 Thread Petr Mladek
On Wed 2022-04-27 19:49:08, Guilherme G. Piccoli wrote:
> The notifiers infrastructure provides a way to pass an "id" to the
> callbacks to determine what kind of event happened, i.e., what is
> the reason behind they getting called.
> 
> The panic notifier currently pass 0, but this is soon to be
> used in a multi-targeted notifier, so let's pass a meaningful
> "id" over there.
>
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  include/linux/panic_notifier.h | 5 +
>  kernel/panic.c | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index 41e32483d7a7..07dced83a783 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> +enum panic_notifier_val {
> + PANIC_UNUSED,
> + PANIC_NOTIFIER = 0xDEAD,
> +};

Hmm, this looks like a hack. PANIC_UNUSED will never be used.
All notifiers will be always called with PANIC_NOTIFIER.

The @val parameter is normally used when the same notifier_list
is used in different situations.

But you are going to use it when the same notifier is used
in more lists. This is normally distinguished by the @nh
(atomic_notifier_head) parameter.

IMHO, it is a bad idea. First, it would confuse people because
it does not follow the original design of the parameters.
Second, the related code must be touched anyway when
the notifier is moved into another list so it does not
help much.

Or do I miss anything, please?

Best Regards,
Petr



Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

2022-05-10 Thread Petr Mladek
On Wed 2022-04-27 19:49:05, Guilherme G. Piccoli wrote:
> Currently the panic notifiers from user mode linux don't follow
> the convention for most of the other notifiers present in the
> kernel (indentation, priority setting, numeric return).
> More important, the priorities could be improved, since it's a
> special case (userspace), hence we could run the notifiers earlier;
> user mode linux shouldn't care much with other panic notifiers but
> the ordering among the mconsole and arch notifier is important,
> given that the arch one effectively triggers a core dump.

It is not clear to me why user mode linux should not care about
the other notifiers. It might be because I do not know much
about the user mode linux.

Is the because they always create core dump or are never running
in a hypervisor or ...?

AFAIK, the notifiers do many different things. For example, there
is a notifier that disables RCU watchdog, print some extra
information. Why none of them make sense here?

> This patch fixes that by running the mconsole notifier as the first
> panic notifier, followed by the architecture one (that coredumps).
> Also, we remove a useless header inclusion.


Best Regards,
Petr



Re: [PATCH 10/30] alpha: Clean-up the panic notifier code

2022-05-10 Thread Petr Mladek
On Mon 2022-05-09 11:13:17, Guilherme G. Piccoli wrote:
> On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> > The alpha panic notifier has some code issues, not following
> > the conventions of other notifiers. Also, it might halt the
> > machine but still it is set to run as early as possible, which
> > doesn't seem to be a good idea.

Yeah, it is pretty strange behavior.

I looked into the history. This notifier was added into the alpha code
in 2.4.0-test2pre2. In this historic code, the default panic() code
either rebooted after a timeout or ended in a infinite loop. There
was not crasdump at that times.

The notifier allowed to change the behavior. There were 3 notifiers:

   + mips and mips64 ended with blinking in panic()
   + alpha did __halt() in this srm case

They both still do this. I guess that it is some historic behavior
that people using these architectures are used to.

Anyway, it makes sense to do this as the last notifier after
dumping other information.

Reviewed-by: Petr Mladek 

Best Regards,
Petr



Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Petr Mladek
On Wed 2022-04-27 19:48:59, Guilherme G. Piccoli wrote:
> The pvpanic driver relies on panic notifiers to execute a callback
> on panic event. Such function is executed in atomic context - the
> panic function disables local IRQs, preemption and all other CPUs
> that aren't running the panic code.
> 
> With that said, it's dangerous to use regular spinlocks in such path,
> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple 
> instances").
> This patch fixes that by replacing regular spinlocks with the trylock
> safer approach.

It seems that the lock is used just to manipulating a list. A super
safe solution would be to use the rcu API: rcu_add_rcu() and
list_del_rcu() under rcu_read_lock(). The spin lock will not be
needed and the list will always be valid.

The advantage would be that it will always call members that
were successfully added earlier. That said, I am not familiar
with pvpanic and am not sure if it is worth it.

> It also fixes an old comment (about a long gone framebuffer code) and
> the notifier priority - we should execute hypervisor notifiers early,
> deferring this way the panic action to the hypervisor, as expected by
> the users that are setting up pvpanic.

This should be done in a separate patch. It changes the behavior.
Also there might be a discussion whether it really should be
the maximal priority.

Best Regards,
Petr



Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Petr Mladek
On Tue 2022-05-03 16:12:09, Guilherme G. Piccoli wrote:
> On 03/05/2022 15:03, Evan Green wrote:
> > [...]
> > gsmi_shutdown_reason() is a common function called in other scenarios
> > as well, like reboot and thermal trip, where it may still make sense
> > to wait to acquire a spinlock. Maybe we should add a parameter to
> > gsmi_shutdown_reason() so that you can get your change on panic, but
> > we don't convert other callbacks into try-fail scenarios causing us to
> > miss logs.
> > 
> 
> Hi Evan, thanks for your feedback, much appreciated!
> What I've done in other cases like this was to have a helper checking
> the spinlock in the panic notifier - if we can acquire that, go ahead
> but if not, bail out. For a proper example of an implementation, check
> patch 13 of the series:
> https://lore.kernel.org/lkml/20220427224924.592546-14-gpicc...@igalia.com/ .
> 
> Do you agree with that, or prefer really a parameter in
> gsmi_shutdown_reason() ? I'll follow your choice =)

I see two more alternative solutions:

1st variant is a trick already used in console write() callbacks.
They do trylock() when oops_in_progress is set. They remember
the result to prevent double unlock when printing Oops messages and
the system will try to continue working. For example:

pl011_console_write(struct console *co, const char *s, unsigned int count)
{
[...]
int locked = 1;
[...]
if (uap->port.sysrq)
locked = 0;
else if (oops_in_progress)
locked = spin_trylock(&uap->port.lock);
else
spin_lock(&uap->port.lock);

[...]

if (locked)
spin_unlock(&uap->port.lock);
}


2nd variant is to check panic_cpu variable. It is used in printk.c.
We might move the function to panic.h:

static bool panic_in_progress(void)
{
return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID);
}

and then do:

if (panic_in_progress()) {
...


> > Though thinking more about it, is this really a Good Change (TM)? The
> > spinlock itself already disables interrupts, meaning the only case
> > where this change makes a difference is if the panic happens from
> > within the function that grabbed the spinlock (in which case the
> > callback is also likely to panic), or in an NMI that panics within
> > that window.

As already mentioned in the other reply, panic() sometimes stops
the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().

Another situation is when the CPU using the lock ends in some
infinite loop because something went wrong. The system is in
an unpredictable state during panic().

I am not sure if this is possible with the code under gsmi_dev.lock
but such things really happen during panic() in other subsystems.
Using trylock in the panic() code path is a good practice.

Best Regards,
Petr



Re: [Xen-devel] [PATCH v2 1/1] treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively

2019-04-09 Thread Petr Mladek
On Wed 2019-04-03 14:28:14, Sakari Ailus wrote:
> Ping.
> 
> On Tue, Mar 26, 2019 at 02:35:10PM +0100, Petr Mladek wrote:
> > Linus,
> > 
> > On Mon 2019-03-25 21:32:28, Sakari Ailus wrote:
> > > %pF and %pf are functionally equivalent to %pS and %ps conversion
> > > specifiers. The former are deprecated, therefore switch the current users
> > > to use the preferred variant.
> > > 
> > > The changes have been produced by the following command:
> > > 
> > >   git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
> > >   while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
> > > 
> > > And verifying the result.
> > 
> > I guess that the best timing for such tree-wide clean up is the end
> > of the merge window. Should we wait for 5.2 or is it still acceptable
> > to push this for 5.1-rc3?
> 
> The patch still cleanly applies to linux-next as wells as Linus's tree.
> Some %pf bits have appeared and fixed since (include/trace/events/timer.h);
> the fix is in linux-next so once that and this patch are merged, there are
> no remaining %pf (or %pF) users left.

I have pushed the patch into printk.git, branch for-5.2-pf-removal.
It is v2 without the include/trace/events/timer.h stuff.

Best Regards,
Petr

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/1] treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively

2019-03-26 Thread Petr Mladek
Linus,

On Mon 2019-03-25 21:32:28, Sakari Ailus wrote:
> %pF and %pf are functionally equivalent to %pS and %ps conversion
> specifiers. The former are deprecated, therefore switch the current users
> to use the preferred variant.
> 
> The changes have been produced by the following command:
> 
>   git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
>   while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
> 
> And verifying the result.

I guess that the best timing for such tree-wide clean up is the end
of the merge window. Should we wait for 5.2 or is it still acceptable
to push this for 5.1-rc3?

The %pf/%pF modifiers are deprecated since the commit
04b8eb7a4ccd9ef93 ("symbol lookup: introduce
dereference_symbol_descriptor()") in v4.16-rc1.

Note that include/trace/events/timer.h is handled separately
in linux-next by
https://lkml.kernel.org/r/20190321120921.16463-4-anna-ma...@linutronix.de

Or we could use v1 of this patch, see
https://lkml.kernel.org/r/20190322132108.25501-2-sakari.ai...@linux.intel.com

> Signed-off-by: Sakari Ailus 
> Acked-by: David Sterba  (for btrfs)
> Acked-by: Mike Rapoport  (for mm/memblock.c)
> Acked-by: Rafael J. Wysocki 

Reviewed-by: Petr Mladek 

Best Regards,
Petr

> ---
> I split this off from the set as there's a change in
> include/trace/events/timer.h that conflicts with v1 of this patch and
> should the second patch to be applied without that change, results into
> invalid use of %pf. To address the matter safely without conflicts or
> trying to print invalid pointer conversions, this patch and the other one
> changing include/trace/events/timer.h must be merged before second patch
> in v1 of this set can go in.
> 
> since v1:
> 
> - Drop such changes to include/trace/events/timer.h where %pf has already
>   been converted to %ps in linux-next master.
> 
>  arch/alpha/kernel/pci_iommu.c   | 20 ++--
>  arch/arm/mach-imx/pm-imx6.c |  2 +-
>  arch/arm/mm/alignment.c |  2 +-
>  arch/arm/nwfpe/fpmodule.c   |  2 +-
>  arch/microblaze/mm/pgtable.c|  2 +-
>  arch/sparc/kernel/ds.c  |  2 +-
>  arch/um/kernel/sysrq.c  |  2 +-
>  arch/x86/include/asm/trace/exceptions.h |  2 +-
>  arch/x86/kernel/irq_64.c|  2 +-
>  arch/x86/mm/extable.c   |  4 ++--
>  arch/x86/xen/multicalls.c   |  2 +-
>  drivers/acpi/device_pm.c|  2 +-
>  drivers/base/power/main.c   |  6 +++---
>  drivers/base/syscore.c  | 12 ++--
>  drivers/block/drbd/drbd_receiver.c  |  2 +-
>  drivers/block/floppy.c  | 10 +-
>  drivers/cpufreq/cpufreq.c   |  2 +-
>  drivers/mmc/core/quirks.h   |  2 +-
>  drivers/nvdimm/bus.c|  2 +-
>  drivers/nvdimm/dimm_devs.c  |  2 +-
>  drivers/pci/pci-driver.c| 14 +++---
>  drivers/pci/quirks.c|  4 ++--
>  drivers/pnp/quirks.c|  2 +-
>  drivers/scsi/esp_scsi.c |  2 +-
>  fs/btrfs/tests/free-space-tree-tests.c  |  4 ++--
>  fs/f2fs/f2fs.h  |  2 +-
>  fs/pstore/inode.c   |  2 +-
>  include/trace/events/btrfs.h|  2 +-
>  include/trace/events/cpuhp.h|  4 ++--
>  include/trace/events/preemptirq.h   |  2 +-
>  include/trace/events/rcu.h  |  4 ++--
>  include/trace/events/sunrpc.h   |  2 +-
>  include/trace/events/vmscan.h   |  4 ++--
>  include/trace/events/workqueue.h|  4 ++--
>  include/trace/events/xen.h  |  2 +-
>  init/main.c |  6 +++---
>  kernel/async.c  |  4 ++--
>  kernel/events/uprobes.c |  2 +-
>  kernel/fail_function.c  |  2 +-
>  kernel/irq/debugfs.c|  2 +-
>  kernel/irq/handle.c |  2 +-
>  kernel/irq/manage.c |  2 +-
>  kernel/irq/spurious.c   |  4 ++--
>  kernel/rcu/tree.c   |  2 +-
>  kernel/stop_machine.c   |  2 +-
>  kernel/time/sched_clock.c   |  2 +-
>  kernel/time/timer.c |  2 +-
>  kernel/workqueue.c  | 12 ++--
>  lib/error-inject.c  |  2 +-
>  lib/percpu-refcount.c   |  4 ++--
>  mm/memblock.c   | 12 ++--
>  mm/memory.c |  2 +-
>  mm/vmscan.c |  2 +-
>  net/ceph/osd_client.c  

Re: [Xen-devel] [PATCH v2 21/21] memblock: drop memblock_alloc_*_nopanic() variants

2019-01-30 Thread Petr Mladek
On Mon 2019-01-21 10:04:08, Mike Rapoport wrote:
> As all the memblock allocation functions return NULL in case of error
> rather than panic(), the duplicates with _nopanic suffix can be removed.
> 
> Signed-off-by: Mike Rapoport 
> Acked-by: Greg Kroah-Hartman 
> ---
>  arch/arc/kernel/unwind.c   |  3 +--
>  arch/sh/mm/init.c  |  2 +-
>  arch/x86/kernel/setup_percpu.c | 10 +-
>  arch/x86/mm/kasan_init_64.c| 14 --
>  drivers/firmware/memmap.c  |  2 +-
>  drivers/usb/early/xhci-dbc.c   |  2 +-
>  include/linux/memblock.h   | 35 ---
>  kernel/dma/swiotlb.c   |  2 +-
>  kernel/printk/printk.c |  9 +----

For printk:
Reviewed-by: Petr Mladek 
Acked-by: Petr Mladek 

Best Regards,
Petr

>  mm/memblock.c  | 35 ---
>  mm/page_alloc.c| 10 +-
>  mm/page_ext.c  |  2 +-
>  mm/percpu.c| 11 ---
>  mm/sparse.c|  6 ++
>  14 files changed, 31 insertions(+), 112 deletions(-)
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 21/21] memblock: drop memblock_alloc_*_nopanic() variants

2019-01-17 Thread Petr Mladek
On Wed 2019-01-16 15:44:21, Mike Rapoport wrote:
> As all the memblock allocation functions return NULL in case of error
> rather than panic(), the duplicates with _nopanic suffix can be removed.

[...]

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c4f0a41..ae65221 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1147,17 +1147,14 @@ void __init setup_log_buf(int early)
>   if (!new_log_buf_len)
>   return;
>  
> - if (early) {
> - new_log_buf =
> - memblock_alloc(new_log_buf_len, LOG_ALIGN);
> - } else {
> - new_log_buf = memblock_alloc_nopanic(new_log_buf_len,
> -   LOG_ALIGN);
> - }
> -
> + new_log_buf = memblock_alloc(new_log_buf_len, LOG_ALIGN);

The above change is enough.

>   if (unlikely(!new_log_buf)) {
> - pr_err("log_buf_len: %lu bytes not available\n",
> - new_log_buf_len);
> + if (early)
> + panic("log_buf_len: %lu bytes not available\n",
> + new_log_buf_len);

panic() is not needed here. printk() will just continue using
the (smaller) static buffer.

> + else
> + pr_err("log_buf_len: %lu bytes not available\n",
> +new_log_buf_len);
>   return;
>   }

Best Regards,
Petr

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 21/27] x86/ftrace: Adapt function tracing for PIE support

2018-05-24 Thread Petr Mladek
On Wed 2018-05-23 12:54:15, Thomas Garnier wrote:
> When using -fPIE/PIC with function tracing, the compiler generates a
> call through the GOT (call *__fentry__@GOTPCREL). This instruction
> takes 6 bytes instead of 5 on the usual relative call.
> 
> If PIE is enabled, replace the 6th byte of the GOT call by a 1-byte nop
> so ftrace can handle the previous 5-bytes as before.
> 
> Position Independent Executable (PIE) support will allow to extended the
> KASLR randomization range below the -2G memory limit.
> 
> Signed-off-by: Thomas Garnier 
> ---
>  arch/x86/include/asm/ftrace.h   |  6 +++--
>  arch/x86/include/asm/sections.h |  4 
>  arch/x86/kernel/ftrace.c| 42 +++--
>  3 files changed, 48 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index c18ed65287d5..8f2decce38d8 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -25,9 +25,11 @@ extern void __fentry__(void);
>  static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  {
>   /*
> -  * addr is the address of the mcount call instruction.
> -  * recordmcount does the necessary offset calculation.
> +  * addr is the address of the mcount call instruction. PIE has always a
> +  * byte added to the start of the function.
>*/
> + if (IS_ENABLED(CONFIG_X86_PIE))
> + addr -= 1;

This seems to modify the address even for modules that are _not_ compiled with
PIE, see below.

>   return addr;
>  }
>  
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 01ebcb6f263e..73b3c30cb7a3 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip, unsigned 
> const char *old_code,
>   return 0;
>  }
>  
> +/* Bytes before call GOT offset */
> +const unsigned char got_call_preinsn[] = { 0xff, 0x15 };
> +
> +static int
> +ftrace_modify_initial_code(unsigned long ip, unsigned const char *old_code,
> +unsigned const char *new_code)
> +{
> + unsigned char replaced[MCOUNT_INSN_SIZE + 1];
> +
> + ftrace_expected = old_code;
> +
> + /*
> +  * If PIE is not enabled or no GOT call was found, default to the
> +  * original approach to code modification.
> +  */
> + if (!IS_ENABLED(CONFIG_X86_PIE) ||
> + probe_kernel_read(replaced, (void *)ip, sizeof(replaced)) ||
> + memcmp(replaced, got_call_preinsn, sizeof(got_call_preinsn)))
> + return ftrace_modify_code_direct(ip, old_code, new_code);

And this looks like an attempt to handle modules compiled without
PIE. Does it works with the right ip in that case?

I wonder if a better solution would be to update
scripts/recordmcount.c to store the incremented location into the module.

IMPORTANT: I have only vague picture about how this all works. It is
possible that I am completely wrong. The code might be correct,
especially if you tested this situation.

Best Regards,
Petr

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel