Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-05 Thread Giancarlo Ferrari
On Fri, Feb 05, 2021 at 09:44:59AM +, Russell King - ARM Linux admin wrote:
> On Fri, Feb 05, 2021 at 12:40:54AM +, Giancarlo Ferrari wrote:
> > Russell,
> > 
> > On Fri, Feb 05, 2021 at 12:18:33AM +, Russell King - ARM Linux admin 
> > wrote:
> > > On Thu, Feb 04, 2021 at 11:48:42PM +, Giancarlo Ferrari wrote:
> > > > Can I ask about having it integrated ?
> > > 
> > > Thanks for testing. Are you willing for me to add:
> > > 
> > > Tested-by: Giancarlo Ferrari 
> > > 
> > > to the commit log?
> > > 
> > 
> > Sure.
> > 
> > I have a question regarding the patch. I saw that the structure start at
> > the end of the relocation code:
> > 
> > data = reboot_code_buffer + relocate_new_kernel_size;
> > 
> > which means it overlap with the global symbol relocate_new_kernel_size.
> > I think is minor comment as the variable is only used in the fncpy()
> > then thrown away.
> 
> The same is true of the rest of the kernel image if that's how you'd
> like to look at it.
> 
> relocate_new_kernel_size is just there to tell the C code the size of
> the function, nothing more nothing less. It isn't there to be copied.
> The copied code doesn't use it.
> 

Yes, clear.

> > Something like:
> > 
> > data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long);
> 
> No. That will place the structure after the size variable, which we
> don't want, and...
> 
> > and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S):
> > 
> > adr r7, relocate_new_kernel_end
> 
> ... we will then need to follow this with:
>   add r7, r7, #4
> 
> or replace it with:
>   adr r7, relocate_new_kernel_end + 4
> 
> so that r7 points at "data".
>

It was what I meant with "accordingly".

> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Regards,


GF


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-05 Thread Russell King - ARM Linux admin
On Fri, Feb 05, 2021 at 12:40:54AM +, Giancarlo Ferrari wrote:
> Russell,
> 
> On Fri, Feb 05, 2021 at 12:18:33AM +, Russell King - ARM Linux admin 
> wrote:
> > On Thu, Feb 04, 2021 at 11:48:42PM +, Giancarlo Ferrari wrote:
> > > Can I ask about having it integrated ?
> > 
> > Thanks for testing. Are you willing for me to add:
> > 
> > Tested-by: Giancarlo Ferrari 
> > 
> > to the commit log?
> > 
> 
> Sure.
> 
> I have a question regarding the patch. I saw that the structure start at
> the end of the relocation code:
> 
> data = reboot_code_buffer + relocate_new_kernel_size;
> 
> which means it overlap with the global symbol relocate_new_kernel_size.
> I think is minor comment as the variable is only used in the fncpy()
> then thrown away.

The same is true of the rest of the kernel image if that's how you'd
like to look at it.

relocate_new_kernel_size is just there to tell the C code the size of
the function, nothing more nothing less. It isn't there to be copied.
The copied code doesn't use it.

> Something like:
> 
> data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long);

No. That will place the structure after the size variable, which we
don't want, and...

> and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S):
> 
> adr   r7, relocate_new_kernel_end

... we will then need to follow this with:
add r7, r7, #4

or replace it with:
adr r7, relocate_new_kernel_end + 4

so that r7 points at "data".

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-04 Thread Giancarlo Ferrari
Sorry for polluting,

On Fri, Feb 05, 2021 at 12:40:54AM +, Giancarlo Ferrari wrote:
> Russell,
> 
> On Fri, Feb 05, 2021 at 12:18:33AM +, Russell King - ARM Linux admin 
> wrote:
> > On Thu, Feb 04, 2021 at 11:48:42PM +, Giancarlo Ferrari wrote:
> > > Can I ask about having it integrated ?
> > 
> > Thanks for testing. Are you willing for me to add:
> > 
> > Tested-by: Giancarlo Ferrari 
> > 
> > to the commit log?
> > 
> 
> Sure.
> 
> I have a question regarding the patch. I saw that the structure start at
> the end of the relocation code:
> 
> data = reboot_code_buffer + relocate_new_kernel_size;
> 
> which means it overlap with the global symbol relocate_new_kernel_size.
> I think is minor comment as the variable is only used in the fncpy()
> then thrown away. Something like:
> 
> data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long);
> 
> and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S):
> 
> adr   r7, relocate_new_kernel_end
> 

my fault. Do as I didn't talk... one is in virtual memory the other in the
initial code.

> > I can move it into the fixes branch which I want to send to Linus by
> > Saturday at the very latest.
> > 
> 
> I am not used with all the merging process. However it sounds great !
> 
> > -- 
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> 
> Thanks in advance,
> 
> 
> GF

Thanks,


GF


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-04 Thread Giancarlo Ferrari
Russell,

On Fri, Feb 05, 2021 at 12:18:33AM +, Russell King - ARM Linux admin wrote:
> On Thu, Feb 04, 2021 at 11:48:42PM +, Giancarlo Ferrari wrote:
> > Can I ask about having it integrated ?
> 
> Thanks for testing. Are you willing for me to add:
> 
> Tested-by: Giancarlo Ferrari 
> 
> to the commit log?
> 

Sure.

I have a question regarding the patch. I saw that the structure start at
the end of the relocation code:

data = reboot_code_buffer + relocate_new_kernel_size;

which means it overlap with the global symbol relocate_new_kernel_size.
I think is minor comment as the variable is only used in the fncpy()
then thrown away. Something like:

data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long);

and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S):

adr r7, relocate_new_kernel_end

> I can move it into the fixes branch which I want to send to Linus by
> Saturday at the very latest.
> 

I am not used with all the merging process. However it sounds great !

> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Thanks in advance,


GF


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-04 Thread Russell King - ARM Linux admin
On Thu, Feb 04, 2021 at 11:48:42PM +, Giancarlo Ferrari wrote:
> Can I ask about having it integrated ?

Thanks for testing. Are you willing for me to add:

Tested-by: Giancarlo Ferrari 

to the commit log?

I can move it into the fixes branch which I want to send to Linus by
Saturday at the very latest.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-04 Thread Giancarlo Ferrari
Hi all,

On Mon, Feb 01, 2021 at 10:18:26PM +, Giancarlo Ferrari wrote:
> Russell,
> 
> On Mon, Feb 01, 2021 at 08:16:33PM +, Russell King - ARM Linux admin 
> wrote:
> > On Mon, Feb 01, 2021 at 08:07:37PM +, Giancarlo Ferrari wrote:
> > > Hi,
> > 
> > Hi,
> > 
> > > Why we should align 3 ? For the fncpy I suppose.
> > 
> > Slightly arbitary really - it gives a nice 8-byte alignment to the data.
> > .align 2 would also be sufficient.
> > 
> > > I don't know now how to proceed now, as you (Mark and you) do completely
> > > the patch.
> > 
> > Please can you test my patch and let us know if it solves your problem
> > (or even if it works! I haven't tested it beyond build-testing.)
> >

I have tested your patch and it works. I have tested using kexec -l/kexec -e.
Considering that .text var are no more written, I assume the issue is gone.

> 
> sure, unfortunately due to restriction, I hope to test it by the end of the
> week. I hope there will be no rush. Otherwise, please let me know.
> 
> > > You see is my first kernel patch submission :) .
> > 
> > Yay. Sorry for giving you a different patch - Mark is quite right that
> > there's a better solution to this problem, which is eliminating the
> > set_kernel_text_rw() call. The only reason I cooked up the patch was
> > doing that would be more in-depth (as you can see from the increased
> > size of the patch.)
> > 
> 
> I definitely agree with you.
> 
> > -- 
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> 
> Thanks again,
> 
> 
> GF

Can I ask about having it integrated ?

Thanks in advance,


GF


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Giancarlo Ferrari
Russell,

On Mon, Feb 01, 2021 at 08:16:33PM +, Russell King - ARM Linux admin wrote:
> On Mon, Feb 01, 2021 at 08:07:37PM +, Giancarlo Ferrari wrote:
> > Hi,
> 
> Hi,
> 
> > Why we should align 3 ? For the fncpy I suppose.
> 
> Slightly arbitary really - it gives a nice 8-byte alignment to the data.
> .align 2 would also be sufficient.
> 
> > I don't know now how to proceed now, as you (Mark and you) do completely
> > the patch.
> 
> Please can you test my patch and let us know if it solves your problem
> (or even if it works! I haven't tested it beyond build-testing.)
>

sure, unfortunately due to restriction, I hope to test it by the end of the
week. I hope there will be no rush. Otherwise, please let me know.

> > You see is my first kernel patch submission :) .
> 
> Yay. Sorry for giving you a different patch - Mark is quite right that
> there's a better solution to this problem, which is eliminating the
> set_kernel_text_rw() call. The only reason I cooked up the patch was
> doing that would be more in-depth (as you can see from the increased
> size of the patch.)
> 

I definitely agree with you.

> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Thanks again,


GF


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Russell King - ARM Linux admin
On Mon, Feb 01, 2021 at 08:07:37PM +, Giancarlo Ferrari wrote:
> Hi,

Hi,

> Why we should align 3 ? For the fncpy I suppose.

Slightly arbitary really - it gives a nice 8-byte alignment to the data.
.align 2 would also be sufficient.

> I don't know now how to proceed now, as you (Mark and you) do completely
> the patch.

Please can you test my patch and let us know if it solves your problem
(or even if it works! I haven't tested it beyond build-testing.)

> You see is my first kernel patch submission :) .

Yay. Sorry for giving you a different patch - Mark is quite right that
there's a better solution to this problem, which is eliminating the
set_kernel_text_rw() call. The only reason I cooked up the patch was
doing that would be more in-depth (as you can see from the increased
size of the patch.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Giancarlo Ferrari
Hi,

On Mon, Feb 01, 2021 at 04:08:38PM +, Russell King - ARM Linux admin wrote:
> On Mon, Feb 01, 2021 at 01:57:14PM +, Mark Rutland wrote:
> > We could simplify this slightly if we moved the kexec_& variables into a
> > struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region
> > reserved in the asm), then here we could do something like:
> > 
> > static struct kexec_vars *kexec_buffer_vars(void *buffer)
> > {
> > unsigned long code = ((unisigned long)relocate_new_kernel) & ~1;
> > unsigned long vars - (unsigned long)relocate_vars;
> > unsigned long offset = vars - code;
> > 
> > return buffer + offset;
> > }
> > 
> > ... and in machine_kexec() do:
> > 
> > struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer);
> > 
> > kv->start_address = image->start;
> > kv->indirection_page = page_list;
> > kv->mach_type = machine-arch_type;
> > kv->boot_atags = arch.kernel_r2;
> > 
> > ... if that looks any better to you?
> 
> Something like this?
> 
> diff --git a/arch/arm/include/asm/kexec-internal.h 
> b/arch/arm/include/asm/kexec-internal.h
> new file mode 100644
> index ..ecc2322db7aa
> --- /dev/null
> +++ b/arch/arm/include/asm/kexec-internal.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ARM_KEXEC_INTERNAL_H
> +#define _ARM_KEXEC_INTERNAL_H
> +
> +struct kexec_relocate_data {
> + unsigned long kexec_start_address;
> + unsigned long kexec_indirection_page;
> + unsigned long kexec_mach_type;
> + unsigned long kexec_r2;
> +};
> +
> +#endif
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index a1570c8bab25..be8050b0c3df 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -170,5 +171,9 @@ int main(void)
>DEFINE(MPU_RGN_PRBAR,  offsetof(struct mpu_rgn, prbar));
>DEFINE(MPU_RGN_PRLAR,  offsetof(struct mpu_rgn, prlar));
>  #endif
> +  DEFINE(KEXEC_START_ADDR,   offsetof(struct kexec_relocate_data, 
> kexec_start_address));
> +  DEFINE(KEXEC_INDIR_PAGE,   offsetof(struct kexec_relocate_data, 
> kexec_indirection_page));
> +  DEFINE(KEXEC_MACH_TYPE,offsetof(struct kexec_relocate_data, 
> kexec_mach_type));
> +  DEFINE(KEXEC_R2,   offsetof(struct kexec_relocate_data, kexec_r2));
>return 0; 
>  }
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index 5d84ad333f05..2b09dad7935e 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -22,11 +23,6 @@
>  extern void relocate_new_kernel(void);
>  extern const unsigned int relocate_new_kernel_size;
>  
> -extern unsigned long kexec_start_address;
> -extern unsigned long kexec_indirection_page;
> -extern unsigned long kexec_mach_type;
> -extern unsigned long kexec_boot_atags;
> -
>  static atomic_t waiting_for_crash_ipi;
>  
>  /*
> @@ -159,6 +155,7 @@ void (*kexec_reinit)(void);
>  void machine_kexec(struct kimage *image)
>  {
>   unsigned long page_list, reboot_entry_phys;
> + struct kexec_relocate_data *data;
>   void (*reboot_entry)(void);
>   void *reboot_code_buffer;
>  
> @@ -174,18 +171,17 @@ void machine_kexec(struct kimage *image)
>  
>   reboot_code_buffer = page_address(image->control_code_page);
>  
> - /* Prepare parameters for reboot_code_buffer*/
> - set_kernel_text_rw();
> - kexec_start_address = image->start;
> - kexec_indirection_page = page_list;
> - kexec_mach_type = machine_arch_type;
> - kexec_boot_atags = image->arch.kernel_r2;
> -
>   /* copy our kernel relocation code to the control code page */
>   reboot_entry = fncpy(reboot_code_buffer,
>_new_kernel,
>relocate_new_kernel_size);
>  
> + data = reboot_code_buffer + relocate_new_kernel_size;
> + data->kexec_start_address = image->start;
> + data->kexec_indirection_page = page_list;
> + data->kexec_mach_type = machine_arch_type;
> + data->kexec_r2 = image->arch.kernel_r2;
> +
>   /* get the identity mapping physical address for the reboot code */
>   reboot_entry_phys = virt_to_idmap(reboot_entry);
>  
> diff --git a/arch/arm/kernel/relocate_kernel.S 
> b/arch/arm/kernel/relocate_kernel.S
> index 72a08786e16e..218d524360fc 100644
> --- a/arch/arm/kernel/relocate_kernel.S
> +++ b/arch/arm/kernel/relocate_kernel.S
> @@ -5,14 +5,16 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>   .align  3   /* not needed for this code, but keeps fncpy() happy */
>  
>  ENTRY(relocate_new_kernel)
>  
> - ldr r0,kexec_indirection_page
> - ldr r1,kexec_start_address
> + adr r7, relocate_new_kernel_end
> + ldr r0, [r7, 

Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Giancarlo Ferrari
Hi,

On Mon, Feb 01, 2021 at 03:30:12PM +, Mark Rutland wrote:
> Hi,
> 
> On Mon, Feb 01, 2021 at 02:39:46PM +, Giancarlo Ferrari wrote:
> > On Mon, Feb 01, 2021 at 12:47:20PM +, Mark Rutland wrote:
> > > On Mon, Feb 01, 2021 at 12:44:56AM +, Giancarlo Ferrari wrote:
> > > > machine_kexec() need to set rw permission in text and rodata sections
> > > > to assign some variables (e.g. kexec_start_address). To do that at
> > > > the end (after flushing pdm in memory, etc.) it needs to invalidate
> > > > TLB [section] entries.
> > > 
> > > It'd be worth noting explicitly that set_kernel_text_rw() alters
> > > current->active_mm...
> > > 
> > > > If during the TLB invalidation an interrupt occours, which might cause
> > > > a context switch, there is the risk to inject invalid TLBs, with ro
> > > > permissions.
> > > 
> > > ... which is why if there's a context switch things can go wrong, since
> > > active_mm isn't stable, and so it's possible that set_kernel_text_rw()
> > > updates multiple tables, none of which might be the active table at the
> > > point we try to make an access.
> > 
> > Maybe the behaviour causing issue is not completely clear to me, and I do
> > apologize for that (moreover I haven't eougth debug capabilities).
> 
> I think we're in rough agreement that the issue is likely related to the
> context switch, but our understanding of the specifics differs (and I
> think we're missing a detail here).
> 
> > However, current-active_mm is switched among context switches. Correct ?
> 
> In some cases current->active_mm is not switched, and can be inherited
> over a context switch. When switching to a user task, we always switch
> to its mm (which becomes the active_mm), but when switching to a kthread
> we retain the previous task's mm as the active_mm as part of the lazy
> context switch.
> 
> So while a kthread is preemptible, its active_mm (and active ASID) can
> change under its feet. That could happen anywhere while the task is
> preemptible, e.g. in the middle of set_kernel_text_rw(), or
> mid-modification to the kexec variables.
> 

Yes.

In my understanding, even in the case of user process, when current->active_mm 
is switched,
we can run into trouble. For instance:

- Process A issue kexec (PageTables entry of A has 0x8000_-0x8010_ with 
ro
  permission and section is global, no NG bit set).

- A context switch happens in the middle of set_kernel_text_rw(), right after 
the
  section 0x8000_-0x8010_ has been invalidated.

- Process B, in its execution, re-inject its own PageTable entry with ro 
permission, which
  is not shared with Process A (and is not touched by the previous 
invalidation) in the MMU
  cache.

- When Process A, is rescheduled, it carries on with the invalidation, but 
unfortunately I have
  "wrong" entries in the MMU.

> > So, in principle, the invalidation, if stopped, is carried on where it
> > left.
> 
> That's true so long as all the processes we switch between share the
> same leaf tables for the region we're altering. If not, then the lazy
> context switch means that those tables can change under our feet.
> 
> I believe the tables mapping the kernel text are shared by all threads,
> and if so this _should_ be true. Russell might be able to confirm that
> or correct me if I have that wrong.
> 

I am not ready to put my hand on the fire, but I seen the behaviour described 
above.

> > I thought the issue was that the PageTable entry for the section 0x8010_
> > is global, thus not indexed by ASID (Address Space ID). By the fact that 
> > each
> > process has its own version of that entry, is the cause of the issue, as the
> > schedule process might bringing a spurious entry (with ro permission) in the
> > MMU cache.
> 
> The TLB invalidation performed under set_kernel_text_rw() affects all
> ASIDs on the current CPU, so there shouldn't be any stale RO TLB entries
> to hit unless the kthread is migrated to another CPU.
> 
> > If the entry is not global holds the ASID, and the issue cannot happen.
> 
> I don't think that's true, since switching to a different active_mm
> would also change ASID, and we'd have no additional guarantee.
> 

I agree, my fault.

> Thanks,
> Mark.

Thanks,


GF


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Russell King - ARM Linux admin
On Mon, Feb 01, 2021 at 04:32:40PM +, Mark Rutland wrote:
> I reckon here we need:
> 
>   __cpuc_flush_dcache_area(reboot_code_buffer,
>relocate_new_kernel_size + sizeof(*data));
> 
> ... to make sure both the instructions and data are visible with the MMU
> off (since fncpy() only cleans to the PoU, not the PoC).

We don't. When soft_restart() turns the MMU off, and it calls
flush_cache_all() before and afterwards to ensure that all dirty lines
are pushed out.

The flushing to PoU in fncpy() is to cover other cases.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Mark Rutland
On Mon, Feb 01, 2021 at 04:08:38PM +, Russell King - ARM Linux admin wrote:
> On Mon, Feb 01, 2021 at 01:57:14PM +, Mark Rutland wrote:
> > We could simplify this slightly if we moved the kexec_& variables into a
> > struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region
> > reserved in the asm), then here we could do something like:
> > 
> > static struct kexec_vars *kexec_buffer_vars(void *buffer)
> > {
> > unsigned long code = ((unisigned long)relocate_new_kernel) & ~1;
> > unsigned long vars - (unsigned long)relocate_vars;
> > unsigned long offset = vars - code;
> > 
> > return buffer + offset;
> > }
> > 
> > ... and in machine_kexec() do:
> > 
> > struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer);
> > 
> > kv->start_address = image->start;
> > kv->indirection_page = page_list;
> > kv->mach_type = machine-arch_type;
> > kv->boot_atags = arch.kernel_r2;
> > 
> > ... if that looks any better to you?
> 
> Something like this?

Nice!

That looks about right to me, modulo a bit of cache maintenance noted
below.

> diff --git a/arch/arm/include/asm/kexec-internal.h 
> b/arch/arm/include/asm/kexec-internal.h
> new file mode 100644
> index ..ecc2322db7aa
> --- /dev/null
> +++ b/arch/arm/include/asm/kexec-internal.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ARM_KEXEC_INTERNAL_H
> +#define _ARM_KEXEC_INTERNAL_H
> +
> +struct kexec_relocate_data {
> + unsigned long kexec_start_address;
> + unsigned long kexec_indirection_page;
> + unsigned long kexec_mach_type;
> + unsigned long kexec_r2;
> +};
> +
> +#endif
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index a1570c8bab25..be8050b0c3df 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -170,5 +171,9 @@ int main(void)
>DEFINE(MPU_RGN_PRBAR,  offsetof(struct mpu_rgn, prbar));
>DEFINE(MPU_RGN_PRLAR,  offsetof(struct mpu_rgn, prlar));
>  #endif
> +  DEFINE(KEXEC_START_ADDR,   offsetof(struct kexec_relocate_data, 
> kexec_start_address));
> +  DEFINE(KEXEC_INDIR_PAGE,   offsetof(struct kexec_relocate_data, 
> kexec_indirection_page));
> +  DEFINE(KEXEC_MACH_TYPE,offsetof(struct kexec_relocate_data, 
> kexec_mach_type));
> +  DEFINE(KEXEC_R2,   offsetof(struct kexec_relocate_data, kexec_r2));
>return 0; 
>  }
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index 5d84ad333f05..2b09dad7935e 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -22,11 +23,6 @@
>  extern void relocate_new_kernel(void);
>  extern const unsigned int relocate_new_kernel_size;
>  
> -extern unsigned long kexec_start_address;
> -extern unsigned long kexec_indirection_page;
> -extern unsigned long kexec_mach_type;
> -extern unsigned long kexec_boot_atags;
> -
>  static atomic_t waiting_for_crash_ipi;
>  
>  /*
> @@ -159,6 +155,7 @@ void (*kexec_reinit)(void);
>  void machine_kexec(struct kimage *image)
>  {
>   unsigned long page_list, reboot_entry_phys;
> + struct kexec_relocate_data *data;
>   void (*reboot_entry)(void);
>   void *reboot_code_buffer;
>  
> @@ -174,18 +171,17 @@ void machine_kexec(struct kimage *image)
>  
>   reboot_code_buffer = page_address(image->control_code_page);
>  
> - /* Prepare parameters for reboot_code_buffer*/
> - set_kernel_text_rw();
> - kexec_start_address = image->start;
> - kexec_indirection_page = page_list;
> - kexec_mach_type = machine_arch_type;
> - kexec_boot_atags = image->arch.kernel_r2;
> -
>   /* copy our kernel relocation code to the control code page */
>   reboot_entry = fncpy(reboot_code_buffer,
>_new_kernel,
>relocate_new_kernel_size);
>  
> + data = reboot_code_buffer + relocate_new_kernel_size;
> + data->kexec_start_address = image->start;
> + data->kexec_indirection_page = page_list;
> + data->kexec_mach_type = machine_arch_type;
> + data->kexec_r2 = image->arch.kernel_r2;

I reckon here we need:

__cpuc_flush_dcache_area(reboot_code_buffer,
 relocate_new_kernel_size + sizeof(*data));

... to make sure both the instructions and data are visible with the MMU
off (since fncpy() only cleans to the PoU, not the PoC).

Otherwise this all looks good to me.

Mark.

> +
>   /* get the identity mapping physical address for the reboot code */
>   reboot_entry_phys = virt_to_idmap(reboot_entry);
>  
> diff --git a/arch/arm/kernel/relocate_kernel.S 
> b/arch/arm/kernel/relocate_kernel.S
> index 72a08786e16e..218d524360fc 100644
> --- 

Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Russell King - ARM Linux admin
On Mon, Feb 01, 2021 at 01:57:14PM +, Mark Rutland wrote:
> We could simplify this slightly if we moved the kexec_& variables into a
> struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region
> reserved in the asm), then here we could do something like:
> 
> static struct kexec_vars *kexec_buffer_vars(void *buffer)
> {
>   unsigned long code = ((unisigned long)relocate_new_kernel) & ~1;
>   unsigned long vars - (unsigned long)relocate_vars;
>   unsigned long offset = vars - code;
> 
>   return buffer + offset;
> }
> 
> ... and in machine_kexec() do:
> 
>   struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer);
> 
>   kv->start_address = image->start;
>   kv->indirection_page = page_list;
>   kv->mach_type = machine-arch_type;
>   kv->boot_atags = arch.kernel_r2;
> 
> ... if that looks any better to you?

Something like this?

diff --git a/arch/arm/include/asm/kexec-internal.h 
b/arch/arm/include/asm/kexec-internal.h
new file mode 100644
index ..ecc2322db7aa
--- /dev/null
+++ b/arch/arm/include/asm/kexec-internal.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ARM_KEXEC_INTERNAL_H
+#define _ARM_KEXEC_INTERNAL_H
+
+struct kexec_relocate_data {
+   unsigned long kexec_start_address;
+   unsigned long kexec_indirection_page;
+   unsigned long kexec_mach_type;
+   unsigned long kexec_r2;
+};
+
+#endif
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index a1570c8bab25..be8050b0c3df 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -170,5 +171,9 @@ int main(void)
   DEFINE(MPU_RGN_PRBAR,offsetof(struct mpu_rgn, prbar));
   DEFINE(MPU_RGN_PRLAR,offsetof(struct mpu_rgn, prlar));
 #endif
+  DEFINE(KEXEC_START_ADDR, offsetof(struct kexec_relocate_data, 
kexec_start_address));
+  DEFINE(KEXEC_INDIR_PAGE, offsetof(struct kexec_relocate_data, 
kexec_indirection_page));
+  DEFINE(KEXEC_MACH_TYPE,  offsetof(struct kexec_relocate_data, 
kexec_mach_type));
+  DEFINE(KEXEC_R2, offsetof(struct kexec_relocate_data, kexec_r2));
   return 0; 
 }
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 5d84ad333f05..2b09dad7935e 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -22,11 +23,6 @@
 extern void relocate_new_kernel(void);
 extern const unsigned int relocate_new_kernel_size;
 
-extern unsigned long kexec_start_address;
-extern unsigned long kexec_indirection_page;
-extern unsigned long kexec_mach_type;
-extern unsigned long kexec_boot_atags;
-
 static atomic_t waiting_for_crash_ipi;
 
 /*
@@ -159,6 +155,7 @@ void (*kexec_reinit)(void);
 void machine_kexec(struct kimage *image)
 {
unsigned long page_list, reboot_entry_phys;
+   struct kexec_relocate_data *data;
void (*reboot_entry)(void);
void *reboot_code_buffer;
 
@@ -174,18 +171,17 @@ void machine_kexec(struct kimage *image)
 
reboot_code_buffer = page_address(image->control_code_page);
 
-   /* Prepare parameters for reboot_code_buffer*/
-   set_kernel_text_rw();
-   kexec_start_address = image->start;
-   kexec_indirection_page = page_list;
-   kexec_mach_type = machine_arch_type;
-   kexec_boot_atags = image->arch.kernel_r2;
-
/* copy our kernel relocation code to the control code page */
reboot_entry = fncpy(reboot_code_buffer,
 _new_kernel,
 relocate_new_kernel_size);
 
+   data = reboot_code_buffer + relocate_new_kernel_size;
+   data->kexec_start_address = image->start;
+   data->kexec_indirection_page = page_list;
+   data->kexec_mach_type = machine_arch_type;
+   data->kexec_r2 = image->arch.kernel_r2;
+
/* get the identity mapping physical address for the reboot code */
reboot_entry_phys = virt_to_idmap(reboot_entry);
 
diff --git a/arch/arm/kernel/relocate_kernel.S 
b/arch/arm/kernel/relocate_kernel.S
index 72a08786e16e..218d524360fc 100644
--- a/arch/arm/kernel/relocate_kernel.S
+++ b/arch/arm/kernel/relocate_kernel.S
@@ -5,14 +5,16 @@
 
 #include 
 #include 
+#include 
 #include 
 
.align  3   /* not needed for this code, but keeps fncpy() happy */
 
 ENTRY(relocate_new_kernel)
 
-   ldr r0,kexec_indirection_page
-   ldr r1,kexec_start_address
+   adr r7, relocate_new_kernel_end
+   ldr r0, [r7, #KEXEC_INDIR_PAGE]
+   ldr r1, [r7, #KEXEC_START_ADDR]
 
/*
 * If there is no indirection page (we are doing crashdumps)
@@ -57,34 +59,16 @@ ENTRY(relocate_new_kernel)
 
 2:
/* Jump to relocated kernel */
-   mov lr,r1
-   mov r0,#0
-   ldr 

Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Mark Rutland
Hi,

On Mon, Feb 01, 2021 at 02:39:46PM +, Giancarlo Ferrari wrote:
> On Mon, Feb 01, 2021 at 12:47:20PM +, Mark Rutland wrote:
> > On Mon, Feb 01, 2021 at 12:44:56AM +, Giancarlo Ferrari wrote:
> > > machine_kexec() need to set rw permission in text and rodata sections
> > > to assign some variables (e.g. kexec_start_address). To do that at
> > > the end (after flushing pdm in memory, etc.) it needs to invalidate
> > > TLB [section] entries.
> > 
> > It'd be worth noting explicitly that set_kernel_text_rw() alters
> > current->active_mm...
> > 
> > > If during the TLB invalidation an interrupt occours, which might cause
> > > a context switch, there is the risk to inject invalid TLBs, with ro
> > > permissions.
> > 
> > ... which is why if there's a context switch things can go wrong, since
> > active_mm isn't stable, and so it's possible that set_kernel_text_rw()
> > updates multiple tables, none of which might be the active table at the
> > point we try to make an access.
> 
> Maybe the behaviour causing issue is not completely clear to me, and I do
> apologize for that (moreover I haven't eougth debug capabilities).

I think we're in rough agreement that the issue is likely related to the
context switch, but our understanding of the specifics differs (and I
think we're missing a detail here).

> However, current-active_mm is switched among context switches. Correct ?

In some cases current->active_mm is not switched, and can be inherited
over a context switch. When switching to a user task, we always switch
to its mm (which becomes the active_mm), but when switching to a kthread
we retain the previous task's mm as the active_mm as part of the lazy
context switch.

So while a kthread is preemptible, its active_mm (and active ASID) can
change under its feet. That could happen anywhere while the task is
preemptible, e.g. in the middle of set_kernel_text_rw(), or
mid-modification to the kexec variables.

> So, in principle, the invalidation, if stopped, is carried on where it
> left.

That's true so long as all the processes we switch between share the
same leaf tables for the region we're altering. If not, then the lazy
context switch means that those tables can change under our feet.

I believe the tables mapping the kernel text are shared by all threads,
and if so this _should_ be true. Russell might be able to confirm that
or correct me if I have that wrong.

> I thought the issue was that the PageTable entry for the section 0x8010_
> is global, thus not indexed by ASID (Address Space ID). By the fact that each
> process has its own version of that entry, is the cause of the issue, as the
> schedule process might bringing a spurious entry (with ro permission) in the
> MMU cache.

The TLB invalidation performed under set_kernel_text_rw() affects all
ASIDs on the current CPU, so there shouldn't be any stale RO TLB entries
to hit unless the kthread is migrated to another CPU.

> If the entry is not global holds the ASID, and the issue cannot happen.

I don't think that's true, since switching to a different active_mm
would also change ASID, and we'd have no additional guarantee.

Thanks,
Mark.


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Giancarlo Ferrari
Hi,

On Mon, Feb 01, 2021 at 12:47:20PM +, Mark Rutland wrote:
> On Mon, Feb 01, 2021 at 12:44:56AM +, Giancarlo Ferrari wrote:
> > machine_kexec() need to set rw permission in text and rodata sections
> > to assign some variables (e.g. kexec_start_address). To do that at
> > the end (after flushing pdm in memory, etc.) it needs to invalidate
> > TLB [section] entries.
> 
> It'd be worth noting explicitly that set_kernel_text_rw() alters
> current->active_mm...
> 
> > If during the TLB invalidation an interrupt occours, which might cause
> > a context switch, there is the risk to inject invalid TLBs, with ro
> > permissions.
> 
> ... which is why if there's a context switch things can go wrong, since
> active_mm isn't stable, and so it's possible that set_kernel_text_rw()
> updates multiple tables, none of which might be the active table at the
> point we try to make an access.
> 

Maybe the behaviour causing issue is not completely clear to me, and I do
apologize for that (moreover I haven't eougth debug capabilities).
However, current-active_mm is switched among context switches. Correct ?
So, in principle, the invalidation, if stopped, is carried on where it
left.

I thought the issue was that the PageTable entry for the section 0x8010_
is global, thus not indexed by ASID (Address Space ID). By the fact that each
process has its own version of that entry, is the cause of the issue, as the
schedule process might bringing a spurious entry (with ro permission) in the
MMU cache.

If the entry is not global holds the ASID, and the issue cannot happen.

Please note that this behaviour was tested on a armv7 arch board.

> It would be nice to spell that out rather than saying "invalid TLBs".
> 
> We could disable preemption to prevent that, which is possibly better
> than disabling interrupts.
> 
> Overall, it would be much better to avoid having to mess with the kernel
> page tables. So rather than going:
> 
> 1. mark kernel RW
> 2. alter variables in reloc code
> 3. copy reloc code into buffer
> 4. branch to buffer
> 
> ... we should be able to go:
> 
> 1. copy reloc code into buffer
> 2. alter variables in copy of reloc code
> 3. branch to buffer
> 
> ... which would avoid this class of problem too.
> 
> Thanks,
> Mark.

Thanks,


GF


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Mark Rutland
On Mon, Feb 01, 2021 at 01:03:45PM +, Russell King - ARM Linux admin wrote:
> On Mon, Feb 01, 2021 at 12:47:20PM +, Mark Rutland wrote:
> > 1. copy reloc code into buffer
> > 2. alter variables in copy of reloc code
> > 3. branch to buffer
> > 
> > ... which would avoid this class of problem too.
> 
> Yep, slightly messy to do though:
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index 5d84ad333f05..6058e0d3a40d 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -174,18 +174,27 @@ void machine_kexec(struct kimage *image)
>  
>   reboot_code_buffer = page_address(image->control_code_page);
>  
> - /* Prepare parameters for reboot_code_buffer*/
> - set_kernel_text_rw();
> - kexec_start_address = image->start;
> - kexec_indirection_page = page_list;
> - kexec_mach_type = machine_arch_type;
> - kexec_boot_atags = image->arch.kernel_r2;
> -
>   /* copy our kernel relocation code to the control code page */
>   reboot_entry = fncpy(reboot_code_buffer,
>_new_kernel,
>relocate_new_kernel_size);
>  
> +#define set(what, val) \
> + do { \
> + uintptr_t __funcp_address; \
> + int __offset; \
> + void *__ptr; \
> + asm("" : "=r" (__funcp_address) : "0" (_new_kernel)); \
> + __offset = (uintptr_t)&(what) - (__funcp_address & ~1); \
> + __ptr = reboot_code_buffer + __offset; \
> + *(__typeof__(&(what)))__ptr = val; \
> + } while (0)
> +
> + set(kexec_start_address, image->start);
> + set(kexec_indirection_page, page_list);
> + set(kexec_mach_type, machine_arch_type);
> + set(kexec_boot_atags, image->arch.kernel_r2);

We could simplify this slightly if we moved the kexec_& variables into a
struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region
reserved in the asm), then here we could do something like:

static struct kexec_vars *kexec_buffer_vars(void *buffer)
{
unsigned long code = ((unisigned long)relocate_new_kernel) & ~1;
unsigned long vars - (unsigned long)relocate_vars;
unsigned long offset = vars - code;

return buffer + offset;
}

... and in machine_kexec() do:

struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer);

kv->start_address = image->start;
kv->indirection_page = page_list;
kv->mach_type = machine-arch_type;
kv->boot_atags = arch.kernel_r2;

... if that looks any better to you?

Mark.


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Russell King - ARM Linux admin
On Mon, Feb 01, 2021 at 12:47:20PM +, Mark Rutland wrote:
> 1. copy reloc code into buffer
> 2. alter variables in copy of reloc code
> 3. branch to buffer
> 
> ... which would avoid this class of problem too.

Yep, slightly messy to do though:

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 5d84ad333f05..6058e0d3a40d 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -174,18 +174,27 @@ void machine_kexec(struct kimage *image)
 
reboot_code_buffer = page_address(image->control_code_page);
 
-   /* Prepare parameters for reboot_code_buffer*/
-   set_kernel_text_rw();
-   kexec_start_address = image->start;
-   kexec_indirection_page = page_list;
-   kexec_mach_type = machine_arch_type;
-   kexec_boot_atags = image->arch.kernel_r2;
-
/* copy our kernel relocation code to the control code page */
reboot_entry = fncpy(reboot_code_buffer,
 _new_kernel,
 relocate_new_kernel_size);
 
+#define set(what, val) \
+   do { \
+   uintptr_t __funcp_address; \
+   int __offset; \
+   void *__ptr; \
+   asm("" : "=r" (__funcp_address) : "0" (_new_kernel)); \
+   __offset = (uintptr_t)&(what) - (__funcp_address & ~1); \
+   __ptr = reboot_code_buffer + __offset; \
+   *(__typeof__(&(what)))__ptr = val; \
+   } while (0)
+
+   set(kexec_start_address, image->start);
+   set(kexec_indirection_page, page_list);
+   set(kexec_mach_type, machine_arch_type);
+   set(kexec_boot_atags, image->arch.kernel_r2);
+
/* get the identity mapping physical address for the reboot code */
reboot_entry_phys = virt_to_idmap(reboot_entry);
 
-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Mark Rutland
On Mon, Feb 01, 2021 at 12:44:56AM +, Giancarlo Ferrari wrote:
> machine_kexec() need to set rw permission in text and rodata sections
> to assign some variables (e.g. kexec_start_address). To do that at
> the end (after flushing pdm in memory, etc.) it needs to invalidate
> TLB [section] entries.

It'd be worth noting explicitly that set_kernel_text_rw() alters
current->active_mm...

> If during the TLB invalidation an interrupt occours, which might cause
> a context switch, there is the risk to inject invalid TLBs, with ro
> permissions.

... which is why if there's a context switch things can go wrong, since
active_mm isn't stable, and so it's possible that set_kernel_text_rw()
updates multiple tables, none of which might be the active table at the
point we try to make an access.

It would be nice to spell that out rather than saying "invalid TLBs".

We could disable preemption to prevent that, which is possibly better
than disabling interrupts.

Overall, it would be much better to avoid having to mess with the kernel
page tables. So rather than going:

1. mark kernel RW
2. alter variables in reloc code
3. copy reloc code into buffer
4. branch to buffer

... we should be able to go:

1. copy reloc code into buffer
2. alter variables in copy of reloc code
3. branch to buffer

... which would avoid this class of problem too.

Thanks,
Mark.

> When trying to assign .text labels, this lead to the following:
> 
>  Unable to handle kernel paging request at virtual address 80112f38
>  pgd = fd7ef03e
>  [80112f38] *pgd=0001141e(bad)
>  Internal error: Oops: 80d [#1] PREEMPT SMP ARM
>  ...
> 
> Signed-off-by: Giancarlo Ferrari 
> ---
>  arch/arm/kernel/machine_kexec.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index 5d84ad3..23e8816 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -174,6 +174,13 @@ void machine_kexec(struct kimage *image)
>  
>   reboot_code_buffer = page_address(image->control_code_page);
>  
> + /*
> +  * If below part is not atomic TLB entries might be corrupted after TLB
> +  * invalidation, which leads to Data Abort in .text variable assignment
> +  */
> + raw_local_irq_disable();
> + local_fiq_disable();
> +
>   /* Prepare parameters for reboot_code_buffer*/
>   set_kernel_text_rw();
>   kexec_start_address = image->start;
> @@ -181,6 +188,9 @@ void machine_kexec(struct kimage *image)
>   kexec_mach_type = machine_arch_type;
>   kexec_boot_atags = image->arch.kernel_r2;
>  
> + local_fiq_enable();
> + raw_local_irq_enable();
> +
>   /* copy our kernel relocation code to the control code page */
>   reboot_entry = fncpy(reboot_code_buffer,
>_new_kernel,
> -- 
> 2.7.4
> 


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Russell King - ARM Linux admin
I wish others who know this code would get involved, and such stuff
wasn't left to me to research and work out whether a patch is correct
or not.

On Mon, Feb 01, 2021 at 12:44:56AM +, Giancarlo Ferrari wrote:
> machine_kexec() need to set rw permission in text and rodata sections
> to assign some variables (e.g. kexec_start_address). To do that at
> the end (after flushing pdm in memory, etc.) it needs to invalidate
> TLB [section] entries.
> 
> If during the TLB invalidation an interrupt occours, which might cause
> a context switch, there is the risk to inject invalid TLBs, with ro
> permissions.
> 
> When trying to assign .text labels, this lead to the following:
> 
>  Unable to handle kernel paging request at virtual address 80112f38
>  pgd = fd7ef03e
>  [80112f38] *pgd=0001141e(bad)
>  Internal error: Oops: 80d [#1] PREEMPT SMP ARM
>  ...
> 
> Signed-off-by: Giancarlo Ferrari 

I don't know this code very well, but I don't think this patch is
correct. What happens if we have CRASH_DUMP enabled, and we enter this
function with IRQs already disabled? Should we really be re-enabling
IRQs?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-02-01 Thread Giancarlo Ferrari
Hi all,

On Tue, Jan 12, 2021 at 04:49:06PM +, Giancarlo Ferrari wrote:
> machine_kexec() need to set rw permission in text and rodata sections
> to assign some variables (e.g. kexec_start_address). To do that at
> the end (after flushing pdm in memory, inv D-Cache, etc.) it needs to
> invalidate TLB [section] entries.
> 
> If during the TLB invalidation an interrupt occours, which might cause
> a context switch, there is the risk to inject invalid TLBs, with ro
> permissions.
> 
> When trying to assign .text labels, this lead to the following issue:
> 
> "Unable to handle kernel paging request at virtual address "
> 
> with FSR 0x80d.
> 
> Signed-off-by: Giancarlo Ferrari 
> ---
>  arch/arm/kernel/machine_kexec.c | 10 ++
>  1 file changed, 10 insertions(+)

has been re-submitted here:

https://lore.kernel.org/lkml/1612140296-12546-1-git-send-email-giancarlo.ferrar...@gmail.com/


GF


[PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-01-31 Thread Giancarlo Ferrari
machine_kexec() need to set rw permission in text and rodata sections
to assign some variables (e.g. kexec_start_address). To do that at
the end (after flushing pdm in memory, etc.) it needs to invalidate
TLB [section] entries.

If during the TLB invalidation an interrupt occours, which might cause
a context switch, there is the risk to inject invalid TLBs, with ro
permissions.

When trying to assign .text labels, this lead to the following:

 Unable to handle kernel paging request at virtual address 80112f38
 pgd = fd7ef03e
 [80112f38] *pgd=0001141e(bad)
 Internal error: Oops: 80d [#1] PREEMPT SMP ARM
 ...

Signed-off-by: Giancarlo Ferrari 
---
 arch/arm/kernel/machine_kexec.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 5d84ad3..23e8816 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -174,6 +174,13 @@ void machine_kexec(struct kimage *image)
 
reboot_code_buffer = page_address(image->control_code_page);
 
+   /*
+* If below part is not atomic TLB entries might be corrupted after TLB
+* invalidation, which leads to Data Abort in .text variable assignment
+*/
+   raw_local_irq_disable();
+   local_fiq_disable();
+
/* Prepare parameters for reboot_code_buffer*/
set_kernel_text_rw();
kexec_start_address = image->start;
@@ -181,6 +188,9 @@ void machine_kexec(struct kimage *image)
kexec_mach_type = machine_arch_type;
kexec_boot_atags = image->arch.kernel_r2;
 
+   local_fiq_enable();
+   raw_local_irq_enable();
+
/* copy our kernel relocation code to the control code page */
reboot_entry = fncpy(reboot_code_buffer,
 _new_kernel,
-- 
2.7.4



[PATCH] ARM: kexec: Fix panic after TLB are invalidated

2021-01-12 Thread Giancarlo Ferrari
machine_kexec() need to set rw permission in text and rodata sections
to assign some variables (e.g. kexec_start_address). To do that at
the end (after flushing pdm in memory, inv D-Cache, etc.) it needs to
invalidate TLB [section] entries.

If during the TLB invalidation an interrupt occours, which might cause
a context switch, there is the risk to inject invalid TLBs, with ro
permissions.

When trying to assign .text labels, this lead to the following issue:

"Unable to handle kernel paging request at virtual address "

with FSR 0x80d.

Signed-off-by: Giancarlo Ferrari 
---
 arch/arm/kernel/machine_kexec.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 76300f3..bbe912d 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -176,6 +176,13 @@ void machine_kexec(struct kimage *image)
 
reboot_code_buffer = page_address(image->control_code_page);
 
+   /*
+* If below part is not atomic TLB entries might be corrupted after TLB
+* invalidation, which leads to Data Abort in .text variable assignment
+*/
+   raw_local_irq_disable();
+   local_fiq_disable();
+
/* Prepare parameters for reboot_code_buffer*/
set_kernel_text_rw();
kexec_start_address = image->start;
@@ -183,6 +190,9 @@ void machine_kexec(struct kimage *image)
kexec_mach_type = machine_arch_type;
kexec_boot_atags = image->arch.kernel_r2;
 
+   local_fiq_enable();
+   raw_local_irq_enable();
+
/* copy our kernel relocation code to the control code page */
reboot_entry = fncpy(reboot_code_buffer,
 _new_kernel,
-- 
2.7.4