Re: Enable/Disable of ftrace events crashes kernel

2019-07-11 Thread Jan Kiszka via Xenomai
On 11.07.19 22:48, Richard Weinberger wrote:
> On Thu, Jul 11, 2019 at 8:30 PM Jan Kiszka  wrote:
>>
>> On 11.07.19 12:25, Richard Weinberger wrote:
>>> On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka  wrote:
>>>> Can't reproduce so far, even with a while-true loop. Can you share your
>>>> .config?
>>>
>>> Sure, see attachment.
>>>
>>
>> This seems to fix the issue here:
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 119fd66d111e..8f647c208cf2 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -997,8 +997,8 @@ apicinterrupt IRQ_WORK_VECTOR   irq_work_interrupt  smp_irq_work_interrupt
>>  \skip_label:
>>  	UNWIND_HINT_REGS
>>  	DISABLE_INTERRUPTS(CLBR_ANY)
>> -	testl	%ebx, %ebx			/* %ebx: return to kernel mode */
>> -	jnz	retint_kernel_early
>> +	testb	$3, CS(%rsp)
>> +	jz	retint_kernel_early
>>  	jmp	retint_user_early
>>  	.endif
>>  1001:
>>
>> Tests welcome!
> 
> With that change I can no longer trigger the crash.

Perfect.

> Can you please give more context? I'd like to understand the problem.
> 

We were basing the decision whether to switch GS on return or not on a stale
register (%ebx). That register used to contain this information, but that changed
with "x86/entry/64: Remove %ebx handling from error_entry/exit". This caused CPU
state corruption under certain conditions, apparently only when dealing with
#DB exceptions, not with the far more frequent #PF.
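In C terms, the corrected test no longer trusts a scratch register; it reads the privilege level from the CS selector saved on the exception frame, which is the same check the kernel's user_mode() helper performs on x86-64. A minimal sketch, assuming a trimmed-down frame: struct frame_sketch and the SKETCH_* constants are illustrative stand-ins (0x10 and 0x33 are the usual x86-64 Linux __KERNEL_CS and __USER_CS values), not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct pt_regs:
 * only the saved code segment selector matters here. */
struct frame_sketch {
	uint64_t cs;
};

/* Usual x86-64 Linux selectors: kernel CS has RPL 0, user CS has RPL 3. */
#define SKETCH_KERNEL_CS 0x10u
#define SKETCH_USER_CS   0x33u

/*
 * C equivalent of "testb $3, CS(%rsp); jz retint_kernel_early":
 * the low two bits of the saved CS are the requested privilege level,
 * so a non-zero value means the exception interrupted user mode and
 * the exit path has to switch GS back before returning.
 */
static bool frame_returns_to_user(const struct frame_sketch *frame)
{
	assert(frame != NULL);
	return (frame->cs & 3) != 0;
}
```

A frame saved with CS 0x33 yields true (user exit path), one with 0x10 yields false, which corresponds to the jz retint_kernel_early branch. Unlike %ebx, the saved CS cannot be clobbered by whatever ran in between.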

The issue is also present in 4.14, but in 4.4 and the unmaintained 4.9 as I
first thought.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: Enable/Disable of ftrace events crashes kernel

2019-07-11 Thread Richard Weinberger via Xenomai
On Thu, Jul 11, 2019 at 8:30 PM Jan Kiszka  wrote:
>
> On 11.07.19 12:25, Richard Weinberger wrote:
> > On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka  wrote:
> >> Can't reproduce so far, even with a while-true loop. Can you share your 
> >> .config?
> >
> > Sure, see attachment.
> >
>
> This seems to fix the issue here:
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 119fd66d111e..8f647c208cf2 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -997,8 +997,8 @@ apicinterrupt IRQ_WORK_VECTOR   irq_work_interrupt  smp_irq_work_interrupt
>  \skip_label:
>  	UNWIND_HINT_REGS
>  	DISABLE_INTERRUPTS(CLBR_ANY)
> -	testl	%ebx, %ebx			/* %ebx: return to kernel mode */
> -	jnz	retint_kernel_early
> +	testb	$3, CS(%rsp)
> +	jz	retint_kernel_early
>  	jmp	retint_user_early
>  	.endif
>  1001:
>
> Tests welcome!

With that change I can no longer trigger the crash.
Can you please give more context? I'd like to understand the problem.

-- 
Thanks,
//richard



[PATCH v2 5/6] ipipe: Activate IRQ in ipipe_enable_irq

2019-07-11 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

Likely needed since commit c942cee46bba (genirq: Separate activation and
startup), which split activation and startup.

This fixes unpopulated vectors in the IOAPIC on x86 at least, possibly
more.

Signed-off-by: Jan Kiszka 
---
 kernel/irq/chip.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 65d345b727be..22386e509f68 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1087,6 +1087,7 @@ int ipipe_enable_irq(unsigned int irq)
 	struct irq_desc *desc;
 	struct irq_chip *chip;
 	unsigned long flags;
+	int err;
 
 	desc = irq_to_desc(irq);
 	if (desc == NULL)
@@ -1098,6 +1099,10 @@ int ipipe_enable_irq(unsigned int irq)
 
 		ipipe_root_only();
 
+		err = irq_activate(desc);
+		if (err)
+			return err;
+
 		raw_spin_lock_irqsave(&desc->lock, flags);
 		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
 			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
-- 
2.16.4
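The ordering this patch introduces (activate first, and run the one-time startup only once activation has succeeded) can be modeled in plain C. This is a sketch under assumptions: struct irq_model, model_enable_irq() and the two activation callbacks are hypothetical, the activated flag stands in for irq_activate() returning 0 on an already activated interrupt, and -ENOSPC is only an example failure (e.g. no free vector).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IRQ model: activation (e.g. vector assignment) may fail,
 * startup is the one-time unmasking done on the first enable. */
struct irq_model {
	bool needs_startup;
	bool activated;
	bool started;
	int (*activate)(struct irq_model *irq);
};

/* Example activation callbacks. */
static int model_activate_ok(struct irq_model *irq)
{
	(void)irq;
	return 0;
}

static int model_activate_no_vector(struct irq_model *irq)
{
	(void)irq;
	return -ENOSPC;
}

/* Activate first and propagate failures; only then run startup. */
static int model_enable_irq(struct irq_model *irq)
{
	int err;

	assert(irq != NULL && irq->activate != NULL);

	if (!irq->activated) {	/* activating twice is a no-op */
		err = irq->activate(irq);
		if (err)
			return err;
		irq->activated = true;
	}

	if (irq->needs_startup) {
		irq->needs_startup = false;
		irq->started = true;
	}

	return 0;
}
```

If activation fails, the descriptor keeps its "needs startup" state, so the line is never unmasked without a populated vector, which matches the IOAPIC symptom described above.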




[PATCH v2 6/6] ipipe: Let ipipe_set_irq_affinity return an error

2019-07-11 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

The x86 implementation may generate an error, so change the signature to return it.

Signed-off-by: Jan Kiszka 
---
 include/linux/ipipe.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipipe.h b/include/linux/ipipe.h
index f7c07a207093..45e8c87bff93 100644
--- a/include/linux/ipipe.h
+++ b/include/linux/ipipe.h
@@ -433,12 +433,12 @@ void ipipe_prepare_panic(void);
 #ifndef ipipe_smp_p
 #define ipipe_smp_p (1)
 #endif
-void ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask);
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask);
 void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask);
 #else  /* !CONFIG_SMP */
 #define ipipe_smp_p (0)
 static inline
-void ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask) { }
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask) { return 0; }
 static inline void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask) { }
 static inline void ipipe_disable_smp(void) { }
 #endif /* CONFIG_SMP */
-- 
2.16.4




[PATCH v2 0/6] ipipe-noarch: IRQ tracing, enabling, and lockdep fixes/cleanups

2019-07-11 Thread Jan Kiszka via Xenomai
This includes the patches from v1, the addition of patch 3, and the new
patches needed to add return codes and IRQ enabling to some services.

Jan

Jan Kiszka (6):
  ipipe: Restore trace_hardirqs_on_virt_caller
  ipipe: lockdep: Remove duplicate context checks
  ipipe: Add missing include for ipipe_root_only
  ipipe: Let ipipe_enable_irq return an error code
  ipipe: Activate IRQ in ipipe_enable_irq
  ipipe: Let ipipe_set_irq_affinity return an error

 include/linux/ipipe.h   |  6 +++---
 include/linux/kernel.h  |  1 +
 kernel/irq/chip.c   | 17 -
 kernel/locking/lockdep.c|  6 --
 kernel/trace/trace_preemptirq.c | 11 +++
 5 files changed, 27 insertions(+), 14 deletions(-)

-- 
2.16.4




[PATCH v2 3/6] ipipe: Add missing include for ipipe_root_only

2019-07-11 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

Breaks in non-debug builds otherwise, e.g.
https://travis-ci.com/xenomai-ci/xenomai/jobs/212725223

Signed-off-by: Jan Kiszka 
---
 include/linux/kernel.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 50b1e0c878e0..edd37052e585 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define USHRT_MAX  ((u16)(~0U))
-- 
2.16.4




[PATCH v2 4/6] ipipe: Let ipipe_enable_irq return an error code

2019-07-11 Thread Jan Kiszka via Xenomai
From: Jan Kiszka 

It's time to let ipipe_enable_irq return a proper error, as it will gain
another function call that may fail. Drop the WARN_ON_ONCE in favor of that.

Signed-off-by: Jan Kiszka 
---
 include/linux/ipipe.h |  2 +-
 kernel/irq/chip.c | 12 +++-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/ipipe.h b/include/linux/ipipe.h
index b8a80eeade4f..f7c07a207093 100644
--- a/include/linux/ipipe.h
+++ b/include/linux/ipipe.h
@@ -475,7 +475,7 @@ static inline struct ipipe_threadinfo *ipipe_current_threadinfo(void)
 
 #define ipipe_task_threadinfo(p) (&task_thread_info(p)->ipipe_data)
 
-void ipipe_enable_irq(unsigned int irq);
+int ipipe_enable_irq(unsigned int irq);
 
 static inline void ipipe_disable_irq(unsigned int irq)
 {
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6c279e065879..65d345b727be 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1082,7 +1082,7 @@ __fixup_irq_handler(struct irq_desc *desc, irq_flow_handler_t handle, int is_cha
 	return handle;
 }
 
-void ipipe_enable_irq(unsigned int irq)
+int ipipe_enable_irq(unsigned int irq)
 {
 	struct irq_desc *desc;
 	struct irq_chip *chip;
@@ -1090,7 +1090,7 @@ void ipipe_enable_irq(unsigned int irq)
 
 	desc = irq_to_desc(irq);
 	if (desc == NULL)
-		return;
+		return -EINVAL;
 
 	chip = irq_desc_get_chip(desc);
 
@@ -1105,16 +1105,18 @@ void ipipe_enable_irq(unsigned int irq)
 		}
 		raw_spin_unlock_irqrestore(&desc->lock, flags);
 
-		return;
+		return 0;
 	}
 
-	if (WARN_ON_ONCE(chip->irq_enable == NULL && chip->irq_unmask == NULL))
-		return;
+	if (chip->irq_enable == NULL && chip->irq_unmask == NULL)
+		return -ENOSYS;
 
 	if (chip->irq_enable)
 		chip->irq_enable(&desc->irq_data);
 	else
 		chip->irq_unmask(&desc->irq_data);
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(ipipe_enable_irq);
 
-- 
2.16.4
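The non-startup path of the patched ipipe_enable_irq() boils down to three outcomes. The sketch below restates that decision logic with hypothetical model types (struct chip_model, struct desc_model) in place of struct irq_chip and struct irq_desc, so it can be compiled and exercised outside the kernel; a NULL desc here plays the role of irq_to_desc() finding no descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct irq_chip and struct irq_desc. */
struct chip_model {
	void (*irq_enable)(void *data);
	void (*irq_unmask)(void *data);
};

struct desc_model {
	struct chip_model *chip;
	void *irq_data;
};

/* Restates the non-startup path of the patched ipipe_enable_irq(). */
static int model_enable(struct desc_model *desc)
{
	struct chip_model *chip;

	if (desc == NULL)
		return -EINVAL;		/* was: silent return */

	chip = desc->chip;
	assert(chip != NULL);

	if (chip->irq_enable == NULL && chip->irq_unmask == NULL)
		return -ENOSYS;		/* was: WARN_ON_ONCE() + return */

	if (chip->irq_enable)
		chip->irq_enable(desc->irq_data);
	else
		chip->irq_unmask(desc->irq_data);

	return 0;
}

/* Example chip callback: counts how often it was invoked. */
static void model_count_call(void *data)
{
	++*(int *)data;
}
```

A caller can now tell a bad IRQ number (-EINVAL) apart from a chip that offers neither irq_enable nor irq_unmask (-ENOSYS), where the old code only emitted a one-time warning.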




Re: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up

2019-07-11 Thread Jan Kiszka via Xenomai
On 11.07.19 18:48, Philippe Gerum wrote:
> On 7/11/19 6:34 PM, Jan Kiszka wrote:
>> On 11.07.19 18:00, Philippe Gerum wrote:
>>> On 7/11/19 5:09 PM, Jan Kiszka wrote:
>>>> On 11.07.19 16:40, Philippe Gerum wrote:
>>>>> On 7/5/19 9:38 AM, Jan Kiszka wrote:
>>>>>
>>>>>> This addresses it on x86 for me:
>>>>>>
>>>>>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>>> index 6c279e065879..d503b875f086 100644
>>>>>> --- a/kernel/irq/chip.c
>>>>>> +++ b/kernel/irq/chip.c
>>>>>> @@ -1099,7 +1099,8 @@ void ipipe_enable_irq(unsigned int irq)
>>>>>>  		ipipe_root_only();
>>>>>>  
>>>>>>  		raw_spin_lock_irqsave(&desc->lock, flags);
>>>>>> -		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
>>>>>> +		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP &&
>>>>>> +		    !WARN_ON(irq_activate(desc))) {
>>>>>>  			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
>>>>>>  			chip->irq_startup(&desc->irq_data);
>>>>>>  		}
>>>>>>
>>>>>> Probably upstream commit c942cee46bba (genirq: Separate activation and
>>>>>> startup) makes this necessary.
>>>>>>
>>>>>> Philippe, I suppose this is either not essential on arm, or external
>>>>>> interrupts weren't tested yet, like I missed on x86. Fine to make this a
>>>>>> noarch patch?
>>>>>
>>>>> No issue. I've not been working on/with the I-pipe but Dovetail instead
>>>>> in the past weeks, so testing of 4.19 is still very limited on my end. I
>>>>> have several full-fledged real world ARM*-based application systems to
>>>>> improve this, just need to find a way to squeeze this work in.
>>>>>
>>>>
>>>> One question remains, though: Should we just do the WARN_ON() thing within the
>>>> existing API or change that API to return a potential error? Variant one I have
>>>> ready, but I have no feeling for the risk that there is actually an error.
>>>>
>>>> The same goes for ipipe_set_irq_affinity that will require the activation as
>>>> well but cannot return an error so far.
>>>>
>>>
>>> Moving from void to non-void would be backward-compatible provided we
>>> don't tag these services as __must_check, so propagating the status
>>> would make sense.
>>
>> ...but it would also be risky, as we would then have no reporting of an error. If we
>> change the API, I would do that in a way that users (namely drivers) have a chance to
>> become aware of this change.
>>
> 
> Coupling error propagation and WARN_ON (e.g. in pipeline debug mode)
> should not be a problem.
> 

Evaluating the error codes of ipipe_enable_irq and ipipe_set_irq_affinity in
Xenomai will mean requiring a minimal I-pipe core version for 4.19. So I will
push the burden of dealing with unprepared drivers to Xenomai (WARN_ON there)
and enforce that I-pipe update instead. Luckily, we have no release out yet that
supports 4.19.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: Enable/Disable of ftrace events crashes kernel

2019-07-11 Thread Jan Kiszka via Xenomai
On 11.07.19 20:30, Jan Kiszka wrote:
> On 11.07.19 12:25, Richard Weinberger wrote:
>> On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka  wrote:
>>> Can't reproduce so far, even with a while-true loop. Can you share your 
>>> .config?
>>
>> Sure, see attachment.
>>
> 
> This seems to fix the issue here:
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 119fd66d111e..8f647c208cf2 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -997,8 +997,8 @@ apicinterrupt IRQ_WORK_VECTOR   irq_work_interrupt  smp_irq_work_interrupt
>  \skip_label:
>   UNWIND_HINT_REGS
>   DISABLE_INTERRUPTS(CLBR_ANY)
> - testl   %ebx, %ebx  /* %ebx: return to kernel mode */
> - jnz retint_kernel_early
> + testb   $3, CS(%rsp)
> + jz  retint_kernel_early
>   jmp retint_user_early
>   .endif
>  1001:
> 
> Tests welcome!
> 
> Interestingly, 4.14 should have the same problem, but I failed to
> reproduce there so far.

Uhh, it's a regression in all our x86 stable trees, due to a backport of an
upstream commit. The above is definitely correct and hopefully also the fix for
this issue.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: Enable/Disable of ftrace events crashes kernel

2019-07-11 Thread Jan Kiszka via Xenomai
On 11.07.19 12:25, Richard Weinberger wrote:
> On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka  wrote:
>> Can't reproduce so far, even with a while-true loop. Can you share your 
>> .config?
> 
> Sure, see attachment.
> 

This seems to fix the issue here:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 119fd66d111e..8f647c208cf2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -997,8 +997,8 @@ apicinterrupt IRQ_WORK_VECTOR   irq_work_interrupt  smp_irq_work_interrupt
 \skip_label:
 	UNWIND_HINT_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
-	testl	%ebx, %ebx			/* %ebx: return to kernel mode */
-	jnz	retint_kernel_early
+	testb	$3, CS(%rsp)
+	jz	retint_kernel_early
 	jmp	retint_user_early
 	.endif
 1001:

Tests welcome!

Interestingly, 4.14 should have the same problem, but I failed to
reproduce there so far.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up

2019-07-11 Thread Philippe Gerum via Xenomai
On 7/11/19 6:34 PM, Jan Kiszka wrote:
> On 11.07.19 18:00, Philippe Gerum wrote:
>> On 7/11/19 5:09 PM, Jan Kiszka wrote:
>>> On 11.07.19 16:40, Philippe Gerum wrote:
>>>> On 7/5/19 9:38 AM, Jan Kiszka wrote:
>>>>
>>>>> This addresses it on x86 for me:
>>>>>
>>>>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>>>> index 6c279e065879..d503b875f086 100644
>>>>> --- a/kernel/irq/chip.c
>>>>> +++ b/kernel/irq/chip.c
>>>>> @@ -1099,7 +1099,8 @@ void ipipe_enable_irq(unsigned int irq)
>>>>>  		ipipe_root_only();
>>>>>  
>>>>>  		raw_spin_lock_irqsave(&desc->lock, flags);
>>>>> -		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
>>>>> +		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP &&
>>>>> +		    !WARN_ON(irq_activate(desc))) {
>>>>>  			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
>>>>>  			chip->irq_startup(&desc->irq_data);
>>>>>  		}
>>>>>
>>>>> Probably upstream commit c942cee46bba (genirq: Separate activation and
>>>>> startup) makes this necessary.
>>>>>
>>>>> Philippe, I suppose this is either not essential on arm, or external
>>>>> interrupts weren't tested yet, like I missed on x86. Fine to make this a
>>>>> noarch patch?
>>>>
>>>> No issue. I've not been working on/with the I-pipe but Dovetail instead
>>>> in the past weeks, so testing of 4.19 is still very limited on my end. I
>>>> have several full-fledged real world ARM*-based application systems to
>>>> improve this, just need to find a way to squeeze this work in.
>>>>
>>>
>>> One question remains, though: Should we just do the WARN_ON() thing within the
>>> existing API or change that API to return a potential error? Variant one I have
>>> ready, but I have no feeling for the risk that there is actually an error.
>>>
>>> The same goes for ipipe_set_irq_affinity that will require the activation as
>>> well but cannot return an error so far.
>>>
>>
>> Moving from void to non-void would be backward-compatible provided we
>> don't tag these services as __must_check, so propagating the status
>> would make sense.
> 
> ...but it would also be risky, as we would then have no reporting of an error. If we
> change the API, I would do that in a way that users (namely drivers) have a chance to
> become aware of this change.
> 

Coupling error propagation and WARN_ON (e.g. in pipeline debug mode)
should not be a problem.

-- 
Philippe.
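Coupling error propagation with WARN_ON, as suggested here, can be sketched in user-space C: the service returns the error code in every build and additionally prints a warning when a debug switch is on. SKETCH_PIPELINE_DEBUG, SKETCH_WARN_ON() and sketch_service() are hypothetical stand-ins for a pipeline debug option, the kernel's WARN_ON() and an I-pipe service.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical debug switch, standing in for a pipeline debug option. */
#ifndef SKETCH_PIPELINE_DEBUG
#define SKETCH_PIPELINE_DEBUG 1
#endif

/* User-space stand-in for WARN_ON(): report in debug builds and hand the
 * condition back so it can be used directly in an if statement. */
static int sketch_warn_on(int cond, const char *expr, int line)
{
	assert(expr != NULL);
	if (cond && SKETCH_PIPELINE_DEBUG)
		fprintf(stderr, "WARNING at line %d: %s\n", line, expr);
	return cond;
}

#define SKETCH_WARN_ON(cond) sketch_warn_on(!!(cond), #cond, __LINE__)

/* Hypothetical service: the error is always propagated, the warning is
 * only a debugging aid on top of it. */
static int sketch_service(int activation_result)
{
	if (SKETCH_WARN_ON(activation_result != 0))
		return activation_result;
	return 0;
}
```

Building with -DSKETCH_PIPELINE_DEBUG=0 silences the report but keeps the returned error, so callers that check the status behave identically in both configurations.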



Re: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up

2019-07-11 Thread Philippe Gerum via Xenomai
On 7/11/19 5:09 PM, Jan Kiszka wrote:
> On 11.07.19 16:40, Philippe Gerum wrote:
>> On 7/5/19 9:38 AM, Jan Kiszka wrote:
>>
>>> This addresses it on x86 for me:
>>>
>>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>>> index 6c279e065879..d503b875f086 100644
>>> --- a/kernel/irq/chip.c
>>> +++ b/kernel/irq/chip.c
>>> @@ -1099,7 +1099,8 @@ void ipipe_enable_irq(unsigned int irq)
>>>  		ipipe_root_only();
>>>  
>>>  		raw_spin_lock_irqsave(&desc->lock, flags);
>>> -		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
>>> +		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP &&
>>> +		    !WARN_ON(irq_activate(desc))) {
>>>  			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
>>>  			chip->irq_startup(&desc->irq_data);
>>>  		}
>>>
>>> Probably upstream commit c942cee46bba (genirq: Separate activation and
>>> startup) makes this necessary.
>>>
>>> Philippe, I suppose this is either not essential on arm, or external
>>> interrupts weren't tested yet, like I missed on x86. Fine to make this a
>>> noarch patch?
>>
>> No issue. I've not been working on/with the I-pipe but Dovetail instead
>> in the past weeks, so testing of 4.19 is still very limited on my end. I
>> have several full-fledged real world ARM*-based application systems to
>> improve this, just need to find a way to squeeze this work in.
>>
> 
> One question remains, though: Should we just do the WARN_ON() thing within the
> existing API or change that API to return a potential error? Variant one I have
> ready, but I have no feeling for the risk that there is actually an error.
> 
> The same goes for ipipe_set_irq_affinity that will require the activation as
> well but cannot return an error so far.
> 

Moving from void to non-void would be backward-compatible provided we
don't tag these services as __must_check, so propagating the status
would make sense.

-- 
Philippe.
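The compatibility argument rests on how C treats discarded return values: turning a void function into an int function does not break callers that ignore the result, unless the function is tagged __must_check, which in the kernel expands to the GCC/Clang warn_unused_result attribute. A hypothetical illustration (the sketch_* names and the validity rule are made up):

```c
#include <assert.h>
#include <errno.h>

/* The kernel's __must_check expands to this GCC/Clang attribute. */
#define sketch_must_check __attribute__((warn_unused_result))

/* Hypothetical service that used to return void and now returns an int
 * status (the validity rule is made up for the example). */
static int sketch_enable_irq(unsigned int irq)
{
	return irq < 16 ? 0 : -EINVAL;
}

/* The same service tagged like __must_check: callers that ignore its
 * result get a -Wunused-result warning at build time. */
static sketch_must_check int sketch_enable_irq_checked(unsigned int irq)
{
	return sketch_enable_irq(irq);
}

/* A legacy caller written against the old void signature still compiles
 * silently, which is what keeps the change backward-compatible. */
static void sketch_legacy_caller(void)
{
	sketch_enable_irq(3);
	/* sketch_enable_irq_checked(3);  <- would trigger the warning */
}
```

This is also the trade-off Jan raises in his reply: without the attribute, unaware drivers silently drop the new error codes; with it, every such caller is flagged at build time.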



Re: Question on deleting a locked RT_MUTEX

2019-07-11 Thread Philippe Gerum via Xenomai
On 7/11/19 5:34 PM, Richard Weinberger wrote:
> Philippe,
> 
> On Thu, Jul 11, 2019 at 4:30 PM Philippe Gerum  wrote:
>>> If I have a mutex m1 and a misbehaving RT_TASK rt1 which exits right after
>>> acquiring mutex m1, what is the recommended way to get the lock back into a
>>> well defined state?
>>>
>>
>> Unfortunately, I see no option other than leaking the orphaned mutex and
>> re-creating a new one on the same descriptor with rt_mutex_create().
> 
> Will it leak on the kernel side or "just" in the userspace process?
> As long as the application does not badly deadlock, I think we can recover somehow.
> 

It will leak on both sides.

-- 
Philippe.



Re: Question on deleting a locked RT_MUTEX

2019-07-11 Thread Richard Weinberger via Xenomai
Philippe,

On Thu, Jul 11, 2019 at 4:30 PM Philippe Gerum  wrote:
> > If I have a mutex m1 and a misbehaving RT_TASK rt1 which exits right after
> > acquiring mutex m1, what is the recommended way to get the lock back into a
> > well defined state?
> >
>
> Unfortunately, I see no option other than leaking the orphaned mutex and
> re-creating a new one on the same descriptor with rt_mutex_create().

Will it leak on the kernel side or "just" in the userspace process?
As long as the application does not badly deadlock, I think we can recover somehow.

-- 
Thanks,
//richard



Re: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up

2019-07-11 Thread Jan Kiszka via Xenomai
On 11.07.19 16:40, Philippe Gerum wrote:
> On 7/5/19 9:38 AM, Jan Kiszka wrote:
> 
>> This addresses it on x86 for me:
>>
>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> index 6c279e065879..d503b875f086 100644
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -1099,7 +1099,8 @@ void ipipe_enable_irq(unsigned int irq)
>>  		ipipe_root_only();
>>  
>>  		raw_spin_lock_irqsave(&desc->lock, flags);
>> -		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
>> +		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP &&
>> +		    !WARN_ON(irq_activate(desc))) {
>>  			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
>>  			chip->irq_startup(&desc->irq_data);
>>  		}
>>
>> Probably upstream commit c942cee46bba (genirq: Separate activation and
>> startup) makes this necessary.
>>
>> Philippe, I suppose this is either not essential on arm, or external
>> interrupts weren't tested yet, like I missed on x86. Fine to make this a
>> noarch patch?
> 
> No issue. I've not been working on/with the I-pipe but Dovetail instead
> in the past weeks, so testing of 4.19 is still very limited on my end. I
> have several full-fledged real world ARM*-based application systems to
> improve this, just need to find a way to squeeze this work in.
> 

One question remains, though: Should we just do the WARN_ON() thing within the
existing API or change that API to return a potential error? Variant one I have
ready, but I have no feeling for the risk that there is actually an error.

The same goes for ipipe_set_irq_affinity that will require the activation as
well but cannot return an error so far.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up

2019-07-11 Thread Philippe Gerum via Xenomai
On 7/5/19 9:38 AM, Jan Kiszka wrote:

> This addresses it on x86 for me:
> 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 6c279e065879..d503b875f086 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1099,7 +1099,8 @@ void ipipe_enable_irq(unsigned int irq)
>  		ipipe_root_only();
>  
>  		raw_spin_lock_irqsave(&desc->lock, flags);
> -		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
> +		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP &&
> +		    !WARN_ON(irq_activate(desc))) {
>  			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
>  			chip->irq_startup(&desc->irq_data);
>  		}
> 
> Probably upstream commit c942cee46bba (genirq: Separate activation and
> startup) makes this necessary.
> 
> Philippe, I suppose this is either not essential on arm, or external
> interrupts weren't tested yet, like I missed on x86. Fine to make this a
> noarch patch?

No issue. I've not been working on/with the I-pipe but Dovetail instead
in the past weeks, so testing of 4.19 is still very limited on my end. I
have several full-fledged real world ARM*-based application systems to
improve this, just need to find a way to squeeze this work in.

-- 
Philippe.



Re: Question on deleting a locked RT_MUTEX

2019-07-11 Thread Philippe Gerum via Xenomai
On 7/11/19 10:15 AM, Richard Weinberger via Xenomai wrote:
> Hi!
> 
> doc/asciidoc/MIGRATION.adoc states:
> - For consistency with the standard glibc implementation, deleting a
>   RT_MUTEX object in locked state is no longer a valid operation.
> 
> I'm not sure how this affects my Xenomai 2 application which is being
> ported to Xenomai 3.
> 
> If I have a mutex m1 and a misbehaving RT_TASK rt1 which exits right after
> acquiring mutex m1, what is the recommended way to get the lock back into a
> well defined state?
> 

Unfortunately, I see no option other than leaking the orphaned mutex and
re-creating a new one on the same descriptor with rt_mutex_create().

-- 
Philippe.
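The workaround of leaking the orphaned mutex and re-creating a new one on the same descriptor is sketched below with a plain POSIX mutex, so that it compiles without Xenomai. struct mutex_handle and handle_create() are hypothetical; with the Alchemy API, the corresponding step would be calling rt_mutex_create() again on the same RT_MUTEX descriptor. The sketch only models the handle side; per Philippe, the real orphan leaks in both the kernel and the user-space process.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-side handle, standing in for an RT_MUTEX descriptor:
 * it merely refers to the lock object that does the real work. */
struct mutex_handle {
	pthread_mutex_t *obj;
};

/*
 * (Re-)create the lock behind a handle. Any previous object is NOT
 * destroyed: if its owner exited while holding it, destroying it would be
 * undefined behavior, so it is deliberately leaked instead.
 */
static int handle_create(struct mutex_handle *h)
{
	pthread_mutex_t *m;
	int err;

	assert(h != NULL);

	m = malloc(sizeof(*m));
	if (m == NULL)
		return -ENOMEM;

	err = pthread_mutex_init(m, NULL);
	if (err) {
		free(m);
		return -err;
	}

	h->obj = m;	/* the orphan, if any, is leaked here on purpose */
	return 0;
}
```

POSIX likewise leaves destroying a locked mutex undefined, which is the same reasoning behind Xenomai 3 no longer accepting the deletion of a locked RT_MUTEX.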



Re: Best way to detect if a filedescriptor is a cobalt filedescriptor (/socket)

2019-07-11 Thread Philippe Gerum via Xenomai
On 7/10/19 11:20 AM, Lange Norbert wrote:
> 
> 
>> -Original Message-
>> From: Jan Kiszka 
>> Sent: Mittwoch, 10. Juli 2019 08:13
>> To: Lange Norbert ; Xenomai
>> (xenomai@xenomai.org) ; Philippe Gerum
>> 
>> Subject: Re: Best way to detect if a filedescriptor is a cobalt 
>> filedescriptor
>> (/socket)
>>
>> E-MAIL FROM A NON-ANDRITZ SOURCE: AS A SECURITY MEASURE, PLEASE
>> EXERCISE CAUTION WITH E-MAIL CONTENT AND ANY LINKS OR
>> ATTACHMENTS.
>>
>>
>> On 09.07.19 16:49, Lange Norbert via Xenomai wrote:
>>> Hi,
>>>
>>> I am opening a packetsocket, which is supposed to be realtime.
>>> Unfortunately, if the rtpacket (rtnet?) module is not loaded, then this will
>>> just silently fall back to a Linux packet socket, and the thread is later
>>> demoted during accesses.
>>>
>>> How would I be able to detect this early during startup? I could
>>> __STD(close) the descriptor and check the return code for EBADF, I suppose...
>>>
>>
>> Yeah, it looks like this is a feature we lost while embedding the RTDM file
>> descriptor range into the regular Linux space.
> 
> My scheme does not work either: __STD(close) seems to return 0 for the Cobalt fd,
> but doesn't seem to do anything beyond that.
> 

The logic is the other way around: ask the Cobalt co-kernel to perform
some action on a (supposedly) Xenomai-registered fd, otherwise have
libcobalt fallback to glibc if the fd is not managed by the co-kernel.
That would be calling __RT(close), not __STD(close) first.

v2 of the patch switching from -EBADF to -ENODEV is needed to have this
working:
https://lab.xenomai.org/xenomai-rpm.git/commit/?h=for-upstream/next=0d7384acca2a71323be0d7412fd99aca6f0450d8

Which includes these bits:

diff --git a/kernel/cobalt/rtdm/fd.c b/kernel/cobalt/rtdm/fd.c
index 81943c2f3..06f462d1d 100644
--- a/kernel/cobalt/rtdm/fd.c
+++ b/kernel/cobalt/rtdm/fd.c
@@ -835,7 +835,7 @@ int rtdm_fd_close(int ufd, unsigned int magic)
if (magic != 0 && fd->magic != magic) {
 ebadf:
xnlock_put_irqrestore(_lock, s);
-   return -EBADF;
+   return -ENODEV;
}

set_compat_bit(fd);


Another option is to use __RT(fcntl(fd, F_GETFD)): EINVAL would be
returned for this query on a Cobalt-managed fd, otherwise a success code
would denote a regular fd.
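The RT-first order, combined with the switch from -EBADF to -ENODEV, gives the wrapper an unambiguous hand-over signal: -ENODEV means "not an RTDM fd, let the regular path handle it", while any other error is a genuine failure. A model of that dispatch, with hypothetical names throughout and negative error codes instead of the -1/errno convention the real libcobalt wrappers use:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_FD 64

/* Hypothetical co-kernel bookkeeping: true = fd is managed by RTDM. */
static bool sketch_rtdm_fds[SKETCH_MAX_FD];
/* Counts how often the regular (glibc) path was taken. */
static int sketch_std_calls;

/* Co-kernel side: -ENODEV means "not one of mine, hand it over". */
static int sketch_cokernel_close(int fd)
{
	if (fd < 0)
		return -EBADF;
	if (fd >= SKETCH_MAX_FD || !sketch_rtdm_fds[fd])
		return -ENODEV;
	sketch_rtdm_fds[fd] = false;
	return 0;
}

/* Stand-in for the regular close() path. */
static int sketch_std_close(int fd)
{
	assert(fd >= 0);
	sketch_std_calls++;
	return 0;
}

/* RT-first wrapper: only -ENODEV triggers the fallback. */
static int sketch_wrap_close(int fd)
{
	int ret = sketch_cokernel_close(fd);

	if (ret != -ENODEV)
		return ret;

	return sketch_std_close(fd);
}
```

Presumably this is why the v2 patch matters: with -EBADF as the hand-over code, the wrapper could not distinguish "unknown to the co-kernel" from "invalid everywhere".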

> Note: I am using Philippe's "cobalt: switch hand over status to -ENODEV for non-RTDM fd" patch,
> so potentially it's a regression of this patch.
> 
>> We could either add a compile-time or runtime feature to libcobalt that
>> allows disabling this silent fallback again or introduce alternative open and
>> socket implementations that do not expose this behavior. Spontaneously, I
>> would be in favor of a runtime switch for the existing implementations.
> 
> I assumed that __RT(open) would call the __cobalt_open function (without fallback),
> __STD(open) would call libc's open, and __wrap_open would be the only function trying both.
> Turns out that __wrap and __cobalt are identical, but I don't understand the reasoning behind it.

To have an opportunity for late binding on the __wrap symbols too, so
that one can override the libcobalt wrappers for interposing its own
version of the cobalt calls. OTOH, __cobalt bindings are guaranteed to
tap into the original libcobalt implementation. Some users need this to
provide their own flavor of Xenomai POSIX system calls on top of libcobalt.

-- 
Philippe.
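One way to get the late binding described here is a weak alias: the __wrap_* name is only a weak alias of the __cobalt_* implementation, so a strong definition elsewhere overrides it at link time, while the __cobalt_* name always reaches the original code. The sketch below shows those linker mechanics with hypothetical names (sketch_cobalt_open, sketch_wrap_open); treat it as an assumption about the mechanism, not a copy of libcobalt's wrapper macros. It requires GCC or Clang on an ELF platform.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "original implementation", standing in for a __cobalt_*
 * symbol: callers binding to this name always get this code. */
int sketch_cobalt_open(const char *path)
{
	assert(path != NULL);
	return strlen(path) > 0 ? 3 : -1;	/* fake fd, illustration only */
}

/*
 * Overridable entry point, standing in for a __wrap_* symbol: a weak alias
 * of the implementation above. Another object file may define a strong
 * sketch_wrap_open() that wins at link time and can still call
 * sketch_cobalt_open() to reach the original code.
 */
int sketch_wrap_open(const char *path)
	__attribute__((weak, alias("sketch_cobalt_open")));
```

Without an override, both names resolve to the same code, which matches the observation that __wrap and __cobalt behave identically.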



RE: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up

2019-07-11 Thread Lange Norbert via Xenomai
> -Original Message-
> From: Jan Kiszka 
> Sent: Wednesday, July 10, 2019 23:31
> To: Lange Norbert ; Xenomai
> (xenomai@xenomai.org) ; Philippe Gerum
> 
> Subject: Re: ipipe 4.19: spurious APIC interrupt when setting rt_igp to up
>
> On 09.07.19 19:54, Jan Kiszka wrote:
> > On 09.07.19 18:33, Jan Kiszka wrote:
> >> On 09.07.19 18:21, Lange Norbert wrote:
> >>> Hello,
> >>>
> >>> maxcpus=1 still causes the spurious int, this time fully locking up.
> >>>
> >>> I attached the debug/irq directory after the cause.
> >>> Some things that might be relevant:
> >>> -   the SOC would use PINCTRL_BROXTON under linux, but this is disabled (not fixed up for Xenomai)
> >>> -   I have the regular igb driver in use, and am unbinding the network card prior to binding the rt_igp driver
> >>>
> >>
> >> Thanks. What's the interrupt number that Xenomai is using? Should be
> >> the same that the Linux driver is using as well.
> >
> > Found already: Should be IRQ 130-132 for device 00:03.0. If the
> > directory state was like that while Xenomai was still holding those
> > interrupts, the problem is that there are no vectors assigned to them.
> > Can you confirm that rt_igb was still loaded and the interface was up?
> >
> > Are those interrupts MSI or MSI-X? Can't read that from the logs.
> >
> > I probably need to get some rt_igb running somewhere...
> >
>
> Still no luck, even on a box with a igb-driven NIC (I350):
>
> [  667.928036] rt_igb :06:00.1: Intel(R) Gigabit Ethernet Network Connection
> [  667.928064] rt_igb :06:00.1: rteth0: (PCIe:5.0Gb/s:Width x4) 00:25:90:5d:10:19
> [  667.928149] rt_igb :06:00.1: rteth0: PBA No: 010A00-000
> [  667.928153] rt_igb :06:00.1: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
> xeon-d:~ # cat /proc/xenomai/irq
>   IRQ CPU0...CPU15
>47:   0...   79 rteth0-TxRx-0
>
> I'm currently using the two attached patches on top of ipipe-core-4.19.57-x86-3.

With those 2 patches it's now fixed on my end.
So far I used this:

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6c279e065879..d503b875f086 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1099,7 +1099,8 @@ void ipipe_enable_irq(unsigned int irq)
 		ipipe_root_only();
 
 		raw_spin_lock_irqsave(&desc->lock, flags);
-		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
+		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP &&
+		    !WARN_ON(irq_activate(desc))) {
 			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
 			chip->irq_startup(&desc->irq_data);
 		}

>
> Did you cross-check if the running kernel contains the fix(es)?

Yes, the old one.
Thanks for the fix.

Norbert


This message and any attachments are solely for the use of the intended 
recipients. They may contain privileged and/or confidential information or 
other information protected from disclosure. If you are not an intended 
recipient, you are hereby notified that you received this email in error and 
that any review, dissemination, distribution or copying of this email and any 
attachment is strictly prohibited. If you have received this email in error, 
please contact the sender and delete the message and any attachment from your 
system.

ANDRITZ HYDRO GmbH


Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation

Firmensitz/ Registered seat: Wien

Firmenbuchgericht/ Court of registry: Handelsgericht Wien

Firmenbuchnummer/ Company registration: FN 61833 g

DVR: 0605077

UID-Nr.: ATU14756806


Thank You



Re: Enable/Disable of ftrace events crashes kernel

2019-07-11 Thread Richard Weinberger via Xenomai
On Thu, Jul 11, 2019 at 12:21 PM Jan Kiszka  wrote:
> Can't reproduce so far, even with a while-true loop. Can you share your 
> .config?

Sure, see attachment.

-- 
Thanks,
//richard
-- next part --
A non-text attachment was scrubbed...
Name: .config
Type: application/x-config
Size: 121667 bytes
Desc: not available
URL: 
<http://xenomai.org/pipermail/xenomai/attachments/20190711/0d3c0780/attachment.bin>


Re: Enable/Disable of ftrace events crashes kernel

2019-07-11 Thread Jan Kiszka via Xenomai
On 11.07.19 00:29, Richard Weinberger via Xenomai wrote:
> Hi!
> 
> I can reliably kill Linux on QEMU by repeatedly writing 1 and 0 to
> /sys/kernel/debug/tracing/events/cobalt_core/enable
> 
> Didn't test on real hardware so far.
> The following splat happened on ipipe-core-4.19.57-x86-3 plus
> xenomai-git as of today.
> 
> [   33.664656] Kernel panic - not syncing: Machine halted.
> [   33.665323] CPU: 2 PID: 2088 Comm: bash Not tainted 4.19.57 #1
> [   33.666142] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fc-prebuilt.qemu-project.org 04/01/2014
> [   33.667524] I-pipe domain: Linux
> [   33.667895] Call Trace:
> [   33.668354]  <#DF>
> [   33.668695]  dump_stack+0x8e/0xb3
> [   33.669104]  panic+0xdd/0x238
> [   33.669456]  df_debug+0x24/0x30
> [   33.669834]  do_double_fault+0x95/0x120
> [   33.670323]  double_fault+0x3f/0x60
> [   33.670794] RIP: 0010:xnintr_core_clock_handler+0xad/0x370
> [   33.671426] Code: c0 48 09 c2 49 89 96 80 1a 00 00 49 8d ae 88 1a 00 00 48 8d 59 08 48 87 5d 00 48 c7 c0 d0 e3 02 00 48 83 01 01 cc 1f 44 00 00 <41> 8b 86 10 03 00 00 49 81 4e 08 00 40 00 00 83 c0 01 41 89 86 10
> [   33.673615] RSP: 0018:964ebbb03f58 EFLAGS: 00010002
> [   33.674235] RAX: 0002e3d0 RBX: 964ebbb315c0 RCX: 964ebbb3bb00
> [   33.675079] RDX: 0013d41dbbce RSI: fc25fc34 RDI: 964ebbb315c0
> [   33.675923] RBP: 964ebbb31748 R08: 964ebb000249 R09: 0002e320
> [   33.676761] R10: 0040 R11:  R12: 0002
> [   33.677600] R13: 0002fcc0 R14: 964ebbb2fcc0 R15: 964ebbb2fcc0
> [   33.678444]  
> [   33.678704]  
> [   33.678955]  dispatch_irq_head+0x84/0x110
> [   33.679437]  __ipipe_handle_irq+0x7c/0x1d0
> [   33.679927]  apic_timer_interrupt+0x12/0x40
> [   33.680448]  
> [   33.680805] RIP: 0010:smp_call_function_many+0x1e0/0x250
> [   33.681505] Code: 5f 97 00 3b 05 d5 70 47 01 0f 83 99 fe ff ff 48 63 c8 48 8b 13 48 03 14 cd 00 b7 c9 ac 8b 4a 18 83 e1 01 74 0a f3 90 8b 4a 18 <83> e1 01 75 f6 eb c8 48 c7 c2 20 b9 f5 ac 48 89 ee 89 df e8 b8 5f
> [   33.684312] RSP: 0018:a2478079bc00 EFLAGS: 0202 ORIG_RAX: ff13
> [   33.685347] RAX: 0001 RBX: 964ebbb35a00 RCX: 0003
> [   33.686198] RDX: 964ebbab9c80 RSI:  RDI: 964ebbb35a08
> [   33.687044] RBP: 964ebbb35a08 R08: 000b R09: aba22300
> [   33.687883] R10: a2478079bc20 R11: f000 R12: aba22200
> [   33.688725] R13:  R14: 0001 R15: 0040
> [   33.689577]  ? optimize_nops+0xe0/0xe0
> [   33.690055]  ? alternatives_text_reserved+0x60/0x60
> [   33.690643]  ? optimize_nops+0xe0/0xe0
> [   33.691092]  ? xnintr_core_clock_handler+0xa9/0x370
> [   33.691657]  ? trace_event_raw_event_irq_event+0xa0/0xa0
> [   33.692489]  on_each_cpu+0x23/0x50
> [   33.692902]  ? xnintr_core_clock_handler+0xa8/0x370
> [   33.693464]  text_poke_bp+0x63/0xe0
> [   33.693875]  __jump_label_transform.isra.0+0x12f/0x140
> [   33.694466]  arch_jump_label_transform+0x26/0x40
> [   33.695093]  __jump_label_update+0x78/0xb0
> [   33.695567]  static_key_slow_inc_cpuslocked+0x83/0x90
> [   33.696147]  static_key_slow_inc+0x11/0x20
> [   33.696622]  tracepoint_probe_register_prio+0x214/0x290
> [   33.697241]  __ftrace_event_enable_disable+0x96/0x260
> [   33.697905]  __ftrace_set_clr_event_nolock+0xe8/0x130
> [   33.698488]  system_enable_write+0xb3/0xf0
> [   33.698537] BUG: Unhandled exception over domain Xenomai at 0xabb5413d - switching to ROOT
> [   33.699032]  __vfs_write+0x31/0x180
> [   33.700443]  ? selinux_file_permission+0x118/0x130
> [   33.700979]  ? security_file_permission+0x27/0xb0
> [   33.701491]  vfs_write+0xa8/0x190
> [   33.701856]  ksys_write+0x55/0xd0
> [   33.702220]  do_syscall_64+0x64/0x160
> [   33.702644]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   33.703210] RIP: 0033:0x7fcc38f5bd04
> [   33.703603] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 2a fb 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
> [   33.705712] RSP: 002b:7ffd5b051008 EFLAGS: 0246 ORIG_RAX: 0001
> [   33.706673] RAX: ffda RBX: 0002 RCX: 7fcc38f5bd04
> [   33.707552] RDX: 0002 RSI: 564c21421700 RDI: 0001
> [   33.708399] RBP: 564c21421700 R08: 000a R09: 
> [   33.709264] R10: 000a R11: 0246 R12: 0002
> [   33.710197] R13: 0001 R14: 7fcc39227720 R15: 0002
> [   33.711080] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.57 #1
> [   33.711974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fc-prebuilt.qemu-project.org 04/01/2014
> [  
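A minimal C sketch of the reproduction step from the report (repeatedly writing 1 and then 0 to the cobalt_core event group's enable file). The helper names write_str() and toggle_event() and any round count you pass are illustrative assumptions, not part of the original report. Running it against the real path needs root and a mounted debugfs; on an unaffected kernel it should simply return 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a short string to path, replacing its contents.
 * Returns 0 on success or a negative errno value. */
static int write_str(const char *path, const char *s)
{
	int fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -errno;

	ssize_t len = (ssize_t)strlen(s);
	ssize_t n = write(fd, s, (size_t)len);
	/* Capture the error before close() can clobber errno. */
	int err = (n == len) ? 0 : (n < 0 ? -errno : -EIO);

	close(fd);
	return err;
}

/* Hypothetical helper: write "1" then "0" to an ftrace enable file,
 * `rounds` times. Returns 0 or the first negative errno encountered. */
static int toggle_event(const char *path, int rounds)
{
	for (int i = 0; i < rounds; i++) {
		int err = write_str(path, "1\n");
		if (err)
			return err;
		err = write_str(path, "0\n");
		if (err)
			return err;
	}
	return 0;
}
```

For example, toggle_event("/sys/kernel/debug/tracing/events/cobalt_core/enable", 10) mirrors the manual 1/0 writes from the report. Each enable goes through tracepoint_probe_register_prio() and static_key_slow_inc(), i.e. the jump-label patching path (text_poke_bp) visible in the trace above.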

Re: [PATCH 1/2] demo/alchemy/altency: Use sigdebug_reason()

2019-07-11 Thread Philippe Gerum via Xenomai
On 7/10/19 11:46 PM, Richard Weinberger via Xenomai wrote:
> Since cobalt adds 0xfccf to si_value we can no longer
> use the raw value.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  demo/alchemy/altency.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/demo/alchemy/altency.c b/demo/alchemy/altency.c
> index 83cc806b5d7f..4b49755edaba 100644
> --- a/demo/alchemy/altency.c
> +++ b/demo/alchemy/altency.c
> @@ -461,7 +461,7 @@ static void sigdebug(int sig, siginfo_t *si, void 
> *context)
>  {
>   const char fmt[] = "%s, aborting.\n"
>   "(enabling CONFIG_XENO_OPT_DEBUG_TRACE_RELAX may help)\n";
> - unsigned int reason = si->si_value.sival_int;
> + unsigned int reason = sigdebug_reason(si);
>   int n __attribute__ ((unused));
>   static char buffer[256];
>  
> 

Ack.

-- 
Philippe.



Question on deleting a locked RT_MUTEX

2019-07-11 Thread Richard Weinberger via Xenomai
Hi!

doc/asciidoc/MIGRATION.adoc states:
- For consistency with the standard glibc implementation, deleting a
  RT_MUTEX object in locked state is no longer a valid operation.

I'm not sure how this affects my Xenomai 2 application which is being
ported to Xenomai 3.

If I have a mutex m1 and a misbehaving RT_TASK rt1 which exits right after
acquiring mutex m1, what is the recommended way to get the lock back into a
well-defined state?

-- 
Thanks,
//richard