Re: [Xenomai-core] latency kernel part crashes on ppc64
Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > Likely not this time (keep it cold anyway, who knows); I strongly
> > suspect a bug in xnarch_switch_to() for all archs but x86.
>
> Now that you are talking about it, I may be the one who owes a beer to
> everyone by first having put a bug in the ia64 context-switch code...
> If I am not wrong, the bug should be observed on ia64 too.
> Unfortunately, I am unable to compile my ia64 kernel right now, so I
> wrote a patch for PowerPC, and would be glad if some PowerPC owner
> could try it.

Nope, same lockup with hybrid scheduling.

> It should work on PPC64 too...
>
> Index: include/asm-powerpc/system.h
> ===================================================================
> --- include/asm-powerpc/system.h	(revision 410)
> +++ include/asm-powerpc/system.h	(working copy)
> @@ -182,13 +182,19 @@
>  {
>      struct task_struct *prev = out_tcb->active_task;
>      struct task_struct *next = in_tcb->user_task;
> +    static unsigned long last_ksp;
> +    static xnarchtcb_t *last_tcb;
>
> +    last_tcb = out_tcb;
>      in_tcb->active_task = next ?: prev;
>
>      if (next && next != prev) /* Switch to new user-space thread? */
>          {
>          struct mm_struct *mm = next->active_mm;
>
> +        if (prev != out_tcb->user_task)
> +            last_ksp = prev->thread.ksp;
> +
>          /* Switch the mm context. */
>
>  #ifdef CONFIG_PPC64
> @@ -245,6 +251,22 @@
>      else
>          /* Kernel-to-kernel context switch. */
>          rthal_switch_context(out_tcb->kspp, in_tcb->kspp);
> +
> +    /* If we are not the epilogue of a regular linux schedule(),... */
> +    if (likely(test_bit(xnarch_current_cpu(), &rthal_cpu_realtime))
> +        /* we are a user-space thread,... */
> +        && out_tcb->user_task
> +        /* the last context switch used the linux switch routine,... */
> +        && last_tcb->active_task != out_tcb->user_task
> +        /* but the last xenomai context was a kernel thread,... */
> +        && last_tcb->user_task != last_tcb->active_task)
> +        {
> +        /* then the linux context switch routine saved the kernel thread
> +           stack pointer in the last user-space context. We hence put the
> +           stack pointers in the right place. */
> +        last_tcb->ksp = last_tcb->active_task->thread.ksp;
> +        last_tcb->active_task->thread.ksp = last_ksp;
> +        }
>  }
>
>  static inline void xnarch_finalize_and_switch (xnarchtcb_t *dead_tcb,

--
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] Re: nucleus:shared irq, draft v.2
On 09/01/06, Gilles Chanteperdrix [EMAIL PROTECTED] wrote:
> Dmitry Adamushko wrote:
> > Hi everybody,
> >
> > I'm presenting it one last time as a draft, so it's the right time to
> > give all the remarks concerning design/implementation issues for all
> > interested parties. Any feedback is welcome.
>
> Maybe I missed some recent changes in the Xenomai HAL or Adeos, in
> which case please forget my remark, but calling xnarch_hook_irq,
> i.e. rthal_irq_request, while nklock is locked, irqs off, may lead to
> a deadlock on multi-processor machines. The situation is (or used to
> be) as follows:
>
> CPU1:
>   holds the nklock;
>   calls rthal_irq_request, which calls ipipe_critical_enter, which:
>     triggers the critical IPI;
>     spins, waiting for other CPUs to enter __ipipe_do_critical_sync.
>
> CPU2:
>   spins on nklock, irqs off;
>   never handles the critical IPI, because irqs are off.

Nope, your remark is actually right to the point. I had some doubts
regarding the use of nklock here but, frankly, I didn't even consider
that possible deadlock. Thanks for the hint.

> I only skimmed over the discussion about ipipe_virtualize_irq_from. I
> do not know if it finally flushes all pending interrupts on all CPUs
> when called with a NULL handler. But if it does, the call to
> xnintr_shirq_spin seems redundant in xnintr_detach.

Well, the problem is that the shirq->handlers list, i.e. the shirq
object itself, may be in use at the moment when xnintr_detach() is
called. That said, the shirq object, as well as all the elements of
shirq->handlers (those that have been removed from the list by
xnintr_detach()), must remain valid as long as an ISR handler for the
given IRQ is running (i.e. xnintr_irq_handler()).

To sum it up, xnintr_shirq_spin() ensures that:

o the shirq object is deleted only when xnintr_irq_handler() for the
given IRQ has completed;

o even if there is no need to delete the shirq object (because there
are other intr objects in the list), xnintr_detach() waits until the
handler has completed, thus keeping intr->link valid.

Ok, maybe my explanation is a bit confusing, but the direct analogy is
the mechanism used in Linux, namely the synchronize_irq() call (take a
look at how it is used in free_irq()). This is a kind of RCU: the
element is removed from the list, but it is completely freed only when
all the read-side clients have completed (in our case
xnintr_irq_handler(); in Linux, handle_IRQ_events()).

--
Best regards,
Dmitry Adamushko
[Xenomai-core] Re: nucleus:shared irq, draft v.2
On 09/01/06, Jan Kiszka [EMAIL PROTECTED] wrote:
> My mail client unfortunately refused to inline your patches, so I have
> to:
>
> --- intr-cvs.c	2005-12-02 19:56:20.000000000 +0100
> +++ intr.c	2006-01-09 21:26:44.000000000 +0100
> [...]
> @@ -204,11 +242,66 @@
>  int xnintr_attach (xnintr_t *intr, void *cookie)
>  {
> +    rthal_irq_handler_t irq_handler;
> +    unsigned irq = intr->irq;
> +    xnshared_irq_t *shirq;
> +    int err = 0;
> +    spl_t s;
> +
>      intr->hits = 0;
>      intr->cookie = cookie;
> -    return xnarch_hook_irq(intr->irq,xnintr_irq_handler,intr->iack,intr);
> +
> +    xnlock_get_irqsave(&nklock,s);
> +
> +    irq_handler = rthal_irq_handler(&rthal_domain,irq);
> +
> +    if (irq_handler)
> +        {
> +        xnintr_t *old;
> +        shirq = rthal_irq_cookie(&rthal_domain,irq);
> +
> +        if (irq_handler != xnintr_irq_handler)
> +            {
> +            err = -EBUSY;
> +            goto unlock_and_exit;
> +            }
> +
> +        old = link2intr(getheadq(&shirq->handlers));
> +
> +        if (!(old->flags & intr->flags & XN_ISR_SHARED))
> +            {
> +            err = -EBUSY;
> +            goto unlock_and_exit;
> +            }
> +
> +        appendq(&shirq->handlers,&intr->link);
> +        }
> +    else
> +        {
> +        shirq = xnmalloc(sizeof(*shirq));
>
> Why not allocating that piece of memory before taking the global lock?
> If you don't need it, you can drop it again afterwards. If you do need
> it but the returned pointer is NULL, you can still check at the same
> place you do now. Just an idea to avoid complex functions inside the
> global lock for new xeno-code.

Ack. I thought about that actually.

> +
> +        if (!shirq)
> +            {
> +            err = -ENOMEM;
> +            goto unlock_and_exit;
> +            }
> +
> +        initq(&shirq->handlers);
> +        appendq(&shirq->handlers,&intr->link);
> +
> +        err = xnarch_hook_irq(irq,xnintr_irq_handler,intr->iack,shirq);
> +
> +        if (err)
> +            xnfree(shirq);
> +        }
> +
> +unlock_and_exit:
> +
> +    xnlock_put_irqrestore(&nklock,s);
> +    return err;
>  }
> +
>  /*!
>   * \fn int xnintr_detach (xnintr_t *intr)
>   * \brief Detach an interrupt object.
> @@ -241,7 +334,32 @@
>  int xnintr_detach (xnintr_t *intr)
>  {
> -    return xnarch_release_irq(intr->irq);
> +    unsigned irq = intr->irq;
> +    xnshared_irq_t *shirq;
> +    int cleanup = 0;
> +    int err = 0;
> +    spl_t s;
> +
> +    xnlock_get_irqsave(&nklock,s);
> +
> +    shirq = rthal_irq_cookie(&rthal_domain,irq);
> +
> +    removeq(&shirq->handlers,&intr->link);
> +
> +    if (!countq(&shirq->handlers))
> +        {
> +        err = xnarch_release_irq(irq);
> +        cleanup = 1;
> +        }
> +
> +    xnlock_put_irqrestore(&nklock,s);
> +
> +    xnintr_shirq_spin(shirq);
> +
> +    if (cleanup)
> +        xnfree(shirq);
> +
> +    return err;
>  }
>
>  /*!
> @@ -350,17 +468,45 @@
>
> I guess here starts the new IRQ handler. BTW, diff -p helps a lot when
> reading patches as a human being and not as a patch tool. ;)
>
>  {
>      xnsched_t *sched = xnpod_current_sched();
> -    xnintr_t *intr = (xnintr_t *)cookie;
> -    int s;
> +    xnshared_irq_t *shirq = (xnshared_irq_t *)cookie;
> +    xnholder_t *holder;
> +    spl_t flags;
> +    int s = 0;
>
>      xnarch_memory_barrier();
>      xnltt_log_event(xeno_ev_ienter,irq);
>
> +    xnlock_get_irqsave(&nklock,flags);
> +
> +    shirq = rthal_irq_cookie(&rthal_domain,irq);
> +    xnintr_shirq_lock(shirq);
> +
> +    xnlock_put_irqrestore(&nklock,flags);
>
> Why _irqsave in IRQ context? Regarding this locking issue in general,
> I first had some RCU mechanism in mind to avoid the reader-side lock
> in the IRQ handler. But as IRQs typically touch some nucleus service
> with its own nklock acquire anyway, optimising this here does not seem
> to be worth the effort now. As long as we have the global lock, it's
> ok and much simpler, I think.

As Gilles has pointed out, nklock is dangerous here, so let's forget
about it so far. Actually, the way the shirq->handlers list is used
remains some kind of RCU, I guess. Look, xnintr_irq_handler() doesn't
hold the lock while iterating through the list. That's why
xnintr_shirq_spin() is there; I tried to explain the idea in my
previous answer to Gilles's letter. Actually, I still have some doubts
regarding this approach, in particular whether everything is ok with
atomicity.

I don't have my computer at hand and it seems I am occupying my
friend's computer at the moment :) I'll post further details later if
need be.

> +
> +    if (!shirq)
> +        {
> +        xnintr_shirq_unlock(shirq);
> +        xnltt_log_event(xeno_ev_iexit,irq);
> +        return;
> +        }
> +
>      ++sched->inesting;
> -    s = intr->isr(intr);
> +
> +    holder = getheadq(&shirq->handlers);
> +
> +    while (holder)
> +        {
> +        xnintr_t *intr = link2intr(holder);
> +        holder = nextq(&shirq->handlers,holder);
> +
> +        s |= intr->isr(intr);
> +        ++intr->hits;
> +        }
> +
> +    xnintr_shirq_unlock(shirq);
> +
>      --sched->inesting;
> -    ++intr->hits;
>
>      if (s & XN_ISR_ENABLE)
>          xnarch_enable_irq(irq);