Re: [Xenomai-core] Handling PCI MSI interrupts

2006-02-17 Thread Jeroen Van den Keybus
  I tried this patch and it doesn't solve the issue I'm facing. With and without this patch, my symptoms are the same.

I tested (and intended) the patch for MSI (w/o maskbits), not MSI-X.
What e1000 chip are you using exactly? Easiest way to tell is by using
'/sbin/lspci'. I may be able to help you out with MSI-X as well, but in
that case, I have no hardware platform to test on.

You can check whether or not MSI is actually being used by doing
'/sbin/lspci -v' and look for the Capability: Message Signalled
Interrupt. When the driver is running in MSI mode, it should read
'Enable+' instead of 'Enable-'.

Finally, verify how interrupts are dispatched. Have a look at /proc/interrupts for this ('cat /proc/interrupts').

  I'm running a Dell 2850, a dual-CPU machine.
As it's a Dell, I assume there are two Intel Pentium CPUs inside. Are you running with SMP enabled?

  When I build a kernel without Adeos, things are fine. When I build with Adeos and MSI enabled, the following occurs:
  1) If the BIOS has USB disabled, the system will hang without even a num-lock response (i.e. tapping the num-lock key doesn't toggle the light). The hang occurs just about the time the e1000 driver would load and enable an MSI interrupt.
  2) If the BIOS has USB enabled, the system will run much longer but may hang during heavy interrupt load on the e1000 driver.
Are you using the e1000 driver in NAPI mode? It is recommended to do
this, especially on the preemptible kernel, as it may significantly
reduce the interrupt volume. In that case, I think it is doubtful if
using MSI would give you any benefit at all over normal, shared IRQs. 

  My assumption, based on past experience, is that no num-lock response means an infinite interrupt loop.

The local (internal) CPU APIC hasn't been informed that the interrupt
has been dealt with and it will therefore allow no other interrupts
anymore to arrive in the CPU (including your keyboard's). In fact, your
CPU is idle.

[The original 8259 was designed to detect the IRET instruction bit
pattern on the databus and use that as an acknowledge signal. Upon
arrival of the second 8259 in the PC/AT, this could no longer be done.
I don't know if the APIC could do it today (it seems possible,
theoretically). ]
  When I build a kernel with Adeos but disable MSI, the system works fine
  for the most part. There is one scenario where the system will still hang while doing disk and network accesses under a moderate I/O load.

Hm. That may indicate another issue.

  Both of these tests are just to get a stable kernel before I really start using Adeos. So Adeos is in its default configuration, and I haven't loaded
  the Xenomai modules when these hangs occur.
  I'm currently running the 2.6.14.4 kernel with the 2.6.14-1.0-12 Adeos patch, and I then included your msi.c patch from the previous e-mail. If you have any further hints or suggestions, I'll try them. Meanwhile, I'm trying
  different versions of various drivers (e1000 and SCSI) as well as updating the patch level of the kernel itself.
Try upgrading the kernel. The kernel usually comes with updated drivers
as well. Currently I'm running 2.6.16-rc2, which I had to patch
manually for Adeos (about 3 'hunks' from the 2.6.15-i386-1.2-00 patch
didn't apply properly). By using 2.6.16-rc2, I got much better Intel
(especially i865 graphics) chipset support than 2.6.15. Note, however,
that I did the bug fixing in this thread on a plain 2.6.15, though (and
the msi.c code is nearly identical). 

I would recommend upgrading to 2.6.15 with the latest Adeos patch and try to get a stable system before enabling MSI.
Jeroen.



RE: [Xenomai-core] Handling PCI MSI interrupts

2006-02-17 Thread Russell Johnson

  I tested (and intended) the patch for MSI (w/o maskbits), not MSI-X.
  What e1000 chip are you using exactly? Easiest way to tell is by using 
  '/sbin/lspci'. I may be able to help you out with MSI-X as well, but in 
  that case, I have no hardware platform to test on.

  You can check whether or not MSI is actually being used by doing 
  '/sbin/lspci -v' and look for the Capability: Message Signalled 
  Interrupt. When the driver is running in MSI mode, it should read 
  'Enable+' instead of 'Enable-'.

This e1000 chip actually doesn't have MSI support.  I had assumed that since
the e1000 driver caused the hanging and disabling MSI in the kernel caused
the hang to go away that the problem was MSI in the e1000.  The e1000 driver
only enables MSI on newer chips than what are in the Dell 28xx machines.

  As it's a Dell, I assume there are two Intel Pentium CPUs
  inside. Are you running with SMP enabled?

SMP is enabled.

  The local (internal) CPU APIC hasn't been informed that the interrupt 
  has been dealt with and it will therefore allow no other interrupts 
  anymore to arrive in the CPU (including your keyboard's). 
  In fact, your CPU is idle.

I have used a PCI analyzer to see infinite loops on this machine for past
similar kernel issues and assumed it would be the same due to the symptoms.

  When I build a kernel with Adeos but disable MSI then the 
  system works fine for the most part.  There is one scenario 
  where the system will still hang
  doing disk and network accesses under a moderate load of I/O. 
  
  Hm. That may indicate another issue.
 
 Indeed. This behaviour has not been reported yet with patches 
 from the Adeos I-pipe series. Does it also happen with SMP 
 disabled, or Hyperthreading disabled?

It did happen with SMP disabled and I have always left hyperthreading
disabled because it is my understanding that hyperthreading is not supported
by the adeos patch.

  Try upgrading the kernel. The kernel usually comes with updated drivers 
  as well. Currently I'm running 2.6.16-rc2, which I had to patch manually
  for Adeos (about 3 'hunks' from the 2.6.15-i386-1.2-00 patch didn't
  apply properly). By using 2.6.16-rc2, I got much better Intel 
  (especially i865 graphics) chipset support than 2.6.15. Note, however, 
  that I did the bug fixing in this thread on a plain 2.6.15, though (and 
  the msi.c code is nearly identical).
  
  I would recommend upgrading to 2.6.15 with the latest Adeos patch and 
  try to get a stable system before enabling MSI.

In short, MSI doesn't seem to have been my issue.  I now have a more stable
kernel.  Apparently this system had some other faults with the specific
configuration options I was using.  I had to patch to the 2.6.14.7 level
(was at .4) and change some of the options in my .config.  Specifically, I
had to leave ACPI enabled (I had disabled it as a test a while back).  With
ACPI disabled, the machine would still hang if the USB was disabled in the
BIOS.

After learning how to check for MSI, no devices in my system seem to
actually be using MSI.  The code patches you provided were never actually
executed.  Time will tell if my system is stable.

Thanks for your help!
Russ





Re: [Xenomai-core] Handling PCI MSI interrupts

2006-02-17 Thread Philippe Gerum

Russell Johnson wrote:

I tested (and intended) the patch for MSI (w/o maskbits), not MSI-X.
What e1000 chip are you using exactly? Easiest way to tell is by using 
'/sbin/lspci'. I may be able to help you out with MSI-X as well, but in 
that case, I have no hardware platform to test on.



You can check whether or not MSI is actually being used by doing 
'/sbin/lspci -v' and look for the Capability: Message Signalled 
Interrupt. When the driver is running in MSI mode, it should read 
'Enable+' instead of 'Enable-'.



This e1000 chip actually doesn't have MSI support.  I had assumed that since
the e1000 driver caused the hanging and disabling MSI in the kernel caused
the hang to go away that the problem was MSI in the e1000.  The e1000 driver
only enables MSI on newer chips than what are in the Dell 28xx machines.



Same problem here actually; the e1000 driver attempts to enable MSI routing for 
recent adapters (i82547 rev. #2, if I read this code correctly) due to bugs in 
older revisions. Unfortunately, the dual Xeon I've been using to check for 
CONFIG_PCI_MSI has an older adapter, so the routing is still done by the IO-APIC, 
and the bug does not trigger.




As it's a Dell, I assume there are two Intel Pentium CPUs
inside. Are you running with SMP enabled?



SMP is enabled.


The local (internal) CPU APIC hasn't been informed that the interrupt 
has been dealt with and it will therefore allow no other interrupts 
anymore to arrive in the CPU (including your keyboard's). 
In fact, your CPU is idle.



I have used a PCI analyzer to see infinite loops on this machine for past
similar kernel issues and assumed it would be the same due to the symptoms.


   When I build a kernel with Adeos but disable MSI then the 
   system works fine for the most part.  There is one scenario 
   where the system will still hang
   doing disk and network accesses under a moderate load of I/O. 


Hm. That may indicate another issue.


Indeed. This behaviour has not been reported yet with patches 
from the Adeos I-pipe series. Does it also happen with SMP 
disabled, or Hyperthreading disabled?



It did happen with SMP disabled and I have always left hyperthreading
disabled because it is my understanding that hyperthreading is not supported
by the adeos patch.


Adeos should not have any problem with HT; it actually has no impact on the 
interrupt subsystem Adeos deals with: we just happen to see multiple CPUs, which is 
a common case handled by the SMP support.





Try upgrading the kernel. The kernel usually comes with updated drivers 
as well. Currently I'm running 2.6.16-rc2, which I had to patch manually
for Adeos (about 3 'hunks' from the 2.6.15-i386-1.2-00 patch didn't 
apply properly). By using 2.6.16-rc2, I got much better Intel 
(especially i865 graphics) chipset support than 2.6.15. Note, however, 
that I did the bug fixing in this thread on a plain 2.6.15, though (and 
the msi.c code is nearly identical).


I would recommend upgrading to 2.6.15 with the latest Adeos patch and 
try to get a stable system before enabling MSI.



In short, MSI doesn't seem to have been my issue.  I now have a more stable
kernel.  Apparently this system had some other faults with the specific
configuration options I was using.  I had to patch to the 2.6.14.7 level
(was at .4) and change some of the options in my .config.  Specifically, I
had to leave ACPI enabled (I had disabled it as a test a while back).  With
ACPI disabled, the machine would still hang if the USB was disabled in the
BIOS.


You might want to try booting with acpi=ht, so that the ACPI kitchen sink is 
warmed up far enough to enumerate LAPICs but not more.




After learning how to check for MSI, no devices in my system seem to
actually be using MSI.  The code patches you provided were never actually
executed.  Time will tell if my system is stable.

Thanks for your help!


You are welcome.


Russ






--

Philippe.



Re: [Xenomai-core] Handling PCI MSI interrupts

2006-02-17 Thread Jeroen Van den Keybus
  Could you post the patch you are successfully using to boot your box? TIA,


--- linux-2.6.15/drivers/pci/msi.c	2006-01-03 04:21:10.0 +0100
+++ linux-2.6.15-ipipe/drivers/pci/msi.c	2006-02-17 16:48:21.0 +0100
@@ -185,10 +185,20 @@
 	spin_unlock_irqrestore(&msi_lock, flags);
 }
 
+#if defined(CONFIG_IPIPE)
+/* Attention: only MSI without maskbits is currently fixed for I-PIPE */
+static void ack_msi_irq_wo_maskbit(unsigned int vector)
+{
+	__ack_APIC_irq();
+}
+#endif /* CONFIG_IPIPE */
+
 static void end_msi_irq_wo_maskbit(unsigned int vector)
 {
 	move_native_irq(vector);
+#if !defined(CONFIG_IPIPE)
 	ack_APIC_irq();
+#endif /* !CONFIG_IPIPE */
 }
 
 static void end_msi_irq_w_maskbit(unsigned int vector)
@@ -244,7 +254,11 @@
 	.shutdown	= shutdown_msi_irq,
 	.enable		= do_nothing,
 	.disable	= do_nothing,
+#if defined(CONFIG_IPIPE)
+	.ack		= ack_msi_irq_wo_maskbit,
+#else /* CONFIG_IPIPE */
 	.ack		= do_nothing,
+#endif /* !CONFIG_IPIPE */
 	.end		= end_msi_irq_wo_maskbit,
 	.set_affinity	= set_msi_irq_affinity
 };


Jeroen.


Re: [Xenomai-core] Handling PCI MSI interrupts

2006-02-17 Thread Philippe Gerum

Jeroen Van den Keybus wrote:

Could you post the patch you are successfully using to boot your
box? TIA,



--- linux-2.6.15/drivers/pci/msi.c	2006-01-03 04:21:10.0 +0100
+++ linux-2.6.15-ipipe/drivers/pci/msi.c	2006-02-17 16:48:21.0 +0100
@@ -185,10 +185,20 @@
 	spin_unlock_irqrestore(&msi_lock, flags);
 }
 
+#if defined(CONFIG_IPIPE)
+/* Attention: only MSI without maskbits is currently fixed for I-PIPE */
+static void ack_msi_irq_wo_maskbit(unsigned int vector)
+{
+	__ack_APIC_irq();
+}
+#endif /* CONFIG_IPIPE */
+
 static void end_msi_irq_wo_maskbit(unsigned int vector)
 {
 	move_native_irq(vector);
+#if !defined(CONFIG_IPIPE)
 	ack_APIC_irq();
+#endif /* !CONFIG_IPIPE */

ack_APIC_irq() is nullified when CONFIG_IPIPE is enabled, and __ack_APIC_irq()
stands for the actual APIC acknowledging code. So the change above is not needed.

 }
 
 static void end_msi_irq_w_maskbit(unsigned int vector)
@@ -244,7 +254,11 @@
 	.shutdown	= shutdown_msi_irq,
 	.enable		= do_nothing,
 	.disable	= do_nothing,
+#if defined(CONFIG_IPIPE)
+	.ack		= ack_msi_irq_wo_maskbit,
+#else /* CONFIG_IPIPE */
 	.ack		= do_nothing,
+#endif /* !CONFIG_IPIPE */
 	.end		= end_msi_irq_wo_maskbit,
 	.set_affinity	= set_msi_irq_affinity
 };



Ok; unless my brain is completely toast, the last patch I recently posted does the
same, but extends the support to the MSI and MSI-X with masking bit cases. Could
you test it on your box with a vanilla 2.6.15 when time allows? If it works, then
I will roll out a new Adeos/x86 patch including this fix. TIA,


--- 2.6.15/drivers/pci/msi.c	2006-01-03 04:21:10.0 +0100
+++ 2.6.15-ipipe/drivers/pci/msi.c	2006-02-16 10:30:27.0 +0100
@@ -149,6 +149,21 @@
 	msi_set_mask_bit(vector, 0);
 }
 
+#ifdef CONFIG_IPIPE
+static void ack_MSI_irq_w_maskbits(unsigned int vector)
+{
+	mask_MSI_irq(vector);
+	__ack_APIC_irq();
+}
+static void ack_MSI_irq_wo_maskbits(unsigned int vector)
+{
+	__ack_APIC_irq();
+}
+#else /* !CONFIG_IPIPE */
+#define ack_MSI_irq_wo_maskbits	do_nothing
+#define ack_MSI_irq_w_maskbits	mask_MSI_irq
+#endif /* CONFIG_IPIPE */
+
 static unsigned int startup_msi_irq_wo_maskbit(unsigned int vector)
 {
 	struct msi_desc *entry;
@@ -212,7 +227,7 @@
 	.shutdown	= shutdown_msi_irq,
 	.enable		= unmask_MSI_irq,
 	.disable	= mask_MSI_irq,
-	.ack		= mask_MSI_irq,
+	.ack		= ack_MSI_irq_w_maskbits,
 	.end		= end_msi_irq_w_maskbit,
 	.set_affinity	= set_msi_irq_affinity
 };
@@ -228,7 +243,7 @@
 	.shutdown	= shutdown_msi_irq,
 	.enable		= unmask_MSI_irq,
 	.disable	= mask_MSI_irq,
-	.ack		= mask_MSI_irq,
+	.ack		= ack_MSI_irq_w_maskbits,
 	.end		= end_msi_irq_w_maskbit,
 	.set_affinity	= set_msi_irq_affinity
 };
@@ -244,7 +259,7 @@
 	.shutdown	= shutdown_msi_irq,
 	.enable		= do_nothing,
 	.disable	= do_nothing,
-	.ack		= do_nothing,
+	.ack		= ack_MSI_irq_wo_maskbits,
 	.end		= end_msi_irq_wo_maskbit,
 	.set_affinity	= set_msi_irq_affinity
 };
--

Philippe.



Re: [Xenomai-core] Handling PCI MSI interrupts

2006-02-17 Thread Jeroen Van den Keybus
  Ok; unless my brain is completely toast, the last patch I recently posted does the same, but extends the support to the MSI and MSI-X with masking bit cases.

Correct. 

  Could you test it on your box with a vanilla 2.6.15 when time allows? If it works, then
  I will roll out a new Adeos/x86 patch including this fix. TIA,

I'll do that. Give me half an hour.


Jeroen.




Re: [Xenomai-core] Handling PCI MSI interrupts

2006-02-17 Thread Philippe Gerum

Jeroen Van den Keybus wrote:
Ok, done. The patch works. Took me longer than expected, as I had to 
find out that 8 spaces don't make a TAB for 'patch'...




Perfect. Thanks.


Jeroen.




--

Philippe.