Re: [uClinux-dev] Re: MCF5282 network stall problem

Greg Ungerer Wed, 05 Mar 2008 22:56:04 -0800

Hi Juan,

Juan Alberto Muñoz Susin wrote:

I test the IRQF_DISABLED setting OK.


Thanks.

I agree about the TX and RX interrupt overlap. I'm sending the full patch
that I use to solve the problem avoiding setting IRQF_DISABLED. It has been
generated from 20071001 patch. It has common fixes whith Claude´s patch and
some additional code modifications, like four different interrupt routines,
that I think may be usefull to generate a patch in next system versions. The
main code changes are:


I have tried this (converting it to also work on the 520x I am using
for testing at the moment). It works (as expected), but I am seeing
something odd.

I get a lot of interrup events for each packet. So for example,
after booting and getting a dhcp address I see:


/> ifconfig
eth0      Link encap:Ethernet  HWaddr 00:14:15:00:00:17
          inet addr:10.46.12.213  Bcast:10.46.12.255  Mask:255.255.255.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

/> cat /proc/interrupts
           CPU0
 68:        958      M68K-INTC  timer
 90:       1949      M68K-INTC  UART
 91:          0      M68K-INTC  UART
 92:          0      M68K-INTC  UART
100:         70      M68K-INTC  fec(TXF)
101:          0      M68K-INTC  fec(TXB)
102:          0      M68K-INTC  fec(TXFIFO)
103:          0      M68K-INTC  fec(TXCR)
104:         42      M68K-INTC  fec(RXF)
105:          0      M68K-INTC  fec(RXB)
106:        126      M68K-INTC  fec(MII)
107:          0      M68K-INTC  fec(LC)
108:          0      M68K-INTC  fec(HBERR)
109:          0      M68K-INTC  fec(GRA)
110:          0      M68K-INTC  fec(EBERR)
111:          0      M68K-INTC  fec(BABT)
112:          0      M68K-INTC  fec(BABR)
/>


So we have sent 5 packets, but have processed 70 TX events.

That wasn't the case with out these driver changes. It reports
5 TX events in the above case.

Do you see this in your setup?

Regards
Greg

1.- It has four interrupt routines to handle TXF, RXF, MII and "spurious"
interrupts of FEC:

static irqreturn_t fec_enet_interrupt_NULL(int irq, void * dev_id);
static irqreturn_t fec_enet_interrupt_TXF(int irq, void * dev_id);
static irqreturn_t fec_enet_interrupt_RXF(int irq, void * dev_id);
static irqreturn_t fec_enet_interrupt_MII(int irq, void * dev_id);

...

        for (idp = id; idp->name; idp++) {
                if ((idp->irq != 23) && (idp->irq != 27) && (idp->irq !=
29)) {
                        if (request_irq(b+idp->irq, fec_enet_interrupt_NULL,
0, idp->name, dev) != 0)
                                printk("FEC: Could not allocate %s
IRQ(%d)!\n", idp->name, b+idp->irq);
                }

                if (idp->irq == 23) {
                        if (request_irq(b+idp->irq, fec_enet_interrupt_TXF,
0, idp->name, dev) != 0)
                                printk("FEC: Could not allocate %s
IRQ(%d)!\n", idp->name, b+idp->irq);
                }

                if (idp->irq == 27) {
                        if (request_irq(b+idp->irq, fec_enet_interrupt_RXF,
0, idp->name, dev) != 0)
                                printk("FEC: Could not allocate %s
IRQ(%d)!\n", idp->name, b+idp->irq);
                }

                if (idp->irq == 29) {
                        if (request_irq(b+idp->irq, fec_enet_interrupt_MII,
0, idp->name, dev) != 0)
                                printk("FEC: Could not allocate %s
IRQ(%d)!\n", idp->name, b+idp->irq);
                }

        }

2.- The RX interrupt is masked in RXF interrupt server routine. This is not
needed setting IRQF_DISABLED or acknowledging the RX event after processing
the ring, like Claude´s patch does.
static irqreturn_t
fec_enet_interrupt_RXF(int irq, void * dev_id)
{
        struct  net_device *dev = dev_id;
        volatile fec_t  *fecp;
        uint    int_events;
        int handled = 0;

        fecp = (volatile fec_t*)dev->base_addr;
        int_events = fecp->fec_ievent;

        fecp->fec_imask = (FEC_ENET_TXF | FEC_ENET_MII);
        fecp->fec_ievent = (int_events & (FEC_NOTUSED | FEC_ENET_RXF));

        if(int_events & FEC_ENET_RXF) {
                handled = 1;
                fec_enet_rx(dev);
        }

        fecp->fec_imask = (FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII);
return IRQ_RETVAL(handled);
 }

3.- It mask TXB and RXB events, like Claude´s patch:

        fecp->fec_imask = (FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII);

4.- I change the "ilip" variable initial value from 0x28 (interrupt level 5,
priority 0) to 0x27 (interrupt level 4, priority 7) to avoid interrupt level
of TXF event was greater than interrupt level of RXF event. I think this
must be included to avoid problems if IRQF_DISABLED behaviour changes in
next versions.

                for (i = 23, ilip = 0x27; (i < 36); i++)
                        icrp[i] = ilip--;

Regards

        Alberto

-----Mensaje original-----
De: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
En nombre de Greg Ungerer
Enviado el: miércoles, 27 de febrero de 2008 6:45
Para: uClinux development list
Asunto: Re: [uClinux-dev] Re: MCF5282 network stall problem

Hi Juan,

Juan Alberto Muñoz Susin wrote:
I have similar problems with uClinux-dist-20070130 with patch20070130-200701001.
I think the origin of the problem is not a FEC RX ring overflow due topackets that FEC insert in the ring. I see that a receive FECinterrupt is beggining its execution before the execution of theprevious one is finished. It makes that the driver pointer to thereceiver ring takes an incorrect value and lost synchronization of FECand driver. Sequence
example:

        Interrupt N                             Interrupt N+1
        ===========                             =============
        bdp = fep->cur_rx
                                                bdp = fep->cur_rx

                                                bdp += 1
                                                bdp += 1
                                                save bdp (initial bdp+2)
        bdp += 1
        save bdp (initial bdp+1)
The MCF5282 hardware interrupt controller avoids these problemssetting automaticaly the interrupt level mask of the microprocessorstatus register to the level of the interrupt that has been served.But something in the interrupt scheduller of uClinux 2.6.x set theinterrupt level mask to 0 when the FEC interrupt service routine begins
its execution.
The problem can be avoided adding the flag IRQF_DISABLED to request_irq:

        if (request_irq(b+idp->irq, fec_enet_interrupt, IRQF_DISABLED,
idp->name, dev) != 0)
So, the interrupt scheduller of uClinux 2.6.x set the interrupt levelmask of the microprocessor status register to 7 when the FEC interruptservice routine begins its execution. It avoids that a FEC RXinterrupt can begin to execute before the finish of the execution of the
previous one.

I think this is in fact the real culprit of all the recently reported
problems with the FEC driver.

The problem is a little larger than just the re-entrance of RX interrupts.
If you look at the interrupt handling for the current FEC driver it vectors
all interrupt types through the common
fec_enet_interrupt() code. And it handles all types, so a new TX interrupt
could interrupt an old RX interrupt that was actually in the middle of
processing TX interrupts!
On the other hand, setting the IRQF_DISABLED flag avoids the executionof all maskable system interrupts during the FEC interrupt routines.Due to this fact, I think that it isn't the best way to solve theproblem, but it´s a way to verify the origin of the problem. I thinkthe fix proposed by Claude, acknowledging the RX event afterprocessing the ring, is better than setting IRQF_DISABLED flag.
Removing the RX buffer interrupt is probably ok. But the bigger problem is
the common interrupt handler, and that needs the interrupts to be
IRQF_DISABLED to fix.

Another solution may be to setup the IRQ's to only handle their type (so RX
interrupt types call direct to fec_enet_rx() and TX interrupts call direct
to fec_enet_tx()). This would then not need IRQF_DISABLED either.
In my opinion, the best way to solve this problem and to avoid theposibility of similar problems in other drivers, will be keep theinterrupt level mask of the microprocessor status register to the reallevel of interrupt that has been proccessing. But I don´t know if it is
possible.

That is really what IRQF_DISABLED is meant to do. So it is the right way to
flag that this interrupt must be run with interrupts still disabled.

As a side note, I suspect this may have come to light after the changes I
made to the interrupt sub-system a couple of revisions ago. It made the
ColdFire interrupt handling consistent with other Linux architectures - and
it fixed the problem that interrupts where always disabled for the entire
duration of the handler.

So for now I plan on applying the IRQF_DISABLED flag to the
request_irq() calls as the primary fix for this problem.
Can all involved please test and report goodness or otherwise?

Claude: if you want to look at cleaning up the use of FEC_ENET_TXB/RXB,
         usage I would apply that as a separate patch as well.

Regards
Greg
-----Mensaje original-----
De: [EMAIL PROTECTED][mailto:[EMAIL PROTECTED]
En nombre de Phil Wilshire
Enviado el: domingo, 17 de febrero de 2008 20:20
Para: uClinux development list
Asunto: Re: [uClinux-dev] Re: MCF5282 network stall problem

Hi Claude

Thank you very much for the details.
They are much appreciated.
I'll be taking a look at the 5272 code I have and see if we cantimprove that too.
Oh that everyone would explain their patches so well.

Best Regards
   Phil WIlshire

Claude wrote:
Hi again,

Marco Cavallini wrote:
 >Hi Claude,
 >I had similar problems on a MCF5282, which seems being solved with
modifications to cache setting code, I'd like to try your patch as
soon  >as I have some spare time (I hope this week).
>Feel free to contact me privately to discuss your patch and otherthings >I've done to work this board properly.
I had already disabled the CACHE completely but FEC problem persisted.
In order to let people having older driver versions I'd like to makea summary of the changes here:
First, since FEC driver only processes buffers holding entirepackets, it is not necessary to enable interrupts for RXB and TXBevents (rx buffer and tx buffer), but only for RXF and TXF events (rxand tx frames). This may produce unnecessary interrupts to CPU.Interrupts are enabled infec_enet_init() and fec_restart() functions. If you take a look tothe patch, only necessary int's are enabled:
  fecp->fec_imask = (FEC_ENET_TXF | /*FEC_ENET_TXB |*/
            FEC_ENET_RXF /*| FEC_ENET_RXB */ | FEC_ENET_MII);
Commented lines have been left for convenience. They should beremoved, however, to clean up code.
On the other hand, other changes affected the interrupt service routine.In my opinion, it's fine that, in order to process all packetswaiting in the queue, the ring should be entirely processed when aninterrupt is raised when receiving a packet (see the loop infec_enet_rx(), which is called from the ISR). But I don't agree withmaking a loop in the ISR to handle all different events (tx, rx andmii events) that are waiting, since they should be acknowledge in a
different manner.
The old FEC driver does the following, in the ISR:

while ("pending interrupts"){
    acknowledge all interrupts
    if RX event, then process RX event
    if TX event, then process TX event
    if MII event, then process MII event }

The fixed code (the patch's one) does this, instead:

if ("pending interrupts") {
if RX event, then process RX event and then acknowledge RX event(clearing the corresponding flag)if TX event, then process TX event and then acknowledge TX event(clearing the corresponding flag)if MII event, then process MII event and then acknowledge MIIevent (clearing the corresponding flag) }
I saw that the first solution produced, after some time, a lost ofsynchronization when processing the ring. It seems that acknowledgingthe RX event before processing the ring makes the FEC to allowentering more packets till produce a ring overflow.
Finally, the "pending interrupts" condition above, was implemented asfollows (in the old FEC driver):
    ((int_events = fecp->ievent) != 0)

The fixed code does:
    ((int_events = (fecp->ievent & (FEC_ENET_RXF | FEC_ENET_TXF |
FEC_ENET_MII))) != 0)
This avoids the ISR (after receiving a RX,TX or MII event) to keeprunning if other flag in the fecp->ievent register is raised for anyreason (error flag, for example). However, the replacement of "while"
for "if" implicitly avoids it.


Well, I hope this solution could help someone. At least it has helped
me!!
Claude










_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev
_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev

_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev
--
------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     [EMAIL PROTECTED]
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com
_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev


------------------------------------------------------------------------

_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev


--
------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     [EMAIL PROTECTED]
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com
_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev

Re: [uClinux-dev] Re: MCF5282 network stall problem

Reply via email to