Hello,

On 07/24/2010 11:11 AM, Alexander Holler wrote:
> Hello,
> 
> I'm currently testing a MCP2515 connected to an AT91-board.
> 
> Besides some small changes to use a gpio-irq, I've come across a
> problem in the driver which leads to an endless IRQ. For the test I've
> used two nodes, a PEAK USB controller and said MCP2515, both configured
> to use a bitrate of 500k. When I now send a simple message from the
> PEAK, the mcp251x driver ends up in an endless IRQ and the MCP2515
> endlessly sends error messages.
> 
> Here is a debug output which should describe the problem:
> 
> ---------------------------------------------
> peak:
> cansend can0 "5A1#11.2233.44556677.01 / 123#DEADBEEF / 5AA# /"
> 
> at91:
> [   93.730000] mcp251x_can_ist entered
> [   93.770000] mcp251x_can_ist intf: 0xa0
> [   93.890000] mcp251x_can_ist eflag: 0xb
>  forever:
> [   93.930000] mcp251x_can_ist intf: 0x80
> [   94.060000] mcp251x_can_ist eflag: 0xb
> [   94.100000] mcp251x_can_ist intf: 0x80
> [   94.220000] mcp251x_can_ist eflag: 0xb
> ...
> 
> peak:
> candump any,0:0,#ffffffff
>  forever:
>   can0  20000004  [8] 00 04 00 00 00 00 00 00   ERRORFRAME
> ---------------------------------------------

Looks like the infamous bus error flooding, which can hang low-end
systems. There seems to be an electrical problem on the bus triggering
bus errors on the MCP251x. Just run "candump any,0:0,#ffffffff" on the
at91 system as well.
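
For reference, the intf and eflag values in the log above can be
decoded with a few lines of C. This is only a sketch: the bit names are
written down as I recall them from the MCP2515 datasheet (CANINTF and
EFLG registers), and decode_bits() is a made-up helper, not part of the
driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit names, bit 0 first, as recalled from the MCP2515 datasheet. */
static const char *const canintf_names[8] = {
    "RX0IF", "RX1IF", "TX0IF", "TX1IF", "TX2IF", "ERRIF", "WAKIF", "MERRF"
};
static const char *const eflg_names[8] = {
    "EWARN", "RXWAR", "TXWAR", "RXEP", "TXEP", "TXBO", "RX0OVR", "RX1OVR"
};

/*
 * Hypothetical helper: write the space-separated names of all set bits
 * of 'value' into 'out' (silently truncated if 'out' is too small).
 */
static char *decode_bits(uint8_t value, const char *const names[8],
                         char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (int bit = 0; bit < 8; bit++) {
        if (!(value & (1u << bit)))
            continue;
        size_t used = strlen(out);
        /* snprintf always terminates the string within outlen - used */
        snprintf(out + used, outlen - used, "%s%s",
                 used ? " " : "", names[bit]);
    }
    return out;
}
```

With these tables, intf 0xa0 reads as ERRIF plus MERRF, the repeating
intf 0x80 is MERRF alone, and eflag 0xb is EWARN, RXWAR and RXEP, i.e.
the receiver is error-passive.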

> Using the at91_can-driver (and can-hw) instead of the mcp251x on the
> AT91 box I can't reproduce this, so it seems not to be a problem of the
> cabling or the sender (peak) side.

Here, too, running "candump any,0:0,#ffffffff" on the at91 will provide
more information.

> Anyway, currently I don't care about the reason for the fault on the
> bus, but I think at least that endless loop is something which should
> be addressed because this renders the AT91 system almost useless (and
> could be triggered remotely).
> 
> My first simple solution for this is currently the following patch:
> 
> --------------------------------
> diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c
> index b11a0cb..e1a3745 100644
> --- a/drivers/net/can/mcp251x.c
> +++ b/drivers/net/can/mcp251x.c
> @@ -835,7 +835,7 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
>                         }
>                 }
> 
> -               if (intf == 0)
> +               if ((intf & ~CANINTF_MERRF) == 0)
>                         break;
> 
>                 if (intf & (CANINTF_TX2IF | CANINTF_TX1IF | CANINTF_TX0IF)) {
> --------------------------------
> 
> With that patch the IRQ will be released even if the MERRF is set in the
> CANINTF-register. This will not fix the problem, but at least the system
> is still usable in such a case.
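
The effect of the changed test can be looked at in isolation. This is
only a user-space sketch, not driver code: the two bit values are the
CANINTF bits as I recall them from the datasheet, and the functions
(including the iteration cap standing in for "forever") are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CANINTF_ERRIF 0x20 /* error interrupt flag (assumed value) */
#define CANINTF_MERRF 0x80 /* message error flag (assumed value) */

/* Exit test before the patch: leave only when no flag is pending. */
static bool done_original(uint8_t intf)
{
    return intf == 0;
}

/* Exit test after the patch: a pending MERRF alone no longer counts. */
static bool done_patched(uint8_t intf)
{
    return (intf & ~CANINTF_MERRF) == 0;
}

/*
 * Model of the handler loop for a flag value that the chip keeps
 * reasserting. Returns the number of passes, bounded by 'cap'.
 */
static int passes_until_exit(uint8_t sticky_intf, bool patched, int cap)
{
    int passes = 0;

    assert(cap > 0);
    while (passes < cap) {
        passes++;
        if (patched ? done_patched(sticky_intf)
                    : done_original(sticky_intf))
            break;
    }
    return passes;
}
```

With intf stuck at 0x80 the original test never leaves the loop (the
model runs into the cap), while the patched test leaves after one pass.
A value of 0xa0 still keeps both variants in the loop, because ERRIF is
pending as well.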

You do not seem to be using the MCP251x driver from the mainline kernel.
There, the MERR interrupt is no longer enabled. Here is the commit
message:

commit bf66f3736a945dd4e92d86427276c6eeab0a6c1d
Author: Christian Pellegrin <[email protected]>
Date:   Wed Feb 3 07:39:54 2010 +0000

    can: mcp251x: Move to threaded interrupts instead of workqueues.
    
    This patch addresses concerns about efficiency of handling incoming
    packets. Handling of interrupts is done in a threaded interrupt handler
    which has a smaller latency than workqueues. This change needed a rework
    of the locking scheme that was much simplified. Some other (more or less
    longstanding) bugs are fixed: utilization of just half of the RX
    buffers, useless wait for interrupt on open, more reliable reset
    sequence. The MERR interrupt is not used anymore: it overloads the CPU
    in error-passive state without any additional information. One shot mode
    is disabled because it's not clear if it can be handled efficiently on
    this CAN controller.
    
    Signed-off-by: Christian Pellegrin <[email protected]>
    Acked-by: Wolfgang Grandegger <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

> 
> I'm no CAN expert, but I think a correct solution would be to restart
> the MCP2515 after a MERRF was received and didn't go away (restart-ms).
> 
> The datasheet for the MCP2515 doesn't say much about MERRF while
> receiving (just a sentence in section 7.4 Message Error Interrupt), and
> the chart for message reception shows why the Error Frames are
> generated endlessly.
> 
> Any pointers/suggestions?

See above. Just disable the MERR interrupt.
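
In terms of the CANINTE enable register that boils down to leaving the
MERRE bit cleared. A rough sketch, with the bit values as I recall them
from the MCP2515 datasheet; the helper names and the exact set of other
enabled interrupts are assumptions for illustration, so check what your
driver version actually writes.

```c
#include <assert.h>
#include <stdint.h>

/* CANINTE bits (assumed values, see lead-in). */
#define CANINTE_RX0IE 0x01
#define CANINTE_RX1IE 0x02
#define CANINTE_TX0IE 0x04
#define CANINTE_TX1IE 0x08
#define CANINTE_TX2IE 0x10
#define CANINTE_ERRIE 0x20
#define CANINTE_WAKIE 0x40
#define CANINTE_MERRE 0x80

/* Hypothetical helper: drop MERRE from a requested enable mask. */
static uint8_t caninte_without_merr(uint8_t requested)
{
    uint8_t mask = (uint8_t)(requested & ~CANINTE_MERRE);

    assert((mask & CANINTE_MERRE) == 0);
    return mask;
}

/* Example mask: RX, TX and error interrupts, but no message error. */
static uint8_t example_enable_mask(void)
{
    return caninte_without_merr(CANINTE_MERRE | CANINTE_ERRIE |
                                CANINTE_TX2IE | CANINTE_TX1IE |
                                CANINTE_TX0IE | CANINTE_RX1IE |
                                CANINTE_RX0IE);
}
```

The resulting value is what would be written to CANINTE during setup;
ERRIE stays enabled, so state changes reported via EFLG are still seen.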

Wolfgang.
