Re: [RFC PATCH] can: improved CAN error state handling

Marc Kleine-Budde Sun, 25 Jul 2010 06:35:56 -0700

Wolfgang Grandegger wrote:
> I found some time to work on this issue.
> 
> So far we only report CAN error state changes to the worse:
> "error-active -> error-warning -> error-passive -> bus-off", but not
> "bus-off -> error-active" or "error-passive -> error-warning ->
> error-active". There have been request from other users to correct
> this behavior.
> 
> This patch reports any state change reported by the SJA1000 and
> MSCAN-MPC5200 controller. It introduces "CAN_ERR_CRTL_ACTIVE" to
> signal state changes to error active. Furthermore, the SJA1000
> triggers now the bus-off recovery procedure when restarted. So far,
> the controller was *stopped* and then restarted. I did two tests to
> analyze the state changes behavior:
> 
> 1. First I forced the controller into the error-passive by sending
>    a message (with cansend) without cable connected (*). Then I
>    reconnected the cable (**) and continued to send messages (with
>    cangen) (***) until the error active state is reached:
> 
>    On SJA1000:
> 
>    # candump any,0:0,#FFFFFFFF
>    (*)
>    can0  20000004  [8] 00 08 00 00 00 00 60 00   ERROR-WARNING
>    can0  20000004  [8] 00 20 00 00 00 00 80 00   ERROR-PASSIVE
>    (**)
>    can0  123  [4] DE AD BE EF
>    (***)
>    can0  42A  [1] 00
>    ...
>    can0  42A  [1] 07
>    can0  20000004  [8] 00 08 00 00 00 00 7F 00   ERROR-WARNING


Is it possible to tell apart whether the driver goes from passive to
warning or from active to warning? Without the need to have a state
machine in userspace, too?

Looking at the tx/rx error level might be an option, but I've seen that
the level can rise very quickly.

>    can0  42A  [1] 08
>    ...
>    can0  42A  [1] 27
>    can0  20000004  [8] 00 40 00 00 00 00 5F 00   ERROR-ACTIVE
>                                           \-- tx-err !!!
>    On MSCAN:
> 
>    # candump any,0:0,#FFFFFFFF
>    (*)
>    can2  20000004  [8] 00 08 00 00 00 00 00 00   ERROR-WARNING
>    can2  20000004  [8] 00 20 00 00 00 00 00 00   ERROR-PASSIVE
>    (**)
>    can2  123  [4] DE AD BE EF
>    can2  20000004  [8] 00 08 00 00 00 00 00 00   ERROR-WARNING
>    (***)
>    can2  42A  [1] 00
>    can2  42A  [1] 01
>    ...
>    can2  42A  [1] 1E
>    can2  42A  [1] 1F
>    can2  20000004  [8] 00 40 00 00 00 00 00 00   ERROR-ACTIVE
> 
>    As you can see, the state change to error-active is signaled
>    differently. The MSCAN seems not to respect the specs :-(.

Do you mean the fact that error active is signalled with tx err == 0?

From the patch below you only changed the part of the driver that runs
if a change of state is detected "(old_state != priv->can.state)".

Maybe it's a problem in the driver, that calculated the new state wrong.
Maybe due to a problem with the hardware. (I don't know the driver nor
the hardware).

Looking at you patches below I want to rise my old proposal again. I
think we should move the state machine that assembles and sends the
error frames into a common function. Each driver just needs to have a
proper function that calculates the current state and then call the
state handling state machine.

>    When time permits, I will also try the MSCAN of the MPC5121.

> 2. I forced the controller into bus-off by short-circuiting
>    the low and hi lines and sending a message (*), Then I
>    restarted the controller manually with
>    "ip link set can0 type can restart" (**):
> 
>    On SJA1000:
> 
>    # ./candump -t d any,0:0,#FFFFFFFF
>    (*)
>    can0  20000004  [8] 00 08 00 00 00 00 88 00   ERROR-WARNING
>    can0  20000004  [8] 00 20 00 00 00 00 88 00   ERROR-PASSIVE
>    can0  20000044  [8] 00 00 00 00 00 00 7F 00   BUS-OFF
>    (**)
>    can0  20000100  [8] 00 00 00 00 00 00 00 00   RESTARTED
>    can0  20000004  [8] 00 40 00 00 00 00 00 00   ERROR-ACTIVE
>                                           \-- tx-err !!!
> 
>    On MSCAN:
> 
>    # ./candump -t d any,0:0,#FFFFFFFF
>    (*)
>    can2  20000004  [8] 00 08 00 00 00 00 00 00   ERROR-WARNING
>    can2  20000040  [8] 00 00 00 00 00 00 00 00   BUS-OFF
>    (**)
>    can2  20000100  [8] 00 00 00 00 00 00 00 00   RESTARTED
>    can2  20000004  [8] 00 40 00 00 00 00 00 00   ERROR-ACTIVE
> 
>    The MSCAN seems not to signal the ERROR-PASSIVE state before
>    going bus-off.

I think the signalling of ERROR-PASSIVE depends on multiple things:
- the current bitrate.
  I think this doesn't count so much in this short circuit test.
- the CAN hardware itself: does it signal this "in-the-middle-passive"
  interrupt
- overall speed and load of your system:
  reaction time on interrupts,
  is your system fast enough to handle these two back-to-back interrupts
  as two single events
- driver design, i.e.
- is the state change signalled from the hard-irq or from napi:
  the second interrupt may occur somewhere between hardirq and napi,
  so napi just sees bus off
- design of the error state machine:
  in the at91 and flexcan I always send a warning interrupt if going
  from active to passive.

The last point is another argument for a common state machine.

> Would be interesting to see, how other CAN controllers behave.
> Volunteers are welcome.
> 
> What do you think? Would that patch/approach fit your needs?
> 
> Wolfgang.

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

signature.asc
Description: OpenPGP digital signature

_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core

Re: [RFC PATCH] can: improved CAN error state handling

Reply via email to