Wolfgang Grandegger wrote: > I found some time to work on this issue. > > So far we only report CAN error state changes to the worse: > "error-active -> error-warning -> error-passive -> bus-off", but not > "bus-off -> error-active" or "error-passive -> error-warning -> > error-active". There have been request from other users to correct > this behavior. > > This patch reports any state change reported by the SJA1000 and > MSCAN-MPC5200 controller. It introduces "CAN_ERR_CRTL_ACTIVE" to > signal state changes to error active. Furthermore, the SJA1000 > triggers now the bus-off recovery procedure when restarted. So far, > the controller was *stopped* and then restarted. I did two tests to > analyze the state changes behavior: > > 1. First I forced the controller into the error-passive by sending > a message (with cansend) without cable connected (*). Then I > reconnected the cable (**) and continued to send messages (with > cangen) (***) until the error active state is reached: > > On SJA1000: > > # candump any,0:0,#FFFFFFFF > (*) > can0 20000004 [8] 00 08 00 00 00 00 60 00 ERROR-WARNING > can0 20000004 [8] 00 20 00 00 00 00 80 00 ERROR-PASSIVE > (**) > can0 123 [4] DE AD BE EF > (***) > can0 42A [1] 00 > ... > can0 42A [1] 07 > can0 20000004 [8] 00 08 00 00 00 00 7F 00 ERROR-WARNING
Is it possible to tell apart whether the driver goes from passive to warning or from active to warning? Without the need to have a state machine in userspace, too? Looking at the tx/rx error level might be an option, but I've seen that the level can rise very quickly. > can0 42A [1] 08 > ... > can0 42A [1] 27 > can0 20000004 [8] 00 40 00 00 00 00 5F 00 ERROR-ACTIVE > \-- tx-err !!! > On MSCAN: > > # candump any,0:0,#FFFFFFFF > (*) > can2 20000004 [8] 00 08 00 00 00 00 00 00 ERROR-WARNING > can2 20000004 [8] 00 20 00 00 00 00 00 00 ERROR-PASSIVE > (**) > can2 123 [4] DE AD BE EF > can2 20000004 [8] 00 08 00 00 00 00 00 00 ERROR-WARNING > (***) > can2 42A [1] 00 > can2 42A [1] 01 > ... > can2 42A [1] 1E > can2 42A [1] 1F > can2 20000004 [8] 00 40 00 00 00 00 00 00 ERROR-ACTIVE > > As you can see, the state change to error-active is signaled > differently. The MSCAN seems not to respect the specs :-(. Do you mean the fact that error active is signalled with tx err == 0? From the patch below you only changed the part of the driver that runs if a change of state is detected "(old_state != priv->can.state)". Maybe it's a problem in the driver, that calculated the new state wrong. Maybe due to a problem with the hardware. (I don't know the driver nor the hardware). Looking at you patches below I want to rise my old proposal again. I think we should move the state machine that assembles and sends the error frames into a common function. Each driver just needs to have a proper function that calculates the current state and then call the state handling state machine. > When time permits, I will also try the MSCAN of the MPC5121. > 2. I forced the controller into bus-off by short-circuiting > the low and hi lines and sending a message (*), Then I > restarted the controller manually with > "ip link set can0 type can restart" (**): > > On SJA1000: > > # ./candump -t d any,0:0,#FFFFFFFF > (*) > can0 20000004 [8] 00 08 00 00 00 00 88 00 ERROR-WARNING > can0 20000004 [8] 00 20 00 00 00 00 88 00 ERROR-PASSIVE > can0 20000044 [8] 00 00 00 00 00 00 7F 00 BUS-OFF > (**) > can0 20000100 [8] 00 00 00 00 00 00 00 00 RESTARTED > can0 20000004 [8] 00 40 00 00 00 00 00 00 ERROR-ACTIVE > \-- tx-err !!! > > On MSCAN: > > # ./candump -t d any,0:0,#FFFFFFFF > (*) > can2 20000004 [8] 00 08 00 00 00 00 00 00 ERROR-WARNING > can2 20000040 [8] 00 00 00 00 00 00 00 00 BUS-OFF > (**) > can2 20000100 [8] 00 00 00 00 00 00 00 00 RESTARTED > can2 20000004 [8] 00 40 00 00 00 00 00 00 ERROR-ACTIVE > > The MSCAN seems not to signal the ERROR-PASSIVE state before > going bus-off. I think the signalling of ERROR-PASSIVE depends on multiple things: - the current bitrate. I think this doesn't count so much in this short circuit test. - the CAN hardware itself: does it signal this "in-the-middle-passive" interrupt - overall speed and load of your system: reaction time on interrupts, is your system fast enough to handle these two back-to-back interrupts as two single events - driver design, i.e. - is the state change signalled from the hard-irq or from napi: the second interrupt may occur somewhere between hardirq and napi, so napi just sees bus off - design of the error state machine: in the at91 and flexcan I always send a warning interrupt if going from active to passive. The last point is another argument for a common state machine. > Would be interesting to see, how other CAN controllers behave. > Volunteers are welcome. > > What do you think? Would that patch/approach fit your needs? > > Wolfgang. cheers, Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Socketcan-core mailing list [email protected] https://lists.berlios.de/mailman/listinfo/socketcan-core
