Wolfgang Grandegger wrote: > On 07/25/2010 03:35 PM, Marc Kleine-Budde wrote: >> Wolfgang Grandegger wrote: >>> I found some time to work on this issue. >>> >>> So far we only report CAN error state changes to the worse: >>> "error-active -> error-warning -> error-passive -> bus-off", but not >>> "bus-off -> error-active" or "error-passive -> error-warning -> >>> error-active". There have been request from other users to correct >>> this behavior. >>> >>> This patch reports any state change reported by the SJA1000 and >>> MSCAN-MPC5200 controller. It introduces "CAN_ERR_CRTL_ACTIVE" to >>> signal state changes to error active. Furthermore, the SJA1000 >>> triggers now the bus-off recovery procedure when restarted. So far, >>> the controller was *stopped* and then restarted. I did two tests to >>> analyze the state changes behavior: >>> >>> 1. First I forced the controller into the error-passive by sending >>> a message (with cansend) without cable connected (*). Then I >>> reconnected the cable (**) and continued to send messages (with >>> cangen) (***) until the error active state is reached: >>> >>> On SJA1000: >>> >>> # candump any,0:0,#FFFFFFFF >>> (*) >>> can0 20000004 [8] 00 08 00 00 00 00 60 00 ERROR-WARNING >>> can0 20000004 [8] 00 20 00 00 00 00 80 00 ERROR-PASSIVE >>> (**) >>> can0 123 [4] DE AD BE EF >>> (***) >>> can0 42A [1] 00 >>> ... >>> can0 42A [1] 07 >>> can0 20000004 [8] 00 08 00 00 00 00 7F 00 ERROR-WARNING >> Is it possible to tell apart whether the driver goes from passive to >> warning or from active to warning? Without the need to have a state >> machine in userspace, too? > > Yes, I realized that side effect as well. The user must monitor state > changes to understand if the state gets worse or better. We could use an > error bit to indicate state improvements.
I like that idea! From naive point of view, in the userspace I want
information that I can interpret stateless (i.e. without the need for a
statemachine == history).
>> Looking at the tx/rx error level might be an option, but I've seen that
>> the level can rise very quickly.
>
> Be aware that the tx/rx error counters are *not* readable on all CAN
> controllers.
Right....
>>> can0 42A [1] 08
>>> ...
>>> can0 42A [1] 27
>>> can0 20000004 [8] 00 40 00 00 00 00 5F 00 ERROR-ACTIVE
>>> \-- tx-err !!!
>>> On MSCAN:
>>>
>>> # candump any,0:0,#FFFFFFFF
>>> (*)
>>> can2 20000004 [8] 00 08 00 00 00 00 00 00 ERROR-WARNING
>>> can2 20000004 [8] 00 20 00 00 00 00 00 00 ERROR-PASSIVE
>>> (**)
>>> can2 123 [4] DE AD BE EF
>>> can2 20000004 [8] 00 08 00 00 00 00 00 00 ERROR-WARNING
>>> (***)
>>> can2 42A [1] 00
>>> can2 42A [1] 01
>>> ...
>>> can2 42A [1] 1E
>>> can2 42A [1] 1F
>>> can2 20000004 [8] 00 40 00 00 00 00 00 00 ERROR-ACTIVE
>>>
>>> As you can see, the state change to error-active is signaled
>>> differently. The MSCAN seems not to respect the specs :-(.
>> Do you mean the fact that error active is signalled with tx err == 0?
>
> Yes, the SJA1000 triggers an interrupt to signal error active when the
> error counter goes below 96.
Okay.
>> From the patch below you only changed the part of the driver that runs
>> if a change of state is detected "(old_state != priv->can.state)".
>>
>> Maybe it's a problem in the driver, that calculated the new state wrong.
>> Maybe due to a problem with the hardware. (I don't know the driver nor
>> the hardware).
>
> I monitored the interrupt generation as well:
>
> [ 1234.089446] mpc5xxx_can f0000900.can: error interrupt (canrflg=0x44)
> [ 1234.092228] mpc5xxx_can f0000900.can: error interrupt (canrflg=0x48)
> [ 1240.960487] mpc5xxx_can f0000900.can: error interrupt (canrflg=0x44)
> [ 1352.336647] mpc5xxx_can f0000900.can: error interrupt (canrflg=0x40)
>
> The last interrupt comes that late, even if the state is misinterpreted.
> Note also that the MSCAN has a dedicated state field for TX and RX.
> Therefore the state is defined without a state machine in software.
>
>> Looking at you patches below I want to rise my old proposal again. I
>> think we should move the state machine that assembles and sends the
>> error frames into a common function. Each driver just needs to have a
>> proper function that calculates the current state and then call the
>> state handling state machine.
>
> The error handling code is already quite large and complex, especially
> for the purpose it serves. Any common function improving that situation
> would be nice. But unfortunately, we rely on the hardware :-(.
>
>>> When time permits, I will also try the MSCAN of the MPC5121.
>>> 2. I forced the controller into bus-off by short-circuiting
>>> the low and hi lines and sending a message (*), Then I
>>> restarted the controller manually with
>>> "ip link set can0 type can restart" (**):
>>>
>>> On SJA1000:
>>>
>>> # ./candump -t d any,0:0,#FFFFFFFF
>>> (*)
>>> can0 20000004 [8] 00 08 00 00 00 00 88 00 ERROR-WARNING
>>> can0 20000004 [8] 00 20 00 00 00 00 88 00 ERROR-PASSIVE
>>> can0 20000044 [8] 00 00 00 00 00 00 7F 00 BUS-OFF
>>> (**)
>>> can0 20000100 [8] 00 00 00 00 00 00 00 00 RESTARTED
>>> can0 20000004 [8] 00 40 00 00 00 00 00 00 ERROR-ACTIVE
>>> \-- tx-err !!!
>>>
>>> On MSCAN:
>>>
>>> # ./candump -t d any,0:0,#FFFFFFFF
>>> (*)
>>> can2 20000004 [8] 00 08 00 00 00 00 00 00 ERROR-WARNING
>>> can2 20000040 [8] 00 00 00 00 00 00 00 00 BUS-OFF
>>> (**)
>>> can2 20000100 [8] 00 00 00 00 00 00 00 00 RESTARTED
>>> can2 20000004 [8] 00 40 00 00 00 00 00 00 ERROR-ACTIVE
>>>
>>> The MSCAN seems not to signal the ERROR-PASSIVE state before
>>> going bus-off.
>> I think the signalling of ERROR-PASSIVE depends on multiple things:
>> - the current bitrate.
>> I think this doesn't count so much in this short circuit test.
>> - the CAN hardware itself: does it signal this "in-the-middle-passive"
>> interrupt
>> - overall speed and load of your system:
>> reaction time on interrupts,
>> is your system fast enough to handle these two back-to-back interrupts
>> as two single events
>> - driver design, i.e.
>> - is the state change signalled from the hard-irq or from napi:
>> the second interrupt may occur somewhere between hardirq and napi,
>> so napi just sees bus off
>> - design of the error state machine:
>> in the at91 and flexcan I always send a warning interrupt if going
>> from active to passive.
>>
>> The last point is another argument for a common state machine.
>
> Well, yes, a common state handling function would be nice, but as the
> hardware is not really smart, a simple state machine is not enough. It's
> not just reading the current state and check if it has changed. Have a
> look to the SJA1000 and MSCAN driver, e.g.:
>
> if (isrc & IRQ_EPI) {
> /* error passive interrupt */
> dev_dbg(dev->dev.parent, "error passive interrupt\n",
> status);
> if (status & SR_ES) {
> if (priv->can.state == CAN_STATE_ERROR_WARNING)
> state = CAN_STATE_ERROR_PASSIVE;
> else
> state = CAN_STATE_ERROR_WARNING;
> } else {
> state = CAN_STATE_ERROR_ACTIVE;
> }
> }
That what I was proposing. The individual can driver implements a
function to determine the current state and then calls the common error
frame fillout function.
> No need for another state machine. But a common function is still useful
> to fill an error frame for the state change, of course. I will try to
> come up with a proposal soon.
I think there's need for (another) state machine, let me try to explain.
In the interrupt or napi handler each driver determines the current can
state.
This can vary from a simple: read register and translate into
CAN_STATE_*, to a more complex function like you've written above. This
functions can even have fixups for strange hardware, like the MSCAN: The
mscan may think it's still in error warning, but we know tx/rx error
level below 96 means error active.
If the can state has changed the driver calls the
"fill-out-error-frame-function". This function will take care of strange
hardware again. E.g. if the can state changes from error passive to
error active the function can generate the missing to-error-warning
meesage. (If this is wanted). The same goes for the
state-is-getting-worse messages.
cheers, Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Socketcan-core mailing list [email protected] https://lists.berlios.de/mailman/listinfo/socketcan-core
