Re: [RFC PATCH] can: improved CAN error state handling

Marc Kleine-Budde Sun, 25 Jul 2010 08:44:05 -0700

Wolfgang Grandegger wrote:
> On 07/25/2010 04:48 PM, Marc Kleine-Budde wrote:
>> Wolfgang Grandegger wrote:
>>> On 07/25/2010 03:35 PM, Marc Kleine-Budde wrote:
>>>> Wolfgang Grandegger wrote:
>>>>> I found some time to work on this issue.
>>>>>
>>>>> So far we only report CAN error state changes to the worse:
>>>>> "error-active -> error-warning -> error-passive -> bus-off", but not
>>>>> "bus-off -> error-active" or "error-passive -> error-warning ->
>>>>> error-active". There have been request from other users to correct
>>>>> this behavior.
>>>>>
>>>>> This patch reports any state change reported by the SJA1000 and
>>>>> MSCAN-MPC5200 controller. It introduces "CAN_ERR_CRTL_ACTIVE" to
>>>>> signal state changes to error active. Furthermore, the SJA1000
>>>>> triggers now the bus-off recovery procedure when restarted. So far,
>>>>> the controller was *stopped* and then restarted. I did two tests to
>>>>> analyze the state changes behavior:
>>>>>
>>>>> 1. First I forced the controller into the error-passive by sending
>>>>>    a message (with cansend) without cable connected (*). Then I
>>>>>    reconnected the cable (**) and continued to send messages (with
>>>>>    cangen) (***) until the error active state is reached:
>>>>>
>>>>>    On SJA1000:
>>>>>
>>>>>    # candump any,0:0,#FFFFFFFF
>>>>>    (*)
>>>>>    can0  20000004  [8] 00 08 00 00 00 00 60 00   ERROR-WARNING
>>>>>    can0  20000004  [8] 00 20 00 00 00 00 80 00   ERROR-PASSIVE
>>>>>    (**)
>>>>>    can0  123  [4] DE AD BE EF
>>>>>    (***)
>>>>>    can0  42A  [1] 00
>>>>>    ...
>>>>>    can0  42A  [1] 07
>>>>>    can0  20000004  [8] 00 08 00 00 00 00 7F 00   ERROR-WARNING
>>>> Is it possible to tell apart whether the driver goes from passive to
>>>> warning or from active to warning? Without the need to have a state
>>>> machine in userspace, too?
>>> Yes, I realized that side effect as well. The user must monitor state
>>> changes to understand if the state gets worse or better. We could use an
>>> error bit to indicate state improvements.
>> I like that idea! From naive point of view, in the userspace I want
>> information that I can interpret stateless (i.e. without the need for a
>> statemachine == history).
> 
> I tend to introduce the error class:
> 
>  #define CAN_ERR_STATE_CHANGE 0x00000200U /* state changed / data[1]
> 
> apart from
> 
>  #define CAN_ERR_CRTL   0x00000004U /* controller problems / data[1]
> 
> For any state change the CAN_ERR_STATE_CHANGE bit is set. If the state
> got worse, the CAN_ERR_CRTL bit will be set as well. This would also
> ensure backward compatibility.


What about CAN_CTRLMODE_BERR_REPORTING? How strict should it be
interpreted? In the flexcan driver, if CAN_CTRLMODE_BERR_REPORTING is
not active I don't send any bus error related error frames. Even if the
napi function is entered and the bus error bits are set!

>>> No need for another state machine. But a common function is still useful
>>> to fill an error frame for the state change, of course. I will try to
>>> come up with a proposal soon.
>> I think there's need for (another) state machine, let me try to explain.
>>
>> In the interrupt or napi handler each driver determines the current can
>> state.
>> This can vary from a simple: read register and translate into
>> CAN_STATE_*, to a more complex function like you've written above. This
>> functions can even have fixups for strange hardware, like the MSCAN: The
>> mscan may think it's still in error warning, but we know tx/rx error
>> level below 96 means error active.
>>
>> If the can state has changed the driver calls the
>> "fill-out-error-frame-function". This function will take care of strange
>> hardware again. E.g. if the can state changes from error passive to
>> error active the function can generate the missing to-error-warning
>> meesage. (If this is wanted). The same goes for the
>> state-is-getting-worse messages.
> 
> OK, the prototype of a common function could be (similar to your
> do_state function in the flexcan driver):
> 
>   int can_handle_state_change(struct net_device *dev,
>                               struct can_frame *cf,
>                               enum can_state new_state,
>                               int tx_err, int rx_err)

I'd drop the "struct can_frame *cf", the function can take care of
allocating the frame and returning in case of an error. But this would
break you proposed detect "getting worse".

Apart from this, {tx,rx}_err can be derived via the
priv->can.do_get_berr_counter function.

> If "tx_err" or "rx_err == -1", they are ignored. If "new_state ==

If priv->can.do_get_berr_counter if NULL, don't use it.

> CAN_STATE_UNKNOWN", the state could be derived from tx_err or rx_err.
> What do you think?

The CAN_STATE_UNKNOWN fallback is a good idea.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

signature.asc
Description: OpenPGP digital signature

_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core

Re: [RFC PATCH] can: improved CAN error state handling

Reply via email to