Re: [RFC PATCH] can: improved CAN error state handling

Wolfgang Grandegger Sun, 25 Jul 2010 08:08:17 -0700

On 07/25/2010 04:48 PM, Marc Kleine-Budde wrote:
> Wolfgang Grandegger wrote:
>> On 07/25/2010 03:35 PM, Marc Kleine-Budde wrote:
>>> Wolfgang Grandegger wrote:
>>>> I found some time to work on this issue.
>>>>
>>>> So far we only report CAN error state changes to the worse:
>>>> "error-active -> error-warning -> error-passive -> bus-off", but not
>>>> "bus-off -> error-active" or "error-passive -> error-warning ->
>>>> error-active". There have been request from other users to correct
>>>> this behavior.
>>>>
>>>> This patch reports any state change reported by the SJA1000 and
>>>> MSCAN-MPC5200 controller. It introduces "CAN_ERR_CRTL_ACTIVE" to
>>>> signal state changes to error active. Furthermore, the SJA1000
>>>> triggers now the bus-off recovery procedure when restarted. So far,
>>>> the controller was *stopped* and then restarted. I did two tests to
>>>> analyze the state changes behavior:
>>>>
>>>> 1. First I forced the controller into the error-passive by sending
>>>>    a message (with cansend) without cable connected (*). Then I
>>>>    reconnected the cable (**) and continued to send messages (with
>>>>    cangen) (***) until the error active state is reached:
>>>>
>>>>    On SJA1000:
>>>>
>>>>    # candump any,0:0,#FFFFFFFF
>>>>    (*)
>>>>    can0  20000004  [8] 00 08 00 00 00 00 60 00   ERROR-WARNING
>>>>    can0  20000004  [8] 00 20 00 00 00 00 80 00   ERROR-PASSIVE
>>>>    (**)
>>>>    can0  123  [4] DE AD BE EF
>>>>    (***)
>>>>    can0  42A  [1] 00
>>>>    ...
>>>>    can0  42A  [1] 07
>>>>    can0  20000004  [8] 00 08 00 00 00 00 7F 00   ERROR-WARNING
>>> Is it possible to tell apart whether the driver goes from passive to
>>> warning or from active to warning? Without the need to have a state
>>> machine in userspace, too?
>>
>> Yes, I realized that side effect as well. The user must monitor state
>> changes to understand if the state gets worse or better. We could use an
>> error bit to indicate state improvements.
> 
> I like that idea! From naive point of view, in the userspace I want
> information that I can interpret stateless (i.e. without the need for a
> statemachine == history).


I tend to introduce the error class:

 #define CAN_ERR_STATE_CHANGE 0x00000200U /* state changed / data[1]

apart from

 #define CAN_ERR_CRTL   0x00000004U /* controller problems / data[1]

For any state change the CAN_ERR_STATE_CHANGE bit is set. If the state
got worse, the CAN_ERR_CRTL bit will be set as well. This would also
ensure backward compatibility.

...

>> No need for another state machine. But a common function is still useful
>> to fill an error frame for the state change, of course. I will try to
>> come up with a proposal soon.
> 
> I think there's need for (another) state machine, let me try to explain.
> 
> In the interrupt or napi handler each driver determines the current can
> state.
> This can vary from a simple: read register and translate into
> CAN_STATE_*, to a more complex function like you've written above. This
> functions can even have fixups for strange hardware, like the MSCAN: The
> mscan may think it's still in error warning, but we know tx/rx error
> level below 96 means error active.
> 
> If the can state has changed the driver calls the
> "fill-out-error-frame-function". This function will take care of strange
> hardware again. E.g. if the can state changes from error passive to
> error active the function can generate the missing to-error-warning
> meesage. (If this is wanted). The same goes for the
> state-is-getting-worse messages.

OK, the prototype of a common function could be (similar to your
do_state function in the flexcan driver):

  int can_handle_state_change(struct net_device *dev,
                              struct can_frame *cf,
                              enum can_state new_state,
                              int tx_err, int rx_err)

If "tx_err" or "rx_err == -1", they are ignored. If "new_state ==
CAN_STATE_UNKNOWN", the state could be derived from tx_err or rx_err.
What do you think?

Wolfgang.
_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core

Re: [RFC PATCH] can: improved CAN error state handling

Reply via email to