Re: [PATCH RFC v2] Add support for Janz ICAN3 CAN module

Wolfgang Grandegger Wed, 17 Feb 2010 00:48:44 -0800

Ira W. Snyder wrote:
> On Tue, Feb 16, 2010 at 11:20:26PM +0100, Wolfgang Grandegger wrote:
>> Ira W. Snyder wrote:
>>> On Mon, Feb 15, 2010 at 06:13:11PM +0100, Wolfgang Grandegger wrote:
>>>> Hi Ira,
>>>>
>>>> Ira W. Snyder wrote:
>>>>> +/* Get the MSYNC bits from the "old-style" interface control registers */
>>>>> +static void janz_get_msync(struct janz_ican3 *mod, u8 *locl, u8 *peer)
>>>>> +{
>>>>> + janz_set_page(mod, 0);
>>>>> + *peer = ioread8(mod->regs + MSYNC_PEER);
>>>>> + *locl = ioread8(mod->regs + MSYNC_LOCL);
>>>>> +}
>>>> What are your arguments against using structures to describe the
>>>> register layout?
>>>>
>>> I don't have any strong arguments against using a structure to describe
>>> register layout, except:
>>>
>>> 1) all datasheets I've ever used have registers described by hexadecimal
>>> offsets. Using a structure, you have to manually calculate offsets when
>>> checking that the structure is defined correctly. Once it is correct,
>>> this isn't a big deal.
>> In general structs are preferred as they allow type checking and result
>> usually in more readable code. #defines are more handy if you deal with
>> offset instead of addresses.
>>
>>> 2) These registers only exist in a single DPM page, page #0. They are
>>> not "always accessible", as in a device which uses mmio. You must switch
>>> the DPM page before accessing them.
>>>
>>> Does it really make sense to create a structure just for calculating the
>>> offsets in one DPM page? For the "janz onboard registers" (described
>>> with "JANZ_OB_* defines), I agree. I'll make a structure for those.
>> Please check if it results in better code.
>>
>> [snip]
> 
> Ok, rethinking my earlier decision for the "onboard registers", I think
> that a structure makes things much harder to read. The registers have
> completely different meaning depending on the type of access. Meaning,
> reads have one meaning, and writes have another. This cannot be
> expressed with structures.
> 
> For example, the register at offset 0x3.
> 
> Read: MODULbus number (the value from the hex switch on the board)
> Write: MODULbus interrupt enable (enables a certain daughterboard's
> interrupt line)
> 
> Those are two completely different functions. If I were to use a
> structure, one of those meanings will be lost in the name.
> 
> It is much more clear that this code is correct, and reads the hex
> switch:
> 
> /* read the hexadecimal switch */
> hex = ioread8(regs + JANZ_OB_MBUS_NUM);
> 
> Than this:
> 
> /* read the hexadecimal switch */
> hex = ioread8(&regs->interrupt_enable);


I clearly *disagree*! And you have *proper* type checking in the latter
case. Structures are cumbersome if you have to deal with offsets but for
addresses they are perfectly fine.

> I'd bet that someone will question the latter code example. The naming
> just seems wrong for the function the comment describes.
> 
>>>>> + if ((xord & 0x30) == 0x30) {
>>>>> +         dev_err(mod->dev, "no mbox for writing\n");
>>>>> +         return -ENOMEM;
>>>>> + }
>>>> Here and in many other places macro definitions would make the code more
>>>> readable.
>>>>
>>> Yep, There are a few of these. My problem is this: what names do I use?
>>> The way this device works is stupid. You'll want to read pages C-333 to
>>> C-335 in the manual.
>>>
>>> The basics are:
>>> You have two registers, MSYNC_PEER and MSYNC_LOCL. The firmware can
>>> only write to MSYNC_PEER, and the driver can only write to MSYNC_LOCL.
>>>
>>> The registers are laid out like this:
>>>
>>> MSYNC_PEER:
>>> bit0: RB0
>>> bit1: RB1
>>> bit2: LW
>>> bit4: WB0
>>> bit5: WB1
>>>
>>> MSYNC_LOCL:
>>> bit0: RB0
>>> bit1: RB1
>>> bit4: WB0
>>> bit5: WB1
>>> bit6: LW
>>>
>>> When you get an interrupt, and you want to read a message from the
>>> controller, you must XOR the RBn bits. If you get a 1 back for a certain
>>> bit position, then you have a buffer to read. The same is true for the
>>> WBn bits, except that the XOR will be 0 when you have a buffer to read.
>>>
>>> In addition, you must read/set the LW (last written) bit to correspond
>>> to the buffer you last wrote to.
>>>
>>> I would be happy to use defines to make this easier to read, but I have
>>> *no idea* what to name them. Suggestions please.
>> #define MSYNC_PEER_RB0 0x01
>> #define MSYNC_PEER_RB1 0x02
>>
>> and so on.
>>
>> BTW: an inline function for that bit gymnastics would be useful as well.
>>
>> [snip]
>>>>> + memset(&msg, 0, sizeof(msg));
>>>>> + msg.control = 0x00;
>>>>> + msg.spec    = MSG_INITFDPMQUEUE;
>>>>> + msg.len     = cpu_to_le16(8);
>>>> Please do not align expressions. Just use *one* space before and after
>>>> "=". Please fix globally.
>>>>
>>> Ok. This will be in the next version of the patch. I thought it made the
>>> code easier to read.
>> It's required by the coding style.
>>
>>>>> +static void can_to_janz(struct janz_ican3 *mod, struct can_frame *cf,
>>>>> +                 struct janz_fast_desc *desc)
>>>> Use a better name please.
>>>>
>>> I'll try to think something up. All these two functions do is convert a
>>> struct can_frame to the appropriate structure for the firmware, and
>>> back. How about "janz_convert_tocan() and janz_convert_fromcan()"?
>> can_frame_to_janz and janz_to_can_frame would already be fine. "can" is
>> to general.
>>
>> [snip]
>>>>> +
>>>>> + cf->can_id |= idflags;
>>>>> + cf->data[1] = d1;
>>>> Hm, the data field to be used depends on the error type.
>>>>
>>> This code is copied from esd_pci331.c. If I don't have a good example,
>>> how can I be expected to do things correctly!
>>>
>>> How do I find out exactly which field belongs to which data type?
>> See include/linux/can/error.h
>>
>>>>> +
>>>>> + netif_rx(skb);
>>>>> +
>>>>> + stats->rx_packets++;
>>>>> + stats->rx_bytes += cf->can_dlc;
>>>>> + return 0;
>>>>> +}
>> [snip]
>>>>> + rxerr = msg->data[4];
>>>>> + txerr = msg->data[5];
>>>> Should go to field 6 and 7.
>>>>
>>> You're confused. These come from the firmware, they are not going into a
>>> struct can_frame. These are the buffer locations in the message sent by
>>> the firmware for rx/tx error counter registers.
>> Right. Sorry for the noise.
>>
>>>>> +
>>>>> + /* state is error-active by default */
>>>>> + state = CAN_STATE_ERROR_ACTIVE;
>>>>> +
>>>>> + if (rxerr >= 96 || txerr >= 96)
>>>>> +         state = CAN_STATE_ERROR_WARNING;
>>>>> +
>>>>> + if (rxerr >= 128 || txerr >= 128)
>>>>> +         state = CAN_STATE_ERROR_PASSIVE;
>>>>> +
>>>>> + if (rxerr >= 255 || txerr >= 255)
>>>>> +         state = CAN_STATE_BUS_OFF;
>>>> You could use "else if" if you revert the order.
>>> Ok, I'll try that in the next version.
>>>
>>>> Also, 255 does not yet
>>>> mean bus-error, strictly speaking. Have you seen the device going bus-off?
>>>>
>>> Well, these are byte-sized registers. They can never exceed 255 (0xff),
>>> they would exceed their bit-width. What *should* I do here?
>>>
>>> I don't think I've seen the device go bus-off. How would I cause that
>>> situation?
>> Short-circuit the CAN low and high wires and send a message. Do you get
>> any notification?
>>
> 
> Yep, this gave the following:
> 
> r...@labslcor3 ~/socketcan/can-utils # ./candump -t d any,0:0,#FFFFFFFF
>  (0.000000)  can0  20000004  [8] 00 20 00 00 00 00 00 00   ERRORFRAME
                                       \ CAN_ERR_CRTL_TX_PASSIVE

>  (0.000000)  can0  20000004  [8] 00 08 00 00 00 00 00 00   ERRORFRAME
                                       \ CAN_ERR_CRTL_TX_WARNING

They should come in different order and the bus-off message is missing.

> In the other window:
> 
> r...@labslcor3 ~/socketcan/can-utils # ./cansend can0 5a1#aabbccdd
> 
> After I get these two messages, the controller doesn't send anything
> else until I reset it. I'll dig through the datasheet and see if there
> is something else I need to enable to get more/better error messages.

That's the expected behavour for bus-off. A bus-off recovery is required
to reactivate the controller.

> I tried enabling the "bus-error" feature, but it does not change the
> messages the firmware sends to me, both without a cable connected, and
> with the CAN low and high wires shorted together.

The "bus-error" feature will report each individual bus error.

>>>>> +
>>>>> + /* check if we should generate an error frame at all */
>>>>> + if (state == mod->can.state || mod->can.state == CAN_STATE_STOPPED) {
>>>>> +         dev_dbg(mod->dev, "no error frame needed: state %d\n", state);
>>>>> +         return;
>>>>> + }
>>>> Shouldn't this check be done first?
>>>>
>>> The variable "state" isn't set then. This code is basically copied from
>>> esd_pci331.c, but I moved the checking of priv->can.state out of each
>>> 'case', into one place.
>>>
>>>>> + dev_dbg(mod->dev, "state change: state %d\n", state);
>>>>> + mod->can.state = state;
>>>>> +
>>>>> + if (state == CAN_STATE_BUS_OFF) {
>>>>> +         janz_err_frame(mod, CAN_ERR_BUSOFF, 0);
>>>>> +         return;
>>>>> + }
>>>>> +
>>>>> + if (state == CAN_STATE_ERROR_PASSIVE) {
>>>>> +         err = (rxerr >= 128) ? CAN_ERR_CRTL_RX_PASSIVE
>>>>> +                              : CAN_ERR_CRTL_TX_PASSIVE;
>>>>> +         janz_err_frame(mod, CAN_ERR_CRTL, err);
>>>>> +         return;
>>>>> + }
>>>>> +
>>>>> + if (state == CAN_STATE_ERROR_WARNING) {
>>>>> +         err = (rxerr >= 96) ? CAN_ERR_CRTL_RX_WARNING
>>>>> +                             : CAN_ERR_CRTL_TX_WARNING;
>>>>> +         janz_err_frame(mod, CAN_ERR_CRTL, err);
>>>>> +         return;
>>>>> + }
>>>> If you use "else if" as suggested, the code could be shortened a lot, I
>>>> think (by doing everything within the if/else block).
>>>>
>>>> Just for curiosity, what does "candump -t d any,0:0,#FFFFFFFF" report
>>>> when you trigger a bus-error or when you send a message with no cable
>>>> connected.
>>>>
>>> Nothing. The device just eventually runs out of send buffers when no
>>> cable is connected. It doesn't send any error messages.
>> Hm, if you do not get any notification this code is useless.
>>
> 
> Actually, I must have been running something incorrectly. See the output
> from candump provided above. I now get this in both cases:
> 1) no cable connected
> 2) CAN high and low wires shorted together

OK.

>>>>> + /* nothing needed for error-active state */
>>>>> +}
>>>>> +
>>>>> +static void janz_handle_unknown(struct janz_ican3 *mod, struct janz_msg 
>>>>> *msg)
>>>> Handle what?
>>>>
>>> The firmware has about 50 different messages it can send to the driver.
>>> I only handled the few that we expect to see. I put this in place so we
>>> get a log of message types that the driver doesn't handle yet.
>>>
>>> Should I remove it, and just drop all messages we don't handle without
>>> printing anything? Seems like a hindrance to debugging problems.
>> I have just a problem with the name, which is not specific enough. A
>> better name would be "janz_handle_unknown_message".
>>
>> [snip]
>>>>> + switch (msg->spec) {
>>>>> + case MSG_IDVERS:
>>>>> +         janz_handle_idvers(mod, msg);
>>>>> +         break;
>>>>> + case MSG_MSGLOST:
>>>>> + case MSG_FMSGLOST:
>>>>> +         janz_handle_msglost(mod, msg);
>>>>> +         break;
>>>> You shoud create error messages for msglost as well.
>>>>
>>> Ok. What kind of messages? This message means that the driver is not
>>> reading CAN frames fast enough, and the firmware ran out of locations to
>>> store them. It is exactly the same as this driver's xmit() routine
>>> running out of buffers to place CAN messages for sending.
>> CAN_ERR_CRTL_RX_OVERFLOW and CAN_ERR_CRTL_RX_OVERFLOW from
>> http://lxr.linux.no/#linux+v2.6.32/include/linux/can/error.h.
>>
>> [snip]
>>>>> +         /* write back the control bits with IVALID unset */
>>>>> +         control &= ~DESC_IVALID;
>>>>> +         iowrite8(control, addr);
>>>> This seems to be duplicated code. Here a helper function would make
>>>> sense in contrast to your *one* line functions, e.g. to enable the
>>>> interrupts, which just increases code size.
>>>>
>>> The one-liners are marked inline, so they should get compiled to
>>> io(read|write)8(), without any function call overhead. I used them
>>> because their *names* are descriptive.
>> You do that in a few cases but in many other you use ioread8() directly.
>> I'm not worried about overhead.
>>
>>> It is much easier to tell that this clears interrupts:
>>> janz_clr_int(mod);
>>>
>>> Than this:
>>> ioread8(mod->regs + DPM_REG_INT);
>>>
>>> Don't you think?
>> /* Clear interrupt */
>> ioread8(mod->regs + DPM_REG_INT);
>>
>> does make pretty clear what the call does and I do not need to lookup
>> how it's implemented. If the interrupt is cleared in more than one place
>> or if it's a complex action, a function is useful, of course.
>>
>>> I'm not sure how to clearly function-ize the clearing of the IVALID bit
>>> in descriptors, but I'll try for the next version.
>> Yes, a set of functions would be handy, I think.
>>
>> [snip]
>>>>> + /* bring the bus online */
>>>>> + ret = janz_set_bus_state(mod, true);
>>>>> + if (ret) {
>>>>> +         dev_err(mod->dev, "unable to set bus-on\n");
>>>>> +         return ret;
>>>>> + }
>>>> How is bus-off recovery supposed to work? In general, if the card/hw
>>>> recovery automatically, we use the following procedure:
>>>>
>>>> restart_ms == 0: the device should be *stopped* on bus-off allowing
>>>>                  to user/app to restart it manually using this function.
>>>>
>>>> restart_ms  > 0: the device is allowed to recover from bus-off
>>>>                  automatically.
>>>>
>>>> Could that be implemented?
>>>>
>>> The datasheet doesn't mention the word "restart" or "recover". I guess
>>> it doesn't recover automatically. Again, this code was copied from the
>>> esd_pci331.c driver. Is that driver's error recovery broken? What should
>>> I be doing here?
>> Makes sense as the SJA1000 does not recover automatically. But then some
>> kind of notification should be send otherwise the app does not known
>> what's going on.
>>
> 
> What kind of notification should be sent? Are the error messages
> provided in the candump output above sufficient?

The hardware must send a notification otherwise the software can not act
upon. Have a look how bus-off is handle on the SJA1000:

http://lxr.linux.no/#linux+v2.6.32/drivers/net/can/sja1000/sja1000.c#L378

Bus-off recovery is then done automatically if restart-ms > 0. Otherwise
the use needs to trigger a restart manually using:

  ./ip link set can0 type can restart

or with the equivalent netlink call.

>>>>> + /* disable our IRQ, then hookup the IRQ handler */
>>>>> + janz_disable_interrupts(mod);
>>>>> + ret = request_irq(mod->irq, janz_irq, IRQF_SHARED, DRV_NAME, mod);
>>>>> + if (ret) {
>>>>> +         dev_err(dev, "unable to request IRQ\n");
>>>>> +         goto out_iounmap_ctrl;
>>>>> + }
>>>> Is this interrupt exclisively for CAN? ... or do you need a dispatcher
>>>> in the MODULbus driver?
>>>>
>>> The interrupt is a function of the PCI bridge chip on the CMOD-IO
>>> carrier board. However, it uses a "write-1 to enable" and "write-1 to
>>> disable" interface, meaning each MODULbus module can have their own
>>> interrupt handler. The interrupt handler must check that its module is
>>> interrupting, just like any handler on a shared IRQ line.
>>>
>>> I don't need a dispatcher, but each module does need to know it's module
>>> number. See the janz_irq() function, and notice how I use a bitwise and
>>> with (1 << mod->num) to determine if this module is interrupting.
>>>
>>> AFAIK, Linux always calls all interrupt handlers attached to a shared
>>> interrupt line.
>> OK, that's fine.
>>
> 
> Thanks again for the review. I've started incorporating your comments,
> and I'll have another version soon.

Thanks. As you want to go mainline with that driver, it's a bit more
work than usual, especially because it touches various kernel
sub-systems. We are more relaxed adding it to the SVN repository.

Wolfgang.
_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core

Re: [PATCH RFC v2] Add support for Janz ICAN3 CAN module

Reply via email to