Thanks Benny,

Did you manage to remove all the problem in your system? Did you do
anything else than applying the patch you attached earlier?


Kindest regards,

//Erik

On Mon, Sep 13, 2010 at 8:52 AM, Benny B. Simonsen <[email protected]> wrote:
>  Den 12-09-2010 21:24, Erik Calissendorff skrev:
>>
>> Hi we are having similar problem with our MCP2515 connected to a Gumstix
>> Overo.
>>
>> Below is a dump from candump of packages when sending a burst of data
>> representing
>> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
>> with one character per packet.
>>
>>  (1279702812.562408)  can0  1A300280  [1] 61
>>  (1279702812.568237)  can0    4  [8] 00 01 00 00 00 00 00 00
>>  (1279702812.569458)  can0  1A300280  [1] 67
>>  (1279702812.571502)  can0  1A300280  [1] 68
>>  (1279702812.571594)  can0  1A300280  [1] 69
>>  (1279702812.571899)  can0  1A300280  [1] 6A
>>  (1279702812.577453)  can0    4  [8] 00 01 00 00 00 00 00 00
>>  (1279702812.577575)  can0  1A300280  [1] 6F
>>  (1279702812.581634)  can0    4  [8] 00 01 00 00 00 00 00 00
>>  (1279702812.581756)  can0  1A300280  [1] 73
>>  (1279702812.582397)  can0  1A300280  [1] 74
>>  (1279702812.583129)  can0  1A300280  [1] 75
>>  (1279702812.584869)  can0  1A300280  [1] 76
>>  (1279702812.585449)  can0  1A300280  [1] 77
>>  (1279702812.586273)  can0  1A300280  [1] 78
>>  (1279702812.587310)  can0  1A300280  [1] 79
>>  (1279702812.588348)  can0  1A300280  [1] 7A
>>  (1279702812.589599)  can0  1A300280  [1] 41
>>  (1279702812.590606)  can0  1A300280  [1] 42
>>  (1279702812.591613)  can0  1A300280  [1] 43
>>  (1279702812.592864)  can0  1A300280  [1] 44
>>  (1279702812.593780)  can0  1A300280  [1] 45
>>  (1279702812.594604)  can0  1A300280  [1] 46
>>  (1279702812.595825)  can0  1A300280  [1] 47
>>  (1279702812.596893)  can0  1A300280  [1] 48
>>  (1279702812.597930)  can0  1A300280  [1] 49
>>  (1279702812.599426)  can0  1A300280  [1] 4A
>>  (1279702812.600646)  can0  1A300280  [1] 4B
>>  (1279702812.600952)  can0  1A300280  [1] 4C
>>  (1279702812.602111)  can0  1A300280  [1] 4D
>>  (1279702812.603118)  can0  1A300280  [1] 4E
>>  (1279702812.604156)  can0  1A300280  [1] 4F
>>  (1279702812.605224)  can0  1A300280  [1] 50
>>  (1279702812.606658)  can0  1A300280  [1] 51
>>  (1279702812.607269)  can0  1A300280  [1] 52
>>  (1279702812.608459)  can0  1A300280  [1] 53
>>  (1279702812.609344)  can0  1A300280  [1] 54
>>  (1279702812.610992)  can0  1A300280  [1] 55
>>  (1279702812.611389)  can0  1A300280  [1] 56
>>  (1279702812.612487)  can0  1A300280  [1] 57
>>  (1279702812.613494)  can0  1A300280  [1] 58
>>  (1279702812.615020)  can0  1A300280  [1] 59
>>  (1279702812.616241)  can0  1A300280  [1] 5A
>>  (1279702812.616638)  can0  1A300280  [1] 30
>>  (1279702812.624145)  can0    4  [8] 00 01 00 00 00 00 00 00
>>  (1279702812.624237)  can0  1A300280  [1] 37
>>  (1279702812.625061)  can0  1A300280  [1] 38
>>  (1279702812.625823)  can0  1A300280  [1] 39
>>
>> As you can see we are getting a number of dropped frames with id 4,
>> unsure where these frames comes from.
>>
>> I have applied the patch from Benny mentioned earlier in this thread
>> to the 2.6.35 kernel, and the result is better then before are now
>> seeing these id 4 frames.
>>
>> Is there anything else that I should try?
>>
>>
>>
>> Kindest regards,
>>
>> //Erik
>>
>> On Thu, Sep 2, 2010 at 6:48 AM, Michael Stocks
>> <[email protected]>  wrote:
>>>
>>> Dear All,
>>> I'm experiencing lost frames and id 0, 8 byte all zero packets now that
>>> we have equipment sending back to back packets on a 250Kb network.
>>> I have plenty of hardware attached to Gumstix Overos so I'll test your
>>> patch once the kernel is rebuilt.
>>>
>>> Cheers Mike.
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of christian
>>> pellegrin
>>> Sent: Thursday, 2 September 2010 02:36
>>> To: Benny B. Simonsen
>>> Cc: [email protected]
>>> Subject: Re: [Socketcan-users] MCP 2515 RX problems
>>>
>>> Hi, first of all sorry for the late response. Just now I managed to put
>>> together the needed hardware (you know, in Italy it's near impossible to do
>>> anything not related to holidays in august ;-) )
>>>
>>> I had a quick look at your patch and it's similar to the one I prepared
>>> and I'm attaching below. I think both of them solve the problem with out of
>>> order frame. What is worrying me is that packet with all zeros. I never saw
>>> something similar. I'll put my system on test during the next night to see
>>> if I can reproduce it. I guess you put the code to reset RX flags in case of
>>> error to try to solve this glitch, didn't you?
>>>
>>> Anyway I will try your patch tomorrow.
>>>
>>> diff --git a/drivers/net/can/mcp251x.c b/drivers/net/can/mcp251x.c index
>>> 177f2e1..e8c5e37 100644
>>> --- a/drivers/net/can/mcp251x.c
>>> +++ b/drivers/net/can/mcp251x.c
>>> @@ -773,8 +773,9 @@ static irqreturn_t mcp251x_can_ist(int irq, void
>>> *dev_id)
>>>        struct mcp251x_priv *priv = dev_id;
>>>        struct spi_device *spi = priv->spi;
>>>        struct net_device *net = priv->net;
>>> +       u8 intf;
>>>  #ifdef STATS
>>> -       static int irqs, runs, ov0, ov1, rx0, rx1;
>>> +       static int irqs, runs;
>>>        static int ee[8], ii[8];
>>>        static unsigned long last;
>>>        unsigned long now;
>>> @@ -788,7 +789,7 @@ static irqreturn_t mcp251x_can_ist(int irq, void
>>> *dev_id)
>>>                now = jiffies;
>>>                if (now>= last + HZ) {
>>>                        last = now;
>>> -                       printk("%lu: i %d r %d r0 %d r1 %d o0 %d o1 %d
>>> err", now, irqs,
>>> runs, rx0, rx1, ov0, ov1);
>>> +                       printk("%lu: i %d r %d err", now, irqs, runs);
>>>                        for(i = 0; i<  8; i++) {
>>>                                printk(",%d", ee[i]);
>>>                                ee[i] = 0;
>>> @@ -799,24 +800,20 @@ static irqreturn_t mcp251x_can_ist(int irq, void
>>> *dev_id)
>>>                                ii[i] = 0;
>>>                        }
>>>                        printk("\n");
>>> -                       irqs = runs = rx0 = rx1 = ov0 = ov1 = 0;
>>> +                       irqs = runs = 0;
>>>                }
>>>        }
>>>        irqs += 1;
>>>  #endif
>>>
>>>        mutex_lock(&priv->mcp_lock);
>>> -       while (!priv->force_quit) {
>>> +       intf = mcp251x_read_reg(spi, CANINTF);
>>> +       do {
>>>                enum can_state new_state;
>>> -               u8 intf = mcp251x_read_reg(spi, CANINTF);
>>> -               u8 eflag;
>>> +               u8 eflag = 0;
>>>                int can_id = 0, data1 = 0;
>>>  #ifdef STATS
>>>                runs += 1;
>>> -               if (intf&  CANINTF_RX0IF)
>>> -                       rx0 += 1;
>>> -               if (intf&  CANINTF_RX1IF)
>>> -                       rx1 += 1;
>>>                {
>>>                        int i;
>>>                        for(i = 0; i<  8; i++) {
>>> @@ -828,25 +825,31 @@ static irqreturn_t mcp251x_can_ist(int irq, void
>>> *dev_id)
>>>
>>>                if (intf&  CANINTF_RX0IF) {
>>>                        mcp251x_hw_rx(spi, 0);
>>> -                       /* Free one buffer ASAP */
>>> -                       mcp251x_write_bits(spi, CANINTF, intf&
>>>  CANINTF_RX0IF,
>>> -                                          0x00);
>>> +
>>> +                       /* check that a packet didn't come while emptying
>>> RX0 */
>>> +                       if ((intf&  CANINTF_RX1IF) == 0)
>>> +                               intf |= mcp251x_read_reg(spi, CANINTF);
>>>                }
>>>
>>>                if (intf&  CANINTF_RX1IF)
>>>                        mcp251x_hw_rx(spi, 1);
>>>
>>> +               /* This is needed to close a race condition that causes
>>> +                * packet reordening: we must assure that we never empty
>>> +                * RX0 but not RX1. If this happens the next packet will
>>> +                * land in RX0. As a conseuqence on the next do { } while
>>> +                * spin we will have packets out of order.
>>> +                */
>>> +               intf |= CANINTF_RX1IF;
>>> +
>>>                mcp251x_write_bits(spi, CANINTF, intf, 0x00);
>>>
>>> -               eflag = mcp251x_read_reg(spi, EFLG);
>>> -               mcp251x_write_reg(spi, EFLG, 0x00);
>>> +               if (intf&&  CANINTF_ERRIF) {
>>> +                       eflag = mcp251x_read_reg(spi, EFLG);
>>> +                       mcp251x_write_reg(spi, EFLG, 0x00);
>>> +               }
>>>
>>>  #ifdef STATS
>>> -               if (eflag&  EFLG_RX0OVR)
>>> -                       ov0 += 1;
>>> -
>>> -               if (eflag&  EFLG_RX1OVR)
>>> -                       ov1 += 1;
>>>                {
>>>                        int i;
>>>                        for(i = 0; i<  8; i++) {
>>> @@ -924,9 +927,6 @@ static irqreturn_t mcp251x_can_ist(int irq, void
>>> *dev_id)
>>>                        }
>>>                }
>>>
>>> -               if (intf == 0)
>>> -                       break;
>>> -
>>>                if (intf&  (CANINTF_TX2IF | CANINTF_TX1IF |
>>> CANINTF_TX0IF)) {
>>>                        net->stats.tx_packets++;
>>>                        net->stats.tx_bytes += priv->tx_len - 1; @@ -936,7
>>> +936,8 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
>>>                        }
>>>                        netif_wake_queue(net);
>>>                }
>>> -       }
>>> +               intf = mcp251x_read_reg(spi, CANINTF);
>>> +       } while (!priv->force_quit&&  intf);
>>>        mutex_unlock(&priv->mcp_lock);
>>>        return IRQ_HANDLED;
>>>  }
>>>
>>>
>>>
>>> On Sat, Aug 28, 2010 at 2:20 PM, Benny B. Simonsen<[email protected]>
>>>  wrote:
>>>>
>>>> 2010/8/6 christian pellegrin<[email protected]>
>>>>>
>>>>> On Fri, Aug 6, 2010 at 3:56 PM, Wolfgang Grandegger
>>>>> <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> To be clear, out-of-order messages are also not OK.
>>>>>
>>>>> Yes I was looking at this with some interleave analysis. I think the
>>>>> problem lies in the following race:
>>>>>
>>>>> // suppose we come here with RXB0 filled with packet X
>>>>>                u8 intf = mcp251x_read_reg(spi, CANINTF);
>>>>>                u8 eflag;
>>>>>                int can_id = 0, data1 = 0;
>>>>>
>>>>> // packet X+1 arrives and lands in RXB1
>>>>>
>>>>>                if (intf&  CANINTF_RX0IF) {
>>>>>                        mcp251x_hw_rx(spi, 0);
>>>>>                        /* Free one buffer ASAP */
>>>>>                        mcp251x_write_bits(spi, CANINTF, intf&
>>>>> CANINTF_RX0IF,
>>>>>                                           0x00);
>>>>>                }
>>>>>
>>>>> // packet X+2 arrives and lands in RXB0 // unfortunately variable
>>>>> intf is old and doesn't show that we have something in RXB1 // (*1)
>>>>>                if (intf&  CANINTF_RX1IF)
>>>>>                        mcp251x_hw_rx(spi, 1);
>>>>>
>>>>> now on the next turn we will have packet X+2 in RXB0 and X+1 in RXB1
>>>>> so we read them out-of-order. The easiest way to solve the problem is
>>>>> to take out the line:
>>>>>
>>>>>                        /* Free one buffer ASAP */
>>>>>                        mcp251x_write_bits(spi, CANINTF, intf&
>>>>> CANINTF_RX0IF,
>>>>>                                           0x00);
>>>>>
>>>>> but we risk to lose more packets. Perhaps also rereading intf in (*1)
>>>>> could work but we must be careful when we reset CANINTF register to
>>>>> not lose an interrupt for RXB0. I will do some tests but after next
>>>>> week because I don't have the access to the hardware until then.
>>>>>
>>>>> --
>>>>> Christian Pellegrin, see http://www.evolware.org/chri/ "Real
>>>>> Programmers don't play tennis, or any other sport which requires you
>>>>> to change clothes. Mountain climbing is OK, and Real Programmers wear
>>>>> their climbing boots to work in case a mountain should suddenly
>>>>> spring up in the middle of the computer room."
>>>>
>>>> Hi Christian,
>>>> Did you have a chance to look on the problem and / or the patch I send
>>>> to the list?
>>>> BR
>>>> Benny
>>>
>>>
>>> --
>>> Christian Pellegrin, see http://www.evolware.org/chri/ "Real Programmers
>>> don't play tennis, or any other sport which requires you to change clothes.
>>> Mountain climbing is OK, and Real Programmers wear their climbing boots to
>>> work in case a mountain should suddenly spring up in the middle of the
>>> computer room."
>>> _______________________________________________
>>> Socketcan-users mailing list
>>> [email protected]
>>> https://lists.berlios.de/mailman/listinfo/socketcan-users
>>> _______________________________________________
>>> Socketcan-users mailing list
>>> [email protected]
>>> https://lists.berlios.de/mailman/listinfo/socketcan-users
>>>
>> _______________________________________________
>> Socketcan-users mailing list
>> [email protected]
>> https://lists.berlios.de/mailman/listinfo/socketcan-users
>
> Hi Erik,
>
> The 0x04 ID message is the way SocketCAN indicates overrun. Don't know where
> the different error codes is described (saw it in the driver)
>
> BR
> Benny
>
_______________________________________________
Socketcan-users mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-users

Reply via email to