Ira W. Snyder wrote:
> On Wed, Feb 24, 2010 at 12:52:06PM -0800, Ira W. Snyder wrote:
>> On Wed, Feb 24, 2010 at 08:41:06PM +0100, Marc Kleine-Budde wrote:
>>> Ira W. Snyder wrote:
>>>> On Wed, Feb 24, 2010 at 08:08:15PM +0100, Marc Kleine-Budde wrote:
>>>>> Ira W. Snyder wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>> But we want so solve the TX problem. You only have to stop the
>>>>>>>>> netif_queue if you TX ring is full. Then you have to deal with calling
>>>>>>>>> netif_wake_queue() eventually. But we obviously don't want to poll.
>>>>>>>>>
>>>>>>>>> The solution might be to look at the TX queue in the interrupt handler
>>>>>>>>> and/or the NAPI loop. And now we get back to my first question. If the
>>>>>>>>> controller does a loopback in hardware, i.e. you receive a frame for
>>>>>>>>> each frame send, you don't need to poll the TX queue length.
>>>>>>>>>
>>>>>>>>> If you receive a CAN frame it might be on of your's, which means the 
>>>>>>>>> TX
>>>>>>>>> queue might have at least room for one frame. In pseudo code it might
>>>>>>>>> look like this:
>>>>>>>>>
>>>>>>>>> xmit()
>>>>>>>>> {
>>>>>>>>>       send_frame_to_hardware();
>>>>>>>>>       if (tx_ring_full) {
>>>>>>>>>               netif_stop_queue();
>>>>>>>>>               priv->flags |= TX_RING_FULL;
>>>>>>>>>       }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> irq() and netif_poll()
>>>>>>>>> {
>>>>>>>>>       if (priv->flags & TX_RING_FULL) {
>>>>>>>>>               if (room_in_ring()) {
>>>>>>>>>                       priv->flags &= ~TX_RING_FULL;
>>>>>>>>>                       netif_wake_queue();
>>>>>>>>>               }
>>>>>>>>>       }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Should be implemented race-free, i.e. you might have to use atomic_*
>>>>>>>>> operations and/or don't use a shared flag variable.
>>>>>>>> Cool, that should work. netif_queue_stopped() could be used instead of
>>>>>>>> the TX_RING_FULL flag.
>>>>>>> right...good point
>>>>>>>
>>>>>>> I should have looked at my own driver :)
>>>>>>>
>>>>>>> If room_in_ring() is cheap, it boils down to:
>>>>>>>
>>>>>>>>> irq() and netif_poll()
>>>>>>>>> {
>>>>>>>>>       if (room_in_ring()) {
>>>>>>>>>               netif_wake_queue();
>>>>>>>>> }
>>>>>>> Otherwise:
>>>>>>>
>>>>>>>>> irq() and netif_poll()
>>>>>>>>> {
>>>>>>>>>       if (netif_queue_stopped() && room_in_ring()) {
>>>>>>>>>               netif_wake_queue();
>>>>>>>>> }
>>>>>> Hmm, this *almost* works. Running "cangen -g 0 can0" quickly gets the
>>>>>> queue into a hung state, though.
>>>>> what is a "hung state"?
>>>>>
>>>>>> I added printk() to the sites where I stop/wake the queue. It is always
>>>>>> woken up correctly, but cangen always exits with "write: No buffer space
>>>>>> available" after a few stop/wake cycles. Immediately running cangen
>>>>>> again works for a few more stop/wake cycles.
>>>>> let's look the the cangen source if there's propper error handling...
>>>>> http://svn.berlios.de/viewvc/socketcan/trunk/can-utils/cangen.c?revision=787&view=markup
>>>>> nope, there isn't
>>>>>
>>>>> There's the option "-i" to ignore ENOBUFS, which is....we don't want to
>>>>> do that....
>>>>>
>>>>> Better improbe the source: (from the pengutronix' canutils...)
>>>>>
>>>>> again:
>>>>>         len = write(s, &frame, sizeof(frame));
>>>>>         if (len == -1) {
>>>>>                 switch (errno) {
>>>>>                 case ENOBUFS: {
>>>>>                         int err;
>>>>>                         struct pollfd fds[] = {
>>>>>                                 {
>>>>>                                         .fd     = s,
>>>>>                                         .events = POLLOUT,
>>>>>                                 },
>>>>>                         };
>>>>>
>>>>>                         err = poll(fds, 1, 1000);
>>>>>                         if (err == -1 && errno != -EINTR) {
>>>>>                                 perror("poll()");
>>>>>                                 exit(EXIT_FAILURE);
>>>>>                         } else if (err == 0) {
>>>>>                           printf("a timeout occured\n");
>>>>>                           exit(EXIT_FAILURE);
>>>>>                   }
>>>>>                 }
>>>>>                 case EINTR:     /* fallthrough */
>>>>>                         goto again;
>>>>>                 default:
>>>>>                         perror("write");
>>>>>                         exit(EXIT_FAILURE);
>>>>>                 }
>>>>>         }
>>>>>
>>>>> If you hit a timeout, which means you cannot send packages for 1000 ms,
>>>>> your send queue probably didn't wake up...
>>>>>
>>>> Is there any chance you can send me the pengutronix version of
>>>> cangen/canutils? I can't find it on their website.
>>>
>>> http://www.pengutronix.de/software/socket-can/download/canutils/v4.0/
>>> http://www.pengutronix.de/software/libsocketcan/download/
>>>
>>> you need the libsocketcan, too
>>>
>>>>>> Here is the output of ifconfig:
>>>>>> can0      Link encap:UNSPEC  HWaddr 
>>>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
>>>>>>           UP RUNNING NOARP  MTU:16  Metric:1
>>>>>>           RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
>>>>>>           TX packets:1902 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>           collisions:0 txqueuelen:10 
>>>>>>           RX bytes:10767 (10.5 KiB)  TX bytes:10767 (10.5 KiB)
>>>>>>           Interrupt:22 
>>>>>>
>>>>>> can1      Link encap:UNSPEC  HWaddr 
>>>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
>>>>>>           UP RUNNING NOARP  MTU:16  Metric:1
>>>>>>           RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
>>>>>>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>           collisions:0 txqueuelen:10 
>>>>>>           RX bytes:10767 (10.5 KiB)  TX bytes:0 (0.0 b)
>>>>>>           Interrupt:22 
>>>>>>
>>>>>> I'm stopping the queue correctly: the xmit() routine never complains 
>>>>>> that it
>>>>>> tried to transmit but had no buffer space. You'll see that we didn't 
>>>>>> drop any
>>>>>> packets in this run. In another run, we did drop packets. It is like the
>>>>>> firmware didn't echo the packets back to me. Also note that the TX of
>>>>>> can0 always matches the RX of can1. They're cabled together, so this is
>>>>>> expected.
>>>>> Hmm...it looks like this.
>>>>>
>>>>> Double check your RX path, so that you don't loose any frames in
>>>>> software. Is it possible that the RX and TX collide in your driver
>>>>> and/or in hardware?
>>>>>
>>>> I'm pretty sure that I do not lose RX frames in my driver. When I do,
>>>> stats->rx_dropped is always incremented. I do get the following warning
>>> I mean loosing RX frames due to an programming error...
>>>
> 
> I've done some more investigation.
> 
> 1) using a tasklet instead of the shared workqueue seems to help. We get
> many less rx packets lost

Why don't you use NAPI for rx?

> 2) the hangs seem to be due to the driver losing lots of interrupts. It
> spends all of it's time in the xmit() routine, and never gets a chance
> to run the recv() routine.

You wake the tx_queue quite early in the work_queue. Try waking TX after you
have RX'ed all messages, not earlier.

> This is a problem, even using tasklet_hi_schedule(). The cansequence
> program is blasting the transmit queue so hard that we never get a
> chance to service the recv() queue. Therefore we start losing packets,
> and generating "rx overflow" error frames.

Hmmm....if the tx_queue of your hardware is full,
the xmit functions shouldn't be called, because you halted it.

> Should I just go ahead and service the recv() routine right from
> interrupt context? I know that it is usually frowned upon to do that,
> but I've seen it in other network drivers before...

cheers, Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core

Reply via email to