On Wed, Feb 24, 2010 at 12:52:06PM -0800, Ira W. Snyder wrote:
> On Wed, Feb 24, 2010 at 08:41:06PM +0100, Marc Kleine-Budde wrote:
> > Ira W. Snyder wrote:
> > > On Wed, Feb 24, 2010 at 08:08:15PM +0100, Marc Kleine-Budde wrote:
> > >> Ira W. Snyder wrote:
> > >>
> > >> [...]
> > >>
> > >>>>>> But we want so solve the TX problem. You only have to stop the
> > >>>>>> netif_queue if you TX ring is full. Then you have to deal with
> > >>>>>> calling
> > >>>>>> netif_wake_queue() eventually. But we obviously don't want to poll.
> > >>>>>>
> > >>>>>> The solution might be to look at the TX queue in the interrupt
> > >>>>>> handler
> > >>>>>> and/or the NAPI loop. And now we get back to my first question. If
> > >>>>>> the
> > >>>>>> controller does a loopback in hardware, i.e. you receive a frame for
> > >>>>>> each frame send, you don't need to poll the TX queue length.
> > >>>>>>
> > >>>>>> If you receive a CAN frame it might be on of your's, which means the
> > >>>>>> TX
> > >>>>>> queue might have at least room for one frame. In pseudo code it might
> > >>>>>> look like this:
> > >>>>>>
> > >>>>>> xmit()
> > >>>>>> {
> > >>>>>> send_frame_to_hardware();
> > >>>>>> if (tx_ring_full) {
> > >>>>>> netif_stop_queue();
> > >>>>>> priv->flags |= TX_RING_FULL;
> > >>>>>> }
> > >>>>>> }
> > >>>>>>
> > >>>>>> irq() and netif_poll()
> > >>>>>> {
> > >>>>>> if (priv->flags & TX_RING_FULL) {
> > >>>>>> if (room_in_ring()) {
> > >>>>>> priv->flags &= ~TX_RING_FULL;
> > >>>>>> netif_wake_queue();
> > >>>>>> }
> > >>>>>> }
> > >>>>>> }
> > >>>>>>
> > >>>>>> Should be implemented race-free, i.e. you might have to use atomic_*
> > >>>>>> operations and/or don't use a shared flag variable.
> > >>>>> Cool, that should work. netif_queue_stopped() could be used instead of
> > >>>>> the TX_RING_FULL flag.
> > >>>> right...good point
> > >>>>
> > >>>> I should have looked at my own driver :)
> > >>>>
> > >>>> If room_in_ring() is cheap, it boils down to:
> > >>>>
> > >>>>>> irq() and netif_poll()
> > >>>>>> {
> > >>>>>> if (room_in_ring()) {
> > >>>>>> netif_wake_queue();
> > >>>>>> }
> > >>>> Otherwise:
> > >>>>
> > >>>>>> irq() and netif_poll()
> > >>>>>> {
> > >>>>>> if (netif_queue_stopped() && room_in_ring()) {
> > >>>>>> netif_wake_queue();
> > >>>>>> }
> > >>> Hmm, this *almost* works. Running "cangen -g 0 can0" quickly gets the
> > >>> queue into a hung state, though.
> > >> what is a "hung state"?
> > >>
> > >>> I added printk() to the sites where I stop/wake the queue. It is always
> > >>> woken up correctly, but cangen always exits with "write: No buffer space
> > >>> available" after a few stop/wake cycles. Immediately running cangen
> > >>> again works for a few more stop/wake cycles.
> > >> let's look the the cangen source if there's propper error handling...
> > >> http://svn.berlios.de/viewvc/socketcan/trunk/can-utils/cangen.c?revision=787&view=markup
> > >> nope, there isn't
> > >>
> > >> There's the option "-i" to ignore ENOBUFS, which is....we don't want to
> > >> do that....
> > >>
> > >> Better improbe the source: (from the pengutronix' canutils...)
> > >>
> > >> again:
> > >> len = write(s, &frame, sizeof(frame));
> > >> if (len == -1) {
> > >> switch (errno) {
> > >> case ENOBUFS: {
> > >> int err;
> > >> struct pollfd fds[] = {
> > >> {
> > >> .fd = s,
> > >> .events = POLLOUT,
> > >> },
> > >> };
> > >>
> > >> err = poll(fds, 1, 1000);
> > >> if (err == -1 && errno != -EINTR) {
> > >> perror("poll()");
> > >> exit(EXIT_FAILURE);
> > >> } else if (err == 0) {
> > >> printf("a timeout occured\n");
> > >> exit(EXIT_FAILURE);
> > >> }
> > >> }
> > >> case EINTR: /* fallthrough */
> > >> goto again;
> > >> default:
> > >> perror("write");
> > >> exit(EXIT_FAILURE);
> > >> }
> > >> }
> > >>
> > >> If you hit a timeout, which means you cannot send packages for 1000 ms,
> > >> your send queue probably didn't wake up...
> > >>
> > >
> > > Is there any chance you can send me the pengutronix version of
> > > cangen/canutils? I can't find it on their website.
> >
> >
> > http://www.pengutronix.de/software/socket-can/download/canutils/v4.0/
> > http://www.pengutronix.de/software/libsocketcan/download/
> >
> > you need the libsocketcan, too
> >
> > >
> > >>> Here is the output of ifconfig:
> > >>> can0 Link encap:UNSPEC HWaddr
> > >>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> > >>> UP RUNNING NOARP MTU:16 Metric:1
> > >>> RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
> > >>> TX packets:1902 errors:0 dropped:0 overruns:0 carrier:0
> > >>> collisions:0 txqueuelen:10
> > >>> RX bytes:10767 (10.5 KiB) TX bytes:10767 (10.5 KiB)
> > >>> Interrupt:22
> > >>>
> > >>> can1 Link encap:UNSPEC HWaddr
> > >>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> > >>> UP RUNNING NOARP MTU:16 Metric:1
> > >>> RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
> > >>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > >>> collisions:0 txqueuelen:10
> > >>> RX bytes:10767 (10.5 KiB) TX bytes:0 (0.0 b)
> > >>> Interrupt:22
> > >>>
> > >>> I'm stopping the queue correctly: the xmit() routine never complains
> > >>> that it
> > >>> tried to transmit but had no buffer space. You'll see that we didn't
> > >>> drop any
> > >>> packets in this run. In another run, we did drop packets. It is like the
> > >>> firmware didn't echo the packets back to me. Also note that the TX of
> > >>> can0 always matches the RX of can1. They're cabled together, so this is
> > >>> expected.
> > >> Hmm...it looks like this.
> > >>
> > >> Double check your RX path, so that you don't loose any frames in
> > >> software. Is it possible that the RX and TX collide in your driver
> > >> and/or in hardware?
> > >>
> > >
> > > I'm pretty sure that I do not lose RX frames in my driver. When I do,
> > > stats->rx_dropped is always incremented. I do get the following warning
> >
> > I mean loosing RX frames due to an programming error...
> >
I've done some more investigation.
1) using a tasklet instead of the shared workqueue seems to help. We get
many less rx packets lost
2) the hangs seem to be due to the driver losing lots of interrupts. It
spends all of it's time in the xmit() routine, and never gets a chance
to run the recv() routine.
This is a problem, even using tasklet_hi_schedule(). The cansequence
program is blasting the transmit queue so hard that we never get a
chance to service the recv() queue. Therefore we start losing packets,
and generating "rx overflow" error frames.
Should I just go ahead and service the recv() routine right from
interrupt context? I know that it is usually frowned upon to do that,
but I've seen it in other network drivers before...
Ira
_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core