On Wed, Feb 24, 2010 at 08:08:15PM +0100, Marc Kleine-Budde wrote:
> Ira W. Snyder wrote:
> 
> [...]
> 
> >>>> But we want so solve the TX problem. You only have to stop the
> >>>> netif_queue if you TX ring is full. Then you have to deal with calling
> >>>> netif_wake_queue() eventually. But we obviously don't want to poll.
> >>>>
> >>>> The solution might be to look at the TX queue in the interrupt handler
> >>>> and/or the NAPI loop. And now we get back to my first question. If the
> >>>> controller does a loopback in hardware, i.e. you receive a frame for
> >>>> each frame send, you don't need to poll the TX queue length.
> >>>>
> >>>> If you receive a CAN frame it might be on of your's, which means the TX
> >>>> queue might have at least room for one frame. In pseudo code it might
> >>>> look like this:
> >>>>
> >>>> xmit()
> >>>> {
> >>>>  send_frame_to_hardware();
> >>>>  if (tx_ring_full) {
> >>>>          netif_stop_queue();
> >>>>          priv->flags |= TX_RING_FULL;
> >>>>  }
> >>>> }
> >>>>
> >>>> irq() and netif_poll()
> >>>> {
> >>>>  if (priv->flags & TX_RING_FULL) {
> >>>>          if (room_in_ring()) {
> >>>>                  priv->flags &= ~TX_RING_FULL;
> >>>>                  netif_wake_queue();
> >>>>          }
> >>>>  }
> >>>> }
> >>>>
> >>>> Should be implemented race-free, i.e. you might have to use atomic_*
> >>>> operations and/or don't use a shared flag variable.
> >>> Cool, that should work. netif_queue_stopped() could be used instead of
> >>> the TX_RING_FULL flag.
> >> right...good point
> >>
> >> I should have looked at my own driver :)
> >>
> >> If room_in_ring() is cheap, it boils down to:
> >>
> >>>> irq() and netif_poll()
> >>>> {
> >>>>  if (room_in_ring()) {
> >>>>          netif_wake_queue();
> >>>> }
> >> Otherwise:
> >>
> >>>> irq() and netif_poll()
> >>>> {
> >>>>  if (netif_queue_stopped() && room_in_ring()) {
> >>>>          netif_wake_queue();
> >>>> }
> > 
> > Hmm, this *almost* works. Running "cangen -g 0 can0" quickly gets the
> > queue into a hung state, though.
> 
> what is a "hung state"?
> 
> > I added printk() to the sites where I stop/wake the queue. It is always
> > woken up correctly, but cangen always exits with "write: No buffer space
> > available" after a few stop/wake cycles. Immediately running cangen
> > again works for a few more stop/wake cycles.
> 
> let's look the the cangen source if there's propper error handling...
> http://svn.berlios.de/viewvc/socketcan/trunk/can-utils/cangen.c?revision=787&view=markup
> nope, there isn't
> 
> There's the option "-i" to ignore ENOBUFS, which is....we don't want to
> do that....
> 
> Better improbe the source: (from the pengutronix' canutils...)
> 
> again:
>         len = write(s, &frame, sizeof(frame));
>         if (len == -1) {
>                 switch (errno) {
>                 case ENOBUFS: {
>                         int err;
>                         struct pollfd fds[] = {
>                                 {
>                                         .fd     = s,
>                                         .events = POLLOUT,
>                                 },
>                         };
> 
>                         err = poll(fds, 1, 1000);
>                         if (err == -1 && errno != -EINTR) {
>                                 perror("poll()");
>                                 exit(EXIT_FAILURE);
>                         } else if (err == 0) {
>                               printf("a timeout occured\n");
>                               exit(EXIT_FAILURE);
>                       }
>                 }
>                 case EINTR:     /* fallthrough */
>                         goto again;
>                 default:
>                         perror("write");
>                         exit(EXIT_FAILURE);
>                 }
>         }
> 
> If you hit a timeout, which means you cannot send packages for 1000 ms,
> your send queue probably didn't wake up...
> 

Is there any chance you can send me the pengutronix version of
cangen/canutils? I can't find it on their website.

> > 
> > Here is the output of ifconfig:
> > can0      Link encap:UNSPEC  HWaddr 
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
> >           UP RUNNING NOARP  MTU:16  Metric:1
> >           RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:1902 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:10 
> >           RX bytes:10767 (10.5 KiB)  TX bytes:10767 (10.5 KiB)
> >           Interrupt:22 
> > 
> > can1      Link encap:UNSPEC  HWaddr 
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
> >           UP RUNNING NOARP  MTU:16  Metric:1
> >           RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:10 
> >           RX bytes:10767 (10.5 KiB)  TX bytes:0 (0.0 b)
> >           Interrupt:22 
> > 
> > I'm stopping the queue correctly: the xmit() routine never complains that it
> > tried to transmit but had no buffer space. You'll see that we didn't drop 
> > any
> > packets in this run. In another run, we did drop packets. It is like the
> > firmware didn't echo the packets back to me. Also note that the TX of
> > can0 always matches the RX of can1. They're cabled together, so this is
> > expected.
> 
> Hmm...it looks like this.
> 
> Double check your RX path, so that you don't loose any frames in
> software. Is it possible that the RX and TX collide in your driver
> and/or in hardware?
> 

I'm pretty sure that I do not lose RX frames in my driver. When I do,
stats->rx_dropped is always incremented. I do get the following warning
from lockdep, though. Sorry about the length. The best I can get from
the output is that something in the networking open() code is broken (or
needs a special lockdep annotation).

[  134.540375] janz-cmodio 0000:01:0e.0: PCI->APIC IRQ transform: INT A -> IRQ 
22
[  134.547629] janz-cmodio 0000:01:0e.0: setting latency timer to 64
[  134.684269] janz-ican3 janz-ican3.0: module 0: registered CAN device
[  134.807929] janz-ican3 janz-ican3.1: module 1: registered CAN device
[  175.361588] janz-ican3 janz-ican3.0: ican3_xmit: stopping queue
[  175.373104] janz-ican3 janz-ican3.0: ican3_recv_skb: waking queue
[  175.379556] 
[  175.379559] ======================================================
[  175.383277] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
[  175.383277] 2.6.33-rc7-00000-rc7 #3
[  175.383277] ------------------------------------------------------
[  175.383277] events/0/5 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire:
[  175.383277]  (clock-AF_CAN){++.-..}, at: [<c11920e7>] 
sock_def_write_space+0x15/0x8d
[  175.383277] 
[  175.383277] and this task is already holding:
[  175.383277]  (&(&mod->lock)->rlock){-.-...}, at: [<f898df29>] 
ican3_xmit+0x48/0x277 [janz_ican3]
[  175.383277] which would create a new lock dependency:
[  175.383277]  (&(&mod->lock)->rlock){-.-...} -> (clock-AF_CAN){++.-..}
[  175.383277] 
[  175.383277] but this new dependency connects a HARDIRQ-irq-safe lock:
[  175.383277]  (&(&mod->lock)->rlock){-.-...}
[  175.383277] ... which became HARDIRQ-irq-safe at:
[  175.383277]   [<c103d47f>] __lock_acquire+0x4ce/0x756
[  175.383277]   [<c103e209>] lock_acquire+0x45/0x5c
[  175.383277]   [<c12037a7>] _raw_spin_lock_irqsave+0x24/0x34
[  175.383277]   [<f898d3c9>] ican3_irq+0x41/0x164 [janz_ican3]
[  175.383277]   [<c10463ab>] handle_IRQ_event+0x1d/0x9a
[  175.383277]   [<c1047872>] handle_fasteoi_irq+0x9a/0xa7
[  175.383277] 
[  175.383277] to a HARDIRQ-irq-unsafe lock:
[  175.383277]  (clock-AF_CAN){++.-..}
[  175.383277] ... which became HARDIRQ-irq-unsafe at:
[  175.383277] ...  [<c103d500>] __lock_acquire+0x54f/0x756
[  175.383277]   [<c103e209>] lock_acquire+0x45/0x5c
[  175.383277]   [<c12038da>] _raw_write_lock_bh+0x20/0x2f
[  175.383277]   [<c11e3be7>] raw_release+0xf9/0x14c
[  175.383277]   [<c118f55c>] sock_release+0x14/0x58
[  175.383277]   [<c118fc4c>] sock_close+0x1c/0x20
[  175.383277]   [<c106fc41>] __fput+0xeb/0x187
[  175.383277]   [<c106fe8d>] fput+0x16/0x18
[  175.383277]   [<c106d873>] filp_close+0x51/0x5b
[  175.383277]   [<c106e9c3>] sys_close+0x67/0xa0
[  175.383277]   [<c1002657>] sysenter_do_call+0x12/0x36
[  175.383277] 
[  175.383277] other info that might help us debug this:
[  175.383277] 
[  175.383277] 4 locks held by events/0/5:
[  175.383277]  #0:  (events){+.+.+.}, at: [<c102ce51>] worker_thread+0xc9/0x1c9
[  175.383277]  #1:  ((&mod->work)){+.+...}, at: [<c102ce51>] 
worker_thread+0xc9/0x1c9
[  175.383277]  #2:  (_xmit_NONE#2){+.-...}, at: [<c11a693a>] 
sch_direct_xmit+0x2d/0x113
[  175.383277]  #3:  (&(&mod->lock)->rlock){-.-...}, at: [<f898df29>] 
ican3_xmit+0x48/0x277 [janz_ican3]
[  175.383277] 
[  175.383277] the dependencies between HARDIRQ-irq-safe lock and the holding 
lock:
[  175.383277] -> (&(&mod->lock)->rlock){-.-...} ops: 0 {
[  175.383277]    IN-HARDIRQ-W at:
[  175.383277]                                        [<c103d47f>] 
__lock_acquire+0x4ce/0x756
[  175.383277]                                        [<c103e209>] 
lock_acquire+0x45/0x5c
[  175.383277]                                        [<c12037a7>] 
_raw_spin_lock_irqsave+0x24/0x34
[  175.383277]                                        [<f898d3c9>] 
ican3_irq+0x41/0x164 [janz_ican3]
[  175.383277]                                        [<c10463ab>] 
handle_IRQ_event+0x1d/0x9a
[  175.383277]                                        [<c1047872>] 
handle_fasteoi_irq+0x9a/0xa7
[  175.383277]    IN-SOFTIRQ-W at:
[  175.383277]                                        [<c103d4a3>] 
__lock_acquire+0x4f2/0x756
[  175.383277]                                        [<c103e209>] 
lock_acquire+0x45/0x5c
[  175.383277]                                        [<c12037a7>] 
_raw_spin_lock_irqsave+0x24/0x34
[  175.383277]                                        [<f898d3c9>] 
ican3_irq+0x41/0x164 [janz_ican3]
[  175.383277]                                        [<c10463ab>] 
handle_IRQ_event+0x1d/0x9a
[  175.383277]                                        [<c1047872>] 
handle_fasteoi_irq+0x9a/0xa7
[  175.383277]    INITIAL USE at:
[  175.383277]                                       [<c103d571>] 
__lock_acquire+0x5c0/0x756
[  175.383277]                                       [<c103e209>] 
lock_acquire+0x45/0x5c
[  175.383277]                                       [<c12037a7>] 
_raw_spin_lock_irqsave+0x24/0x34
[  175.383277]                                       [<f898d06e>] 
ican3_send_msg+0x1c/0x1e6 [janz_ican3]
[  175.383277]                                       [<f898e64d>] 
ican3_probe+0x421/0xc91 [janz_ican3]
[  175.383277]                                       [<c1140dcd>] 
platform_drv_probe+0xc/0xe
[  175.383277]                                       [<c114022c>] 
driver_probe_device+0x7d/0xf0
[  175.383277]                                       [<c1140421>] 
__device_attach+0x28/0x30
[  175.383277]                                       [<c113f721>] 
bus_for_each_drv+0x39/0x62
[  175.383277]                                       [<c1140362>] 
device_attach+0x45/0x59
[  175.383277]                                       [<c113f6d3>] 
bus_probe_device+0x1f/0x34
[  175.383277]                                       [<c113e618>] 
device_add+0x2f5/0x43b
[  175.383277]                                       [<c11410d1>] 
platform_device_add+0xd9/0x119
[  175.383277]                                       [<c1141126>] 
platform_device_register+0x15/0x18
[  175.383277]                                       [<f89a92df>] 
cmodio_pci_probe+0x228/0x2cd [janz_cmodio]
[  175.383277]                                       [<c10f5100>] 
local_pci_probe+0xe/0x10
[  175.383277]                                       [<c10f52cf>] 
pci_device_probe+0x48/0x66
[  175.383277]                                       [<c114022c>] 
driver_probe_device+0x7d/0xf0
[  175.383277]                                       [<c11402e2>] 
__driver_attach+0x43/0x5f
[  175.383277]                                       [<c113f950>] 
bus_for_each_dev+0x39/0x5a
[  175.383277]                                       [<c11400ff>] 
driver_attach+0x14/0x16
[  175.383277]                                       [<c113fdd7>] 
bus_add_driver+0xa2/0x1c8
[  175.383277]                                       [<c11405ab>] 
driver_register+0x7b/0xd7
[  175.383277]                                       [<c10f54c5>] 
__pci_register_driver+0x4d/0xa0
[  175.383277]                                       [<f89ac017>] 0xf89ac017
[  175.383277]                                       [<c100104b>] 
do_one_initcall+0x4b/0x135
[  175.383277]                                       [<c104475e>] 
sys_init_module+0xa9/0x1e3
[  175.383277]                                       [<c1002657>] 
sysenter_do_call+0x12/0x36
[  175.383277]  }
[  175.383277]  ... key      at: [<f898f61c>] __key.26093+0x0/0xfffff8a1 
[janz_ican3]
[  175.383277]  ... acquired at:
[  175.383277]    [<c103c497>] check_irq_usage+0x50/0xad
[  175.383277]    [<c103caf8>] validate_chain+0x604/0xabd
[  175.383277]    [<c103d69c>] __lock_acquire+0x6eb/0x756
[  175.383277]    [<c103e209>] lock_acquire+0x45/0x5c
[  175.383277]    [<c1203adb>] _raw_read_lock+0x1b/0x2a
[  175.383277]    [<c11920e7>] sock_def_write_space+0x15/0x8d
[  175.383277]    [<c11918d3>] sock_wfree+0x25/0x41
[  175.383277]    [<c1193503>] skb_release_head_state+0x43/0x46
[  175.383277]    [<c11945c0>] skb_release_all+0xb/0x15
[  175.383277]    [<c1193e41>] __kfree_skb+0xb/0x69
[  175.383277]    [<c1193ef1>] kfree_skb+0x28/0x2a
[  175.383277]    [<f898e0ef>] ican3_xmit+0x20e/0x277 [janz_ican3]
[  175.383277]    [<c119a597>] dev_hard_start_xmit+0x1c7/0x25d
[  175.383277]    [<c11a695a>] sch_direct_xmit+0x4d/0x113
[  175.383277]    [<c11a6ab0>] __qdisc_run+0x90/0xa2
[  175.383277]    [<c119c90d>] net_tx_action+0xa7/0xff
[  175.383277]    [<c102384e>] __do_softirq+0x75/0xf3
[  175.383277] 
[  175.383277] 
[  175.383277] the dependencies between the lock to be acquired and 
HARDIRQ-irq-unsafe lock:
[  175.383277] -> (clock-AF_CAN){++.-..} ops: 0 {
[  175.383277]    HARDIRQ-ON-W at:
[  175.383277]                                        [<c103d500>] 
__lock_acquire+0x54f/0x756
[  175.383277]                                        [<c103e209>] 
lock_acquire+0x45/0x5c
[  175.383277]                                        [<c12038da>] 
_raw_write_lock_bh+0x20/0x2f
[  175.383277]                                        [<c11e3be7>] 
raw_release+0xf9/0x14c
[  175.383277]                                        [<c118f55c>] 
sock_release+0x14/0x58
[  175.383277]                                        [<c118fc4c>] 
sock_close+0x1c/0x20
[  175.383277]                                        [<c106fc41>] 
__fput+0xeb/0x187
[  175.383277]                                        [<c106fe8d>] 
fput+0x16/0x18
[  175.383277]                                        [<c106d873>] 
filp_close+0x51/0x5b
[  175.383277]                                        [<c106e9c3>] 
sys_close+0x67/0xa0
[  175.383277]                                        [<c1002657>] 
sysenter_do_call+0x12/0x36
[  175.383277]    HARDIRQ-ON-R at:
[  175.383277]                                        [<c103d4d0>] 
__lock_acquire+0x51f/0x756
[  175.383277]                                        [<c103e209>] 
lock_acquire+0x45/0x5c
[  175.383277]                                        [<c1203adb>] 
_raw_read_lock+0x1b/0x2a
[  175.383277]                                        [<c119207c>] 
sock_def_readable+0x15/0x6b
[  175.383277]                                        [<c119149d>] 
sock_queue_rcv_skb+0xfc/0x115
[  175.383277]                                        [<c11e3901>] 
raw_rcv+0x4a/0x5a
[  175.383277]                                        [<c11e2491>] 
can_rcv_filter+0x7a/0x147
[  175.383277]                                        [<c11e2765>] 
can_rcv+0xa8/0xed
[  175.383277]                                        [<c1199aec>] 
netif_receive_skb+0x288/0x2bb
[  175.383277]                                        [<c119c1b5>] 
process_backlog+0x63/0x8c
[  175.383277]                                        [<c119c5e1>] 
net_rx_action+0x4b/0x104
[  175.383277]                                        [<c102384e>] 
__do_softirq+0x75/0xf3
[  175.383277]    IN-SOFTIRQ-R at:
[  175.383277]                                        [<c103d4a3>] 
__lock_acquire+0x4f2/0x756
[  175.383277]                                        [<c103e209>] 
lock_acquire+0x45/0x5c
[  175.383277]                                        [<c1203adb>] 
_raw_read_lock+0x1b/0x2a
[  175.383277]                                        [<c119207c>] 
sock_def_readable+0x15/0x6b
[  175.383277]                                        [<c119149d>] 
sock_queue_rcv_skb+0xfc/0x115
[  175.383277]                                        [<c11e3901>] 
raw_rcv+0x4a/0x5a
[  175.383277]                                        [<c11e2491>] 
can_rcv_filter+0x7a/0x147
[  175.383277]                                        [<c11e2765>] 
can_rcv+0xa8/0xed
[  175.383277]                                        [<c1199aec>] 
netif_receive_skb+0x288/0x2bb
[  175.383277]                                        [<c119c1b5>] 
process_backlog+0x63/0x8c
[  175.383277]                                        [<c119c5e1>] 
net_rx_action+0x4b/0x104
[  175.383277]                                        [<c102384e>] 
__do_softirq+0x75/0xf3
[  175.383277]    INITIAL USE at:
[  175.383277]                                       [<c103d571>] 
__lock_acquire+0x5c0/0x756
[  175.383277]                                       [<c103e209>] 
lock_acquire+0x45/0x5c
[  175.383277]                                       [<c1203adb>] 
_raw_read_lock+0x1b/0x2a
[  175.383277]                                       [<c11920e7>] 
sock_def_write_space+0x15/0x8d
[  175.383277]                                       [<c11918d3>] 
sock_wfree+0x25/0x41
[  175.383277]                                       [<c1193503>] 
skb_release_head_state+0x43/0x46
[  175.383277]                                       [<c11945c0>] 
skb_release_all+0xb/0x15
[  175.383277]                                       [<c1193e41>] 
__kfree_skb+0xb/0x69
[  175.383277]                                       [<c1193ef1>] 
kfree_skb+0x28/0x2a
[  175.383277]                                       [<f898e0ef>] 
ican3_xmit+0x20e/0x277 [janz_ican3]
[  175.383277]                                       [<c119a597>] 
dev_hard_start_xmit+0x1c7/0x25d
[  175.383277]                                       [<c11a695a>] 
sch_direct_xmit+0x4d/0x113
[  175.383277]                                       [<c119d15d>] 
dev_queue_xmit+0x22c/0x397
[  175.383277]                                       [<c11e267e>] 
can_send+0xd0/0x10f
[  175.383277]                                       [<c11e3882>] 
raw_sendmsg+0xc3/0xf8
[  175.383277]                                       [<c118df5f>] 
sock_aio_write+0xc5/0xce
[  175.383277]                                       [<c106edf8>] 
do_sync_write+0x88/0xca
[  175.383277]                                       [<c106f49d>] 
vfs_write+0x9d/0x116
[  175.383277]                                       [<c106f86a>] 
sys_write+0x3b/0x60
[  175.383277]                                       [<c1002657>] 
sysenter_do_call+0x12/0x36
[  175.383277]  }
[  175.383277]  ... key      at: [<c186ba04>] af_callback_keys+0xe8/0x128
[  175.383277]  ... acquired at:
[  175.383277]    [<c103c497>] check_irq_usage+0x50/0xad
[  175.383277]    [<c103caf8>] validate_chain+0x604/0xabd
[  175.383277]    [<c103d69c>] __lock_acquire+0x6eb/0x756
[  175.383277]    [<c103e209>] lock_acquire+0x45/0x5c
[  175.383277]    [<c1203adb>] _raw_read_lock+0x1b/0x2a
[  175.383277]    [<c11920e7>] sock_def_write_space+0x15/0x8d
[  175.383277]    [<c11918d3>] sock_wfree+0x25/0x41
[  175.383277]    [<c1193503>] skb_release_head_state+0x43/0x46
[  175.383277]    [<c11945c0>] skb_release_all+0xb/0x15
[  175.383277]    [<c1193e41>] __kfree_skb+0xb/0x69
[  175.383277]    [<c1193ef1>] kfree_skb+0x28/0x2a
[  175.383277]    [<f898e0ef>] ican3_xmit+0x20e/0x277 [janz_ican3]
[  175.383277]    [<c119a597>] dev_hard_start_xmit+0x1c7/0x25d
[  175.383277]    [<c11a695a>] sch_direct_xmit+0x4d/0x113
[  175.383277]    [<c11a6ab0>] __qdisc_run+0x90/0xa2
[  175.383277]    [<c119c90d>] net_tx_action+0xa7/0xff
[  175.383277]    [<c102384e>] __do_softirq+0x75/0xf3
[  175.383277] 


> Another improvement for the cangen utility is to exit after a certain
> number of messages. So you can match the number of generated frames with
> the rx/tx counters of the CAN cards and interrupts (cat /proc/interrupts)
> 

Yeah, that would be a good improvement too.

> > can0      Link encap:UNSPEC  HWaddr 
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
> >           UP RUNNING NOARP  MTU:16  Metric:1
> >           RX packets:461 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:608 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:10 
> >           RX bytes:2616 (2.5 KiB)  TX bytes:3434 (3.3 KiB)
> >           Interrupt:22 
> > 
> > can1      Link encap:UNSPEC  HWaddr 
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
> >           UP RUNNING NOARP  MTU:16  Metric:1
> >           RX packets:608 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:10 
> >           RX bytes:3434 (3.3 KiB)  TX bytes:0 (0.0 b)
> >           Interrupt:22 
> > 
> > Any ideas what I am doing wrong?
> 
> cheers, Marc
> 
> -- 
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
> 


_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core

Reply via email to