On Wed, Feb 24, 2010 at 08:08:15PM +0100, Marc Kleine-Budde wrote:
> Ira W. Snyder wrote:
>
> [...]
>
> >>>> But we want so solve the TX problem. You only have to stop the
> >>>> netif_queue if you TX ring is full. Then you have to deal with calling
> >>>> netif_wake_queue() eventually. But we obviously don't want to poll.
> >>>>
> >>>> The solution might be to look at the TX queue in the interrupt handler
> >>>> and/or the NAPI loop. And now we get back to my first question. If the
> >>>> controller does a loopback in hardware, i.e. you receive a frame for
> >>>> each frame send, you don't need to poll the TX queue length.
> >>>>
> >>>> If you receive a CAN frame it might be on of your's, which means the TX
> >>>> queue might have at least room for one frame. In pseudo code it might
> >>>> look like this:
> >>>>
> >>>> xmit()
> >>>> {
> >>>> send_frame_to_hardware();
> >>>> if (tx_ring_full) {
> >>>> netif_stop_queue();
> >>>> priv->flags |= TX_RING_FULL;
> >>>> }
> >>>> }
> >>>>
> >>>> irq() and netif_poll()
> >>>> {
> >>>> if (priv->flags & TX_RING_FULL) {
> >>>> if (room_in_ring()) {
> >>>> priv->flags &= ~TX_RING_FULL;
> >>>> netif_wake_queue();
> >>>> }
> >>>> }
> >>>> }
> >>>>
> >>>> Should be implemented race-free, i.e. you might have to use atomic_*
> >>>> operations and/or don't use a shared flag variable.
> >>> Cool, that should work. netif_queue_stopped() could be used instead of
> >>> the TX_RING_FULL flag.
> >> right...good point
> >>
> >> I should have looked at my own driver :)
> >>
> >> If room_in_ring() is cheap, it boils down to:
> >>
> >>>> irq() and netif_poll()
> >>>> {
> >>>> if (room_in_ring()) {
> >>>> netif_wake_queue();
> >>>> }
> >> Otherwise:
> >>
> >>>> irq() and netif_poll()
> >>>> {
> >>>> if (netif_queue_stopped() && room_in_ring()) {
> >>>> netif_wake_queue();
> >>>> }
> >
> > Hmm, this *almost* works. Running "cangen -g 0 can0" quickly gets the
> > queue into a hung state, though.
>
> what is a "hung state"?
>
> > I added printk() to the sites where I stop/wake the queue. It is always
> > woken up correctly, but cangen always exits with "write: No buffer space
> > available" after a few stop/wake cycles. Immediately running cangen
> > again works for a few more stop/wake cycles.
>
> let's look the the cangen source if there's propper error handling...
> http://svn.berlios.de/viewvc/socketcan/trunk/can-utils/cangen.c?revision=787&view=markup
> nope, there isn't
>
> There's the option "-i" to ignore ENOBUFS, which is....we don't want to
> do that....
>
> Better improbe the source: (from the pengutronix' canutils...)
>
> again:
> len = write(s, &frame, sizeof(frame));
> if (len == -1) {
> switch (errno) {
> case ENOBUFS: {
> int err;
> struct pollfd fds[] = {
> {
> .fd = s,
> .events = POLLOUT,
> },
> };
>
> err = poll(fds, 1, 1000);
> if (err == -1 && errno != -EINTR) {
> perror("poll()");
> exit(EXIT_FAILURE);
> } else if (err == 0) {
> printf("a timeout occured\n");
> exit(EXIT_FAILURE);
> }
> }
> case EINTR: /* fallthrough */
> goto again;
> default:
> perror("write");
> exit(EXIT_FAILURE);
> }
> }
>
> If you hit a timeout, which means you cannot send packages for 1000 ms,
> your send queue probably didn't wake up...
>
Is there any chance you can send me the pengutronix version of
cangen/canutils? I can't find it on their website.
> >
> > Here is the output of ifconfig:
> > can0 Link encap:UNSPEC HWaddr
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> > UP RUNNING NOARP MTU:16 Metric:1
> > RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:1902 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:10
> > RX bytes:10767 (10.5 KiB) TX bytes:10767 (10.5 KiB)
> > Interrupt:22
> >
> > can1 Link encap:UNSPEC HWaddr
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> > UP RUNNING NOARP MTU:16 Metric:1
> > RX packets:1902 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:10
> > RX bytes:10767 (10.5 KiB) TX bytes:0 (0.0 b)
> > Interrupt:22
> >
> > I'm stopping the queue correctly: the xmit() routine never complains that it
> > tried to transmit but had no buffer space. You'll see that we didn't drop
> > any
> > packets in this run. In another run, we did drop packets. It is like the
> > firmware didn't echo the packets back to me. Also note that the TX of
> > can0 always matches the RX of can1. They're cabled together, so this is
> > expected.
>
> Hmm...it looks like this.
>
> Double check your RX path, so that you don't loose any frames in
> software. Is it possible that the RX and TX collide in your driver
> and/or in hardware?
>
I'm pretty sure that I do not lose RX frames in my driver. When I do,
stats->rx_dropped is always incremented. I do get the following warning
from lockdep, though. Sorry about the length. The best I can get from
the output is that something in the networking open() code is broken (or
needs a special lockdep annotation).
[ 134.540375] janz-cmodio 0000:01:0e.0: PCI->APIC IRQ transform: INT A -> IRQ
22
[ 134.547629] janz-cmodio 0000:01:0e.0: setting latency timer to 64
[ 134.684269] janz-ican3 janz-ican3.0: module 0: registered CAN device
[ 134.807929] janz-ican3 janz-ican3.1: module 1: registered CAN device
[ 175.361588] janz-ican3 janz-ican3.0: ican3_xmit: stopping queue
[ 175.373104] janz-ican3 janz-ican3.0: ican3_recv_skb: waking queue
[ 175.379556]
[ 175.379559] ======================================================
[ 175.383277] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
[ 175.383277] 2.6.33-rc7-00000-rc7 #3
[ 175.383277] ------------------------------------------------------
[ 175.383277] events/0/5 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire:
[ 175.383277] (clock-AF_CAN){++.-..}, at: [<c11920e7>]
sock_def_write_space+0x15/0x8d
[ 175.383277]
[ 175.383277] and this task is already holding:
[ 175.383277] (&(&mod->lock)->rlock){-.-...}, at: [<f898df29>]
ican3_xmit+0x48/0x277 [janz_ican3]
[ 175.383277] which would create a new lock dependency:
[ 175.383277] (&(&mod->lock)->rlock){-.-...} -> (clock-AF_CAN){++.-..}
[ 175.383277]
[ 175.383277] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 175.383277] (&(&mod->lock)->rlock){-.-...}
[ 175.383277] ... which became HARDIRQ-irq-safe at:
[ 175.383277] [<c103d47f>] __lock_acquire+0x4ce/0x756
[ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c
[ 175.383277] [<c12037a7>] _raw_spin_lock_irqsave+0x24/0x34
[ 175.383277] [<f898d3c9>] ican3_irq+0x41/0x164 [janz_ican3]
[ 175.383277] [<c10463ab>] handle_IRQ_event+0x1d/0x9a
[ 175.383277] [<c1047872>] handle_fasteoi_irq+0x9a/0xa7
[ 175.383277]
[ 175.383277] to a HARDIRQ-irq-unsafe lock:
[ 175.383277] (clock-AF_CAN){++.-..}
[ 175.383277] ... which became HARDIRQ-irq-unsafe at:
[ 175.383277] ... [<c103d500>] __lock_acquire+0x54f/0x756
[ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c
[ 175.383277] [<c12038da>] _raw_write_lock_bh+0x20/0x2f
[ 175.383277] [<c11e3be7>] raw_release+0xf9/0x14c
[ 175.383277] [<c118f55c>] sock_release+0x14/0x58
[ 175.383277] [<c118fc4c>] sock_close+0x1c/0x20
[ 175.383277] [<c106fc41>] __fput+0xeb/0x187
[ 175.383277] [<c106fe8d>] fput+0x16/0x18
[ 175.383277] [<c106d873>] filp_close+0x51/0x5b
[ 175.383277] [<c106e9c3>] sys_close+0x67/0xa0
[ 175.383277] [<c1002657>] sysenter_do_call+0x12/0x36
[ 175.383277]
[ 175.383277] other info that might help us debug this:
[ 175.383277]
[ 175.383277] 4 locks held by events/0/5:
[ 175.383277] #0: (events){+.+.+.}, at: [<c102ce51>] worker_thread+0xc9/0x1c9
[ 175.383277] #1: ((&mod->work)){+.+...}, at: [<c102ce51>]
worker_thread+0xc9/0x1c9
[ 175.383277] #2: (_xmit_NONE#2){+.-...}, at: [<c11a693a>]
sch_direct_xmit+0x2d/0x113
[ 175.383277] #3: (&(&mod->lock)->rlock){-.-...}, at: [<f898df29>]
ican3_xmit+0x48/0x277 [janz_ican3]
[ 175.383277]
[ 175.383277] the dependencies between HARDIRQ-irq-safe lock and the holding
lock:
[ 175.383277] -> (&(&mod->lock)->rlock){-.-...} ops: 0 {
[ 175.383277] IN-HARDIRQ-W at:
[ 175.383277] [<c103d47f>]
__lock_acquire+0x4ce/0x756
[ 175.383277] [<c103e209>]
lock_acquire+0x45/0x5c
[ 175.383277] [<c12037a7>]
_raw_spin_lock_irqsave+0x24/0x34
[ 175.383277] [<f898d3c9>]
ican3_irq+0x41/0x164 [janz_ican3]
[ 175.383277] [<c10463ab>]
handle_IRQ_event+0x1d/0x9a
[ 175.383277] [<c1047872>]
handle_fasteoi_irq+0x9a/0xa7
[ 175.383277] IN-SOFTIRQ-W at:
[ 175.383277] [<c103d4a3>]
__lock_acquire+0x4f2/0x756
[ 175.383277] [<c103e209>]
lock_acquire+0x45/0x5c
[ 175.383277] [<c12037a7>]
_raw_spin_lock_irqsave+0x24/0x34
[ 175.383277] [<f898d3c9>]
ican3_irq+0x41/0x164 [janz_ican3]
[ 175.383277] [<c10463ab>]
handle_IRQ_event+0x1d/0x9a
[ 175.383277] [<c1047872>]
handle_fasteoi_irq+0x9a/0xa7
[ 175.383277] INITIAL USE at:
[ 175.383277] [<c103d571>]
__lock_acquire+0x5c0/0x756
[ 175.383277] [<c103e209>]
lock_acquire+0x45/0x5c
[ 175.383277] [<c12037a7>]
_raw_spin_lock_irqsave+0x24/0x34
[ 175.383277] [<f898d06e>]
ican3_send_msg+0x1c/0x1e6 [janz_ican3]
[ 175.383277] [<f898e64d>]
ican3_probe+0x421/0xc91 [janz_ican3]
[ 175.383277] [<c1140dcd>]
platform_drv_probe+0xc/0xe
[ 175.383277] [<c114022c>]
driver_probe_device+0x7d/0xf0
[ 175.383277] [<c1140421>]
__device_attach+0x28/0x30
[ 175.383277] [<c113f721>]
bus_for_each_drv+0x39/0x62
[ 175.383277] [<c1140362>]
device_attach+0x45/0x59
[ 175.383277] [<c113f6d3>]
bus_probe_device+0x1f/0x34
[ 175.383277] [<c113e618>]
device_add+0x2f5/0x43b
[ 175.383277] [<c11410d1>]
platform_device_add+0xd9/0x119
[ 175.383277] [<c1141126>]
platform_device_register+0x15/0x18
[ 175.383277] [<f89a92df>]
cmodio_pci_probe+0x228/0x2cd [janz_cmodio]
[ 175.383277] [<c10f5100>]
local_pci_probe+0xe/0x10
[ 175.383277] [<c10f52cf>]
pci_device_probe+0x48/0x66
[ 175.383277] [<c114022c>]
driver_probe_device+0x7d/0xf0
[ 175.383277] [<c11402e2>]
__driver_attach+0x43/0x5f
[ 175.383277] [<c113f950>]
bus_for_each_dev+0x39/0x5a
[ 175.383277] [<c11400ff>]
driver_attach+0x14/0x16
[ 175.383277] [<c113fdd7>]
bus_add_driver+0xa2/0x1c8
[ 175.383277] [<c11405ab>]
driver_register+0x7b/0xd7
[ 175.383277] [<c10f54c5>]
__pci_register_driver+0x4d/0xa0
[ 175.383277] [<f89ac017>] 0xf89ac017
[ 175.383277] [<c100104b>]
do_one_initcall+0x4b/0x135
[ 175.383277] [<c104475e>]
sys_init_module+0xa9/0x1e3
[ 175.383277] [<c1002657>]
sysenter_do_call+0x12/0x36
[ 175.383277] }
[ 175.383277] ... key at: [<f898f61c>] __key.26093+0x0/0xfffff8a1
[janz_ican3]
[ 175.383277] ... acquired at:
[ 175.383277] [<c103c497>] check_irq_usage+0x50/0xad
[ 175.383277] [<c103caf8>] validate_chain+0x604/0xabd
[ 175.383277] [<c103d69c>] __lock_acquire+0x6eb/0x756
[ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c
[ 175.383277] [<c1203adb>] _raw_read_lock+0x1b/0x2a
[ 175.383277] [<c11920e7>] sock_def_write_space+0x15/0x8d
[ 175.383277] [<c11918d3>] sock_wfree+0x25/0x41
[ 175.383277] [<c1193503>] skb_release_head_state+0x43/0x46
[ 175.383277] [<c11945c0>] skb_release_all+0xb/0x15
[ 175.383277] [<c1193e41>] __kfree_skb+0xb/0x69
[ 175.383277] [<c1193ef1>] kfree_skb+0x28/0x2a
[ 175.383277] [<f898e0ef>] ican3_xmit+0x20e/0x277 [janz_ican3]
[ 175.383277] [<c119a597>] dev_hard_start_xmit+0x1c7/0x25d
[ 175.383277] [<c11a695a>] sch_direct_xmit+0x4d/0x113
[ 175.383277] [<c11a6ab0>] __qdisc_run+0x90/0xa2
[ 175.383277] [<c119c90d>] net_tx_action+0xa7/0xff
[ 175.383277] [<c102384e>] __do_softirq+0x75/0xf3
[ 175.383277]
[ 175.383277]
[ 175.383277] the dependencies between the lock to be acquired and
HARDIRQ-irq-unsafe lock:
[ 175.383277] -> (clock-AF_CAN){++.-..} ops: 0 {
[ 175.383277] HARDIRQ-ON-W at:
[ 175.383277] [<c103d500>]
__lock_acquire+0x54f/0x756
[ 175.383277] [<c103e209>]
lock_acquire+0x45/0x5c
[ 175.383277] [<c12038da>]
_raw_write_lock_bh+0x20/0x2f
[ 175.383277] [<c11e3be7>]
raw_release+0xf9/0x14c
[ 175.383277] [<c118f55c>]
sock_release+0x14/0x58
[ 175.383277] [<c118fc4c>]
sock_close+0x1c/0x20
[ 175.383277] [<c106fc41>]
__fput+0xeb/0x187
[ 175.383277] [<c106fe8d>]
fput+0x16/0x18
[ 175.383277] [<c106d873>]
filp_close+0x51/0x5b
[ 175.383277] [<c106e9c3>]
sys_close+0x67/0xa0
[ 175.383277] [<c1002657>]
sysenter_do_call+0x12/0x36
[ 175.383277] HARDIRQ-ON-R at:
[ 175.383277] [<c103d4d0>]
__lock_acquire+0x51f/0x756
[ 175.383277] [<c103e209>]
lock_acquire+0x45/0x5c
[ 175.383277] [<c1203adb>]
_raw_read_lock+0x1b/0x2a
[ 175.383277] [<c119207c>]
sock_def_readable+0x15/0x6b
[ 175.383277] [<c119149d>]
sock_queue_rcv_skb+0xfc/0x115
[ 175.383277] [<c11e3901>]
raw_rcv+0x4a/0x5a
[ 175.383277] [<c11e2491>]
can_rcv_filter+0x7a/0x147
[ 175.383277] [<c11e2765>]
can_rcv+0xa8/0xed
[ 175.383277] [<c1199aec>]
netif_receive_skb+0x288/0x2bb
[ 175.383277] [<c119c1b5>]
process_backlog+0x63/0x8c
[ 175.383277] [<c119c5e1>]
net_rx_action+0x4b/0x104
[ 175.383277] [<c102384e>]
__do_softirq+0x75/0xf3
[ 175.383277] IN-SOFTIRQ-R at:
[ 175.383277] [<c103d4a3>]
__lock_acquire+0x4f2/0x756
[ 175.383277] [<c103e209>]
lock_acquire+0x45/0x5c
[ 175.383277] [<c1203adb>]
_raw_read_lock+0x1b/0x2a
[ 175.383277] [<c119207c>]
sock_def_readable+0x15/0x6b
[ 175.383277] [<c119149d>]
sock_queue_rcv_skb+0xfc/0x115
[ 175.383277] [<c11e3901>]
raw_rcv+0x4a/0x5a
[ 175.383277] [<c11e2491>]
can_rcv_filter+0x7a/0x147
[ 175.383277] [<c11e2765>]
can_rcv+0xa8/0xed
[ 175.383277] [<c1199aec>]
netif_receive_skb+0x288/0x2bb
[ 175.383277] [<c119c1b5>]
process_backlog+0x63/0x8c
[ 175.383277] [<c119c5e1>]
net_rx_action+0x4b/0x104
[ 175.383277] [<c102384e>]
__do_softirq+0x75/0xf3
[ 175.383277] INITIAL USE at:
[ 175.383277] [<c103d571>]
__lock_acquire+0x5c0/0x756
[ 175.383277] [<c103e209>]
lock_acquire+0x45/0x5c
[ 175.383277] [<c1203adb>]
_raw_read_lock+0x1b/0x2a
[ 175.383277] [<c11920e7>]
sock_def_write_space+0x15/0x8d
[ 175.383277] [<c11918d3>]
sock_wfree+0x25/0x41
[ 175.383277] [<c1193503>]
skb_release_head_state+0x43/0x46
[ 175.383277] [<c11945c0>]
skb_release_all+0xb/0x15
[ 175.383277] [<c1193e41>]
__kfree_skb+0xb/0x69
[ 175.383277] [<c1193ef1>]
kfree_skb+0x28/0x2a
[ 175.383277] [<f898e0ef>]
ican3_xmit+0x20e/0x277 [janz_ican3]
[ 175.383277] [<c119a597>]
dev_hard_start_xmit+0x1c7/0x25d
[ 175.383277] [<c11a695a>]
sch_direct_xmit+0x4d/0x113
[ 175.383277] [<c119d15d>]
dev_queue_xmit+0x22c/0x397
[ 175.383277] [<c11e267e>]
can_send+0xd0/0x10f
[ 175.383277] [<c11e3882>]
raw_sendmsg+0xc3/0xf8
[ 175.383277] [<c118df5f>]
sock_aio_write+0xc5/0xce
[ 175.383277] [<c106edf8>]
do_sync_write+0x88/0xca
[ 175.383277] [<c106f49d>]
vfs_write+0x9d/0x116
[ 175.383277] [<c106f86a>]
sys_write+0x3b/0x60
[ 175.383277] [<c1002657>]
sysenter_do_call+0x12/0x36
[ 175.383277] }
[ 175.383277] ... key at: [<c186ba04>] af_callback_keys+0xe8/0x128
[ 175.383277] ... acquired at:
[ 175.383277] [<c103c497>] check_irq_usage+0x50/0xad
[ 175.383277] [<c103caf8>] validate_chain+0x604/0xabd
[ 175.383277] [<c103d69c>] __lock_acquire+0x6eb/0x756
[ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c
[ 175.383277] [<c1203adb>] _raw_read_lock+0x1b/0x2a
[ 175.383277] [<c11920e7>] sock_def_write_space+0x15/0x8d
[ 175.383277] [<c11918d3>] sock_wfree+0x25/0x41
[ 175.383277] [<c1193503>] skb_release_head_state+0x43/0x46
[ 175.383277] [<c11945c0>] skb_release_all+0xb/0x15
[ 175.383277] [<c1193e41>] __kfree_skb+0xb/0x69
[ 175.383277] [<c1193ef1>] kfree_skb+0x28/0x2a
[ 175.383277] [<f898e0ef>] ican3_xmit+0x20e/0x277 [janz_ican3]
[ 175.383277] [<c119a597>] dev_hard_start_xmit+0x1c7/0x25d
[ 175.383277] [<c11a695a>] sch_direct_xmit+0x4d/0x113
[ 175.383277] [<c11a6ab0>] __qdisc_run+0x90/0xa2
[ 175.383277] [<c119c90d>] net_tx_action+0xa7/0xff
[ 175.383277] [<c102384e>] __do_softirq+0x75/0xf3
[ 175.383277]
> Another improvement for the cangen utility is to exit after a certain
> number of messages. So you can match the number of generated frames with
> the rx/tx counters of the CAN cards and interrupts (cat /proc/interrupts)
>
Yeah, that would be a good improvement too.
> > can0 Link encap:UNSPEC HWaddr
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> > UP RUNNING NOARP MTU:16 Metric:1
> > RX packets:461 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:608 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:10
> > RX bytes:2616 (2.5 KiB) TX bytes:3434 (3.3 KiB)
> > Interrupt:22
> >
> > can1 Link encap:UNSPEC HWaddr
> > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> > UP RUNNING NOARP MTU:16 Metric:1
> > RX packets:608 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:10
> > RX bytes:3434 (3.3 KiB) TX bytes:0 (0.0 b)
> > Interrupt:22
> >
> > Any ideas what I am doing wrong?
>
> cheers, Marc
>
> --
> Pengutronix e.K. | Marc Kleine-Budde |
> Industrial Linux Solutions | Phone: +49-231-2826-924 |
> Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
>
_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core