You have been subscribed to a public bug: Ubuntu 12.04 LTS was installed on 2 servers on a SM15000-XE system. Bidirectional iperf traffic was started on 7 nics between the 2 servers. After a few hours, following stack trace was observed on the dmesg. Sluggish network performance was observed after that
[15640.958682] sched: RT throttling activated [16924.585541] ------------[ cut here ]------------ [16924.585549] WARNING: at /build/buildd/linux-3.2.0/net/sched/sch_generic.c:255 dev_watchdog+0x25a/0x270() [16924.585551] Hardware name: Sabre2 [16924.585552] NETDEV WATCHDOG: eth6 (e1000): transmit queue 0 timed out [16924.585553] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000 [16924.585564] Pid: 0, comm: swapper/7 Not tainted 3.2.0-23-generic #36-Ubuntu [16924.585566] Call Trace: > [16924.585567] <IRQ> [<ffffffff8106712f>] warn_slowpath_common+0x7f/0xc0 > [16924.585575] [<ffffffff81067226>] warn_slowpath_fmt+0x46/0x50 > [16924.585578] [<ffffffff8155e22a>] dev_watchdog+0x25a/0x270 [16924.585581] > [<ffffffff81110b80>] ? perf_rotate_context+0x110/0x220 [16924.585585] > [<ffffffff81083030>] ? __queue_work+0x320/0x320 [16924.585587] > [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585589] > [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585592] > [<ffffffff81076086>] call_timer_fn+0x46/0x160 [16924.585594] > [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585597] > [<ffffffff810779d2>] run_timer_softirq+0x132/0x2a0 [16924.585600] > [<ffffffff810950c5>] ? ktime_get+0x65/0xe0 [16924.585604] > [<ffffffff8106e928>] __do_softirq+0xa8/0x210 [16924.585607] > [<ffffffff8101a779>] ? read_tsc+0x9/0x20 [16924.585609] [<ffffffff8109c054>] > ? tick_program_event+0x24/0x30 [16924.585613] [<ffffffff81666cec>] > call_softirq+0x1c/0x30 [16924.585616] [<ffffffff81015305>] do_so ftirq+0x65/0xa0 [16924.585619] [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0 [16924.585622] [<ffffffff8166768e>] smp_apic_timer_interrupt+0x6e/0x99 > [16924.585625] [<ffffffff8166555e>] apic_timer_interrupt+0x6e/0x80 > [16924.585626] <EOI> [<ffffffff8136b42d>] ? intel_idle+0xed/0x150 > [16924.585631] [<ffffffff8136b40f>] ? intel_idle+0xcf/0x150 [16924.585634] > [<ffffffff81504a01>] cpuidle_idle_call+0xc1/0x280 [16924.585637] > [<ffffffff8101222a>] cpu_idle+0xca/0x120 [16924.585640] [<ffffffff8163a7fa>] > start_secondary+0xd9/0xdb [16924.585642] ---[ end trace 3c74a3d373267b03 ]--- [16941.890574] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0] [16941.963686] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000 [16941.963697] CPU 0 [16941.963698] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000 [16941.963704] [16941.963706] Pid: 0, comm: swapper/0 Tainted: G W 3.2.0-23-generic #36-Ubuntu SeaMicro Sabre2/Type2 - Board Product Name1 [16941.963710] RIP: 0010:[<ffffffff81531e2f>] [<ffffffff81531e2f>] __alloc_skb+0xcf/0x240 [16941.963718] RSP: 0018:ffff88043fc03c70 EFLAGS: 00010202 [16941.963719] RAX: 0000000000000000 RBX: ffff88043fc03c60 RCX: 000000000000000a [16941.963721] RDX: ffff88040f876680 RSI: 00000000000000cc RDI: ffff880420dab478 [16941.963722] RBP: ffff88043fc03cb0 R08: 0000000000000680 R09: 0000000000000800 [16941.963724] R10: ffff880420dab500 R11: 0000000000000001 R12: ffff88043fc03be8 [16941.963725] R13: ffffffff8166555e R14: ffff88043fc03cb0 R15: ffff88040f876000 [16941.963727] FS: 0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000 [16941.963729] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [16941.963730] CR2: 00007f19afb0ad50 CR3: 0000000210999000 CR4: 00000000000406f0 [16941.963732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [16941.963734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [16941.963735] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020) [16941.963737] Stack: [16941.987830] ffff88043fc03ca0 00000632813265a4 ffff880424796090 ffff880421200000 [16941.987834] 0000000000000000 00000000000000ff ffff88040f875840 ffff8804218a7a80 [16941.987837] ffff88043fc03cd0 ffffffff81532424 00000000000000fe ffffc90001877fd0 [16941.987840] Call Trace: [16942.017153] <IRQ> [16942.042416] [<ffffffff81532424>] __netdev_alloc_skb+0x24/0x50 [16942.042423] [<ffffffffa0003861>] e1000_alloc_rx_buffers+0x291/0x4f0 [e1000] [16942.042427] [<ffffffff813265c0>] ? map_single+0x60/0x60 [16942.042430] [<ffffffffa000287a>] e1000_clean_rx_irq+0x3ba/0x4d0 [e1000] [16942.042434] [<ffffffffa0003561>] e1000_clean+0x51/0xc0 [e1000] [16942.042438] [<ffffffff81540ad4>] net_rx_action+0x134/0x290 [16942.042442] [<ffffffff8106e928>] __do_softirq+0xa8/0x210 [16942.042446] [<ffffffff810334ee>] ? eoi_ioapic_irq.isra.23+0x5e/0x70 [16942.042450] [<ffffffff81666cec>] call_softirq+0x1c/0x30 [16942.042453] [<ffffffff81015305>] do_softirq+0x65/0xa0 [16942.042456] [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0 [16942.042458] [<ffffffff816675a3>] do_IRQ+0x63/0xe0 [16942.042462] [<ffffffff8165c92e>] common_interrupt+0x6e/0x6e [16942.042463] <EOI> [16942.067708] [<ffffffff8108ed70>] ? hrtimer_try_to_cancel+0x50/0xc0 [16942.067712] [<ffffffff8109cdb6>] ? tick_nohz_restart_sched_tick+0x106/0x130 [16942.067715] [<ffffffff8109cdb2>] ? tick_nohz_restart_sched_tick+0x102/0x130 [16942.067719] [<ffffffff81012263>] cpu_idle+0x103/0x120 [16942.067723] [<ffffffff816235ae>] rest_init+0x72/0x74 [16942.067728] [<ffffffff81cfbc0d>] start_kernel+0x3ba/0x3c7 [16942.067731] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136 [16942.067734] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140 [16942.067737] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc [16942.067739] Code: df be cc 00 00 00 0f 85 48 01 00 00 40 f6 c7 02 0f 85 56 01 00 00 40 f6 c7 04 0f 85 1c 01 00 00 89 f1 31 c0 c1 e9 03 40 f6 c6 04 <f3> 48 ab 0f 85 f0 00 00 00 40 f6 c6 02 0f 85 ce 00 00 00 83 e6 [16942.301153] Call Trace: [16942.301156] <IRQ> [<ffffffff81532424>] __netdev_alloc_skb+0x24/0x50 [16942.301168] [<ffffffffa0003861>] e1000_alloc_rx_buffers+0x291/0x4f0 [e1000] [16942.301172] [<ffffffff813265c0>] ? map_single+0x60/0x60 [16942.301175] [<ffffffffa000287a>] e1000_clean_rx_irq+0x3ba/0x4d0 [e1000] [16942.301178] [<ffffffffa0003561>] e1000_clean+0x51/0xc0 [e1000] [16942.301181] [<ffffffff81540ad4>] net_rx_action+0x134/0x290 [16942.301186] [<ffffffff8106e928>] __do_softirq+0xa8/0x210 [16942.301189] [<ffffffff810334ee>] ? eoi_ioapic_irq.isra.23+0x5e/0x70 [16942.301193] [<ffffffff81666cec>] call_softirq+0x1c/0x30 [16942.301196] [<ffffffff81015305>] do_softirq+0x65/0xa0 [16942.301199] [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0 [16942.301201] [<ffffffff816675a3>] do_IRQ+0x63/0xe0 [16942.301204] [<ffffffff8165c92e>] common_interrupt+0x6e/0x6e [16942.301205] <EOI> [<ffffffff8108ed70>] ? hrtimer_try_to_cancel+0x50/0xc0 [16942.301211] [<ffffffff8109cdb6>] ? tick_nohz_restart_sched_tick+0x106/0x130 [16942.301214] [<ffffffff8109cdb2>] ? tick_nohz_restart_sched_tick+0x102/0x130 [16942.301216] [<ffffffff81012263>] cpu_idle+0x103/0x120 [16942.301220] [<ffffffff816235ae>] rest_init+0x72/0x74 [16942.301233] [<ffffffff81cfbc0d>] start_kernel+0x3ba/0x3c7 [16942.301236] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136 [16942.301239] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140 [16942.301242] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc [17102.391716] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang [17102.391717] Tx Queue <0> [17102.391718] TDH <eb> [17102.391719] TDT <d3> [17102.391719] next_to_use <d3> [17102.391720] next_to_clean <e5> [17102.391720] buffer_info[next_to_clean] [17102.391721] time_stamp <100402f9f> [17102.391722] next_to_watch <f8> [17102.391722] jiffies <100403264> [17102.391723] next_to_watch.status <0> [17134.426481] INFO: task kworker/7:2:5025 blocked for more than 120 seconds. [17134.509027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [17134.603042] kworker/7:2 D 0000000000000007 0 5025 2 0x00000000 [17134.603047] ffff880413843af0 0000000000000046 0000000000000000 0000000000000000 [17134.603061] ffff880413843fd8 ffff880413843fd8 ffff880413843fd8 0000000000013780 [17134.603066] ffff880420e75bc0 ffff880416ecdbc0 0000000000000000 7fffffffffffffff [17134.603070] Call Trace: [17134.603080] [<ffffffff8165a55f>] schedule+0x3f/0x60 [17134.603084] [<ffffffff8165aba5>] schedule_timeout+0x2a5/0x320 [17134.603088] [<ffffffff8165a39f>] wait_for_common+0xdf/0x180 [17134.603094] [<ffffffff81055fdd>] ? set_next_entity+0xad/0xd0 [17134.603099] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200 [17134.603102] [<ffffffff8165a51d>] wait_for_completion+0x1d/0x20 [17134.603108] [<ffffffff81084b53>] wait_on_cpu_work+0xd3/0xe0 [17134.603112] [<ffffffff810822d0>] ? do_work_for_cpu+0x30/0x30 [17134.603119] [<ffffffff81084c14>] wait_on_work+0xb4/0x120 [17134.603123] [<ffffffff81085643>] __cancel_work_timer+0x73/0x80 [17134.603127] [<ffffffff81085680>] cancel_work_sync+0x10/0x20 [17134.603136] [<ffffffffa0000875>] e1000_down_and_stop+0x25/0x50 [e1000] [17134.603150] [<ffffffffa0004e4f>] e1000_down+0x14f/0x200 [e1000] [17134.603155] [<ffffffffa0006ef0>] ? e1000_change_mtu+0x1c0/0x1c0 [e1000] [17134.603160] [<ffffffffa0006f5e>] e1000_reset_task+0x6e/0x90 [e1000] [17134.603162] [<ffffffff81084e2a>] process_one_work+0x11a/0x480 [17134.603165] [<ffffffff81085bd4>] worker_thread+0x164/0x370 [17134.603168] [<ffffffff81085a70>] ? manage_workers.isra.29+0x130/0x130 [17134.603170] [<ffffffff8108a42c>] kthread+0x8c/0xa0 [17134.603173] [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10 [17134.603176] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0 [17134.603178] [<ffffffff81666bf0>] ? gs_change+0x13/0x13 [17134.603184] INFO: task kworker/7:1:5691 blocked for more than 120 seconds. [17134.685722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [17134.779732] kworker/7:1 D 0000000000000007 0 5691 2 0x00000000 [17134.779738] ffff8801d0e29d10 0000000000000046 0000000000000000 0000000000000000 [17134.779745] ffff8801d0e29fd8 ffff8801d0e29fd8 ffff8801d0e29fd8 0000000000013780 [17134.779750] ffff880425775bc0 ffff88020f8d96f0 ffff880425794600 ffff880421205040 [17134.779754] Call Trace: [17134.779762] [<ffffffff8165a55f>] schedule+0x3f/0x60 [17134.779764] [<ffffffff8165b367>] __mutex_lock_slowpath+0xd7/0x150 [17134.779767] [<ffffffff8165af7a>] mutex_lock+0x2a/0x50 [17134.779786] [<ffffffffa0005d31>] e1000_watchdog+0x71/0x3d0 [e1000] [17134.779792] [<ffffffffa0005cc0>] ? e1000_update_stats+0x970/0x970 [e1000] [17134.779798] [<ffffffff81084e2a>] process_one_work+0x11a/0x480 [17134.779802] [<ffffffff81085bd4>] worker_thread+0x164/0x370 [17134.779808] [<ffffffff81085a70>] ? manage_workers.isra.29+0x130/0x130 [17134.779812] [<ffffffff8108a42c>] kthread+0x8c/0xa0 [17134.779815] [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10 [17134.779817] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0 [17134.779822] [<ffffffff81666bf0>] ? gs_change+0x13/0x13 [17146.285107] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang [17146.285108] Tx Queue <0> [17146.285109] TDH <ed> [17146.285109] TDT <37> [17146.285110] next_to_use <37> [17146.285111] next_to_clean <eb> [17146.285111] buffer_info[next_to_clean] [17146.285112] time_stamp <100405c55> [17146.285112] next_to_watch <ee> [17146.285113] jiffies <100405d53> [17146.285113] next_to_watch.status <0> [17148.215140] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang [17148.215141] Tx Queue <0> [17148.215142] TDH <a> [17148.215142] TDT <f2> [17148.215143] next_to_use <f2> [17148.215143] next_to_clean <5> [17148.215144] buffer_info[next_to_clean] [17148.215144] time_stamp <100405cb2> [17148.215145] next_to_watch <10> [17148.215145] jiffies <100405f36> [17148.215146] next_to_watch.status <0> [17223.041709] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang [17223.041710] Tx Queue <0> [17223.041711] TDH <bd> [17223.041711] TDT <20> [17223.041712] next_to_use <20> [17223.041712] next_to_clean <b4> [17223.041713] buffer_info[next_to_clean] [17223.041713] time_stamp <10040a73e> [17223.041714] next_to_watch <be> [17223.041714] jiffies <10040a866> [17223.041715] next_to_watch.status <0> [17231.053339] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang [17231.053340] Tx Queue <0> [17231.053341] TDH <f5> [17231.053341] TDT <75> [17231.053342] next_to_use <75> [17231.053342] next_to_clean <f1> [17231.053343] buffer_info[next_to_clean] [17231.053343] time_stamp <10040af0d> [17231.053344] next_to_watch <f9> [17231.053344] jiffies <10040b03b> [17231.053345] next_to_watch.status <0> [17254.591185] INFO: task kworker/7:2:5025 blocked for more than 120 seconds. [17254.673724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [17254.767714] kworker/7:2 D 0000000000000007 0 5025 2 0x00000000 INTERRUPT root@ubuntu ~# cat /proc/interrupts 100% 243KB 242.8KB/s 00:00 > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > CPU6 CPU7 > 0: 87 0 0 0 0 0 > 0 0 IR-IO-APIC-edge timer > 4: 17735 0 14263 0 30126 0 > 0 0 IR-IO-APIC-edge serial > 8: 1 0 0 0 0 0 > 0 0 IR-IO-APIC-edge rtc0 > 9: 0 0 0 0 0 0 > 0 0 IR-IO-APIC-fasteoi acpi > 16: 16604523 0 0 12120410 0 0 > 0 0 IR-IO-APIC-fasteoi ahci, ahci, ahci, eth1, eth4 > 17: 1678725 0 0 8996499 0 0 > 0 0 IR-IO-APIC-fasteoi ahci, ahci, ahci, eth0, eth6 > 18: 45881173 0 0 0 0 0 > 0 0 IR-IO-APIC-fasteoi ahci, ahci, eth3, eth7 > 19: 59172483 0 0 20836 0 0 > 0 0 IR-IO-APIC-fasteoi eth2, eth5 > 40: 0 0 0 0 0 0 > 0 0 DMAR_MSI-edge dmar0 > 41: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > 42: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > 43: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > 44: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > 45: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > 46: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > 47: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > 48: 0 0 0 0 0 0 > 0 0 IR-PCI-MSI-edge PCIe PME > NMI: 5647 157 1546 1659 1984 356 > 997 572 Non-maskable interrupts > LOC: 964914 979936 1409782 582208 1756677 1311067 > 2095033 1656838 Local timer interrupts > SPU: 0 0 0 0 0 0 > 0 0 Spurious interrupts > PMI: 5647 157 1546 1659 1984 356 > 997 572 Performance monitoring interrupts > IWI: 0 0 0 0 0 0 > 0 0 IRQ work interrupts > RES: 147237 712096 22528993 144356 10341760 488304 > 3966861 554994 Rescheduling interrupts > CAL: 174 221 212 149 223 212 > 205 218 Function call interrupts > TLB: 153 210 252 1558 299 517 > 361 682 TLB shootdowns > TRM: 0 0 0 0 0 0 > 0 0 Thermal event interrupts > THR: 0 0 0 0 0 0 > 0 0 Threshold APIC interrupts > MCE: 0 0 0 0 0 0 > 0 0 Machine check exceptions > MCP: 45 45 45 45 45 45 > 45 45 Machine check polls ** Affects: irqbalance (Ubuntu) Importance: Undecided Status: Confirmed ** Tags: bot-comment -- On running network and I/O traffic on SM15000-XE, "BUG: soft lockup -- CPU#0 stuck for 22s!" seen on dmesg https://bugs.launchpad.net/bugs/1183135 You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to irqbalance in Ubuntu. -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs