Package: linux-image-2.6.18-5-xen-686
Version: 2.6.18.dfsg.1-13etch4
Severity: grave
Justification: renders package unusable

Hi,

In the past couple of months two of my Xen dom0 servers have, after about a week of uptime, been reporting kernel errors like so:

Nov 14 18:57:58 corona kernel: swapper: page allocation failure. order:0, mode:0x20
Nov 14 18:57:58 corona kernel: [<c0140735>] __alloc_pages+0x261/0x275
Nov 14 18:57:58 corona kernel: [<c01561c2>] cache_alloc_refill+0x297/0x493
Nov 14 18:57:58 corona kernel: [<c0104a51>] hypervisor_callback+0x3d/0x48
Nov 14 18:57:58 corona kernel: [<c020007b>] handle_diacr+0x58/0xad
Nov 14 18:57:58 corona kernel: [<c0155f12>] kmem_cache_alloc+0x3b/0x54
Nov 14 18:57:58 corona kernel: [<c022e995>] alloc_skb_from_cache+0x48/0x110
Nov 14 18:57:58 corona kernel: [<c020d708>] __alloc_skb+0x6c/0x70
Nov 14 18:57:58 corona kernel: [<c0215d5b>] netif_be_start_xmit+0x118/0x3d5
Nov 14 18:57:58 corona kernel: [<c023269e>] dev_hard_start_xmit+0x19a/0x1f0
Nov 14 18:57:58 corona kernel: [<c0234020>] dev_queue_xmit+0x247/0x2e3
Nov 14 18:57:58 corona kernel: [<ee406dfe>] br_dev_queue_push_xmit+0x155/0x178 [bridge]
Nov 14 18:57:58 corona kernel: [<ee406e64>] br_forward_finish+0x43/0x45 [bridge]
Nov 14 18:57:58 corona kernel: [<ee40aae4>] br_nf_forward_finish+0xc6/0xcc [bridge]
Nov 14 18:57:58 corona kernel: [<ee40b34a>] br_nf_forward_arp+0x116/0x128 [bridge]
Nov 14 18:57:58 corona kernel: [<c0246e28>] nf_iterate+0x30/0x61
Nov 14 18:57:58 corona kernel: [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel: [<c0246f4e>] nf_hook_slow+0x3a/0x90
Nov 14 18:57:58 corona kernel: [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel: [<ee406eac>] __br_forward+0x46/0x57 [bridge]
Nov 14 18:57:58 corona kernel: [<ee406e21>] br_forward_finish+0x0/0x45 [bridge]
Nov 14 18:57:58 corona kernel: [<ee406c59>] br_flood+0x65/0x9d [bridge]
Nov 14 18:57:58 corona kernel: [<ee406e66>] __br_forward+0x0/0x57 [bridge]
Nov 14 18:57:58 corona kernel: [<ee406c9b>] br_flood_forward+0xa/0xc [bridge]
Nov 14 18:57:58 corona kernel: [<ee406e66>] __br_forward+0x0/0x57 [bridge]
Nov 14 18:57:58 corona kernel: [<ee407868>] br_handle_frame_finish+0x80/0xcf [bridge]
Nov 14 18:57:58 corona kernel: [<ee407a16>] br_handle_frame+0x15f/0x179 [bridge]
Nov 14 18:57:58 corona kernel: [<c0232231>] netif_receive_skb+0x25e/0x357
Nov 14 18:57:58 corona kernel: [<ee084130>] e1000_clean_rx_irq_ps+0x4a6/0x569 [e1000]
Nov 14 18:57:58 corona kernel: [<ee082c4c>] e1000_clean+0x69/0x136 [e1000]
Nov 14 18:57:58 corona kernel: [<c0233ce0>] net_rx_action+0x96/0x18f
Nov 14 18:57:58 corona kernel: [<c011f41e>] __do_softirq+0x5e/0xc3
Nov 14 18:57:58 corona kernel: [<c011f4bd>] do_softirq+0x3a/0x4a
Nov 14 18:57:58 corona kernel: [<c0106131>] do_IRQ+0x48/0x53
Nov 14 18:57:58 corona kernel: [<c020c1cc>] evtchn_do_upcall+0x64/0x9b
Nov 14 18:57:58 corona kernel: [<c0104a51>] hypervisor_callback+0x3d/0x48
Nov 14 18:57:58 corona kernel: [<c0107342>] raw_safe_halt+0x8c/0xaf
Nov 14 18:57:58 corona kernel: [<c0102c5f>] xen_idle+0x22/0x2e
Nov 14 18:57:58 corona kernel: [<c0102d7e>] cpu_idle+0x91/0xab
Nov 14 18:57:58 corona kernel: [<c03236fc>] start_kernel+0x378/0x37f
Nov 14 18:57:58 corona kernel: Mem-info:
Nov 14 18:57:58 corona kernel: DMA per-cpu:
Nov 14 18:57:58 corona kernel: cpu 0 hot: high 186, batch 31 used:30
Nov 14 18:57:58 corona kernel: cpu 0 cold: high 62, batch 15 used:55
Nov 14 18:57:58 corona kernel: DMA32 per-cpu: empty
Nov 14 18:57:58 corona kernel: Normal per-cpu: empty
Nov 14 18:57:58 corona kernel: HighMem per-cpu:
Nov 14 18:57:58 corona kernel: cpu 0 hot: high 90, batch 15 used:75
Nov 14 18:57:58 corona kernel: cpu 0 cold: high 30, batch 7 used:6
Nov 14 18:57:58 corona kernel: Free pages: 34404kB (33228kB HighMem)
Nov 14 18:57:58 corona kernel: Active:146620 inactive:39375 dirty:10 writeback:0 unstable:0 free:8601 slab:19722 mapped:2949 pagetables:254
Nov 14 18:57:58 corona kernel: DMA free:1176kB min:3452kB low:4312kB high:5176kB active:454452kB inactive:122052kB present:745464kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 204
Nov 14 18:57:58 corona kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 204
Nov 14 18:57:58 corona kernel: Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 1632
Nov 14 18:57:58 corona kernel: HighMem free:33228kB min:204kB low:444kB high:684kB active:132028kB inactive:35448kB present:208904kB pages_scanned:0 all_unreclaimable? no
Nov 14 18:57:58 corona kernel: lowmem_reserve[]: 0 0 0 0
Nov 14 18:57:58 corona kernel: DMA: 0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1176kB
Nov 14 18:57:58 corona kernel: DMA32: empty
Nov 14 18:57:58 corona kernel: Normal: empty
Nov 14 18:57:58 corona kernel: HighMem: 1353*4kB 1691*8kB 439*16kB 129*32kB 17*64kB 8*128kB 4*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33228kB
Nov 14 18:57:58 corona kernel: Swap cache: add 23, delete 0, find 0/0, race 0+0
Nov 14 18:57:58 corona kernel: Free swap = 1975580kB
Nov 14 18:57:58 corona kernel: Total swap = 1975672kB
Nov 14 18:57:58 corona kernel: Free swap: 1975580kB
Nov 14 18:57:58 corona kernel: 238592 pages of RAM
Nov 14 18:57:58 corona kernel: 52226 pages of HIGHMEM
Nov 14 18:57:58 corona kernel: 19812 reserved pages
Nov 14 18:57:58 corona kernel: 146572 pages shared
Nov 14 18:57:58 corona kernel: 23 pages swap cached
Nov 14 18:57:58 corona kernel: 10 pages dirty
Nov 14 18:57:58 corona kernel: 0 pages writeback
Nov 14 18:57:58 corona kernel: 2949 pages mapped
Nov 14 18:57:58 corona kernel: 19722 pages slab
Nov 14 18:57:58 corona kernel: 254 pages pagetables

This will scroll by for a few minutes during which time networking is completely frozen. The server is usable over serial console but no networking takes place at all.
Finally, after a few minutes, the server comes back to life, network-wise. This recurs every couple of hours, eventually forcing a reboot.

I don't know where to start debugging this, but it has only started happening with linux-image-2.6.18-5-xen-686. I will try downgrading to linux-image-2.6.18-4-xen-686 just to see if the problem goes away.

The dom0 is using the stock Debian Xen packages, and the dom0_mem kernel command line option was used to give dom0 1G of RAM. While the above is occurring, top does not suggest that the server is running out of RAM or swap. The usual bridged networking setup is in place.

If you need any more information I will be happy to provide it.

This was also reported in the Xen bugzilla when it last happened:

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1097

but as I've had no response to that at all I figured I'd try Debian this time :)

Cheers,
Andy

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-5-xen-686
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.18-5-xen-686 depends on:
ii  initramfs-tools      0.85h                  tools for generating an initramfs
ii  linux-modules-2.6.   2.6.18.dfsg.1-13etch4  Linux 2.6.18 modules on i686

Versions of packages linux-image-2.6.18-5-xen-686 recommends:
ii  libc6-xen            2.3.6.ds1-13etch2      GNU C Library: Shared libraries [X

-- no debconf information
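
One possible starting point for debugging is the per-zone lines in the Mem-info dump: the DMA zone reports free:1176kB against min:3452kB, so free memory in that zone is below its min watermark when the order:0 allocation fails, even though HighMem still has 33228kB free. The following is a minimal Python sketch for pulling those zones out of a saved copy of the log. The function name zones_below_min and the regular expression are illustrative assumptions, not part of any existing tool, and the sample lines are shortened copies of the ones in the dump.

```python
import re

# Matches per-zone summary lines from the Mem-info dump, e.g.
# "DMA free:1176kB min:3452kB low:4312kB ...".
# DMA32 is listed before DMA so the longer zone name is tried first.
ZONE_RE = re.compile(r"(DMA32|DMA|Normal|HighMem) free:(\d+)kB min:(\d+)kB")


def zones_below_min(log_text):
    """Return (zone, free_kb, min_kb) for zones whose free memory is under min."""
    hits = []
    for zone, free_kb, min_kb in ZONE_RE.findall(log_text):
        free_kb, min_kb = int(free_kb), int(min_kb)
        if free_kb < min_kb:
            hits.append((zone, free_kb, min_kb))
    return hits


# Shortened copies of the zone lines from the dump in this report.
sample = (
    "Nov 14 18:57:58 corona kernel: DMA free:1176kB min:3452kB low:4312kB\n"
    "Nov 14 18:57:58 corona kernel: DMA32 free:0kB min:0kB low:0kB\n"
    "Nov 14 18:57:58 corona kernel: Normal free:0kB min:0kB low:0kB\n"
    "Nov 14 18:57:58 corona kernel: HighMem free:33228kB min:204kB low:444kB\n"
)

for zone, free_kb, min_kb in zones_below_min(sample):
    print(f"{zone}: free {free_kb}kB is below min {min_kb}kB")
```

Run against a full syslog, this would list every dump in which a zone dipped under its min watermark, which may help correlate the failures with network load or uptime.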