Public bug reported:

Binary package hint: mdadm
Ubuntu 8.04 on an Intel Xeon Quad X3220 with 8 GB of DDR2 RAM (non-ECC) and an Intel S3200SHC motherboard (extensively tested with memtest86 and the Intel EFI tools) had some problems (soft lockups) two months ago, soon after a fresh installation of Ubuntu and VMware Server 2. They seemed to be resolved by downgrading to 2.6.24-18 and blacklisting iTCO_wdt.

Now I have upgraded to 8.04.2 and moved to new disks with (md) RAID 1. While rebuilding the largest partition I got several crashes that left nothing in the logs. The last one, which was logged, is:

Mar 18 00:17:07 srv0 mdadm: Rebuild60 event detected on md device /dev/md3
Mar 18 00:29:26 srv0 kernel: [127595.907547] Bad page state in process 'swapper'
Mar 18 00:29:26 srv0 kernel: [127595.907550] page:c108bd60 flags:0x40000400 mapping:00000000 mapcount:0 count:0
Mar 18 00:29:26 srv0 kernel: [127595.907551] Trying to fix it up, but a reboot is needed
Mar 18 00:29:26 srv0 kernel: [127595.907552] Backtrace:
Mar 18 00:29:26 srv0 kernel: [127595.907781] Pid: 0, comm: swapper Tainted: GF 2.6.24-18-server #1
Mar 18 00:29:26 srv0 kernel: [127595.907794] [bad_page+0x63/0xa0] bad_page+0x63/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907805] [free_hot_cold_page+0x187/0x1a0] free_hot_cold_page+0x187/0x1a0
Mar 18 00:29:26 srv0 kernel: [127595.907810] [raid10:mempool_free+0x88/0xa0] mempool_free+0x88/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907815] [<f8839b28>] put_buf+0x78/0x90 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.907823] [<f883a1bd>] end_sync_write+0x9d/0x120 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.907831] [<f89a2f08>] ata_scsi_qc_complete+0xa8/0x410 [libata]
Mar 18 00:29:26 srv0 kernel: [127595.907847] [try_to_wake_up+0x4e/0x350] try_to_wake_up+0x4e/0x350
Mar 18 00:29:26 srv0 kernel: [127595.907855] [<f883a120>] end_sync_write+0x0/0x120 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.907863] [loop:bio_endio+0x18/0x130] bio_endio+0x18/0x30
Mar 18 00:29:26 srv0 kernel: [127595.907869] [__end_that_request_first+0x9a/0x3b0] __end_that_request_first+0x9a/0x3b0
Mar 18 00:29:26 srv0 kernel: [127595.907875] [raid10:mempool_free+0x88/0xa0] mempool_free+0x88/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907883] [raid10:mempool_free+0x88/0xa0] mempool_free+0x88/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907888] [<f8894b49>] scsi_end_request+0x29/0xe0 [scsi_mod]
Mar 18 00:29:26 srv0 kernel: [127595.907909] [<f88958b9>] scsi_io_completion+0xa9/0x3d0 [scsi_mod]
Mar 18 00:29:26 srv0 kernel: [127595.907928] [rtc_interrupt+0x9e/0x100] rtc_interrupt+0x9e/0x100
Mar 18 00:29:26 srv0 kernel: [127595.907938] [blk_done_softirq+0x60/0x70] blk_done_softirq+0x60/0x70
Mar 18 00:29:26 srv0 kernel: [127595.907942] [__do_softirq+0x82/0x110] __do_softirq+0x82/0x110
Mar 18 00:29:26 srv0 kernel: [127595.907947] [do_softirq+0x55/0x60] do_softirq+0x55/0x60
Mar 18 00:29:26 srv0 kernel: [127595.907950] [irq_exit+0x6d/0x80] irq_exit+0x6d/0x80
Mar 18 00:29:26 srv0 kernel: [127595.907952] [do_IRQ+0x40/0x70] do_IRQ+0x40/0x70
Mar 18 00:29:26 srv0 kernel: [127595.907956] [ktime_get_ts+0x1e/0x60] ktime_get_ts+0x1e/0x60
Mar 18 00:29:26 srv0 kernel: [127595.907960] [common_interrupt+0x23/0x28] common_interrupt+0x23/0x28
Mar 18 00:29:26 srv0 kernel: [127595.907966] [mwait_idle_with_hints+0x46/0x60] mwait_idle_with_hints+0x46/0x60
Mar 18 00:29:26 srv0 kernel: [127595.907970] [cpu_idle+0x73/0xd0] cpu_idle+0x73/0xd0
Mar 18 00:29:26 srv0 kernel: [127595.907976] =======================
Mar 18 00:29:26 srv0 kernel: [127595.908040] BUG: unable to handle kernel paging request at virtual address 8001086c
Mar 18 00:29:26 srv0 kernel: [127595.908125] printing eip: c01973a4 *pdpt = 000000002193e001 *pde = 0000000000000000
Mar 18 00:29:26 srv0 kernel: [127595.908217] Oops: 0000 [#1] SMP
Mar 18 00:29:26 srv0 kernel: [127595.908264] Modules linked in: vmnet parport_pc vsock(F) vmci vmmon iptable_filter ip_tables x_tables lp parport loop button psmouse serio_raw shpchp pci_hotplug e1000e evdev pcspkr ext3 jbd mbcache sr_mod cdrom sg floppy e1000 ahci libata uhci_hcd ehci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid0 multipath linear thermal processor fan fbcon tileblit font bitblit softcursor fuse sd_mod scsi_mod raid1 md_mod
Mar 18 00:29:26 srv0 kernel: [127595.908713]
Mar 18 00:29:26 srv0 kernel: [127595.908749] Pid: 10591, comm: md3_resync Tainted: GF B (2.6.24-18-server #1)
Mar 18 00:29:26 srv0 kernel: [127595.908831] EIP: 0060:[ext3:__kmalloc+0x64/0xbf0] EFLAGS: 00010086 CPU: 1
Mar 18 00:29:26 srv0 kernel: [127595.908882] EIP is at __kmalloc+0x64/0x110
Mar 18 00:29:26 srv0 kernel: [127595.908928] EAX: 00000000 EBX: 8001086c ECX: c0433260 EDX: f8839772
Mar 18 00:29:26 srv0 kernel: [127595.908982] ESI: 00000286 EDI: c0433260 EBP: c54206d4 ESP: ea5d1dec
Mar 18 00:29:26 srv0 kernel: [127595.909035] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Mar 18 00:29:26 srv0 kernel: [127595.909085] Process md3_resync (pid: 10591, ti=ea5d0000 task=e18d3700 task.ti=ea5d0000)
Mar 18 00:29:26 srv0 kernel: [127595.909141] Stack: 00031200 00000000 00019200 f7c39f80 f7c05cf8 f5dfb580 00000010 f8839772
Mar 18 00:29:26 srv0 kernel: [127595.909253] f7c39f80 00011200 f883c872 00000000 00000080 027502c5 f7c05cf8 ffffffff
Mar 18 00:29:26 srv0 kernel: [127595.909364] f7c39f80 00011210 f5dfb580 00000010 c017579d f88394b3 c13ab3c0 c13ab3c0
Mar 18 00:29:26 srv0 kernel: [127595.909472] Call Trace:
Mar 18 00:29:26 srv0 kernel: [127595.909541] [<f8839772>] r1bio_pool_alloc+0x22/0x50 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909596] [<f883c872>] r1buf_pool_alloc+0x12/0x1f4 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909652] [raid10:mempool_alloc+0x2d/0x190] mempool_alloc+0x2d/0xe0
Mar 18 00:29:26 srv0 kernel: [127595.909702] [<f88394b3>] raise_barrier+0x13/0x160 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909756] [md_mod:bio_add_page+0x37/0x190] bio_add_page+0x37/0x50
Mar 18 00:29:26 srv0 kernel: [127595.909806] [<f883b975>] sync_request+0x145/0x6d0 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909862] [<f883b830>] sync_request+0x0/0x6d0 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909915] [<f884f6db>] md_do_sync+0x7db/0xb90 [md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.909981] [<f8851ea0>] md_thread+0x0/0xe0 [md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.910036] [<f8851ec3>] md_thread+0x23/0xe0 [md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.910090] [loop:complete+0x40/0xe0] complete+0x40/0x60
Mar 18 00:29:26 srv0 kernel: [127595.910139] [<f8851ea0>] md_thread+0x0/0xe0 [md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.910194] [kthread+0x42/0x70] kthread+0x42/0x70
Mar 18 00:29:26 srv0 kernel: [127595.910241] [kthread+0x0/0x70] kthread+0x0/0x70
Mar 18 00:29:26 srv0 kernel: [127595.910288] [kernel_thread_helper+0x7/0x10] kernel_thread_helper+0x7/0x10
Mar 18 00:29:26 srv0 kernel: [127595.910341] =======================
Mar 18 00:29:26 srv0 kernel: [127595.910383] Code: 1f 84 00 00 00 00 00 89 c6 fa 0f 1f 84 00 00 00 00 00 90 64 a1 08 50 49 c0 8b 6c 87 70 8b 5d 00 85 db 0f 84 8a 00 00 00 8b 45 0c <8b> 04 83 89 45 00 89 f0 50 9d 0f 1f 84 00 00 00 00 00 66 83 7c
Mar 18 00:29:26 srv0 kernel: [127595.910754] EIP: [ext3:__kmalloc+0x64/0xbf0] __kmalloc+0x64/0x110 SS:ESP 0068:ea5d1dec
Mar 18 00:29:26 srv0 kernel: [127595.911133] ---[ end trace a56092253af6cc70 ]---
Mar 18 00:29:26 srv0 kernel: [127595.911228] rtc: lost 5 interrupts

In the last few days I also got these segfaults:

Mar 16 12:54:31 srv0 kernel: [ 927.147246] modprobe[5407]: segfault at 00000033 eip b7e9aec2 esp bffbf050 error 4
Mar 16 12:57:12 srv0 kernel: [ 1087.063274] md: bind<sda1>
<system not responding at all>
Mar 16 13:00:57 srv0 syslogd 1.5.0#1ubuntu1: restart.
There were also other sporadic modprobe segfaults, all of them following mdadm operations.

If we don't attempt an md rebuild, the system seems stable (we're currently using it as a VMware Server host).

r...@srv0:/opt# dpkg -l | grep mdadm
ii  mdadm  2.6.3+200709292116+4450e59-3ubuntu3.1

dmesg is attached.

r...@srv0:# lsmod
Module                  Size  Used by
vmnet                  46016  16
vsock                  20952  0
vmci                   53848  1 vsock
vmmon                  76048  24
iptable_filter          3840  0
ip_tables              14820  1 iptable_filter
x_tables               16132  1 ip_tables
parport_pc             36644  0
lp                     12324  0
parport                37704  2 parport_pc,lp
loop                   19076  0
psmouse                40208  0
serio_raw               7940  0
shpchp                 34452  0
pci_hotplug            30880  1 shpchp
evdev                  12928  0
button                  9232  0
e1000e                 98212  0
pcspkr                  4224  0
ext3                  136584  4
jbd                    48404  1 ext3
mbcache                 9600  1 ext3
sr_mod                 17828  0
cdrom                  37280  1 sr_mod
sg                     36496  0
floppy                 59332  0
e1000                 126656  0
ahci                   28420  9
libata                159344  1 ahci
ehci_hcd               38412  0
uhci_hcd               27152  0
usbcore               146028  3 ehci_hcd,uhci_hcd
raid10                 25856  0
raid456               129040  0
async_xor               4992  1 raid456
async_memcpy            3840  1 raid456
async_tx                9292  3 raid456,async_xor,async_memcpy
xor                    16136  2 raid456,async_xor
raid0                   9344  0
multipath               9600  0
linear                  7296  0
thermal                16796  0
processor              37000  1 thermal
fan                     5636  0
fbcon                  42656  0
tileblit                3456  1 fbcon
font                    9472  1 fbcon
bitblit                 6784  1 fbcon
softcursor              3072  1 bitblit
fuse                   50580  1
sd_mod                 30720  12
scsi_mod              151180  4 sr_mod,sg,libata,sd_mod
raid1                  25728  3
md_mod                 82068  9 raid10,raid456,raid0,multipath,linear,raid1

** Affects: mdadm (Ubuntu)
     Importance: Undecided
         Status: New

-- 
harsh crashes during mdadm rebuild [SCARING!!!]
https://bugs.launchpad.net/bugs/344748