Public bug reported:

Binary package hint: mdadm

Ubuntu 8.04 on an Intel Xeon Quad X3220, 8 GB DDR2 RAM (non-ECC),
motherboard Intel S3200SHC (extensively tested with memtest86 and Intel
EFI tools)

had some problems (soft lockups) two months ago, soon after a fresh
installation of Ubuntu and VMware Server 2; they seemed to be resolved by
downgrading to kernel 2.6.24-18 and blacklisting iTCO_wdt

now I've upgraded to 8.04.2 and moved to new disks with (md) RAID 1

while rebuilding the largest partition, I got several crashes that left no log

the last one, which was logged, is

Mar 18 00:17:07 srv0 mdadm: Rebuild60 event detected on md device /dev/md3
Mar 18 00:29:26 srv0 kernel: [127595.907547] Bad page state in process 'swapper'
Mar 18 00:29:26 srv0 kernel: [127595.907550] page:c108bd60 flags:0x40000400 
mapping:00000000 mapcount:0 count:0
Mar 18 00:29:26 srv0 kernel: [127595.907551] Trying to fix it up, but a reboot 
is needed
Mar 18 00:29:26 srv0 kernel: [127595.907552] Backtrace:
Mar 18 00:29:26 srv0 kernel: [127595.907781] Pid: 0, comm: swapper Tainted: GF  
     2.6.24-18-server #1
Mar 18 00:29:26 srv0 kernel: [127595.907794]  [bad_page+0x63/0xa0] 
bad_page+0x63/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907805]  [free_hot_cold_page+0x187/0x1a0] 
free_hot_cold_page+0x187/0x1a0
Mar 18 00:29:26 srv0 kernel: [127595.907810]  [raid10:mempool_free+0x88/0xa0] 
mempool_free+0x88/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907815]  [<f8839b28>] put_buf+0x78/0x90 
[raid1]
Mar 18 00:29:26 srv0 kernel: [127595.907823]  [<f883a1bd>] 
end_sync_write+0x9d/0x120 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.907831]  [<f89a2f08>] 
ata_scsi_qc_complete+0xa8/0x410 [libata]
Mar 18 00:29:26 srv0 kernel: [127595.907847]  [try_to_wake_up+0x4e/0x350] 
try_to_wake_up+0x4e/0x350
Mar 18 00:29:26 srv0 kernel: [127595.907855]  [<f883a120>] 
end_sync_write+0x0/0x120 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.907863]  [loop:bio_endio+0x18/0x130] 
bio_endio+0x18/0x30
Mar 18 00:29:26 srv0 kernel: [127595.907869]  
[__end_that_request_first+0x9a/0x3b0] __end_that_request_first+0x9a/0x3b0
Mar 18 00:29:26 srv0 kernel: [127595.907875]  [raid10:mempool_free+0x88/0xa0] 
mempool_free+0x88/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907883]  [raid10:mempool_free+0x88/0xa0] 
mempool_free+0x88/0xa0
Mar 18 00:29:26 srv0 kernel: [127595.907888]  [<f8894b49>] 
scsi_end_request+0x29/0xe0 [scsi_mod]
Mar 18 00:29:26 srv0 kernel: [127595.907909]  [<f88958b9>] 
scsi_io_completion+0xa9/0x3d0 [scsi_mod]
Mar 18 00:29:26 srv0 kernel: [127595.907928]  [rtc_interrupt+0x9e/0x100] 
rtc_interrupt+0x9e/0x100
Mar 18 00:29:26 srv0 kernel: [127595.907938]  [blk_done_softirq+0x60/0x70] 
blk_done_softirq+0x60/0x70
Mar 18 00:29:26 srv0 kernel: [127595.907942]  [__do_softirq+0x82/0x110] 
__do_softirq+0x82/0x110
Mar 18 00:29:26 srv0 kernel: [127595.907947]  [do_softirq+0x55/0x60] 
do_softirq+0x55/0x60
Mar 18 00:29:26 srv0 kernel: [127595.907950]  [irq_exit+0x6d/0x80] 
irq_exit+0x6d/0x80
Mar 18 00:29:26 srv0 kernel: [127595.907952]  [do_IRQ+0x40/0x70] 
do_IRQ+0x40/0x70
Mar 18 00:29:26 srv0 kernel: [127595.907956]  [ktime_get_ts+0x1e/0x60] 
ktime_get_ts+0x1e/0x60
Mar 18 00:29:26 srv0 kernel: [127595.907960]  [common_interrupt+0x23/0x28] 
common_interrupt+0x23/0x28
Mar 18 00:29:26 srv0 kernel: [127595.907966]  [mwait_idle_with_hints+0x46/0x60] 
mwait_idle_with_hints+0x46/0x60
Mar 18 00:29:26 srv0 kernel: [127595.907970]  [cpu_idle+0x73/0xd0] 
cpu_idle+0x73/0xd0
Mar 18 00:29:26 srv0 kernel: [127595.907976]  =======================
Mar 18 00:29:26 srv0 kernel: [127595.908040] BUG: unable to handle kernel 
paging request at virtual address 8001086c
Mar 18 00:29:26 srv0 kernel: [127595.908125] printing eip: c01973a4 *pdpt = 
000000002193e001 *pde = 0000000000000000 
Mar 18 00:29:26 srv0 kernel: [127595.908217] Oops: 0000 [#1] SMP 
Mar 18 00:29:26 srv0 kernel: [127595.908264] Modules linked in: vmnet 
parport_pc vsock(F) vmci vmmon iptable_filter ip_tables x_tables lp parport 
loop button psmouse serio_raw shpchp pci_hotplug e1000e evdev pcspkr ext3 jbd 
mbcache sr_mod cdrom sg floppy e1000 ahci libata uhci_hcd ehci_hcd usbcore 
raid10 raid456 async_xor async_memcpy async_tx xor raid0 multipath linear 
thermal processor fan fbcon tileblit font bitblit softcursor fuse sd_mod 
scsi_mod raid1 md_mod
Mar 18 00:29:26 srv0 kernel: [127595.908713] 
Mar 18 00:29:26 srv0 kernel: [127595.908749] Pid: 10591, comm: md3_resync 
Tainted: GF   B   (2.6.24-18-server #1)
Mar 18 00:29:26 srv0 kernel: [127595.908831] EIP: 
0060:[ext3:__kmalloc+0x64/0xbf0] EFLAGS: 00010086 CPU: 1
Mar 18 00:29:26 srv0 kernel: [127595.908882] EIP is at __kmalloc+0x64/0x110
Mar 18 00:29:26 srv0 kernel: [127595.908928] EAX: 00000000 EBX: 8001086c ECX: 
c0433260 EDX: f8839772
Mar 18 00:29:26 srv0 kernel: [127595.908982] ESI: 00000286 EDI: c0433260 EBP: 
c54206d4 ESP: ea5d1dec
Mar 18 00:29:26 srv0 kernel: [127595.909035]  DS: 007b ES: 007b FS: 00d8 GS: 
0000 SS: 0068
Mar 18 00:29:26 srv0 kernel: [127595.909085] Process md3_resync (pid: 10591, 
ti=ea5d0000 task=e18d3700 task.ti=ea5d0000)
Mar 18 00:29:26 srv0 kernel: [127595.909141] Stack: 00031200 00000000 00019200 
f7c39f80 f7c05cf8 f5dfb580 00000010 f8839772 
Mar 18 00:29:26 srv0 kernel: [127595.909253]        f7c39f80 00011200 f883c872 
00000000 00000080 027502c5 f7c05cf8 ffffffff 
Mar 18 00:29:26 srv0 kernel: [127595.909364]        f7c39f80 00011210 f5dfb580 
00000010 c017579d f88394b3 c13ab3c0 c13ab3c0 
Mar 18 00:29:26 srv0 kernel: [127595.909472] Call Trace:
Mar 18 00:29:26 srv0 kernel: [127595.909541]  [<f8839772>] 
r1bio_pool_alloc+0x22/0x50 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909596]  [<f883c872>] 
r1buf_pool_alloc+0x12/0x1f4 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909652]  [raid10:mempool_alloc+0x2d/0x190] 
mempool_alloc+0x2d/0xe0
Mar 18 00:29:26 srv0 kernel: [127595.909702]  [<f88394b3>] 
raise_barrier+0x13/0x160 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909756]  [md_mod:bio_add_page+0x37/0x190] 
bio_add_page+0x37/0x50
Mar 18 00:29:26 srv0 kernel: [127595.909806]  [<f883b975>] 
sync_request+0x145/0x6d0 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909862]  [<f883b830>] 
sync_request+0x0/0x6d0 [raid1]
Mar 18 00:29:26 srv0 kernel: [127595.909915]  [<f884f6db>] 
md_do_sync+0x7db/0xb90 [md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.909981]  [<f8851ea0>] md_thread+0x0/0xe0 
[md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.910036]  [<f8851ec3>] md_thread+0x23/0xe0 
[md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.910090]  [loop:complete+0x40/0xe0] 
complete+0x40/0x60
Mar 18 00:29:26 srv0 kernel: [127595.910139]  [<f8851ea0>] md_thread+0x0/0xe0 
[md_mod]
Mar 18 00:29:26 srv0 kernel: [127595.910194]  [kthread+0x42/0x70] 
kthread+0x42/0x70
Mar 18 00:29:26 srv0 kernel: [127595.910241]  [kthread+0x0/0x70] 
kthread+0x0/0x70
Mar 18 00:29:26 srv0 kernel: [127595.910288]  [kernel_thread_helper+0x7/0x10] 
kernel_thread_helper+0x7/0x10
Mar 18 00:29:26 srv0 kernel: [127595.910341]  =======================
Mar 18 00:29:26 srv0 kernel: [127595.910383] Code: 1f 84 00 00 00 00 00 89 c6 
fa 0f 1f 84 00 00 00 00 00 90 64 a1 08 50 49 c0 8b 6c 87 70 8b 5d 00 85 db 0f 
84 8a 00 00 00 8b 45 0c <8b> 04 83 89 45 00 89 f0 50 9d 0f 1f 84 00 00 00 00 00 
66 83 7c 
Mar 18 00:29:26 srv0 kernel: [127595.910754] EIP: [ext3:__kmalloc+0x64/0xbf0] 
__kmalloc+0x64/0x110 SS:ESP 0068:ea5d1dec
Mar 18 00:29:26 srv0 kernel: [127595.911133] ---[ end trace a56092253af6cc70 
]---
Mar 18 00:29:26 srv0 kernel: [127595.911228] rtc: lost 5 interrupts
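
Because the mail client wrapped the log above, the module tag of a stack frame often ends up on the line after its address. A minimal sketch of how the trace could be unwrapped and summarised by module follows. It assumes every real log line starts with a syslog timestamp such as "Mar 18 00:29:26"; the names rejoin_wrapped and module_frames are made up for this example and are not part of mdadm or the kernel tools.

```python
import re
from collections import Counter

# Assumption: a genuine log line begins with a syslog timestamp ("Mar 18 00:29:26 ").
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")
# An address-resolved frame, e.g. "[<f8839b28>] put_buf+0x78/0x90 [raid1]".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(\w+)\]"
)


def rejoin_wrapped(text):
    """Glue continuation lines (no leading timestamp) onto the previous line."""
    joined = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not joined:
            joined.append(line.rstrip())
        else:
            joined[-1] = joined[-1] + " " + line.strip()
    return joined


def module_frames(text):
    """Count (module, function) pairs among the address-resolved frames."""
    counts = Counter()
    for line in rejoin_wrapped(text):
        for func, module in FRAME.findall(line):
            counts[(module, func)] += 1
    return counts
```

Run over the trace above, this only picks up frames printed with a raw address, which here belong to raid1, libata, scsi_mod and md_mod; frames printed in the "[symbol+offset]" form are deliberately ignored.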

in the last few days I got these segfaults

Mar 16 12:54:31 srv0 kernel: [  927.147246] modprobe[5407]: segfault at 
00000033 eip b7e9aec2 esp bffbf050 error 4
Mar 16 12:57:12 srv0 kernel: [ 1087.063274] md: bind<sda1>

<system not responding at all>

Mar 16 13:00:57 srv0 syslogd 1.5.0#1ubuntu1: restart.

and other sporadic modprobe segfaults, all of them following mdadm
operations

if we don't attempt an md rebuild, the system seems stable (we're currently
using it as a VMware Server host)

r...@srv0:/opt#  dpkg -l | grep mdadm
ii  mdadm  2.6.3+200709292116+4450e59-3ubuntu3.1

dmesg attached

r...@srv0:# lsmod
Module                  Size  Used by
vmnet                  46016  16
vsock                  20952  0
vmci                   53848  1 vsock
vmmon                  76048  24
iptable_filter          3840  0
ip_tables              14820  1 iptable_filter
x_tables               16132  1 ip_tables
parport_pc             36644  0
lp                     12324  0
parport                37704  2 parport_pc,lp
loop                   19076  0
psmouse                40208  0
serio_raw               7940  0
shpchp                 34452  0
pci_hotplug            30880  1 shpchp
evdev                  12928  0
button                  9232  0
e1000e                 98212  0
pcspkr                  4224  0
ext3                  136584  4
jbd                    48404  1 ext3
mbcache                 9600  1 ext3
sr_mod                 17828  0
cdrom                  37280  1 sr_mod
sg                     36496  0
floppy                 59332  0
e1000                 126656  0
ahci                   28420  9
libata                159344  1 ahci
ehci_hcd               38412  0
uhci_hcd               27152  0
usbcore               146028  3 ehci_hcd,uhci_hcd
raid10                 25856  0
raid456               129040  0
async_xor               4992  1 raid456
async_memcpy            3840  1 raid456
async_tx                9292  3 raid456,async_xor,async_memcpy
xor                    16136  2 raid456,async_xor
raid0                   9344  0
multipath               9600  0
linear                  7296  0
thermal                16796  0
processor              37000  1 thermal
fan                     5636  0
fbcon                  42656  0
tileblit                3456  1 fbcon
font                    9472  1 fbcon
bitblit                 6784  1 fbcon
softcursor              3072  1 bitblit
fuse                   50580  1
sd_mod                 30720  12
scsi_mod              151180  4 sr_mod,sg,libata,sd_mod
raid1                  25728  3
md_mod                 82068  9 raid10,raid456,raid0,multipath,linear,raid1

** Affects: mdadm (Ubuntu)
     Importance: Undecided
         Status: New

-- 
harsh crashes during mdadm rebuild [SCARING!!!]
https://bugs.launchpad.net/bugs/344748
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
