I can reproduce this issue quite easily with my setup. I'm running two amd64
KVM guests on an amd64 host system with 8GB of memory. The NFS server is
running on the host, and the guests rely on it heavily. All systems are
up to date; the kernel is 2.6.32-23.

The guests hang when they access NFS mounts heavily; it seems that write
operations are needed to trigger the hang. I first used NFSv3, then
switched to NFSv4, but that didn't help.

host export:
/srv/mmedia             172.16.0.0/16(rw,nohide,insecure,no_subtree_check,async)

guest fstab mount:
172.16.1.1:/mmedia    /mmedia   nfs4 _netdev,auto 0 0

I have had this issue since upgrading to Lucid, and never had anything
like this with Karmic, where I had exactly the same setup.

dmesg log attached, both from the host and a guest.

One way to reproduce this is to run a script on the guest that processes
(copies) image files over NFS; it hangs after processing around 20-50
files. The system load starts to increase after the script hangs; I have
seen loads well over 200. After this happens, all other processes
accessing NFS mounts hang as well. The guest cannot be rebooted cleanly;
I have to hard reset it.
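
A minimal sketch of this kind of copy loop, for anyone who wants to try
the same workload. It is a Python stand-in, not the script that was
actually run (that one is not included in this report); the function name
copy_images, the extension list and the example paths are hypothetical.
The explicit fsync is an assumption added to force the write-back that
the trace below shows happening on close.

```python
import os
import shutil


def copy_images(src_dir, dst_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Copy image files one at a time and return how many were copied.

    dst_dir is meant to be a directory on the NFS mount. Each file is
    flushed and fsync'ed before it is closed, so every copy has to wait
    for the server to acknowledge the written data.
    """
    copied = 0
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        # Skip subdirectories and anything that is not an image file.
        if not os.path.isfile(src) or not name.lower().endswith(extensions):
            continue
        dst = os.path.join(dst_dir, name)
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())
        copied += 1
    return copied
```

Example call, with a hypothetical source directory and the mount point
from the fstab line above: copy_images("/tmp/images", "/mmedia").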

syslog from the guest:
--------------------------
Jul 12 13:42:14 scotty kernel: [  360.190575] INFO: task perl:4360 blocked for more than 120 seconds.
Jul 12 13:42:14 scotty kernel: [  360.190585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 12 13:42:14 scotty kernel: [  360.190592] perl          D 0000000000000000     0  4360   4358 0x00000000
Jul 12 13:42:14 scotty kernel: [  360.190605]  ffff8800b02ffc48 0000000000000082 0000000000015bc0 0000000000015bc0
Jul 12 13:42:14 scotty kernel: [  360.190616]  ffff8800ae73c890 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c4d0
Jul 12 13:42:14 scotty kernel: [  360.190624]  0000000000015bc0 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c890
Jul 12 13:42:14 scotty kernel: [  360.190633] Call Trace:
Jul 12 13:42:14 scotty kernel: [  360.190729]  [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.190788]  [<ffffffff81541357>] io_schedule+0x47/0x70
Jul 12 13:42:14 scotty kernel: [  360.190816]  [<ffffffffa014a3be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.190824]  [<ffffffff81541bbf>] __wait_on_bit+0x5f/0x90
Jul 12 13:42:14 scotty kernel: [  360.190850]  [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.190860]  [<ffffffff81541c68>] out_of_line_wait_on_bit+0x78/0x90
Jul 12 13:42:14 scotty kernel: [  360.190905]  [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
Jul 12 13:42:14 scotty kernel: [  360.190931]  [<ffffffffa014a39f>] nfs_wait_on_request+0x2f/0x40 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.190964]  [<ffffffffa014e7df>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.190992]  [<ffffffffa014fc1e>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.191027]  [<ffffffffa0150009>] nfs_write_mapping+0x79/0xb0 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.191060]  [<ffffffff8115f7d0>] ? mntput_no_expire+0x30/0x110
Jul 12 13:42:14 scotty kernel: [  360.191087]  [<ffffffffa0150077>] nfs_wb_all+0x17/0x20 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.191109]  [<ffffffffa013ef9a>] nfs_do_fsync+0x2a/0x60 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.191131]  [<ffffffffa013f1e5>] nfs_file_flush+0x75/0xa0 [nfs]
Jul 12 13:42:14 scotty kernel: [  360.191146]  [<ffffffff8114173c>] filp_close+0x3c/0x90
Jul 12 13:42:14 scotty kernel: [  360.191153]  [<ffffffff81141847>] sys_close+0xb7/0x120
Jul 12 13:42:14 scotty kernel: [  360.191179]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
Jul 12 13:44:14 scotty kernel: [  480.190437] INFO: task perl:4360 blocked for more than 120 seconds.
Jul 12 13:44:14 scotty kernel: [  480.190446] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 12 13:44:14 scotty kernel: [  480.190453] perl          D 0000000000000000     0  4360   4358 0x00000000
Jul 12 13:44:14 scotty kernel: [  480.190466]  ffff8800b02ffc48 0000000000000082 0000000000015bc0 0000000000015bc0
Jul 12 13:44:14 scotty kernel: [  480.190477]  ffff8800ae73c890 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c4d0
Jul 12 13:44:14 scotty kernel: [  480.190486]  0000000000015bc0 ffff8800b02fffd8 0000000000015bc0 ffff8800ae73c890
Jul 12 13:44:14 scotty kernel: [  480.190495] Call Trace:
Jul 12 13:44:14 scotty kernel: [  480.190534]  [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190548]  [<ffffffff81541357>] io_schedule+0x47/0x70
Jul 12 13:44:14 scotty kernel: [  480.190582]  [<ffffffffa014a3be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190591]  [<ffffffff81541bbf>] __wait_on_bit+0x5f/0x90
Jul 12 13:44:14 scotty kernel: [  480.190617]  [<ffffffffa014a3b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190626]  [<ffffffff81541c68>] out_of_line_wait_on_bit+0x78/0x90
Jul 12 13:44:14 scotty kernel: [  480.190637]  [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
Jul 12 13:44:14 scotty kernel: [  480.190663]  [<ffffffffa014a39f>] nfs_wait_on_request+0x2f/0x40 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190690]  [<ffffffffa014e7df>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190718]  [<ffffffffa014fc1e>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190745]  [<ffffffffa0150009>] nfs_write_mapping+0x79/0xb0 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190756]  [<ffffffff8115f7d0>] ? mntput_no_expire+0x30/0x110
Jul 12 13:44:14 scotty kernel: [  480.190782]  [<ffffffffa0150077>] nfs_wb_all+0x17/0x20 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190805]  [<ffffffffa013ef9a>] nfs_do_fsync+0x2a/0x60 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190827]  [<ffffffffa013f1e5>] nfs_file_flush+0x75/0xa0 [nfs]
Jul 12 13:44:14 scotty kernel: [  480.190836]  [<ffffffff8114173c>] filp_close+0x3c/0x90
Jul 12 13:44:14 scotty kernel: [  480.190843]  [<ffffffff81141847>] sys_close+0xb7/0x120
Jul 12 13:44:14 scotty kernel: [  480.190852]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
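
For anyone comparing logs from other affected machines, here is a rough
sketch of how the hung-task warnings above could be pulled out of a
syslog file. It only assumes the message format visible in this excerpt;
the names HUNG_TASK and hung_tasks are made up for illustration, and the
pattern tolerates a line wrap in the middle of the message.

```python
import re

# Matches the kernel's hung-task warning as it appears in the excerpt.
# \s+ between words lets the pattern survive a wrapped line.
HUNG_TASK = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO:\s+task\s+(?P<name>\S+):(?P<pid>\d+)\s+"
    r"blocked\s+for\s+more\s+than\s+(?P<secs>\d+)\s+seconds"
)


def hung_tasks(log_text):
    """Return (task name, pid, uptime in seconds, timeout) per warning."""
    return [
        (m["name"], int(m["pid"]), float(m["uptime"]), int(m["secs"]))
        for m in HUNG_TASK.finditer(log_text)
    ]
```

Run over the excerpt above, this would report the same perl process
(pid 4360) twice, 120 seconds apart, which matches a task that never
leaves the D state.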


** Attachment added: "dmesg-host-guest.txt"
   http://launchpadlibrarian.net/51777196/dmesg-host-guest.txt

-- 
Writing big files to NFS target causes system lock up
https://bugs.launchpad.net/bugs/561210
