I believe we have also been affected by this issue. I've managed to recreate it using a single virtual machine running under VMware Server, using only the loopback interface.
First I created a 64-bit VM under VMware Server with 2 CPUs, 512MB RAM and an 8GB disk. Into that I performed a fresh install of 64-bit Ubuntu 10.04 LTS desktop edition (as it was all I had to hand), using the image file ubuntu-10.04-desktop-amd64.iso. Once installed, I added the nfs-kernel-server package. At this stage I had not upgraded any packages from the versions that come on the CD.

In /etc/exports I added the line:

/srv *(rw,sync,no_subtree_check)

In /etc/fstab I added the line:

localhost:/srv /mnt/srv nfs rw 0 2

Then I executed the following commands:

# exportfs -a
# mkdir /mnt/srv
# mount /mnt/srv

In /srv I created a 512MB file, 512MB being chosen to match the amount of RAM in the virtual machine:

# dd if=/dev/urandom of=/srv/test bs=1M count=512

Then I ran a continuous gzip loop, reading the file over NFS via the loopback interface and writing the result back over NFS:

# while true
> do
> gzip -c /mnt/srv/test >/mnt/srv/test.gz
> done

I was running top in another virtual console, and within seconds the load on the virtual machine rose rapidly (> 10), gzip was no longer consuming any CPU, and the machine appeared to "lock up" permanently, never recovering.
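For convenience, the whole reproduction collapses into a single script. This is just a sketch of the exact steps above; the only liberties taken are mounting by hand rather than via /etc/fstab, and sizing the test file from MemTotal so it matches RAM on other machines (I used a fixed 512MB, and haven't tested the MemTotal variant myself):

#!/bin/sh
# Reproduce the NFS-over-loopback lock-up; run as root on a host
# with the nfs-kernel-server package installed.

echo '/srv *(rw,sync,no_subtree_check)' >>/etc/exports
exportfs -a

# Equivalent to the fstab entry above, but mounted directly.
mkdir -p /mnt/srv
mount -t nfs -o rw localhost:/srv /mnt/srv

# Create a test file roughly the size of physical RAM
# (MemTotal is reported in kB; on my VM this is ~512MB).
ram_mb=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 ))
dd if=/dev/urandom of=/srv/test bs=1M count=$ram_mb

# Read and write the file through the loopback NFS mount until
# the machine wedges (seconds, in my case).
while true
do
    gzip -c /mnt/srv/test >/mnt/srv/test.gz
done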
On rebooting, the following messages were in /var/log/syslog:

Jun 21 23:22:12 lucid kernel: [ 1202.033262] INFO: task kswapd0:36 blocked for more than 120 seconds.
Jun 21 23:22:12 lucid kernel: [ 1202.150847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 21 23:22:12 lucid kernel: [ 1202.286349] kswapd0 D 0000000000000000 0 36 2 0x00000000
Jun 21 23:22:12 lucid kernel: [ 1202.286349] ffff880017881720 0000000000000046 0000000000015bc0 0000000000015bc0
Jun 21 23:22:12 lucid kernel: [ 1202.286349] ffff88001d79df80 ffff880017881fd8 0000000000015bc0 ffff88001d79dbc0
Jun 21 23:22:12 lucid kernel: [ 1202.286349] 0000000000015bc0 ffff880017881fd8 0000000000015bc0 ffff88001d79df80
Jun 21 23:22:12 lucid kernel: [ 1202.286349] Call Trace:
Jun 21 23:22:12 lucid kernel: [ 1202.288343] [<ffffffffa01d02b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288552] [<ffffffff8153eb57>] io_schedule+0x47/0x70
Jun 21 23:22:12 lucid kernel: [ 1202.288574] [<ffffffffa01d02be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288580] [<ffffffff8153f3af>] __wait_on_bit+0x5f/0x90
Jun 21 23:22:12 lucid kernel: [ 1202.288593] [<ffffffffa01d02b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288599] [<ffffffff8153f458>] out_of_line_wait_on_bit+0x78/0x90
Jun 21 23:22:12 lucid kernel: [ 1202.288605] [<ffffffff81085360>] ? wake_bit_function+0x0/0x40
Jun 21 23:22:12 lucid kernel: [ 1202.288627] [<ffffffffa01d029f>] nfs_wait_on_request+0x2f/0x40 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288640] [<ffffffffa01d46af>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288658] [<ffffffffa01d5aee>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288671] [<ffffffffa01d5c71>] nfs_wb_page+0x81/0xe0 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288683] [<ffffffffa01c4b2f>] nfs_release_page+0x5f/0x80 [nfs]
Jun 21 23:22:12 lucid kernel: [ 1202.288688] [<ffffffff810f2bb2>] try_to_release_page+0x32/0x50
Jun 21 23:22:12 lucid kernel: [ 1202.288698] [<ffffffff81101833>] shrink_page_list+0x453/0x5f0
Jun 21 23:22:12 lucid kernel: [ 1202.288704] [<ffffffff81101cdd>] shrink_inactive_list+0x30d/0x7e0
Jun 21 23:22:12 lucid kernel: [ 1202.288712] [<ffffffff810591e5>] ? balance_tasks+0x135/0x160
Jun 21 23:22:12 lucid kernel: [ 1202.288718] [<ffffffff810fbe3a>] ? determine_dirtyable_memory+0x1a/0x30
Jun 21 23:22:12 lucid kernel: [ 1202.288723] [<ffffffff810fbee7>] ? get_dirty_limits+0x27/0x2f0
Jun 21 23:22:12 lucid kernel: [ 1202.288727] [<ffffffff81102241>] shrink_list+0x91/0xf0
Jun 21 23:22:12 lucid kernel: [ 1202.289518] [<ffffffff81102437>] shrink_zone+0x197/0x240
Jun 21 23:22:12 lucid kernel: [ 1202.289523] [<ffffffff811034c9>] balance_pgdat+0x659/0x6d0
Jun 21 23:22:12 lucid kernel: [ 1202.289528] [<ffffffff81100550>] ? isolate_pages_global+0x0/0x50
Jun 21 23:22:12 lucid kernel: [ 1202.289533] [<ffffffff8110363e>] kswapd+0xfe/0x150
Jun 21 23:22:12 lucid kernel: [ 1202.289537] [<ffffffff81085320>] ? autoremove_wake_function+0x0/0x40
Jun 21 23:22:12 lucid kernel: [ 1202.289542] [<ffffffff81103540>] ? kswapd+0x0/0x150
Jun 21 23:22:12 lucid kernel: [ 1202.289547] [<ffffffff81084fa6>] kthread+0x96/0xa0
Jun 21 23:22:12 lucid kernel: [ 1202.289552] [<ffffffff810141ea>] child_rip+0xa/0x20
Jun 21 23:22:12 lucid kernel: [ 1202.289557] [<ffffffff81084f10>] ? kthread+0x0/0xa0
Jun 21 23:22:12 lucid kernel: [ 1202.289561] [<ffffffff810141e0>] ? child_rip+0x0/0x20

... lots of other blocked processes ...
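Incidentally, the hung-task reports above only appear after tasks have been stuck for 120 seconds. If it would help to capture the state of all blocked tasks on demand while the machine is wedging, and a console is still responsive, the standard magic-SysRq interface should give the same kind of dump; I didn't need it for this report, but something like:

# echo 1 >/proc/sys/kernel/sysrq
# echo w >/proc/sysrq-trigger

The second command asks the kernel to log all tasks in the uninterruptible (D) state, which is essentially the information shown in the traces here.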
I then upgraded all packages (including the kernel) to the latest versions and repeated the above test. The same effect was observed:

Jun 21 23:56:46 lucid kernel: [ 242.212616] INFO: task kswapd0:36 blocked for more than 120 seconds.
Jun 21 23:56:46 lucid kernel: [ 242.282429] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 21 23:56:46 lucid kernel: [ 242.350201] kswapd0 D 00000000ffffffff 0 36 2 0x00000000
Jun 21 23:56:46 lucid kernel: [ 242.350344] ffff88001786f720 0000000000000046 0000000000015bc0 0000000000015bc0
Jun 21 23:56:46 lucid kernel: [ 242.350414] ffff88001d785f80 ffff88001786ffd8 0000000000015bc0 ffff88001d785bc0
Jun 21 23:56:46 lucid kernel: [ 242.350421] 0000000000015bc0 ffff88001786ffd8 0000000000015bc0 ffff88001d785f80
Jun 21 23:56:46 lucid kernel: [ 242.350459] Call Trace:
Jun 21 23:56:46 lucid kernel: [ 242.351835] [<ffffffffa016a2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.352983] [<ffffffff8153ebb7>] io_schedule+0x47/0x70
Jun 21 23:56:46 lucid kernel: [ 242.353007] [<ffffffffa016a2be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.353014] [<ffffffff8153f40f>] __wait_on_bit+0x5f/0x90
Jun 21 23:56:46 lucid kernel: [ 242.353027] [<ffffffffa016a2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.353033] [<ffffffff8153f4b8>] out_of_line_wait_on_bit+0x78/0x90
Jun 21 23:56:46 lucid kernel: [ 242.353047] [<ffffffff81085360>] ? wake_bit_function+0x0/0x40
Jun 21 23:56:46 lucid kernel: [ 242.353070] [<ffffffffa016a29f>] nfs_wait_on_request+0x2f/0x40 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.353083] [<ffffffffa016e6af>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.353096] [<ffffffffa016faee>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.353109] [<ffffffffa016fc71>] nfs_wb_page+0x81/0xe0 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.353121] [<ffffffffa015eb2f>] nfs_release_page+0x5f/0x80 [nfs]
Jun 21 23:56:46 lucid kernel: [ 242.353135] [<ffffffff810f2bb2>] try_to_release_page+0x32/0x50
Jun 21 23:56:46 lucid kernel: [ 242.353144] [<ffffffff81101833>] shrink_page_list+0x453/0x5f0
Jun 21 23:56:46 lucid kernel: [ 242.353163] [<ffffffff8113b419>] ? mem_cgroup_del_lru+0x39/0x40
Jun 21 23:56:46 lucid kernel: [ 242.353167] [<ffffffff811003cb>] ? isolate_lru_pages+0xdb/0x260
Jun 21 23:56:46 lucid kernel: [ 242.353173] [<ffffffff81101cdd>] shrink_inactive_list+0x30d/0x7e0
Jun 21 23:56:46 lucid kernel: [ 242.353179] [<ffffffff81053980>] ? __dequeue_entity+0x30/0x50
Jun 21 23:56:46 lucid kernel: [ 242.353185] [<ffffffff810116c0>] ? __switch_to+0xd0/0x320
Jun 21 23:56:46 lucid kernel: [ 242.353191] [<ffffffff81076e2c>] ? lock_timer_base+0x3c/0x70
Jun 21 23:56:46 lucid kernel: [ 242.353196] [<ffffffff810778b5>] ? try_to_del_timer_sync+0x75/0xd0
Jun 21 23:56:46 lucid kernel: [ 242.353202] [<ffffffff810fbe3a>] ? determine_dirtyable_memory+0x1a/0x30
Jun 21 23:56:46 lucid kernel: [ 242.353208] [<ffffffff810fbee7>] ? get_dirty_limits+0x27/0x2f0
Jun 21 23:56:46 lucid kernel: [ 242.353213] [<ffffffff81102241>] shrink_list+0x91/0xf0
Jun 21 23:56:46 lucid kernel: [ 242.353217] [<ffffffff81102437>] shrink_zone+0x197/0x240
Jun 21 23:56:46 lucid kernel: [ 242.353222] [<ffffffff811034c9>] balance_pgdat+0x659/0x6d0
Jun 21 23:56:46 lucid kernel: [ 242.353227] [<ffffffff81100550>] ? isolate_pages_global+0x0/0x50
Jun 21 23:56:46 lucid kernel: [ 242.353232] [<ffffffff8110363e>] kswapd+0xfe/0x150
Jun 21 23:56:46 lucid kernel: [ 242.353237] [<ffffffff81085320>] ? autoremove_wake_function+0x0/0x40
Jun 21 23:56:46 lucid kernel: [ 242.353242] [<ffffffff81103540>] ? kswapd+0x0/0x150
Jun 21 23:56:46 lucid kernel: [ 242.353247] [<ffffffff81084fa6>] kthread+0x96/0xa0
Jun 21 23:56:46 lucid kernel: [ 242.353252] [<ffffffff810141ea>] child_rip+0xa/0x20
Jun 21 23:56:46 lucid kernel: [ 242.353257] [<ffffffff81084f10>] ? kthread+0x0/0xa0
Jun 21 23:56:46 lucid kernel: [ 242.353261] [<ffffffff810141e0>] ? child_rip+0x0/0x20

... lots of other blocked processes ...

Running the same test, but writing to /srv directly instead of via the NFS mount on /mnt/srv, worked reliably as expected without any issues, i.e.:

# while true
> do
> gzip -c /srv/test >/srv/test.gz
> done

The hardware the VMware Server host runs on (AMD Opteron) is completely different from the hardware on which we've seen the issue in day-to-day use (Intel Core i7), and since the problem can be triggered over the loopback address, I don't think it can be specific to any network card. We only run 64-bit, so I haven't tested whether it also occurs on 32-bit releases.

I hope this is helpful. If I can help shed any further light, please let me know.

Jim

--
Writing big files to NFS target causes system lock up
https://bugs.launchpad.net/bugs/561210