I'm adding some more test data here:

As a workaround I tried to install an old Ubuntu 2.6 kernel
(linux-image-2.6.35-31-generic_2.6.35-31.63_amd64.deb) on 12.04.1.

I saw a number of locking issues reported and assumed they were caused
by running the old kernel in the wrong environment. However, after
downgrading the servers back to 10.10 and keeping the clients at
12.04.1, I still see kernel messages like the following on the servers:

[ 5474.132324] ------------[ cut here ]------------
[ 5474.132346] WARNING: at /build/buildd/linux-2.6.35/net/sunrpc/sched.c:597 rpc_exit_task+0x5c/0x60 [sunrpc]()
[ 5474.132349] Hardware name: PowerEdge R710
[ 5474.132351] Modules linked in: ipmi_si mpt2sas raid_class mptctl 
ipmi_devintf ipmi_msghandler dell_rbu nfsd autofs4 xfs exportfs nfs lockd 
fscache nfs_acl auth_rpcgss sunrpc joydev ftdi_sio usbhid hid bnx2 usbserial 
shpchp psmouse i7core_edac serio_raw edac_core hed lp power_meter parport 
dcdbas ses enclosure mptsas mptscsih mptbase usb_storage scsi_transport_sas 
megaraid_sas [last unloaded: ipmi_si]
[ 5474.132386] Pid: 1746, comm: rpciod/16 Tainted: G        W   2.6.35-32-server #67-Ubuntu
[ 5474.132388] Call Trace:
[ 5474.132399]  [<ffffffff810616df>] warn_slowpath_common+0x7f/0xc0
[ 5474.132403]  [<ffffffff8106173a>] warn_slowpath_null+0x1a/0x20
[ 5474.132414]  [<ffffffffa016bd4c>] rpc_exit_task+0x5c/0x60 [sunrpc]
[ 5474.132426]  [<ffffffffa016c52e>] __rpc_execute+0x5e/0x280 [sunrpc]
[ 5474.132437]  [<ffffffffa016c7f0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[ 5474.132448]  [<ffffffffa016c805>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 5474.132455]  [<ffffffff8107b395>] run_workqueue+0xc5/0x1a0
[ 5474.132460]  [<ffffffff8107b513>] worker_thread+0xa3/0x110
[ 5474.132464]  [<ffffffff810801a0>] ? autoremove_wake_function+0x0/0x40
[ 5474.132468]  [<ffffffff8107b470>] ? worker_thread+0x0/0x110
[ 5474.132472]  [<ffffffff8107fc26>] kthread+0x96/0xa0
[ 5474.132477]  [<ffffffff8100aea4>] kernel_thread_helper+0x4/0x10
[ 5474.132481]  [<ffffffff8107fb90>] ? kthread+0x0/0xa0
[ 5474.132484]  [<ffffffff8100aea0>] ? kernel_thread_helper+0x0/0x10
[ 5474.132487] ---[ end trace 5a3838b115992a79 ]---
[ 6091.800511] ------------[ cut here ]------------
[ 6091.800532] WARNING: at /build/buildd/linux-2.6.35/net/sunrpc/sched.c:597 rpc_exit_task+0x5c/0x60 [sunrpc]()
[ 6091.800536] Hardware name: PowerEdge R710
[ 6091.800537] Modules linked in: ipmi_si mpt2sas raid_class mptctl 
ipmi_devintf ipmi_msghandler dell_rbu nfsd autofs4 xfs exportfs nfs lockd 
fscache nfs_acl auth_rpcgss sunrpc joydev ftdi_sio usbhid hid bnx2 usbserial 
shpchp psmouse i7core_edac serio_raw edac_core hed lp power_meter parport 
dcdbas ses enclosure mptsas mptscsih mptbase usb_storage scsi_transport_sas 
megaraid_sas [last unloaded: ipmi_si]
[ 6091.800572] Pid: 1744, comm: rpciod/14 Tainted: G        W   2.6.35-32-server #67-Ubuntu
[ 6091.800575] Call Trace:
[ 6091.800585]  [<ffffffff810616df>] warn_slowpath_common+0x7f/0xc0
[ 6091.800590]  [<ffffffff8106173a>] warn_slowpath_null+0x1a/0x20
[ 6091.800601]  [<ffffffffa016bd4c>] rpc_exit_task+0x5c/0x60 [sunrpc]
[ 6091.800612]  [<ffffffffa016c52e>] __rpc_execute+0x5e/0x280 [sunrpc]
[ 6091.800623]  [<ffffffffa016c7f0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[ 6091.800634]  [<ffffffffa016c805>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 6091.800642]  [<ffffffff8107b395>] run_workqueue+0xc5/0x1a0
[ 6091.800646]  [<ffffffff8107b513>] worker_thread+0xa3/0x110
[ 6091.800650]  [<ffffffff810801a0>] ? autoremove_wake_function+0x0/0x40
[ 6091.800654]  [<ffffffff8107b470>] ? worker_thread+0x0/0x110
[ 6091.800658]  [<ffffffff8107fc26>] kthread+0x96/0xa0
[ 6091.800663]  [<ffffffff8100aea4>] kernel_thread_helper+0x4/0x10
[ 6091.800667]  [<ffffffff8107fb90>] ? kthread+0x0/0xa0
[ 6091.800671]  [<ffffffff8100aea0>] ? kernel_thread_helper+0x0/0x10
[ 6091.800673] ---[ end trace 5a3838b115992a7a ]---
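
To see how often this warning fires, something like the following could be run over saved dmesg output. This is only a rough sketch: the function name count_warnings and the regular expression are my own, and it assumes each WARNING line has not been wrapped by the mail client.

```python
import re
from collections import Counter

# Matches the start of a kernel WARNING line such as
# "[ 5474.132346] WARNING: at /build/.../net/sunrpc/sched.c:597 ..."
# (hypothetical pattern, written against the log excerpt above)
WARNING_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+WARNING: at (\S+):(\d+)")


def count_warnings(log_text):
    """Count WARNING entries per (source file, line number)."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        _timestamp, path, line = match.groups()
        counts[(path, int(line))] += 1
    return counts
```

Passing the text of a saved kernel log to count_warnings() gives one counter entry per source location, so a single dominant entry would suggest that all the warnings come from the same place.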

On the client I see:

[ 7061.756411] INFO: task unzip:8081 blocked for more than 120 seconds.
[ 7061.767633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7061.790039] unzip           D 0000000000000007     0  8081   8041 0x00000000
[ 7061.790044]  ffff8805ec807b48 0000000000000086 ffff880500000000 ffffffff00000007
[ 7061.790051]  ffff8805ec807fd8 ffff8805ec807fd8 ffff8805ec807fd8 00000000000137c0
[ 7061.790063]  ffff880608a02e00 ffff8805fb9f1700 ffff8805ec807b28 ffff880617c74080
[ 7061.790075] Call Trace:
[ 7061.790082]  [<ffffffff81117130>] ? __lock_page+0x70/0x70
[ 7061.790090]  [<ffffffff816590ff>] schedule+0x3f/0x60
[ 7061.790097]  [<ffffffff816591af>] io_schedule+0x8f/0xd0
[ 7061.790105]  [<ffffffff8111713e>] sleep_on_page+0xe/0x20
[ 7061.790112]  [<ffffffff816599cf>] __wait_on_bit+0x5f/0x90
[ 7061.790119]  [<ffffffff811172a8>] wait_on_page_bit+0x78/0x80
[ 7061.790127]  [<ffffffff8108acc0>] ? autoremove_wake_function+0x40/0x40
[ 7061.790135]  [<ffffffff811173bc>] filemap_fdatawait_range+0x10c/0x1a0
[ 7061.790144]  [<ffffffff8111747b>] filemap_fdatawait+0x2b/0x30
[ 7061.790151]  [<ffffffff811a17b9>] writeback_single_inode+0x399/0x430
[ 7061.790159]  [<ffffffff811a18ca>] sync_inode+0x7a/0xc0
[ 7061.790169]  [<ffffffffa01a20b3>] nfs_wb_all+0x43/0x50 [nfs]
[ 7061.790177]  [<ffffffffa01937f8>] nfs_setattr+0x138/0x140 [nfs]
[ 7061.790181]  [<ffffffff8119402b>] notify_change+0x1bb/0x360
[ 7061.790185]  [<ffffffff8117617b>] chmod_common+0xbb/0xc0
[ 7061.790189]  [<ffffffff8117d0ba>] ? sys_newstat+0x2a/0x40
[ 7061.790193]  [<ffffffff811770bf>] sys_fchmod+0x4f/0x80
[ 7061.790197]  [<ffffffff81663602>] system_call_fastpath+0x16/0x1b

and the NFS mount hangs. Sometimes the clients are able to recover, but
often they hang completely.
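
To collect the blocked tasks from several clients, a small script along these lines might help. It is a minimal sketch under my own assumptions: the name find_hung_tasks and the pattern are hypothetical, and it only looks for the "blocked for more than N seconds" line shown above.

```python
import re

# Matches hung-task reports such as
# "INFO: task unzip:8081 blocked for more than 120 seconds."
# (hypothetical pattern, written against the client log above)
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) for each hung-task report."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]
```

Feeding it the output of dmesg from each client would show which processes were stuck and whether it is always the same kind of operation.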

It seems that my initial test on Debian was wrong: the Debian testing
kernels do at least put less load on the server. I cannot comment on the
other issues yet. However, the linked Debian bug report notes that the
patch mentioned above has been removed from their kernels, which seems
to have at least some positive effect.

Is any Ubuntu kernel developer following this? Could you provide a test
kernel with the patch removed?

I'm currently trying to set up a test environment, but fixing my
production environment has priority :-(

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/879334

Title:
  nfsd from nfs-kernel-server very slow and system load from 25%-100%
  from nfsd

