Deeper inspection of the logs suggests the problem is a connection
attempt made while the xprt is not connected. Part of that procedure is
to re-use the connection, which forces the xprt to disconnect (so the
socket can be re-used). This triggers a state change (TCP_CLOSE) and wakes
up the task waiting for the connection. But the connection state is then
still INPROGRESS, which somehow gets translated into EAGAIN, and that
triggers call_bind, which repeats the whole socket re-use process.

With that lead, I found two upstream commits that refer to the commit
which introduced that behaviour:

* 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)

The two fixes related to that are:

* 1fa3e2e SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
* 485f225 SUNRPC: Ensure that call_connect times out correctly

The latter would at least cause timeouts to be re-adjusted before looping
back into call_bind, so it might be worth trying both. I built a trusty
kernel with those two patches added. The debs are at
http://people.canonical.com/~smb/lp1322407/
Could you install those on the server side and see whether this helps
with the problem?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1322407

Title:
  NFS kernel server creates a kworker with 100% CPU usage, then hangs
  randomly

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs