Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-25 Ciprian Dorin Craciun
On Mon, Nov 25, 2019 at 2:53 AM Benjamin Kaduk wrote: > > * I suspect that perhaps the issue is due to the latest kernel version, > > because I have run similar patterns a few weeks ago on an older kernel (but > > still from the `5.x` family), but can't say for sure; > > I see the diagnostics and

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-24 Benjamin Kaduk
On Tue, Nov 19, 2019 at 01:53:59PM +0200, Ciprian Dorin Craciun wrote: > > My setup is as follows: > > * OpenSUSE Tumbleweed, kernel 5.3.9-1-default, client package > `openafs-client` and `openafs-kmp-default` at `1.8.5_k5.3.9_1-1.3` as > provided by OpenSUSE; > > * `afsd` parameters (neither me

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Ciprian Dorin Craciun
On Wed, Nov 20, 2019 at 7:49 PM Mark Vitale wrote: > > The following are the arguments of `fileserver`: > > -syslog -sync always -p 4 -b 524288 -l 524288 -s 1048576 -vc 4096 -cb > > 1048576 -vhandle-max-cachesize 32768 -jumbo -udpsize 67108864 > > -sendsize 67108864 -rxmaxmtu 9000 -rxpck 4096 -bus
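When comparing server tuning across messages, it can help to turn the quoted `fileserver` argument string into a flag/value table. The following is a minimal Python sketch, assuming simple `-flag [value]` syntax; `parse_flags` and `ARGS` are hypothetical names, and the truncated trailing option from the message is deliberately left out.

```python
from __future__ import annotations

import shlex


def parse_flags(cmdline: str) -> dict[str, str | bool]:
    """Map each "-flag" to the value that follows it, or True if none does.

    Simplifying assumption: any token starting with "-" is a flag, so a
    negative numeric value would be misread (none appear below).
    """
    flags: dict[str, str | bool] = {}
    tokens = shlex.split(cmdline)
    i = 0
    while i < len(tokens):
        name = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
        flags[name] = tokens[i + 1] if has_value else True
        i += 2 if has_value else 1
    return flags


# Visible part of the quoted `fileserver` arguments (truncated option omitted).
ARGS = (
    "-syslog -sync always -p 4 -b 524288 -l 524288 -s 1048576 -vc 4096 "
    "-cb 1048576 -vhandle-max-cachesize 32768 -jumbo -udpsize 67108864 "
    "-sendsize 67108864 -rxmaxmtu 9000 -rxpck 4096"
)

if __name__ == "__main__":
    for flag, value in parse_flags(ARGS).items():
        print(f"{flag:26} {value}")
```

Read this way, `-udpsize 67108864` and `-sendsize 67108864` are both 64 MiB (64 × 1024 × 1024 bytes), while bare switches such as `-jumbo` map to `True`.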

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Mark Vitale
> On Nov 20, 2019, at 12:17 PM, Ciprian Dorin Craciun > wrote: > > >> Do you have FileLogs and/or fileserver audit logs for the time in question? > > Yes, I do have access to them. > > The following is the syslog output from OpenAFS server in a 5 minute > time-window to the stacktrace sent

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Ciprian Dorin Craciun
On Wed, Nov 20, 2019 at 7:03 PM Mark Vitale wrote: > Thank you for the backtraces. I agree that 'gm' is the problematic thread; > it appears to be stuck in rxi_WriteProc waiting for the Rx packet transmit > window > to advance. That is, it's waiting for acknowledgments - probably from the > fi
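The diagnosis above is that the writer sits in `rxi_WriteProc` until the Rx transmit window advances, i.e. until acknowledgments (probably from the fileserver) arrive. The toy model below illustrates only that one idea: with at most `size` packets outstanding, no further packet can go out until an ack frees a slot. `TransmitWindow` is a hypothetical class, not the OpenAFS Rx implementation, which also handles retransmission, congestion control and more.

```python
from collections import deque


class TransmitWindow:
    """Toy ack-clocked transmit window (hypothetical; not the Rx code).

    At most `size` packets may be outstanding. Once that many are unacked,
    the writer cannot send until an acknowledgment advances the window.
    """

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("window size must be at least 1")
        self.size = size
        self.next_seq = 0        # sequence number of the next packet
        self.unacked = deque()   # sent but not yet acknowledged, in order

    def can_send(self) -> bool:
        return len(self.unacked) < self.size

    def send(self) -> int:
        """Send one packet; a real writer would sleep here instead of raising."""
        if not self.can_send():
            raise BlockingIOError("window full: waiting for acknowledgments")
        seq = self.next_seq
        self.unacked.append(seq)
        self.next_seq += 1
        return seq

    def ack(self, upto: int) -> None:
        """Cumulative ack: every packet with seq <= upto is acknowledged."""
        while self.unacked and self.unacked[0] <= upto:
            self.unacked.popleft()


if __name__ == "__main__":
    w = TransmitWindow(size=4)
    print([w.send() for _ in range(4)])   # fills the window
    print(w.can_send())                   # no acks yet, so the writer must wait
    w.ack(1)                              # peer acknowledges packets 0 and 1
    print(w.send(), w.can_send())
```

If the peer never acknowledges anything, `can_send()` stays false forever, which is the shape of the hang being discussed: the writer is not busy, it is waiting on the other side.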

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Mark Vitale
Ciprian, > On Nov 19, 2019, at 4:37 PM, Ciprian Dorin Craciun > wrote: > > On Tue, Nov 19, 2019 at 10:38 PM Ciprian Dorin Craciun > wrote: >> At the following link you can find an extract of `dmesg` after the >> sysrq trigger. >> >> >> https://scratchpad.volution.ro/ciprian/f89fc32a0bbd0ae6

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-19 Ciprian Dorin Craciun
On Tue, Nov 19, 2019 at 10:38 PM Ciprian Dorin Craciun wrote: > At the following link you can find an extract of `dmesg` after the > sysrq trigger. > > > https://scratchpad.volution.ro/ciprian/f89fc32a0bbd0ae6d6f3edbbc3ee111c/b9c3bc4f795bbe9e7eaca93b0a57bea0.txt I forgot to mention that in th

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-19 Ciprian Dorin Craciun
On Tue, Nov 19, 2019 at 5:10 PM Ciprian Dorin Craciun wrote: > > # echo t > /proc/sysrq-trigger At the following link you can find an extract of `dmesg` after the sysrq trigger. https://scratchpad.volution.ro/ciprian/f89fc32a0bbd0ae6d6f3edbbc3ee111c/b9c3bc4f795bbe9e7eaca93b0a57bea0.txt (I h
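After `echo t > /proc/sysrq-trigger` (as root, with sysrq enabled) the kernel logs every task's state and stack, and finding the thread parked in one function, such as `rxi_WriteProc` in the 'gm' thread discussed in this thread, is tedious by hand. The sketch below searches saved `dmesg` output for tasks whose call trace contains a given symbol. `tasks_with_frame` is a hypothetical helper, and the dump layout it expects (a task header line, then `Call Trace:`, then indented frames) is an assumption based on 5.x kernels; it is a quick filter, not a full parser.

```python
from __future__ import annotations

import re

# Assumed prefix added by `dmesg`: "[  123.456789] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def tasks_with_frame(dump: str, symbol: str) -> list[str]:
    """Names of tasks whose "Call Trace:" contains a frame for `symbol`.

    Assumed layout: a non-indented task header whose first token is the
    command name (newer kernels prefix it with "task:"), then "Call Trace:",
    then indented frames like " ? func+0x10/0x20 [module]". Other
    non-indented lines (e.g. register dumps) are treated as headers.
    """
    hits: list[str] = []
    current: str | None = None   # command name of the task being read
    in_trace = False
    for raw in dump.splitlines():
        line = TIMESTAMP.sub("", raw)
        if not line.strip():
            continue
        if line.startswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace and line[0].isspace():
            # Drop the "? " marker of unreliable frames, then the offset.
            frame = line.strip().lstrip("? ").split("+", 1)[0]
            if frame == symbol and current is not None and current not in hits:
                hits.append(current)
            continue
        # Any other line starts a new task header.
        in_trace = False
        name = line.split()[0]
        current = name[len("task:"):] if name.startswith("task:") else name
    return hits
```

For example, with the dump saved to a file (the name `dmesg.txt` is a placeholder), `tasks_with_frame(open("dmesg.txt").read(), "rxi_WriteProc")` lists the command names of all tasks blocked in that function.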

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-19 Ciprian Dorin Craciun
On Tue, Nov 19, 2019 at 5:06 PM Mark Vitale wrote: > If you had a true soft lockup, there should be some information in the syslog. I don't think it was a "soft lockup" as per Linux kernel terminology, as it would have been detected by the kernel. (But still it took all my cores to 100% in kernel
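For reference, the kernel's soft-lockup detector only reports a CPU that has stayed in kernel mode without scheduling for longer than twice `kernel.watchdog_thresh` seconds (default 10, so about 20 seconds). Threads that burn CPU but still yield now and then do not trip it, which may explain why nothing was logged even with all cores at 100%. The sketch below reads the sysctl and computes that limit; the function names are hypothetical, and other switches such as `kernel.soft_watchdog` can also disable detection, which this sketch does not check.

```python
from __future__ import annotations

from pathlib import Path

WATCHDOG_THRESH = Path("/proc/sys/kernel/watchdog_thresh")


def softlockup_threshold(watchdog_thresh: int) -> int | None:
    """Seconds a CPU may run kernel code without scheduling before the
    soft-lockup detector warns: twice kernel.watchdog_thresh.
    A watchdog_thresh of 0 disables lockup detection, so None is returned."""
    if watchdog_thresh <= 0:
        return None
    return 2 * watchdog_thresh


def read_watchdog_thresh(path: Path = WATCHDOG_THRESH) -> int | None:
    """Read the sysctl value, or None if it is unavailable (e.g. not Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    if thresh is None:
        print("kernel.watchdog_thresh is not readable")
    else:
        limit = softlockup_threshold(thresh)
        if limit is None:
            print("soft-lockup detection is disabled")
        else:
            print(f"soft lockups are reported after ~{limit}s without scheduling")
```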

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-19 Mark Vitale
Ciprian, > On Nov 19, 2019, at 6:53 AM, Ciprian Dorin Craciun > wrote: > > A few days ago I have encountered a very strange OpenAFS client issue that > basically exhibits in two ways: > > * either the processes accessing the file-system get "stuck" reading (or > perhaps opening) the files; (al

[OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-19 Ciprian Dorin Craciun
A few days ago I have encountered a very strange OpenAFS client issue that basically exhibits in two ways: * either the processes accessing the file-system get "stuck" reading (or perhaps opening) the files; (although if one waits "long" enough, sometimes those processes will finally complete thei
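The report concerns many processes opening and reading files on AFS at the same time. The exact access pattern is not visible in this snippet, so the harness below is only an assumption of what "highly concurrent" means: it reads every file under a directory from a pool of threads and flags reads slower than a threshold. `timed_read` and `concurrent_read` are hypothetical names; Python threads release the GIL during blocking reads, so many `open`/`read` calls really are in flight in the kernel at once.

```python
from __future__ import annotations

import os
import time
from concurrent.futures import ThreadPoolExecutor


def timed_read(path: str, chunk: int = 1 << 16) -> tuple[str, float]:
    """Open and read `path` to the end; return it with the elapsed seconds."""
    start = time.monotonic()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return path, time.monotonic() - start


def concurrent_read(root: str, workers: int = 32,
                    slow_after: float = 5.0) -> tuple[int, list[tuple[str, float]]]:
    """Read every file under `root` from `workers` threads at once.

    Returns the number of files read and those that took longer than
    `slow_after` seconds. A read that never returns also blocks this call,
    so run it under an external timeout when probing a suspect mount.
    """
    paths = [os.path.join(d, n) for d, _, names in os.walk(root) for n in names]
    slow: list[tuple[str, float]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, elapsed in pool.map(timed_read, paths):
            if elapsed > slow_after:
                slow.append((path, elapsed))
    return len(paths), slow
```

A call such as `concurrent_read("/afs/example.com/some/dir", workers=64)` (the path is a placeholder) would then report how many files were read and which ones were unusually slow.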