Re: [lustre-discuss] RDMA too many fragments/timed out - clients slowing entire filesystem performance

2016-11-01 Thread Brian W. Johanson
Great, thanks Doug! Quotas are not enabled. A few nodes were exhibiting the issue fairly consistently. We recently added 70 clients (~900 total), which seems to have made this happen more frequently. -b On 11/01/2016 07:57 PM, Oucharek, Doug S wrote: Hi Brian, Y

Re: [lustre-discuss] RDMA too many fragments/timed out - clients slowing entire filesystem performance

2016-11-01 Thread Oucharek, Doug S
Hi Brian, You need this patch: http://review.whamcloud.com/#/c/12451. It has not landed to master yet and is off by default. To activate it, add this module parameter line to your nodes (all of them): options ko2iblnd wrq_sge=2 The issue is that something is causing an offset to be introduce
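A minimal sketch of one way to apply the module parameter line quoted above. The helper name add_wrq_sge and the target file path are assumptions made for illustration; only the line "options ko2iblnd wrq_sge=2" comes from the message, and the parameter is only meaningful on nodes running a ko2iblnd build that includes the referenced patch.

```shell
#!/bin/sh
# Hypothetical helper: append the option line from the message to a
# modprobe configuration file, unless that exact line is already present.
# Usage: add_wrq_sge FILE
add_wrq_sge() {
    conf_file=$1
    option_line='options ko2iblnd wrq_sge=2'
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$option_line" "$conf_file" 2>/dev/null ||
        printf '%s\n' "$option_line" >> "$conf_file"
}
```

On a node this might be run as root, for example as add_wrq_sge /etc/modprobe.d/ko2iblnd.conf (an assumed path; any .conf file under /etc/modprobe.d/ is read by modprobe). Module options are read when the module is loaded, so the setting takes effect only after ko2iblnd is reloaded or the node is rebooted.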

[lustre-discuss] RDMA too many fragments/timed out - clients slowing entire filesystem performance

2016-11-01 Thread Brian W. Johanson
CentOS 7.2, Lustre 2.8.0, ZFS 0.6.5.5, OPA 10.2.0.0.158. The clients and servers are on the same OPA network, no routing. Once a client gets in this state, the filesystem performance drops to a fraction of what it is capable of. The client must be rebooted to clear the issue. I imagine I am miss