Re: rdmacm issue

2015-06-10 Thread Bob Ciotti
On 06/10/2015 06:35 AM, Hal Rosenstock wrote: On 6/9/2015 9:52 PM, Bob Ciotti wrote: We have an issue where Lustre servers and clients cannot talk to each other. There are about 11,000 clients all trying to connect to a server that has just been rebooted (nbp6-oss3 in this example). pfe21 is a lus

RE: rdmacm issue

2015-06-10 Thread Hefty, Sean
> RDMA_CM_EVENT_UNREACHABLE is indicated when there are timeouts in
> the underlying CM protocol exchange. I suspect that the server is really
> busy and doesn't respond to the low level CM MADs in a timely manner.
> RDMA CM (and other kernel ULPs like IPoIB and SRP) use hard coded local
> and remote re

Re: rdmacm issue

2015-06-10 Thread Hal Rosenstock
On 6/9/2015 9:52 PM, Bob Ciotti wrote:
> We have an issue where Lustre servers and clients cannot talk to each
> other.
> There are about 11,000 clients all trying to connect to a server that
> has just been rebooted
> (nbp6-oss3 in this example)
>
> pfe21 is a Lustre client that's trying to remount th