Re: [OMPI users] InfiniBand path migration not working

2012-03-21 Thread Shamis, Pavel
Jeremy, As far as I understand, the tool that Evgeny recommended showed that the remote port is reachable. Based on the log that has been provided, I can't find the issue in ompi; everything seems to be kosher. Unfortunately, I do not have a platform where I may try to reproduce the issue. I wo

Re: [OMPI users] InfiniBand path migration not working

2012-03-21 Thread Jeremy
Hi Pasha, I just wanted to check if you had any further suggestions regarding the APM issue based on the updated info in my previous email. Thanks, -Jeremy On Mon, Mar 12, 2012 at 12:43 PM, Jeremy wrote: > Hi Pasha, Yevgeny, > >>> My educated guess is that for some reason there is no direct conn

Re: [OMPI users] InfiniBand path migration not working

2012-03-12 Thread Jeremy
Hi Pasha, Yevgeny, >> My educated guess is that for some reason there is no direct connection path >> between lid-2 and lid-4. To prove it we have to look at the OpenSM routing >> information. > If you don't get a response or you get info of > the device different than what you would expect, > then

Re: [OMPI users] InfiniBand path migration not working

2012-03-11 Thread Yevgeny Kliteynik
Hi, I just noticed that my previous mail bounced, but it doesn't matter. Please ignore it if you got it anyway - I re-read the thread and there is a much simpler way to do it. If you want to check whether LID L is reachable through HCA H from port P, you can run this command: smpquery --Ca H

Re: [OMPI users] InfiniBand path migration not working

2012-03-09 Thread Jeremy
On Thu, Mar 8, 2012 at 10:44 AM, Shamis, Pavel wrote: > Jeremy, > Finally I had a chance to look at the log file. Hi Pasha, I appreciate the review you did and the comments you provided. I will see if we can get some additional routing information. I will also do some experiments with a more trivi

Re: [OMPI users] InfiniBand path migration not working

2012-03-08 Thread Shamis, Pavel
Jeremy, Finally I had a chance to look at the log file. Initially all QPs are created on port 1, and at the same time the alternative path is loaded (port 2, LIDs 4 and 2). I guess at some point you switch off port 1; an APM event is reported because the alternative path is active now, and for some reason

Re: [OMPI users] InfiniBand path migration not working

2012-02-29 Thread Jeremy
Hi Pasha, >On Wed, Feb 29, 2012 at 11:02 AM, Shamis, Pavel wrote: > > I would like to see the whole file. > Is 28MB the size after compression? > > I think gmail supports up to 25MB. > You may try to create a gzip file and then slice it using the "split" command. See attached. At about line 151311 i

Re: [OMPI users] InfiniBand path migration not working

2012-02-29 Thread Shamis, Pavel
> >> On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote: >> I reviewed the code and it seems to be ok :) The error should be reported if >> the port migration has already happened once (port 1 to port 2), and now you >> are trying to shut down port 2 and MPI reports that it can't migrate anymo

Re: [OMPI users] InfiniBand path migration not working

2012-02-28 Thread Jeremy
Hi Pasha, >On Tue, Feb 28, 2012 at 11:34 AM, Shamis, Pavel wrote: > I reviewed the code and it seems to be ok :) The error should be reported if > the port migration has already happened once (port 1 to port 2), and now you > are trying to shut down port 2 and MPI reports that it can't migrate an

Re: [OMPI users] InfiniBand path migration not working

2012-02-28 Thread Shamis, Pavel
Jeremy, I reviewed the code and it seems to be ok :) The error should be reported if the port migration has already happened once (port 1 to port 2), and now you are trying to shut down port 2 and MPI reports that it can't migrate anymore. It assumes that port 1 is still down and it can't go back

Re: [OMPI users] InfiniBand path migration not working

2012-02-23 Thread Jeremy
Hi Pasha, Thanks for your response. I look forward to hearing from you when you have a chance. -Jeremy On Wed, Feb 22, 2012 at 10:43 PM, Shamis, Pavel wrote: > Jeremy, > I implemented the APM support for openib btl a long time ago. I do not > remember all the details of the implementation, but

Re: [OMPI users] InfiniBand path migration not working

2012-02-22 Thread Shamis, Pavel
Jeremy, I implemented the APM support for the openib btl a long time ago. I do not remember all the details of the implementation, but I remember that it is used to support LMC bits and multiple ib ports. Unfortunately I'm super busy this week. I will try to look at it early next week. Pavel (Pasha) S

[OMPI users] InfiniBand path migration not working

2012-02-22 Thread Jeremy
Hi, I am having a problem getting Alternative Path Migration (APM) to work over the InfiniBand ports on my HCA. Details on my configuration and the issue I have are below. Please let me know if you can provide any suggestions or corrections to my configuration. I will be happy to try other experi