On 10/16/2014 05:38 PM, Nathan Hjelm wrote:
On Thu, Oct 16, 2014 at 05:27:54PM -0400, Gus Correa wrote:
Thank you, Aurelien!

Aha, "vader btl", that is new to me!
I thought Vader was that man dressed in black in Star Wars,
Obi-Wan Kenobi's nemesis.
That was a while ago, my kids were children,
and Alec Guinness younger than Harrison Ford is today.
Oh, how nostalgic code developers can get when it comes
to naming things ...

If I am using "vader", it is totally inadvertent.
There was no such thing in Open MPI 1.6 and earlier.

Now that you mention it, I can see lots of it in the 1.8.3
ompi_info output.
In addition, my stderr files show messages like this:

imb.e38352:[1,5]<stddiag>:[node13:16334] mca: bml: Not using sm btl to
[[59987,1],26] on node node13 because vader btl has higher exclusivity
(65536 > 65535)

So, you are right, "vader" is taking over and knocking off "sm" (and openib
and everybody else).
Darn Vader!
Probably knem is going down the tubes along with sm, right?

Depends. If there is a reason to continue supporting knem, then vader
will be updated to support it. I don't see a reason to at this time,
though (since sm continues to live for now).


Right now knem is not working in OMPI 1.8.3, even if I turn off vader,
and leave only sm,self,openib.
I just sent another email documenting that.

I was used to sm, openib, self and tcp BTLs.
I normally just do "btl = ^tcp" in the MCA parameters file,
to stick to sm, openib, and self.

That worked fine in 1.6.5 (and earlier), and knem worked
flawlessly there.
The same settings in 1.8.3 don't bring up the knem functionality.
So, this seems to be yet another change in 1.8.3 that I need to learn.

Can you or some other list subscriber elaborate a bit about
this 'vader' btl?
The Open MPI FAQ doesn't have anything about it.
What is it after all?
Does it play the same role as "sm", i.e., an intra-node btl?
Considering the name, is "vader" good or bad?
Or better: In which circumstances is "vader" good and when is it bad?

Vader is a btl I originally wrote to support Cray's XPMEM shared memory
interface. It was designed to be cleaner than btl/sm and to have better
small-message latency, bandwidth, and message rates. Because its latency
is so much better than sm's, I removed the XPMEM requirement and added CMA
support.


I presume this requires a 3.X kernel, as Aurelien pointed out.
As a matter of policy, and to keep your user base broad,
I would suggest keeping a generous
range of backward-compatible support built into OMPI.
This would be sm, knem, etc., which I suppose can coexist with vader, or not?
I can't speak for others, but we run production codes on
standard Linux distributions (CentOS 6.X, 5.X) with 2.6.Y kernels.
I suppose other people have similar situations.

Should I give in to the dark side of the force and keep "vader"
turned on, or should I just do something like
"btl = ^tcp,^vader" ?

You can turn off vader if you want to use knem. I would run some tests
to see if there is much of a difference between sm/knem and vader
though. I don't have any systems that have knem installed so I haven't
been able to run these tests myself. I would primarily focus on the
memory usage and the bandwidth.

-Nathan

Please see my last email.
Turning vader off and leaving sm on still doesn't make knem work,
unless I made some big mistake along the way.
I would love to use 1.8.3 in production,
as long as sm+knem support works, hence it would be
great if somebody could point out any mistake that I may have made.

Also, for large messages, IMB with 1.6.5+sm+knem gives
me ~30% speedups w.r.t. 1.8.3+sm+(broken)-knem or w.r.t. 1.8.3+vader,
although, admittedly, with our 2.6 kernel (no CMA, etc.)
the environment is not favorable to vader to begin with.
[And yet another good reason to fix/keep sm+knem in OMPI 1.8.]

Thank you,
Gus Correa




_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/10/25516.php

