See my last comment on #4257 :
https://github.com/open-mpi/ompi/pull/4257#issuecomment-332900393
We should completely disable CUDA in hwloc. It is breaking the build,
but more importantly, it creates an extra dependency on the CUDA runtime
that Open MPI doesn't have, even when compiled with --
Hi Chris,
First, you will need to have some configure stuff to detect nvcc and use
it inside your Makefile. UTK may have some examples to show here.
For the C/C++ API, you need to add 'extern "C"' statements around the
interfaces you want to export in C so that you can use them inside Open MP
ote:
On Apr 26, 2016, at 3:35 PM, Sylvain Jeaugey wrote:
Indeed, I implied that affinity was set before MPI_Init (usually even before
the process is launched).
And yes, that would require a modex ... but I thought there was one already and
maybe we could pack the affinity information inside t
we could do
it - but at the cost of forcing a modex. You can only detect your own affinity,
so to get the relative placement, you have to do an exchange if we can’t pass
it to you. Perhaps we could offer it as an option?
On Apr 26, 2016, at 2:27 PM, Sylvain Jeaugey wrote:
Within the BTL code
Within the BTL code (and surely elsewhere), we can use those convenient
OPAL_PROC_ON_LOCAL_{NODE,SOCKET, ...} macros to figure out where another
endpoint is located compared to us.
The problem is that it only works when ORTE defines it. The NODE works
almost always since ORTE is always doing i
. does it write to
mpirun stdin ?
On 02/26/2016 11:46 AM, Ralph Castain wrote:
So the child processes are not calling orte_init or anything like that? I can
check it - any chance you can give me a line number via a debug build?
On Feb 26, 2016, at 11:42 AM, Sylvain Jeaugey wrote:
I got
I got this strange crash on master this night running nv/mpix_test :
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x50
[ 0] /lib64/libpthread.so.0(+0xf710)[0x7f9f19a80710]
[ 1]
/ivylogin/home/sjeaugey/tests/mtt/scratches/mtt-scratch-4/installs/eGXW/inst
.
Thanks,
Sylvain
On 01/22/2016 10:07 AM, Sylvain Jeaugey wrote:
It looks like the errors are produced by the hwloc configure ; this
one somehow can't find CUDA (I have to check if that's a problem btw).
Anyway, later in the configure, the VT configure finds cuda correctly,
so it seems s
Hi Jeff,
Do you mean "attend" or "do a talk" ?
Sylvain
Le 20/11/2012 16:16, Jeff Squyres a écrit :
Cool! Thanks for the invite.
Do we have any European friends who would be able to attend this conference?
On Nov 20, 2012, at 10:02 AM, Sylwester Arabas wrote:
Dear Open MPI Team,
A day-lo
Hi Matthias,
You might want to play with process binding to see if your problem is
related to bad memory affinity.
Try to launch pingpong on two CPUs of the same socket, then on different
sockets (i.e. bind each process to a core, and try different
configurations).
Sylvain
De :Matthias
Please note that configure requirements on components HAVE
> CHANGED. For example. a configure.params file is no longer required
> in each component directory. See Jeff's emails for an explanation.
>
>
>
> ________
> From: devel-boun...@op
e.params file is no longer required in each
component directory. See Jeff's emails for an explanation.
>>
>>
>>
>>
>> From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf
Of Sylvain Jeaugey [sylvain.jeau...@bull.net]
Hi All,
I just realized that Bull Vendor IDs for Infiniband cards disappeared from
the trunk. Actually, they were removed shortly after we included them
last September.
The original commit was :
r23715 | derbeyn | 2010-09-03 16:13:19 +0200 (Fri, 03 Sep 2010) | 1 line
Added Bull vendor id f
Kawashima-san,
Congratulations on your machine, this is a stunning achievement!
> Kawashima wrote :
> Also, we modified tuned COLL to implement interconnect-and-topology-
> specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
> implementations also bypass PML/BML/BTL to elimi
On Wed, 9 Mar 2011, George Bosilca wrote:
One gets multiple non-overlapping BTL (in terms of peers), each with its
own set of parameters and eventually accepted protocols. Mainly there
will be one BTL per memory hierarchy.
Pretty cool :-)
I'll cleanup the code and send you a patch.
We'd be
Hi George,
This certainly looks like our motivations are close. However, I don't see
in the presentation how you implement it (maybe I misread it), especially
how you manage to not modify the BTL interface.
Do you have any code / SVN commit references for us to better understand
what it's ab
in locality.
Sylvain
On Mon, Nov 15, 2010 at 9:00 AM, Sylvain Jeaugey
wrote:
I already mentioned it answering Terry's e-mail, but to be sure I'm clear:
don't confuse node full topology with MPI job topology. It _is_ different.
And every process does not get the whole top
code may not have direct
relationship to hitopo; the use of hwloc and standardization of what you call
level 4-7 might help avoid some user confusion.
--td
On 11/15/2010 06:56 AM, Sylvain Jeaugey wrote:
As a followup of Stuttgart's developers' meeting, here is an RFC for our
to inter- node.
Sylvain
On 11/15/2010 06:56 AM, Sylvain Jeaugey wrote:
As a followup of Stuttgart's developers' meeting, here is an RFC for our
topology detection framework.
WHAT: Add a framework for hardware topology detection to be used by any
other part of Open MPI to help optim
As a followup of Stuttgart's developers' meeting, here is an RFC for our
topology detection framework.
WHAT: Add a framework for hardware topology detection to be used by any
other part of Open MPI to help optimization.
WHY: Collective operations or shared memory algorithms among others may
, 2010, at 6:01 AM, Sylvain Jeaugey wrote:
On Tue, 26 Oct 2010, Jeff Squyres wrote:
I don't think this is the right way to fix it. Sorry! :-(
I don't think it is the right way to do it either :-)
I say this because it worked somewhat by luck before, and now it's
broken. If
On Tue, 26 Oct 2010, Jeff Squyres wrote:
I don't think this is the right way to fix it. Sorry! :-(
I don't think it is the right way to do it either :-)
I say this because it worked somewhat by luck before, and now it's
broken. If we put in another "it'll work because of a side effect of a
components (one to get the
priorities, and then another to execute) and additional API functions in the
various modules.
On Oct 7, 2010, at 6:25 AM, Sylvain Jeaugey wrote:
Hi list,
Remember this old bug ? I think I finally found out what was going wrong.
The opal "installdirs"
On Wed, 29 Sep 2010, Ashley Pittman wrote:
On 17 Sep 2010, at 11:36, Pascal Deveze wrote:
Hi all,
In charge of ticket 1888 (see at
https://svn.open-mpi.org/trac/ompi/ticket/1888) ,
I have put the resulting code in bitbucket at:
http://bitbucket.org/devezep/new-romio-for-openmpi/
The work in
opened first
regardless of its position in the static components array ;
3. Any other idea ?
Sylvain
On Fri, 19 Jun 2009, Sylvain Jeaugey wrote:
On Thu, 18 Jun 2009, Jeff Squyres wrote:
On Jun 18, 2009, at 11:25 AM, Sylvain Jeaugey wrote:
My problem seems related to library generation throu
Hi ananda,
I didn't try to run your program, but this seems logical to me.
The problem with calling MPI_Bcast repeatedly is that you may get an
unbounded desynchronization between the sender and the receiver(s).
MPI_Bcast is a unidirectional operation. It does not necessarily block
until the r
Steve,
This is indeed strange. The mechanism you describe works for me.
Here is my simple test :
-- mpi-sig.c --
#include "mpi.h"
#include <signal.h>
#include <stdio.h>
void warn(int sig) {
printf("Got signal %d\n", sig);
}
int main (int argc, char ** argv) {
Thanks Jeff for this very useful explanation. I guess locking is not
needed as long as the system is well understood by everyone (which was not
the case for us, sorry).
Sylvain
On Thu, 22 Jul 2010, Ralph Castain wrote:
On Jul 22, 2010, at 8:01 AM, Jeff Squyres wrote:
On Jul 22, 2010, at 9
On Wed, 23 Jun 2010, Jeff Squyres wrote:
BTW, are you guys waiting for us to commit that, or do we ever give you guys
SVN commit access?
Nadia is off today. She should commit it tomorrow.
Sylvain
Hi Jeff,
Why do we want to set this value so low ? Well, just to see if it crashes
:-)
More seriously, we're working on lowering the memory usage of the openib
BTL, which is achieved at most by having only 1 send queue element (at
very large scale, send queues prevail).
This "extreme" conf
On Fri, 11 Jun 2010, Jeff Squyres wrote:
On Jun 11, 2010, at 5:43 AM, Paul H. Hargrove wrote:
Interesting. Do you think this behavior of the linux kernel would
change if the file was unlink()ed after attach ?
After a little talk with kernel guys, it seems that unlinking wouldn't
change anythi
On Thu, 10 Jun 2010, Jeff Squyres wrote:
Sam -- if the shmat stuff fails because the limits are too low, it'll
(silently) fall back to the mmap module, right?
From my experience, it completely disabled the sm component. Having a nice
fallback would be indeed a very Good thing.
Sylvain
On Thu, 10 Jun 2010, Paul H. Hargrove wrote:
One should not ignore the option of POSIX shared memory: shm_open() and
shm_unlink(). When present this mechanism usually does not suffer from
the small (eg 32MB) limits of SysV, and uses a "filename" (in an
abstract namespace) which can portably b
On Wed, 9 Jun 2010, Jeff Squyres wrote:
On Jun 9, 2010, at 3:26 PM, Samuel K. Gutierrez wrote:
System V shared memory cleanup is a concern only if a process dies in
between shmat and shmctl IPC_RMID. Shared memory segment cleanup
should happen automagically in most cases, including abnormal p
As stated at the conf call, I did some performance testing on a 32 cores
node.
So, here is graph showing 500 timings of an allreduce operation (repeated
15,000 times for good timing) with sysv, mmap on /dev/shm and mmap on
/tmp.
What it shows:
- sysv has the best performance;
- having
On Wed, 2 Jun 2010, Jeff Squyres wrote:
Don't you mean return NULL? This function is supposed to return a (struct
ibv_cq *).
Oops. My bad. Yes, it should return NULL. And it seems that if I make
ibv_create_cq always return NULL, the scenario described by George works
smoothly : returned OMPI
On Tue, 1 Jun 2010, Jeff Squyres wrote:
On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
In my case, the error happens in :
mca_btl_openib_add_procs()
mca_btl_openib_size_queues()
adjust_cq()
ibv_create_cq_compat()
ibv_create_cq()
Can you nail this down
Couldn't explain it better. Thanks Jeff for the summary !
On Tue, 1 Jun 2010, Jeff Squyres wrote:
On May 31, 2010, at 10:27 AM, Ralph Castain wrote:
Just curious - your proposed fix sounds exactly like what was done in
the OPAL SOS work. Are you therefore proposing to use SOS to provide a
mo
L init / query sequence is
it returning an error for you, Sylvain? Is it just a matter of tidying
something up properly before returning the error?
On May 28, 2010, at 2:21 PM, George Bosilca wrote:
On May 28, 2010, at 10:03 , Sylvain Jeaugey wrote:
On Fri, 28 May 2010, Jeff Squyres wr
On Fri, 28 May 2010, Jeff Squyres wrote:
On May 28, 2010, at 9:32 AM, Jeff Squyres wrote:
Understood, and I agreed that the bug should be fixed. Patches would
be welcome. :-)
I sent a patch on the bml layer in my first e-mail. We will apply it on
our tree, but as always we're trying to send
On Fri, 28 May 2010, Jeff Squyres wrote:
Herein lies the quandary: we don't/can't know the user or sysadmin
intent. They may not care if the IB is borked -- they might just want
the job to fall over to TCP and continue. But they may care a lot if IB
is borked -- they might want the job to ab
On Thu, 27 May 2010, Jeff Squyres wrote:
On May 27, 2010, at 10:32 AM, Sylvain Jeaugey wrote:
That's pretty much my first proposition : abort when an error arises,
because if we don't, we'll crash soon afterwards. That's my original
concern and this should really be fixe
rocs does return an error, the job should abort.
Brian
--
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories
From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf Of
Sylvain Jeaugey [sylvain.jeau...@bull.net]
rdma endpoint
arrays will not be built.
george.
On May 25, 2010, at 05:10 , Sylvain Jeaugey wrote:
Hi,
I'm currently trying to have Open MPI exit more gracefully when a BTL returns an error
during the "add procs" phase.
The current bml/r2 code silently ignores btl->add_procs
Hi,
I'm currently trying to have Open MPI exit more gracefully when a BTL
returns an error during the "add procs" phase.
The current bml/r2 code silently ignores btl->add_procs() error codes with
the following comment :
ompi/mca/bml/r2/bml_r2.c:208
/* This BTL has troubles adding
On Mon, 17 May 2010, Pavel Shamis (Pasha) wrote:
Sylvain Jeaugey wrote:
The XRC protocol seems to create shared receive queues, which is a good
thing. However, comparing memory used by an "X" queue versus and "S"
queue, we can see a large difference. Digging a bit into
s wrote:
How's this?
http://www.open-mpi.org/faq/?category=sm#poor-sm-btl-performance
What's the advantage of /dev/shm? (I don't know anything about /dev/shm)
On May 17, 2010, at 4:08 AM, Sylvain Jeaugey wrote:
I agree with Paul on the fact that a FAQ update would be grea
Thanks Pasha for these details.
On Mon, 17 May 2010, Pavel Shamis (Pasha) wrote:
blocking is the receive queues, because they are created during MPI_Init,
so in a way, they are the "basic fare" of MPI.
BTW SRQ resources are also allocated on demand. We start with very small SRQ
and it is incre
Hi list,
We did some testing on memory taken by Infiniband queues in Open MPI using
the XRC protocol, which is supposed to reduce the needed memory for
Infiniband connections.
When using XRC queues, Open MPI is indeed creating only one XRC queue per
node (instead of per-host). The problem is
I agree with Paul on the fact that a FAQ update would be great on this
subject. /dev/shm seems a good place to put the temporary files (when
available, of course).
Putting files in /dev/shm also showed better performance on our systems,
even with /tmp on a local disk.
Sylvain
On Sun, 16 May
On Mon, 10 May 2010, N.M. Maclaren wrote:
As explained by Sylvain, current Open MPI implementation always returns
MPI_THREAD_SINGLE as provided thread level if neither --enable-mpi-threads
nor --enable-progress-threads was specified at configure (v1.4).
That is definitely the correct action.
Hi list,
I'm currently working on IB bandwidth improvements and maybe some of you
may help me understanding some things. I'm trying to align every IB RDMA
operation to 64 bytes, because having it unaligned can hurt your
performance from lightly to very badly, depending on your architecture.
On Mon, 29 Mar 2010, Abhishek Kulkarni wrote:
#define ORTE_NOTIFIER_DEFINE_EVENT(eventstr, associated_text) {
static int event = -1;
if (OPAL_UNLIKELY(event == -1)) {
event = opal_sos_create_new_event(eventstr, associated_text);
}
..
}
This
Hi Ralph,
For now, I think that yes, this is a unique identifier. However, in my
opinion, this could be improved in the future by replacing it with a unique
string.
Something like :
#define ORTE_NOTIFIER_DEFINE_EVENT(eventstr, associated_text) {
static int event = -1;
if (OPAL_UNL
While we're at it, why not call the option giving MPI_THREAD_MULTIPLE
support --enable-thread-multiple ?
About ORTE and OPAL, if you have --enable-thread-multiple=yes, it may
force the usage of --enable-thread-safety to configure OPAL and/or ORTE.
I know there are other projects using ORTE an
Hi list,
The file ompi/contrib/vt/vt/config.h.in seems to have been added to the
repository, but it is also created by autogen.sh.
Is it normal ?
The result is that when I commit after autogen, I have my patches polluted
with diffs in this file.
Sylvain
On Jan 17, 2010, at 11:31 AM, Ashley Pittman wrote:
Tuning the libc malloc implementation using the options they provide to
do so is valid and provides real benefit to a lot of applications. For
the record we used to disable mmap based allocations by default on
Quadrics systems and I can't thi
On Thu, 7 Jan 2010, Eugene Loh wrote:
Could someone tell me how these settings are used in OMPI or give any
guidance on how they should or should not be used?
This is a very good question :-) As is this whole e-mail, though it's hard
(in my opinion) to give it a Good (TM) answer.
This means that
Hi list,
I'm currently playing with thread levels in Open MPI and I'm quite
surprised by the current code.
First, the C interface :
at ompi/mpi/c/init_thread.c:56 we have :
#if OPAL_ENABLE_MPI_THREADS
*provided = MPI_THREAD_MULTIPLE;
#else
*provided = MPI_THREAD_SINGLE;
#endif
prior to
Thanks Rainer for the patch. I confirm it solves my testcase as well as
the real application that triggered the bug.
Sylvain
On Mon, 7 Dec 2009, Rainer Keller wrote:
Hello Sylvain,
On Friday 04 December 2009 02:27:22 pm Sylvain Jeaugey wrote:
There is definitely something wrong in types
l_datatype.h Fri Dec 04 19:59:26 2009 +0100
@@ -56,7 +56,7 @@
*
* XXX TODO Adapt to whatever the OMPI-layer needs
*/
-#define OPAL_DATATYPE_MAX_SUPPORTED 46
+#define OPAL_DATATYPE_MAX_SUPPORTED 56
/* flags for the datatypes. */
On Fri, 4 Dec 2009, Sylvain Jeaugey wrote:
For t
For the record, and to try to explain why all MTT tests may have missed
this "bug", configuring without --enable-debug makes the bug disappear.
Still trying to figure out why.
Sylvain
On Thu, 3 Dec 2009, Sylvain Jeaugey wrote:
Hi list,
I hope this time I won't be the onl
conds
[rhc@odin mpi]$
Sorry I don't have more time to continue pursuing this. I have no idea what is
going on with your system(s), but it clearly is something peculiar to what you
are doing or the system(s) you are running on.
Ralph
On Dec 2, 2009, at 1:56 AM, Sylvain Jeaugey wrote:
Ok,
Hi list,
I hope this time I won't be the only one to suffer this bug :)
It is very simple indeed, just perform an allreduce with MPI_REAL8
(fortran) and you should get a crash in ompi/op/op.h:411. Tested with
trunk and v1.5, working fine on v1.3.
From what I understand, in the trunk, MPI_REA
t it). But since this is a
race condition, your mileage may vary on a different cluster.
With the patch however, I'm in every time. I'll continue to try different
configurations (e.g. without slurm ...) to see if I can reproduce it on
much common configurations.
Sylvain
On Mon, 30 Nov 2
en FC11 and the
compiler.
On Nov 30, 2009, at 8:48 AM, Sylvain Jeaugey wrote:
Hi Ralph,
I'm also puzzled :-)
Here is what I did today :
* download the latest nightly build (openmpi-1.7a1r22241)
* untar it
* patch it with my "ORTE_RELAY_DELAY" patch
* build it directly on t
ain wrote:
On Nov 27, 2009, at 8:23 AM, Sylvain Jeaugey wrote:
Hi Ralph,
I tried with the trunk and it makes no difference for me.
Strange
Looking at potential differences, I found out something strange. The bug may have
something to do with the "routed" framework. I can repro
hreads?? That is the only way I can recreate this
behavior.
I plan to modify the relay/message processing method anyway to clean it up. But
there doesn't appear to be anything wrong with the current code.
Ralph
On Nov 20, 2009, at 6:55 AM, Sylvain Jeaugey wrote:
Hi Ralph,
Thanks for your e
l-crcp2,crcp
enable_io_romio=no
On Nov 19, 2009, at 8:08 AM, Ralph Castain wrote:
On Nov 19, 2009, at 7:52 AM, Sylvain Jeaugey wrote:
Thank you Ralph for this precious help.
I setup a quick-and-dirty patch basically postponing process_msg (hence
daemon_collective) until the launch is done. I
2. send the relay - the daemon collective can now proceed without a
"wait" in it
3. now launch the local procs
It would be a fairly simple reorganization of the code in the
orte/mca/odls area. I can do it this weekend if you like, or you can do
it - either way is fine, but if you
n Nov 17, 2009, at 9:01 AM, Sylvain Jeaugey wrote:
I don't think so, and I'm not doing it explicitly at least. How do I know?
Sylvain
On Tue, 17 Nov 2009, Ralph Castain wrote:
We routinely launch across thousands of nodes without a problem...I have never
seen it stick in this fash
ded by any chance? If so, that
definitely won't work.
On Nov 17, 2009, at 9:27 AM, Sylvain Jeaugey wrote:
Hi all,
We are currently experiencing problems at launch on the 1.5 branch on
relatively large number of nodes (at least 80). Some processes are not spawned
and orted processes are de
Hi all,
We are currently experiencing problems at launch on the 1.5 branch on
relatively large number of nodes (at least 80). Some processes are not
spawned and orted processes are deadlocked.
When MPI processes are calling MPI_Init before send_relay is complete, the
send_relay function and
We worked a bit on it and yes, there is some work to do:
* The syntax used to describe the various components is far from
consistent from one usage to another ("SOCKET", "NODE", ...). We managed
to work things out by reading the various out-of-date example files - but
mainly the code.
* Th
You were faster to fix the bug than I was to send my bug report :-)
So I confirm : this fixes the problem.
Thanks !
Sylvain
On Mon, 21 Sep 2009, Edgar Gabriel wrote:
what version of OpenMPI did you use? Patch #21970 should have fixed this
issue on the trunk...
Thanks
Edgar
Sylvain Jeaugey
Hi list,
We are currently experiencing deadlocks when using communicators other
than MPI_COMM_WORLD. So we made a very simple reproducer (Comm_create then
MPI_Barrier on the communicator - see end of e-mail).
We can reproduce the deadlock only with openib and with at least 8 cores
(no succes
penib, but if I'm not mistaken (again !) tcp still hangs.
Sylvain
On Fri, 4 Sep 2009, Sylvain Jeaugey wrote:
Hi Rolf,
I was indeed running a more than 4 weeks old trunk, but after pulling the
latest version (and checking the patch was in the code), it seems to make no
difference.
Howev
Understood. So, let's say that we're only implementing a hurdle to
discourage users from doing things wrong. I guess the efficiency of this
will reside in the message displayed to the user ("You are about to break
the entire machine and you will be fined if you try to circumvent this in
any way
Looks like users at LANL are not very nice ;)
Indeed, this is no hard security. Only a way to prevent users from making
mistakes. We often give users special tuning for their application and
when they see their application is going faster, they start messing with
every parameter hoping that it
/changeset/21833
If you are running the latest bits and still seeing the problem, then I guess
it is something else.
Rolf
On 09/04/09 04:40, Sylvain Jeaugey wrote:
Hi all,
We're currently working with romio and we hit a problem when exchanging
data with hindexed types with the openi
On Fri, 4 Sep 2009, Jeff Squyres wrote:
I haven't looked at the code deeply, so forgive me if I'm parsing this wrong:
is the code actually reading the file into one list and then moving the
values to another list? If so, that seems a little hackish. Can't it just
read directly to the target
On Fri, 4 Sep 2009, Jeff Squyres wrote:
--
*** Checking versions
checking for SVN version... done
checking Open MPI version... 1.4a1hgf11244ed72b5
up to changeset c4b117c5439b
checking Open MPI release date... Unreleased developer copy
checking Open MPI Subversion repository version... hgf11
Hi all,
We're currently working with romio and we hit a problem when exchanging
data with hindexed types with the openib btl.
The attached reproducer (adapted from romio) is working fine on tcp,
blocks on openib when using 1 port but works if we use 2 ports (!). I
tested it against the trunk
For the record, I see an big interest in this.
Sometimes, you have to answer calls for tender featuring applications that
must work with no code change, even if the code is not MPI-compliant at
all.
That's sad, but true (no pun intended :-))
Sylvain
On Mon, 24 Aug 2009, George Bosilca w
e
RPM build command passing --pkgname or somesuch to OMPI's configure to
override the built-in name?
Hum, I guess you're right, this is indeed not something to change. Sorry
about that.
Sylvain
On Jul 31, 2009, at 11:51 AM, Sylvain Jeaugey wrote:
Hi all,
We had to apply a litt
Hi Jeff,
I bet you're referring to Euro PVM MPI 09?
If this is what you're referring to, I should attend as usual. And of
course, I'm very interested in joining a devel meeting :)
Sylvain
On Tue, 4 Aug 2009, Jeff Squyres wrote:
Who's going to Helsinki?
Does anyone want to meet up for some
On Mon, 3 Aug 2009, Jeff Squyres wrote:
On Aug 3, 2009, at 8:23 AM, Arthur Huillet wrote:
I have recently started working on OpenMPI, and part of my job consists in
adding a new module to OpenMPI.
Cool. What are you adding?
A collective component to support some Bull specific hardware.
Sy
n a couple of places
- Add an %{opt_prefix} option to be able to install in a specific path
(e.g. in /opt//mpi/-/ instead of
/opt/-)
The patch is done with "hg extract" but should apply on the SVN trunk.
Sylvain# HG changeset patch
# User Sylvain Jeaugey
# Date 124904
Hi Jeff,
I'm interested in joining the effort, since we will likely have the same
problem with SLURM's cpuset support.
On Wed, 22 Jul 2009, Jeff Squyres wrote:
But as to why it's getting EINVAL, that could be wonky. We might want to
take this to the PLPA list and have you run some small, no
On Thu, 18 Jun 2009, Jeff Squyres wrote:
On Jun 18, 2009, at 11:25 AM, Sylvain Jeaugey wrote:
My problem seems related to library generation through RPM, not with
1.3.2, nor the patch.
I'm not sure I understand -- is there something we need to fix in our SRPM?
I need to dig a bit
Ok, never mind.
My problem seems related to library generation through RPM, not with
1.3.2, nor the patch.
Sylvain
On Thu, 18 Jun 2009, Sylvain Jeaugey wrote:
Hi all,
Until Open MPI 1.3 (maybe 1.3.1), I used to find it convenient to be able to
move a library from its "normal&q
Hi all,
Until Open MPI 1.3 (maybe 1.3.1), I used to find it convenient to be able
to move a library from its "normal" place (either /usr or /opt) to
somewhere else (i.e. my NFS home account) to be able to try things only on
my account.
So, I used to set OPAL_PREFIX to the root of the Open MP
ou seem to have a real
reproducer).
Sylvain
On Wed, 10 Jun 2009, Sylvain Jeaugey wrote:
Hum, very glad that padb works with Open MPI, I couldn't live without it. In
my opinion, the best debug tool for parallel applications, and more
importantly, the only one that scales.
About the is
Hum, very glad that padb works with Open MPI, I couldn't live without it.
In my opinion, the best debug tool for parallel applications, and more
importantly, the only one that scales.
About the issue, I couldn't reproduce it on my platform (tried 2 nodes
with 2 to 8 processes each, nodes are t
putting the process to sleep. You could let someone know so
a human can decide what, if anything, to do about it, or provide a hook so
that people can explore/utilize different response strategies...or both!
HTH
Ralph
On Tue, Jun 9, 2009 at 6:52 AM, Sylvain Jeaugey
wrote:
I understand your
o about it,
or provide a hook so that people can
explore/utilize different response strategies...or both!
HTH
Ralph
On Tue, Jun 9, 2009 at 6:52 AM, Sylvain Jeaugey
wrote:
I understand your point of view, and mostly share it.
I think the biggest point in my example is that sleep
MPI processes wait for us to reach a communication point. We
-want- those processes spinning away so that, when the comm starts, it can
proceed as quickly as possible.
Just some thoughts...
Ralph
On Jun 9, 2009, at 5:28 AM, Terry Dontje wrote:
Sylvain Jeaugey wrote:
Hi Ralph,
I'm enti
On Mon, 8 Jun 2009, NiftyOMPI Tom Mitchell wrote:
??? dual rail does double the number of switch ports. If you want to
address switch failure each rail must connect to a different switch.
If you do not want to have isolated fabrics you must have some
additional ports on all switches to connect
system
to behave similarly to today isn't enough - we still wind up adding logic
into a very critical timing loop for no reason. A simple configure option of
--enable-mpi-progress-monitoring would be sufficient to protect the code.
HTH
Ralph
On Jun 8, 2009, at 9:50 AM, Sylvain
What : when nothing has been received for a very long time - e.g. 5
minutes, stop busy polling in opal_progress and switch to a usleep-based
one.
Why : when we have long waits, and especially when an application is
deadlocked, detecting it is not easy and a lot of power is wasted until
the e