On Nov 20 2012, Jeff Squyres wrote:
Cool! Thanks for the invite.
Do we have any European friends who would be able to attend this
conference?
If I count, in theory, yes. In practice, I doubt it. It would
depend on other things, which are unlikely to be decided in time,
and whether I dare
On Nov 20 2012, Jeff Squyres wrote:
I very, VERY strongly advise you to decide what you mean by that. Very
few compilers that 'support' C99 do so for the whole language, and most
use the more system-dependent features in very different ways. In
theory, just enabling C99 could break code, and
On Nov 20 2012, Jeff Squyres wrote:
While at SC, Brian, Ralph, Nathan and I had long conversations about C99.
We decided:
- all compilers that we care about seem to support C99 - so let's move
the trunk and v1.7 to *require* C99 and see if anyone screams
--> we're NOT doing this in v1.6 - m
On Nov 5 2012, Ralph Castain wrote:
We adhere to the MPI standard, so we expect the user in such an instance
to define a datatype that reflects the structure they are trying to send.
We will then do the voodoo to correctly send that data in a heterogeneous
environment, and pass the data back
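The recipe Ralph refers to is standard MPI. A minimal C sketch (the
struct and its fields are invented for illustration):

#include <stddef.h>
#include <mpi.h>

/* A hypothetical application struct to be sent as one unit. */
struct particle {
    double pos[3];
    int    id;
};

/* Describe struct particle to MPI so the library itself can do the
 * conversion ("the voodoo") in a heterogeneous environment. */
static MPI_Datatype make_particle_type(void)
{
    int          blocklens[2] = { 3, 1 };
    MPI_Aint     displs[2]    = { offsetof(struct particle, pos),
                                  offsetof(struct particle, id) };
    MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_INT };
    MPI_Datatype tmp, particle_type;

    MPI_Type_create_struct(2, blocklens, displs, types, &tmp);
    /* Force the extent to sizeof(struct particle) so any trailing
     * padding is covered when arrays of the struct are sent. */
    MPI_Type_create_resized(tmp, 0, sizeof(struct particle),
                            &particle_type);
    MPI_Type_free(&tmp);
    MPI_Type_commit(&particle_type);
    return particle_type;
}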
On Oct 15 2012, Iliev, Hristo wrote:
Numeric differences are to be expected with parallel applications. The
basic reason for that is that on many architectures floating-point
operations are performed using higher internal precision than that of the
arguments and only the final result is round
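A short C illustration of why the combining order matters (values
chosen so the difference is gross rather than one ulp):

#include <stdio.h>

int main(void)
{
    /* A parallel reduction may associate partial sums differently
     * from a serial loop; the rounding then differs. */
    double a = 1.0e16, b = -1.0e16, c = 1.0;

    printf("(a + b) + c = %.17g\n", (a + b) + c);  /* 1: exact */
    printf("a + (b + c) = %.17g\n", a + (b + c));  /* 0: c absorbed into b */
    return 0;
}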
On Sep 27 2012, Eugene Loh wrote:
Good discussion, but as far as my specific issue goes, it looks like
it's some peculiar interaction between different compiler versions. I'm
asking some experts.
Module incompatibility is a common problem, and the solution is NOT to
put a hack into the conf
On Sep 27 2012, Jeff Squyres (jsquyres) wrote:
Fwiw, we have put in many hours of engineering to "that obscene hack"
*because* compilers all have differing degrees of compatibility suck.
It's going to be years before compilers fully support f08, for example,
so we have no choice but to test f
On Sep 27 2012, Jeff Squyres wrote:
On Sep 27, 2012, at 7:30 AM, Paul Hargrove wrote:
PUBLIC should be a standard part of F95 (no configure probe required).
Good.
However, the presence of "OMPI_PRIVATE" suggests you already have a
configure probe for the "PRIVATE" keyword.
Yes, we do, bec
On Mar 5 2012, George Bosilca wrote:
I gave it a try (r26103). It was messy, and I hope I got it right. Let's
soak it for a few days with our nightly testing to see how it behaves.
That'll at least check that it's not totally broken. The killer about
such wording is that you cannot guarantee ex
On Mar 5 2012, George Bosilca wrote:
I was afraid about all those little intermediary steps. I asked a
compiler guy and apparently reversing the order (aka starting with the
ptrdiff_t variable) will not solve anything. The only portable way to
solve this is to cast every single member, to pre
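The hazard, in a short C sketch (assuming the usual 32-bit int; the
names nx/ny/nz are mine):

#include <stddef.h>

/* Starting with a ptrdiff_t variable does not widen the arithmetic:
 * nx * ny * nz is still evaluated in int and can overflow before the
 * assignment happens. */
ptrdiff_t bad(int nx, int ny, int nz)
{
    ptrdiff_t r = nx * ny * nz;   /* overflow already happened in int */
    return r;
}

/* Casting every single member keeps each intermediate in the wide
 * type, which is the only portable fix. */
ptrdiff_t good(int nx, int ny, int nz)
{
    return (ptrdiff_t)nx * (ptrdiff_t)ny * (ptrdiff_t)nz;
}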
On May 21 2011, Dan Reynolds wrote:
./test_driver.F90:12.39: call mpi_abort(MPI_COMM_WORLD, -1, 0)
It's unlikely to provoke that particular error, but that call is erroneous.
It should be something like:
integer :: ierror
call mpi_abort(MPI_COMM_WORLD, 1, ierror)
Negative error numbers
On Apr 22 2011, Ralph Castain wrote:
Several of us are. Josh and George (plus teammates), and some other
outside folks, are working the MPI side of it.
I'm working only the ORTE side of the problem.
Quite a bit of capability is already in the trunk, but there is always
more to do :-)
Is th
On Apr 14 2011, Jeff Squyres wrote:
I think Ralph's point is that OMPI is providing the run-time environment
for the application, and it would probably behoove us to support both
kinds of behaviors since there are obviously people in both camps out
there.
It's pretty easy to add a non-defaul
On Apr 14 2011, Ralph Castain wrote:
... It's hopeless, and whatever you do will be wrong for many
people. ...
I think that sums it up pretty well. :-)
It does seem a little strange that the scenario you describe somewhat
implies that one process is calling MPI_Finalize long before th
On Apr 14 2011, Ralph Castain wrote:
I've run across an interesting issue for which I don't have a ready answer.
If an MPI process aborts, we automatically abort the entire job.
If an MPI process returns a non-zero exit status, indicating that there
was something abnormal about its terminatio
On Mar 15 2011, George Bosilca wrote:
Nobody challenged your statements about threading or about the
correctness of the POSIX standard. However, such concerns are better
voiced on forums related to that specific subject, where they have a
chance to be taken into account by people who understa
On Mar 12 2011, George Bosilca wrote:
Removing thread support is _NOT_ an option
(https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MPI3Hybrid).
Unlike the usual claims on this mailing list, MPI_THREAD_MULTIPLE has
been fully supported for several BTLs in Open MPI
(http://www.springerlink.co
On Mar 11 2011, Eugene Loh wrote:
The idea would be to hardwire support for MPI_THREAD_MULTIPLE to be off,
just as we have done for progress threads. Threads might still be used
for other purposes -- e.g., ORTE, openib async thread, etc.
That's what I was assuming, too. Threads used behind
On Mar 10 2011, Eugene Loh wrote:
Any comments on this? We wanted to clean up MPI_THREAD_MULTIPLE
support in the trunk and port these changes back to 1.5.x, but it's
unclear to me what our expectations should be about any
MPI_THREAD_MULTIPLE test succeeding. How do we assess (test) our
chan
On Dec 20 2010, George Bosilca wrote:
There is a hint for F77 users at the bottom of the page. It suggests using
INTEGER*MPI_OFFSET_KIND as the type for the SIZE. I guess if we cast it
correctly, and the users follow the MPI specification, this should work.
Please tell me you are joking?
No, th
On Dec 18 2010, Ken Lloyd wrote:
Yes, this is a hard problem. It is not endemic to OpenMPI, however.
This hints at the distributed memory/process/thread issues either
through the various OSs or alternately external to them in many solution
spaces.
Absolutely. I hope that I never implied an
On Dec 17 2010, Jeff Squyres wrote:
It's not an unknown problem -- as George and Ralph were trying to say, it
was a design decision on our part.
Sadly, flexible dynamic processing is not something that many people ask
for. We have invested time in it over the years to get it working and have
On Dec 17 2010, George Bosilca wrote:
Let me try to round the edges on this one. It is not that we couldn't or
wouldn't like to have a more "MPI" compliant approach on this, but the
definition of connected processes in the MPI standard is [kind of] shady.
One thing is clear however, it is a t
On Dec 17 2010, Suraj Prabhakaran wrote:
I am observing a behavior where, when the parent spawns a child and the
child terminates abruptly (for example with exit() before
MPI_Finalize()), the parent also terminates even after both the child
and parent have explicitly called MPI_disconn
On Nov 30 2010, Ralph Castain wrote:
Here is what one IB vendor says about the issue on their web site
(redacted to protect the innocent):
"At the time of this release, the (redacted-openib) driver has issues
with buffers sharing pages when fork( ) is used. Pinned (locked in
memory) pages a
On Nov 29 2010, George Bosilca wrote:
If your code doesn't do exactly what is described in the code snippet
attached to your previous email, then you can safely ignore the warning.
In fact, any fork done prior to the communication is a non-issue, but it
is difficult to identify. Therefore, we ou
On Oct 30 2010, George Bosilca wrote:
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
My version of the MPI standard doesn't say this? Should I update?
The best diagnostic I ever got out of a comp
It was, of course, my error. However, something very weird (and potentially
very inefficient) is going on behind writev() and I shall look into it.
Regards,
Nick Maclaren.
As part of writing a course, I was trying to investigate how OpenMPI
handles transfers when using bog-standard Linux and Ethernet (which I
assume means TCP/IP). Having failed to track down the actual transfer call,
I ran a simple test program under 'strace -f' but, in between two
diagnostic cal
On Jul 20 2010, Jeff Squyres wrote:
> Also, it seems like the 3rd parameter could be problematic if it ever
> goes larger than 2B -- it'll increment in the wrong direction, won't
> it?
Not on most systems.
Ah -- I just checked -- the associativity of + and (cast) are equal, and
are righ
On Jul 20 2010, Jeff Squyres wrote:
The change was to add casting:
} while (!OPAL_ATOMIC_CMPSET_32((int32_t*)&ep->eager_rdma_remote.seq,
(int32_t)ftr->seq, (int32_t)ftr->seq+1));
Is it safe to simply cast a (uint32_t*) to (int32_t*) in the first param?
Pretty safe. While there ARE
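Why it is pretty safe, as a standalone C sketch (plain arithmetic
stands in for the CAS, since OPAL_ATOMIC_CMPSET_32 is Open MPI
internal): signed and unsigned variants of the same type may alias in
C, and a 32-bit compare-and-swap only sees raw bit patterns.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t seq   = UINT32_MAX;       /* all 32 bits set */
    int32_t *alias = (int32_t *)&seq;  /* signed view of the same bits: -1 */

    /* A CAS through *alias compares and stores the identical bit
     * pattern the unsigned view would.  The one care point is the
     * arithmetic at the boundary: (int32_t)seq + 1 is signed overflow
     * at INT32_MAX, while seq + 1 merely wraps; on two's-complement
     * hardware both leave the same bits behind. */
    *alias += 1;                        /* -1 + 1 == 0 */
    printf("seq = %" PRIu32 "\n", seq); /* prints 0 */
    return 0;
}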
On Jul 15 2010, Jeff Squyres wrote:
On Jul 15, 2010, at 2:14 AM, nadia.derbey wrote:
The only warning I'm getting in the part of the code impacted by the
patch is:
-
../../../../../ompi/mca/btl/openib/btl_openib_async.c(322): warning
#188: enumerated type mixed with another
On May 10 2010, Kawashima wrote:
Because MPI_THREAD_FUNNELED/SERIALIZED doesn't restrict other threads
from calling functions other than those of the MPI library, the code
below is not thread safe if malloc is not thread safe and MPI_Allreduce
calls malloc.
#pragma omp parallel for private(is_master)
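Kawashima's code is cut off above; a hedged reconstruction of the
pattern (the function and variable names are mine): every MPI call is
funneled through the master thread, which MPI_THREAD_FUNNELED allows,
yet it still races with the other threads' malloc() if the library's
internal allocator is not thread safe.

#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

void compute(double *in, double *out, int n)
{
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0) {
            /* Master thread only: legal under MPI_THREAD_FUNNELED,
             * but MPI_Allreduce may call malloc internally. */
            MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
        } else {
            /* Concurrent malloc from the non-MPI threads. */
            double *scratch = malloc(n * sizeof(double));
            /* ... thread-local work ... */
            free(scratch);
        }
    }
}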
On May 10 2010, Sylvain Jeaugey wrote:
That is definitely the correct action. Unless an application or library
has been built with thread support, or can be guaranteed to be called
only from a single thread, using threads is catastrophic.
I personally see that as a bug, but I certainly lack so
On May 10 2010, Kawashima wrote:
Though Sylvain's original mail (*1) was sent 4 months ago and nobody
replied to it, I'm interested in this issue and strongly agree with
Sylvain.
*1 http://www.open-mpi.org/community/lists/devel/2010/01/7275.php
As explained by Sylvain, current Open MPI implemen
On May 4 2010, Jeff Squyres wrote:
If there's a sleep(1) in the run-time test, that would be an annoying
source of delay in the startup of a job. This is not a deal-breaker, but
it would be nice(r) if there was a "fast" run-time check that could be
checked during the sysv selection logic (i.e.
On May 4 2010, Terry Dontje wrote:
Is a configure-time test good enough? For example, are all Linuxes the
same in this regard? That is, if you built OMPI on RH and it configured
in the new SysV SM, will those bits actually run on other Linux systems
correctly? I think Jeff had hinted to this
On May 3 2010, Jeff Squyres wrote:
Write a small C program that does something like the following (this is
off the top of my head):
fork a child
child goes to sleep immediately
sysv alloc a segment
attach to it
ipc rm it
parent wakes up child
child tries to attach to segment
If that succeed
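A runnable version of that sketch, with a pipe in place of the sleep
for the parent-child handshake (this is still Jeff's outline, not Open
MPI's actual selection test):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) != 0) return 77;            /* 77: cannot run the test */

    pid_t pid = fork();
    if (pid == 0) {                           /* child: wait for the id */
        int id;
        if (read(fds[0], &id, sizeof id) != (ssize_t)sizeof id) _exit(2);
        /* Try to attach after the parent has done IPC_RMID. */
        _exit(shmat(id, NULL, 0) == (void *)-1 ? 1 : 0);
    }

    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0) return 77;
    void *p = shmat(id, NULL, 0);             /* parent stays attached */
    shmctl(id, IPC_RMID, NULL);               /* remove while attached */
    write(fds[1], &id, sizeof id);            /* "wake" the child */

    int status;
    waitpid(pid, &status, 0);
    if (p != (void *)-1) shmdt(p);
    printf("attach after IPC_RMID %s\n",
           WIFEXITED(status) && WEXITSTATUS(status) == 0
               ? "works (e.g. Linux)" : "fails");
    return 0;
}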
On May 2 2010, Ashley Pittman wrote:
On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:
As to performance, there should be no difference in use between Sys-V
shared memory and file-backed shared memory; the instructions issued and
the MMU flags for the page should both be the same, so the perfo
On Apr 30 2010, Ralph Castain wrote:
Guess this has been too upsetting a question - I'll work off-list with
the other developers to determine an appropriate OMPI behavior.
I have responded to Ralph Castain by email, but I need to correct the
implication in the above.
What OpenMPI chooses to
On Apr 30 2010, Ralph Castain wrote:
On Apr 30, 2010, at 6:15 AM, Jeff Squyres wrote:
MPI quite rightly does not specify this, because the matter is very
system- dependent, and it is not possible to return the exit code (or
display it) in all environments. Sorry, but that is reality.
Correct
On Apr 30 2010, Jeff Squyres wrote:
The last paragraph of the specification of MPI_Finalize makes it clear
that it is the USER'S responsibility to return an exit code to the system
for process 0, and that what happens for other ones is undefined. Or
fairly clear - it could be stated in so many
On Apr 30 2010, Larry Baker wrote:
I don't know if there is any standard ordering of non-zero exit status
codes. If so, another option would be to return the largest
(smallest) value, when that is the most serious exit status.
There isn't, and some systems have used exit codes in othe
On Apr 6 2010, luyang dong wrote:
Regardless of the MPI implementation, there is always a command named
mpirun, and correspondingly a source file called mpirun.c (at least in
LAM/MPI), but I cannot find this file in Open MPI. Can you tell me how
this command is produced in Open MPI?
On Jan 22 2010, Ralph Castain wrote:
For SLURM, there is a config file where you can specify what gets
propagated. It is clearly an error to include hostname as it messes many
things up, not just OMPI. Frankly, I've never seen someone do that on
SLURM.
Well, it's USUALLY an error. That'
On Jan 22 2010, Nadia Derbey wrote:
I'm wondering whether the HOSTNAME environment variable shouldn't be
handled as a "special case" when the orted daemons launch the remote
jobs. This particularly applies to batch schedulers where the caller's
environment is copied to the remote job: we are inh
On Sep 9 2009, George Bosilca wrote:
On Sep 9, 2009, at 14:16 , Lenny Verkhovsky wrote:
Is a C99-compliant compiler something unusual,
or is there a policy among OMPI developers/users that prevents me
from using __func__ instead of hardcoded strings in the code?
__func__ is what you shoul
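For reference, __func__ is a predefined identifier required by C99
(section 6.4.2.2), so no configure probe or hardcoded string is needed
wherever C99 can be assumed; a trivial sketch:

#include <stdio.h>

/* __func__ names the enclosing function; it is a predefined
 * identifier, not a macro. */
#define DBG(msg) fprintf(stderr, "%s: %s\n", __func__, msg)

static void setup(void) { DBG("entering"); }

int main(void) { setup(); return 0; }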
On Aug 17 2009, Paul H. Hargrove wrote:
+ I wonder if one can do any "introspection" with the dynamic linker to
detect hybrid OpenMP (no "I") apps and avoid pinning them by default
(examining OMP_NUM_THREADS in the environment is no good, since that
variable may have a site default value othe
On Aug 17 2009, Jeff Squyres wrote:
Yes, BUT... We had a similar option to this for a long, long time.
Sorry, perhaps I should have spelled out what I meant by "mandatory".
The system would not build (or run, depending on where it was set)
without such a value being specified. There would
On Aug 17 2009, Ralph Castain wrote:
At issue for us is that other MPIs -do- bind by default, thus creating an
apparent performance advantage for themselves compared to us on standard
benchmarks run "out-of-the-box". We repeatedly get beat-up in papers and
elsewhere over our performance, when ma
On Aug 17 2009, Ralph Castain wrote:
The problem is that the two mpiruns don't know about each other, and
therefore the second mpirun doesn't know that another mpirun has
already used socket 0.
We hope to change that at some point in the future.
It won't help. The problem is less likely
On Aug 17 2009, Jeff Squyres wrote:
On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
I think the problem here, Eugene, is that performance benchmarks are
far from the typical application. We have repeatedly seen this -
optimizing for benchmarks frequently makes applications run less
effi
On Jun 22 2009, Iain Bason wrote:
Jeff Squyres wrote:
Thanks for looking into this, David.
So if I understand that correctly, it means you have to assign all
literals in your fortran program with a "_16" suffix. I don't know if
that's standard Fortran or not.
Yes, it is.
Sorry - no, it
On Feb 7 2009, Jeff Squyres wrote:
On Feb 7, 2009, at 12:23 PM, Brian W. Barrett wrote:
That is significantly higher than I would have expected for a single
function call. When I did all the component tests a couple years
ago, a function call into a shared library was about 5ns on an Intel
On Jan 23 2009, Jeff Squyres wrote:
FWIW, ABI is not necessarily a bad thing; it has its benefits and
drawbacks (and enablers and limitations). Some people want it and
some people don't (most don't care, I think). We'll see where that
effort goes in the Forum and elsewhere.
Right. But
On Jan 23 2009, Jeff Squyres wrote:
No. Open MPI's Fortran MPI_COMM_WORLD is pretty much hard-wired to 0.
That's a mistake. But probably non-trivial to fix.
Could you explain what you meant by that? There is no "fix"; Open
MPI's Fortran MPI_COMM_WORLD has always been 0. More specifica
On Jan 23 2009, Jeff Squyres wrote:
On Jan 23, 2009, at 12:30 AM, David Robertson wrote:
I have looked for both MPI_COMM_WORLD and mpi_comm_world but neither
can be found by totalview (the parallel debugger we use) when I
compile with "USE mpi". When I use "include 'mpif.h'" both
MPI_COMM_