Re: [OMPI devel] FOSS for scientists devroom at FOSDEM 2013

2012-11-20 Thread N.M. Maclaren
On Nov 20 2012, Jeff Squyres wrote: Cool! Thanks for the invite. Do we have any European friends who would be able to attend this conference? If I count, in theory, yes. In practice, I doubt it. It would depend on other things, which are unlikely to be decided in time, and whether I dare

Re: [OMPI devel] C99 in wrapper compilers?

2012-11-20 Thread N.M. Maclaren
On Nov 20 2012, Jeff Squyres wrote: I very, VERY strongly advise you to decide what you mean by that. Very few compilers that 'support' C99 do so for the whole language, and most use the more system-dependent features in very different ways. In theory, just enabling C99 could break code, and

Re: [OMPI devel] C99 in wrapper compilers?

2012-11-20 Thread N.M. Maclaren
On Nov 20 2012, Jeff Squyres wrote: While at SC, Brian, Ralph, Nathan and I had long conversations about C99. We decided: - all compilers that we care about seem to support C99 - so let's move the trunk and v1.7 to *require* C99 and see if anyone screams --> we're NOT doing this in v1.6 - m
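[Editorial note: for readers unfamiliar with what "requiring C99" buys, here is a minimal, hypothetical C sketch of the kinds of constructs such a requirement permits (designated initializers, declarations inside for statements, // comments). The struct and names are invented for illustration and do not come from the Open MPI source.]

    #include <stdio.h>

    struct endpoint { int rank; int fd; };

    int main(void)
    {
        /* C99 designated initializer */
        struct endpoint ep = { .rank = 3, .fd = -1 };

        /* C99 declaration inside the for statement */
        for (int i = 0; i < ep.rank; ++i) {
            printf("peer %d\n", i);   /* C99 also allows // comments */
        }
        return 0;
    }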

Re: [OMPI devel] About Marshalling and Umarshalling

2012-11-05 Thread N.M. Maclaren
On Nov 5 2012, Ralph Castain wrote: We adhere to the MPI standard, so we expect the user in such an instance to define a datatype that reflects the structure they are trying to send. We will then do the voodoo to correctly send that data in a heterogeneous environment, and pass the data back

Re: [OMPI devel] MPI_Reduce() is losing precision

2012-10-15 Thread N.M. Maclaren
On Oct 15 2012, Iliev, Hristo wrote: Numeric differences are to be expected with parallel applications. The basic reason for that is that on many architectures floating-point operations are performed using higher internal precision than that of the arguments and only the final result is round

Re: [OMPI devel] making Fortran MPI_Status components public

2012-09-27 Thread N.M. Maclaren
On Sep 27 2012, Eugene Loh wrote: Good discussion, but as far as my specific issue goes, it looks like it's some peculiar interaction between different compiler versions. I'm asking some experts. Module incompatibility is a common problem, and the solution is NOT to put a hack into the conf

Re: [OMPI devel] making Fortran MPI_Status components public

2012-09-27 Thread N.M. Maclaren
On Sep 27 2012, Jeff Squyres (jsquyres) wrote: Fwiw, we have put in many hours of engineering to "that obscene hack" *because* compilers all have differing degrees of compatibility suck. It's going to be years before compilers fully support f08, for example, so we have no choice but to test f

Re: [OMPI devel] making Fortran MPI_Status components public

2012-09-27 Thread N.M. Maclaren
On Sep 27 2012, Jeff Squyres wrote: On Sep 27, 2012, at 7:30 AM, Paul Hargrove wrote: PUBLIC should be a standard part of F95 (no configure probe required). Good. However, the presence of "OMPI_PRIVATE" suggests you already have a configure probe for the "PRIVATE" keyword. Yes, we do, bec

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread N.M. Maclaren
On Mar 5 2012, George Bosilca wrote: I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves. That'll at least check that it's not totally broken. The killer about such wording is that you cannot guarantee ex

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread N.M. Maclaren
On Mar 5 2012, George Bosilca wrote: I was afraid about all those little intermediary steps. I asked a compiler guy and apparently reversing the order (aka starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single member, to pre
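[Editorial note: a minimal sketch of the overflow being discussed, assuming an expression such as count * extent evaluated in 32-bit int arithmetic; the variable names are illustrative and not taken from the Open MPI source.]

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        int count  = 1 << 20;          /* 1 Mi elements     */
        int extent = 4096;             /* bytes per element */

        /* Both operands are int, so the multiplication is done in int and
           overflows (4 GiB does not fit) before the result is widened by
           the assignment.                                                 */
        ptrdiff_t bad  = count * extent;

        /* Casting every operand forces the arithmetic into ptrdiff_t,
           which is the portable fix described in the thread.              */
        ptrdiff_t good = (ptrdiff_t)count * (ptrdiff_t)extent;

        printf("bad = %td, good = %td\n", bad, good);
        return 0;
    }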

Re: [OMPI devel] F90 open-mpi module bug

2011-05-21 Thread N.M. Maclaren
On May 21 2011, Dan Reynolds wrote: ./test_driver.F90:12.39: call mpi_abort(MPI_COMM_WORLD, -1, 0) It's unlikely to provoke that particular error, but that call is erroneous. It should be something like: integer :: ierror call mpi_abort(MPI_COMM_WORLD, 1, ierror) Negative error numbers

Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread N.M. Maclaren
On Apr 22 2011, Ralph Castain wrote: Several of us are. Josh and George (plus teammates), and some other outside folks, are working the MPI side of it. I'm working only the ORTE side of the problem. Quite a bit of capability is already in the trunk, but there is always more to do :-) Is th

Re: [OMPI devel] Exit status

2011-04-14 Thread N.M. Maclaren
On Apr 14 2011, Jeff Squyres wrote: I think Ralph's point is that OMPI is providing the run-time environment for the application, and it would probably behoove us to support both kinds of behaviors since there are obviously people in both camps out there. It's pretty easy to add a non-defaul

Re: [OMPI devel] Exit status

2011-04-14 Thread N.M. Maclaren
On Apr 14 2011, Ralph Castain wrote: ... It's hopeless, and whatever you do will be wrong for many people. ... I think that sums it up pretty well. :-) It does seem a little strange that the scenario you describe somewhat implies that one process is calling MPI_Finalize long before th

Re: [OMPI devel] Exit status

2011-04-14 Thread N.M. Maclaren
On Apr 14 2011, Ralph Castain wrote: I've run across an interesting issue for which I don't have a ready answer. If an MPI process aborts, we automatically abort the entire job. If an MPI process returns a non-zero exit status, indicating that there was something abnormal about its terminatio

Re: [OMPI devel] [Fwd: multi-threaded test]

2011-03-15 Thread N.M. Maclaren
On Mar 15 2011, George Bosilca wrote: Nobody challenged your statements about threading or about the correctness of the POSIX standard. However, such concerns are better voiced on forums related to that specific subject, where they have a chance to be taken into account by people who understa

Re: [OMPI devel] [Fwd: multi-threaded test]

2011-03-12 Thread N.M. Maclaren
On Mar 12 2011, George Bosilca wrote: Removing thread support is _NOT_ an option (https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MPI3Hybrid). Unlike the usual claims on this mailing list, MPI_THREAD_MULTIPLE had been fully supported for several BTLs in Open MPI (http://www.springerlink.co

Re: [OMPI devel] [Fwd: multi-threaded test]

2011-03-11 Thread N.M. Maclaren
On Mar 11 2011, Eugene Loh wrote: The idea would be to hardwire support for MPI_THREAD_MULTIPLE to be off, just as we have done for progress threads. Threads might still be used for other purposes -- e.g., ORTE, openib async thread, etc. That's what I was assuming, too. Threads used behind

Re: [OMPI devel] [Fwd: multi-threaded test]

2011-03-10 Thread N.M. Maclaren
On Mar 10 2011, Eugene Loh wrote: Any comments on this? We wanted to clean up MPI_THREAD_MULTIPLE support in the trunk and port these changes back to 1.5.x, but it's unclear to me what our expectations should be about any MPI_THREAD_MULTIPLE test succeeding. How do we assess (test) our chan

Re: [OMPI devel] MPI_File_get_size fails for files > 2 GB in Fortran

2010-12-20 Thread N.M. Maclaren
On Dec 20 2010, George Bosilca wrote: There is a hint for F77 users at the bottom of the page. It suggests to use INTEGER*MPI_OFFSET_KIND as type for the SIZE. I guess if we cast it correctly, and the users follow the MPI specification, this should work. Please tell me you are joking? No, th

Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)

2010-12-18 Thread N.M. Maclaren
On Dec 18 2010, Ken Lloyd wrote: Yes, this is a hard problem. It is not endemic to OpenMPI, however. This hints at the distributed memory/process/thread issues either through the various OSs or alternately external to them in many solution spaces. Absolutely. I hope that I never implied an

Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)

2010-12-18 Thread N.M. Maclaren
On Dec 17 2010, Jeff Squyres wrote: It's not an unknown problem -- as George and Ralph were trying to say, it was a design decision on our part. Sadly, flexible dynamic processing is not something that many people ask for. We have invested time in it over the years to get it working and have

Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)

2010-12-17 Thread N.M. Maclaren
On Dec 17 2010, George Bosilca wrote: Let me try to round the edges on this one. It is not that we couldn't or wouldn't like to have a more "MPI" compliant approach on this, but the definition of connected processes in the MPI standard is [kind of] shady. One thing is clear however, it is a t

Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)

2010-12-17 Thread N.M. Maclaren
On Dec 17 2010, Suraj Prabhakaran wrote: I am observing a behavior where, when the parent spawns a child and the child terminates abruptly (for example with exit() before MPI_Finalize()), the parent also terminates even after both the child and parent have explicitly called a MPI_disconn
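[Editorial note: a minimal sketch of the scenario described, assuming a parent that spawns one copy of itself, both sides call MPI_Comm_disconnect, and the child then exits without MPI_Finalize. The program name and exit path are illustrative only.]

    /* Assumed build/run: compile as "spawn_test", then mpirun -np 1 ./spawn_test */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Comm parent, child;
        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);

        if (parent == MPI_COMM_NULL) {
            /* Parent: spawn one copy of ourselves as the child. */
            MPI_Comm_spawn("./spawn_test", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
            MPI_Comm_disconnect(&child);   /* parent believes it is detached  */
            MPI_Finalize();                /* parent finalizes normally       */
        } else {
            MPI_Comm_disconnect(&parent);  /* child also disconnects ...      */
            exit(1);                       /* ... then dies without Finalize  */
        }
        return 0;
    }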

Re: [OMPI devel] Warning on fork() disappears if I use MPI threads!!

2010-11-30 Thread N.M. Maclaren
On Nov 30 2010, Ralph Castain wrote: Here is what one IB vendor says about the issue on their web site (redacted to protect the innocent): "At the time of this release, the (redacted-openib) driver has issues with buffers sharing pages when fork( ) is used. Pinned (locked in memory) pages a

Re: [OMPI devel] Warning on fork() disappears if I use MPI threads!!

2010-11-29 Thread N.M. Maclaren
On Nov 29 2010, George Bosilca wrote: If your code doesn't do exactly what is described in the code snippet attached to your previous email, then you can safely ignore the warning. In fact, any fork done prior to the communication is a non-issue, but it is difficult to identify. Therefore, we ou

Re: [OMPI devel] Restore sanity

2010-10-30 Thread N.M. Maclaren
On Oct 30 2010, George Bosilca wrote: *** The MPI_Init() function was called before MPI_INIT was invoked. *** This is disallowed by the MPI standard. *** Your MPI job will now abort. My version of the MPI standard doesn't say this? Should I update? The best diagnostic I ever got out of a comp

Re: [OMPI devel] Weird problem with strace, and question about transfers

2010-07-23 Thread N.M. Maclaren
It was, of course, my error. However, something very weird (and potentially very inefficient) is going on behind writev() and I shall look into what. Regards, Nick Maclaren.

[OMPI devel] Weird problem with strace, and question about transfers

2010-07-22 Thread N.M. Maclaren
As part of writing a course, I was trying to investigate how OpenMPI handles transfers when using bog-standard Linux and Ethernet (which I assume means TCP/IP). Having failed to track down the actual transfer call, I ran a simple test program under 'strace -f' but, in between two diagnostic cal

Re: [OMPI devel] IB warnings

2010-07-20 Thread N.M. Maclaren
On Jul 20 2010, Jeff Squyres wrote: > Also, it seems like the 3rd parameter could be problematic if it ever > goes larger than 2B -- it'll increment in the wrong direction, won't > it? Not on most systems. Ah -- I just checked -- the associativity of + and (cast) are equal, and are righ

Re: [OMPI devel] IB warnings

2010-07-20 Thread N.M. Maclaren
On Jul 20 2010, Jeff Squyres wrote: The change was to add casting: } while (!OPAL_ATOMIC_CMPSET_32((int32_t*)&ep->eager_rdma_remote.seq, (int32_t)ftr->seq, (int32_t)ftr->seq+1)); Is it safe to simply cast a (uint32_t*) to (int32_t*) in the first param? Pretty safe. While there ARE
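[Editorial note: a small sketch of why the cast in question is usually benign on a two's-complement machine, where the compare-and-swap only compares 32-bit patterns. GCC's __sync builtin stands in here for OPAL_ATOMIC_CMPSET_32; it is not the Open MPI macro.]

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t seq = 0xFFFFFFFFu;            /* unsigned sequence counter */

        /* Reinterpret the same storage as signed; the CAS compares bit
           patterns, so it still succeeds even though the signed value
           is -1, and the increment simply wraps the counter to 0.         */
        int32_t old = (int32_t)seq;
        if (__sync_bool_compare_and_swap((int32_t *)&seq, old, old + 1))
            printf("seq is now %u\n", seq);    /* prints 0 */
        return 0;
    }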

Re: [OMPI devel] PATCH: Wrong event_type value passed in to show_help when getting xrc async events

2010-07-15 Thread N.M. Maclaren
On Jul 15 2010, Jeff Squyres wrote: On Jul 15, 2010, at 2:14 AM, nadia.derbey wrote: The only warning I'm getting in the part of the code impacted by the patch is: - ../../../../../ompi/mca/btl/openib/btl_openib_async.c(322): warning #188: enumerated type mixed with another

Re: [OMPI devel] Thread safety levels

2010-05-10 Thread N.M. Maclaren
On May 10 2010, Kawashima wrote: Because MPI_THREAD_FUNNELED/SERIALIZED doesn't restrict other threads from calling functions other than those of the MPI library, the code below is not thread safe if malloc is not thread safe and MPI_Allreduce calls malloc. #pragma omp parallel for private(is_master)
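[Editorial note: a sketch of the pattern apparently being described, assuming an MPI_THREAD_FUNNELED program in which only the master thread calls MPI while the other threads allocate memory concurrently; the buffer size and names are illustrative.]

    #include <mpi.h>
    #include <omp.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int provided;
        double in = 1.0, out = 0.0;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        #pragma omp parallel
        {
            if (omp_get_thread_num() == 0) {
                /* Master thread: allowed to call MPI under FUNNELED, but the
                   implementation may call malloc() internally here ...      */
                MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM,
                              MPI_COMM_WORLD);
            } else {
                /* ... while the other threads call malloc() at the same
                   time, which is only safe if malloc itself is thread safe. */
                void *p = malloc(1024);
                free(p);
            }
        }

        MPI_Finalize();
        return 0;
    }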

Re: [OMPI devel] Thread safety levels

2010-05-10 Thread N.M. Maclaren
On May 10 2010, Sylvain Jeaugey wrote: That is definitely the correct action. Unless an application or library has been built with thread support, or can be guaranteed to be called only from a single thread, using threads is catastrophic. I personally see that as a bug, but I certainly lack so

Re: [OMPI devel] Thread safety levels

2010-05-10 Thread N.M. Maclaren
On May 10 2010, Kawashima wrote: Though Sylvain's original mail (*1) was sent 4 months ago and nobody replied to it, I'm interested in this issue and strongly agree with Sylvain. *1 http://www.open-mpi.org/community/lists/devel/2010/01/7275.php As explained by Sylvain, current Open MPI implemen

Re: [OMPI devel] System V Shared Memory for Open MPI:Request forCommunity Input and Testing

2010-05-04 Thread N.M. Maclaren
On May 4 2010, Jeff Squyres wrote: If there's a sleep(1) in the run-time test, that would be an annoying source of delay in the startup of a job. This is not a deal-breaker, but it would be nice(r) if there was a "fast" run-time check that could be checked during the sysv selection logic (i.e.

Re: [OMPI devel] System V Shared Memory for Open MPI:Request for Community Input and Testing

2010-05-04 Thread N.M. Maclaren
On May 4 2010, Terry Dontje wrote: Ralph Castain wrote: Is a configure-time test good enough? For example, are all Linuxes the same in this regard. That is if you built OMPI on RH and it configured in the new SysV SM will those bits actually run on other Linux systems correctly? I think J

Re: [OMPI devel] System V Shared Memory for Open MPI:Request for Community Input and Testing

2010-05-04 Thread N.M. Maclaren
On May 4 2010, Terry Dontje wrote: Is a configure-time test good enough? For example, are all Linuxes the same in this regard. That is if you built OMPI on RH and it configured in the new SysV SM will those bits actually run on other Linux systems correctly? I think Jeff had hinted to this

Re: [OMPI devel] System V Shared Memory for Open MPI:Request for Community Input and Testing

2010-05-03 Thread N.M. Maclaren
On May 3 2010, Jeff Squyres wrote: Write a small C program that does something like the following (this is off the top of my head): fork a child child goes to sleep immediately sysv alloc a segment attach to it ipc rm it parent wakes up child child tries to attach to segment If that succeed
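[Editorial note: a rough C sketch of the run-time probe outlined above, using standard System V shm calls; whether a segment marked with IPC_RMID can still be attached is exactly the property being tested. A pipe replaces the sleep for determinism; exit codes are illustrative.]

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        pipe(fds);

        pid_t pid = fork();
        if (pid == 0) {                        /* child: sleep until told      */
            int id;
            read(fds[0], &id, sizeof id);      /* blocks until parent writes   */
            void *p = shmat(id, NULL, 0);      /* try to attach after IPC_RMID */
            _exit(p != (void *)-1 ? 0 : 1);
        }

        /* Parent: create a segment, attach, and immediately mark it removed. */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        void *p = shmat(id, NULL, 0);
        shmctl(id, IPC_RMID, NULL);            /* "ipcrm" the live segment     */

        write(fds[1], &id, sizeof id);         /* wake the child               */
        int status;
        waitpid(pid, &status, 0);
        shmdt(p);
        return WEXITSTATUS(status);            /* 0 => late attach succeeded   */
    }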

Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-02 Thread N.M. Maclaren
On May 2 2010, Ashley Pittman wrote: On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote: As to performance there should be no difference in use between sys-V shared memory and file-backed shared memory, the instructions issued and the MMU flags for the page should both be the same so the perfo

Re: [OMPI devel] MPI Forum question?

2010-05-01 Thread N.M. Maclaren
On Apr 30 2010, Ralph Castain wrote: Guess this has been too upsetting a question - I'll work off-list with the other developers to determine an appropriate OMPI behavior. I have responded to Ralph Castain by Email, but I need to correct the implication in the above. What OpenMPI chooses to

Re: [OMPI devel] MPI Forum question?

2010-04-30 Thread N.M. Maclaren
On Apr 30 2010, Ralph Castain wrote: On Apr 30, 2010, at 6:15 AM, Jeff Squyres wrote: MPI quite rightly does not specify this, because the matter is very system-dependent, and it is not possible to return the exit code (or display it) in all environments. Sorry, but that is reality. Correct

Re: [OMPI devel] MPI Forum question?

2010-04-30 Thread N.M. Maclaren
On Apr 30 2010, Jeff Squyres wrote: The last paragraph of the specification of MPI_Finalize makes it clear that it is the USER'S responsibility to return an exit code to the system for process 0, and that what happens for other ones is undefined. Or fairly clear - it could be stated in so many

Re: [OMPI devel] MPI Forum question?

2010-04-30 Thread N.M. Maclaren
On Apr 30 2010, Larry Baker wrote: I don't know if there is any standard ordering of non-zero exit status codes. If so, another option would be to return the the largest (smallest) value, when that is the most serious exit status. There isn't, and some systems have used exit codes in othe

Re: [OMPI devel] inquiry about mpirun

2010-04-06 Thread N.M. Maclaren
On Apr 6 2010, luyang dong wrote: Regardless of any MPI implementation, there is always a command named mpirun, and correspondingly there is a source file called mpirun.c (at least in LAM/MPI), but I cannot find this file in Open MPI. Can you tell me how to produce this command in Open MPI?

Re: [OMPI devel] HOSTNAME environment variable

2010-01-22 Thread N.M. Maclaren
On Jan 22 2010, Ralph Castain wrote: For SLURM, there is a config file where you can specify what gets propagated. It is clearly an error to include hostname as it messes many things up, not just OMPI. Frankly, I've never seen someone do that on SLURM. Well, it's USUALLY an error That'

Re: [OMPI devel] HOSTNAME environment variable

2010-01-22 Thread N.M. Maclaren
On Jan 22 2010, Nadia Derbey wrote: I'm wondering whether the HOSTNAME environment variable shouldn't be handled as a "special case" when the orted daemons launch the remote jobs. This particularly applies to batch schedulers where the caller's environment is copied to the remote job: we are inh

Re: [OMPI devel] Error message improvement

2009-09-09 Thread N.M. Maclaren
On Sep 9 2009, George Bosilca wrote: On Sep 9, 2009, at 14:16, Lenny Verkhovsky wrote: Is a C99-compliant compiler something unusual, or is there a policy among OMPI developers/users that prevents me from using __func__ instead of hardcoded strings in the code? __func__ is what you shoul
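[Editorial note: for reference, a minimal example of the C99 feature under discussion; the reporting macro is made up for illustration and is not Open MPI's show_help machinery.]

    #include <stdio.h>

    /* Hypothetical reporting macro: __func__ expands to the enclosing
       function's name, so the string never goes stale when a function is
       renamed or its body is copied, unlike a hard-coded string literal.  */
    #define REPORT(msg) fprintf(stderr, "%s: %s\n", __func__, (msg))

    static void post_receive(void)
    {
        REPORT("queue exhausted");  /* prints "post_receive: queue exhausted" */
    }

    int main(void)
    {
        post_receive();
        return 0;
    }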

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread N.M. Maclaren
On Aug 17 2009, Paul H. Hargrove wrote: + I wonder if one can do any "introspection" with the dynamic linker to detect hybrid OpenMP (no "I") apps and avoid pinning them by default (examining OMP_NUM_THREADS in the environment is no good, since that variable may have a site default value othe
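[Editorial note: one way such "introspection" could be sketched, assuming a glibc-style dynamic linker: ask dlsym whether an OpenMP runtime symbol is already resolvable in the process. This only illustrates the idea raised in the thread; it is not anything Open MPI actually does.]

    /* Assumed build: cc probe_omp.c -ldl */
    #define _GNU_SOURCE            /* for RTLD_DEFAULT on glibc */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* If an OpenMP runtime (libgomp, libiomp5, ...) is already loaded,
           one of its public entry points will be visible to dlsym.         */
        int hybrid = dlsym(RTLD_DEFAULT, "omp_get_max_threads") != NULL;

        printf(hybrid ? "OpenMP runtime detected: skip binding by default\n"
                      : "no OpenMP runtime: default binding is probably safe\n");
        return 0;
    }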

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread N.M. Maclaren
On Aug 17 2009, Jeff Squyres wrote: Yes, BUT... We had a similar option to this for a long, long time. Sorry, perhaps I should have spelled out what I meant by "mandatory". The system would not build (or run, depending on where it was set) without such a value being specified. There would

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread N.M. Maclaren
On Aug 17 2009, Ralph Castain wrote: At issue for us is that other MPIs -do- bind by default, thus creating an apparent performance advantage for themselves compared to us on standard benchmarks run "out-of-the-box". We repeatedly get beat-up in papers and elsewhere over our performance, when ma

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread N.M. Maclaren
On Aug 17 2009, Ralph Castain wrote: The problem is that the two mpiruns don't know about each other, and therefore the second mpirun doesn't know that another mpirun has already used socket 0. We hope to change that at some point in the future. It won't help. The problem is less likely

Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-17 Thread N.M. Maclaren
On Aug 17 2009, Jeff Squyres wrote: On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote: I think the problem here, Eugene, is that performance benchmarks are far from the typical application. We have repeatedly seen this - optimizing for benchmarks frequently makes applications run less effi

Re: [OMPI devel] MPI_REAL16

2009-06-22 Thread N.M. Maclaren
On Jun 22 2009, Iain Bason wrote: Jeff Squyres wrote: Thanks for looking into this, David. So if I understand that correctly, it means you have to assign all literals in your fortran program with a "_16" suffix. I don't know if that's standard Fortran or not. Yes, it is. Sorry - no, it

Re: [OMPI devel] RFC: [slightly] Optimize Fortran MPI_SEND / MPI_RECV

2009-02-08 Thread N.M. Maclaren
On Feb 7 2009, Jeff Squyres wrote: On Feb 7, 2009, at 12:23 PM, Brian W. Barrett wrote: That is significantly higher than I would have expected for a single function call. When I did all the component tests a couple years ago, a function call into a shared library was about 5ns on an Intel

Re: [OMPI devel] Fortran 90 Interface

2009-01-23 Thread N.M. Maclaren
On Jan 23 2009, Jeff Squyres wrote: FWIW, ABI is not necessarily a bad thing; it has its benefits and drawbacks (and enablers and limitations). Some people want it and some people don't (most don't care, I think). We'll see where that effort goes in the Forum and elsewhere. Right. But

Re: [OMPI devel] Fortran 90 Interface

2009-01-23 Thread N.M. Maclaren
On Jan 23 2009, Jeff Squyres wrote: No. Open MPI's Fortran MPI_COMM_WORLD is pretty much hard-wired to 0. That's a mistake. But probably non-trivial to fix. Could you explain what you meant by that? There is no "fix"; Open MPI's Fortran MPI_COMM_WORLD has always been 0. More specifica

Re: [OMPI devel] Fortran 90 Interface

2009-01-23 Thread N.M. Maclaren
On Jan 23 2009, Jeff Squyres wrote: On Jan 23, 2009, at 12:30 AM, David Robertson wrote: I have looked for both MPI_COMM_WORLD and mpi_comm_world but neither can be found by totalview (the parallel debugger we use) when I compile with "USE mpi". When I use "include 'mpif.h'" both MPI_COMM_