One other potential issue: r32555 means that any other struct members are now
no longer zeroed. It might be worth adding a memset() or simply initialising
the struct with {0} in order to preserve the old behaviour.
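A minimal sketch of the two options. It uses a hypothetical struct, because the type changed by r32555 isn't named here. With either option, members the code doesn't set explicitly start out as zero rather than as whatever happened to be on the stack.

```c
#include <string.h>

/* Hypothetical stand-in for the struct touched by r32555. */
struct example_info {
    int   id;
    int   flags;   /* never set explicitly below */
    void *extra;   /* never set explicitly below */
};

/* Option 1: aggregate initialiser. Every member not named is zeroed. */
struct example_info example_info_new(int id)
{
    struct example_info info = {0};
    info.id = id;
    return info;
}

/* Option 2: memset() the whole struct, then fill in what you need. */
void example_info_reset(struct example_info *info, int id)
{
    memset(info, 0, sizeof(*info));
    info->id = id;
}
```

The {0} form also covers members added to the struct later. memset() produces all-bits-zero, which is a null pointer on every mainstream platform, although the C standard doesn't strictly guarantee it.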
Ashley.
On 21 Aug 2014, at 04:31, Gilles Gouaillardet
wrote:
> Paul,
> On May 8, 2014, at 9:09 AM, Ashley Pittman wrote:
>
>>
>> I started getting build failures against trunk on the 29th, most likely as a
>> result of this commit:
>>
>> https://github.com/open-mpi/ompi-svn-mirror/commit/3f42cbf50670c5b311cc4414dbb3f4ccf762e455
I was thinking of something even easier than that ;) I try to keep an eye on
the message queue functionality so it’s not often that I need to build code
over four years old from source.
Ashley.
On 8 May 2014, at 14:27, Jeff Squyres (jsquyres) wrote:
> On May 8, 2014, at 8:59 AM, Ash
I started getting build failures against trunk on the 29th, most likely as a
result of this commit:
https://github.com/open-mpi/ompi-svn-mirror/commit/3f42cbf50670c5b311cc4414dbb3f4ccf762e455
It looks like there was another commit almost immediately afterwards which
fixed the first problem (in
This will break my build but it’s an easy fix so don’t let that stop you.
Ashley.
On 8 May 2014, at 11:08, Jeff Squyres (jsquyres) wrote:
> WHAT: Remove the backwards-compatibility autogen.sh sym link
>
> WHY: Because it's time
>
> WHERE: svn rm autogen.sh
>
> TIMEOUT: Teleconf next Tuesday
On 19 Dec 2013, at 13:59, Jeff Squyres (jsquyres) wrote:
>
> - if we oversubscribe, (possibly) warn about the performance loss of
> oversubscription, and don't bind
> - don't warn about lack of memory binding
>
> Thoughts?
+1, I hit this myself today. I typically run on a VM and oversubscrib
fixes the issue?
>
> -Nathan
>
> On Thu, 15 Dec 2011, Ashley Pittman wrote:
>
>>
>> padb just calls gdb, you can see the error using gdb alone using just the
>> trace I sent when I started this thread.
>>
>> Perhaps the difference is in versions of gdb,
is totalview, STAT, and GDB see the correct values despite them
> being in the B section. What does padb do differently?
>
> This is a dynamic, optimized build of 1.5.5rc1.
>
> -Nathan Hjelm
> HPC-3, LANL
>
> On Thu, 15 Dec 2011, Ashley Pittman wrote:
>
>>
m
> HPC-3, LANL
>
> On Thu, 15 Dec 2011, George Bosilca wrote:
>
>>
>> On Dec 15, 2011, at 16:55 , Ashley Pittman wrote:
>>
>>> There is a problem with 1.5.5rc1 that prevents padb from loading the
>>> process table start from the orterun proc
There is a problem with 1.5.5rc1 that prevents padb from loading the process
table at start-up from the orterun process. What appears to be happening is that
MPIR_proctable and MPIR_proctable_size are present both in orterun itself and
in libopen-rte.so; the code is correctly setting them in libo
On 15 Dec 2011, at 20:16, Ashley Pittman wrote:
>
> On 14 Dec 2011, at 04:36, Jeff Squyres wrote:
>
>> In the usual place:
>>
>> http://www.open-mpi.org/software/ompi/v1.5/
>>
>> Please test! I would really like to get this out by the end of the wee
On 14 Dec 2011, at 04:36, Jeff Squyres wrote:
> In the usual place:
>
>http://www.open-mpi.org/software/ompi/v1.5/
>
> Please test! I would really like to get this out by the end of the week.
As with 1.4 I've tested it on two nodes of Ubuntu 11.10 (hpcloud) and confirm
that it works as
On 8 Dec 2011, at 22:13, Jeff Squyres wrote:
> 1.4.5rc1 is now posted in the usual place:
>
>http://www.open-mpi.org/software/ompi/v1.4/
>
> Gearing up for a pre-Christmas release -- please test! There have only been
> a few bug fixes since 1.4.4. See
> http://svn.open-mpi.org/svn/ompi/
I think the volatiles are there to ensure the compiler doesn't optimise away
reads or function calls, which has been a problem with this interface in the
past.
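For reference, here is a sketch of the starter side of the MPIR Process Acquisition Interface that shows where the volatiles matter. The MPIR_* names and MPIR_DEBUG_SPAWNED (1) come from that interface. publish_proctable() is a hypothetical helper. The exact qualifiers and the return type of MPIR_Breakpoint vary between implementations. The noinline/asm barrier is GCC/Clang-specific. Each symbol also needs to be defined in exactly one place the debugger searches; otherwise the debugger may read a copy that was never filled in.

```c
#include <stddef.h>

/* Process descriptor layout used by the MPIR Process Acquisition Interface. */
typedef struct {
    char *host_name;
    char *executable_name;
    int   pid;
} MPIR_PROCDESC;

#define MPIR_DEBUG_SPAWNED 1

/* Globals the debugger reads out of the starter. volatile stops the
 * compiler from caching or eliding stores that only an outside observer
 * (the debugger) ever reads. */
MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
volatile int MPIR_being_debugged = 0;
volatile int MPIR_debug_state = 0;

/* The debugger plants a breakpoint on this function. noinline plus an
 * empty asm with a memory clobber keep the call from being optimised
 * away and force earlier stores to be emitted before it. */
__attribute__((noinline)) void MPIR_Breakpoint(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Hypothetical helper: publish the table once every process is known. */
void publish_proctable(MPIR_PROCDESC *table, int size)
{
    MPIR_proctable = table;
    MPIR_proctable_size = size;
    MPIR_debug_state = MPIR_DEBUG_SPAWNED;
    MPIR_Breakpoint();
}
```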
On 8 Nov 2011, at 22:18, George Bosilca wrote:
> MPIR_Breakpoint, as the name indicates, it is just a breakpoint used by the
> startup
On 8 Nov 2011, at 00:59, George Bosilca wrote:
> A started process is defined as being our mpirun. In Open MPI
> MPIR_partial_attach_ok is defined, so the tool will suppose that we provide a
> means to synchronize the processes not based on MPIR_debug_gate. Therefore
> only one behavior if acc
more than happy with the current output; the only problem I've had in
the time I've been using it is some extra fields that are appended when
using checkpoint-restart.
Ashley.
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
https://svn.open-mpi.org/trac/ompi/ticket/2603
I've set it to critical as this is clearly a regression.
On 15 Oct 2010, at 00:28, Ralph Castain wrote:
> Yeah, I heard about it a couple of weeks ago. On the "to-do" list for the
> future.
Ok, is there a bug or do you want me to file one?
rc$ ./padb -Ormgr=mpirun -a --proc-summary
-Odebug
Ashley.
n this repo consisted in refreshing ROMIO to a newer
> version: the one from the very last MPICH2 release (mpich2-1.3b1).
Is there any word on when this will be pulled into the mainline?
Ashley,
e on the bitmask returned by the btls.
Ashley,
On 4 May 2010, at 15:41, Jeff Squyres wrote:
> On May 4, 2010, at 9:32 AM, Ashley Pittman wrote:
>
>>> One thing to be careful with a run-time check is that you might not want
>>> *all* processes on a box to try to alloc a sysv segment, fork a child, try
>>>
s to the rest of the local
> procs -- maybe in the modex?
I think as a user I'd be quite surprised if my MPI job were spawning
sub-processes during MPI_Init().
Ashley.
of this, sysv support may be limited to Linux systems - that is,
> until we can get a better sense of which systems provide the shmctl
> IPC_RMID behavior that I am relying on.
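Here is a minimal sketch of the attach-then-remove pattern under discussion, assuming Linux semantics. The function name is hypothetical. Marking the segment IPC_RMID straight after shmat() means the kernel destroys it once the last process detaches or exits, so a crashed job can't leak it. Linux also lets other processes shmat() the id after it has been marked for removal. shmctl(2) documents that behaviour as Linux-only, which is why portability is the concern here.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a private SysV segment, attach it, then mark
 * it IPC_RMID at once. It stays usable while attached and is destroyed
 * when the last process detaches or exits. Returns the address, or NULL. */
void *create_self_cleaning_segment(size_t size, int *shmid_out)
{
    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1)
        return NULL;

    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);   /* nobody attached: remove now */
        return NULL;
    }

    /* After this, other processes can still shmat() the id on Linux;
     * other systems may refuse. */
    if (shmctl(shmid, IPC_RMID, NULL) == -1) {
        shmdt(addr);                     /* segment may be left behind */
        return NULL;
    }

    if (shmid_out != NULL)
        *shmid_out = shmid;
    return addr;
}
```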
Ashley,
it
> possible that with the newer kernels, operations to that shared file are
> going all the way out to disk? Maybe you don't know the answer, but
> hopefully someone on this mail list can provide some insight.
Is the /tmp filesystem on NFS by any chance?
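On Linux, you can check this programmatically with statfs(2) and the filesystem magic number. path_is_nfs() is a hypothetical helper, and 0x6969 is NFS_SUPER_MAGIC from linux/magic.h.

```c
#include <sys/vfs.h>   /* statfs(2); Linux-specific */

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969   /* as in linux/magic.h */
#endif

/* Hypothetical helper: 1 if path is on NFS, 0 if not, -1 if statfs() fails. */
int path_is_nfs(const char *path)
{
    struct statfs sfs;
    if (statfs(path, &sfs) != 0)
        return -1;
    return sfs.f_type == NFS_SUPER_MAGIC ? 1 : 0;
}
```

For example, calling path_is_nfs("/tmp") on one of the affected nodes would answer the question directly.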
Ashley,
lized.
>>
>> On Feb 23, 2010, at 12:58 PM, Greg Watson wrote:
>>
>>> Ralph,
>>>
>>> I notice that you've got support in the XML output code to display the pids
>>> of the processes, but I can't see how to enable them. Can you give me any
>>> pointers?
On 17 Jan 2010, at 16:50, Barrett, Brian W wrote:
> On Jan 17, 2010, at 11:31 AM, Ashley Pittman wrote:
>
>> On 10 Jan 2010, at 03:45, Barrett, Brian W wrote:
>>
>>> We should absolutely not change this. For simple applications, yes, things
>>> work if la
On 11 Jan 2010, at 16:52, Jeff Squyres wrote:
> Arrgh -- if only the Linux kernel community had accepted ummunotify, this
> would now be a moot point (i.e., the argument would be solely with the
> OS/glibc, not the MPI!).
Disabling MMAP-based malloc is purely about performance; ummunotify is a
On 10 Jan 2010, at 03:45, Barrett, Brian W wrote:
> We should absolutely not change this. For simple applications, yes, things
> work if large blocks are allocated on the heap. However, ptmalloc (and most
> allocators, really), can't rationally cope with repeated allocations and
> deallocati
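For context, here is a sketch of what "disabling MMAP-based malloc" amounts to with glibc's ptmalloc. It assumes glibc, since mallopt() and these constants are glibc-specific. The wrapper name is hypothetical, and this isn't necessarily how Open MPI itself does it. Setting M_MMAP_MAX to 0 makes large allocations come from the heap instead of their own mmap() regions. Setting M_TRIM_THRESHOLD to -1 stops free() from handing the top of the heap back to the OS. That is the regime Brian is worried about for repeated large allocations and deallocations.

```c
#include <malloc.h>   /* glibc: mallopt(), M_MMAP_MAX, M_TRIM_THRESHOLD */

/* Hypothetical wrapper. M_MMAP_MAX = 0: large requests are served from the
 * heap rather than separate mmap() regions. M_TRIM_THRESHOLD = -1: free()
 * never returns the top of the heap to the OS. Returns 1 if glibc
 * accepted both settings, 0 otherwise. */
int disable_mmap_based_malloc(void)
{
    int ok_mmap = mallopt(M_MMAP_MAX, 0);
    int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);
    return ok_mmap == 1 && ok_trim == 1;
}
```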
em to be a specific name orte gives to
multiple "jobs" within the same job; if there is one and I've missed it, I'm
happy to change the name of the option to something more appropriate.
Ashley,
ome/debian/ashley/code/tmp/v1.4/test'
make[1]: *** [check-recursive] Error 1
make: *** [check-recursive] Error 1
ashley@alpha:~/code/tmp/v1.4$
On Tue, 2009-12-08 at 07:39 -0500, Terry Dontje wrote:
> Ashley Pittman wrote:
> > I've seen several cases now where people have functional, installed MPI
> > libraries yet when they've come to use padb they have discovered a build
> > problem with the Message Q
" or "make install". As such it's a step forward, but it would be
better if the test were performed in the make stage; I haven't figured
out how to do this, however.
Ashley,
opyright
rests with them.
http://sourceforge.net/projects/hilite/
The astute will note that it has a --html option which formats build
logs nicely into web pages with fixed width fonts and appropriate
colouring.
Ashley,
this
regard, it'll forward both stdout and stderr to the terminal as normal
but will colour stderr in red. Simply save the script somewhere on your
PATH and alias make to "hilite make"; better still, add this line
to your .bashrc.
alias make="hilite make"
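The hilite source isn't reproduced here, but the idea is simple enough to sketch in C. The sketch runs the command with its stderr redirected into a pipe and copies whatever arrives to our own stderr, wrapped in ANSI red. stdout is inherited untouched. run_with_red_stderr() is hypothetical, not hilite's actual code.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RED   "\033[31m"
#define RESET "\033[0m"

/* Hypothetical hilite-style wrapper: run argv, pass its stdout through and
 * copy its stderr to our stderr wrapped in ANSI red. Returns the command's
 * exit status, 127 if it could not be executed, or -1 on error or if the
 * command was killed by a signal. */
int run_with_red_stderr(char *const argv[])
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    fflush(stdout);   /* don't let the child inherit unflushed output */
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: stderr goes into the pipe, stdout is inherited as-is. */
        close(fds[0]);
        dup2(fds[1], STDERR_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: colour each chunk of the child's stderr as it arrives. */
    close(fds[1]);
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        fputs(RED, stderr);
        fwrite(buf, 1, (size_t)n, stderr);
        fputs(RESET, stderr);
        fflush(stderr);
    }
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because stderr passes through a pipe, its interleaving with stdout can differ slightly from an uncoloured run.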
Ashley,
complaining about not being able to find symbol
mca_topo_base_comm_1_0_0_t. I saw this myself on the head a few months
ago but can't reproduce it locally now.
Ashley,
On Thu, 2009-10-22 at 11:06 -0400, Jeff Squyres wrote:
> That rocks!!
example can be found at the link below; I hope people find it useful.
http://www.open-mpi.org/mtt/index.php?do_redir=1161
I am actively working on improvements to this, so please let me know if
there is anything you would like to see added.
Ashley.
On Thu, 2009-10-22 at 11:05 +0200, Brice Goglin wrote:
> Ashley Pittman wrote:
> > Does this imply the default is to report on processes in the current
> > cpuset rather than the entire system? Does anyone else feel that
> > violates the principle of least surprise?
> Y
t rather than the entire system? Does anyone else feel that
violates the principle of least surprise?
Ashley,
bug
in reduce as it is in ompi_comm_nextcid() from the trace.
I assume all four processes are actually in the same call to comm_dup;
re-compiling your program with -g and re-running padb would confirm this,
as it would show the line numbers.
Ashley,
node and send along the output; I'm
sure it would be of help.
Ashley,
ets that files (e.g. --xml-socket=localhost:1234 rather than
--xml-file=/tmp/app_.xml) but I'll leave that up to you.
I hope this gives you something to think over.
Ashley,
orse you may end up
with end-user applications that only run in debug mode and not with a
normal build.
I'm all for having as much error checking enabled in debug builds as
possible, but changing the behaviour risks masking problems elsewhere,
IMHO.
Ashley,
sync
>
> And we can leave the default as "compiled out".
>
> Howzat?
I don't understand: what's the purpose of the middle state? It seems like
a bad idea to me.
Ashley,
g). I'm asking all to give the
> prototype code a whirl to shake out any remaining design bugs.
Good to hear this long-standing issue is getting the attention it
deserves; this will be a huge step forward when it's up and running.
Ashley,
with
OMPI XML, but I do work with Valgrind XML, and the tools need to be
updated for every change to the specification.
Having good documentation of at least which bits have changed, and
when, is essential for making a version-independent tool.
Ashley,
r may not be perfect, but
at least it's never bad.
One (small) point nobody has mentioned yet is that when using
round-robin core binding, some applications prefer you to round-robin
by-socket and some prefer you to round-robin by-core. This will depend
on their level of comms and any cache-
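To make the two orders concrete, here is a sketch that assumes a hypothetical node with n_sockets sockets of cores_per_socket cores each, with cores numbered socket-major and ranks wrapping when there are more ranks than cores. On a 2x4 node, by-core puts ranks 0 and 1 on cores 0 and 1 of the same socket. By-socket puts them on core 0 of socket 0 and core 0 of socket 1 (global cores 0 and 4).

```c
/* Hypothetical node layout: n_sockets sockets with cores_per_socket cores
 * each, numbered socket-major (socket 0 owns cores 0..cores_per_socket-1).
 * Both functions return the global core a rank is bound to; ranks wrap
 * around when there are more ranks than cores. */

/* by-core: fill socket 0, then socket 1, ... so neighbouring ranks share
 * a socket and its caches. */
int core_for_rank_bycore(int rank, int n_sockets, int cores_per_socket)
{
    return rank % (n_sockets * cores_per_socket);
}

/* by-socket: alternate between sockets, so neighbouring ranks land on
 * different sockets and spread over memory bandwidth and caches. */
int core_for_rank_bysocket(int rank, int n_sockets, int cores_per_socket)
{
    int r      = rank % (n_sockets * cores_per_socket);
    int socket = r % n_sockets;
    int slot   = r / n_sockets;   /* core index within that socket */
    return socket * cores_per_socket + slot;
}
```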
On Tue, 2009-07-28 at 19:06 -0600, Ralph Castain wrote:
> In other words, a divide-by-zero floating point exception on a
> collective test.
To throw another question into the mix: why are the collective tests
using floating-point calculations in the first place?
Ashley,
On Tue, 2009-07-14 at 18:54 -0700, Eugene Loh wrote:
> P.S. Until the page goes live, I'll also leave it at
> http://www.osl.iu.edu/~eloh/faq/?category=perftools . Or, check out a
> workspace.
I'm happy with it.
Ashley.
n see it'll be necessary to have it when it comes to tighter
integration with other software.
Ashley,
ocket; all it's for is to cause your peer to return from
poll/select so it can query the shared-memory state. Signals would also
likely work; however, they tend to present other problems in my
experience.
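Here is a sketch of that wake-up channel using a Unix-domain socketpair. The function names are hypothetical. The byte written carries no information: its only job is to make the peer's poll() return, after which the peer re-reads the shared-memory state.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* fds[0] is polled by the sleeping peer; fds[1] is written to wake it. */
int make_wakeup_pair(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Wake the peer. The byte's value is irrelevant. */
int wake_peer(int wake_fd)
{
    char b = 1;
    return write(wake_fd, &b, 1) == 1 ? 0 : -1;
}

/* Sleep until woken or until timeout_ms elapses, then drain pending
 * wake-ups. Returns 1 if woken, 0 on timeout, -1 on error. The caller
 * then re-checks shared memory; the socket itself carries no data. */
int wait_for_wakeup(int poll_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = poll_fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;   /* 0 = timeout, -1 = error */

    char buf[64];
    /* Several wake-ups may have coalesced; one read drains them here. */
    if (read(poll_fd, buf, sizeof(buf)) <= 0)
        return -1;
    return 1;
}
```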
Ashley,
On Tue, 2009-06-16 at 13:39 -0600, Bryan Lally wrote:
> Ashley Pittman wrote:
>
> > Whilst the fact that it appears to only happen on your machine implies
> > it's not a general problem with OpenMPI the fact that it happens in the
> > same location/rep count every t
_Allgather because as I said before these three collectives have
radically different communication patterns.
Ashley,
ppy to discuss orte specific issues on this list.
The website is at http://padb.pittman.org.uk and I welcome any feedback,
either here, off-list or on either of the padb mailing lists.
Yours,
Ashley Pittman,
er machines if you use the same process geometry. That would tell us
if we are looking for a pure OpenMPI problem or a wider issue,
potentially eliminating any questions about numa memory layout.
> Will be back after I look at the tool.
Ashley,
ective functionality you'll need to patch Open MPI with the patch from
http://padb.pittman.org.uk/extensions.html
Ashley,
l happening or not.
This is a valuable thing to know; however, I don't view the proposed
solution as the correct one. If this were the problem you were aiming to
solve, I'd recommend a different approach, more like the LLNL solution
that Ralph described.
Yours,
Ashley Pittman.
> > (defined in the .c file), the compiler include a reference to this in
> > the library.
> >
> > george.
> >
> >
> >
> > On Fri, 15 May 2009, Jeff Squyres wrote:
> >
> >> Could well be our visibility settings, too... Are those symbols
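For reference, here is the general technique being described. Defining a global of (a pointer to) a type in a .c file that is built into the library with -g typically makes the compiler emit that type's description into the library's debug info. A debugger-side lookup of the type by name can then succeed. The sketch uses a hypothetical stand-in type and is not Open MPI's actual definition of mca_topo_base_comm_1_0_0_t. Running gdb's `ptype` on the type against the built library shows whether it made it in.

```c
#include <stddef.h>

/* Hypothetical stand-in for a type a debugger plugin looks up by name. */
typedef struct example_topo_comm_t {
    int  ndims;
    int *dims;
} example_topo_comm_t;

/* A non-static global whose type refers to the struct. When this file is
 * compiled with -g, the struct's description is typically emitted into the
 * debug info of the library it is linked into, even though nothing ever
 * dereferences the pointer. */
example_topo_comm_t *example_topo_comm_t_type_force_inclusion = NULL;
```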
7:06 +0100, Ashley Pittman wrote:
> It's certainly helped and now runs for me however if I run mpirun under
> valgrind and then ompi-ps in another window Valgrind reports errors and
> ompi-ps doesn't list the job so there is clearly something still amiss.
> I'm t
r init to
> cover all cases, but it should be working now with r21249
>
>
>
> On May 18, 2009, at 8:08 AM, Ashley Pittman wrote:
>
> >
> > Ralph,
> >
> > This patch fixed it, num_nodes was being used initialised and hence
> > the
>
Ralph,
This patch fixed it: num_nodes was being used uninitialised, and hence the
client was getting a bogus value for the number of nodes.
Ashley,
On Mon, 2009-05-18 at 10:09 +0100, Ashley Pittman wrote:
> No joy I'm afraid, now I get errors when I run it. This is a single
> node j
/ashley/code/OpenMPI/ompi-trunk-tes/trunk/orte/tools/orte-ps/orte-ps.c
at line 818
Ashley.
On Sat, 2009-05-16 at 08:15 -0600, Ralph Castain wrote:
> This is fixed now, Ashley - sorry for the problem.
>
>
> On May 15, 2009, at 4:47 AM, Ashley Pittman wrote:
>
> > On Thu,
On Fri, 2009-05-15 at 07:43 -0600, Ralph Castain wrote:
> We are running it with 1.3.2, last I heard - haven't tried the current
> 1.3 branch. Ashley reported a problem with some other symbol that
> couldn't be loaded that blocked him on message queue debugging, but
> that was on the trunk.
My pro
All,
The message queue code in the head seems to be broken as well: the
image_has_queues function is failing because it can't find type
mca_topo_base_comm_1_0_0_t in the program.
Attached is a simple patch which improves the error message reported to
the user; it does not, however, allow message q
On Thu, 2009-05-14 at 22:49 -0600, Ralph Castain wrote:
> It is definitely broken at the moment, Ashley. I have it pretty well
> fixed, but need/want to cleanup some corner cases that have plagued us
> for a long time.
>
> Should have it for you sometime Friday.
Ok, thanks. I might try switc
All,
I'm developing some code to run against Open MPI trunk, and it appears
that the output of ompi-ps has changed since I last looked at it.
Running a two-process job on one node, I see barely any info when it's run
directly and little more when it's run with -n. Is it still possible to
discover the host
On Thu, 2009-05-14 at 19:46 +0200, Ralf Wildenhues wrote:
> Hello,
>
> Ashley, did you rebootstrap with Debian's Libtool?
I'm not sure I understand the question; I did a fresh checkout and
re-ran ./autogen.sh, if that's what you mean.
> They enable link_all_deplibs=no in their Libtool
That appea
e current
build.
Ashley Pittman,
On Thu, 2009-05-14 at 11:50 -0400, Jeff Squyres wrote:
> Hmm; odd. I'm not getting these errors. Just to be sure, I did a
> VPATH build and still am not getting these errors... :-\
>
> Are those symbols publicly available in libopen-pal.so
mfs/openmpi/ompi/tools/ompi_info'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/memfs/openmpi/ompi'
make: *** [all-recursive] Error 1
ashley@alpha:/mnt/memfs/openmpi$
I can provide more information if requested although as I say I don't
think I'm doing anything out of the ordinary.
Ashley Pittman,
f the opinion that mmaping shared files was a much
more advanced solution.
Ashley Pittman.