Hi Open MPI developers,
I found a small bug in Open MPI.
See attached program cancelled.c.
In this program, rank 1 tries to cancel an MPI_Irecv and calls MPI_Recv
instead if the cancellation succeeds. This program should terminate whether
the cancellation succeeds or not, but it leads to a deadlock
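For reference, a minimal sketch of the pattern described above (a sketch only,
not the attached cancelled.c; message size and tag are placeholders):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf = 0, cancelled = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Send a message so rank 1 always has something to receive. */
        MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Post a nonblocking receive, then try to cancel it. */
        MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Cancel(&req);
        MPI_Wait(&req, &status);
        MPI_Test_cancelled(&status, &cancelled);
        if (cancelled) {
            /* Cancellation succeeded: fall back to a blocking receive. */
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }
        printf("cancelled = %d\n", cancelled);
    }

    MPI_Finalize();
    return 0;
}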
CK(&ompi_request_lock);
> @@ -138,7 +129,7 @@
> MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
> OPAL_THREAD_UNLOCK(&ompi_request_lock);
> /*
> - * Receive request cancelled, make user buffer accessable.
> + * Receive request cancelled, make user buffe
Hi Open MPI developers,
I found some bugs in Open MPI and have attached a patch to fix them.
The bugs are:
(1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
Section 3.7.3 Communication Completion in MPI-3.0 (and also MPI-2.2)
says an MPI_Status object returned by MPI_{Wait|Test}{|any
Hi Eugene,
> > I found some bugs in Open MPI and attach a patch to fix them.
> >
> > The bugs are:
> >
> > (1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
> >
> > (2) MPI_Status for an inactive request must be an empty status.
> >
> > (3) Possible BUS errors on sparc64 proc
Hi Open MPI developers,
> > > The bugs are:
> > >
> > > (1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
> > >
> > > (2) MPI_Status for an inactive request must be an empty status.
> > >
> > > (3) Possible BUS errors on sparc64 processors.
> > >
> > > r23554 fixed possible
Hi Open MPI developers,
What do you think of my updated patch?
If there is another concern, I'll try to update it.
> > > > The bugs are:
> > > >
> > > > (1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
> > > >
> > > > (2) MPI_Status for an inactive request must be an empty status.
> >
tate ==
> OMPI_REQUEST_INACTIVE". I see similar checks in all the other test/wait
> files. Basically, it doesn't matter that we leave the last returned error
> code on an inactive request, as we always return MPI_STATUS_EMPTY in the
> status for such requests.
>
> Thanks,
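As a minimal illustration of point (1) and the empty-status rule (a sketch,
not the attached test program): a wait on MPI_REQUEST_NULL must return the
empty status, i.e. MPI_ANY_SOURCE, MPI_ANY_TAG, and no error.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Request req = MPI_REQUEST_NULL;
    MPI_Status status;

    MPI_Init(&argc, &argv);

    /* Waiting on a null request must return immediately with the
       empty status. */
    MPI_Wait(&req, &status);

    printf("MPI_SOURCE == MPI_ANY_SOURCE: %d\n",
           status.MPI_SOURCE == MPI_ANY_SOURCE);
    printf("MPI_TAG    == MPI_ANY_TAG:    %d\n",
           status.MPI_TAG == MPI_ANY_TAG);
    printf("MPI_ERROR  == MPI_SUCCESS:    %d\n",
           status.MPI_ERROR == MPI_SUCCESS);

    MPI_Finalize();
    return 0;
}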
Hi Open MPI developers,
I found another issue in Open MPI.
In the MCA_PML_OB1_RECV_FRAG_INIT macro in the ompi/mca/pml/ob1/pml_ob1_recvfrag.h
file, we copy the PML header of an arrived message to another buffer,
as follows:
frag->hdr = *(mca_pml_ob1_hdr_t*)hdr;
On this copy, we cast hdr to mca_pml_
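For illustration only, a hedged sketch with hypothetical header types
(hdr_match_t, hdr_rndv_t, any_hdr_t, and copy_header are placeholders, not
the real ob1 code): a size-limited memcpy copies only the bytes that actually
arrived, whereas an assignment through a cast to the full union always copies
sizeof(the union) and assumes the source is suitably aligned.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical header types, standing in for the real ob1 ones. */
typedef struct { uint8_t type; uint16_t seq; } hdr_match_t;  /* small header  */
typedef struct { uint8_t type; uint64_t len; } hdr_rndv_t;   /* larger header */
typedef union { uint8_t type; hdr_match_t match; hdr_rndv_t rndv; } any_hdr_t;

static void copy_header(any_hdr_t *dst, const void *src, size_t arrived_bytes)
{
    /* Copy only the bytes that were actually received; the assignment
     * '*dst = *(const any_hdr_t *) src' would always copy
     * sizeof(any_hdr_t) and would also assume 'src' is aligned. */
    memcpy(dst, src, arrived_bytes);
}

int main(void)
{
    unsigned char wire[4] = { 1, 2, 3, 0 };  /* pretend only 4 bytes arrived */
    any_hdr_t hdr;

    copy_header(&hdr, wire, sizeof(wire));
    printf("type = %u\n", (unsigned) hdr.type);
    return 0;
}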
llowing this idea.
>
> Thanks,
> george.
>
>
> On Oct 18, 2012, at 03:06 , "Kawashima, Takahiro"
> wrote:
>
> > Hi Open MPI developers,
> >
> > I found another issue in Open MPI.
> >
> > In MCA_PML_OB1_RECV_FRAG_INIT macro in o
Hi,
Sorry for not replying sooner.
I'm talking with the authors (they are not on this list) and
will request linking the PDF soon if they allow.
Takahiro Kawashima,
MPI development team,
Fujitsu
> Our policy so far was that adding a paper to the list of publication on the
> Open MPI website
Hi Open MPI core members and Rayson,
I've confirmed with the authors and created the BibTeX reference.
Could you add an entry to the "Open MPI Publications" page that
links to Fujitsu's PDF file? The attached file contains the
title, authors, abstract, link URL, and BibTeX reference.
Best
. If you like it, please take in
this patch.
Though I'm an employee of a company, this is my independent and private
work done at home. It contains no intellectual property from my company.
If needed, I'll sign the Individual Contributor License Agreement.
Regards,
KAWASHIMA Takahiro
delete-attr-order.patch.gz
Description: Binary data
George,
Your idea makes sense.
Is anyone working on it? If not, I'll try.
Regards,
KAWASHIMA Takahiro
> Takahiro,
>
> Thanks for the patch. I deplore the loss of the hash table in the attribute
> management, as the potential of transforming all attributes operation to a
Jeff,
OK. I'll try implementing George's idea and then you can compare which
one is simpler.
Regards,
KAWASHIMA Takahiro
> Not that I'm aware of; that would be great.
>
> Unlike George, however, I'm not concerned about converting to linear
> operations for att
Hi,
Fujitsu is interested in completing MPI-2.2 support in Open MPI and the
Open MPI-based Fujitsu MPI.
We've read the wiki and tickets. These two tickets seem to be almost done
but need testing and bug fixing.
https://svn.open-mpi.org/trac/ompi/ticket/2223
MPI-2.2: MPI_Dist_graph_* functions missing
ht
I've confirmed. Thanks.
Takahiro Kawashima,
MPI development team,
Fujitsu
> Done -- thank you!
>
> On Jan 11, 2013, at 3:52 AM, "Kawashima, Takahiro"
> wrote:
>
> > Hi Open MPI core members and Rayson,
> >
> > I've confirmed to the au
>
> >>> On Jan 18, 2013, at 5:47 AM, George Bosilca
> >>> wrote:
> >>>
> >>>> Takahiro,
> >>>>
> >>>> The MPI_Dist_graph effort is happening in
> >>>> ssh://h...@bitbucket.org/bosilca/ompi-topo. I would definitely be
George,
I reported the bug three months ago.
Your commit r27880 resolved one of the bugs I reported,
using a different approach.
http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
But the other bugs are still open.
"(1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
cket #3123,
and the other 7 latest changesets are bug/typo fixes.
Regards,
KAWASHIMA Takahiro
> Jeff,
>
> OK. I'll try implementing George's idea and then you can compare which
> one is simpler.
>
> Regards,
> KAWASHIMA Takahiro
>
> > Not that I'
n the wilderness of other bugs.
>
> Sent from my phone. No type good.
>
> On Jan 22, 2013, at 8:57 PM, "Kawashima, Takahiro"
> wrote:
>
> > George,
> >
> > I reported the bug three months ago.
> > Your commit r27880 resolved one of the bugs re
Hi,
Agreed.
But how about backward traversal in addition to forward traversal?
e.g. OPAL_LIST_FOREACH_FW, OPAL_LIST_FOREACH_FW_SAFE,
OPAL_LIST_FOREACH_BW, OPAL_LIST_FOREACH_BW_SAFE
We sometimes search for an item from the end of a list.
Thanks,
KAWASHIMA Takahiro
> What: Add two new macros
I don't care about the macro names. Either one is OK with me.
Thanks,
KAWASHIMA Takahiro
> Hmm, maybe something like:
>
> OPAL_LIST_FOREACH, OPAL_LISTFOREACH_REV, OPAL_LIST_FOREACH_SAFE,
> OPAL_LIST_FOREACH_REV_SAFE?
>
> -Nathan
>
> On Thu, Jan 31, 2013 at 12:36:2
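For illustration, a hypothetical sketch of forward and reverse traversal
macros over a doubly linked list with a sentinel (generic names; the real
opal_list internals differ):

#include <stdio.h>

/* Hypothetical doubly linked list with a sentinel node; this only
 * illustrates the FOREACH / FOREACH_REV idea discussed above. */
typedef struct node { struct node *next, *prev; int value; } node_t;
typedef struct { node_t sentinel; } list_t;

#define LIST_FOREACH(item, list) \
    for ((item) = (list)->sentinel.next; (item) != &(list)->sentinel; (item) = (item)->next)

#define LIST_FOREACH_REV(item, list) \
    for ((item) = (list)->sentinel.prev; (item) != &(list)->sentinel; (item) = (item)->prev)

int main(void)
{
    list_t list;
    node_t n[3];
    node_t *item;
    int i;

    /* Build a small circular list: sentinel <-> n[0] <-> n[1] <-> n[2]. */
    list.sentinel.next = &n[0];
    list.sentinel.prev = &n[2];
    for (i = 0; i < 3; i++) {
        n[i].value = i;
        n[i].next = (i < 2) ? &n[i + 1] : &list.sentinel;
        n[i].prev = (i > 0) ? &n[i - 1] : &list.sentinel;
    }

    LIST_FOREACH(item, &list)      printf("fwd %d\n", item->value);
    LIST_FOREACH_REV(item, &list)  printf("rev %d\n", item->value);
    return 0;
}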
eature and another for bug fixes, as described in
my previous mail.
Regards,
KAWASHIMA Takahiro
> Jeff, George,
>
> I've implemented George's idea for ticket #3123 "MPI-2.2: Ordering of
> attribution deletion callbacks on MPI_COMM_SELF". See attached
> delet
ed a new patch. Basically I went over all
> the possible cases, both in test and wait, and ensure the behavior is always
> consistent. Please give it a try, and let us know of the outcome.
>
> Thanks,
> George.
>
>
>
> On Jan 25, 2013, at 00:53 , "Kawashima, Takahir
redefined_elem_desc
array? But is having the same 'type' value in OPAL datatypes and OMPI
datatypes allowed?
Regards,
KAWASHIMA Takahiro
as the
> Fortran one. This turned the management of the common desc_t into a nightmare
> … with the effect you noticed few days ago. Too bad for the optimization
> part. I now duplicate the desc_t between the two layers, and all OMPI
> datatypes have now their own desc_t.
553). Thanks for your help and your patience in
> investigating this issues.
>
> George.
>
>
> On May 22, 2013, at 02:05 , "Kawashima, Takahiro"
> wrote:
>
> > George,
> >
> > Thanks for your quick response.
> > Your fix seem
George,
My colleague was working on your ompi-topo bitbucket repository
but it was not completed. But he found bugs in your patch attached
in your previous mail and created the fixing patch. See the attached
patch, which is a patch against Open MPI trunk + your patch.
His test programs are also a
types
and the calculation of total_pack_size is also involved. It seems not
so simple.
Regards,
KAWASHIMA Takahiro
#include
#include
#include
#define PRINT_ARGS
#ifdef PRINT_ARGS
/* defined in ompi/datatype/ompi_datatype_args.c */
extern int32_t ompi_datatype_print_args(const struct ompi_dat
>total_pack_size = 0;
break;
case MPI_COMBINER_CONTIGUOUS:
This patch in addition to your patch works correctly for my program.
But I'm not sure this is a correct solution.
Regards,
KAWASHIMA Takahiro
> Takahiro,
>
> Nice catch. That particular code was an over-opt
No. My patch doesn't work for a simpler case,
just a duplicate of MPI_INT.
The datatype code is too complex for me ...
Regards,
KAWASHIMA Takahiro
> George,
>
> Thanks. But no, your patch does not work correctly.
>
> The assertion failure disappeared by your patch but the v
George,
An improved patch is attached. The latter half is the same as your patch.
But again, I'm not sure this is a correct solution.
It works correctly for my attached put_dup_type_3.c.
Run as "mpiexec -n 1 ./put_dup_type_3".
It will print seven OKs if it succeeds.
Regards,
KAWASHIMA Tak
George,
Thanks. I've confirmed your patch.
I wrote a simple program to test your patch and no problems were found.
The test program is attached to this mail.
Regards,
KAWASHIMA Takahiro
> Takahiro,
>
> Please find below another patch, this time hopefully fixing all issues. The
Hi,
My colleague and I found 3 OSC-related bugs in the OMPI datatype code.
One affects trunk and the v1.6/v1.7 branches, and two affect only the v1.6 branch.
(1) OMPI_DATATYPE_ALIGN_PTR should be placed after memcpy
Last year I reported a bug in OMPI datatype code and it was
fixed in r25721. But the fix was no
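As a generic illustration of point (1), with hypothetical names standing in
for the actual OMPI macro: after advancing past variable-length data that was
memcpy'd, the pointer must be re-aligned before the next fixed-size access,
which matters on strict-alignment CPUs such as SPARC.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper standing in for the alignment macro discussed
 * above: round a pointer up to the next 'align' boundary (align must
 * be a power of two). */
static unsigned char *align_ptr(unsigned char *ptr, size_t align)
{
    uintptr_t p = (uintptr_t) ptr;
    return (unsigned char *) ((p + align - 1) & ~(uintptr_t) (align - 1));
}

int main(void)
{
    int64_t storage[8];                         /* 8-byte aligned backing store */
    unsigned char *ptr = (unsigned char *) storage;
    const char name[] = "dup";                  /* variable-length payload */
    int64_t value = 42;                         /* fixed-size 64-bit field */

    /* Copy the variable-length data first, then align before the
     * 64-bit store.  Aligning before the memcpy instead would leave
     * the store at an odd offset, which is a bus error on
     * strict-alignment CPUs such as SPARC. */
    memcpy(ptr, name, sizeof(name));
    ptr += sizeof(name);
    ptr = align_ptr(ptr, sizeof(int64_t));
    *(int64_t *) ptr = value;                   /* safe: ptr was just aligned */

    printf("64-bit field stored at offset %ld\n",
           (long) (ptr - (unsigned char *) storage));
    return 0;
}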
Hi,
My colleague tested MPI_IN_PLACE for MPI_ALLTOALL, MPI_ALLTOALLV,
and MPI_ALLTOALLW, which were implemented two months ago in Open MPI
trunk. He found three bugs and created a patch.
The bugs are:
(A) Missing MPI_IN_PLACE support in self COLL component
The attached alltoall-self-in
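A minimal sketch of the MPI_IN_PLACE usage being exercised (not the attached
alltoall-self test; buffer contents are placeholders):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = malloc(size * sizeof(int));
    for (i = 0; i < size; i++) {
        buf[i] = rank * 100 + i;
    }

    /* With MPI_IN_PLACE the data is taken from and written back to the
       receive buffer; sendcount and sendtype are ignored.  Running with
       a single process exercises the "self" collective component. */
    MPI_Alltoall(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                 buf, 1, MPI_INT, MPI_COMM_WORLD);

    for (i = 0; i < size; i++) {
        printf("rank %d: buf[%d] = %d\n", rank, i, buf[i]);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}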
sted for so
> long before being discovered (especially the extent issue in the
> MPI_ALLTOALL). Please feel free to apply your patch and add the correct
> copyright at the beginning of all altered files.
>
> Thanks,
> George.
>
>
>
> On Sep 17, 2013, at 0
Thanks!
Takahiro Kawashima,
MPI development team,
Fujitsu
> Pushed in r29187.
>
> George.
>
>
> On Sep 17, 2013, at 12:03 , "Kawashima, Takahiro"
> wrote:
>
> > George,
> >
> > Copyright-added patch is attached.
> > I don't
It is a bug in the test program, test/datatype/ddt_raw.c, and it was
fixed at r24328 in trunk.
https://svn.open-mpi.org/trac/ompi/changeset/24328
I've confirmed that the failure occurs with plain v1.6.5 and that it doesn't
occur with a patched v1.6.5.
Thanks,
KAWASHIMA Takahiro
> Not su
Hi,
Open MPI's signal handler (the show_stackframe function defined in
opal/util/stacktrace.c) calls non-async-signal-safe functions,
and this causes a problem.
See attached mpisigabrt.c. Passing corrupted memory to realloc(3)
will cause SIGABRT and the show_stackframe function will be invoked.
But invoked
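For illustration, a minimal sketch of an async-signal-safe handler that
restricts itself to write(2); the real show_stackframe does much more (it
prints a backtrace), this only shows the constraint being discussed.

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Only async-signal-safe functions (here: write and _exit) are used in
 * the handler; printf/malloc/free are not safe and can deadlock when
 * the signal interrupts malloc itself, e.g. a SIGABRT raised from
 * realloc(3) on corrupted memory. */
static void safe_handler(int sig)
{
    const char msg[] = "caught a fatal signal\n";
    ssize_t unused = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) unused;
    (void) sig;
    _exit(1);
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = safe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);

    raise(SIGABRT);  /* triggers the handler above */
    return 0;
}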
Hi,
The attached patch corrects trivial typos in man files and
FUNC_NAME variables in ompi/mpi/c/*.c files.
One note which may not be trivial:
Before MPI-2.1, the MPI standard said MPI_PACKED should be used for
MPI_{Pack,Unpack}_external, but in MPI-2.1 it was changed to
use MPI_BYTE. See 'B.3 Chang
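For reference, a minimal sketch of MPI_{Pack,Unpack}_external with the
external32 representation; the comment reflects the MPI-2.1 wording mentioned
above (this is an illustration, not part of the attached patch).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int value = 12345, out = 0;
    char packed[64];
    MPI_Aint position = 0;

    MPI_Init(&argc, &argv);

    /* Pack into the portable external32 representation.  If this
       buffer were communicated, MPI-2.1 says to send it as MPI_BYTE
       (earlier versions said MPI_PACKED). */
    MPI_Pack_external("external32", &value, 1, MPI_INT,
                      packed, sizeof(packed), &position);

    position = 0;
    MPI_Unpack_external("external32", packed, sizeof(packed), &position,
                        &out, 1, MPI_INT);

    printf("unpacked %d\n", out);
    MPI_Finalize();
    return 0;
}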
ned to kernel, and abnormal values are printed
if not yet.
So this SEGV doesn't occur if I configure Open MPI with
the --disable-dlopen option. I think that's the reason why Nathan
doesn't see this error.
Regards,
KAWASHIMA Takahiro
> I just checked MPICH 3.2, and they *do* include MPI_SIZEOF interfaces for
> CHARACTER and LOGICAL, but they are missing many of the other MPI_SIZEOF
> interfaces that we have in OMPI. Meaning: OMPI and MPICH already diverge
> wildly on MPI_SIZEOF. :-\
And OMPI 1.6 also had MPI_SIZEOF interf
ram” for OpenMPI code base that shows
> existing classes and dependencies/associations. Are there any available tools
> to extract and visualize this information.
Thanks,
KAWASHIMA Takahiro
Gilles, Jeff,
In Open MPI 1.6 days, MPI_ARGVS_NULL and MPI_STATUSES_IGNORE
were defined as double precision and MPI_Comm_spawn_multiple
and MPI_Waitall etc. interfaces had two subroutines each.
https://github.com/open-mpi/ompi-release/blob/v1.6/ompi/include/mpif-common.h#L148
https://github
Hi,
I created a pull request to add the persistent collective
communication request feature to Open MPI. Though it's
incomplete and will not be merged into Open MPI soon,
you can experiment with your collective algorithms based on my work.
https://github.com/open-mpi/ompi/pull/2758
Takahiro Kawashima,
MP
Hi,
I encountered a similar problem using MPI_COMM_SPAWN last month.
Your problem may be the same.
The problem was fixed by commit 0951a34 in Open MPI master and
backported to v2.1.x and v2.0.x, but not to v1.8.x and
v1.10.x.
https://github.com/open-mpi/ompi/commit/0951a34
Please try the att
s not working now, nor in
> 2.0.x. I've checked the master, and it also does not work there. Is
> there any time line for this?
>
> Thanks a lot!
>
> Marcin
>
>
>
> On 04/04/2017 11:03 AM, Kawashima, Takahiro wrote:
> > Hi,
> >
> > I enc
It might be related to https://github.com/open-mpi/ompi/issues/3697 .
I added a comment to the issue.
Takahiro Kawashima,
Fujitsu
> On a PPC64LE w/ gcc-7.1.0 I see opal_fifo hang instead of failing.
>
> -Paul
>
> On Mon, Jul 3, 2017 at 4:39 PM, Paul Hargrove wrote:
>
> > On a PPC64 host with
Paul,
Did you upgrade glibc or something? I suspect newer glibc
supports process_vm_readv and process_vm_writev, and the output
of the configure script changed. My Linux/SPARC64 machine with an old
glibc can compile Open MPI 2.1.2rc2 (CMA is disabled).
To fix this, we need to cherry-pick d984b4b. Could you test the
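As an assumption about what effectively changed, a minimal probe of the glibc
wrapper (not the actual configure test): if glibc exposes process_vm_readv,
CMA support gets enabled.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read a few bytes from our own address space via process_vm_readv.
 * On glibc versions that lack the wrapper this does not even link,
 * which is roughly what a configure-time probe would detect. */
int main(void)
{
    char src[] = "cma", dst[sizeof(src)] = "";
    struct iovec local  = { .iov_base = dst, .iov_len = sizeof(dst) };
    struct iovec remote = { .iov_base = src, .iov_len = sizeof(src) };

    ssize_t n = process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
    printf("process_vm_readv returned %zd (%s)\n",
           n, n < 0 ? "unsupported?" : dst);
    return 0;
}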
ebian/Sid system w/ glibc-2.24.
>
> The patch you pointed me at does appear to fix the problem!
> I will note this in your PRs.
>
> -Paul
>
> On Mon, Aug 21, 2017 at 9:17 PM, Kawashima, Takahiro <
> t-kawash...@jp.fujitsu.com> wrote:
>
> > Paul,
> >
Samuel,
I am a developer of Fujitsu MPI. Thanks for using the K computer.
For official support, please consult the K helpdesk,
as Gilles said. The helpdesk may have information based on past
inquiries. If not, the inquiry will be forwarded to our team.
As other people said, Fujitsu MPI us
> > As other people said, Fujitsu MPI used in K is based on old
> > Open MPI (v1.6.3 with bug fixes).
>
> I guess the obvious question is will the vanilla Open-MPI work on K?
Unfortunately no. Support for Tofu and the Fujitsu resource manager
is not included in Open MPI.
Takahiro Kawashima,
MPI dev
with_wrapper_cxxflags=-g
with_wrapper_fflags=-g
with_wrapper_fcflags=-g
Regards,
KAWASHIMA Takahiro
> The problem is the code in question does not check the return code of
> MPI_T_cvar_handle_alloc . We are returning an error and they still try
> to use the handle (which is stale). Uncom
flag.
Regards,
KAWASHIMA Takahiro
> This is odd. The variable in question is registered by the MCA itself. I
> will take a look and see if I can determine why it isn't being
> deregistered correctly when the rest of the component's parameters are.
>
> -Nathan
>
> On W
George,
I compiled trunk with your patch for SPARCV9/Linux/GCC.
I see the following warnings/errors.
In file included from opal/include/opal/sys/atomic.h:175,
from opal/asm/asm.c:21:
opal/include/opal/sys/sparcv9/atomic.
Hi,
> > >>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> > >>> 10 Sparc and I receive a bus error, if I run a small program.
I've finally reproduced the bus error in my SPARC environment.
#0 0x00db4740 (__waitpid_nocancel + 0x44)
(0x200,0x0,0x0,0xa0,0xf80100064af0,0
extremely vague recollection about a similar issue in the
> datatype engine: on the SPARC architecture the 64 bits integers must be
> aligned on a 64bits boundary or you get a bus error.
>
> Takahiro you can confirm this by printing the value of data when signal is
> raised.
>
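A small sketch of the kind of check suggested above (hypothetical names):
report the pointer when it is not 8-byte aligned, and read the 64-bit value
through memcpy, which is safe for misaligned addresses.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: a direct 64-bit load through 'data' needs 8-byte
 * alignment on SPARC, otherwise SIGBUS; memcpy reads a possibly
 * misaligned value safely. */
static uint64_t read_u64(const void *data)
{
    uint64_t value;

    if (((uintptr_t) data % 8) != 0) {
        fprintf(stderr, "misaligned 64-bit access at %p\n", data);
    }
    memcpy(&value, data, sizeof(value));
    return value;
}

int main(void)
{
    unsigned char buf[16] = { 0 };

    /* Deliberately misaligned: buf + 1. */
    printf("%llu\n", (unsigned long long) read_u64(buf + 1));
    return 0;
}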
; as a workaround, you can declare an opal_process_name_t (for alignment),
> and cast it to an orte_process_name_t
>
> i will write a patch (i will not be able to test on sparc ...)
> please note this issue might be present in other places
>
> Cheers,
>
> Gilles
>
upported by all compilers */
>
> as far as i am concerned, the same issue is also in the trunk,
> and if you do not hit it, it just means you are lucky :-)
>
> the same issue might also be in other parts of the code :-(
>
> Cheers,
>
> Gilles
>
> On 2014/08/08 13:4
Siegmar, Ralph,
I'm sorry for responding so late.
Ralph fixed the problem in r32459 and it was merged to v1.8
in r32474. But in v1.8 an additional custom patch is needed
because the db/dstore source code differs between trunk
and v1.8.
I'm preparing and testing the custom pat
Hi Ralph,
Your commit r32459 fixed the bus error by correcting
opal/dss/dss_copy.c. It's OK for trunk because mca_dstore_hash
calls dss to copy data. But it's insufficient for v1.8 because
mca_db_hash doesn't call dss and copies data itself.
The attached patch is the minimum patch to fix it in v1
Hi Siegmar, Ralph,
I forgot to follow up on the previous report, sorry.
The patch I suggested is not included in Open MPI 1.8.2.
The backtrace Siegmar reported points to the problem that I fixed
in the patch.
http://www.open-mpi.org/community/lists/users/2014/08/24968.php
Siegmar:
Could you try my patc
just FYI:
configure && make && make install && make test
succeeded on my SPARC64/Linux/GCC (both enable-debug=yes and no).
Takahiro Kawashima,
MPI development team,
Fujitsu
> Usual place:
>
> http://www.open-mpi.org/software/ompi/v1.8/
>
> Please beat it up as we want to release on Fri, barring
Hi George,
Thank you for attending the meeting in Kyoto. As we discussed
at the meeting, my colleague is suffering from a datatype problem.
See attached create_resized.c. It creates a datatype with an
LB marker using MPI_Type_create_struct and MPI_Type_create_resized.
Expected contents of the output fil
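A minimal sketch of building such a datatype (not the attached
create_resized.c; the displacement, lb, and extent values are placeholders):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Datatype structtype, resized;
    int blocklengths[2] = { 1, 1 };
    MPI_Aint displacements[2] = { 0, 8 };
    MPI_Datatype types[2] = { MPI_INT, MPI_DOUBLE };
    MPI_Aint lb, extent;

    MPI_Init(&argc, &argv);

    MPI_Type_create_struct(2, blocklengths, displacements, types, &structtype);
    /* Move the lower bound below the first element (placeholder value
       -4) and enlarge the extent, producing a datatype whose LB marker
       does not coincide with the first byte of data. */
    MPI_Type_create_resized(structtype, (MPI_Aint) -4, (MPI_Aint) 24, &resized);
    MPI_Type_commit(&resized);

    MPI_Type_get_extent(resized, &lb, &extent);
    printf("lb = %ld, extent = %ld\n", (long) lb, (long) extent);

    MPI_Type_free(&resized);
    MPI_Type_free(&structtype);
    MPI_Finalize();
    return 0;
}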
Hi,
The attached program intercommunicator-iallgather.c outputs the
message "MPI Error in MPI_Testall() (18)" forever and doesn't
finish. This is because libnbc has send/recv typos.
See the attached intercommunicator-iallgather.patch for the fix.
The patch modifies iallgather_inter and iallgather_intr
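For reference, a compact sketch of the usage pattern involved (not the
attached program): build an intercommunicator, post MPI_Iallgather on it,
and wait.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, color, sendval, *recvbuf;
    MPI_Comm intracomm, intercomm;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split the world into two groups and connect them with an
       intercommunicator (run with an even number of processes). */
    color = rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &intracomm);
    MPI_Intercomm_create(intracomm, 0, MPI_COMM_WORLD, 1 - color, 0, &intercomm);

    sendval = rank;
    recvbuf = malloc((size / 2) * sizeof(int));

    /* Nonblocking allgather across the intercommunicator; each group
       gathers the values contributed by the remote group. */
    MPI_Iallgather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, intercomm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    free(recvbuf);
    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&intracomm);
    MPI_Finalize();
    return 0;
}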
Thanks!
> Takahiro,
>
> Sorry for the delay in answering. Thanks for the bug report and the patch.
> I applied your patch, and added some tougher tests to make sure we catch
> similar issues in the future.
>
> Thanks,
> George.
>
>
> On Mon, Sep 29, 2014 at 8
Yes, Fujitsu MPI is running on a sparcv9-compatible CPU.
Though we currently use only the stable series (v1.6, v1.8),
they work fine.
Takahiro Kawashima,
MPI development team,
Fujitsu
> Nathan,
>
> Fujitsu MPI is openmpi based and is running on their sparcv9 like proc.
>
> Cheers,
>
> Gilles
>
> On
Hi Gilles, Nathan,
I read the MPI standard, but I think the standard doesn't
require a barrier in the test program.
From the standard (11.5.1 Fence):
A fence call usually entails a barrier synchronization:
a process completes a call to MPI_WIN_FENCE only after all
other processes in th
nually adding a MPI_Barrier.
>
> Cheers,
>
> Gilles
>
> On 4/21/2015 10:20 AM, Kawashima, Takahiro wrote:
> > Hi Gilles, Nathan,
> >
> > I read the MPI standard but I think the standard doesn't
> > require a barrier in the test program.
> >
e epochs.
>
>
> and the test case calls MPI_Win_fence with MPI_MODE_NOPRECEDE.
>
> are you saying Open MPI implementation of MPI_Win_fence should perform
> a barrier in this case (e.g. MPI_MODE_NOPRECEDE) ?
>
> Cheers,
>
> Gilles
>
> On 4/21/2015 11:08 AM
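A minimal sketch of the pattern under discussion (not the original test
case), with the explicit MPI_Barrier that was suggested as a workaround and
a fence epoch opened with MPI_MODE_NOPRECEDE:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, local = 0, value;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Win_create(&local, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Every process initializes its exposed memory before the epoch. */
    local = rank;

    /* Workaround discussed in the thread: an explicit barrier, since a
       fence with MPI_MODE_NOPRECEDE need not synchronize processes. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
    value = -1;
    MPI_Get(&value, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
    MPI_Win_fence(MPI_MODE_NOSUCCEED, win);

    printf("rank %d got %d\n", rank, value);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}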
formation that may be useful for users and developers.
Not so verbose. Output only on initialization or
object creation etc.
DEBUG:
Information that is useful only for developers.
Not so verbose. Output once per MPI routine call.
TRACE:
Information that is useful only for developers.
V
Hi folks,
`configure && make && make install && make test` and
running some sample MPI programs succeeded with 1.10.0rc1
on my SPARC-V9/Linux/GCC machine (Fujitsu PRIMEHPC FX10).
Takahiro Kawashima,
MPI development team,
Fujitsu
> Hi folks
>
> Now that 1.8.7 is out the door, we need to switch o
Oh, I also noticed it yesterday and was about to report it.
And one more: the base parameter of MPI_Win_detach.
Regards,
Takahiro Kawashima
> Dear OpenMPI developers,
>
> I noticed a bug in the definition of the 3 MPI-3 RMA functions
> MPI_Compare_and_swap, MPI_Fetch_and_op and MPI_Raccumulate.
n the source code to explain this.
>
> On Thursday, August 27, 2015, Kawashima, Takahiro <
> t-kawash...@jp.fujitsu.com> wrote:
>
> > Oh, I also noticed it yesterday and was about to report it.
> >
> > And one more, the base parameter of MPI_Win_detach.
> &
Brice,
I'm a developer of Fujitsu MPI for the K computer and Fujitsu
PRIMEHPC FX10/FX100 (SPARC-based CPUs).
Though I'm not familiar with the hwloc code and didn't know about
the issue reported by Gilles, I would also be able to help
you fix the issue.
Takahiro Kawashima,
MPI development team,
Fujitsu
`configure && make && make install && make check` and
running some sample MPI programs succeeded with 1.10.1rc3
on my SPARC-V9/Linux/GCC machine (Fujitsu PRIMEHPC FX10).
No @SET_MAKE@ appears in any Makefiles, of course.
> > For the first time I was also able to (attempt to) test SPARC64 via QEMU
Nathan,
Is it sufficient?
Multiple windows can be created on a communicator,
so I think PID + CID is not sufficient.
Possible fixes (a sketch of the first follows below):
- The root process creates a filename with a random number
and broadcasts it to the communicator.
- Use a per-communicator counter and include it in the filename.
Regar
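A minimal sketch of the first option (hypothetical names; real code would use
the session directory rather than /tmp): the root draws a random number,
builds the filename, and broadcasts it over the communicator.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch: build a per-window backing-file name that is
 * unique even when several windows share one communicator. */
static void make_backing_filename(MPI_Comm comm, char *name, size_t len)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    if (0 == rank) {
        /* Root combines its PID with a random number and broadcasts
         * the resulting name to the other processes. */
        srand((unsigned) time(NULL) ^ (unsigned) getpid());
        snprintf(name, len, "/tmp/shmem_win.%d.%d", (int) getpid(), rand());
    }
    MPI_Bcast(name, (int) len, MPI_CHAR, 0, comm);
}

int main(int argc, char **argv)
{
    char name[128];

    MPI_Init(&argc, &argv);
    make_backing_filename(MPI_COMM_WORLD, name, sizeof(name));
    printf("backing file: %s\n", name);
    MPI_Finalize();
    return 0;
}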
ble check about using PID. if a broadcast is needed, i would
> rather use the process name of rank 0 in order to avoid a broadcast.
>
> Cheers,
>
> Gilles
>
> On 2/3/2016 8:40 AM, Kawashima, Takahiro wrote:
> > Nathan,
> >
> > Is it sufficient?
> > Mul