Looks like you failed to build the shared memory component. The system isn't
seeing a comm path between procs on the same node.
On Apr 2, 2012, at 7:47 AM, Alex Margolin wrote:
> I found the problem(s) - It was more than just type redefinition, but I fixed
> it too. I also a
I found the problem(s) - It was more than just type redefinition, but I
fixed it too. I also added some code for btl/base to prevent/detect a
similar problem in the future. A newer version of my MOSIX patch (odls +
btl + fix) is attached. The BTL still doesn't work, though, and when I
try to u
I suspect the problem is here:
/**
+ * MOSIX BTL component.
+ */
+struct mca_btl_base_component_t {
+    mca_btl_base_component_2_0_0_t super; /**< base BTL component */
+    mca_btl_mosix_module_t mosix_module;  /**< local module */
+};
+typedef struct mca_btl_base_component_t mca_btl_mosix_com
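For illustration only: the struct tag in the excerpt above reuses a base-framework name rather than a MOSIX-specific one. Assuming that is the concern, a minimal sketch of the usual pattern (a component-specific tag that embeds the base component as its first member) could look like the following. All names prefixed with example_ are hypothetical stand-ins so the snippet compiles without the Open MPI headers; they are not the real Open MPI types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the framework's base component type. */
typedef struct example_btl_base_component {
    const char *name;
} example_btl_base_component_t;

/* Hypothetical stand-in for the MOSIX module type. */
typedef struct example_btl_mosix_module {
    int placeholder;
} example_btl_mosix_module_t;

/* The component uses its own tag, distinct from any framework type. */
struct example_btl_mosix_component {
    example_btl_base_component_t super;      /* base component, kept first */
    example_btl_mosix_module_t mosix_module; /* local module */
};
typedef struct example_btl_mosix_component example_btl_mosix_component_t;

static example_btl_mosix_component_t example_mosix_component = {
    { "mosix" }, { 0 }
};

/* Upcast to the base type; valid because super is the first member. */
static example_btl_base_component_t *
example_mosix_as_base(example_btl_mosix_component_t *c)
{
    assert(c != NULL);
    return &c->super;
}

/* Returns 1 if the layout and the upcast behave as expected. */
static int example_mosix_layout_ok(void)
{
    return offsetof(example_btl_mosix_component_t, super) == 0
        && strcmp(example_mosix_as_base(&example_mosix_component)->name,
                  "mosix") == 0;
}
```

Keeping the base component as the first member is what lets framework code treat a pointer to the derived component as a pointer to the base one.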
I traced the problem to the BML component:
Index: ompi/mca/bml/r2/bml_r2.c
===
--- ompi/mca/bml/r2/bml_r2.c (revision 26191)
+++ ompi/mca/bml/r2/bml_r2.c (working copy)
@@ -105,6 +105,8 @@
}
}
if (
I've added some documentation and made a few other changes in the hope
of making the code more readable (the attached diff replaces the
previous one), though the BTL is still giving me that error. There are
some TODOs in the code marking places where I was unsure (it should
still work - I'm not
MOSIX works as a sandbox, wrapping the executed process. Suppose I run
with "-n 3": three processes will be launched via MOSIX on nodes A, B
and C. MOSIX can choose to "migrate" process #2 from B to D - this will
not restart the process, nor will the process know about its current
location unl
I can't speak to the BTL itself, but I do have questions as to how this can
work. If MOSIX migrates a process, or starts new processes on another node
during the course of a job, there is no way for MPI to handle the wireup and so
it will fail. We need ALL the procs started at the beginning of t