Hi Josh.
Thanks for the reply. I've fixed the passwd issue, but I'm still
getting the segmentation fault. I'm sending you the output; I think it is
almost the same output that I sent you yesterday.
Best Regards.
Hugo Meyer
2011/1/31 Joshua Hursey
That helped. There was a missing check in the automatic recovery logic that
prevents it from starting up while the migration is going on. r24326 should fix
this bug. The segfault should have just been residual fallout from this bug.
Can you try the current trunk to confirm?
One other thing I no
Hi Josh.
As you said, the first problem was caused by the name of the node. But the
second problem (the segmentation fault) persists. As you asked, I'm sending you
the output of the run with the MCA params that you gave me. At the end of
the file I put the output of the second terminal.
Best Regards
So I was not able to reproduce this issue.
A couple notes:
- You can see the node-to-process-rank mapping using the '-display-map'
command line option to mpirun. This will give you the node names that Open MPI
is using, and how it intends to lay out the processes. You can use the
'-display-allo
On Jan 31, 2011, at 6:47 AM, Hugo Meyer wrote:
Hi Joshua.
I've tried the migration again, and I get the following (with the processes
running on the node where mpirun is running):
Terminal 1:
[hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am ft-enable-cr-recovery --mca orte_base_help_aggregate 0 ./whoami 10 10
Antes de MPI_Init
Thanks to you Joshua.
I will try the procedure with these modifications and I will let you know how
it goes.
Best Regards.
Hugo Meyer
2011/1/27 Joshua Hursey
I believe that this is now fixed on the trunk. All the details are in the
commit message:
https://svn.open-mpi.org/trac/ompi/changeset/24317
In my testing yesterday, I did not test the scenario where the node with mpirun
also contains processes (the test cluster I was using does not by default
Hi Josh.
Thanks for your reply. Below is what I'm getting now from the
executions.
When I run without doing a checkpoint I get this output, and the processes
don't finish:
[hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am ft-enable-cr-reco
I found a few more bugs after testing the C/R functionality this morning. I
just committed some more C/R fixes in r24306 (things are now working correctly
on my test cluster).
https://svn.open-mpi.org/trac/ompi/changeset/24306
One thing I just noticed in your original email was that you are sp
Josh.
ompi-checkpoint and its restart are now working great, but the same
error persists with ompi-migrate. I've also tried using "-r", but I get the
same error.
Best regards.
Hugo Meyer
2011/1/26 Hugo Meyer
Thanks Josh.
I've already checked prelink, and it is set to "no".
I'm going to try with the trunk head, and then I'll let you know how it
goes.
Best regards.
Hugo Meyer
2011/1/25 Joshua Hursey
Can you try with the current trunk head (r24296)?
I just committed a fix for the C/R functionality in which restarts were getting
stuck. This will likely affect the migration functionality, but I have not had
an opportunity to test just yet.
Another thing to check is that prelink is turned off o
Hello @ll
I've got a problem when I try to use the ompi-migrate command.
What I'm doing is running, for example, the following application on one node of a
cluster (both processes will run on the same node):
mpirun -np 2 -am ft-enable-cr ./whoami 10 10
Then on the same node I try to migrate the process