On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:
>> What about setsid and pushing it in a new
>> session instead of using & in the script?
>
> :-) That works. Thanks!
>
> NB, the working script looks like:
>
> setsid bash -c "mpirun command>& out"&
> tail -f out
>
Yes - but now you can't kill mpirun when something goes wrong....<shrug>
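
One way to keep a handle on the detached job is to record its PID before
it starts, so it can still be signalled deliberately later. The following
is only a minimal sketch under stated assumptions: "sleep 30" stands in
for "mpirun command", the file names job.pid and out are hypothetical,
and it assumes mpirun cleans up on SIGTERM much as it does on SIGINT.

```shell
#!/bin/bash
# Sketch only: "sleep 30" is a stand-in for "mpirun command";
# job.pid and out are hypothetical file names.
out=out
pidfile=job.pid
rm -f "$pidfile"

# Start the job in its own session so a Ctrl+C typed at the terminal
# never reaches it. The inner shell writes its own PID to the file and
# then exec replaces that shell with the job, so the recorded PID is
# the PID of the job itself.
setsid bash -c 'echo $$ > "$1"; exec sleep 30 > "$2" 2>&1' _ "$pidfile" "$out" &

# Wait until the PID file has been written.
while [ ! -s "$pidfile" ]; do sleep 0.1; done
job_pid=$(cat "$pidfile")

# tail -f "$out"    # Ctrl+C here would stop only tail, not the job

# Later, to stop the job on purpose (default signal is SIGTERM):
kill "$job_pid"
```

With a real mpirun in place of the stand-in, the "tail -f" line would be
uncommented and the final kill would be issued by hand (or from another
script) only when the job actually needs to be stopped.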
> Thanks,
> Pablo
>
>
> On 23/04/11 18:39, Reuti wrote:
>> On 23.04.2011 at 19:33, Ralph Castain wrote:
>>
>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>>
>>>>> I'm not sure what you are actually trying to accomplish
>>>> I simply want a script that runs the equivalent of:
>>>>
>>>> mpirun command>& out&
>>>> tail -f out
>>>>
>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
>>>> certainly do this without mpirun,
>>> I don't think that's true. If both commands are in a script, then at least
>>> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both-
>>> processes.
>> What about setsid and pushing it in a new session instead of using & in the
>> script?
>>
>> -- Reuti
>>
>>
>>> At least when I test it, even non-mpirun processes will abort.
>>>
>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>> I'm afraid it won't work, per my earlier comments.
>>>
>>>> I need mpirun to either ignore the SIGINT or not receive it at all -- and
>>>> as per your comments, ignoring it is not an option.
>>>>
>>>> Let me rephrase my question then. With the following script:
>>>>
>>>> mpirun command>& out&
>>>> tail -f out
>>>>
>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>>
>>>> (
>>>> trap : SIGINT
>>>> mpirun command>& out&
>>>> )
>>>> tail -f out
>>>>
>>>> has the same effect, indicating that mpirun overrides previous traps in the
>>>> same subshell. That's OK too. However the following:
>>>>
>>>> (
>>>> trap : SIGINT
>>>> (
>>>> mpirun command>& out&
>>>> )
>>>> )
>>>> tail -f out
>>>>
>>>> also has the same effect. How is mpirun overriding the trap in the
>>>> *parent* subshell so that it ends up getting the SIGINT that was
>>>> supposedly blocked at that level? Am I missing something trivial? How can
>>>> I avoid this?
>>> I keep telling you - you can't. The better way to do this is to execute
>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail
>>> without mpirun seeing it.
>>>
>>> But you are welcome to not believe me and continue thrashing... :-/
>>>
>>>> Thanks,
>>>> Pablo
>>>>
>>>>
>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>>
>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>> should continue.
>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>> MPI procs.
>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>> desired behaviour of the script, not that this is what ought
>>>>>> to happen in general. My question is how to achieve this
>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>> catching sigint.
>>>>> Like I said in my other response, you can't - mpirun automatically traps
>>>>> sigint and terminates the job in order to ensure proper cleanup during
>>>>> abnormal terminations.
>>>>>
>>>>> I'm not sure what you are actually trying to accomplish, but there are
>>>>> other signals that don't cause termination. For example, we trap and
>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>>>>
>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>>>>> ignore it.
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Pablo
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>>>>
>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>>>>> script needs to run an MPI job in the background and tail -f the
>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should
>>>>>>>>> continue.
>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>>>>> immediately terminates all running MPI procs.
>>>>>>>
>>>>>>>
>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail,
>>>>>>>>> and kills the job immediately. I've tried workarounds involving
>>>>>>>>> nohup, disown, trap, subshells (including calling the script from
>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>>
>>>>>>>>> The problem is that this doesn't happen if I run the command directly
>>>>>>>>> instead, without mpirun. Attached is a script that reproduces the
>>>>>>>>> problem. It runs a simple counting script in the background which
>>>>>>>>> takes 10 seconds to run, and tails the output. If called with "nompi"
>>>>>>>>> as first argument, it will simply run bash -c "$SCRIPT">& "$out"&,
>>>>>>>>> and with "mpi" it will do the same with 'mpirun -np 1' prepended. The
>>>>>>>>> output I get is:
>>>>>>>> what about:
>>>>>>>>
>>>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>>>>
>>>>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This
>>>>>>>> can be checked in /proc/<pid>/status
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>> mpi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> ^C
>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>> nompi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> ^C
>>>>>>>>> $ cat output.*
>>>>>>>>> mpi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> mpirun: killing job...
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme
>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>
>>>>>>>>> nompi:
>>>>>>>>> 1
>>>>>>>>> 2
>>>>>>>>> 3
>>>>>>>>> 4
>>>>>>>>> 5
>>>>>>>>> 6
>>>>>>>>> 7
>>>>>>>>> 8
>>>>>>>>> 9
>>>>>>>>> 10
>>>>>>>>> Done
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This convinces me that there is something strange with OpenMPI, since
>>>>>>>>> I expect no difference in signal handling when running a simple
>>>>>>>>> command with or without mpirun in the middle.
>>>>>>>>>
>>>>>>>>> I've tried looking for options to change this behaviour, but I don't
>>>>>>>>> seem to find any. Is there one, preferably in the form of an
>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>>
>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Pablo
>>>>>>>>> <ompi_bug.sh.gz>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users