On Apr 23, 2011, at 12:07 PM, Reuti wrote:
> On 23.04.2011 at 19:58, Ralph Castain wrote:
>
>>
>> On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:
>>
>>>> What about setsid and pushing it in a new
>>>> session instead of using & in the script?
>>>
>>> :-) That works. Thanks!
>>>
>>> NB, the working script looks like:
>>>
>>> setsid bash -c "mpirun command >& out" &
>>> tail -f out
>>>
>>
>> Yes - but now you can't kill mpirun when something goes wrong....<shrug>
>
> Besides killall, you can still send a SIGINT from the command line to the
> mpirun process or its process group.

Yes - or I could just have run tail in a separate shell and avoided the entire
email thread and problem... :-)

Whatever...so long as peace returns.
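
For reference, a minimal sketch of the setsid approach that also keeps a
handle on the job, so it can still be signalled later. The function names,
the pidfile and the stand-in job are illustrative assumptions, not anything
Open MPI provides, and it assumes Linux with the util-linux setsid command.

```shell
#!/usr/bin/env bash
# Sketch only. start_detached, signal_detached and the pidfile are
# hypothetical helpers; "$@" stands in for "mpirun command".

# Run a command in a new session, so a Ctrl+C typed at the terminal
# is not delivered to it. Output goes to $1, the job's PID to $2.
start_detached() {
    local out=$1 pidfile=$2
    shift 2
    # The inner bash records its own PID and then exec's the job, so the
    # recorded PID is the job's PID and also its process group ID.
    setsid bash -c 'echo "$$" > "$1"; shift; exec "$@"' _ "$pidfile" "$@" \
        > "$out" 2>&1 &
}

# Send a signal (default INT) to the job's whole process group.
# A negative PID argument to kill addresses a process group.
signal_detached() {
    local pidfile=$1 sig=${2:-INT}
    kill -s "$sig" -- "-$(cat "$pidfile")"
}

# Intended use (not executed here):
#   start_detached out mpirun.pid mpirun command
#   tail -f out                  # Ctrl+C stops only tail
#   signal_detached mpirun.pid   # later, if the job must be stopped
```

Since the job leads its own process group, signalling the negative PID
should reach mpirun and anything it started locally in that group.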
>
> -- Reuti
>
>
>>> Thanks,
>>> Pablo
>>>
>>>
>>> On 23/04/11 18:39, Reuti wrote:
>>>> On 23.04.2011 at 19:33, Ralph Castain wrote:
>>>>
>>>>> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>>>>>
>>>>>>> I'm not sure what you are actually trying to accomplish
>>>>>> I simply want a script that runs the equivalent of:
>>>>>>
>>>>>> mpirun command >& out &
>>>>>> tail -f out
>>>>>>
>>>>>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
>>>>>> certainly do this without mpirun,
>>>>> I don't think that's true. If both commands are in a script, then at
>>>>> least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to
>>>>> -both- processes.
>>>> What about setsid and pushing it in a new session instead of using & in
>>>> the script?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> At least when I test it, even non-mpirun processes will abort.
>>>>>
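
The terminal delivers Ctrl+C to every process in the foreground process
group, which is why both commands in a script see it. A small sketch,
assuming Linux's /proc/<pid>/stat layout and a hypothetical helper name, for
comparing the process group and session of two processes:

```shell
#!/usr/bin/env bash
# Sketch only; pgid_sid is a hypothetical helper name.

# Print "PGID SID" for the given PID, read from /proc/PID/stat.
pgid_sid() {
    local stat
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Drop the leading "pid (comm) "; comm may contain spaces or ')'.
    stat=${stat##*') '}
    # Remaining fields are: state ppid pgrp session ...
    set -- $stat
    echo "$3 $4"
}

# Intended use (not executed here):
#   pgid_sid $$        # the script itself
#   pgid_sid "$pid"    # a job started with & or with setsid
```

In a script, a job started with a plain & would be expected to report the
same session as the script, while one started through setsid reports a
session of its own.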
>>>>>> it's not unreasonable to expect to be able to do the same with mpirun.
>>>>> I'm afraid it won't work, per my earlier comments.
>>>>>
>>>>>> I need mpirun to either ignore the SIGINT or not receive it at all --
>>>>>> and as per your comments, ignoring it is not an option.
>>>>>>
>>>>>> Let me rephrase my question then. With the following script:
>>>>>>
>>>>>> mpirun command >& out &
>>>>>> tail -f out
>>>>>>
>>>>>> SIGINT stops tail AND mpirun. That's OK. The following:
>>>>>>
>>>>>> (
>>>>>>   trap : SIGINT
>>>>>>   mpirun command >& out &
>>>>>> )
>>>>>> tail -f out
>>>>>>
>>>>>> has the same effect, indicating that mpirun overrides previous traps in
>>>>>> the same subshell. That's OK too. However the following:
>>>>>>
>>>>>> (
>>>>>>   trap : SIGINT
>>>>>>   (
>>>>>>     mpirun command >& out &
>>>>>>   )
>>>>>> )
>>>>>> tail -f out
>>>>>>
>>>>>> also has the same effect. How is mpirun overriding the trap in the
>>>>>> *parent* subshell so that it ends up getting the SIGINT that was
>>>>>> supposedly blocked at that level? Am I missing something trivial? How
>>>>>> can I avoid this?
>>>>> I keep telling you - you can't. The better way to do this is to execute
>>>>> mpirun, and then run tail in a -separate- command. Now you can ctrl-c
>>>>> tail without mpirun seeing it.
>>>>>
>>>>> But you are welcome to not believe me and continue thrashing... :-/
>>>>>
>>>>>> Thanks,
>>>>>> Pablo
>>>>>>
>>>>>>
>>>>>> On 23/04/11 16:27, Ralph Castain wrote:
>>>>>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>>>>>
>>>>>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>>>>>> should continue.
>>>>>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>>>>>> every process executing in the script receives it. Mpirun
>>>>>>>>> traps the ctrl-c and immediately terminates all running
>>>>>>>>> MPI procs.
>>>>>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>>>>>> desired behaviour of the script, not that this is what ought
>>>>>>>> to happen in general. My question is how to achieve this
>>>>>>>> behaviour, since I'm having trouble working around mpirun
>>>>>>>> catching sigint.
>>>>>>> Like I said in my other response, you can't - mpirun automatically
>>>>>>> traps sigint and terminates the job in order to ensure proper cleanup
>>>>>>> during abnormal terminations.
>>>>>>>
>>>>>>> I'm not sure what you are actually trying to accomplish, but there are
>>>>>>> other signals that don't cause termination. For example, we trap and
>>>>>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of
>>>>>>> use.
>>>>>>>
>>>>>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>>>>>>> ignore it.
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Pablo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>>>>>>> script needs to run an MPI job in the background and tail -f the
>>>>>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should
>>>>>>>>>>> continue.
>>>>>>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>>>>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>>>>>>> immediately terminates all running MPI procs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail,
>>>>>>>>>>> and kills the job immediately. I've tried workarounds involving
>>>>>>>>>>> nohup, disown, trap, subshells (including calling the script from
>>>>>>>>>>> within itself), etc, to no avail.
>>>>>>>>>>>
>>>>>>>>>>> The problem is that this doesn't happen if I run the command
>>>>>>>>>>> directly instead, without mpirun. Attached is a script that
>>>>>>>>>>> reproduces the problem. It runs a simple counting script in the
>>>>>>>>>>> background which takes 10 seconds to run, and tails the output. If
>>>>>>>>>>> called with "nompi" as first argument, it will simply run bash -c
>>>>>>>>>>> "$SCRIPT">& "$out"&, and with "mpi" it will do the same with
>>>>>>>>>>> 'mpirun -np 1' prepended. The output I get is:
>>>>>>>>>> what about:
>>>>>>>>>>
>>>>>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>>>>>>
>>>>>>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>>>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This
>>>>>>>>>> can be checked in /proc/<pid>/status
>>>>>>>>>>
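
A minimal sketch of that check, assuming Linux's /proc layout: SigIgn and
SigCgt in /proc/<pid>/status are hexadecimal bitmasks in which bit n-1
stands for signal n, and SIGINT is signal 2. Both function names are
hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only; both function names are hypothetical.

# Classify SIGINT from the two hex masks found in /proc/<pid>/status.
sigint_state_from_masks() {
    local ign=$1 cgt=$2
    local bit=$(( 1 << (2 - 1) ))   # SIGINT is signal 2, so bit 1
    if   (( 0x$ign & bit )); then echo ignored
    elif (( 0x$cgt & bit )); then echo caught
    else                          echo default
    fi
}

# Look up the masks of a running process and classify SIGINT for it.
sigint_disposition() {
    local pid=$1 ign cgt
    ign=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status") || return 1
    cgt=$(awk '/^SigCgt:/ {print $2}' "/proc/$pid/status") || return 1
    # Bail out if either mask could not be read.
    [ -n "$ign" ] && [ -n "$cgt" ] || return 1
    sigint_state_from_masks "$ign" "$cgt"
}
```

Bash starts asynchronous commands with SIGINT ignored when job control is
off, as it is in a script, so a plain background command would be expected
to report "ignored", while a program that installs its own handler after
exec would report "caught".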
>>>>>>>>>> -- Reuti
>>>>>>>>>>
>>>>>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>>>>>> mpi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> ^C
>>>>>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>>>>>> nompi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> ^C
>>>>>>>>>>> $ cat output.*
>>>>>>>>>>> mpi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> mpirun: killing job...
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme
>>>>>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>>>
>>>>>>>>>>> nompi:
>>>>>>>>>>> 1
>>>>>>>>>>> 2
>>>>>>>>>>> 3
>>>>>>>>>>> 4
>>>>>>>>>>> 5
>>>>>>>>>>> 6
>>>>>>>>>>> 7
>>>>>>>>>>> 8
>>>>>>>>>>> 9
>>>>>>>>>>> 10
>>>>>>>>>>> Done
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This convinces me that there is something strange with OpenMPI,
>>>>>>>>>>> since I expect no difference in signal handling when running a
>>>>>>>>>>> simple command with or without mpirun in the middle.
>>>>>>>>>>>
>>>>>>>>>>> I've tried looking for options to change this behaviour, but I
>>>>>>>>>>> don't seem to find any. Is there one, preferably in the form of an
>>>>>>>>>>> environment variable? Or is this a bug?
>>>>>>>>>>>
>>>>>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Pablo
>>>>>>>>>>> <ompi_bug.sh.gz>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users