On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>> I'm not sure what you are actually trying to accomplish
>
> I simply want a script that runs the equivalent of:
>
> mpirun command>& out&
> tail -f out
>
> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
> certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for
me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
At least when I test it, even non-mpirun processes will abort.
> it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.
> I need mpirun to either ignore the SIGINT or not receive it at all -- and as
> per your comments, ignoring it is not an option.
>
> Let me rephrase my question then. With the following script:
>
> mpirun command>& out&
> tail -f out
>
> SIGINT stops tail AND mpirun. That's OK. The following:
>
> (
> trap : SIGINT
> mpirun command>& out&
> )
> tail -f out
>
> has the same effect, indicating that mpirun overrides previous traps in the
> same subshell. That's OK too. However the following:
>
> (
> trap : SIGINT
> (
> mpirun command>& out&
> )
> )
> tail -f out
>
> also has the same effect. How is mpirun overriding the trap in the *parent*
> subshell so that it ends up getting the SIGINT that was supposedly blocked at
> that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun,
and then run tail in a -separate- command. Now you can ctrl-c tail without
mpirun seeing it.
But you are welcome to not believe me and continue thrashing... :-/
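
As an illustration of why the scripted version behaves this way, here is a minimal sketch, assuming a POSIX shell and `ps`. `sleep 5` is only a stand-in for `mpirun command`. When run as a script (no job control), the background job stays in the script's process group, which is the group the terminal signals on Ctrl+C:

```shell
#!/bin/sh
# Stand-in for "mpirun command": any long-running background process.
sleep 5 &
bg_pid=$!

# Process group IDs of the script itself and of the background job.
script_pgid=$(ps -o pgid= -p $$ | tr -d ' ')
bg_pgid=$(ps -o pgid= -p "$bg_pid" | tr -d ' ')

if [ "$script_pgid" = "$bg_pgid" ]; then
    echo "same process group: a terminal Ctrl+C is delivered to both"
else
    echo "different process groups"
fi

kill "$bg_pid"
```

Typed one at a time at an interactive prompt, where job control is on, the background command gets its own process group, which is consistent with the advice above to run tail as a separate command.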
>
> Thanks,
> Pablo
>
>
> On 23/04/11 16:27, Ralph Castain wrote:
>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>
>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>> should continue.
>>>> I don't think that is true at all. When you hit ctrl-C,
>>>> every process executing in the script receives it. Mpirun
>>>> traps the ctrl-c and immediately terminates all running
>>>> MPI procs.
>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>> desired behaviour of the script, not that this is what ought
>>> to happen in general. My question is how to achieve this
>>> behaviour, since I'm having trouble working around mpirun
>>> catching sigint.
>> Like I said in my other response, you can't - mpirun automatically traps
>> sigint and terminates the job in order to ensure proper cleanup during
>> abnormal terminations.
>>
>> I'm not sure what you are actually trying to accomplish, but there are other
>> signals that don't cause termination. For example, we trap and forward
>> SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>
>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>> ignore it.
>>
>>
>>> Thanks,
>>> Pablo
>>>
>>>
>>>
>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Am 23.04.2011 um 04:31 schrieb Pablo Lopez Rios:
>>>>>
>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>> script needs to run an MPI job in the background and tail -f the output.
>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>> immediately terminates all running MPI procs.
>>>>
>>>>
>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, and
>>>>>> kills the job immediately. I've tried workarounds involving nohup,
>>>>>> disown, trap, subshells (including calling the script from within
>>>>>> itself), etc, to no avail.
>>>>>>
>>>>>> The problem is that this doesn't happen if I run the command directly
>>>>>> instead, without mpirun. Attached is a script that reproduces the
>>>>>> problem. It runs a simple counting script in the background which takes
>>>>>> 10 seconds to run, and tails the output. If called with "nompi" as first
>>>>>> argument, it will simply run bash -c "$SCRIPT">& "$out"&, and with
>>>>>> "mpi" it will do the same with 'mpirun -np 1' prepended. The output I
>>>>>> get is:
>>>>> what about:
>>>>>
>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>
>>>>> i.e. replace the subshell with changed interrupt handling with the
>>>>> mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can
>>>>> be checked in /proc/<pid>/status
>>>>>
>>>>> -- Reuti
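
A minimal sketch of the /proc check mentioned above, assuming Linux. `sleep 5` stands in for mpiexec, which is an assumption made only so the example runs anywhere. For a real run one would read the status file of the running mpiexec instead. A SIGINT bit set in SigCgt rather than SigIgn would suggest that the program installed its own handler:

```shell
#!/bin/sh
# Stand-in for mpiexec: start a process with SIGINT ignored, as suggested above.
( trap "" INT; exec sleep 5 ) &
pid=$!
sleep 1   # give the subshell time to exec

# SigIgn and SigCgt are hexadecimal bitmasks; signal N is bit N-1.
sigign=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status")
sigcgt=$(awk '/^SigCgt:/ {print $2}' "/proc/$pid/status")

# SIGINT is signal 2, so its mask bit is 0x2.
if [ $(( 0x$sigign & 2 )) -ne 0 ]; then echo "SIGINT is ignored"; fi
if [ $(( 0x$sigcgt & 2 )) -ne 0 ]; then echo "SIGINT is caught by a handler"; fi

kill "$pid"
```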
>>>>>
>>>>>> $ ./ompi_bug.sh mpi
>>>>>> mpi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> ^C
>>>>>> $ ./ompi_bug.sh nompi
>>>>>> nompi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> ^C
>>>>>> $ cat output.*
>>>>>> mpi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> mpirun: killing job...
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme exited
>>>>>> on signal 0 (Unknown signal 0).
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun: clean termination accomplished
>>>>>>
>>>>>> nompi:
>>>>>> 1
>>>>>> 2
>>>>>> 3
>>>>>> 4
>>>>>> 5
>>>>>> 6
>>>>>> 7
>>>>>> 8
>>>>>> 9
>>>>>> 10
>>>>>> Done
>>>>>>
>>>>>>
>>>>>> This convinces me that there is something strange with OpenMPI, since I
>>>>>> expect no difference in signal handling when running a simple command
>>>>>> with or without mpirun in the middle.
>>>>>>
>>>>>> I've tried looking for options to change this behaviour, but I don't
>>>>>> seem to find any. Is there one, preferably in the form of an environment
>>>>>> variable? Or is this a bug?
>>>>>>
>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>
>>>>>> Thanks,
>>>>>> Pablo
>>>>>> <ompi_bug.sh.gz>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> [email protected]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users