On 23.04.2011 at 19:33, Ralph Castain wrote:
> On Apr 23, 2011, at 10:40 AM, Pablo Lopez Rios wrote:
>
>>> I'm not sure what you are actually trying to accomplish
>>
>> I simply want a script that runs the equivalent of:
>>
>> mpirun command>& out&
>> tail -f out
>>
>> such that hitting Ctrl+C stops tail but leaves mpirun running. I can
>> certainly do this without mpirun,
>
> I don't think that's true. If both commands are in a script, then at least
> for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both-
> processes.
What about using setsid to push it into a new session instead of using & in
the script?
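
A rough sketch of the setsid idea, assuming setsid(1) from util-linux is
installed. The helper name start_detached is made up for illustration, and
"$@" stands in for the mpirun command line:

```shell
#!/bin/sh
# Hypothetical helper (the name is made up for this sketch): run a command in
# a new session, so it is no longer in the terminal's foreground process group
# and a Ctrl+C typed at that terminal is not delivered to it.
# Assumes setsid(1) from util-linux is installed.
# $1 is the output file, the remaining arguments are the command line to run.
start_detached() {
    out=$1
    shift
    setsid "$@" > "$out" 2>&1 < /dev/null &
}

# Intended use in the wrapper script (not executed here):
#   start_detached out mpirun command
#   tail -f out
```

Since the job then has no controlling terminal, it would have to be stopped
explicitly with kill if that is ever needed.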
-- Reuti
>
> At least when I test it, even non-mpirun processes will abort.
>
>> it's not unreasonable to expect to be able to do the same with mpirun.
>
> I'm afraid it won't work, per my earlier comments.
>
>> I need mpirun to either ignore the SIGINT or not receive it at all -- and as
>> per your comments, ignoring it is not an option.
>>
>> Let me rephrase my question then. With the following script:
>>
>> mpirun command>& out&
>> tail -f out
>>
>> SIGINT stops tail AND mpirun. That's OK. The following:
>>
>> (
>> trap : SIGINT
>> mpirun command>& out&
>> )
>> tail -f out
>>
>> has the same effect, indicating that mpirun overrides previous traps in the
>> same subshell. That's OK too. However the following:
>>
>> (
>> trap : SIGINT
>> (
>> mpirun command>& out&
>> )
>> )
>> tail -f out
>>
>> also has the same effect. How is mpirun overriding the trap in the *parent*
>> subshell so that it ends up getting the SIGINT that was supposedly blocked
>> at that level? Am I missing something trivial? How can I avoid this?
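
For what it's worth, "trap : SIGINT" only installs a handler in the shell
itself, and a handler is reset to the default action when the shell starts
another program. Only an ignored signal (trap '' SIGINT) is inherited by
children. The Linux-only sketch below shows the difference by letting a child
process read its own /proc/self/status; child_sigint is a made-up name. Even
an inherited "ignore" only helps as long as the program does not install its
own SIGINT handler, which, going by Ralph's comments, mpirun does.

```shell
#!/bin/sh
# Linux-only sketch; child_sigint is a made-up helper name.
# awk runs as a child process and inspects its own signal state:
# SigIgn is a hex bitmask in which SIGINT (signal 2) is bit 0x2, so SIGINT is
# ignored exactly when the last hex digit is one of 2 3 6 7 a b e f.
child_sigint() {
    awk '/^SigIgn:/ { print ($2 ~ /[2367abef]$/ ? "ignored" : "not ignored") }' /proc/self/status
}

baseline=$(child_sigint)                        # no trap set at all
with_handler=$( (trap : INT; child_sigint) )    # handler: not passed on to the child
with_ignore=$( (trap '' INT; child_sigint) )    # ignore: inherited by the child

echo "no trap:      $baseline"
echo "trap : INT:   $with_handler"
echo "trap '' INT:  $with_ignore"
```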
>
> I keep telling you - you can't. The better way to do this is to execute
> mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail
> without mpirun seeing it.
>
> But you are welcome to not believe me and continue thrashing... :-/
>
>>
>> Thanks,
>> Pablo
>>
>>
>> On 23/04/11 16:27, Ralph Castain wrote:
>>> On Apr 23, 2011, at 9:11 AM, Pablo Lopez Rios wrote:
>>>
>>>>>> Pressing Ctrl+C should stop tail -f, and the MPI job
>>>>>> should continue.
>>>>> I don't think that is true at all. When you hit ctrl-C,
>>>>> every process executing in the script receives it. Mpirun
>>>>> traps the ctrl-c and immediately terminates all running
>>>>> MPI procs.
>>>> By "Ctrl+C should stop tail -f" I mean that this is the
>>>> desired behaviour of the script, not that this is what ought
>>>> to happen in general. My question is how to achieve this
>>>> behaviour, since I'm having trouble working around mpirun
>>>> catching sigint.
>>> Like I said in my other response, you can't - mpirun automatically traps
>>> sigint and terminates the job in order to ensure proper cleanup during
>>> abnormal terminations.
>>>
>>> I'm not sure what you are actually trying to accomplish, but there are
>>> other signals that don't cause termination. For example, we trap and
>>> forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
>>>
>>> But ctrl-c has a special meaning ("die"), and you can't tell mpirun to
>>> ignore it.
>>>
>>>
>>>> Thanks,
>>>> Pablo
>>>>
>>>>
>>>>
>>>> On 23/04/11 15:12, Ralph Castain wrote:
>>>>> On Apr 23, 2011, at 6:20 AM, Reuti wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 23.04.2011 at 04:31, Pablo Lopez Rios wrote:
>>>>>>
>>>>>>> I'm having a bit of a problem with wrapping mpirun in a script. The
>>>>>>> script needs to run an MPI job in the background and tail -f the
>>>>>>> output. Pressing Ctrl+C should stop tail -f, and the MPI job should
>>>>>>> continue.
>>>>> I don't think that is true at all. When you hit ctrl-C, every process
>>>>> executing in the script receives it. Mpirun traps the ctrl-c and
>>>>> immediately terminates all running MPI procs.
>>>>>
>>>>>
>>>>>>> However mpirun seems to detect the SIGINT that was meant for tail, and
>>>>>>> kills the job immediately. I've tried workarounds involving nohup,
>>>>>>> disown, trap, subshells (including calling the script from within
>>>>>>> itself), etc, to no avail.
>>>>>>>
>>>>>>> The problem is that this doesn't happen if I run the command directly
>>>>>>> instead, without mpirun. Attached is a script that reproduces the
>>>>>>> problem. It runs a simple counting script in the background which takes
>>>>>>> 10 seconds to run, and tails the output. If called with "nompi" as
>>>>>>> first argument, it will simply run bash -c "$SCRIPT">& "$out"&, and
>>>>>>> with "mpi" it will do the same with 'mpirun -np 1' prepended. The
>>>>>>> output I get is:
>>>>>> what about:
>>>>>>
>>>>>> ( trap "" sigint; exec mpiexec ...)&
>>>>>>
>>>>>> i.e. let the subshell, with its changed interrupt handling, replace
>>>>>> itself with mpiexec via exec. Then again, mpiexec may be resetting the
>>>>>> handler on its own. This can be checked in /proc/<pid>/status
>>>>>>
>>>>>> -- Reuti
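
To make the /proc check concrete, here is a Linux-only sketch. The helper name
sigint_disposition is made up. SigIgn and SigCgt in /proc/<pid>/status are hex
bitmasks in which SIGINT (signal 2) is bit 0x2. If mpiexec is reported as
"caught" after being started under trap "" sigint, it has installed its own
handler again.

```shell
#!/bin/sh
# Linux-only sketch; sigint_disposition is a made-up helper name.
# Prints how the process with the given PID currently treats SIGINT.
# SigIgn/SigCgt in /proc/<pid>/status are hex bitmasks; SIGINT (signal 2) is
# bit 0x2, i.e. set when the last hex digit is one of 2 3 6 7 a b e f.
sigint_disposition() {
    awk '
        /^SigIgn:/ && $2 ~ /[2367abef]$/ { state = "ignored" }
        /^SigCgt:/ && $2 ~ /[2367abef]$/ { state = "caught" }
        END { print (state ? state : "default") }
    ' "/proc/$1/status"
}

# Intended use (not executed here):
#   ( trap "" sigint; exec mpiexec ... ) &
#   sigint_disposition $!
```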
>>>>>>
>>>>>>> $ ./ompi_bug.sh mpi
>>>>>>> mpi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> ^C
>>>>>>> $ ./ompi_bug.sh nompi
>>>>>>> nompi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> ^C
>>>>>>> $ cat output.*
>>>>>>> mpi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> mpirun: killing job...
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that process rank 0 with PID 1222 on node pablomme
>>>>>>> exited on signal 0 (Unknown signal 0).
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun: clean termination accomplished
>>>>>>>
>>>>>>> nompi:
>>>>>>> 1
>>>>>>> 2
>>>>>>> 3
>>>>>>> 4
>>>>>>> 5
>>>>>>> 6
>>>>>>> 7
>>>>>>> 8
>>>>>>> 9
>>>>>>> 10
>>>>>>> Done
>>>>>>>
>>>>>>>
>>>>>>> This convinces me that there is something strange with OpenMPI, since I
>>>>>>> expect no difference in signal handling when running a simple command
>>>>>>> with or without mpirun in the middle.
>>>>>>>
>>>>>>> I've tried looking for options to change this behaviour, but I don't
>>>>>>> seem to find any. Is there one, preferably in the form of an
>>>>>>> environment variable? Or is this a bug?
>>>>>>>
>>>>>>> I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
>>>>>>> v1.2.8 as distributed with OpenSUSE 11.3.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pablo
>>>>>>> <ompi_bug.sh.gz>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> [email protected]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users