Just looking at it today...
> On Jul 18, 2017, at 7:25 AM, Eugene Dedits <eugene.ded...@gmail.com> wrote:
>
> Hi Ralph,
>
>
> did you have a chance to take a look at this problem?
>
> Thanks!
> Eugene.
>
>
>
>
> On Tue, Jul 11, 2017 at 12:51 PM, Eugene Dedits <eugene.ded...@gmail.com> wrote:
> Thanks! I really appreciate your help.
> In the meantime I’ve tried experimenting with 1.8.3. Here is what I’ve noticed.
>
> 1. Running the job with “sbatch ./my_script” where my script calls
> mpirun -np 160 -mca orte_forward_job_control 1 ./xhpl
>
> and then suspending the job with “scontrol suspend JOBID”
> does not work. Of the 10 nodes assigned to my job, 4 are still running
> 16 MPI processes of xhpl.
>
> 2. Running exactly the same job and then sending TSTP to the mpirun process
> does work: all 10 nodes show that the xhpl processes are stopped. Resuming
> them with -CONT also works.
>
> Again, this is with OpenMPI 1.8.3
>
> Once again, thank you for all the help.
>
> Cheers,
> Eugene.
>
>
>
>
>> On Jul 11, 2017, at 12:08 PM, r...@open-mpi.org wrote:
>>
>> Very odd - let me explore when I get back. Sorry for the delay.
>>
>> Sent from my iPad
>>
>> On Jul 11, 2017, at 10:59 AM, Eugene Dedits <eugene.ded...@gmail.com> wrote:
>>
>>> Ralph,
>>>
>>>
>>> Are you suggesting doing something similar to this:
>>> https://www.open-mpi.org/faq/?category=sge#sge-suspend-resume
>>>
>>> If yes, here is what I’ve done:
>>> - start a job using slurm and "mpirun -mca orte_forward_job_control 1 -np
>>> 160 xhpl”
>>> - ssh to the node where mpirun is launched
>>> - “kill -STOP PID” where PID is mpirun pid
>>> - “kill -TSTP PID”
>>>
>>> In both cases (STOP and TSTP) I observed that there were 16 MPI processes
>>> running at 100% on all 10 nodes where the job was started.
>>>
>>> Thanks,
>>> Eugene.
>>>
>>>
>>>
>>>
>>>
>>>
>>>> On Jul 11, 2017, at 10:35 AM, r...@open-mpi.org wrote:
>>>>
>>>>
>>>> Odd - I'm on travel this week but can look at it next week. One
>>>> possibility - have you tried hitting it with SIGTSTP instead of SIGSTOP?
>>>> There is a difference in the ability to trap and forward them.
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On Jul 11, 2017, at 9:29 AM, Eugene Dedits <eugene.ded...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> I’ve just tried 3.0.0rc1 and the problem still persists there…
>>>>>
>>>>> Thanks,
>>>>> E.
>>>>>
>>>>>
>>>>>
>>>>>> On Jul 11, 2017, at 10:20 AM, r...@open-mpi.org wrote:
>>>>>>
>>>>>>
>>>>>> Just checked the planning board and saw that my PR to bring that change
>>>>>> to 2.1.2 is pending and not yet in the release branch. I’ll try to make
>>>>>> that happen soon
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On Jul 11, 2017, at 8:03 AM, r...@open-mpi.org wrote:
>>>>>>>
>>>>>>>
>>>>>>> There is an MCA param, ess_base_forward_signals, that controls which
>>>>>>> signals to forward. However, I just looked at the source code and see
>>>>>>> that it wasn't backported. Sigh.
>>>>>>>
>>>>>>> You could try the 3.0.0 branch, as it is at release-candidate stage and
>>>>>>> should go out within a week. I'd suggest just cloning that branch of the
>>>>>>> OMPI repo to get the latest state. The fix is definitely there.
>>>>>>>
>>>>>>> Sent from my iPad
>>>>>>>
>>>>>>>> On Jul 11, 2017, at 7:45 AM, Eugene Dedits <eugene.ded...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Ralph,
>>>>>>>>
>>>>>>>>
>>>>>>>> thanks for the reply. I’ve just tried upgrading to OMPI 2.1.1. The same
>>>>>>>> problem… :-\
>>>>>>>> Could you point me to some discussion of this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Eugene.
>>>>>>>>
>>>>>>>>> On Jul 11, 2017, at 6:17 AM, r...@open-mpi.org wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is an issue with how the signal is forwarded. This has been
>>>>>>>>> fixed in the latest OMPI release so you might want to upgrade
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>> Sent from my iPad
>>>>>>>>>
>>>>>>>>>> On Jul 11, 2017, at 2:53 AM, Dennis Tants <dennis.ta...@zarm.uni-bremen.de> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hello Eugene,
>>>>>>>>>>
>>>>>>>>>> it is just a wild guess, but could you try "srun --mpi=pmi2" (you said
>>>>>>>>>> you built OMPI with PMI support) instead of "mpirun"?
>>>>>>>>>> srun is built-in and, I think, the preferred way of running parallel
>>>>>>>>>> processes. Maybe scontrol is able to suspend it this way.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Dennis
>>>>>>>>>>
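A sketch of the batch script from the original report rewritten to use srun, as Dennis suggests. The partition name, node counts, and binary are taken from the thread; the `--mpi=pmi2` flag assumes Slurm's PMI2 plugin is available on the cluster:

```shell
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -N 10
#SBATCH --ntasks-per-node=16

# Launch through Slurm's own launcher so slurmd manages (and can signal)
# every task directly, instead of going through mpirun's daemons.
srun --mpi=pmi2 -n 160 ./xhpl | tee LOG
```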
>>>>>>>>>>> On 10.07.2017 at 22:20, Eugene Dedits wrote:
>>>>>>>>>>> Hello SLURM-DEV
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have a problem with slurm, openmpi, and “scontrol suspend”.
>>>>>>>>>>>
>>>>>>>>>>> My setup is:
>>>>>>>>>>> 96-node cluster with IB, running rhel 6.8
>>>>>>>>>>> slurm 17.02.1
>>>>>>>>>>> openmpi 2.0.0 (built using Intel 2016 compiler)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I am running some application (hpl in this particular case) using
>>>>>>>>>>> batch script similar to:
>>>>>>>>>>> -----------------------------
>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>> #SBATCH --partition=standard
>>>>>>>>>>> #SBATCH -N 10
>>>>>>>>>>> #SBATCH --ntasks-per-node=16
>>>>>>>>>>>
>>>>>>>>>>> mpirun -np 160 xhpl | tee LOG
>>>>>>>>>>> -----------------------------
>>>>>>>>>>>
>>>>>>>>>>> So I am running it on 160 cores across 10 nodes.
>>>>>>>>>>>
>>>>>>>>>>> Once job is submitted to the queue and is running I suspend it using
>>>>>>>>>>> ~# scontrol suspend JOBID
>>>>>>>>>>>
>>>>>>>>>>> I see that indeed my job stopped producing output. I go to each of
>>>>>>>>>>> the 10
>>>>>>>>>>> nodes that were assigned for my job and see if the xhpl processes
>>>>>>>>>>> are running
>>>>>>>>>>> there with :
>>>>>>>>>>>
>>>>>>>>>>> ~# for i in {10..19}; do ssh node$i “top -b -n 1 | head -n 50 |
>>>>>>>>>>> grep xhpl | wc -l”; done
>>>>>>>>>>>
>>>>>>>>>>> I expect this little script to return 0 from every node (because
>>>>>>>>>>> suspend sent SIGSTOP and they shouldn’t show up in top). However,
>>>>>>>>>>> I see that processes are reliably suspended only on node10. I get:
>>>>>>>>>>> 0
>>>>>>>>>>> 16
>>>>>>>>>>> 16
>>>>>>>>>>> …
>>>>>>>>>>> 16
>>>>>>>>>>>
>>>>>>>>>>> So 9 out of 10 nodes still have 16 MPI processes of my xhpl
>>>>>>>>>>> application running at 100%.
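A more direct way to count still-running xhpl processes is to look at the ps state field, where a leading 'T' marks a stopped process; the node names and the xhpl process name are taken from the report above:

```shell
# For each node, count xhpl processes whose state is NOT 'T' (stopped).
# A fully suspended node should print 0.
for i in $(seq 10 19); do
  printf 'node%s: ' "$i"
  ssh "node$i" 'ps -C xhpl -o stat= | grep -cv "^T"'
done
```

Unlike scraping the first 50 lines of top output, this counts every xhpl process regardless of where it sorts by CPU usage.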
>>>>>>>>>>>
>>>>>>>>>>> If I run “scontrol resume JOBID” and then suspend it again I see
>>>>>>>>>>> that (sometimes) more
>>>>>>>>>>> nodes have “xhpl” processes properly suspended. Every time I resume
>>>>>>>>>>> and suspend the
>>>>>>>>>>> job, I see different nodes returning 0 in my “ssh-run-top” script.
>>>>>>>>>>>
>>>>>>>>>>> So altogether it looks like the suspend mechanism doesn’t
>>>>>>>>>>> work properly in SLURM with
>>>>>>>>>>> OpenMPI. I’ve tried compiling OpenMPI with “--with-slurm
>>>>>>>>>>> --with-pmi=/path/to/my/slurm”.
>>>>>>>>>>> I’ve observed the same behavior.
>>>>>>>>>>>
>>>>>>>>>>> I would appreciate any help.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Eugene.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Dennis Tants
>>>>>>>>>> Trainee: IT specialist for system integration
>>>>>>>>>>
>>>>>>>>>> ZARM - Zentrum für angewandte Raumfahrttechnologie und
>>>>>>>>>> Mikrogravitation
>>>>>>>>>> ZARM - Center of Applied Space Technology and Microgravity
>>>>>>>>>>
>>>>>>>>>> Universität Bremen
>>>>>>>>>> Am Fallturm
>>>>>>>>>> 28359 Bremen, Germany
>>>>>>>>>>
>>>>>>>>>> Telefon: 0421 218 57940
>>>>>>>>>> E-Mail: ta...@zarm.uni-bremen.de
>>>>>>>>>>
>>>>>>>>>> www.zarm.uni-bremen.de
>>>
>
>