Can you make stata not detach from its parent process? As it is
returning, srun is killing all remaining unowned processes.
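
One workaround is a wrapper script that keeps a foreground process alive for as long as the GUI is running, so the job step does not end when the launcher exits. Below is a minimal sketch; the wrapper name and the pgrep-based process matching are assumptions on my side, not anything provided by stata or slurm:

```shell
#!/bin/sh
# Hypothetical wrapper (run as: srun wrap.sh <procname> <command> [args...]).
# Keeps the job step alive while a detached child is still running, so
# srun does not treat the step as finished when the launcher exits.
procname="$1"; shift

"$@" &        # start the launcher in the background
wait "$!"     # reap it; if it forks and exits, this returns quickly

# Poll until no process with the given name remains (e.g. the real
# stata binary left behind after the launcher detached).
while pgrep -x "$procname" >/dev/null 2>&1; do
    sleep 1
done
```

With something like "srun wrap.sh stata xstata" the step would stay alive until the last matching process exits; the exact process name to match should be checked with ps on a node while xstata is running.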


On Wed, Oct 16, 2013 at 4:32 PM, Yann Sagon <[email protected]> wrote:

>  Update: when xstata is started, it immediately returns with an exit code
> of 0 (but leaves the GUI open). I suspect that srun thinks the job
> terminated successfully and kills stata. Any clue on that?
>
>
> 2013/10/16 Yann Sagon <[email protected]>
>
>>  On my cluster with slurm 2.6.2 I'm having a problem running xstata (the
>> graphical version of stata).
>>
>> If I launch xstata directly on the master or on any node as a normal
>> user, everything is fine.
>>
>> If I launch xstata with srun (just "srun xstata"), nothing happens (no
>> output, nothing special in the slurm log) and the command terminates
>> almost immediately.
>>
>> I'm able to launch other graphical applications.
>>
>> I have also tried launching xstata with --slurmd-debug:
>>
>> srun --slurmd-debug=4 xstata
>> slurmd[node01]: debug level = 6
>> slurmd[node01]: Uncached user/gid: sagon/1000
>> slurmd[node01]: IO handler started pid=105416
>> slurmd[node01]: task 0 (105421) started 2013-10-16T15:44:54
>> slurmd[node01]: Setting slurmstepd oom_adj to -1000
>> slurmd[node01]: adding task 0 pid 105421 on node 0 to jobacct
>> slurmd[node01]: 105421 mem size 1008 200024 time 0(0+0)
>> slurmd[node01]: _get_sys_interface_freq_line: filename =
>> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
>> slurmd[node01]:  cpu 0 freq= 2201000
>> slurmd[node01]: Task average frequency = 2201000 pid 105421 mem size 1008
>> 200024 time 0(0+0)
>> slurmd[node01]: energycounted = 0
>> slurmd[node01]: getjoules_task energy = 0
>> slurmd[node01]: Sending launch resp rc=0
>> slurmd[node01]: auth plugin for Munge (http://code.google.com/p/munge/)
>> loaded
>> slurmd[node01]: Handling REQUEST_INFO
>> slurmd[node01]: Handling REQUEST_SIGNAL_CONTAINER
>> slurmd[node01]: _handle_signal_container for step=48997.0 uid=0 signal=995
>> slurmd[node01]: Uncached user/gid: sagon/1000
>> slurmd[node01]: mpi type = (null)
>> slurmd[node01]: Using mpi/openmpi
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_CPU no change in value:
>> 18446744073709551615
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_FSIZE no change in
>> value: 18446744073709551615
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_DATA no change in
>> value: 18446744073709551615
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_STACK no change in
>> value: 18446744073709551615
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_CORE no change in
>> value: 0
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_RSS no change in value:
>> 18446744073709551615
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_NPROC no change in
>> value: 18446744073709551615
>> slurmd[node01]: _set_limit: RLIMIT_NOFILE : max:8192 cur:8192 req:1024
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_NOFILE succeeded
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_MEMLOCK no change in
>> value: 18446744073709551615
>> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_AS no change in value:
>> 18446744073709551615
>> slurmd[node01]: removing task 0 pid 105421 from jobacct
>> slurmd[node01]: task 0 (105421) exited with exit code 0.
>> slurmd[node01]: Aggregated 1 task exit messages
>> slurmd[node01]: killing process 105424 (inherited_task) with signal 9
>> slurmd[node01]: killing process 105424 (inherited_task) with signal 9
>> slurmd[node01]: Sending SIGKILL to pgid 105416
>> slurmd[node01]: Waiting for IO
>> slurmd[node01]: Closing debug channel
>>
>> Thanks for your ideas!
>>
>
>


-- 
Carles Fenoy
