Update: when xstata is started, it immediately returns with an exit code of
0 (but leaves the GUI open). I suspect that srun thinks the job terminated
successfully and kills stata. Any clue about that?
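That behaviour (immediate exit 0 while the GUI stays up) is consistent with xstata daemonizing: the process you launch forks the GUI into a child and the parent exits at once, so srun sees the task finish and tears down the step, killing the child. A minimal sketch of that pattern (hypothetical, not xstata's actual code) to illustrate the hypothesis:

```python
import os
import time

def daemonize_like_launcher():
    """Fork a child that keeps running, then return in the parent --
    the launcher-style pattern where the parent exits with status 0,
    which is exactly what srun would observe as the task's exit."""
    pid = os.fork()
    if pid > 0:
        # Parent: returns immediately; the launching process is done.
        return pid
    # Child: detach from the session and keep doing "GUI" work.
    os.setsid()
    time.sleep(0.2)
    os._exit(0)

child = daemonize_like_launcher()
print("parent sees child pid", child, "and exits with status 0")
```

If that is what xstata does, srun's cleanup (the SIGKILL to the pgid visible in the log below) would be the expected consequence, not a bug in Slurm.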


2013/10/16 Yann Sagon <[email protected]>

>  On my cluster with Slurm 2.6.2 I'm having a problem running xstata (the
> graphical version of Stata).
>
> If I launch xstata directly on the master or on any node as a normal
> user, everything is fine.
>
> If I launch xstata with srun (just srun xstata), nothing happens (no
> output, nothing special in the slurm log) and the command terminates
> almost immediately.
>
> I'm able to launch other graphical applications.
>
> I have also tried launching xstata with --slurmd-debug:
>
> srun --slurmd-debug=4 xstata
> slurmd[node01]: debug level = 6
> slurmd[node01]: Uncached user/gid: sagon/1000
> slurmd[node01]: IO handler started pid=105416
> slurmd[node01]: task 0 (105421) started 2013-10-16T15:44:54
> slurmd[node01]: Setting slurmstepd oom_adj to -1000
> slurmd[node01]: adding task 0 pid 105421 on node 0 to jobacct
> slurmd[node01]: 105421 mem size 1008 200024 time 0(0+0)
> slurmd[node01]: _get_sys_interface_freq_line: filename =
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
> slurmd[node01]:  cpu 0 freq= 2201000
> slurmd[node01]: Task average frequency = 2201000 pid 105421 mem size 1008
> 200024 time 0(0+0)
> slurmd[node01]: energycounted = 0
> slurmd[node01]: getjoules_task energy = 0
> slurmd[node01]: Sending launch resp rc=0
> slurmd[node01]: auth plugin for Munge (http://code.google.com/p/munge/)
> loaded
> slurmd[node01]: Handling REQUEST_INFO
> slurmd[node01]: Handling REQUEST_SIGNAL_CONTAINER
> slurmd[node01]: _handle_signal_container for step=48997.0 uid=0 signal=995
> slurmd[node01]: Uncached user/gid: sagon/1000
> slurmd[node01]: mpi type = (null)
> slurmd[node01]: Using mpi/openmpi
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_CPU no change in value:
> 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_FSIZE no change in
> value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_DATA no change in value:
> 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_STACK no change in
> value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_CORE no change in value:
> 0
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_RSS no change in value:
> 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_NPROC no change in
> value: 18446744073709551615
> slurmd[node01]: _set_limit: RLIMIT_NOFILE : max:8192 cur:8192 req:1024
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_NOFILE succeeded
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_MEMLOCK no change in
> value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_AS no change in value:
> 18446744073709551615
> slurmd[node01]: removing task 0 pid 105421 from jobacct
> slurmd[node01]: task 0 (105421) exited with exit code 0.
> slurmd[node01]: Aggregated 1 task exit messages
> slurmd[node01]: killing process 105424 (inherited_task) with signal 9
> slurmd[node01]: killing process 105424 (inherited_task) with signal 9
> slurmd[node01]: Sending SIGKILL to pgid 105416
> slurmd[node01]: Waiting for IO
> slurmd[node01]: Closing debug channel
>
> Thanks for your ideas!
>
