Update: when xstata is started, it immediately returns with an exit code of 0 (but leaves the GUI open). I suspect that srun thinks the job terminated successfully and kills stata. Any clue on that?
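This behaviour can be simulated outside Slurm (a sketch; `sleep` stands in for the GUI process, since I don't know what the xstata launcher actually does internally): a launcher that backgrounds its real work and exits 0 right away. srun would see that 0 and tear the step down, which matches the "task 0 (105421) exited with exit code 0." and "Sending SIGKILL to pgid 105416" lines in the log below.

```shell
# Simulate a launcher that forks its work into the background and
# exits 0 immediately. "sleep 5" stands in for the GUI process.
sh -c 'sleep 5 & exit 0'
echo "launcher exit code: $?"   # prints: launcher exit code: 0
```

The launcher reports success while its child is still running; under srun the child would then be killed along with the step.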
2013/10/16 Yann Sagon <[email protected]>

> In my cluster with slurm 2.6.2 I'm having a problem running xstata (it's
> the graphical version of stata).
>
> If I launch xstata directly on the master or on any node as a normal user,
> everything is fine.
>
> If I launch xstata with srun (just srun xstata) nothing happens (no
> output, nothing special in the slurm log) and the command terminates almost
> immediately.
>
> I'm able to launch other graphical applications.
>
> I have tried as well to launch xstata with --slurmd-debug:
>
> srun --slurmd-debug=4 xstata
> slurmd[node01]: debug level = 6
> slurmd[node01]: Uncached user/gid: sagon/1000
> slurmd[node01]: IO handler started pid=105416
> slurmd[node01]: task 0 (105421) started 2013-10-16T15:44:54
> slurmd[node01]: Setting slurmstepd oom_adj to -1000
> slurmd[node01]: adding task 0 pid 105421 on node 0 to jobacct
> slurmd[node01]: 105421 mem size 1008 200024 time 0(0+0)
> slurmd[node01]: _get_sys_interface_freq_line: filename = /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
> slurmd[node01]: cpu 0 freq= 2201000
> slurmd[node01]: Task average frequency = 2201000 pid 105421 mem size 1008 200024 time 0(0+0)
> slurmd[node01]: energycounted = 0
> slurmd[node01]: getjoules_task energy = 0
> slurmd[node01]: Sending launch resp rc=0
> slurmd[node01]: auth plugin for Munge (http://code.google.com/p/munge/) loaded
> slurmd[node01]: Handling REQUEST_INFO
> slurmd[node01]: Handling REQUEST_SIGNAL_CONTAINER
> slurmd[node01]: _handle_signal_container for step=48997.0 uid=0 signal=995
> slurmd[node01]: Uncached user/gid: sagon/1000
> slurmd[node01]: mpi type = (null)
> slurmd[node01]: Using mpi/openmpi
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_CPU no change in value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_FSIZE no change in value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_DATA no change in value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_STACK no change in value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_CORE no change in value: 0
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_RSS no change in value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_NPROC no change in value: 18446744073709551615
> slurmd[node01]: _set_limit: RLIMIT_NOFILE : max:8192 cur:8192 req:1024
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_NOFILE succeeded
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_MEMLOCK no change in value: 18446744073709551615
> slurmd[node01]: _set_limit: conf setrlimit RLIMIT_AS no change in value: 18446744073709551615
> slurmd[node01]: removing task 0 pid 105421 from jobacct
> slurmd[node01]: task 0 (105421) exited with exit code 0.
> slurmd[node01]: Aggregated 1 task exit messages
> slurmd[node01]: killing process 105424 (inherited_task) with signal 9
> slurmd[node01]: killing process 105424 (inherited_task) with signal 9
> slurmd[node01]: Sending SIGKILL to pgid 105416
> slurmd[node01]: Waiting for IO
> slurmd[node01]: Closing debug channel
>
> Thanks for your ideas!
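If the launcher really does exit 0 while the GUI keeps running, one common workaround is to wrap it in a script that blocks until the GUI process is gone, so srun keeps the step alive. This is only a sketch: it assumes the GUI shows up as a process named after the launched command for the current user, which should be verified with pgrep before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper (wrap.sh), launched as: srun ./wrap.sh xstata
# Assumes the GUI keeps running as a process with the launched
# command's name, owned by the current user.
cmd=$1; shift
"$cmd" "$@"                  # launcher forks the GUI and returns at once
sleep 2                      # give the forked GUI time to appear
while pgrep -u "$USER" -x "$(basename "$cmd")" >/dev/null 2>&1; do
    sleep 5                  # block here so srun keeps the step running
done
```

srun then sees the wrapper, not the launcher, as the task, and the step only ends when no matching GUI process remains.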
