Thomas,

The GUI app writes the script to the file slurm_script.sh in the cwd. I did exactly what you suggested as my first step in debugging: I checked the Command= value in the output of 'scontrol show job' to see which script was actually submitted, and it was the slurm_script.sh in the cwd.
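
For reference, this is roughly the check (a quick sketch; the job ID is hypothetical here, it's whatever squeue reports for the GUI-submitted job), done with the same python2.7 the thread is about:

import subprocess
import sys

# Job ID of the GUI-submitted job (hypothetical -- take it from squeue).
jobid = sys.argv[1]

# 'scontrol show job' prints space-separated key=value fields;
# the Command= field names the script that was actually submitted.
output = subprocess.check_output('scontrol show job ' + jobid, shell=True)
for field in output.split():
    if field.startswith('Command='):
        print(field)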

The user did provide me with some very useful information this afternoon: the GUI app uses Python to launch the job. Here's what the user wrote to me (OMFIT is the name of the GUI application):


New clue for the mpirun issue:  The following information might be helpful.

  * I modified the script to use subprocess to submit the job
    directly. The job was submitted, but somehow it returned a
    non-zero-return-code error and the mpiexec line was skipped.

OMFITx.executable(root,
                  inputs=inputs,
                  outputs=outputs,
                  executable='echo %s',  # submit_command,
                  script=(bashscript, 'slurm.script'),
                  clean=True,
                  std_out=std_out,
                  remotedir=unique_remotedir,
                  ignoreReturnCode=True)

     p = subprocess.Popen('sbatch ' + unique_remotedir + 'slurm.script',
                          shell=True,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
     std_out.append(p.stdout.read())
     print(std_out[-1], p.stderr.read())

  * As I mentioned above, my standalone Python script can normally
    submit jobs the same way using subprocess.Popen or
    subprocess.call. I created the following script in the working
    directory and executed it with the same Python version as OMFIT.
    It works, without the skip.

import sys
import os.path
import subprocess

print(sys.version, sys.path, subprocess.__file__)
p = subprocess.Popen('sbatch slurm.script',
                     shell=True,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
print(p.stdout.read(), p.stderr.read())

The question is why the same subprocess.Popen command works differently in OMFIT and in the terminal, even though both are called with the same Python version (2.7).
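
One thing I'm going to ask the user to try is capturing the actual return code and stderr from sbatch, since the snippet above only reads stdout/stderr and never checks the exit status. Roughly something like this (just a sketch, not tested, using the same 'sbatch slurm.script' command as above):

import subprocess

# Same submission command as the user's test script above.
p = subprocess.Popen('sbatch slurm.script',
                     shell=True,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)

# communicate() reads both streams and waits for sbatch to exit,
# so we see the real return code rather than just the output.
out, err = p.communicate()
print('return code: %s' % p.returncode)
print('stdout: %s' % out)
print('stderr: %s' % err)

That should at least tell us whether the non-zero return code the user saw is coming from sbatch itself or from something in OMFIT's wrapper.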

So now it's unclear whether this is a bug in Python or in Slurm 18.08.6-2. Since the user can write a standalone Python script that does work, I think this is something specific to the application's environment rather than an issue with the Python-Slurm interaction. The main piece of evidence that this might be a bug in Slurm is that the problem started right after the upgrade from 18.08.5-2 to 18.08.6-2, but correlation doesn't necessarily mean causation.
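
To follow up on your printenv suggestion below, I'm also going to have the user dump the environment from inside OMFIT and from a plain terminal and diff the two. A minimal sketch of what I have in mind (the file name is just a placeholder; change it for the second run):

import os

# Dump the current environment, sorted, so the two runs can be diffed.
# Run once from inside OMFIT and once from a plain terminal, writing
# to different files (e.g. env_omfit.txt vs. env_terminal.txt).
with open('env_omfit.txt', 'w') as f:
    for name in sorted(os.environ):
        f.write('%s=%s\n' % (name, os.environ[name]))

If the diff turns up anything suspicious, I'll also try --export=NONE on the sbatch command, as you suggested, to rule out the exported environment.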

Prentice


On 3/22/19 12:48 PM, Thomas M. Payerle wrote:
Assuming the GUI-produced script is as you indicated (I am not sure where you got the script you showed, but if it is not the actual script used by a job, it might be worthwhile to examine the Command= file from scontrol show job to verify), then the only thing that should be different between a GUI submission and a manual submission is the submission environment. Does the manual submission work if you add --export=NONE to the sbatch command to prevent the exporting of environment variables? And maybe add a printenv to the script to see what the environment is in both cases. Though I confess I am unable to think of any reasonable environment setting that might cause the observed symptoms.

On Fri, Mar 22, 2019 at 11:23 AM Prentice Bisbal <pbis...@pppl.gov> wrote:

    On 3/21/19 6:56 PM, Reuti wrote:
    > Am 21.03.2019 um 23:43 schrieb Prentice Bisbal:
    >
    >> Slurm-users,
    >>
    >> My users here have developed a GUI application which serves as
    a GUI interface to various physics codes they use. From this GUI,
    they can submit jobs to Slurm. On Tuesday, we upgraded Slurm from
    18.08.5-2 to 18.08.6-2,and a user has reported a problem when
    submitting Slurm jobs through this GUI app that do not occur when
    the same sbatch script is submitted from sbatch on the command-line.
    >>
    >> […]
    >> When I replaced the mpirun command with an equivalent srun
    command, everything works as desired, so the user can get back to
    work and be productive.
    >>
    >> While srun is a suitable workaround, and is arguably the
    correct way to run an MPI job, I'd like to understand what is
    going on here. Any idea what is going wrong, or additional steps I
    can take to get more debug information?
    > Was an alias to `mpirun` introduced? It may cover the real
    application and even the `which mpirun` will return the correct
    value, but never be executed.
    >
    > $ type mpirun
    > $ alias mpirun
    >
    > may tell in the jobscript.
    >
    Unfortunately, the script is in tcsh, so the 'type' command
    doesn't work, since it's a bash built-in function. I did use the
    'alias' command to see all the defined aliases, and mpirun and
    mpiexec are not aliased. Any other ideas?

    Prentice

--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads paye...@umd.edu <mailto:paye...@umd.edu>
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831
