On 17.05.22 17:17, Timo Rothenpieler wrote:
On 17.05.2022 15:58, Brian Andrus wrote:
You are starting to understand a major issue with most containers.
I suggest you check out Singularity, which was built from the ground
up to address most issues. And it can run other container types (e.g.,
do
On 5/17/22 12:00 pm, Paul Edmon wrote:
Database upgrades can also take a while if your database is large.
I definitely recommend backing up prior to the upgrade, as well as running
slurmdbd -Dv directly rather than under systemd: if the upgrade takes a
long time, systemd will kill it preemptively due to unre
Database upgrades can also take a while if your database is large.
I definitely recommend backing up prior to the upgrade, as well as running
slurmdbd -Dv directly rather than under systemd: if the upgrade takes a
long time, systemd will kill it preemptively due to unresponsiveness, which
will create all sort
Hi,
You can upgrade from 19.05 to 20.11 in one step (2 major releases),
skipping 20.02. When that is completed, it is recommended to upgrade
again from 20.11 to 21.08.8 in order to get the current major version.
Version 22.05 will be out very soon, but you may want to wait a couple of
minor rele
So the need to go step-by-step is due to changes in the database schema.
The upgrade process is not able to upgrade if there is too big a
difference.
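A minimal sketch of how such a step-by-step path could be worked out, assuming the limit of two major releases per upgrade that the documentation quoted in this thread describes. The release list holds only the versions named in this thread, and `plan_upgrade` is a hypothetical helper for illustration, not part of Slurm:

```python
# Major Slurm releases named in this thread, oldest first.
RELEASES = ["19.05", "20.02", "20.11", "21.08", "22.05"]

# Assumption taken from the documentation quoted in this thread:
# a single upgrade may span at most two major releases.
MAX_HOP = 2


def plan_upgrade(current, target, releases=RELEASES, max_hop=MAX_HOP):
    """Return the releases to install, in order, to get from current to target."""
    start = releases.index(current)  # raises ValueError for an unknown release
    end = releases.index(target)
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    path = []
    position = start
    while position < end:
        # Jump as far as the limit allows without overshooting the target.
        position = min(position + max_hop, end)
        path.append(releases[position])
    return path


if __name__ == "__main__":
    print(plan_upgrade("19.05", "21.08"))
```

For 19.05 to 21.08 this gives 20.11 followed by 21.08, which matches the two-step route suggested elsewhere in the thread.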
That is a little gotcha: so when upgrading, you need to start slurmdbd
and let it run for a bit as it does the database update (you can watch
I think it should be, but you should be able to run a test and find out.
-Paul Edmon-
On 5/17/22 12:13 PM, byron wrote:
Sorry, I should have been clearer. I understand that with regards to
slurmd / slurmctld you can skip a major release without impacting
running jobs etc. My question was a
Could you help me figure out why our jobs are stuck PD because of BeginTime?
e.g:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
24458 defq cromwell smrtanal PD 0:00 1 (BeginTime)
# scontrol show job 24458
JobId=24458 JobName=c
Sorry, I should have been clearer. I understand that with regards to
slurmd / slurmctld you can skip a major release without impacting running
jobs etc. My question was about upgrading slurmdbd and whether it was
necessary to upgrade through the intermediate major releases (which I know
underst
The Slurm docs say you can upgrade across two major releases at a time
(https://slurm.schedmd.com/quickstart_admin.html):
"Almost every new major release of Slurm (e.g. 20.02.x to 20.11.x)
involves changes to the state files with new data structures, new
options, etc. Slurm permits upgrades to a new major
Thanks Brian for the speedy response.
Am I not correct in thinking that if I just go from 19.05 to 20.11 then
there is the advantage that I can upgrade slurmd and slurmctld in one go
and it won't affect the running jobs since upgrading to a new major release
from the past two major releases doesn'
Hi Petar,
thanks for letting us know!
We will definitely look into this and will get back to you on GitHub
when technical questions/problems arise.
Just one quick question: we are neither using Telegram nor MS-Teams
here, but Matrix. In case we would like to deliver messages through
that: wh
On 17.05.2022 15:58, Brian Andrus wrote:
You are starting to understand a major issue with most containers.
I suggest you check out Singularity, which was built from the ground up
to address most issues. And it can run other container types (e.g., Docker).
Brian Andrus
Side-Note to this, sing
You need to step upgrade through major versions (not minor).
So 19.05=>20.x
I would highly recommend going to 21.08 while you are at it.
I just did the same migration (although they started at 18.x) with no
issues. Running jobs were not impacted and users didn't even notice.
Brian Andrus
On
Hi
I'm looking at upgrading our install of Slurm from 19.05 to 20.11 in
response to the recently announced security vulnerabilities.
I've been through the documentation / forums and have managed to find the
answers to most of my questions but am still unclear about the following
- In upgrading t
Hi GHui,
FYI: I am not a podman expert, so my questions might be stupid. :-)
From what you told us so far, you are running the podman command as
non-root but you are root inside the container, right?
What is the output of "podman info | grep root" in your case?
How are you submitting a job fro
You are starting to understand a major issue with most containers.
I suggest you check out Singularity, which was built from the ground up
to address most issues. And it can run other container types (e.g., Docker).
Brian Andrus
On 5/16/2022 10:49 PM, GHui wrote:
I use podman 4.0.2. And slurm
> What is the use-case for having users need to self-limit?
Our users self-limit jobs with extremely high disk IO requirements. Some batch
jobs read/write over 15 TB a day and I haven't identified an effective method of
capping IOPS per user. We still have issues with the occasional user decidi