[slurm-users] LRMS error: (-1) Job missing from SLURM."

2024-08-06 Thread Felix via slurm-users
m, unix user: 1900:1900, name: "org.nordugrid.ARC-CE-result-ops", owner: "/dc=eu/dc=egi/c=hr/o=robots/o=srce/cn=robot:argo-...@cro-ngi.hr", lrms: SLURM, queue: debug, lrmsid: 274398, failure: "LRMS error: (-1) Job missing from SLURM." The jobs can not be seen in sin

[slurm-users] error

2024-01-18 Thread Felix
ory=256000 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 Feature=HyperThread Could you please help me? Thank you Felix -- Dr. Eng. Farcas Felix National Institute of Research and Development of Isotopic and Molecular Technology, IT - Department - Cluj-Napoca, Romania Mobile: +40742195323

[slurm-users] install new slurm, no slurmctld found

2023-12-15 Thread Felix
-plus.repo  egi-trustanchors.repo wlcg-el9.repo Thank you Felix

[slurm-users] slurm comunication between versions

2023-11-23 Thread Felix
and slurm 22.05 was installed through dnf. Thank you Felix

[slurm-users] question about configuration in slurm.conf

2023-09-26 Thread Felix
: PartitionName=debug Nodes=awn-0[01-32,46-77,95-99] awn-1[00-99] Default=YES MaxTime=INFINITE State=UP is this correct? Thank you Felix
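
A quick way to sanity-check a node expression like the one in this snippet is to expand it and count the hosts. The sketch below is only an illustration and not Slurm's own parser; the helper names `split_top_level` and `expand_hostlist` are hypothetical. It assumes the two groups are joined by a comma rather than a space, since slurm.conf separates keyword=value pairs with whitespace. On a machine with Slurm installed, `scontrol show hostnames <expression>` performs the real expansion.

```python
import re


def split_top_level(expr):
    """Split a hostlist on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_hostlist(expr):
    """Expand simple expressions such as 'awn-0[01-32,46-77]' into host names.

    Handles one bracket group per comma-separated part and keeps zero padding.
    """
    hosts = []
    for part in split_top_level(expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if not m:
            hosts.append(part)  # plain host name without a bracket group
            continue
        prefix, ranges, suffix = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo  # a single number such as "[07]"
            width = len(lo)  # preserve leading zeros
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{str(n).zfill(width)}{suffix}")
    return hosts


# Assumed comma-joined form of the node list from the snippet above.
nodes = expand_hostlist("awn-0[01-32,46-77,95-99],awn-1[00-99]")
print(len(nodes), nodes[0], nodes[-1])
```

If the count or the first and last names differ from what the cluster actually has, the expression is worth rechecking before it goes into slurm.conf.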

[slurm-users] help with canceling or deleteing a job

2023-09-19 Thread Felix
about the job, the job is still there [@arc7-node ~]# squeue | grep awn-047    1808851 debug  gridjob  atlas01 CG 4-00:00:19 1 awn-047 Can I do any other things to kill or end the job? Thank you Felix

[slurm-users] slurm error

2022-07-08 Thread Felix
in the text file Job: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm State: Failed Specific state: FAILED Job Error: LRMS error: (-1) Job missing from SLURM What exactly failed in the system? Where do I have to look for the error? Thank you F

Re: [slurm-users] [EXT] slurmctld.log over 500 MB

2021-07-29 Thread Felix
Hello I have two logs, one in /var/log/slurm/log and another in /var/log/slurmctld.log For the second I had to enter the restart in the lines you gave me. For /var/log/slurm/log do I have to write the same lines? Thank you Felix On 7/27/2021 3:55 PM, Sean Crosby wrote: /var/log/slurm

[slurm-users] slurmctld.log over 500 MB

2021-07-27 Thread Felix
Hello my slurmctld.log is 600 MB. I am looking for a functional method to have a log rotate system for slurmctld.log. There is none for slurm now on my system. I have slurm 20.02 on my system. Is there any possibility? Thank you Felix
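
The usual answer to this kind of question is a logrotate(8) rule that tells slurmctld to reopen its log after rotation; the slurmctld man page documents SIGUSR2 for that purpose. The sketch below shows the same rename-then-signal idea in Python, purely as an illustration. The function name `rotate_log`, the timestamp suffix, and the PID file path in the usage comment are assumptions, not something taken from the thread.

```python
import os
import signal
import time


def rotate_log(path, pid=None, now=None):
    """Rename `path` to `path.<timestamp>` and return the new name.

    If `pid` is given, send SIGUSR2 to that process afterwards so the daemon
    reopens its log file under the original name (documented for slurmctld).
    """
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    rotated = f"{path}.{stamp}"
    os.rename(path, rotated)
    if pid is not None:
        os.kill(pid, signal.SIGUSR2)
    return rotated


# Hypothetical usage; the PID file location is an assumption and depends on
# SlurmctldPidFile in slurm.conf:
#   with open("/var/run/slurmctld.pid") as fh:
#       rotate_log("/var/log/slurmctld.log", pid=int(fh.read().strip()))
```

Until the signal arrives the daemon keeps writing to the renamed file, so no log lines are lost between the two steps. Compression and pruning of old files are left out here; logrotate handles both and is likely the simpler choice in production.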

[slurm-users] priority for ops jobs

2020-05-01 Thread Felix Farcas
Hello can I somehow set up slurm to prioritize my ops jobs vs my atlas jobs? Or can slurm be configured to dedicate one work node for ops VO processing jobs and any other to ATLAS VO processing? Thank you Felix

Re: [slurm-users] update or fresh install slurm 20.x

2020-05-01 Thread Felix Farcas
I am upgrading slurm just on a worker node. I drained all the jobs on the old slurm. The only active service on my worker node is slurmd.service Felix On 5/1/2020 10:45 AM, Ole Holm Nielsen wrote: On 01-05-2020 09:21, Felix Farcas wrote: I did install a new server ARC-CE with slurm-20.x I

[slurm-users] update or fresh install slurm 20.x

2020-05-01 Thread Felix Farcas
can function together? Thank you Felix

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Felix Farcas
Hello I did install  mariadb-server and mariadb-devel and all worked fine Thank you Felix On 4/17/2020 11:38 AM, Christian Anthon wrote: It would be neat to have these build requirements / install requirements built into the spec file. Cheers, Christian. On 17/04/2020 10.08, Ole Holm

[slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Felix Farcas
b64/slurm/accounting_storage_mysql.so     File not found: /root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so How may I find this file? Thank you Felix

[slurm-users] Power/Cloud Plugin - Race Condition after Node Start - Wrong Job State

2019-09-02 Thread Felix Wolfheimer
Just stumbled on an issue which kicks in occasionally when Slurm starts/creates instances using the power/cloud plugin. Here is what happens: I'm using the Slurm Power/Cloud plugin to create compute instances on demand. Occasionally it happens that I run into the following situation when new inst

Re: [slurm-users] Elastic Compute

2018-09-12 Thread Felix Wolfheimer
it can be set per partition as >> well. >> On Tue, Sep 11, 2018 at 5:24 PM Felix Wolfheimer >> wrote: >> > >> > Thanks for the input! I tried a few more things but wasn't able to get >> the behavior I want. >> > Here's what

Re: [slurm-users] Elastic Compute

2018-09-11 Thread Felix Wolfheimer
Thanks for the input! I tried a few more things but wasn't able to get the behavior I want. Here's what I tried so far: - Set SelectTypeParameter to "CR_CPU,CR_LLN". - Set SelectTypeParameter to "CR_CPU,CR_Pack_Nodes". The documentation for this parameter seems to describe the behavior I want (pa

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Felix Wolfheimer
01-558-1150, Fax: 801-585-5366 > http://bit.ly/1HO1N2C > > > From: slurm-users [slurm-users-boun...@lists.schedmd.com] on behalf of > Felix Wolfheimer [f.wolfhei...@googlemail.com] > Sent: Sunday, September 09, 2018 1:35 PM > To: slurm-users@lists.schedmd.com > Subject: [slurm

[slurm-users] Elastic Compute

2018-09-09 Thread Felix Wolfheimer
I'm using the SLURM Elastic Compute feature and it works great in general. However, I noticed that there's a bit of inefficiency in the decision about the number of nodes which SLURM creates. Let's say I've the following configuration NodeName=compute-[1-100] CPUs=10 State=CLOUD and there are non

Re: [slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-30 Thread Felix Wolfheimer
to the instance which contains the NodeName, such that I can find it easily when SLURM calls the SuspendProgram to terminate the node. Lachlan Musicman wrote on Sun, 29 July 2018, 04:02: > On 29 July 2018 at 04:32, Felix Wolfheimer > wrote: > >> I'm experimenting with SLUR

[slurm-users] Elastic Compute on Cloud - Error Handling

2018-07-28 Thread Felix Wolfheimer
I'm experimenting with SLURM Elastic Compute on a cloud platform. I'm facing the following situation: Let's say, SLURM requests that a compute instance is started. The ResumeProgram tries to create the instance, but doesn't succeed because the cloud provider can't provide the instance type at this

[slurm-users] SLURM Elastic Compute - Unable to determine this node's NodeName

2018-07-21 Thread Felix Wolfheimer
urmctld on the command line of slurmd on the node. This works fine. -- Forwarded message ----- From: Felix Wolfheimer Date: Fri, 20 July 2018, 23:11 Subject: SLURM Elastic Compute - Unable to determine this node's NodeName To: Hi, I'm trying to configure a cluster

[slurm-users] SLURM Elastic Compute - Unable to determine this node's NodeName

2018-07-20 Thread Felix Wolfheimer
' on the new node to set the node name explicitly to the one expected by slurmctld, or is there something else I'm missing? Thanks for any help and best regards Felix