Quoting marie-liesse.martine...@dga.defense.gouv.fr:

> Hello Nicolas,
>
> I know I gave little information. But what do you want to know?
> I am going to have a new cluster and will probably install SLURM on it.
> But before that happens, I would like to test this resource manager on my
> computer.
>
> I ran ps aux | grep slurm and got this:
> 1806 3038 0.0 0.0 105300 924 pts/1 S+ 08:34 0:00 grep slurm
> Does it mean that the slurm controller is really not running? How can I
> run it?
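Marie's output answers her first question. The only matching line is the grep command itself (PID 3038), so neither slurmctld nor slurmd is running. grep always matches its own command line, so a check that compares the executable name exactly avoids that false hit. The Python sketch below is illustrative and not part of SLURM. The function names are hypothetical, and the parser assumes the standard 11-column `ps aux` layout, with the full command in the last column.

```python
import os
import subprocess

# Daemon names taken from the thread: slurmctld is the controller (one per
# cluster) and slurmd runs on every node. A single workstation needs both.
SLURM_DAEMONS = ("slurmctld", "slurmd")


def running_slurm_daemons(ps_output: str) -> set[str]:
    """Return the Slurm daemons that appear in `ps aux`-style output.

    Each line is split into at most 11 fields. The 11th is the full command.
    Only the executable's base name is compared, so `grep slurm` (the sole
    match in Marie's output) is never counted as a daemon.
    """
    found = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or truncated line
        executable = os.path.basename(fields[10].split()[0])
        if executable in SLURM_DAEMONS:
            found.add(executable)
    return found


def local_slurm_daemons() -> set[str]:
    """Run `ps aux` on this machine and report which daemons are up."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return running_slurm_daemons(result.stdout)


if __name__ == "__main__":
    # Marie's output from the thread: only the grep process itself matched.
    marie = "1806 3038 0.0 0.0 105300 924 pts/1 S+ 08:34 0:00 grep slurm"
    missing = [d for d in SLURM_DAEMONS if d not in running_slurm_daemons(marie)]
    print("not running:", ", ".join(missing))  # prints: not running: slurmctld, slurmd
```

After the daemons have been started as described in the reply, `local_slurm_daemons()` on the workstation should return both names.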

On compute and head nodes:
/etc/init.d/slurm start

You can check the file named by SlurmctldLogFile in slurm.conf for any errors.

> Please find my slurm.conf attached.
>
> - First, I chose MUNGE as the authentication method (there will
> be many users on the new cluster). I installed it after SLURM was
> installed.
> Now I do not find it useful, because I am the only user on my computer, and
> I do not especially want more security on the new cluster (there will be
> access control on it). Is it reasonable not to use any authentication?

Without authentication, a user could modify the SLURM code to run a job as another user.

> - For the type of MPI, I chose Open MPI, and I could also use MPICH2 1.5.
> - /home/myname/slurm is the directory where I put batch job scripts.
> - /opt/slurm/2.3.3 is the sources directory.
> - I created a private and a public key for JobCredential*.

Those keys are only used if OpenSSL is used instead of Munge for security. Munge is the default.

> Because I am not the root user, do I have to uninstall SLURM and install it
> as a normal user?

You can test as a normal user by setting SlurmUser and SlurmdUser to your user name.

> Thanks,
> Marie.
>
> Nicolas Bigaouette <nbigaoue...@gmail.com>
> 20/06/2012 17:26
> Please reply to "slurm-dev" <slurm-dev@schedmd.com>
>
> To: "slurm-dev" <slurm-dev@schedmd.com>
> cc:
>
> Subject: [slurm-dev] Re: Slurm controller
>
> On Wed 20 Jun 2012 09:38:03 AM EDT,
> marie-liesse.martine...@dga.defense.gouv.fr wrote:
>
> Slurm controller
> Hello,
>
> I have SLURM 2.3.3 installed on my computer (Linux 2.6 x86_64,
> CPUs=16). I am not root, except on the SLURM directory.
> There was no slurm.conf, so I created one with
> https://computing.llnl.gov/linux/slurm/configurator.html.
>
> Each time I execute a SLURM command, the same error message appears:
> "unable to contact slurm controller (connect failure)".
> With sinfo -vvvv I even get "sinfo: debug: Failed to contact primary
> controller: Connection refused".
>
> I have already read messages on the slurm-devel Google Group, but I did
> not find how to resolve this problem.
>
> Could you please help me?
>
> Thanks!
>
>
> Hi Marie,
>
> It's hard to tell what is wrong from the information you provided. It
> looks like the slurm controller is not running. You can verify whether it is
> running using:
> ps aux | grep slurm
> You should see something like this:
> slurm 3073 0.0 0.0 201324 3720 ? Sl Jun18 0:07 /usr/sbin/slurmctld
> root 3097 0.0 0.0 113612 1872 ? S Jun18 0:00 /usr/sbin/slurmd
> The "controller" is the first one (slurmctld). Basically, you need one per
> cluster. On each machine of the cluster, you need to run slurmd. So on a
> single workstation, you run both (as in the example above).
>
> If you don't have root access on the machine, you'll have to compile and
> install slurm as a normal user. I don't know if running it that way is
> supported. Maybe you can find the right flags to pass to the executables
> to point them to user-owned directories of your choice.
>
> Without more information on how you run slurm, I can't really help more...
>
> Regards,
>
> Nicolas
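The reply's advice for testing without root can be checked before the daemons are started. That advice is to set SlurmUser and SlurmdUser to your own user name, then read SlurmctldLogFile for errors. Below is a minimal Python sketch of such a check. The function names and the script name are hypothetical. The parser is deliberately simplified and ignores Include files and quoted values. It relies on both user parameters defaulting to root when unset. The location of slurm.conf depends on how SLURM was installed, so the path is passed in rather than assumed.

```python
import getpass
import sys

# slurm.conf(5) documents "root" as the default for both SlurmUser and SlurmdUser.
DEFAULT_DAEMON_USER = "root"


def parse_slurm_conf(text: str) -> dict[str, str]:
    """Collect Key=Value pairs from slurm.conf text (simplified).

    Comments after '#' are dropped and keys are lower-cased, because Slurm
    treats parameter names case-insensitively. A later occurrence of a key
    overwrites an earlier one. Include files and quoted values are ignored.
    """
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                params[key.lower()] = value
    return params


def check_personal_setup(params: dict[str, str], user: str) -> list[str]:
    """List settings to change before running both daemons as `user`."""
    issues = []
    for key in ("SlurmUser", "SlurmdUser"):
        value = params.get(key.lower(), DEFAULT_DAEMON_USER)
        if value != user:
            issues.append(f"{key}={value} (set it to {user})")
    if "slurmctldlogfile" not in params:
        issues.append("SlurmctldLogFile is not set (no controller log file to check)")
    return issues


if __name__ == "__main__":
    # Hypothetical usage: python check_conf.py /path/to/slurm.conf
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            conf = parse_slurm_conf(fh.read())
        for issue in check_personal_setup(conf, getpass.getuser()) or ["no issues found"]:
            print(issue)
```

Run against the configurator-generated file, the script prints each setting still pointing at another user. After those settings are changed and the daemons are started, the controller log named by SlurmctldLogFile is the place to look for errors.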