Hi Manuel,
Thanks for replying. Yes, I have checked the slurm.conf files; they are
identical on the server and the compute nodes. I restarted the slurmd
daemon on the compute nodes and then restarted the slurmctld service on
the server. I also rebooted the machines, but the same error message
(Zero Bytes were...) keeps appearing on the console and in the log files.
I have also set PluginDir=/usr/lib64/slurm in case it could not find the
plugins, but that does not help either.
All the partitions are active (idle); none of them has gone into the
down or drained state.
Regards,
Jose
On 04/10/2016 at 20:28, Manuel Rodríguez Pascual wrote:
Re: [slurm-dev] cons_res / CR_CPU - we don't have select plugin type 102
Hi Jose,
I don't know if this is the case here, but this error tends to arise
after changing the configuration of slurmctld without restarting the
compute nodes, or when the compute nodes have a different
configuration. Have you double-checked this?
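One way to double-check is to compare the select-related settings in the slurm.conf of the controller and of a compute node. The following is only a minimal sketch in Python, assuming simple Key=Value lines; the helper names parse_slurm_conf and diff_settings are hypothetical and not part of Slurm, and the two sample texts are made up for illustration:

```python
def parse_slurm_conf(text):
    """Parse simple Key=Value lines from slurm.conf text into a dict.

    Assumption: one setting per line. Lines carrying several Key=Value
    pairs (e.g. NodeName=...) are stored under their first key only.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Parameter names are compared case-insensitively here.
        settings[key.strip().lower()] = value.strip()
    return settings


def diff_settings(conf_a, conf_b, keys):
    """Return {key: (value_a, value_b)} for the keys whose values differ."""
    differences = {}
    for key in keys:
        value_a = conf_a.get(key.lower())
        value_b = conf_b.get(key.lower())
        if value_a != value_b:
            differences[key] = (value_a, value_b)
    return differences


# Illustrative inputs only: a controller already switched to cons_res and
# a compute node still carrying the old select/linear setting.
controller_text = """
SelectType=select/cons_res
SelectTypeParameters=CR_CPU   # one CPU per task
PluginDir=/usr/lib64/slurm
"""
node_text = """
SelectType=select/linear
PluginDir=/usr/lib64/slurm
"""

mismatch = diff_settings(
    parse_slurm_conf(controller_text),
    parse_slurm_conf(node_text),
    ["SelectType", "SelectTypeParameters", "PluginDir"],
)
for key, (on_controller, on_node) in mismatch.items():
    print(f"{key}: controller={on_controller!r} node={on_node!r}")
```

In practice the two texts would be read from the actual slurm.conf files on each machine; an empty result means the listed settings agree.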
Best regards,
Manuel
On Tuesday, 4 October 2016, Jose Antonio
<joseantonio.berna...@um.es> wrote:
Hi,
Currently I have set the SelectType parameter to "select/linear", which
works fine. However, when a job is sent to a node, the job takes all the
CPUs of the machine, even if it only uses 1 core.
That is why I changed SelectType to "select/cons_res" and
SelectTypeParameters to "CR_CPU", but this doesn't seem to work. If I
try to submit a job to a partition that works with select/linear, the
following message appears:
sbatch: error: slurm_receive_msg: Zero Bytes were transmitted or received
sbatch: error: Batch job submission failed: Zero Bytes were transmitted or received
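For reference, the change described above corresponds to the following two lines in slurm.conf. This is only a fragment showing the select-related settings named in this message; everything else in the file is omitted:

```
# Previously: SelectType=select/linear
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
```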
The log on the server node (/var/log/slurmctld.log):
error: we don't have select plugin type 102
error: select_g_select_jobinfo_unpack: unpack error
error: Malformed RPC of type REQUEST_SUBMIT_BATCH_JOB(4003) received
error: slurm_receive_msg: Header lengths are longer than data received
error: slurm_receive_msg [155.54.204.200:38850]: Header lengths are longer than data received
There is no update in the compute node logs after this error comes up.
Any ideas?
Thanks,
Jose