Hi Manuel,

Thanks for replying. Yes, I have checked slurm.conf; it is identical on the server and the compute nodes. I restarted the slurmd daemon on the compute nodes and then restarted the slurmctld service on the server. I also rebooted the machines, but the same error message (Zero Bytes were...) keeps appearing on the console and in the log files.

I have also set PluginDir=/usr/lib64/slurm in case it could not find the plugins, but that does not help either. All the partitions are active (idle); none of them went to the down or drained state.
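
For reference, a minimal sketch of how the plugin check could be scripted. It assumes that a SelectType such as "select/cons_res" corresponds to a file named select_cons_res.so inside PluginDir; that naming is an assumption about the installation, and both function names are hypothetical helpers, not part of Slurm.

```python
import os


def select_plugin_path(plugin_dir, select_type):
    """Build the expected plugin file path for a SelectType value.

    Assumption: "select/cons_res" maps to "<plugin_dir>/select_cons_res.so".
    """
    return os.path.join(plugin_dir, select_type.replace("/", "_") + ".so")


def plugin_installed(plugin_dir, select_type):
    """Return True if the expected plugin file exists in plugin_dir."""
    return os.path.isfile(select_plugin_path(plugin_dir, select_type))


if __name__ == "__main__":
    # Values taken from this thread; adjust for the local installation.
    print(plugin_installed("/usr/lib64/slurm", "select/cons_res"))
```

Running this on the server and on each compute node would show whether the file is present everywhere, although a present file does not prove that the daemons actually loaded it.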

Regards,

Jose

On 04/10/2016 at 20:28, Manuel Rodríguez Pascual wrote:
Re: [slurm-dev] cons_res / CR_CPU - we don't have select plugin type 102

Hi Jose,

I don't know if this is the case here, but this error tends to arise after changing the slurmctld configuration without restarting the compute nodes, or when the compute nodes have a different configuration. Have you double-checked this?
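
One way to double-check is to compare the relevant parameters of two slurm.conf copies programmatically. The following is a minimal sketch, not a Slurm tool: `parse_slurm_conf` and `diff_conf` are hypothetical helper names, and the parser only handles simple `Key=Value` lines with `#` comments, skipping NodeName and PartitionName lines.

```python
def parse_slurm_conf(text):
    """Parse slurm.conf-style text into a dict of lowercase key -> value."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().lower()  # compare keys case-insensitively
        if key in ("nodename", "partitionname"):
            continue  # multi-field lines, outside the scope of this sketch
        params[key] = value.strip()
    return params


def diff_conf(a, b, keys=("selecttype", "selecttypeparameters", "plugindir")):
    """Return {key: (value_in_a, value_in_b)} for the keys that differ."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    server = parse_slurm_conf("SelectType=select/cons_res\nSelectTypeParameters=CR_CPU\n")
    node = parse_slurm_conf("SelectType=select/linear\n")
    print(diff_conf(server, node))
```

In practice the two texts would be read from the slurm.conf on the server and the copy fetched from a compute node or login node; an empty result means the selected keys match.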

Best regards,

Manuel

On Tuesday, 4 October 2016, Jose Antonio <joseantonio.berna...@um.es> wrote:


    Hi,

    Currently I have set the SelectType parameter to "select/linear",
    which works fine. However, when a job is sent to a node, the job
    takes all the CPUs of the machine, even if it only uses 1 core.

    That is why I changed SelectType to "select/cons_res" and
    SelectTypeParameters to "CR_CPU", but this doesn't seem to work. If I
    try to submit a job to a partition that works with select/linear, the
    following message appears:

    sbatch: error: slurm_receive_msg: Zero Bytes were transmitted or received
    sbatch: error: Batch job submission failed: Zero Bytes were transmitted or received

    The log on the server node (/var/log/slurmctld.log):

    error: we don't have select plugin type 102
    error: select_g_select_jobinfo_unpack: unpack error
    error: Malformed RPC of type REQUEST_SUBMIT_BATCH_JOB(4003) received
    error: slurm_receive_msg: Header lengths are longer than data received
    error: slurm_receive_msg [155.54.204.200:38850]: Header lengths are longer than data received

    Nothing new appears in the compute node logs after this error occurs.

    Any ideas?

    Thanks,

    Jose

