Hi All

Can anyone advise what could cause the error message below when 
submitting a job?
sbatch: error: memory allocation failure
The same script worked perfectly fine until I included  #SBATCH 
--nodelist=(compute[015-046])  (once removed, it works as it should).

Details of the setup and the issue:

  1.  For the current setup, I have specific resources available for each compute node:
     a.  (NodeName=compute[007-014] Procs=36 CoresPerSocket=18 RealMemory=384000 ThreadsPerCore=1 Boards=1 SocketsPerBoard=2) - newer model
     b.  (NodeName=compute[001-006] Procs=16 CoresPerSocket=18 RealMemory=128000 ThreadsPerCore=1 Boards=1 SocketsPerBoard=2)
  2.  The same resources are shared between multiple queues (this is working fine).
  3.  When running a parallel job, the exact same job runs when it is assigned 
to a single node category (i.e. exclusively on 1a or exclusively on 1b).
  4.  When the exact same job is assigned across both 1a and 1b, the job 
runs on the 1b nodes but there is no activity on the 1a nodes.
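
As a sanity check on the node definitions in item 1, the CPU count implied by the topology (Boards x SocketsPerBoard x CoresPerSocket x ThreadsPerCore) can be compared with the Procs value. The sketch below uses only the values quoted above. The helper name implied_cpus is made up for illustration, and whether any mismatch is related to the sbatch error is not established here.

```python
# Node definitions copied from item 1 of this post.
nodes = {
    "compute[007-014]": {
        "Procs": 36, "CoresPerSocket": 18, "ThreadsPerCore": 1,
        "Boards": 1, "SocketsPerBoard": 2,
    },
    "compute[001-006]": {
        "Procs": 16, "CoresPerSocket": 18, "ThreadsPerCore": 1,
        "Boards": 1, "SocketsPerBoard": 2,
    },
}


def implied_cpus(node):
    """Hypothetical helper: CPU count implied by the declared topology."""
    return (node["Boards"] * node["SocketsPerBoard"]
            * node["CoresPerSocket"] * node["ThreadsPerCore"])


for name, node in nodes.items():
    implied = implied_cpus(node)
    status = "consistent" if implied == node["Procs"] else "MISMATCH"
    print(f"{name}: Procs={node['Procs']}, implied={implied} -> {status}")
```

With the values as posted, the newer nodes are consistent (36 and 36), while compute[001-006] declares Procs=16 against an implied 36.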

Any suggestions?

Thanks
Mike
