Chris's recommendation to run with --exclusive is great for sizing jobs. I'd just suggest wrapping the job in /usr/bin/time -v (the GNU time on Debian; I'm not sure about other distributions) when looking to size it. On completion it reports the max RSS used by the job, so there's no need to watch it manually. That figure can also be compared against sacct -lj <jobid> if you have accounting set up.
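A minimal sketch of what such a sizing run could look like, assuming GNU time is installed at /usr/bin/time (as on Debian) and using a hypothetical workload script name:

```shell
#!/bin/bash
#SBATCH --job-name=sizing_run
#SBATCH --exclusive      # whole node for this one job, as Chris suggested
#SBATCH --mem=0          # effectively disable Slurm's memory limit for the sizing run

# -v makes GNU time print "Maximum resident set size (kbytes): ..." when the
# wrapped command exits; ./compute_predictions.sh is a stand-in for the real job.
/usr/bin/time -v ./compute_predictions.sh
```

After the job finishes, round the "Maximum resident set size" value up and use it for --mem next time, cross-checking it against the MaxRSS column from sacct -lj <jobid>.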
On Fri, Sep 8, 2017 at 5:01 AM, Sema Atasever <s.atase...@gmail.com> wrote:
> Dear Chris,
>
> Thank you very much for your advice and help!
> Slurm has not yet generated an error; it seems to be working correctly.
>
> Regards.
>
> On Thu, Sep 7, 2017 at 2:33 PM, Chris Harwell <super...@gmail.com> wrote:
>>
>> Note that you were successful in changing the value on the right side of
>> that error message, so you may just need to keep increasing it to a number
>> expected to fit the calculation, while, of course, checking that the total
>> memory available on a node is enough. Sometimes I have done a
>> representative test run with sbatch --exclusive --mem=0 job.sh and closely
>> followed that job's memory usage - logging in and using ps or top, and/or
>> sacct to find the RSS value for the completed job, then rounding it up to
>> use next time. I believe the --exclusive option typically allocates an
>> entire node to just that one job, and --mem=0 effectively disables Slurm's
>> memory limits. That depends on the Slurm setup, though...
>>
>> On Wed, Sep 6, 2017, 03:38 Sema Atasever <s.atase...@gmail.com> wrote:
>>>
>>> Dear Batsirai,
>>>
>>> I tried the line of code you recommended, but it still generates an
>>> error, unfortunately.
>>>
>>> On Thu, Aug 24, 2017 at 5:19 PM, Batsirai Mabvakure
>>> <batsir...@nicd.ac.za> wrote:
>>>>
>>>> Try:
>>>>
>>>> sbatch -J jobname --mem=18000 -D $(pwd) submit_job.sh
>>>>
>>>> From: Sema Atasever <s.atase...@gmail.com>
>>>> Reply-To: slurm-dev <slurm-dev@schedmd.com>
>>>> Date: Thursday 24 August 2017 at 15:58
>>>> To: slurm-dev <slurm-dev@schedmd.com>
>>>> Subject: [slurm-dev] Re: Exceeded job memory limit problem
>>>>
>>>> Dear Lev,
>>>>
>>>> I have already tried the --mem parameter with different values.
>>>>
>>>> For example:
>>>>
>>>> sbatch --mem=5GB submit_job.
>>>> sbatch --mem=18000 submit_job.
>>>>
>>>> but every time it gave the same error again, unfortunately.
>>>>
>>>> On Thu, Aug 24, 2017 at 2:32 AM, Lev Lafayette
>>>> <lev.lafaye...@unimelb.edu.au> wrote:
>>>>
>>>> On Wed, 2017-08-23 at 01:26 -0600, Sema Atasever wrote:
>>>>
>>>> > Computing predictions by SVM...
>>>> > slurmstepd: Job 3469 exceeded memory limit (4235584 > 2048000), being
>>>> > killed
>>>> > slurmstepd: Exceeded job memory limit
>>>> >
>>>> > How can I fix this problem?
>>>>
>>>> Error messages often give useful information. In this case you haven't
>>>> requested enough memory in your Slurm script.
>>>>
>>>> Memory can be set with the `#SBATCH --mem=[mem][M|G|T]` directive
>>>> (entire job) or `#SBATCH --mem-per-cpu=[mem][M|G|T]` (per core).
>>>>
>>>> As a rule of thumb, the maximum request per node should be based around
>>>> total cores -1 (for system processes).
>>>>
>>>> All the best,
>>>>
>>>> --
>>>> Lev Lafayette, BA (Hons), GradCertTerAdEd (Murdoch), GradCertPM, MBA
>>>> (Tech Mngmnt) (Chifley)
>>>> HPC Support and Training Officer +61383444193 +61432255208
>>>> Department of Infrastructure Services, University of Melbourne
>>
>> --
>> Chris Harwell
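To make the "round up the RSS value and use it next time" step from the thread concrete, here is a small hypothetical helper (an editor's illustration, not part of any message above) that turns a peak-RSS reading in kilobytes, as printed by GNU time -v or reported in sacct's MaxRSS column, into a --mem request in megabytes with some headroom:

```python
import math

def mem_request_mb(max_rss_kb: int, headroom: float = 1.2) -> int:
    """Round a peak RSS in kB up to whole megabytes, with a safety margin."""
    return math.ceil(max_rss_kb * headroom / 1024)

# 4235584 kB is the figure from the slurmstepd error in this thread
# ("exceeded memory limit (4235584 > 2048000)").
print(mem_request_mb(4235584))  # -> 4964
```

So a resubmission along the lines of `sbatch --mem=4964 submit_job.sh` should clear the limit that the configured 2048000 kB (2000 MB) did not.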