Hi Jagga,

On Tue, Jan 07, 2014 at 08:03:11PM -0800, Jagga Soorma wrote:

> Thanks for your response Paddy.  I don't see those messages anymore and it
> looks like I just needed to be a bit patient :).

Fair enough. :)

> Sacct still does not show
> the account for any user except root.  Also, sreport is not reporting
> anything at all.  Here is an example:

I'm not sure. I wonder if the date range in your sreport runs is too limited. In
your output below, you're only requesting usage from the 6th of Jan to 8pm on
the 7th. Did anything actually run in that period?

The default Start period is 00:00 on the previous day. If you run it again with
"Start=2013-12-01" for example, that should give you a report covering over a
month's worth of runs, so there should be something there.
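
As a rough sketch of what I mean (the cluster name is taken from your output
below, the start date is just an arbitrary example, and the little
"sreport_cmd" wrapper is a hypothetical helper of my own that only prints the
command so you can check it before running it):

```shell
#!/bin/sh
# Hypothetical helper: build an sreport TopUsage command with an explicit
# Start date, so the report is not limited to the default window.
sreport_cmd() {
    # $1 = start date (YYYY-MM-DD), $2 = cluster name
    printf 'sreport -a user TopUsage Start=%s cluster=%s\n' "$1" "$2"
}

# Print the command for review; run the printed line on the controller.
sreport_cmd 2013-12-01 amber
```

If that still comes back empty with a start date well before your first test
jobs, then the problem is more likely in how usage is being recorded than in
the reporting window.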


Re sacct, I think it's the MinJobAge setting which relates to this. Yours is set
to 300 seconds below (the default), so you'll only see recent jobs.

We have left it to the default as well, and just use sacct for troubleshooting
active jobs, and for printing data for the user via one of the Epilog scripts
(job has just finished, so it won't be purged yet!).

Depending on your job throughput, increasing MinJobAge could lead to
slurmctld's memory usage increasing significantly, so use with caution!
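
For reference, a rough sketch of what that change would look like in
slurm.conf. The value of 3600 is purely an illustrative assumption on my part,
not a recommendation; pick something that suits your job throughput and the
memory available on the controller:

```
# slurm.conf fragment (illustrative value only, not a recommendation)
# Keep completed job records in slurmctld's memory for one hour
# instead of the default 300 seconds.
MinJobAge=3600
```

slurmctld would need to re-read its configuration afterwards for the new value
to take effect.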

Paddy

> --
> ssfslurmd01:~ # grep -i cluster /etc/slurm/slurm.conf | grep -v '#'
> ClusterName=amber
> ssfslurmd01:~ # srun hostname
> ssfslurmc01
> ssfslurmd01:~ # date
> Tue Jan  7 19:56:57 PST 2014
> ssfslurmd01:~ # sreport -a user TopUsage End=20:00 cluster=amber
> --------------------------------------------------------------------------------
> Top 10 Users 2014-01-06T00:00:00 - 2014-01-07T19:59:59 (158400 secs)
> Time reported in CPU Minutes
> --------------------------------------------------------------------------------
>   Cluster     Login     Proper Name         Account       Used
> --------- --------- --------------- --------------- ----------
> ssfslurmd01:~ # sreport -a user TopUsate End=20:00
> --------------------------------------------------------------------------------
> Top 10 Users 2014-01-06T00:00:00 - 2014-01-07T19:59:59 (158400 secs)
> Time reported in CPU Minutes
> --------------------------------------------------------------------------------
>   Cluster     Login     Proper Name         Account       Used
> --------- --------- --------------- --------------- ----------
> ssfslurmd01:~ # sacct
>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
> ------------ ---------- ---------- ---------- ---------- ---------- --------
> 57             hostname production       root          1  COMPLETED      0:0
> 58             hostname production       root          1  COMPLETED      0:0
> 59             hostname production                     1  COMPLETED      0:0
> --
> 
> Am I missing something here?
> 
> Thanks,
> -J
> 
> 
> On Tue, Jan 7, 2014 at 9:13 AM, Paddy Doyle <pa...@tchpc.tcd.ie> wrote:
> 
> >
> > Hi Jagga,
> >
> > Some of those messages below ('adding column...') look like what you'd see
> > occasionally when upgrading slurm -- sometimes the update changes the
> > database schema. Do you still see those messages, or was it a once-off?
> >
> > Please note as well that 'sacct' only shows info for recent jobs. You
> > probably want to read the man page for 'sreport' for longer-term
> > accounting info.
> >
> >
> > Also, I'd recommend using the SlurmDBD as an interface between slurm and
> > your database. It'll make your life easier in the future if you have
> > multiple clusters.
> >
> > It would involve changing your slurm.conf to use something like this:
> >
> >   AccountingStorageHost=myhost01
> >   AccountingStorageType=accounting_storage/slurmdbd
> >
> > ..and creating a slurmdbd.conf on 'myhost01'.
> >
> > More details here: http://slurm.schedmd.com/accounting.html
> >
> > Paddy
> >
> > On Sat, Jan 04, 2014 at 02:46:01PM -0800, Jagga Soorma wrote:
> >
> > > Hello,
> > >
> > > I am new to slurm and was trying to enable the accounting portion of
> > > slurm for better job tracking.  I was able to get things set up but seem
> > > to be missing the account field from the output, and I also get some
> > > db-related output when running the sacct command which won't go away:
> > >
> > > ssfslurmd01:/etc/slurm # sacct
> > > sacct: adding column cluster after node_name in table cluster_event_table
> > > sacct: adding column period_start after state in table cluster_event_table
> > > sacct: adding column period_end after period_start in table cluster_event_table
> > > sacct: dropping column time_start from table cluster_event_table
> > > sacct: dropping column time_end from table cluster_event_table
> > > sacct: Renaming old tables with _old behind them.
> > > sacct: Converting old event table for amber, this may take some time, please do not restart.
> > > sacct: Converting old event table for cluster, this may take some time, please do not restart.
> > >        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
> > > ------------ ---------- ---------- ---------- ---------- ---------- --------
> > > 45             hostname production                     1  COMPLETED      0:0
> > > 46                sleep production                     1  COMPLETED      0:0
> > > ssfslurmd01:/etc/slurm #
> > >
> > >
> > > My slurm.conf:
> > >
> > >
> > > ControlMachine=ssfslurmd01
> > > ControlAddr=10.36.245.23
> > > AuthType=auth/munge
> > > CacheGroups=0
> > > CryptoType=crypto/munge
> > > MpiDefault=none
> > > ProctrackType=proctrack/pgid
> > > ReturnToService=1
> > > SlurmctldPidFile=/var/run/slurmctld.pid
> > > SlurmctldPort=6817
> > > SlurmdPidFile=/var/run/slurmd.pid
> > > SlurmdPort=6818
> > > SlurmdSpoolDir=/tmp/slurmd
> > > SlurmUser=lsfadmin
> > > StateSaveLocation=/tmp
> > > SwitchType=switch/none
> > > TaskPlugin=task/none
> > > InactiveLimit=0
> > > KillWait=30
> > > MinJobAge=300
> > > SlurmctldTimeout=120
> > > SlurmdTimeout=300
> > > Waittime=0
> > > FastSchedule=1
> > > SchedulerType=sched/backfill
> > > SchedulerPort=7321
> > > SelectType=select/cons_res
> > > SelectTypeParameters=CR_CPU_Memory
> > > GresTypes=gpu
> > > AccountingStorageHost=127.0.0.1
> > > AccountingStoragePass=slurm
> > > AccountingStorageType=accounting_storage/mysql
> > > AccountingStorageUser=simran
> > > AccountingStoreJobComment=YES
> > > ClusterName=cluster
> > > JobCompType=jobcomp/none
> > > JobAcctGatherFrequency=30
> > > JobAcctGatherType=jobacct_gather/none
> > > SlurmctldDebug=3
> > > SlurmdDebug=3
> > > NodeName=ssfslurmc0[1] Procs=2 RealMemory=2006 State=UNKNOWN
> > > PartitionName=debug Nodes=ssfslurmc0[1] Default=NO MaxTime=INFINITE State=UP
> > > PartitionName=production Nodes=ssfslurmc0[1] Default=YES MaxTime=INFINITE State=UP
> > >
> > > Thanks for your assistance with this.
> > >
> > > Regards,
> > > -J
> >
> > --
> > Paddy Doyle
> > Trinity Centre for High Performance Computing,
> > Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> > Phone: +353-1-896-3725
> > http://www.tchpc.tcd.ie/
> >

-- 
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/
