
Monitoring jobs

In SLURM, the command squeue shows the jobs currently submitted to the queueing system, and squeue -u spqr1 shows only the jobs belonging to the user spqr1. Other selections are possible, e.g. use -A to select by project. Example output from Darwin is shown below:

squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             10119 sandybrid MILCtest dc-klmn13 PD       0:00      1 (QOSResourceLimit)
           18974_7 sandybrid novoalig    abc123  R    1:47:23      2 sand-1-[1-2]
           18970_2 sandybrid novoalig    abc123  R    1:47:53      2 sand-1-[22-23]
             11206 sandybrid sf1l1000    abc123  R      50:31      8 sand-3-[35-37],sand-5-[55-59]
              7819 sandybrid    LCDMb     spqr1  R    9:13:48      4 sand-6-[39-42]
             11230 sandybrid alaR_iso     xyz12  R       8:11      4 sand-3-[22-25]
...
             18821     tesla     prop    spqr45 PD       0:00      4 (QOSResourceLimit)
...

In the above, the partition names sandybridge (shown truncated as sandybrid in the PARTITION column) and tesla indicate that the jobs are destined for Darwin and Wilkes respectively. If the state is PENDING (PD), i.e. the job is still waiting in the queue and not yet running, the final column gives the reason. For job 10119 above, the reason is that the user is already using the maximum resources permitted at any one time by their quality of service (QOS), which is determined by the service level. If the state is RUNNING (R), the same column lists the nodes allocated to the job.
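
As a rough illustration of reading the ST and NODELIST(REASON) columns, the sketch below picks out pending jobs and their reasons from default-format squeue output. The function name pending_reasons is hypothetical (it is not a SLURM command), and the sketch assumes that neither the job name nor the reason contains spaces.

```shell
# Print "<jobid> <reason>" for every pending (PD) job found in
# default-format squeue output read from standard input.
# pending_reasons is a hypothetical helper, not part of SLURM.
pending_reasons() {
    # Field 5 is the ST column; the last field is NODELIST(REASON).
    # Assumes no spaces inside the job name or the reason text.
    awk '$5 == "PD" { print $1, $NF }'
}

# Typical use on a login node (requires SLURM):
#   squeue | pending_reasons
```

squeue can also filter by state itself, e.g. squeue -u spqr1 -t PENDING.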

The jobids reported as mmmm_n are elements of an array job, where mmmm is the SLURM_ARRAY_JOB_ID common to all jobs in the array, and n is the array index (SLURM_ARRAY_TASK_ID).
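
The mmmm_n form can be taken apart with ordinary shell parameter expansion. The following is a minimal sketch; split_array_id is a hypothetical helper name, and it assumes the id is that of a single array element such as 18974_7.

```shell
# Split an array job id of the form mmmm_n into its two parts.
# split_array_id is a hypothetical helper, not part of SLURM.
split_array_id() {
    id=$1
    case $id in
        # mmmm_n: print the common array job id, then the array index.
        *_*) printf '%s %s\n' "${id%%_*}" "${id#*_}" ;;
        # Not an array element: print the id unchanged.
        *)   printf '%s\n' "$id" ;;
    esac
}

# Example:
#   split_array_id 18974_7
```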

The scontrol command allows more detailed queries. For example, to examine a particular job with id <jobid> in detail:

scontrol show job <jobid>

or

scontrol show node <nodename>

to see information regarding the node <nodename>.
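
scontrol prints its results as Key=Value pairs, so a single field can be extracted with standard text tools. The sketch below is one possible way to do this; show_field is a hypothetical helper name, and it assumes the value of interest contains no spaces (which does not hold for every field).

```shell
# Print the value of one Key=Value field from scontrol output on stdin.
# show_field is a hypothetical helper, not part of SLURM.
# Assumes the requested value contains no spaces.
show_field() {
    # Put each Key=Value pair on its own line, then print the text
    # after "KEY=" for the first pair whose key matches exactly.
    tr -s ' ' '\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Typical use on a login node (requires SLURM):
#   scontrol show job <jobid> | show_field JobState
```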

Schematic representations of activity across the entire system can be obtained from sinfo and sview.

Further details can be found in the manual pages for each command (e.g. man squeue).