High Performance Computing

University Information Services

Short jobs

This page describes a scheduling feature that grants increased priority to short jobs (defined as requesting no more than 1 hour of wall time), allowing test or debugging jobs to start with minimal wait.

Definition of a short job

We define a short job as one that requests no more than 1 hour of wall clock (i.e. real) time. Such jobs often arise as quick tests or debugging sessions, and may or may not be interactive. For such jobs it is usually undesirable to wait as long as an average batch job would before being scheduled. Two mechanisms expedite short jobs on Darwin.
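To make the one-hour threshold concrete, the following sketch checks whether a wall time request counts as short. The helper name is_short_job is hypothetical, and it assumes the request is written in hours:minutes:seconds form; the scheduler itself performs the real check at submission.

```shell
# Hypothetical helper: succeeds if an H:M:S wall time request is at most one hour.
# Assumes the hours:minutes:seconds form only; other time formats are not handled.
is_short_job() {
    # awk reads each field as a decimal number, so values such as "08" are safe.
    printf '%s\n' "$1" | awk -F: '{ exit !($1 * 3600 + $2 * 60 + $3 <= 3600) }'
}

is_short_job 1:0:0  && echo "1:0:0 qualifies as a short job"
is_short_job 1:30:0 || echo "1:30:0 does not qualify as a short job"
```

A request of exactly one hour still qualifies, since the limit is "no more than 1 hour".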

Dedicated nodes for short jobs

A certain number of nodes (currently 16) on Darwin are dedicated to jobs requesting no more than 1 hour of walltime. This increases the throughput of short jobs by creating a pool of nodes which will never be blocked for long periods by general jobs. This works transparently to decrease the wait times for short jobs without requiring any special action by the user.

The INTR QoS

The INTR quality of service (QoS) is a non-default QoS available to all users on Darwin, irrespective of whether their default QoS is that of a paying or non-paying user (i.e. it is available at service levels 1, 2 and 3). Selecting this QoS grants a short job extremely high priority to accelerate its passage through the queue. It is intended for urgent tests or interactive jobs for which long waits are problematic.

Note that use of INTR is limited to no more than two nodes in use by any one user at any time. In addition, to prevent abuse, each user may only have one job using INTR submitted at any time. Only short jobs are eligible for INTR.

To submit a short job using the INTR QoS, pass the --qos=INTR option to sbatch (or sintr).

For example:

sintr -A myproject -t 1:0:0 -N1 -n16 --qos=INTR

submits an interactive short job, one hour in length, using an entire 16-core node, with the INTR QoS.
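
For a non-interactive short job, the same options can be given to sbatch as directives in a batch script. The following is a minimal sketch rather than a site-provided template: it reuses the placeholder project name myproject from the example above, and the file name short_test.sh and the job body are hypothetical.

```shell
# Write a small batch script requesting one 16-core node for one hour with INTR.
cat > short_test.sh <<'EOF'
#!/bin/bash
#SBATCH -A myproject     # project to charge (placeholder name)
#SBATCH -t 1:0:0         # wall time request: must not exceed 1 hour for INTR
#SBATCH -N 1             # one node
#SBATCH -n 16            # sixteen tasks, filling a 16-core node
#SBATCH --qos=INTR       # request the INTR quality of service

# Hypothetical job body: replace with the test or debugging command to run.
echo "Running on $(hostname)"
EOF

echo "Submit with: sbatch short_test.sh"
```

Because each user may have only one INTR job submitted at a time, a second submission made while the first is still queued or running should not be expected to be accepted under this QoS.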