We support several classes of interactive use, allowing code development, debugging, job monitoring and post-processing.
Most simply, one can run programs directly on the command line of a login node. This is frequently sufficient for the purposes of compilation and job preparation. Note that provided you have an X server on your local machine, and you enable X forwarding in your SSH connection (e.g. the -X or -Y options to ssh; see Connecting to HPCS systems), then X-windows applications launched on a login node should display on your screen (but NB, never try to use xhost to make this work).
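For example, assuming your local machine runs an X server, a connection with X forwarding enabled might look like the following (the hostname and username are placeholders; use the real login address given in Connecting to HPCS systems):

ssh -X username@login.hpcs.example.ac.uk

Once logged in, a simple client such as xclock can be used to confirm that forwarding is working.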
The login nodes are similar in terms of hardware to the cluster compute nodes, except that some (login-west* and login-gfx*) do not have Infiniband cards. It is possible nevertheless to run small MPI jobs on the login nodes for testing purposes using shared memory. However, the login nodes are finite, shared resources and any such use must respect other users. In particular, parallel jobs must be short (i.e. minutes), use no more than four cores and 8GB of memory each, and should be niced (prefixed with nice -19) so as not to impact interactive responsiveness. The login nodes are not to be used to run production workload outside the batch queueing system. Note that interactive use of compute nodes is available as described below - antisocial monopolisation of a login node will probably receive harsh treatment from the system administrators.
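As an illustration of acceptable small-scale testing on a login node (the executable name ./mpi_test is purely illustrative, and the exact launcher depends on the MPI implementation you have loaded):

nice -19 mpirun -np 4 ./mpi_test   # short, 4-core test, niced so interactive users are not affected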
The progress of any batch job started through PBS can be monitored simply by logging in (using ssh) to any of the compute nodes assigned to the job (use e.g. checkjob to see which these are). For example, the UNIX command top will immediately show whether there are active job processes, and what percentage of a CPU core each such process is managing to use; low percentages usually suggest a problem. Also from top one can verify the amount of memory per node that the job actually demands (see also the free command); exhausting the node memory will at minimum cause the node to start writing memory pages to swap (thus causing immediate and drastic performance degradation) and could at worst bring down the node.
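For example, to inspect a running job (the job ID and node name below are illustrative):

checkjob 123456      # shows, among other things, the nodes allocated to the job
ssh node-sand123     # log in to one of those nodes
top                  # check CPU usage of the job processes
free -m              # check how much node memory is in use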
The following limitations apply to compute node access:
- Access is possible only from the login nodes (not from external machines).
- SSH access is granted to a user when a node starts running jobs owned by that user (only one user at a time is ever granted access to a given node).
- Access may be revoked at any time after all jobs finish (either when a different user's job starts, or at the next automatic health check). When this occurs, all of the original user's processes are killed.
Job monitoring already allows direct access to compute nodes allocated by PBS, and one could in principle submit a job which simply sleeps when started and then launch processes manually from the command line (e.g. for debugging purposes). However, there are at least two more convenient methods of obtaining a set of nodes for interactive use.
The traditional PBS method of creating an interactive job is to use the -I option to qsub. E.g. to request one node for 1 hour:
qsub -I -X -l nodes=1,walltime=1:0:0
The qsub command will pause until the job is scheduled, then automatically log into the first node allocated to the job, and log out when the job ends. The list of nodes allocated to the job is available (once logged in) in the file referenced by $PBS_NODEFILE, and PBS-aware software such as Intel MPI can use this to determine the correct set of compute nodes on which to run. Note that the default queue is the sandybridge queue (which allocates Sandy Bridge nodes); Westmere nodes can be selected by adding an explicit -q westmere option to the qsub command line, and similarly -q tesla selects Tesla nodes.
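For example, to request one Westmere node interactively for 30 minutes (the walltime is illustrative):

qsub -I -X -q westmere -l nodes=1,walltime=0:30:0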
Using this method one can create interactive jobs under the same priorities and restrictions of size and duration as non-interactive jobs. The -X option arranges for X11 forwarding (i.e. allows graphical X windows applications to display on your screen, assuming that this is already possible from the login node).
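Once the interactive session has started, launching an MPI program across the allocated cores might look like the following sketch (the executable name ./my_prog is illustrative, and the exact mpirun invocation depends on the MPI implementation loaded):

mpirun -np $(wc -l < $PBS_NODEFILE) ./my_prog   # one rank per entry in the node file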
An alternative method, which may furnish nodes more quickly, is provided by the interactive queues. These are suitable for jobs of at most 2 hours and no more than 256 cores which need to start quickly for interactive work. Such jobs are scheduled using a special, high-priority quality of service and dedicated PBS queues; because a set of nodes is reserved for jobs of 2 hours or less, the average wait time should also be around 2 hours (and usually less).
Interactive jobs can be submitted most simply by requesting the appropriate interactive queue (sandybridge-int for Sandy Bridge nodes, westmere-int for Westmere nodes or tesla-int for Tesla GPU nodes) without providing a job script, e.g.:
qsub -mabe -q sandybridge-int -l nodes=2,walltime=1:0:0
for a job using 2 nodes for 1 hour. Since no job script is specified, the above command will pause and wait for job commands to be entered on standard input; for an interactive job it is sufficient to press control-D to submit the (empty) job. Note that the -mabe option ensures that an email is sent when the job starts, at which point checkjob can be used to see which nodes have been allocated. Then simply log in directly to one or more of these nodes using ssh, as described above for job monitoring.
If no queue is specified, it is assumed that Sandy Bridge nodes are required; if Westmere or GPU nodes are wanted instead, add an option -q westmere-int or -q tesla-int respectively.
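As a sketch of the complete workflow (the job ID and node name are illustrative):

qsub -mabe -q westmere-int -l nodes=1,walltime=0:30:0   # press control-D at the prompt to submit
checkjob 123456                                         # after the start email arrives, list the allocated nodes
ssh node-west123                                        # log in to one of them and work interactively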
Please note that interactive jobs are a special feature and are charged at twice the normal rate of core hour credits (i.e. 1 hour on 16 cores costs 32 core hour credits).
Post-processing of large data sets produced on the HPCS may involve interactive visualisation software using 3D graphics. Please see the page on Remote desktops & 3D visualization for details.