We support several different classes of interactive use, to allow code development, debugging, job monitoring or post-processing.
Most simply, one can run programs directly on the command line of a login node. This is frequently sufficient for compilation and job preparation. Provided you have an X server on your local machine and you enable X-forwarding in your SSH connection (e.g. the -X or -Y options to ssh, see Connecting to HPCS systems), then X-windows applications launched on a login node should display on your screen (but NB, never try to use xhost to make this work).
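As a quick sanity check before launching a graphical application, one can test whether the login session has a display at all. The sketch below assumes only that ssh sets the DISPLAY environment variable when X-forwarding is active; the function name x_forwarding_active is made up for illustration.

```shell
# Succeeds if this session has a DISPLAY set, as ssh -X / -Y normally arranges.
# x_forwarding_active is a hypothetical helper name, not an HPCS-provided command.
x_forwarding_active() {
    [ -n "${DISPLAY:-}" ]
}

# Example use on a login node:
#   x_forwarding_active && echo "X-forwarding looks enabled" \
#                       || echo "no DISPLAY: reconnect with ssh -X or -Y"
```

A set DISPLAY does not guarantee that the local X server is reachable, so this only catches the common case of forgetting the -X or -Y option.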
The login nodes are similar in hardware to the cluster compute nodes, except that some (login-gfx*) do not have Infiniband cards. It is nevertheless possible to run small MPI jobs on the login nodes for testing purposes using shared memory. However, the login nodes are finite, shared resources and any such use must respect other users. In particular, parallel jobs must be short (i.e. minutes), use no more than four cores and 8GB of memory each, and should be niced (prefixed with nice -19, i.e. run at the lowest scheduling priority) so as not to impact interactive responsiveness. The login nodes are not to be used to run production workloads outside the batch queueing system. Note that interactive use of compute nodes is available as described below; antisocial monopolisation of a login node will probably receive harsh treatment from the system administrators.
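A minimal sketch of a niced test run follows. It uses nice -n 19, the POSIX spelling of the nice -19 form mentioned above; the wrapper name run_niced, the program name ./my_test_program and the mpirun -np 4 launch line are illustrative assumptions and will depend on the MPI implementation in use.

```shell
# Run any command at the lowest scheduling priority (niceness 19).
# run_niced is a hypothetical helper; "nice -n 19" is equivalent to "nice -19".
run_niced() {
    nice -n 19 "$@"
}

# Example: a short 4-process shared-memory test on a login node.
# ./my_test_program is a placeholder for your own executable.
#   run_niced mpirun -np 4 ./my_test_program
```

Processes started by the niced launcher inherit its niceness, so the whole test runs at low priority.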
The progress of any batch job started through the scheduler can be monitored by logging in (using ssh) to any of the compute nodes assigned to the job (use e.g. squeue to see which these are). For example, the UNIX command top will immediately show whether there are active job processes and what percentage of a CPU core each such process is using; low percentages usually suggest a problem. top also shows how much memory per node the job actually demands (see also the free command). Exhausting the node memory will at minimum cause the node to start writing memory pages to swap (causing immediate and drastic performance degradation) and could at worst bring down the node.
The following limitations apply to compute node access:
- Access is possible only from the login nodes (not from external machines).
- SSH access is granted to a user when a node starts running jobs owned by that user (note: this is only ever one user at a time for a given node).
- Access may be revoked at any time after all jobs finish (either when a different user's job starts, or at the next automatic health check). When this occurs, all of the original user's processes are killed.
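The monitoring workflow described above might look like the following sketch. The helper name job_nodes is hypothetical; the squeue options used (-u, -h and the %i and %N format fields for job ID and node list) are standard SLURM, but the node names shown by squeue are site-specific and may appear in compressed form.

```shell
# Print "<jobid> <nodelist>" for each job of the given user (default: yourself).
# job_nodes is a hypothetical helper name, not an HPCS-provided command.
job_nodes() {
    squeue -u "${1:-$USER}" -h -o '%i %N'
}

# Example session, run from a login node:
#   job_nodes                 # find the nodes assigned to your jobs
#   ssh <node-from-the-list>  # only works while your job holds that node
#   top -u "$USER"            # per-process CPU and memory usage
#   free -m                   # overall memory and swap usage in MB
```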
Job monitoring gives direct access to compute nodes allocated by the scheduler, so one could in principle submit a job which simply sleeps when started and then launch processes manually from the command line (e.g. for debugging purposes). However, there is at least one more convenient method of obtaining a set of nodes for interactive use.
The following command will request two Darwin (Sandy Bridge) nodes interactively for 2 hours, charged to the project MYPROJECT:
sintr -A MYPROJECT -p sandybridge -N2 -t 2:0:0
This command will create a new window if you have an X windows display (and X-forwarding is working to the login nodes); otherwise it will run in the current login node window. It will pause until the job starts, then create a screen terminal running on the first node allocated (see man screen). X windows applications started inside this terminal should display properly (if they could from the original login session). Within the screen session, new terminals can be started with control-a c, and control-a n and control-a p navigate to the next and previous terminals. srun can also be used to start processes on any of the nodes in the job allocation, and SLURM-aware MPI implementations will use this to launch remote processes on the allocated nodes without needing explicit host lists. Alternatively, just ssh in from any screen terminal to any of the allocated nodes.
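Inside the screen session, the allocation can be inspected and used along the following lines. This is a sketch assuming the session exports the usual SLURM_JOB_NODELIST variable; the helper names allocated_hosts and on_each_node are made up for illustration, while scontrol show hostnames and the srun option --ntasks-per-node are standard SLURM.

```shell
# List the hosts in the current allocation, one per line.
# allocated_hosts and on_each_node are hypothetical helper names.
allocated_hosts() {
    scontrol show hostnames "$SLURM_JOB_NODELIST"
}

# Run a command once on every node of the allocation.
on_each_node() {
    srun --ntasks-per-node=1 "$@"
}

# Example, from any terminal in the screen session:
#   allocated_hosts           # see which nodes you hold
#   on_each_node hostname     # start one process per allocated node
```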
Interactive jobs on Wilkes can be requested in a similar way (replacing -p sandybridge with -p tesla, and the project by the name of the appropriate Wilkes project).
Post-processing of large data sets produced on the HPCS may involve interactive visualisation software using 3D graphics. Please see the page on Remote desktops & 3D visualization for details.