FAQ HPC

Frequently Asked Questions for the HPC Service

Information about our HPC Services

Who can use the Research Computing Service HPC service?

We provide services to all of the following: 

  • Current members or associates of the University of Cambridge 

  • Members of other institutions who are allocated resources through one of the National facilities (Tier2, DiRAC, IRIS) 

  • External customers who purchase resources through our industrial channel

These categories are not mutually exclusive; in all cases we create a single user account per individual and associate it with all applicable resources.

I am a current member of the University of Cambridge. How do I obtain access to the HPC service?

First, you should identify a suitable Principal Investigator (PI) to support your application for an account (you may be eligible to be your own PI; please see the question below). Then please submit the form for internal users.

Please note that the named PI will be asked to confirm the application. If you also have access to a National facility allocation of resources, we recommend you apply as a Cambridge user first. Please see the general question ‘How do I apply for an account?’ below.

Can I (or my research group) use the Research Computing Service HPC service for free?

If you are a member of the University of Cambridge, yes. The first time we receive an internal user application referencing a new Principal Investigator (PI), a new free tier project is created and attached to that PI. All user applications referencing the same PI will be attached to the same free tier project(s).

Please note that free tier projects have a fixed quarterly limit of CPU or GPU hours (if you intend to use the GPU cluster, we will typically create a free tier project for both CPU and GPU). These limits are refreshed at the beginning of each University quarter (the quarters run August-October, November-January, February-April and May-July). Details of the free service tier can be found in the documentation area.

The online application form for University of Cambridge users is the form for internal users.

Am I eligible to be a Principal Investigator (PI)?

Briefly, a typical PI is a member of academic staff with purchasing approval privileges for a cost centre in the University Finance System. Senior academics who are principal investigators or co-investigators on grants funding research projects would be typical PIs. PhD students and postdoctoral research associates (PDRAs) employed on someone else's grant would not be.

Leaders of research projects receiving allocations via National facilities, or of industrial projects, would also be considered PIs (but there is no free tier for PIs who are not members of the University of Cambridge).

I lead a research group. Do I need to apply for a personal account on the HPC service in order to monitor use of my resources?

Yes, at present it is necessary to obtain a user account on the HPC service in order to run the accounting commands. Alternatively, this can be delegated by requesting that an existing user be granted coordinator privilege on the relevant projects.

How do I apply for an account?

If you are a University of Cambridge user wishing to use HPC resources, please apply for a user account using the form for internal users.

If you are entitled to use National Facility services (e.g. DiRAC, IRIS or Tier2), then please see this page. We recommend that if you are also a University of Cambridge user, you apply for access on that basis first (see the preceding paragraph), and then request National Facility resources after your user account has been created. Access to any research storage affiliated with your National Facility project will be granted automatically, along with the related CPU/GPU hour resources.

If you are a member of an external organisation but are affiliated with a project on the HPC cluster, please apply for a user account using this external user application form.

How do I monitor my personal usage, or the usage of my research group?

At present, if you wish to monitor project HPC usage, it will be necessary to obtain a user account on the HPC service (irrespective of whether you intend to run jobs yourself).

By default, each user can only see usage information regarding their own jobs, and overall usage of the projects of which they are a member.

The PI may nominate one or more coordinator users (perhaps including themselves) by emailing support@hpc.cam.ac.uk.

Coordinator users are able to view all usage charged to the particular projects for which they have coordinator status.

Each user can monitor their personal usage, and that of their projects overall, most simply by using the local command mybalance. Coordinator users will see usage by all users for their projects. Similarly, the local command gbalance -p PROJECT-SL2-GPU will show the overall and per-user usage figures (over all time) for all users who have submitted jobs charged to PROJECT-SL2-GPU. These two commands will also display the number of (CPU or GPU) hours still available to be used.

More detailed information can be extracted using the Slurm commands sreport and sacct, and also the local command gstatement. The latter is particularly useful for generating lists of jobs over a particular period for a particular project (e.g., in response to an audit request); to be useful, the command should be run by a coordinator for the project. For example, gstatement -p PROJECT-SL2-GPU -s 2022-01-01-00:00:00 -e 2023-02-28-23:59:59 will list all jobs charged to the PROJECT-SL2-GPU project between 1st January 2022 and 28th February 2023, with each line including the GPU hour usage for the job (similarly for CPU projects).
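As an illustration, the standard Slurm commands can break usage down further. The following sketch uses placeholder project names and dates; the exact options available and output columns depend on the local Slurm configuration:

    # Per-user usage totals for one project over a date range
    sreport cluster AccountUtilizationByUser account=PROJECT-SL2-GPU \
        start=2022-01-01 end=2023-02-28 -t Hours

    # Per-job records for the same project and period
    sacct -A PROJECT-SL2-GPU -S 2022-01-01 -E 2023-02-28 -X \
        --format=JobID,User,Elapsed,AllocTRES,State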

When might I consider purchasing CPU or GPU hours?

The free tier of service (Service Level 3) allows each PI a certain number of CPU core hours and GPU hours per University accounting quarter. This number refreshes at the start of each new quarter. 

Each job charged to the associated SL3 project (typically PROJECT-SL3-CPU/GPU) will receive the priority settings and per-user limits associated with Service Level 3.

If the job throughput afforded by the SL3 priority settings, job limits and quarterly usage limit is too low for your work, and you require higher priority or larger/longer jobs, and/or a greater overall consumption per quarter, then you should consider purchasing Service Level 2 (SL2) CPU or GPU hours.

How do I purchase CPU or GPU hours?

Please ask your department to raise a University purchase order (PO) to ‘Information Services’ using the instructions on the internal charges page and send it to purchases@hpc.cam.ac.uk.

If you are a member of an external organisation, it may be possible to purchase HPC hours or storage at external rates (inclusive of VAT). This will typically require a contract. Please email support@hpc.cam.ac.uk if you would like to enquire about this.

If you are a member of the University of Cambridge, you can view the costs of our HPC services here.

Please note that the procedure for purchasing storage is different – all storage purchases take place through the self-service portal. 

The self-service portal will generate quotes, against which departmental POs (made out to ‘Information Services’) can be raised, and POs should be uploaded via the portal (not sent by email, please).

If I purchase CPU or GPU hours, can my research group still use the free tier of service?

Yes. Purchased CPU or GPU hours are provided through a new Slurm project, typically with either SL1 or SL2 in the name, indicating the service level (in contrast, the Slurm projects containing free resources have SL3 in the name, for ‘Service Level 3’). The project to charge is indicated in the job submission script with the -A or, equivalently, --account= directive. The SL3 project will remain available, and it is possible to create multiple SL2 projects if usage funded from different sources needs to be accounted for separately.
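For example, switching a job between the free tier and purchased hours is just a matter of changing one directive in the submission script (the project names here are placeholders):

    #SBATCH -A MYPROJECT-SL3-CPU          # charge to the free tier project
    # or, equivalently:
    #SBATCH --account=MYPROJECT-SL2-CPU   # charge to purchased SL2 hours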

Do I need to purchase storage in order to use the HPC service, and if so, how do I do that?

Each University of Cambridge user receives 1TB of HPC storage, in addition to the 55GiB in their home directories (although the latter should not be used for data either read or written by jobs). This 1TB of HPC storage cannot be shared with other users. Shared HPC storage can be purchased from the Research Computing Service storage services.

Users of resources granted by one of the National facilities to which we provide service will have automatic access to their own project-specific HPC storage. Such users who are not also Cambridge users do not receive the personal 1TB.

How do I submit jobs to the HPC cluster?

All production workload should be packaged in the form of batch jobs and submitted to the job scheduler.

Please see the guidance in our documentation area.
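As a minimal sketch, a batch job is a shell script with scheduler directives at the top (the project name and resource requests below are placeholders to adapt):

    #!/bin/bash
    #SBATCH -A MYPROJECT-SL3-CPU   # project to charge (placeholder)
    #SBATCH --nodes=1              # one node
    #SBATCH --ntasks=1             # one task (a non-parallel job)
    #SBATCH --time=01:00:00        # wall clock limit

    ./my_program                   # replace with your actual workload

The script is then submitted with sbatch myjob.sh, and its progress through the queue can be checked with squeue -u $USER.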

How should I submit jobs most efficiently, and minimise waiting?

It is important to bear in mind that the HPC cluster is a finite resource, with many users. It follows that it will often be necessary to wait for nodes to become available after job submission, before the job is actually launched. 

The scheduler does not operate on a ‘first come, first served’ basis but will try to fit as many jobs as possible into the available gaps. In general, smaller and briefer jobs are easier to ‘backfill’ in this way. So if your job will only take 1 hour, be sure not to ask for 12 hours (a 12-hour window is harder to find than a window a little larger than 1 hour) but request a little more than the expected time for safety (since any job which exceeds its requested time will be cancelled). We suggest a 10% safety margin. Note that a job which completes before its requested time has passed will only be charged for the (wall clock) time it has actually used.
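For instance, for a job expected to take about 1 hour, a request along these lines builds in roughly a 10% margin:

    #SBATCH --time=01:06:00   # expected 1 hour, plus a ~10% safety margin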

On the other hand, a job which requests more CPU cores or GPUs than it actually uses will be charged for all the CPU cores or GPUs requested (because it will have prevented other jobs from using those resources). It is therefore important to understand (for reasons of both cost and waiting time) how many CPU cores, GPUs and nodes you really need. For non-parallel jobs, the answer is usually one node, and one CPU core. For jobs with parallel threads, but no MPI, the answer is usually one node, and as many CPU cores as you want to have threads.
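As a sketch, these two common cases map onto the following scheduler directives (the thread count is illustrative):

    # Non-parallel job: one node, one CPU core
    #SBATCH --nodes=1
    #SBATCH --ntasks=1

    # Multi-threaded (non-MPI) job: one node, one CPU core per thread
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # if the threads come from OpenMP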

For more guidance, please see https://docs.hpc.cam.ac.uk/hpc/user-guide/quickstart.html#running and https://docs.hpc.cam.ac.uk/hpc/user-guide/batch.html#submitting-jobs

Submitting and managing many similar jobs individually can be time consuming. This can often be streamlined by packaging the set of jobs as a single ‘array job’. Please find more information about array jobs at https://docs.hpc.cam.ac.uk/hpc/user-guide/batch.html#array-jobs
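A hypothetical array job processing 100 numbered input files might look like the following (the project name, script body and file names are illustrative):

    #!/bin/bash
    #SBATCH -A MYPROJECT-SL3-CPU   # project to charge (placeholder)
    #SBATCH --array=1-100          # run 100 tasks from this one submission
    #SBATCH --time=00:30:00

    # Each task receives its own index in SLURM_ARRAY_TASK_ID
    ./process_input input_${SLURM_ARRAY_TASK_ID}.dat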

Can I run HPC jobs on the login nodes?

Only short test jobs consuming a small amount of resources are allowed on the shared login nodes. All production HPC workload should be submitted via the job scheduler which will allocate resources on the compute nodes. 

Excessive use of shared login nodes may trigger automatic remedial action via the watchdog cron script. See https://docs.hpc.cam.ac.uk/hpc/user-guide/interactive.html for more details.
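If you need an interactive session for a short test, standard Slurm can allocate a compute node for this instead. The following is one common pattern (the project name and limits are illustrative; the page linked above describes the locally recommended method):

    # Request an interactive shell on a compute node for 30 minutes
    srun -A MYPROJECT-SL3-CPU --ntasks=1 --time=00:30:00 --pty bash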

When I leave Cambridge, or my National facility project ends, can I continue to use the HPC service?

In general, when you leave the University of Cambridge, or when one of your projects via a National service reaches its conclusion, you should not expect to have continued access to the Research Computing Service HPC service. 

The National services will have their own guidelines on data management and in particular how much time is generally permitted for data to be migrated off each system.

I am having difficulty logging into the HPC service with multifactor authentication.

Please see https://docs.hpc.cam.ac.uk/hpc/user-guide/mfa.html#frequently-asked-questions-faq