The current Service Levels (SLs), which came into operation on 1st February 2009 and which apply to the Darwin and Wilkes clusters, are described below.
Usage is funded in usage credits, where 1 credit = 1 CPU core for 1 hour. To provide guaranteed resources, the minimum allocation to a job is always a single node (16 cores on Darwin; 12 cores and 2 GPUs on Wilkes), and allocations always consist of whole numbers of nodes.
Accounting periods, or quarters, are the three-month periods 1st February - 30th April, 1st May - 31st July, 1st August - 31st October and 1st November - 31st January.
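Because jobs are allocated whole nodes, the credits a job consumes depend on the nodes it occupies, not only on the cores it asks for. The sketch below illustrates that arithmetic. It assumes a job is charged for every core on each allocated node for its wall-clock hours, and that Wilkes nodes are charged by their 12 CPU cores. The function name `credits_for_job` is hypothetical and is not part of any real accounting tool.

```python
import math

# Cores per node as stated in the policy; Wilkes is charged here by CPU cores only (assumption).
CORES_PER_NODE = {"Darwin": 16, "Wilkes": 12}


def credits_for_job(cluster: str, cores_requested: int, hours: float) -> float:
    """Estimate credits for a job, assuming whole nodes are billed for its wall-clock hours."""
    per_node = CORES_PER_NODE[cluster]
    # Allocations are whole nodes (minimum one), so round the core request up to full nodes.
    nodes = max(1, math.ceil(cores_requested / per_node))
    # 1 credit = 1 core for 1 hour, counted on every core of every allocated node.
    return nodes * per_node * hours
```

For example, a 20-core, 3-hour job on Darwin occupies 2 nodes and would use 2 × 16 × 3 = 96 credits under these assumptions, while a 1-core job still pays for a full node.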
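Since the quarters start in February, the November to January quarter straddles a calendar year. A minimal sketch of mapping a date to its accounting quarter follows. The numbering Q1 (February to April) through Q4 (November to January) and the function name `accounting_quarter` are hypothetical labels chosen for illustration.

```python
from datetime import date


def accounting_quarter(d: date) -> tuple[int, int]:
    """Return (year the quarter started, quarter number 1-4) for date d.

    Hypothetical numbering: Q1 = Feb-Apr, Q2 = May-Jul, Q3 = Aug-Oct, Q4 = Nov-Jan.
    January belongs to the quarter that started the previous November.
    """
    # Shift months so February maps to 0 and January maps to 11.
    shifted = (d.month - 2) % 12
    quarter = shifted // 3 + 1
    start_year = d.year - 1 if d.month == 1 else d.year
    return start_year, quarter
```

With this convention, 31st January 2010 falls in the quarter that began in November 2009.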
Service level 1 (SL1) operates with the highest quality of service, QOS1, and is designed for groups which require large amounts of computer time.
Funds paid are converted into core hour credits. These credits are divided over an agreed time period and allocated to quarterly (three-month) accounting periods, so the number of core hours available in each quarter is fixed at the start of the usage agreement.
Furthermore, the quarterly allocation of core hours is a minimum usage guarantee: SL1 users running a consistent workload throughout a quarter are guaranteed to be able to use their full quarterly allocation.
Credits still unused at the end of an accounting quarter expire and are transferred to an expired credit account.
SL1 users can use more than their quarterly allocation on a best-efforts basis, either by using expired credits or by transferring credits from a future quarter's allocation.
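The conversion and division can be illustrated with a short sketch. The policy does not state the price per core hour (it is a hypothetical parameter here) or how credits are split across the agreed period; an even split, with any remainder going to the earliest quarters, is assumed purely for illustration. `Decimal` is used so that monetary division is exact.

```python
from decimal import Decimal


def funds_to_credits(funds: Decimal, price_per_core_hour: Decimal) -> int:
    """Convert funds to whole core hour credits at a hypothetical price."""
    # Floor division on Decimal values avoids binary floating-point rounding surprises.
    return int(funds // price_per_core_hour)


def quarterly_allocations(total_credits: int, n_quarters: int) -> list[int]:
    """Split credits over n_quarters as evenly as possible (assumed even split).

    Any remainder is spread one credit at a time over the earliest quarters,
    so the allocations always sum to total_credits.
    """
    if n_quarters <= 0:
        raise ValueError("n_quarters must be positive")
    base, remainder = divmod(total_credits, n_quarters)
    return [base + (1 if i < remainder else 0) for i in range(n_quarters)]
```

Under these assumptions, 1,000,000 credits spread over a one-year agreement gives four quarterly allocations of 250,000 core hours each.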
Expired credits are (usually) made available automatically to a project once it has exhausted its credits within the current quarter, in the manner of credits assigned under SL2 (no guaranteed rate of usage but also no further time limit on use within the lifetime of the service level agreement).
The transfer of credits from a future quarter should be arranged directly with the support personnel.
Once both normal credits and expired credits have been exhausted, further jobs are handled under the terms of SL4 (Residual Usage). Note that SL4 is the lowest service level on the system and SL4 jobs run only when there are no other eligible jobs in the queue, i.e. when the system is not fully occupied with other jobs. SL1 users cannot choose SL4 while they still have usable credits: SL4 is designed to help keep the system fully occupied at times of low usage, not as a free way for paying users to submit jobs.
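The order in which an SL1 project's jobs draw on credits can be summarised as a small decision sketch: current-quarter credits first, then expired credits (used on SL2 terms), and SL4 only once both are exhausted. The class and function names are hypothetical, each job is assumed to be charged to a single pool, and transfers from future quarters are left out because they are arranged manually with support.

```python
from dataclasses import dataclass


@dataclass
class Sl1Project:
    """Hypothetical bookkeeping for one SL1 project in the current quarter."""
    quarter_credits: float  # remaining guaranteed credits for this quarter
    expired_credits: float  # expired credits available, with no usage guarantee


def charge_job(project: Sl1Project, cost: float) -> str:
    """Deduct a job's cost from the first pool that can cover it; return which was used.

    Simplification: a job is never split across pools.
    """
    if project.quarter_credits >= cost:
        project.quarter_credits -= cost
        return "SL1"
    if project.expired_credits >= cost:
        project.expired_credits -= cost
        return "expired"
    # Both pools exhausted (for this job): it runs as residual usage and is not charged.
    return "SL4"
```

For a project holding 100 quarterly and 50 expired credits, an 80-credit job would be charged to the quarterly pool, a following 40-credit job to the expired pool, and a third 40-credit job would fall through to SL4.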
Service level 2 (SL2) is the same as SL1 except that there is no preallocation of credits into specific quarters, no predefined minimum quarterly usage level, and usage credits do not expire at the end of the quarter. Instead, funds are converted into credits at the same rate as under SL1, and the credits remain available for use until exhausted. This service level also has the highest quality of service, QOS1, and is designed for groups which require smaller amounts of computer time.
When SL2 users exhaust their credits they move down to their next eligible service level, unless more credits are purchased.
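The practical difference between SL1 and SL2 shows up at the end of each quarter. The sketch below models the rollover: unused SL1 credits move to the expired credit account, while an SL2 balance carries over unchanged. Tracking expired credits as a single running total, and the name `end_of_quarter`, are assumptions made for illustration.

```python
def end_of_quarter(service_level: str, balance: float, expired_pool: float) -> tuple[float, float]:
    """Return (balance carried into the next quarter, updated expired credit total).

    SL1: unused quarterly credits expire and move to the expired credit account;
         the next quarter starts from its own preallocated amount (not modelled here).
    SL2: credits do not expire, so the balance carries over unchanged.
    """
    if service_level == "SL1":
        return 0.0, expired_pool + balance
    if service_level == "SL2":
        return balance, expired_pool
    raise ValueError(f"no credit balance to roll over for {service_level}")
```

With 300 unused credits, an SL1 project would start the next quarter with only its new allocation and 300 more expired credits, whereas an SL2 project would simply keep its 300 credits.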
Please contact support'at'hpc.cam.ac.uk for enquiries about the cost of core hours under SL2 (internal users see this page).
Purchase orders should be emailed to Fay Hider at the UIS (fay.hider'at'uis.cam.ac.uk) copied to Stuart Rankin (sjr20'at'cam.ac.uk).
Service Level 3 (SL3) operates with the medium quality of service, QOS2, which is lower than QOS1 used by SL1 and SL2. This service level is designed for groups with medium usage requirements who currently do not have funding to pay for their usage, so no conversion of funds into credits is required.
SL3 is capped at a maximum of 200,000 core hours per quarter. The cap was introduced to promote a more even use of the free time on the system.
There is no guaranteed minimum usage level for SL3 and there is no concept of expiry and recycling of core hours across quarters.
Once a group in SL3 has consumed its allowed core hours for a quarter, its users' jobs default to Service Level 4 for the rest of that quarter.
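The SL3 cap amounts to a per-quarter threshold check. In the minimal sketch below, a job is assumed to stay in SL3 only if it fits entirely under the 200,000 core hour cap; how the real accounting treats a job that straddles the cap is not stated in the policy. The function name is hypothetical.

```python
SL3_QUARTERLY_CAP = 200_000  # core hours per quarter, as stated in the policy


def sl3_service_level(used_this_quarter: float, job_core_hours: float) -> str:
    """Return the service level a new job from an SL3 group would run under.

    Assumption: the job remains in SL3 only if it fits entirely under the cap.
    """
    if used_this_quarter + job_core_hours <= SL3_QUARTERLY_CAP:
        return "SL3"
    return "SL4"
```

Because the cap resets each quarter and nothing carries over, `used_this_quarter` would be reset to zero at each quarter boundary.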
Service Level 4 (SL4) operates with QOS3, the lowest quality of service in operation on the cluster. It is the default service level for users who are not eligible for a higher one.
SL4 jobs run only when there are no other eligible jobs in the queue, and only jobs with small core counts can run under SL4. Users relying on SL4 should expect very long wait times.
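Why SL4 waits can be long is easiest to see in a deliberately simplified model of QOS ordering. The sketch maps SL1 and SL2 to QOS1, SL3 to QOS2 and SL4 to QOS3, as described above, and never starts an SL4 job while any higher-QOS job is eligible. It is not the cluster's actual scheduler (real schedulers also consider backfill, fair-share and other factors), and `sl4_max_cores` is a hypothetical stand-in for the unspecified "small core count" limit.

```python
from dataclasses import dataclass
from typing import Optional

# Lower rank = higher quality of service: SL1/SL2 -> QOS1, SL3 -> QOS2, SL4 -> QOS3.
QOS_RANK = {"SL1": 1, "SL2": 1, "SL3": 2, "SL4": 3}


@dataclass
class Job:
    name: str
    service_level: str
    cores: int
    eligible: bool = True


def next_job(queue: list[Job], free_cores: int, sl4_max_cores: int) -> Optional[Job]:
    """Pick the next job to start under a simplified QOS ordering (no backfill)."""
    waiting = [j for j in queue if j.eligible]
    higher = [j for j in waiting if QOS_RANK[j.service_level] < 3]
    if higher:
        # Highest QOS first; min() keeps queue order among jobs of equal rank.
        best = min(higher, key=lambda j: QOS_RANK[j.service_level])
        # While a higher-QOS job is waiting, no SL4 job starts, even if best does not fit yet.
        return best if best.cores <= free_cores else None
    # Only SL4 jobs remain: start the first small job that fits in the free cores.
    for job in waiting:
        if job.cores <= min(free_cores, sl4_max_cores):
            return job
    return None
```

In this model a single large SL3 job waiting for nodes is enough to hold back every SL4 job, which mirrors the policy's warning about long wait times.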