The Wilkes cluster
Cambridge HPC Service deploys the UK's fastest GPU supercomputer and the world's most energy-efficient air-cooled HPC system as part of the new "SKA Open Architecture Lab".
In November 2013, the University of Cambridge, in partnership with Dell, NVIDIA and Mellanox, deployed the UK's fastest academic cluster, named Wilkes. The system consists of 128 Dell T620 servers containing 256 NVIDIA K20 GPUs, interconnected by 256 Mellanox Connect-IB cards. This produces a computational performance of 240 TF, placing the system at position 166 in the November 2013 Top500 list. The system was part-funded by STFC with industrial sponsorship from Rolls-Royce and Mitsubishi Heavy Industries.
Immediately after its deployment, the system was ranked No. 2 in the November 2013 Green500 list with an extremely high energy efficiency of 3,631 MFLOPS/W. At that time, it was the most energy-efficient air-cooled supercomputer in the world (the No. 1 position was held by a specialist full-immersion oil-cooled system). The high energy efficiency is due to two factors: firstly, the very high performance per watt provided by the NVIDIA K20 GPU, and secondly, the industry-leading energy efficiency of the Dell T620 server.
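The efficiency figure implies an approximate power envelope for the whole machine. A minimal back-of-envelope sketch from the two numbers quoted above (Green500 efficiency is measured against a Linpack run, so this only approximates the actual draw):

```python
# Illustrative arithmetic from the figures quoted in the text, not
# official measurement data.
PERF_FLOPS = 240e12              # quoted system performance, 240 TF
EFFICIENCY_FLOPS_PER_W = 3631e6  # quoted Green500 efficiency, 3,631 MFLOPS/W

# Efficiency = performance / power, so power = performance / efficiency.
implied_power_w = PERF_FLOPS / EFFICIENCY_FLOPS_PER_W
print(f"Implied power draw: {implied_power_w / 1e3:.0f} kW")  # → roughly 66 kW
```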
The newly deployed GPU supercomputer is part-funded by STFC to drive SKA computing system development within the newly formed "SKA Open Architecture Lab". The SKA is a multinational collaboration to build the world's largest radio telescope, at the centre of which is a requirement for the world's largest streaming data processor, many times larger than the most powerful HPC system in operation today. The new GPU system will take a central role in driving system development for the SKA, placing STFC and the University of Cambridge at the forefront of large-scale big-data science.
The network was specifically architected to provide the highest I/O bandwidth for large-scale big-data challenges and the highest possible message rate for large parallel application scaling. To achieve this, Mellanox was chosen to help architect the interconnect, and the system was built using a dual-rail Connect-IB network providing a fully non-blocking node-to-node bandwidth of over 100 Gb/s with a message rate of 137 million messages per second. The system is designed to utilise NVIDIA GPUDirect RDMA communication acceleration, which will significantly increase the system's parallel efficiency.
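A rough sketch of the dual-rail arithmetic behind the quoted bandwidth, assuming each Connect-IB rail runs a 4x FDR InfiniBand link at a 56 Gb/s signalling rate (the exact link configuration is an assumption, not stated in the text):

```python
# Assumed per-rail link: 4x FDR InfiniBand, 4 lanes x 14 Gb/s signalling.
FDR_LINK_GBPS = 56
RAILS = 2  # dual-rail network, one link per rail

aggregate_gbps = FDR_LINK_GBPS * RAILS
print(f"Aggregate node-to-node signalling rate: {aggregate_gbps} Gb/s")
# FDR's 64b/66b encoding shaves a few percent off the usable data rate,
# which is consistent with the "over 100 Gb/s" figure quoted above.
```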
The new GPU system at 240 TF and the existing Cambridge Intel-based supercomputer at 185 TF are now housed in a new ultra-efficient water-cooled HPC data centre. The data centre is one of the most efficient HPC data centres in the world, using a combination of evaporative coolers and back-of-rack water heat exchangers. It has a spot PUE of 1.075, which is 30% more energy efficient than the previous University HPC data centre. This new green data centre combined with the new green GPU supercomputer produces an overall HPC facility energy-efficiency increase of 150%: the whole facility now delivers 2.5 times the computational output for the same energy usage. The Cambridge HPCS facility is also one of only two academic HPC centres in the UK, and one of only eight worldwide, to have two HPC systems in the top half of the Top500.
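The two efficiency claims above can be checked with simple arithmetic. PUE (power usage effectiveness) is total facility power divided by IT equipment power, and a 150% efficiency increase is the same statement as a 2.5x output multiplier:

```python
# A spot PUE of 1.075 means the facility draws only 7.5% more power
# than the compute hardware itself (cooling, distribution, etc.).
spot_pue = 1.075
overhead_fraction = spot_pue - 1.0
print(f"Cooling/power overhead: {overhead_fraction:.1%}")

# A 150% increase in output per unit of energy is a 2.5x multiplier:
efficiency_increase = 1.50
output_multiplier = 1.0 + efficiency_increase
print(f"Output per unit energy: {output_multiplier}x original")
```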