News
System maintenance on Tuesday 11th June 2013
4th June 2013
There will be a full system maintenance affecting all services on Tuesday 11th June 2013 commencing at 10:00. Please save all files, quit all applications and log off before 10am.
Power outage morning of Friday 17th May
14th May 2013
Estates have informed us of a partial power outage that will affect our machine room on Friday morning (17th May) between 09:30 and 12:00. Batch jobs will need to be stopped during the blackout but we will attempt to maintain login access.
Update: there will now also be a reboot of all login nodes at 09:30. The login nodes will return immediately.
Web site unavailable from Friday evening 10th May until Monday morning 13th May
7th May 2013
Due to a planned power outage in Mill Lane affecting the HPCS office, the web server (only) will be unavailable during the weekend of 11th-12th May (commencing on Friday evening and ending on Monday morning). Darwin service will not be affected.
System maintenance on Tuesday 30th April 2013
24th April 2013
There will be a full system maintenance affecting all services on Tuesday 30th April 2013 commencing at 10:00. Please save all files, quit all applications and log off before 10am.
Login node reboots Friday 1st March 2013
28th February 2013
It is necessary to reboot the login-sand and login-gfx nodes at 11:00 on Friday 1st March. Please move to login-sand5 to avoid this; otherwise, please log off the other nodes before 11:00. The compute nodes will continue to process jobs and the login nodes should be back by roughly 11:30.
System maintenance on Tuesday 19th February 2013
12th February 2013
There will be a full system maintenance affecting all services on Tuesday 19th February 2013 commencing at 10:00. Please save all files, quit all applications and log off before 10am.
System maintenance on Thursday 6th December 2012
1st December 2012
There will be a full system maintenance affecting all services on Thursday 6th December 2012 commencing at 10:00 (note the unusual day). Please save all files, quit all applications and log off before 10am.
System maintenance on Wednesday 21st November 2012
13th November 2012
There will be a full system maintenance affecting all services on Wednesday 21st November 2012 commencing at 10:00 (note the unusual day). Please save all files, quit all applications and log off before 10am.
Login nodes will reboot at 18:00 on Wednesday 7th November 2012
7th November 2012
The login nodes (only) will reboot at 18:00. Please save all files, quit all applications and log off before 6pm. Service will be otherwise unaffected.
Login nodes will reboot at 18:00 on Tuesday 16th October 2012
15th October 2012
The login nodes (only) will reboot at 18:00. Please save all files, quit all applications and log off before 6pm. Service will be otherwise unaffected.
System maintenance on Tuesday 2nd October 2012
26th September 2012 (updated 1st October)
There will be a full system maintenance affecting all services on Tuesday 2nd October commencing at 18:00 (note the unusual time). Please save all files, quit all applications and log off before 6pm. Service will be restored on Wednesday.
Service has been restored, system is operating normally
4th September 2012
Service is now operating normally. Please contact us at support@hpc.cam.ac.uk regarding any problems using your account.
Critical issue extending maintenance
28th August 2012
Maintenance has uncovered a critical issue that requires outside support. We are in the process of restoring the service by using a secondary filesystem. We apologise for the inconvenience caused.
Critical issue extending maintenance
22nd August 2012
Maintenance has uncovered a critical issue that requires outside support. The system will remain down until we have resolved the problem. We apologise for the inconvenience caused.
System maintenance on Tuesday 21st August 2012
9th August 2012
There will be a full system maintenance affecting all services on Tuesday 21st August commencing at 10:00. Please save all files, quit all applications and log off before 10am.
System maintenance on Tuesday 31st July 2012
27th July 2012
There will be a full system maintenance affecting all services on Tuesday 31st July commencing at 10:00. Please save all files, quit all applications and log off before 10am.
System maintenance on Tuesday 17th July 10:00
12th July 2012
There will be a full system maintenance affecting all services on Tuesday 17th July commencing at 10:00. This is not expected to take the entire day.
Darwin3 enters service
26th June 2012
The Darwin3 cluster entered production service today. Some user data is still being copied and those users will be unable to log in until this is completed. Please report all issues to support.
System maintenance on Monday 25th June 10:00
21st June 2012
There will be a full system maintenance affecting all services on Monday 25th June commencing at 10:00 (please note the unusual day). This will be the maintenance to merge the old and new clusters and make the new system generally available.
Position 93 on the June 2012 Top500 list
18th June 2012
The Darwin3 cluster has attained position 93 on the June 2012 Top500 list. This makes it currently the fastest (known) x86_64 cluster in the UK.
System maintenance on Thursday 3rd May 09:00
30th April 2012
There will be a full system maintenance affecting all services on Thursday 3rd May commencing at 09:00 (please note the unusual day). This is to allow essential work towards the ongoing system upgrade. During this maintenance the last Woodcrest nodes will be decommissioned.
System maintenance on Wednesday 25th April 09:00
22nd April 2012
There will be a full system maintenance affecting all services on Wednesday 25th April commencing at 09:00 (please note the unusual day). This is to allow essential work towards the ongoing system upgrade.
Pre-upgrade system maintenance on Thursday 22nd March 10:00
19th March 2012
There will be a full system maintenance affecting all services on Thursday 22nd March commencing at 10:00 (please note the unusual day). This is to perform necessary work in preparation for the upgrade which commences next week. We expect to release most of the current system by the end of the day (some Westmere nodes may be retained to clear outstanding benchmark requests).
Please note that this week is the last full week of full Woodcrest service.
Tuesday 13th March: Core switch reboot
12th March 2012
The ethernet core switch will reboot shortly after 10am on Tuesday 13th March. This will create a brief period during which jobs may be interrupted; otherwise service will continue normally.
No service December 17-18
28th November 2011
There will be no service due to urgent work on our building electricity supply during the weekend of December 17-18. All systems will be shut down on the preceding Friday at 17:30, and restored on the following Monday.
System maintenance on Tuesday 22nd November
14th November 2011
There will be full system maintenance affecting all services commencing 10:00 on Tuesday 22nd November.
/scratch2 maintenance on Friday 7th October
30th September 2011
There will be a special maintenance affecting the /scratch2 filesystem only commencing 11:00 on Friday 7th October.
No service September 17-18
2nd September 2011
There will be no service due to urgent work on our building electricity supply during the weekend of September 17-18. All systems will be shut down on the preceding Friday at 17:30, and restored on the following Monday after a period of maintenance.
Limited maintenance Tuesday 23rd August 2011
22nd August 2011
The /scratch filesystem will undergo a corrective action during the maintenance period beginning 10:00 on Tuesday 23rd August, designed to restore full performance following a previous hardware failure. It may be possible to perform this transparently to jobs and interactive sessions; otherwise there may be a brief hiatus affecting access to /scratch. If commands attempting to write to /scratch block, please wait for the filesystem to return.
Some Westmere nodes may also be removed temporarily from service to allow benchmarking work.
Storage maintenance Tuesday 2nd August 2011
1st August 2011
The /scratch2 filesystem will be taken offline for approximately 45 minutes at 10:00 on Tuesday 2nd August for maintenance action. Jobs and commands attempting to write to /scratch2 will block until the filesystem returns; please wait for this to occur.
Compute node maintenance Tuesday 19th July
15th July 2011
All compute nodes will reboot commencing 10:00 on Tuesday 19th July. Running jobs will be requeued.
Login node reboots at 18:00 - Friday 15th July 2011
15th July 2011
All login nodes will reboot at 18:00.
Storage maintenance Tuesday 5th July 2011
4th July 2011
The /scratch2 filesystem will be taken offline for approximately 45 minutes at 10:00 on Tuesday 5th July for maintenance action. Jobs and commands attempting to write to /scratch2 will block until the filesystem returns; please wait for this to occur.
Storage maintenance Tuesday 28th June 2011
27th June 2011
The /scratch2 filesystem will be taken offline for approximately 30 minutes at 10:00 on Tuesday 28th June for maintenance action. Jobs and commands attempting to write to /scratch2 will block until the filesystem returns; please wait for this to occur.
Storage and Westmere maintenance Tuesday 14th June 2011
11th June 2011
The /scratch2 filesystem will be taken offline for approximately 45 minutes at 10:00 on Tuesday 14th June for maintenance action. Jobs and commands attempting to write to /scratch2 will block until the filesystem returns; please wait for this to occur. In addition, all Westmere nodes will be dedicated to storage benchmarks from 10:00. This will involve temporary suspension of job processing on these nodes but login access and other compute nodes will continue to operate as normal.
Reduced Westmere maintenance Tuesday 7th June 2011
7th June 2011
There will be a reduced maintenance affecting two computational units of Westmere nodes only during the afternoon of Tuesday 7th June. This will involve temporary suspension of job processing on these nodes but login access and other compute nodes will continue to operate as normal.
Storage maintenance Tuesday 24th May 2011
20th May 2011
There will be system maintenance commencing 10:00 on Tuesday 24th May in order to perform an essential change to one storage unit. Login access will continue throughout; however, there will be a temporary suspension of access to /scratch2 and possibly a reboot of all Westmere nodes. Commands and jobs attempting to access /scratch2 may hang and the batch queues will be suspended while the change is performed.
Storage maintenance Tuesday 10th May 2011
4th May 2011
There will be system maintenance relating to an upgrade of the storage commencing 10:00 on Tuesday 10th May. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued. This maintenance will allow essential filesystem hardware changes.
Westmere maintenance Tuesday 22nd March 2011
21st March 2011
There will be maintenance affecting the Westmere nodes only commencing 10:00 on Tuesday 22nd March. This will involve temporary suspension of job processing on these nodes but login access and other compute nodes will continue as normal.
System maintenance Tuesday 22nd February 2011
17th February 2011
There will be a system maintenance commencing 10:00 on Tuesday 22nd February. This will involve temporary removal of some compute nodes from service but login access and other compute nodes will continue as normal.
System maintenance Tuesday 8th February 2011
3rd February 2011
There will be a full system maintenance commencing 10:00 on Tuesday 8th February. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued. This maintenance will allow necessary filesystem checks, hardware changes, and firmware updates.
Login node reboots to take place at 18:00 on Tuesday 21st December 2010
20th December 2010
All login nodes will reboot commencing 6pm. This will be done in a rolling way without interrupting running jobs:
18:00 bindloe03, bindloe04, pinta02, all mostro and all planck nodes will reboot
18:30 (approx) bindloe01, bindloe02, pinta01 will reboot.
The timing of the second wave of reboots depends slightly on how long the first set takes to come back (probably about 20 minutes). These reboots will implement some security-related updates. There will also be a rolling reboot of the compute nodes. This will occur as jobs finish and release nodes, and should be transparent.
[CANCELLED] System maintenance Thursday 9th December 2010
5th December 2010
There will be a full system maintenance commencing 10:00 on Thursday 9th December (please note the day). This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued. This maintenance has been postponed from Tuesday following the disruptive power cut over the weekend.
System maintenance Tuesday 19th October 2010
15th October 2010
There will be a full system maintenance commencing 10:00 on Tuesday 19th October. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued. It is hoped that the new Westmere-based nodes (an upgrade of 1500 cores) will be available for general use by payers at the end of the maintenance.
Reboot of all login nodes Tuesday 21st September 2010
21st September 2010
There will be a reboot of all login nodes at 18:00 Tuesday 21st September. Queues will continue to operate and login access should be restored after approximately 30 minutes.
System maintenance Tuesday 14th September 2010
10th September 2010
There will be a full system maintenance commencing 10:00 on Tuesday 14th September. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued.
System maintenance and power shutdown Tuesday 31st August
25th August 2010
There will be a full system maintenance commencing 10:00 on Tuesday 31st August. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued. In addition, due to further essential maintenance by Estates Management on our building power supply, there will be a total power shutdown of our machine room at 17:30 extending into Tuesday evening.
Power shutdown Saturday 24th July
24th June 2010
Due to essential maintenance by Estates Management on our building power supply, there will be a total shutdown of Darwin on the evening of Saturday 24th July commencing at 16:30.
System maintenance Tuesday 1st June
27th May 2010
There will be a full system maintenance commencing 10:00 on Tuesday 1st June. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued.
System maintenance Monday 17th May
10th May 2010
There will be a full system maintenance commencing 10:00 on Monday 17th May. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued.
This has been postponed from Tuesday 11th May due to other events.
System maintenance Tuesday 23rd February
19th February 2010
There will be system maintenance commencing 10:00 on Tuesday 23rd February, followed by a series of special 2000-core job runs on behalf of two projects. This will involve temporary suspension of login access and batch queue processing, and jobs running when maintenance commences will be requeued.
System maintenance Tuesday 2nd February
28th January 2010
There will be system maintenance commencing 10:00 on Tuesday 2nd February. This will involve temporary suspension of login access and batch queue processing. Jobs running when maintenance commences will be requeued.
System maintenance Tuesday 15th December
10th December 2009
There will be system maintenance commencing 10:00 on Tuesday 15th December. This will involve temporary suspension of login access and batch queue processing. Jobs running when maintenance commences will be requeued.
Short network interruption Tuesday 24th November
22nd November 2009
There will be an interruption to remote network connectivity at 10:00 on Tuesday 24th November, for approximately 30 minutes, while the Computing Service upgrades our external network connection. Darwin will continue to function normally but new logins will not be possible and existing logins will appear to hang during the interruption.
System maintenance Tuesday 17th November
13th November 2009
There will be system maintenance commencing 10:00 on Tuesday 17th November in order to upgrade all login and compute nodes to Scientific Linux 5.4. This will involve temporary suspension of login access and batch queue processing. Jobs running when maintenance commences will be requeued.
Login node reboots starting 18:00 Thursday 5th November
5th November 2009
The Darwin login nodes will be rebooting starting at 18:00 tonight (5th November).
- bindloe01 and bindloe02 will reboot at 18:00
- bindloe03 and bindloe04 will reboot at 18:30
The reboots should take 15-20 minutes each.
New accounting quarter begins Sunday 1st November
30th October 2009
The current August - October accounting quarter will end at midnight on Saturday 31st October. The next quarter will run from 1st November until 31st January. All SL1 and SL3 projects will receive new core hour allocations at the transition (please see here for full details).
System maintenance on Tuesday 27th October
23rd October 2009
There will be a brief system maintenance commencing 10:00 on Tuesday 27th October involving temporary suspension of login access and batch queue processing. Jobs running when maintenance commences will be requeued.
System maintenance on Monday 12th October
9th October 2009
There will be system maintenance commencing at 08:00 on Monday 12th October (please note the unusual day and time). Login access and batch job processing will be suspended while this work takes place.
System maintenance on Monday 5th October
5th October 2009
There will be system maintenance commencing 11:00 on Monday 8th October (please note the day; this replaces the maintenance previously announced for Thursday) in order to perform essential filesystem work. Login access and batch job processing will be suspended while this work takes place.
SL5 upgrades Tuesday 29th September
28th September 2009
During the morning of Tuesday 29th September, bindloe03 and bindloe04 will be upgraded to Scientific Linux 5. This will temporarily interrupt access to these two login nodes (only). No other services will be affected.
System maintenance Tuesday 1st September
28th August 2009
There will be system maintenance commencing 10:00 on Tuesday 1st September to perform filesystem work.
System maintenance Tuesday 18th August
5th August 2009
There will be system maintenance commencing 10:00 on Tuesday 18th August to allow further benchmarking and upgrade preparation work.
System maintenance Tuesday 11th August
5th August 2009
There will be system maintenance commencing 10:00 on Tuesday 11th August to allow benchmarking and upgrade preparation work.
System maintenance Wednesday 22nd July
20th July 2009
There will be system maintenance commencing 10:00 on Wednesday 22nd July to allow filesystem work. Please note that this has been moved from Tuesday.
Web pages update
9th July 2009
The HPCS web pages have received some much-needed attention this week. Please send feedback to support.
