Scheduled maintenance

Hexagon will be unavailable on October 27th from 14:00 due to scheduled maintenance. Some faulty hardware will be replaced and some software will be updated. The maintenance is expected to take about 5 hours.

Update: The scheduled maintenance has been moved to November 5th at 10:00.

Update Nov 5th 10:00: Maintenance has now started.

Update 18:30: We are having problems with the /work file system and are working on solving them.

Update 23:00: Hexagon is now running again.

As previously noted, we will have a scheduled downtime from 16:00 Tuesday July 8th. We will replace a faulty module and do some I/O-benchmarking which requires a reserved system. It is estimated that the machine will be available for login at 19:00.

Update, 16:00: hexagon is shut down for hardware replacement.
Update, 16:45: hexagon is up.
Update, 19:10: hexagon is up and allowing users to log in.

Hexagon will undergo planned maintenance for a software upgrade on Monday June 16th, starting at 14:00 and expected to last approximately 3 hours.

The Cray software release will be upgraded from 2.0.44 to 2.0.53.
This release will have more quad-core optimizations as well as a new version of the MPI library. We therefore recommend that you recompile your programs and libraries after the upgrade. We will notify you when we have recompiled the libraries/modules installed by us.

Update 16th, 14:40: System taken down.
Update 16th, 19:30: System back online with version 2.0.53 and MPT 3.0.

Check here for updates on the recompiled libraries:

All compute-node (cnl) software has been re-compiled.
Most login node software has been recompiled, except GNUPLOT.
UPC has not been re-compiled yet.

The previously postponed maintenance of hexagon (http://www.parallaw.uib.no/syslog/154) is now scheduled for Thursday April 24th from 14:00 to approximately 18:00.

This note will be updated as we learn more about the maintenance.

Update, Thursday 24th:

14:10: System is taken down for diagnostics and init change.
14:30: Hardware work begins.
16:00: Hardware work ends.
17:10: System is up and running.

Hexagon will have planned downtime on Wednesday April 9th from 13:00 to approximately 16:00.
The maintenance will replace bad CPUs after the quad-core upgrade. A number of CPU replacements are expected after a major CPU upgrade, and the current failure rate is within expected levels.

We will update this note with more information.

Update, Wednesday 9th 13:00: The scheduled maintenance is postponed to a time not yet determined. We will update this note once a new time is set.

Update, Friday 11th 19:00: The maintenance on Monday the 14th is related to this postponed work and will reduce the number of failed nodes.

Early on March 26th hexagon will be shut down for the initial quad-core upgrade. We hope to have parts of the machine up while the second half is upgraded. Nevertheless, the entire machine will have to be taken down first, before being booted to a smaller size. The physical upgrade will probably take three days, followed by some additional days of tuning and reconfiguration.

One very important consequence is that ALL programs and libraries will have to be re-compiled once hexagon is booted up after the upgrade is finished.

Wednesday, 09:00: Upgrade has started. Machine is now down for a while for diagnostics.

Wednesday, 12:30: Half of the machine is now running again, while the other half is being upgraded to quad-core. We expect to take the entire machine down Friday morning. Please consider the machine to be in a testing state, so unannounced downtime might occur.

Wednesday, 16:45: The upgrade is ahead of schedule, so the machine will be taken down tomorrow around 10am.

Thursday, 12:00: Two racks are now running and will keep running until tomorrow morning, Friday 28th, when the entire machine will be shut down at 8am. The machine will then stay down until Monday at the earliest.

Friday, 08:00: The hardware part of the upgrade is now finished. The machine is unavailable until the software upgrade, diagnostics and testing have finished.

Saturday, 17:00: The main part of the software upgrade is finished. The machine is running, but is unavailable due to testing.

Tuesday, April the 1st, 18:00: Hexagon is now available again, see http://www.parallaw.uib.no/syslog/153 for more details.

Because of a faulty fc-switch, we are scheduling maintenance on hexagon on Wednesday the 6th of February at 14:00.
The maintenance should not take longer than an hour.

Update, Tuesday, 11:00: Due to the hexagon reboot, we replaced the fc-switch today. The maintenance scheduled for tomorrow is therefore canceled.

Update, Tuesday, 12:15: The scheduled maintenance is completed and the machine is running again.