The global file system on fimm hung around 17:30 today. We are working on solving the problem.
Update: 18:15: File system is now working again. All jobs running at the time of the hang have crashed and have to be resubmitted.
Author Archives: lsz075
Scheduled downtime on hexagon on Monday
On Monday the 14th at 14:00, hexagon will be shut down.
An upgrade of hexagon's firmware will solve the problem with all the failing nodes on the system. The machine should be online within two to three hours.
Update: 15:45: Hexagon is now up again. The mppmem bug is also fixed.
Failure on io-node on hexagon
A failure on io-node for the /work filesystem on hexagon means we will have to stop hexagon briefly to fix the issue.
Scheduled maintenance on hexagon
Hexagon will have planned downtime on Wednesday April 9th from 13:00 to approximately 16:00.
The maintenance will replace bad CPUs after the quad-core upgrade. A number of CPU replacements are expected after a major CPU upgrade, and the current failure rate is within expected levels.
We will update this note with more information.
Update, Wednesday 9th 13:00: The scheduled maintenance is postponed to a time not yet determined. We will update this note when we know when the maintenance can take place.
Update, Friday 11th 19:00: The maintenance on Monday the 14th is related to this postponed issue and will reduce the number of failed nodes.
Quad-core upgrade of hexagon finished
The quad-core upgrade is now mostly complete.
We have not managed to recompile all the software yet.
REMEMBER: Your software has to be recompiled to work on the quad-cores.
Quad-core upgrade of hexagon
Early on March 26th hexagon will be shut down for the initial quad-core upgrade. We hope to keep part of the machine up while the second half is upgraded. The entire machine will nevertheless be taken down first, before being booted at a smaller size. The physical upgrade will probably take three days, followed by some more days of tuning and reconfiguring.
One very important part of this is that ALL programs and libraries will have to be recompiled when hexagon is booted up after the upgrade is finished.
Wednesday, 09:00: Upgrade has started. Machine is now down for a while for diagnostics.
Wednesday, 12:30: Half of the machine is now running again, while the other half is being upgraded to quad-core. We expect to take the entire machine down Friday morning. Please consider the machine to be in a testing state, so unannounced downtime might occur.
Wednesday, 16:45: The upgrade is ahead of schedule, so the machine will be taken down tomorrow around 10am.
Thursday, 12:00: Two racks are now running and will run until tomorrow morning, Friday 28th, when the entire machine will be shut down at 8am. The machine will then stay down until at least Monday.
Friday, 08:00: Hardware part of upgrade is now finished. The machine is unavailable until the software upgrade, diagnostics and testing have finished.
Saturday, 17:00: Main part of software upgrade is finished. The machine is running, but is unavailable due to testing.
Tuesday, April the 1st, 18:00: Hexagon is now available again, see http://www.parallaw.uib.no/syslog/153 for more details.
Shutdown of hexagon due to a short cooling stop
We need to shut down hexagon for a short cooling stop to fix a minor water leak in the new cooling system.
The downtime is scheduled for Tuesday March 4th from 08:00 to 09:00.
Update: The machine was down from 08:15 to 09:15.
Power shutdown of fimm Thursday the 28th
Because of some additional power installation in our machine room, fimm has to be shut down on Thursday, February the 28th at 07:00.
The shutdown should not last more than an hour.
We are very sorry for the inconvenience caused by this.
Update Feb. 26th: Shutdown has been moved to 07:00, which means fimm will be shut down shortly before then.
Update Feb. 28th, 08:00: Fimm should now be running as normal.
Power shutdown on all systems Monday the 18th
All power to the Data block at HiB will be shut down on Monday the 18th of Feb. from 18:00 to 19:00. We therefore have to shut down all our systems prior to the power shutdown. We will start shutting down systems at 17:30.
Update, Monday, 20:25: All systems are now running again.
File system crash on fimm
A file system crash occurred at 19:50 today. We are working on fixing the problem.
Update 22:00: All file systems are now up and running again. All jobs that were running at the time of the system crash have to be resubmitted.