Due to a power blink this morning at 08:00, the non-UPS part of hexagon went down. We will carry out the maintenance scheduled for Wednesday before turning the machine on again. Further information about the progress will be posted here.
Update, 12:15: The scheduled maintenance is complete and the machine is running again.
Author Archives: lsz075
Scheduled maintenance for hexagon
Because of a faulty FC switch, we are scheduling maintenance on hexagon on Wednesday, 6 February, at 14:00.
The maintenance should not take longer than an hour.
Update, Tuesday, 11:00: Because of the hexagon reboot, we replaced the FC switch today. Tomorrow's maintenance is therefore canceled.
Update, Tuesday, 12:15: The scheduled maintenance is complete and the machine is running again.
The new Cray XT4 machine “hexagon” is now available
The new machine "hexagon.bccs.uib.no", a Cray XT4 MPP, is now available for users. The machine was installed at the beginning of January this year and has so far had only a few test users. Later this spring it will be upgraded from its current dual-core configuration to the final quad-core configuration. The formal acceptance test will be run after the upgrade.
More information about the machine can be found on the documentation pages.
The machine is available for "local" users from the University of Bergen, IMR and NERSC. These users can apply for an account using the form found here.
In addition, users who have been allocated a quota by the NOTUR consortium will have access to the NOTUR part of the machine.
Any questions about support, the documentation or related matters can be sent to hpc-support@hpc.uib.no or the alias support-uib@notur.no.
New scheduled downtime of fimm due to cooling stop
There will be a scheduled downtime on fimm and its filesystems starting Monday, January 21st, at 0800. This is due to the replacement of the cooler in the machine room.
We are very sorry for the inconvenience this very short notice may cause you.
The workmen responsible for the replacement have not yet given us an estimate of when the cooling will be back in place, but the work is expected to take about two days. More information will be posted here as it becomes available.
Update Monday, 1730: The coolers are in place and we are waiting for the plumbers to connect them to the cold-water system.
Update Tuesday, 1115: The plumbers have started connecting the coolers. Because of complications, we do not expect to be able to turn the cooling back on before Thursday afternoon at the earliest.
Update Friday, 1130: The fimm login node will be rebooted and upgraded shortly. This upgrade should take about 30 minutes.
The plumbers are still working on the cooling, but we hope to be up and running later today.
Update Friday, 1140: The fimm login node has now been upgraded.
Update Friday, 1515: The cooling is now in place and fimm is again operational.
Scheduled downtime of fimm due to cooling stop
On Wednesday, January 2nd, fimm has to be shut down for a period because the cooler must be moved from the old power supply to the new one.
The shutdown will start at 7:00 and last until 14:00.
Cooling failure
The old cooler (soon to be replaced) has failed.
Therefore, the fimm cluster had to be shut down to avoid overheating.
Update, 14:45: The fimm frontend and its filesystems are operational for the moment. The queue will accept jobs, but they will not start since the nodes are not up. See the note about the scheduled downtime due to the cooling system upgrade.
Update, Dec. 13th, 09:10: Due to limited cooling the frontend will be unavailable until the scheduled downtime has been completed.
Update, 19th, 16:30: The cooling has now been started again (see also the updates in the scheduled downtime note).
Filesystem instability on fimm
The global filesystems (GPFS) on fimm are experiencing some instability at the moment. We are investigating the underlying cause of this problem.
Sorry for the inconvenience this causes you.
Update, 12th, 14:45: We now believe the instability problem has been solved (but see the other notes for downtime information).
Scheduled downtime due to cooling upgrade
There will be a scheduled downtime on fimm and associated services starting on Thursday 13th Dec. The length of the downtime is uncertain at this time, but we expect to be up again on Tuesday 18th Dec.
The downtime is due to a major cooling system upgrade in the machine room. We expect to need a second, smaller downtime to finalize this upgrade in January.
Further information will be posted here when we know more.
Update, 12th, 14:45: Because of the cooling failure (see the other note above), work on the cooling upgrade will start earlier.
Update, 13th, 09:10: The frontend will now be unavailable until the upgrade is completed.
Update, 17th 19:00: The expected startup of the cooling system on Tuesday has been delayed. We estimate the startup of the power and cooling systems around 12:00 Wednesday.
Update, 19th, 12:00: The expected startup of the cooling system has been delayed by a couple of hours.
Update, 19th 16:30: Finally fimm is up and running again.
Downtime of /migrate, /bcmhsm, /bjerknes* on Monday
On Monday, December 10th, we will have maintenance on the backup server and Bregne from 8am to 4pm. This will affect the following filesystems mounted on fimm:
/bcmhsm
/migrate
/bjerknes1
/bjerknes2
/bjerknes3
If the maintenance finishes earlier than planned, we will report it here.
Update, Dec. 10, 16:45: All filesystems are now online again.
Fimm login node will be rebooted
The login node on fimm will be rebooted today at 16:30.
This is related to the switch failure. A reboot is needed to get the system running correctly again. The machine should be up within minutes.
All jobs, except interactive, should be unaffected by the reboot.
Interactive jobs will be killed.
Update, 17:15: Fimm has been rebooted and is running as normal.