Linux

New scheduled downtime of fimm due to cooling stop

lsz075 • January 18, 2008

There will be a scheduled downtime on fimm and its filesystems starting Monday, January 21st, at 08:00. This is due to a replacement of the cooler in the machine room.

We are very sorry for the inconvenience this very short notice may cause you.

The workmen responsible for the replacement have not given us an estimated time for when the cooling will be in place, but the work is expected to take about 2 days. More information will be posted here when we know it.

Update Monday, 1730: The coolers are in place and we are waiting for the plumbers to connect them to the cold-water system.

Update Tuesday, 1115: The plumbers have started connecting the coolers. Due to complications with this work, we do not expect to be able to turn the cooling back on until Thursday afternoon at the earliest.

Update Friday, 1130: The fimm login node will be rebooted and upgraded shortly. This upgrade should take about 30 minutes.
The plumbers are still working on the cooling, but we hope to be up and running later today.

Update Friday, 1140: The fimm login node has now been upgraded.

Update Friday, 1515: The cooling is now in place and fimm is again operational.

Scheduled downtime of fimm due to cooling stop

lsz075 • December 20, 2007

On Wednesday, January 2nd, fimm has to be shut down for a period because the cooler has to be moved from the old electrical supply to the new one.

The shutdown will start at 07:00 and last until 14:00.

Cooling failure

lsz075 • December 12, 2007

The old cooler (soon to be replaced) has failed.
The fimm cluster therefore had to be shut down to avoid overheating.

Update, 14:45: We now have the fimm frontend with filesystems operational for the moment. The queue will accept jobs, but they will not start since the nodes are not up. See note about scheduled downtime due to cooling system upgrade.

Update, Dec. 13th, 09:10: Due to limited cooling the frontend will be unavailable until the scheduled downtime has been completed.

Update, 19th 16:30: The cooling has now been started again (see also the updates on the scheduled downtime note).

Filesystem instability on fimm

lsz075 • December 11, 2007

The global filesystems (GPFS) on fimm are unstable at the moment. We are investigating the underlying cause of this problem.
Sorry for the inconvenience this causes you.

Update, 12th 14:45: We now expect the instability problem to be solved (but see other notes for downtime information).

Switch failure on fimm

lsz075 • November 22, 2007

One of the switches has failed on fimm. Because the file-servers were connected to this switch the filesystems went down.
We have now re-routed the file-servers and frontend to one of the other switches and changed the switch-interconnect. Most of fimm is now back up again, including the filesystems, but jobs that were running have failed because the filesystems were unavailable.

Scheduled downtime on fimm/filesystems

lsz075 • November 5, 2007

Due to a necessary re-configuration of the power in the machine room, there will be a downtime of the fimm cluster and the filesystems (/migrate, /bcmhsm, /bjerknes*, as well as the fimm filesystems /work, /work2, /home).
All running jobs will of course be killed when the nodes are shut down.

The planned downtime is on Monday November 12th from 09:00 to 14:00. It *may* be somewhat shorter if all goes well.

Questions can be sent to support-uib@notur.no

We are sorry for the inconvenience this may cause you.

Update, Mon. 12th 14:50: fimm is now back up again. In addition to the power work, we used the downtime for a necessary kernel and GPFS upgrade.

Scheduled maintenance / upgrade of fimm

lsz075 • September 4, 2007

There will be an upgrade of fimm from Rocks 4.1 to Rocks 4.3 on Wed. Sep. 12th. Expected downtime is from 08:00 to 14:00. The new system will have an updated OS, compilers and software, and will be integrated with the grid-related activities.

This notice may be updated with more information at a later time.

GPFS filesystem problem on fimm

lsz075 • July 23, 2007

The GPFS filesystem hung at 01:30 in connection with a power outage, and was forcibly unmounted and remounted at 08:00. All running jobs were killed. The filesystem is now OK again.

Reboot of fimm frontend

lsz075 • January 23, 2007

Reboot of fimm frontend to clear filesystem hang.

Scheduled maintenance on /net/bcmhsm and /net/bjerknes1

lsz075 • January 4, 2007

We need to carry out scheduled maintenance (a firmware upgrade) on the disk system for /net/bcmhsm (symlinked from /migrate for BCCR users) and /net/bjerknes1. Note that /net/bcmhsm is mounted as /bcmhsm on fimm.

/net/bcmhsm and /net/bjerknes1 will be unavailable on Monday the 15th from 09:00 to 11:00 (possibly back earlier if all goes well).

Update (11:00): /net/bcmhsm and /net/bjerknes1 are now up again. The downtime was also used to apply a security update to the backup-server (which hosts /net/bcmhsm).

HPC Syslog

Log of changes and events on UiB's HPC systems