Author Archives: lsz075

About lsz075

The IT department

Maintenance summary

lsz075 • June 14, 2005

Regatta nodes TO and TRE were down from 08:00 to 12:45
for a firmware update.

Regatta node EN was down from 08:00 to 16:00
for a firmware update and replacement of a 32 GB memory module.
This node had problems booting from its root disks after the hardware changes.
Moving the disks to TO and back again made EN bootable (unclear why).

Linux cluster FIRE was down from 08:00 to 16:00 due to its dependency on disks on EN.

Scheduled downtime on TRE+FIRE

lsz075 • June 7, 2005

The regatta cluster TRE will be down Tuesday, June 14, 08:00-14:00 for firmware upgrades and replacement of a failed memory module on one of the nodes. Running jobs will be killed and will have to be resubmitted after the maintenance stop.

The Linux cluster FIRE will also be down during this period, because it depends on the regatta as its file server.

Switch problems on fimm

lsz075 • May 13, 2005

Twelve ports on one of the switches in the cluster stopped working at 02:00 last night, so we lost connection to 12 of the nodes for ~7 hours.

Affected nodes:

compute-0-18 compute-0-16 compute-0-11 compute-0-8 compute-0-7 compute-0-6 compute-0-5 compute-0-4 compute-0-3 compute-0-2 compute-0-1 compute-0-0

To resolve the problem, the failing switch had to be rebooted. This led to a short (~30s) failure/unmount of the /work* and /home/fimm filesystems on all nodes. It is uncertain how this affected running jobs. Most seem to have handled it without problems...

FIMM downtime

lsz075 • May 9, 2005

FIMM was down for scheduled maintenance 2005/05/09 08:00-10:00, i.e. 2 hours of downtime for the full cluster.

The work that was done was:

o upgraded firmware on SATABlades

o moved /local from the local disk of each node to a shared disk, to save precious space for local /scratch usage.

NOTUR 2005 conference, http://www.notur.no/notur2005

lsz075 • May 3, 2005

The 5th annual gathering on High Performance Computing in Norway will be held in Trondheim, May 30-31, 2005. Please see http://www.notur.no/notur2005 for details.

Scheduled downtime on fimm

lsz075 • April 29, 2005

Fimm will be down Monday, May 9th, 08:00-12:00 for firmware upgrades on the SATABlade disk solution, and possibly other minor changes. This is to fix the bug that triggered the disk crashes on March 30th.

http://www.parallaw.uib.no/syslog/56

Intel compilers upgraded on fimm

lsz075 • April 20, 2005

The Intel Fortran and C/C++ compilers were upgraded from v8.1.023 to v8.1.027. This should fix a couple of internal compiler errors we've been triggering.