Author Archives: lsz075

About lsz075

The IT department

tre reboot

lsz075 • June 12, 2006

Because new security updates have been installed, tre must be rebooted. This will hopefully also solve the problems with the TotalView debugger.
Expected downtime: 1h (starting from Mon, 10:00)

Update: Mon, 12:45 - a disk import problem caused a longer downtime. Everything should be up and running again.

Downtime: 2h 45'

Bjerknes fileserver bregne down for OS upgrade

lsz075 • May 2, 2006

bregne will be upgraded to CentOS 4. During this upgrade /net/bjerknes1 will be unavailable until approx. 13:00.

13:10 Update: The upgrade is complete and the filesystem is back.

New NOTUR cpu-quota

lsz075 • April 3, 2006

The CPU quota for the period 2006-1 has now been activated on tre and fire (and fimm). Send a request to support-uib@notur.no if your quota access is wrong. Please note that, according to prior agreements, the projects nn1118k, nn2343k, nn2701k and nn2980k on fire have been transferred to fimm with a CPU factor of 4:1. Other projects on fire need to send a request to move any quota.

Memory-hang on TRE

lsz075 • March 15, 2006

Some process managed to use up all memory on tre around 16:03. The node is currently rebooting.

Update 16:48: Tre is now up again. Jobs running on tre were lost (jobs on to and en were not affected). 24 CPU hours of downtime (0.75*32).
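The downtime figure above can be reproduced with a small calculation. This is only a sketch: the function name `cpu_hours_lost` is hypothetical, and it assumes that the 32 in "0.75*32" is the number of CPUs on tre and that 0.75 is the outage in hours (16:03 to 16:48).

```python
from datetime import datetime


def cpu_hours_lost(start: str, end: str, cpus: int) -> float:
    """Outage length in hours (HH:MM times on the same day) times the CPU count."""
    fmt = "%H:%M"
    outage = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return outage.total_seconds() / 3600 * cpus


# Assumed values: hang at 16:03, node back at 16:48, 32 CPUs on tre.
print(cpu_hours_lost("16:03", "16:48", 32))  # 24.0
```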

NFS problem on tre,to,en

lsz075 • February 19, 2006

The Regatta nodes have NFS problems. NFS hangs from the Regatta nodes to jambu (/net/bcmhsm) and to /migrate (on "to"), as well as from "en" and "to" to "tre".
This seems to be an NFS client issue. I am working to resolve the problem.

15:45 Update: Everything is up again. We had to reboot "tre" and "to" as well as jambu. Jobs were lost (25% load at the time of reboot).

NB! Due to problems with the NFS export of /migrate we have unmounted /migrate on "tre" and "en". Do all copying to and from /migrate on "to" (as stated in /migrate/README). For copying to (and from) /migrate from fimm, use:
scp something.tar.gz to:/migrate/myusername/

(Note that Bjerknes has a symlink from /migrate/username to /net/bcmhsm/username, which is NFS-exported from jambu.)

Cpuhours downtime: approx. 384

Tape robot and /migrate filesystem down for tapedrive upgrade

lsz075 • February 16, 2006

The tape robot is getting two new tape drives installed and will be unavailable from 09:45 to approx. 11:00 on 16 Feb.
Files in /migrate (and /net/bcmhsm) will be unavailable.
This entry will be updated with more information later.

11:20 Update: The upgrade is taking somewhat longer than planned.

12:45 Update: The upgrade is complete and the filesystem is back.

Software update on backup server (jambu)

lsz075 • February 13, 2006

The backup server has been updated with the latest OS maintenance release for AIX (5200-08) and the latest tape device drivers. In addition, the TSM backup server was updated to version 5.2.7 and the TSM client to version 5.2.4. Downtime for restore and /net/bcmhsm (/migrate for Bjerknes) was only a few minutes during the reboot.

Rebalancing of /work and /home/fimm on fimm

lsz075 • February 2, 2006

The GPFS filesystems /work and /home/fimm on fimm have become unbalanced. The necessary rebalancing was started last night and is still running. It will increase the IO load until it finishes, hopefully sometime later today.

Matlab upgrade

lsz075 • January 18, 2006

Matlab on fimm has been upgraded to version 7.1.0.183 (R14) Service Pack 3.

Memory and disk problem on regatta node “en”

lsz075 • January 10, 2006

Regatta node "en" had a memory fault at 09:23 on 10.01.06. The node was rebooted. After the reboot the node rejected one of the disks in the /work filesystem. We are working to correct the problem. The other nodes are unaffected by this.

Update 13:45: node "en" is now up again.

HPC Syslog

Log of changes and events on UiB's HPC systems