Because of newly installed security updates, tre must be rebooted. This will hopefully also solve the problems with the TotalView debugger.
Expected downtime: 1h (starting from Mon, 10:00)
Update: Mon, 12:45 - a disk import problem caused a longer downtime. Everything should be up and running again.
Downtime: 2h 45'
Author Archives: lsz075
bjerknes fileserver bregne down for os upgrade
bregne will be upgraded to CentOS 4. During this upgrade /net/bjerknes1 will be unavailable until approx. 13:00.
13:10 Update: The upgrade is complete and the filesystem is back.
New NOTUR cpu-quota
CPU-quota for the period 2006-1 has now been activated on tre, fire (and fimm). Send a request to support-uib@notur.no if your quota access is incorrect. Please note that according to prior agreements the projects nn1118k, nn2343k, nn2701k and nn2980k on fire have been transferred to fimm with a cpu-factor of 4:1. Other projects on fire need to send a request to move any quota.
Memory-hang on TRE
Some process managed to use up all memory on tre around 16:03. The node is currently rebooting.
Update 16:48: Tre is now up again. Jobs running on tre were lost (jobs on "to" and "en" were not affected). 24 cpuhours downtime (0.75*32).
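The cpuhours figure above is the wall-clock outage multiplied by the number of CPUs on the node. A minimal sketch of that arithmetic follows, assuming the outage ran from 16:03 to 16:48 on the same day and that the 32 in the formula is the CPU count of tre; the function name cpu_hours_lost is hypothetical and only used for illustration.

```python
from datetime import datetime


def cpu_hours_lost(start: str, end: str, cpus: int) -> float:
    """Return outage length in hours multiplied by the CPU count.

    Assumes start and end are HH:MM times on the same day.
    """
    fmt = "%H:%M"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / 3600 * cpus


# Outage from 16:03 to 16:48 is 45 minutes (0.75 h); 32 CPUs is assumed.
print(cpu_hours_lost("16:03", "16:48", 32))  # 24.0
```

The same function can be reused for other outages in this log by passing the relevant times and CPU count.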
NFS problem on tre,to,en
The Regatta nodes have NFS problems. NFS hangs from the Regatta nodes to jambu (/net/bcmhsm) and to /migrate (on "to"), as well as from "en" and "to" to "tre".
This seems to be an NFS client issue. I am working to resolve the problem.
15:45 Update: Everything is up again. Had to reboot "tre" and "to" as well as jambu. Jobs were lost (25% load at the time of reboot).
NB! Due to problems with the NFS export of /migrate we have unmounted /migrate on "tre" and "en". Do all copying to and from /migrate on "to" (as stated in /migrate/README). For copying to (and from) /migrate from fimm use:
scp something.tar.gz to:/migrate/myusername/
(Note that Bjerknes has a symlink from /migrate/username to /net/bcmhsm/username, which is NFS-exported from jambu.)
Cpuhours downtime: approx. 384
Tape robot and /migrate filesystem down for tapedrive upgrade
The tape robot is getting 2 new tape drives installed and will be unavailable from 09:45 to approx. 11:00 on 16. Feb.
Files in /migrate (and /net/bcmhsm) will be unavailable.
This entry will be updated with more information later.
11:20 Update: The upgrade takes somewhat longer than planned.
12:45 Update: The upgrade is complete and the filesystem is back.
Software update on backup server (jambu)
The backup server has been updated with the latest OS maintenance release for AIX (5200-08) and the latest tape-device drivers. In addition, the TSM backup server was updated to version 5.2.7 and the TSM client to version 5.2.4. Downtime for restore and /net/bcmhsm (/migrate for Bjerknes) was only a few minutes during the reboot.
Rebalancing of /work and /home/fimm on fimm
The GPFS filesystems /work and /home/fimm on fimm have become unbalanced. The needed filesystem balancing was started last night and is still running. It will increase the IO load until it finishes, hopefully sometime later today.
Matlab upgrade
Matlab on fimm upgraded to version 7.1.0.183 (R14) Service Pack 3
Memory and disk problem on regatta node “en”
Regatta node "en" had a memory fault at 09:23 on 10.01.06. The node was rebooted. After the reboot the node rejected one of the disks in the /work filesystem. We are working to correct the problem. The other nodes are unaffected by this.
Update 13:45: node "en" is now up again.