Because new security updates have been installed, tre must be rebooted. This will hopefully also solve the problems with the TotalView debugger.
Expected downtime: 1h (starting from Mon, 10:00)
Update: Mon, 12:45 - a disk import problem caused a longer downtime. Everything should be up and running again.
Downtime: 2h 45'
Bjerknes fileserver bregne down for OS upgrade
bregne will be upgraded to CentOS 4. During this upgrade /net/bjerknes1 will be unavailable until approx. 13:00.
13:10 Update: The upgrade is complete and filesystem is back.
Memory-hang on TRE
Some process managed to use up all memory on tre around 16:03. The node is currently rebooting.
Update 16:48: Tre is now up again. Jobs running on tre were lost (jobs on "to" and "en" were not affected). 24 cpuhours downtime (0.75*32).
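The cpuhours figure above can be reproduced with a small calculation. This is only a sketch: it assumes that 0.75 is the outage length in hours (16:03 to 16:48, i.e. 45 minutes) and that 32 is the number of CPUs on tre. The function name cpu_hours is hypothetical.

```shell
# Hypothetical helper: lost cpuhours = outage in hours * number of CPUs.
# Arguments: outage length in minutes, number of CPUs on the node.
cpu_hours() {
    awk -v minutes="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", minutes / 60 * cpus }'
}

# 45 minutes on an (assumed) 32-CPU node
cpu_hours 45 32
```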
NFS problem on tre, to, en
The Regatta nodes have NFS problems. NFS hangs from the Regatta nodes to jambu (/net/bcmhsm) and to /migrate (on "to"), as well as from "en" and "to" to "tre".
This seems to be an NFS client issue. I am working to resolve the problem.
15:45 Update: Everything is up again. Had to reboot "tre" and "to" as well as jambu. Jobs were lost (25% load at the time of reboot).
NB! Due to problems with the NFS export of /migrate we have unmounted /migrate on "tre" and "en". Do all copying to and from /migrate on "to" (as stated in /migrate/README). For copying to (and from) /migrate from fimm use
scp something.tar.gz to:/migrate/myusername/
(Note that Bjerknes has a symlink from /migrate/username to /net/bcmhsm/username, which is NFS-exported from jambu.)
Cpuhours downtime: approx. 384
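The scp instruction above covers both directions. As a rough sketch, the helper below only prints the command for each direction so it can be checked before running it. The function name migrate_scp_cmd and its arguments are hypothetical; the host "to" and the /migrate/myusername/ layout are taken from the example above.

```shell
# Hypothetical helper: print (not run) the scp command for copying
# between fimm and /migrate via the node "to".
# Arguments: put|get, username on "to", file name.
migrate_scp_cmd() {
    direction=$1
    user=$2
    file=$3
    case "$direction" in
        put) echo "scp $file to:/migrate/$user/" ;;       # fimm -> /migrate
        get) echo "scp to:/migrate/$user/$file ." ;;      # /migrate -> fimm
        *)   echo "usage: migrate_scp_cmd put|get user file" >&2; return 1 ;;
    esac
}

migrate_scp_cmd put myusername something.tar.gz
migrate_scp_cmd get myusername something.tar.gz
```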
Tape robot and /migrate filesystem down for tape drive upgrade
The tape robot is getting 2 new tape drives installed and will be unavailable from 09:45 to approx. 11:00 on 16. Feb.
Files in /migrate (and /net/bcmhsm) will be unavailable.
This entry will be updated with more information later.
11:20 Update: The upgrade is taking somewhat longer than planned.
12:45 Update: The upgrade is complete and the filesystem is back.
Memory and disk problem on regatta node “en”
Regatta node "en" had a memory fault at 09:23 on 10.01.06. The node was rebooted. After the reboot the node rejected one of the disks in the /work filesystem. We are working to correct the problem. The other nodes are unaffected by this.
Update 13:45: node "en" is now up again.
Problem with /work on fimm
10:15 There is some problem with /work on fimm. We are working on it.
13:45 Update: /work is now accessible. The frontend had to be restarted, and gpfs restarted on one of the NAS boxes. All the compute nodes were OK and thus no running jobs were affected by this.
Crash on fimm frontend by excessive interactive use
A user crashed the fimm frontend by exhausting all available memory with a memory-intensive interactive process. The frontend was unavailable for login from 10.11.05 23:50 to 11.11.05 08:40. No jobs were affected.
Maintenance summary (fimm)
fimm was down Tuesday Sep. 13 from 08:00 to 12:15 for filesystem-check (mmfsck) on gpfs filesystem, upgrade of gpfs, and reboot of satablade2 disk-cabinet (due to failure to accept new disk).
Scheduled downtime on fimm
Fimm will be down on Tuesday Sep. 13 from 08:00 to 12:00.
One of the SATABlade disk enclosures needs to be rebooted, and the /home/fimm gpfs filesystem needs to be unmounted for a filesystem check.
N.B.: Please delete any and all unnecessary files you may have on /home/fimm or /work* filesystems before the downtime to hasten the filesystem fixes.
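To find candidates for deletion, something like the following sketch may help. It lists files above a given size, largest first. The function name list_large_files is hypothetical, and the directory in the usage line is an assumption about where your files live; adjust it to your own directory under /home/fimm or /work*.

```shell
# Hypothetical helper: list files larger than a size threshold, largest first.
# Arguments: directory to search, minimum size in kilobytes.
# Output: one line per file, size in kilobytes followed by the path.
list_large_files() {
    dir=$1
    min_kb=$2
    find "$dir" -type f -size +"${min_kb}k" -exec du -k {} + | sort -rn
}

# Usage (the path is an assumption, not a documented layout):
#   list_large_files /home/fimm/myusername 102400    # files over about 100 MB
```

The function only lists files; review the output and delete by hand what you no longer need.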