A user crashed the fimm frontend by exhausting all available memory with a memory-intensive interactive process. The frontend was unavailable for login from 10.11.05 23:50 to 11.11.05 08:40. No jobs were affected.
Automatic removal of old files in /work on fimm
/work on fimm now has automatic removal (deletion) of files when filesystem usage is over 80%. The script is similar to the one on tre: it deletes files older than 21 days, then files older than 14 days, and so on, until usage is below 80%. Do NOT touch your files to keep them new; it will only cause the filesystem to go 100% full and jobs to crash. /work2 will be added to the script later.
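For illustration only, the staged cleanup described above could look roughly like the Python sketch below. This is not the actual script running on fimm or tre. The function names, the 7-day step after 14 days, the use of modification time as the file age, and the dry-run default are all assumptions made for the example.

```python
import os
import shutil
import time

DAY = 86400  # seconds per day


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def iter_files(root):
    """Yield the path of every file found below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def purge_old_files(root, thresholds_days=(21, 14, 7), limit=0.80,
                    usage=usage_fraction, now=None, dry_run=True):
    """Delete files in stages, oldest first, until usage drops below limit.

    thresholds_days: age cut-offs tried in order. 21 and 14 come from the
    announcement; the 7-day step is an assumed continuation of "and so on".
    File age is taken from the modification time (an assumption).
    Returns the list of files removed (or that would be removed in a dry run).
    """
    now = time.time() if now is None else now
    removed = []
    seen = set()
    for days in thresholds_days:
        if usage(root) < limit:
            break  # usage is below the limit, nothing more to do
        cutoff = now - days * DAY
        for path in iter_files(root):
            if path in seen:
                continue  # already listed in an earlier dry-run pass
            try:
                if os.lstat(path).st_mtime < cutoff:
                    if not dry_run:
                        os.remove(path)
                    seen.add(path)
                    removed.append(path)
            except OSError:
                continue  # file vanished or is not accessible; skip it
    return removed
```

Calling purge_old_files("/work") with the default dry_run=True only lists candidates without deleting anything. If file age is judged by modification time as assumed here, touching files makes them all look new, so no stage frees any space and the filesystem simply fills up.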
/work on tre was 100% full
A user generated an 800 GB file in /work on tre during the night, causing jobs to fail when the filesystem went 100% full. The file has now been deleted. /work on tre had to be remounted (OK on to and en).
Support address has changed
NOTUR has changed domain from notur.org to notur.no
The support email address has therefore changed from support-uib@notur.org to support-uib@notur.no (hpc-support@hpc.uib.no can also be used).
Web access to support has changed accordingly, from https://support.notur.org to https://support.notur.no
NOTUR's web pages can now be found at http://www.notur.no
Maintenance summary (fimm)
fimm was down Tuesday Sep. 13 from 08:00 to 12:15 for a filesystem check (mmfsck) on the gpfs filesystem, an upgrade of gpfs, and a reboot of the satablade2 disk cabinet (because it failed to accept a new disk).
Scheduled downtime on fimm
Fimm will be down on Tuesday Sep. 13 from 08:00 to 12:00
One of the SATABlade disk enclosures needs to be rebooted, and the /home/fimm gpfs filesystem needs to be unmounted for a filesystem check.
N.B.: Please delete any and all unnecessary files you may have on /home/fimm or /work* filesystems before the downtime to hasten the filesystem fixes.
Backup and HSM problems
The Tivoli Storage Manager database recovery log ran full, after which the server could no longer process backup or HSM requests. The problem was noted at about 09:30 and resolved by 10:20.
Matlab on fimm now has spm toolbox
The Matlab toolbox spm, version 5b, is now installed on fimm.
http://www.fil.ion.ucl.ac.uk/spm/software/spm5b/
Fimm frontend hang
The fimm frontend was non-responsive from 19:56 to 20:40 due to excessive memory usage by an interactive user process, which caused a swap storm and oom-killing. The frontend was rebooted.
Tape robot got new gripper 1
The tape robot was offline for 30 minutes while a faulty gripper was replaced.
