10:15 There is some problem with /work on fimm. We are working on it.
13:45 Update: /work is now accessible. The frontend had to be restarted, and gpfs restarted on one of the NAS boxes. All the compute nodes were OK and thus no running jobs were affected by this.
Fire cluster upgraded to Rocks 4.1 OS
The fire cluster was upgraded to the Rocks 4.1 OS and was therefore unavailable from 13:00 to 17:00. No users were using fire at the time, and no jobs were running.
Vim updated to version 6.4 on tre
Vim was updated to version 6.4 on tre (run "vim --version" to check which version you are running).
Crash on fimm frontend caused by excessive interactive use
A user crashed the fimm frontend by using up all available memory with a memory-intensive interactive process. The frontend was unavailable for login from 10.11.05 23:50 to 11.11.05 08:40. No jobs were affected.
Automatic removal of old files in /work on fimm
/work on fimm now has automatic removal (deletion) of files when usage of the filesystem is over 80%. The script is similar to the one on tre: it deletes files older than 21 days, then files older than 14 days, and so on, until usage is below 80%. Do NOT touch your files to keep them new; it will only cause the filesystem to go 100% full and jobs to crash. /work2 will be added to the script later.
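The stepwise deletion described above can be sketched roughly as follows. This is not the actual script running on fimm or tre, only a minimal Python illustration under stated assumptions: the function names (`usage_fraction`, `delete_older_than`, `cleanup`) are hypothetical, file age is assumed to mean modification time, and the age steps after 14 days (here a single further step of 7 days) are a guess, since the notice only says "and so on".

```python
import os
import shutil
import time

SECONDS_PER_DAY = 86400


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def delete_older_than(root, days, now=None):
    """Delete files under root whose modification time is more than `days` days old.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                # The file vanished or is not removable; skip it.
                continue
    return removed


def cleanup(root, threshold=0.80, age_steps=(21, 14, 7),
            usage=usage_fraction, now=None):
    """Delete progressively younger files until usage is no longer over threshold.

    age_steps is an assumption: only 21 and 14 days are named in the notice.
    """
    removed = []
    for days in age_steps:
        if usage(root) <= threshold:
            break  # Usage is acceptable; stop deleting.
        removed.extend(delete_older_than(root, days, now))
    return removed
```

Calling `cleanup("/work")` would apply the sketch to /work with the 80% threshold. It also shows why touching files backfires: a touched file always has a recent modification time, so no age step ever matches it, nothing is freed, and the filesystem keeps filling up.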
/work on tre was 100% full
A user generated an 800 GB file in /work on tre during the night, causing jobs to fail when the filesystem went 100% full. The file has now been deleted. /work on tre had to be remounted (OK on to and en).
Support address has changed
NOTUR has changed its domain from notur.org to notur.no.
The support email address has therefore changed from support-uib@notur.org to support-uib@notur.no (hpc-support@hpc.uib.no can also be used).
Web access to support has changed accordingly, from https://support.notur.org to https://support.notur.no
NOTUR's web pages can now be found at http://www.notur.no
Maintenance summary (fimm)
fimm was down on Tuesday Sep. 13 from 08:00 to 12:15 for a filesystem check (mmfsck) on the gpfs filesystem, an upgrade of gpfs, and a reboot of the satablade2 disk cabinet (which had failed to accept a new disk).
Scheduled downtime on fimm
Fimm will be down on Tuesday Sep. 13 from 08:00 to 12:00.
One of the SATABlade disk enclosures needs to be rebooted, and the /home/fimm gpfs filesystem needs to be unmounted for a filesystem check.
N.B.: Please delete any and all unnecessary files you may have on /home/fimm or /work* filesystems before the downtime to hasten the filesystem fixes.
Backup and HSM problems
The Tivoli Storage Manager database recovery log filled up, after which the server could no longer process backup or HSM requests. The problem was noticed at about 09:30 and resolved by 10:20.