A user crashed the fimm frontend by consuming all available memory with a memory-intensive interactive process. The frontend was unavailable for login from 10.11.05 23:50 to 11.11.05 08:40. No jobs were affected.
/work on fimm now has automatic removal (deletion) of files when filesystem usage exceeds 80%. The script is similar to the one on tre: it deletes files older than 21 days, then files older than 14 days, and so on, until usage is below 80%. Do NOT touch your files to make them appear new; that will only cause the filesystem to fill to 100% and jobs to crash. /work2 will be added to the script later. A sketch of the cleanup logic is shown below.
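For illustration only, here is a minimal Python sketch of that cleanup loop. It is not the actual script running on fimm or tre: the path constant, the age steps below 14 days, and all function names are assumptions.

    import os
    import shutil
    import time

    WORK = "/work"                      # filesystem to clean (illustrative path)
    LIMIT = 0.80                        # start cleaning when usage exceeds 80%
    AGE_STEPS_DAYS = [21, 14, 7, 3]     # assumed ladder; only "21, then 14, ..." is documented

    def usage_fraction(path):
        """Return the used/total fraction for the filesystem holding `path`."""
        total, used, _free = shutil.disk_usage(path)
        return used / total

    def delete_older_than(path, days):
        """Delete regular files under `path` not modified within the last `days` days."""
        cutoff = time.time() - days * 86400
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                try:
                    if os.path.getmtime(full) < cutoff:
                        os.remove(full)
                except OSError:
                    pass  # file vanished or permission denied; skip it

    def clean(path=WORK):
        # Work down the age ladder until usage drops below the limit.
        for days in AGE_STEPS_DAYS:
            if usage_fraction(path) < LIMIT:
                break
            delete_older_than(path, days)

    if __name__ == "__main__":
        clean()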
A user managed to generate an 800 GB file in /work on tre during the night, causing jobs to fail when the filesystem went 100% full. The file has now been deleted. /work on tre had to be remounted (it was OK on to and en).
fimm was down on Tuesday Sep. 13 from 08:00 to 12:15 for a filesystem check (mmfsck) on the GPFS filesystem, an upgrade of GPFS, and a reboot of the satablade2 disk cabinet (which had failed to accept a new disk).
Fimm will be down on Tuesday Sep. 13 from 08:00 to 12:00.
One of the SATABlade disk enclosures needs to be rebooted, and the /home/fimm GPFS filesystem needs to be unmounted for a filesystem check.
N.B.: Please delete any unnecessary files you may have on the /home/fimm or /work* filesystems before the downtime, to speed up the filesystem check.
The Tivoli Storage Manager database recovery log ran full, after which the server could no longer process backup or HSM requests. The problem was noticed at about 09:30 and resolved by 10:20.
The fimm frontend was unresponsive from 19:56 to 20:40 due to excessive memory use by an interactive user process, which caused a swap storm and OOM killing. The frontend was rebooted.