A user crashed the fimm frontend by consuming all available memory with a memory-intensive interactive process. The frontend was unavailable for login from 10.11.05 23:50 to 11.11.05 08:40. No jobs were affected.
/work on fimm now has automatic removal (deletion) of files when filesystem usage exceeds 80%. The script is similar to the one on tre: it deletes files older than 21 days, then files older than 14 days, and so on, until usage is below 80%. Do NOT touch your files to make them appear new; that will only cause the filesystem to fill to 100% and jobs to crash. /work2 will be added to the script later. A sketch of the cleanup logic is shown below.
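For illustration only, here is a minimal Python sketch of that cleanup loop. It is not the actual script running on fimm or tre: the path constant, the age steps below 14 days, and all function names are assumptions.

    import os
    import shutil
    import time

    WORK = "/work"                      # filesystem to clean (illustrative path)
    LIMIT = 0.80                        # start cleaning when usage exceeds 80%
    AGE_STEPS_DAYS = [21, 14, 7, 3]     # assumed ladder; only "21, then 14, ..." is documented

    def usage_fraction(path):
        """Return the used/total fraction for the filesystem holding `path`."""
        total, used, _free = shutil.disk_usage(path)
        return used / total

    def delete_older_than(path, days):
        """Delete regular files under `path` not modified within the last `days` days."""
        cutoff = time.time() - days * 86400
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                try:
                    if os.path.getmtime(full) < cutoff:
                        os.remove(full)
                except OSError:
                    pass  # file vanished or permission denied; skip it

    def clean(path=WORK):
        # Work down the age ladder until usage drops below the limit.
        for days in AGE_STEPS_DAYS:
            if usage_fraction(path) < LIMIT:
                break
            delete_older_than(path, days)

    if __name__ == "__main__":
        clean()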
A user managed to generate an 800 GB file in /work on tre during the night, causing jobs to fail when the filesystem went 100% full. The file has now been deleted. /work on tre had to be remounted (it was OK on to and en).
fimm was down on Tuesday Sep. 13 from 08:00 to 12:15 for a filesystem check (mmfsck) on the GPFS filesystem, an upgrade of GPFS, and a reboot of the satablade2 disk cabinet (which had failed to accept a new disk).
Fimm will be down on Tuesday Sep. 13 from 08:00 to 12:00.
One of the SATABlade disk enclosures needs to be rebooted, and the /home/fimm GPFS filesystem needs to be unmounted for a filesystem check.
N.B.: Please delete any unnecessary files you may have on the /home/fimm or /work* filesystems before the downtime, to speed up the filesystem check.
The Tivoli Storage Manager database recovery log ran full, after which the server could no longer process backup or HSM requests. The problem was noticed at about 09:30 and resolved by 10:20.
The fimm frontend was unresponsive from 19:56 to 20:40 due to excessive memory use by an interactive user process, which caused a swap storm and OOM killing. The frontend was rebooted.