One of the switches on fimm has failed. Because the fileservers were connected to this switch, the filesystems went down.
We have now re-routed the fileservers and frontend to one of the other switches and changed the switch-interconnect. Most of fimm is now back up again, including filesystems, but jobs have failed due to missing filesystems.
Downtime
Fimm file system hang
The file system on fimm hung around 14:00 today.
Update 15:00: File systems should now run as normal.
Scheduled downtime on fimm/filesystems
Due to a necessary re-configuration of the power in the machine room, there will be a downtime of fimm cluster plus filesystems (/migrate, /bcmhsm, /bjerknes*, as well as fimm filesystems /work, /work2, /home).
All running jobs will of course be killed when the nodes are shut down.
The planned downtime is on Monday November 12th from 09:00 to 14:00. It *may* be somewhat shorter if all goes well.
Questions can be sent to support-uib@notur.no
We are sorry for the inconvenience this may cause you.
Update, Mon. 12th 14:50: fimm is now back up again. In addition to the power downtime we did a necessary kernel and gpfs upgrade.
IBM regatta p690 “tre” decommissioned
As previously announced, the IBM p690 Regatta system "tre" is now decommissioned.
Questions regarding access to files etc. should be sent as soon as possible to
support-uib@notur.no
Power failure
A power failure on fimm and tre occurred around 19:00.
Update 23:44: Most machines are up, however some filesystems are still down.
Password file on fimm nodes corrupted
The password file on the fimm nodes has been corrupted so no new jobs will run. We are currently fixing the problem.
Update 14:51: Most nodes have now been reinstalled. Users should be able to submit jobs again.
Important notice: tre.bccs.uib.no will be taken out of service
The IBM p690 Regatta tre.bccs.uib.no / tre.ii.uib.no will be shut down and decommissioned on the morning of Monday October 1st 2007 at 08:00.
All jobs must be finished, and all data and personal files must be copied off the machine before this time. The only exception is data on external disks such as /migrate, /net/bcmhsm and /net/bjerknes*.
Any questions regarding this can be sent to support-uib@notur.no.
GPFS hang on node “en”
Node en had a GPFS hang. GPFS was restarted. Downtime approx. 30 min.
Hang of node “en”
Node "en" of the regattas has hung. We are investigating.
Update 09:45: Node rebooted. No jobs lost.
/home/fimm crashed
The filesystem /home/fimm on fimm crashed for an unknown reason. We are currently investigating.
Users will not be able to log in until this is fixed.
Update 16:40: The filesystem is now up and working again.