Downtime

We will have a very short downtime for the work file system on fimm on Tuesday at 12:00. We have to unmount the work file system from all cluster nodes, which means that all running jobs using the work file system have to be stopped.

We estimate that the downtime will take about 10-15 minutes. We will keep you updated.

We will check all running jobs on the cluster, and affected users will be notified individually.

09/02 12:11 Update

The downtime is finished, and the work file system is mounted again on all cluster nodes.

The /bcmhsm file system is experiencing a software problem. To avoid data corruption or unexpected results, we have to stop it temporarily. The case is being investigated with high priority. Updates will be posted.

NB: Users who urgently need files from this file system should send a service request to the support email address, listing the files to be restored and the location to restore them to.

Update 17/12 20:55: The file system is back online.

Due to a hardware update on the fimm login node and master node, we will have a short downtime on the fimm cluster this coming Wednesday, 9th of December. The fimm login node will not be available from 13:00 to 16:00. All running jobs that cannot finish before that time will crash and have to be resubmitted. A reservation has been set on the fimm cluster so that jobs that would not finish before the downtime will not be started.

We will keep this information updated.

Hi,
Due to a firmware update on the storage system, we have to take down the work file system on fimm.

We will start the firmware update at 12:00 on Monday (23rd Nov). It will last 3-4 hours, during which fimm will remain accessible but without the work file system. All compute nodes are reserved for the update from now on, so jobs that cannot finish before the update will not start.

We will keep this information updated as the work progresses.

12:30 UPDATE: The work file system has been unmounted from the cluster; preparing for the firmware update.

18:00 UPDATE: The firmware update failed on some of the disks in the storage system. We are working on it.

20:45 UPDATE: The firmware update is finished, and the work file system is mounted on the cluster again.

20:50 UPDATE: The reservation is canceled, and all jobs will start to run.

Hexagon will have scheduled maintenance on Monday Nov. 23rd from 13:00 to approx. 19:00. Some software updates and hardware replacements will be made. The queue has a reservation in place so that only jobs that can complete (according to the requested walltime) before the maintenance will start.
This note will be updated when we have more information.

Update 19:08: Maintenance is finished; the system is up and open to users.

The work file system on fimm crashed on Friday night, and all jobs using the work file system crashed as well. We have blocked the login node for maintenance and are working on the problem. We will keep you updated.

Update 2009-09-13 16:19

Some disks in the work file system have failed. We are investigating the issue.

Update 13:00 2009-09-14

The work file system is mounted again. All jobs that were using the work file system before the crash have to be resubmitted. The fimm login node has been updated to a new kernel and the latest version of GPFS.

Sorry for the inconvenience.

Because a necessary security update requires a reboot, we have to carry out the next maintenance of hexagon earlier than planned. We will therefore have scheduled maintenance starting on Thursday Sep. 10th at 13:00.

A job-scheduler reservation is now in place so that only jobs that can finish (according to the requested walltime) before the scheduled maintenance will be allowed to start.

During the maintenance we will install a security update and replace a few faulty hardware components.

We will update this note when we have more information about the expected duration or the progress of the maintenance.

As usual, send any questions to support-uib@notur.no.

Update 16:30: The machine is up again and ready for use.