Hardware

We have to unmount /work-common from Hexagon because of HW issues with the /work-common MDS server.

We are working to fix this problem ASAP.

Update 21:35: We have moved the MDS to one of the OSTs until the MDS HW is fixed. The file system may show slightly degraded performance. It should be available on all compute nodes and login nodes except login5. We are working to make it available on login5 as well.

Update 14.05 13:00: /work-common is back on login5.

One of the OST nodes of the /work FS crashed. We are working on it. The /work FS is currently unavailable.

Update 13:12: The OST was recovered; the /work FS should be back online.
Update 25.11 15:44: The same node in the file system crashed again. We are working to fix the FS ASAP.
Update 25.11 16:15: /work is back online. We had to disable quota.
Update 26.11 05:30: This time another OST crashed. The FS is online, and we are investigating the root cause of the OST crashes.

Because of a security update that requires a reboot, we have to carry out the next maintenance of Hexagon earlier than planned. The scheduled maintenance will therefore start on Thursday, Sep. 10th at 13:00.

A job-scheduler reservation is now in place so that only jobs that can finish (according to their requested walltime) before the scheduled maintenance will be allowed to start.
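
For users who want to estimate whether a queued job still fits before the maintenance window, the rule above can be sketched as a simple time comparison. This is only an illustration of the stated rule, not the scheduler's actual implementation; the function name `can_start` and the year in the example date are assumptions (the notice gives no year).

```python
from datetime import datetime, timedelta


def can_start(now: datetime, walltime: timedelta, maintenance_start: datetime) -> bool:
    """Return True if a job started at `now` would end by the maintenance start.

    Hypothetical helper: the job is assumed to run for its full requested walltime.
    """
    return now + walltime <= maintenance_start


# Maintenance start from the notice: Sep. 10th at 13:00.
# The year 2015 is a placeholder assumption, not taken from the notice.
maintenance_start = datetime(2015, 9, 10, 13, 0)

# Example: a job considered for start at 09:00 on the maintenance day.
now = datetime(2015, 9, 10, 9, 0)
fits_4h = can_start(now, timedelta(hours=4), maintenance_start)  # ends exactly at 13:00
fits_5h = can_start(now, timedelta(hours=5), maintenance_start)  # would end at 14:00
```

Under this sketch, a job that requests a shorter walltime may still be allowed to start, while a job whose requested walltime crosses 13:00 waits until after the maintenance.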

During the maintenance we will install a security update and replace a few faulty hardware components.

We will update this note when we have more information about expected length or ongoing progress for the maintenance.

As usual, send any questions to support-uib@notur.no.

Update 16:30: Machine is now up again and ready for use.