Hexagon work filesystem crash

saerda • December 12, 2018

Dear Hexagon users,

The /work filesystem crashed again last night due to hardware errors on some compute nodes and service nodes. To clear the errors, we have to shut down Hexagon and restart it.

Update 12:16: Hexagon has been restarted and is back online.

Shared filesystem crash

saerda • December 4, 2018

Dear shared filesystem users,

Today around 16:00, the /shared filesystem automatically switched itself to read-only due to a bug in the version of the Lustre filesystem we are running.

This made the whole /shared filesystem read-only.

We had to unmount the /shared filesystem and clear the error to work around the bug.

We apologize for any inconvenience and appreciate your understanding.

Hexagon: urgent reboot is needed

lsz075 • December 3, 2018

Update 2018-12-03 12:36:

Hexagon is up now.
The interconnect errors are cleared and the /work file system is up and functional again.
Unfortunately the previously submitted jobs had to be canceled. Please resubmit your jobs.

Dear Hexagon User,

We must reboot Hexagon due to repeated errors on the interconnect.
We will update this notice when Hexagon is up and functional again.

/work filesystem crash

saerda • October 25, 2018

The /work filesystem on Hexagon has crashed due to a failed MDS server. We are working on it.

Scheduled maintenance for /shared file system on 5th of November

lsz075 • October 22, 2018

Update 12.11 21:30:

The migration is complete; we managed to bring the Lustre filesystem up with the new MDS server. The /shared and /work filesystems are mounted on cyclone.hpc.uib.no and grunch.hpc.uib.no. Hexagon is up and running again. Samba and NFS exports are also running on Leo.hpc.uib.no.

Update 12.11 15:00:

The migration is still ongoing; we will keep you posted.

Update 02.11 09:30:

Due to the delayed delivery of physical parts, we have to postpone the downtime to 12th November. The corresponding node reservation on Hexagon is also postponed to 12th November.

Thank you for your consideration!

Dear HPC User,

The metadata server for the /shared file system has to be replaced/upgraded, and therefore the file system must be unmounted from all clients.

This will result in scheduled downtime for the Hexagon, Grunch and Cyclone machines. We will start at 08:00 on the 5th of November and expect to be finished by the end of the working day.

Thank you for your consideration!

Hexagon /work file system crashed

saerda • October 11, 2018

One of the file servers for /work on Hexagon has crashed. We are working on the issue.

Hexagon downtime for /work filesystem maintenance

saerda • August 2, 2018

Hexagon will have planned maintenance on 15th August from 08:00.

Currently the /work filesystem is running at reduced performance due to a broken storage controller.

During the maintenance, we will replace the broken storage controller for the storage system where the /work filesystem resides. Due to the high risk of data loss, we urge all /work filesystem users to back up their important, non-reproducible data.
Please keep in mind that /work is not backed up and is a scratch filesystem.

After the maintenance we expect the /work filesystem to be back at full performance.

We appreciate your understanding.

Update 15.08.2018 11:00

Hexagon maintenance is over; we have successfully replaced the broken controller. The work file-system is back to its expected performance.

Work file-system crash on Hexagon

saerda • July 9, 2018

The work file-system crashed on Sunday afternoon; we managed to bring it back online late Sunday. Jobs that were running on the work file-system crashed and have to be resubmitted.

Hexagon stop

saerda • June 18, 2018

Due to a problem on the shared file-system we have to stop Hexagon.

All running jobs will be killed.

15:50: Hexagon is up.

Hexagon: shared filesystem crashed