Dear Hexagon Users,
Hexagon has been running without a maintenance contract with Cray for more than a year now. Spare parts are very difficult to obtain, while hardware failures are occurring more often. We therefore plan to switch off Hexagon on 28.12.2018. As a consequence, job execution will not be possible after that date. Regarding data, there will be a grace period of 2 months (until the end of February 2019) to allow users to move their data off the Hexagon filesystems. At the end of the grace period, the following will happen:
• /home will be reformatted and only data for users of grunch and cyclone retained,
• /work, as it exists today, will be removed,
• the /shared file system will be reconfigured and shrunk, and only paid project spaces will be retained (currently the uniklima, gfi and skd subfolders),
• scientific applications installed in /shared/apps will be retained.
It is very important that you plan for this as soon as possible. Please do not hesitate to contact me or firstname.lastname@example.org if you have any questions regarding this process.
Update 10:40: Access to Cyclone has been reopened. The delay was caused by missing kernel modules and old Lustre packages.
Cyclone will be rebooted at 09:00 sharp to apply new filesystem settings.
The machine room will undergo power maintenance on February 3rd.
The following servers/services will be down during this time:
Everything under /shared/ and /Data will not be accessible. NFS and SMB exports will be offline.
The maintenance will start at 08:00 and is expected to finish by 14:00. We kindly ask you to save all your work on the affected servers and log out before the servers go down.
We will keep you updated on this page.
Dear Hexagon users,
The /work filesystem crashed again last night due to hardware errors on some compute nodes and service nodes. To clear the errors, we have to shut down Hexagon and restart it.
Update 12:16: Hexagon has been restarted and is back online.
Dear shared filesystem users,
Today around 16:00, the /shared filesystem remounted itself read-only due to a bug in the version of the Lustre filesystem we are running.
This made the whole /shared filesystem read-only.
We had to unmount the /shared filesystem and clear the error to work around the bug.
We apologize for any inconvenience and appreciate your understanding.
Update 2018-12-03 12:36
- Hexagon is up now.
- Interconnect errors have been cleared and the /work file system is up and functional again.
- Unfortunately the previously submitted jobs had to be canceled. Please resubmit your jobs.
Dear Hexagon User,
We must reboot Hexagon due to repeated errors on the interconnect.
We will update this notice when Hexagon is up and functional again.
The /work filesystem on Hexagon has crashed due to a failed MDS server; we are working on it.
Update 12.11 21:30:
The migration is complete; we managed to bring up the Lustre filesystem with the new MDS server. The /shared and /work filesystems are mounted on cyclone.hpc.uib.no and grunch.hpc.uib.no. Hexagon is up and running again. Samba and NFS exports are also running on Leo.hpc.uib.no.
Update 12.11 15:00:
The migration is still ongoing; we will keep you posted.
Update 02.11 09:30:
Due to the delayed delivery of physical parts, we have to postpone the downtime to 12th November. The corresponding node reservation on Hexagon is also postponed to 12th November.
Thank you for your consideration!
Dear HPC User,
The metadata server for the /shared file system has to be replaced/upgraded, and the file system must therefore be unmounted from all clients.
This will result in scheduled downtime for the Hexagon, Grunch and Cyclone machines. We will start at 08:00 on the 5th of November and expect to be finished by the end of the working day.
Thank you for your consideration!
One of the file servers for /work on Hexagon has crashed. We are working on the issue.
Hexagon will have planned maintenance on 15th August from 08:00.
Currently the /work filesystem is running at reduced performance due to a broken storage controller.
During the maintenance, we will replace the broken storage controller in the storage system where the /work filesystem resides. Due to the high risk of data loss, we urge all /work filesystem users to back up their important, non-reproducible data.
Please keep in mind that /work is a scratch filesystem and is not backed up.
After the maintenance, we expect the /work filesystem to be back at full performance.
We appreciate your understanding.
Update 15.08.2018 11:00
Hexagon maintenance is over; we have successfully replaced the broken controller. The /work filesystem is back to its expected performance.