Dear Hexagon Users,
Hexagon has been running without a maintenance contract with Cray for more than a year now. Moreover, it is very difficult to get spare parts for it, while hardware failures occur more and more often. Therefore, we plan to switch off Hexagon on 28.12.2018. As a consequence, job execution will not be possible after that date. Regarding data, there will be a grace period of 2 months (until the end of February 2019) to allow users to move their data out of the Hexagon filesystems. At the end of the grace period, the following will happen:
• /home will be reformatted, and only data belonging to users of grunch and cyclone will be retained,
• /work, in its current form, will disappear,
• the /shared file system will be reconfigured and shrunk, and only paid project spaces will be retained (currently the uniklima, gfi and skd subfolders),
• scientific applications installed in /shared/apps will be retained.
It is very important that you plan for this as soon as possible. Please do not hesitate to contact me or firstname.lastname@example.org if you have any questions regarding this process.
Dear Hexagon users,
The /work filesystem crashed again last night due to hardware errors on some compute and service nodes. To clean up the errors, we have to shut down Hexagon and restart it.
Update 12:16: Hexagon has been restarted and is back online.
Dear shared filesystem users,
Today at around 16:00, the /shared filesystem automatically remounted itself read-only due to a bug in the version of the Lustre filesystem we are running.
This made the whole /shared filesystem read-only.
We had to unmount the /shared filesystem and clear the error to work around the bug.
We apologize for any inconvenience and appreciate your understanding.
One of the fileservers for /work on Hexagon crashed. We are working on the issue.
The /work filesystem crashed on Sunday afternoon; we managed to bring it back online late on Sunday. Jobs that were running on the /work filesystem crashed and have to be resubmitted.
Due to a problem with the /shared filesystem, we have to stop Hexagon.
All running jobs will be killed.
15:50: Hexagon is up.
The /shared filesystem on Hexagon crashed at around 14:00 today.
We are working on the issue.
Hexagon crashed today at around 09:30. We are working on resolving the problem and bringing Hexagon back up.
Update 12:45: Hexagon is up, but we have a hardware problem with the fileserver that provides the /work filesystem.
The /work filesystem has crashed again on Hexagon. We are having a severe problem with the /work filesystem on Hexagon and Grunch. We are working to find the root cause of the problem; meanwhile, the /work filesystem will be unstable on Hexagon. We will keep all users updated on the process.
We are sorry for the inconvenience and appreciate your understanding.
The Hexagon /work filesystem is down due to a crashed Lustre MDS (metadata server). We are working on the issue.
Update 15:00: the Hexagon /work filesystem is back online. Jobs that were running during the crash have probably died. We are looking into the root cause of the problem.