We will have a planned maintenance on Hexagon, starting on May 22nd at 09:00. The maintenance is expected to last one day. During the maintenance we will carry out software and firmware upgrades as well as service the hardware.
The job submission system has a reservation in place, so jobs that cannot finish before the maintenance starts will not be started.
/work-common will be unavailable during the maintenance period and will be unmounted from Grunch and Fimm.
UPDATES:
2017-05-22 09:00: Maintenance has started.
2017-05-22 14:16: /work-common is available again and remounted on Grunch.
2017-05-22 15:59: Maintenance has finished and access to Hexagon is re-opened.
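As a rough illustration of the reservation rule in the notice above, the sketch below checks whether a job with a given requested walltime would finish before the maintenance start. The helper name `can_start_before_maintenance` and its parameters are hypothetical and are not part of the scheduler on Hexagon; only the start time (2017-05-22 09:00) is taken from the notice.

```python
from datetime import datetime, timedelta

# Maintenance start as announced in the notice; everything else is illustrative.
MAINTENANCE_START = datetime(2017, 5, 22, 9, 0)


def can_start_before_maintenance(now, walltime, maintenance_start=MAINTENANCE_START):
    """Return True if a job started at `now` with the requested `walltime`
    would end no later than the maintenance start (hypothetical helper)."""
    return now + walltime <= maintenance_start


# Evening of May 21st: a 12-hour job would run into the maintenance window,
# while a 10-hour job would end before it.
evening = datetime(2017, 5, 21, 22, 0)
print(can_start_before_maintenance(evening, timedelta(hours=12)))
print(can_start_before_maintenance(evening, timedelta(hours=10)))
```

In practice this means that requesting a shorter walltime may allow a job to start before the maintenance, provided the job can really finish within that time.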
A new three-hour maintenance is scheduled for the storage serving /migrate and /bcmhsm. It will take place on the 6th of April, starting at 13:00.
A three-hour maintenance is scheduled for the storage serving /migrate and /bcmhsm. It will take place on the 28th of March, starting at 13:00.
login5 ran out of memory yesterday (27.02.2017) around 18:16 and took about 15 minutes to recover. During this time the compute nodes were unable to contact the application scheduler running on login5 and some jobs might have crashed. A typical error message for this case is: "aprun: Apid nnnnnnn: close of the compute node connection after app startup barrier".
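To check whether a job was hit by this incident, its output can be searched for the error message quoted above. The following is a minimal sketch, assuming that "nnnnnnn" in the message stands for a numeric application ID; the function name `find_affected_apids` and the sample Apid are made up for illustration.

```python
import re

# Pattern built from the error message quoted in the notice.
# Assumption: the "nnnnnnn" placeholder is a run of digits.
APRUN_ERROR = re.compile(
    r"aprun: Apid (\d+): close of the compute node connection "
    r"after app startup barrier"
)


def find_affected_apids(log_text):
    """Return the Apids (as strings) for which the error appears in log_text."""
    return APRUN_ERROR.findall(log_text)


# Sample job output; the Apid 1234567 is invented for this example.
sample = (
    "Starting application\n"
    "aprun: Apid 1234567: close of the compute node connection "
    "after app startup barrier\n"
)
print(find_affected_apids(sample))
```

Jobs whose output contains this message around the time of the incident most likely need to be resubmitted.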
Login will be closed after 30.06.2017, so please make sure that all your data is transferred before then. Please plan this well in advance so that we avoid overloading the filesystem.
Four cabinets went down due to power issues caused by the storm. The storage controllers for /work-common are also affected. Hexagon was started without the /work-common filesystem.
We are trying to fix the issues with the filesystem controllers and get the filesystem back into production as soon as possible.
Update 2016-12-27 14:50: The problems with the /work-common storage controllers have been mitigated and the filesystem is back online. Hexagon had to be rebooted today at 14:15. All systems are up and functional again.
We are having problems with the high-speed network on Hexagon and are working on a fix. Update 11:31: Hexagon is up again. We had to disable one of the compute nodes due to hardware issues.