Author Archives: lsz075

About lsz075

The IT department

On Monday, August 11th at 14:00, hexagon is scheduled for maintenance. The currently failed nodes and a module will be replaced. The machine will be unavailable for approximately two hours.

Update: August 11th 14:05, hexagon is shut down for maintenance.

Update: 15:20, the hardware work is finished; the firmware update, diagnostics and checks are starting.

Update: 17:35, hexagon is now up and running. Note that because time is reserved for benchmarking (the final part of the acceptance test), it will take some hours before jobs start, but the queue will accept new jobs.

Due to a GPFS file system hang on the fimm frontend, the frontend will be unavailable for a short period (hopefully 10-15 minutes). All users will need to log in again afterwards.

Update, 10:12: fimm frontend needs to be rebooted to clear the hang.
Update, 10:26: fimm frontend is now rebooted and up again.

As previously noted, we will have a scheduled downtime from 16:00 on Tuesday, July 8th. We will replace a faulty module and do some I/O benchmarking, which requires a reserved system. The machine is expected to be available for login at 19:00.

Update, 16:00: hexagon is shut down for hardware replacement.
Update, 16:45: hexagon is up.
Update, 19:10: hexagon is up and allowing users to log in.

Four compute nodes (one module) on hexagon have stopped responding, and as a result so have some of the login nodes and the Lustre file system. Unfortunately, we will need to reboot hexagon to clear the issue; jobs will need to be resubmitted.

Update 20:30, hexagon is now up again. Note that 4 high-mem nodes are now unavailable due to hardware errors.

On Wednesday, July 16th from 08:00, Fimm will be unavailable while the file system and the queuing system are upgraded. The upgrade will most likely last until 17:00.

Please note that a reservation has been set on the system. Jobs must finish before July 16th; otherwise they will stay in the queue until the upgrade has been completed.

Update, July 16th 08:00: The upgrade has started. The machine will be unavailable until the upgrade is complete.

Update, July 16th 15:30: Starting to reinstall compute nodes. Hopefully the upgrade will be completed within a few hours.

Update, July 16th 20:30: Fimm is now available. All global file systems have been upgraded. The queuing system has not been upgraded.