Scheduled maintenance

The /work and /work-common filesystems will be unavailable on Grunch on 18th of October from 09:00. This downtime is part of the scheduled maintenance announced at
http://syslog.hpc.uib.no/2016/09/21/hexagon-planned-maintenance-18-10-19-10/.

The downtime will last up to 8 hours for /work-common and up to 2 days for /work.

Please make sure that no jobs are using /work or /work-common by this time, to avoid data loss and/or data corruption.

We will keep you updated here.

Update: 2016-10-19 11:07 /work-common is back online and re-mounted on Grunch.

We will have a two-day planned maintenance on Hexagon starting on 18th of October at 09:00.

During the maintenance we will carry out a filesystem upgrade and firmware upgrades, as well as service the hardware.

The job submission system has a reservation in place, so jobs that are not able to finish before the maintenance starts will not be started.


Update: 2016-10-18 09:35 Maintenance has started, slightly delayed due to a traffic jam.

Update: 2016-10-19 12:00 Maintenance has finished, Hexagon is up and accessible again.

There is a scheduled maintenance on the UPS and UPS power lines in the HPC server room on Saturday, 20th Feb. All HPC resources will be stopped at 8:30; we expect this maintenance to finish before 17:00 the same day.

Hexagon, FIMM, Grunch and other resources connected to them will be unavailable. The Hexagon queuing system has a reservation in place, so jobs that are not able to finish before the maintenance will not be started.

Update:
2016-02-20 07:45 System maintenance has started.
2016-02-20 16:30 /work-common filesystem storage got damaged, recovery is in progress.
2016-02-21 07:30 System maintenance has finished, HPC systems are functional again.

During the maintenance we have:

  • applied various firmware updates and patches
  • installed newer libraries, compilers and tools

Please note that all libraries compiled with the previous version of PGI will have to be recompiled.

Below you will find the complete list of the newly installed software:

  • CCE 8.4.0
  • Chapel 1.12.0
  • Craype 2.4.2
  • GCC 5.1.0
  • FFTW 3.3.4.5
  • HDF5 1.8.14
  • PGI 15.3.0
  • PerfTools 6.3.0
  • MPI 7.2.5
  • NetCDF 4.3.3.1
  • Totalview 8.15.7

There will be a maintenance on Hexagon on October 20th from 9:00. We plan to finish by the end of the same day.
The queue system has a reservation in place and will not start jobs that cannot finish before the maintenance starts.
During this maintenance slot we will:
  • Apply Cray SW patches to improve stability, especially of the /work filesystem.
  • Add a qsub filter. Instead of sending email notifications when a job can’t start or has suboptimal parameters, it will print this information to the terminal when the job is submitted.
P.S. During the maintenance /work-common will not be available on GRUNCH and FIMM.


Update: 2015-10-20 09:00 - Scheduled maintenance has started.

Update: 2015-10-20 17:57 - Maintenance has finished. Please see the changes at

http://syslog.hpc.uib.no/2015/10/20/hexagon-updated-software/

There will be a scheduled maintenance on Hexagon on June 16th starting from 9:00. We expect to finish in the evening of the same day. During this maintenance slot we are going to upgrade the queue system and perform some extra tasks, including replacing the IO card on the metadata server. Access to the machine will be closed and all running jobs will be terminated during this maintenance window. The queuing system has a reservation in place, so jobs that are not able to finish before the maintenance will not start. We expect that idle jobs in the scheduler will not be affected.

Update: 2015-06-16 09:15 - Scheduled maintenance has started.

Update: 2015-06-16 23:48 - Maintenance has finished. We had to clear all jobs from the queue system, including idle and blocked ones. Please resubmit.

The /work-common/shared/imr disk space will not be available from 8:30 for a few hours. We will send a separate notice to affected users when the file system is available again. Users with data there are encouraged to copy the data needed for their runs during this maintenance to the /work file system. All jobs referencing /work-common/shared/imr will be stopped before the maintenance.

System maintenance is still ongoing and will last the whole day today.

Update 2014.11.25 18:00 Due to unexpected behaviour during the update, we regret to inform you that the maintenance has to be extended. We will follow up with further updates.

Update 2014.11.25 21:27 We have to postpone the opening of Hexagon due to issues with the scheduling system. We are working closely with Cray to fix this issue.

Update 2014.11.26 20:33 Issues with the job submission system require us to delay the opening. It is possible that the system will not be opened before next week. We are trying to fix this as soon as possible.

Update 2014.11.27 11:24 Most of the issues have been resolved and Hexagon is now available. One of the main remaining issues is interactive job submission, which will be handled next week without stopping the machine for an extra maintenance.