Scheduled maintenance

Hexagon will undergo maintenance on October 20th, starting at 9:00. We plan to finish by the end of the same day.
The queue system has a reservation in place: jobs that cannot finish before the maintenance starts will not be run.
During this maintenance slot we will:
  • Apply Cray SW patches to improve stability, especially of the /work filesystem.
  • Add a qsub filter. It will replace the email notifications sent when a job cannot start or has suboptimal parameters; instead, the information will be printed to the terminal when the job is submitted.
P.S. During the maintenance /work-common will not be available on GRUNCH and FIMM.


Update: 2015-10-20 09:00 - Scheduled maintenance has started.

Update: 2015-10-20 17:57 - Maintenance is finished. Please see the changes at http://syslog.hpc.uib.no/2015/10/20/hexagon-updated-software/

There will be a scheduled maintenance on Hexagon on June 16th, starting at 9:00. We expect to finish in the evening of the same day. During this maintenance slot we are going to upgrade the queue system and perform some extra tasks, including replacing the IO card on the metadata server. Access to the machine will be closed and all running jobs will be terminated during this maintenance window. The queuing system has a reservation in place so that jobs which are not able to finish before the maintenance will not start. We expect that idle jobs in the scheduler will not be affected.

Update: 2015-06-16 09:15 - Scheduled maintenance has started.

Update: 2015-06-16 23:48 - Maintenance has finished. We had to clear all jobs from the queue system, including idle and blocked ones. Please resubmit your jobs.

The disk space /work-common/shared/imr will be unavailable for a few hours, starting at 8:30. We will send a separate notice to affected users when the file system is available again. We encourage users with data there to copy the data needed for their runs during this maintenance to the /work file system. All jobs referencing /work-common/shared/imr will be stopped before the maintenance.
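
For users who prefer to script the copy suggested above, a minimal Python sketch follows. It assumes Python 3.8 or newer; the helper name stage_inputs is hypothetical and only illustrates the idea of copying a directory tree from one file system to another.

```python
import shutil
from pathlib import Path


def stage_inputs(source: Path, destination: Path) -> Path:
    """Copy the directory tree at `source` into `destination`.

    Existing directories in `destination` are reused, and files with the
    same name are overwritten by the copy from `source`.
    """
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return destination
```

A call could look like stage_inputs(Path("/work-common/shared/imr/myproject"), Path("/work/myuser/myproject")), where myproject and myuser are placeholders for your own directory names.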

System maintenance is still ongoing and will continue throughout today.

Update 2014.11.25 18:00 Due to unexpected behaviour during the update, we regret to inform you that the maintenance has to be extended. We will come back later with further updates.

Update 2014.11.25 21:27 We have to postpone the opening of Hexagon due to issues with the scheduling system. We are working closely with Cray to fix this issue.

Update 2014.11.26 20:33 Issues with the job submission system require us to delay the opening. It may well be that the system will not be opened before next week. We are trying to fix it as soon as possible.

Update 2014.11.27 11:24 The majority of the issues have been resolved and Hexagon is now available. One of the main remaining issues is interactive job submission, which will be handled during next week without stopping the machine for extra maintenance.

We will have a planned Hexagon maintenance on November 24th and 25th.
The maintenance will start on the 24th at 9:00 and we expect it to take
around 2 days.

The job submission system has a reservation in place so that jobs
which are not able to finish before the maintenance will not be started.
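
As a rough illustration of the reservation rule above, the following Python sketch checks whether a job with a given walltime would end before the maintenance starts. It is not the scheduler's actual logic, which is not described here; the function names are hypothetical, the walltime is assumed to be in HH:MM:SS format, the job is assumed to start immediately, and the dates are example values only.

```python
from datetime import datetime, timedelta


def parse_walltime(walltime: str) -> timedelta:
    """Convert an HH:MM:SS walltime string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fits_before_maintenance(now: datetime, walltime: str,
                            maintenance_start: datetime) -> bool:
    """Return True if a job started at `now` would end by `maintenance_start`."""
    return now + parse_walltime(walltime) <= maintenance_start


# Example values (assumptions): maintenance begins on 24 November at 9:00,
# and the job would start the evening before.
maintenance_start = datetime(2014, 11, 24, 9, 0)
start_time = datetime(2014, 11, 23, 20, 0)

print(fits_before_maintenance(start_time, "12:00:00", maintenance_start))  # True
print(fits_before_maintenance(start_time, "24:00:00", maintenance_start))  # False
```

In this example the 12-hour job would end at 8:00 and therefore fits, while the 24-hour job would overlap the maintenance window. Requesting a shorter walltime is one way to let a job run before the maintenance.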

We are planning to do the following during the maintenance window:
* Update Cray Management Software to 7.2.UP02
* Update Cray Linux Environment to 5.2.UP02
* Update Lustre file system on /work and /work-common
* Apply various firmware updates and patches
* Install newer libraries, compilers and tools

IMPORTANT: After the maintenance, the default MPT will be 7.x (MPICH
3.1). All software will have to be recompiled. We will follow up with
details and options after the maintenance.

The maintenance will start at 9:00 and we expect it will take around 12 hours.

The job submission system has a reservation in place so that jobs which are not able to finish before the maintenance will not be started.

During this timeslot we are going to do the following:

* Install newer libraries, compilers and tools
* Update Cray Management Software to 7.2.UP00
* Update Cray Linux Environment to 5.2.UP00
* Apply various firmware updates and patches

IMPORTANT: After the maintenance, the default MPT will be 7.0.x (MPICH 3.1). All software will have to be recompiled. We will follow up with details and options after the maintenance.

The detailed list of the new software being installed:
* CCE 8.3.3
* MPT 7.0.3
* PMI 5.0.5
* Perftools 6.2.1
* PAPI 5.3.2
* LibSci 13.0.1
* Trilinos 3.5.1.0
* GCC 4.9.1
* PGI 14.7.0
* HDF5 1.8.13
* NetCDF 4.3.2
* Parallel-NetCDF 1.5.0
* Craype 2.2.0
* ATP 1.7.5
* LGDB 2.3.2
* Stat 2.1.0.1
* Dwarf 14.2.0
* CCDB 1.0.3
* TotalView 8.14

Hexagon is going to have a scheduled maintenance on April 23rd. The
maintenance will start at 9:30. The expected downtime is about 12 hours.

During the maintenance we are going to do the following:

* Upgrade the compute node Linux to 4.2UP02.
* Upgrade the management station base OS and Cray software release.
* Apply various security patches.
* Upgrade the storage firmware.

All running jobs will be terminated. The job submission system has a
reservation in place; it will not start jobs that are not able to
finish before the maintenance starts.

Update 20:10 The maintenance is over. The machine is back online.