The file system /work-common/shared/imr will be unavailable from 8:30 for a few hours. We will send a separate notice to affected users when it is available again.
Users with data there are encouraged to copy any data needed for their runs during the maintenance to the /work file system. All jobs referencing /work-common/shared/imr will be stopped before the maintenance begins.
We had to reboot login3 because of processes stuck in an uninterruptible state. The following jobs were terminated and need to be resubmitted:
1654462.sdb
1657052.sdb
1650122.sdb
1654844.sdb
1657054.sdb
1655817.sdb
1653859.sdb
1655140.sdb
Our apologies for any inconvenience this could cause.
We have installed new versions of the following packages:
CCE 8.3.7
Cray Message Passing Toolkit - MPT 7.1.1
MPT 7.1.1, GA 5.3.0.1
Cray Debugging Support Tools - CDST 15.01
CCDB 1.0.5, lgdb 2.4.0
Cray Scientific and Math Libraries - CSML 15.01
PETSc 3.5.2.1, Trilinos 11.12.1.0, TPSL 1.4.3
cray-modules 3.2.10.2
Please find details here.
We are introducing a new update routine for software and libraries: new versions will first be installed as non-default and will become the default after one month.
Due to a cooling failure, Hexagon was forced to shut down. We are investigating the issue and will keep you updated.
Update 08:45: Service personnel are on-site working to fix the cooling system. We will report back as soon as the issue is resolved.
Update 10:50: The machine is up again.
Due to an important security update, we will shortly reboot the systems mentioned above.
Our apologies for any inconvenience caused by this.
Update: Hexagon and Grunch were stopped at 11:45 and were available again at 12:35. The Fimm login nodes were rebooted in the background.
Another thunderstorm caused a brief power outage, long enough to stop Hexagon. We are working on bringing it back up. More lightning is forecast for the next 24 hours.
The last two months have seen many weather-related power interruptions, which have prevented stable runs.
Update 22:10: Hexagon is up.
Hexagon went down because of a power blink. More power blinks are possible, so we will keep Hexagon down until storm Nina has passed.
We expect to start it on Sunday morning.
Update: Hexagon has been restarted and has been up again since 11:30.