Login2 was rebooted because of hardware errors on its Ethernet card, which had made the node unreachable over the network. The problem should now be resolved.
Hardware
Grunch: down
Both operating system disks in Grunch failed within a short timeframe, making the system inoperable. We are working to recover from the failure as soon as possible.
Update 06.10.2017 14:00: The Grunch server is up again. Both OS disks have been replaced and the server has been reinstalled.
/migrate and /bcmhsm offline on September 4th
Due to physical rearrangements in the server room, the tape robot hosting /migrate and /bcmhsm will be unavailable today for several hours, starting at 12:00. Updates will be posted here.
Update 2017-09-11:
Uni Computing is experiencing problems with the backend holding /migrate and /bcmhsm, and it is not yet known when this will be fixed. As these file systems were supposed to be decommissioned in June this year, we will not mount them back in their usual location even after they are healthy again. However, we will finish the transfer of IMR/HI files, as agreed, as soon as the file systems are healthy. We will issue a separate update for this.
Users other than IMR/HI who need files from those file systems are advised to contact the Uni Computing helpdesk at trouble@computing.uni.no.
Hexagon: IMR volumes offline
The network equipment connecting Hexagon and IMR has to be changed, which requires a downtime of at most two hours.
Therefore, the IMR volumes will be unmounted on Tuesday, 29th of August, from 09:00 for approximately two hours. Please stop all your processes on Hexagon that use the IMR volumes before then.
Hexagon & Grunch: Planned downtime for 25th of August
On Friday, 25th of August, maintenance on the electric lines in the server room will be carried out. Hexagon must therefore be switched off. All related file systems (/work, /work-common) will also be unavailable.
The maintenance will start at 07:00 and is scheduled to last until 13:00.
During this time /work-common will not be available on Grunch.
Update:
- 25.08.2017 07:00: Maintenance has started.
- 25.08.2017 12:50: Storage controller issues are delaying startup of the machine. We are working on the fix.
- 25.08.2017 15:05: Storage controller issues were remediated. Some disks backing the /work-common filesystem are rebuilding, so reduced performance can be expected for a couple of days.
- 25.08.2017 15:20: Hexagon is up again.
/migrate and /bcmhsm unavailable on April 6th
A new three-hour scheduled maintenance is planned for the storage serving /migrate and /bcmhsm.
It will take place on 6th of April, starting at 13:00.
/migrate and /bcmhsm unavailable on March 28th
A three-hour scheduled maintenance is planned for the storage serving /migrate and /bcmhsm.
It will take place on 28th of March, starting at 13:00.
Hexagon: decommissioned end of June 2017
All local CPU quotas will cease after 01.04.2017.
Login will be closed after 30.06.2017, so please make sure that all your data is transferred before then. Please plan this well in advance to avoid overloading the filesystem.
Grunch: Lustre filesystems offline on October 18th
The /work and /work-common filesystems will be unavailable on Grunch on 18th of October, starting at 09:00. This downtime is part of the scheduled maintenance announced at
http://syslog.hpc.uib.no/2016/09/21/hexagon-planned-maintenance-18-10-19-10/.
The downtime will last up to 8 hours for /work-common and up to 2 days for /work.
Please make sure that no jobs are using /work or /work-common by this time, to avoid data loss and/or data corruption.
We will keep you updated here.
Update 2016-10-19 11:07: /work-common is back online and remounted on Grunch.
/work-common storage capacity extended
We have added 74 TB of storage capacity to /work-common.
