Please see details at https://docs.hpc.uib.no/wiki/HPC_course_2015
Author Archives: Alexander Oltu
Hexagon: scheduled maintenance on Oct. 20th
Hexagon will undergo maintenance on October 20th, starting at 9:00. We plan to finish by the end of the same day.
The queue system has a reservation in place. Jobs that cannot finish before the maintenance starts will not be allowed to run.
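The effect of the reservation can be sketched roughly as follows. This is only an illustration of the rule described above, not the scheduler's actual logic; the function name and the submission time are hypothetical, and the real queue system takes more factors into account.

```python
from datetime import datetime, timedelta

def can_start_before_maintenance(now, walltime, maintenance_start):
    """Hypothetical helper: True if a job starting at `now` with the
    requested `walltime` would end before the reservation begins."""
    return now + walltime <= maintenance_start

# Maintenance start taken from the announcement.
maintenance_start = datetime(2015, 10, 20, 9, 0)
# Assumed time at which the job would become eligible to start.
now = datetime(2015, 10, 19, 21, 0)

# A 6-hour job ends at 03:00 on Oct. 20th, so it may start.
print(can_start_before_maintenance(now, timedelta(hours=6), maintenance_start))
# A 24-hour job would overlap the maintenance, so it stays queued.
print(can_start_before_maintenance(now, timedelta(hours=24), maintenance_start))
```

In practice this means that requesting a shorter walltime can let a job start before the maintenance window.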
During this maintenance slot we will:
- Apply Cray SW patches to improve stability, especially of the /work filesystem.
- Add a qsub filter. It will replace the email notifications sent when a job can't start or has suboptimal parameters; instead, the messages will be printed to the terminal when the job is submitted.
Update: 2015-10-20 09:00 - Scheduled maintenance has started.
Update: 2015-10-20 17:57 - Maintenance is finished. Please see changes at
http://syslog.hpc.uib.no/2015/10/20/hexagon-updated-software/
Hexagon: MDS server crash
The MDS server has crashed again. Some jobs may fail.
Hopefully the MDS crashes will be eliminated after the maintenance we are planning for later this year (a separate announcement will follow).
Hexagon: MDS server crash
Today at 8:23 the primary MDS serving /work crashed. As a result, all IO to /work was suspended.
The failover MDS has been up since 10:50 and is serving the /work filesystem. All IO should have recovered.
We will investigate the cause of the primary MDS crash on Monday.
Hexagon: 2 login nodes crashed in the last 20 hours
Two login nodes were crashed by a user-space process that requested too much memory. Jobs running from these nodes have stopped.
The following jobs were affected:
1780945
1781097
1781123
1781528
1781848
1782040
1782089
1782093
1782097
1782101
1782121
1782155
1782280
Hexagon: scheduled maintenance on June 16th 9:00-24:00
There will be scheduled maintenance on Hexagon on June 16th, starting at 9:00. We expect to finish in the evening of the same day.
During this maintenance slot we are going to upgrade the queue system and perform some extra tasks, including replacing an IO card on the metadata server.
Access to the machine will be closed and all running jobs will be terminated during this maintenance window. The queuing system has a reservation in place, so jobs that cannot finish before the maintenance will not start. We expect that idle jobs in the scheduler will not be affected.
Update: 2015-06-16 09:15 - Scheduled maintenance has started.
Update: 2015-06-16 23:48 - Maintenance has finished. We had to clear all jobs from the queue system, including idle and blocked ones. Please resubmit your jobs.
Hexagon: updated Cray compiler
We have updated the Cray compiler on short notice. The main reason is that version 8.3.11 fixes this issue:
- CCE/8.3.x gives wrong result for CPMD
- 818417 Request for support of array boundary checks on GPU
- 819430 Bounds issue with ASSOCIATE
- 823032 Compiler fails with OpenMP and named loop when optimisation is applied
- 823076 CCE ICE with OpenACC code
- 823268 CESM job hangs with CCE on XC30
- 823328 CCE/8.3.x gives wrong result for CPMD
- 823498 Reveal scopes OpenMP loop incorrectly UKMET-2558
- 823694 Fortran compiler too restrictive on function prefix order
- 823913 ICE - CALL with PDT and assumed length type parameters
- 823929 Error ftn-521 introduced into CCE 8.3.8
- 820220 unexpected floating point exception
- 824408 assignment to complex var from parameter pair in a subroutine
- 824557 Fortran compiler use association error
- 825045 INTERNAL COMPILER ERROR: "Invalid construct" (v_mt_util_pdg.c v98357, line 797)
Hexagon: removed older software
The following software versions have been removed from Hexagon:
- trilinos 11.6.1.0 11.12.1.0
- libsci 11.1.00 11.1.01 13.0.1
- tpsl/cray-tpsl 1.2.01 1.3.01 1.3.02 1.3.03
- petsc 3.2.02 3.4.3.0
- MPT (cray-mpich2, cray-mpich, cray-shmem) 5.5.1 5.5.2 5.5.3 5.5.4 5.6.1 5.6.2 5.6.3 5.6.4 6.0.0 6.0.1 6.1.0 6.2.1 6.3.0
- Cray Compiler (cce) 8.2.1 8.2.3 8.2.4 8.2.5 8.3.4 8.3.5 8.3.7
- gcc 4.6.1 4.8.0 4.8.1
- libsci (xt-libsci, cray-libsci) 11.1.00 12.1.01 12.1.2
- perftools / perftools-lite 5.3.0 5.3.1 6.0.0 6.1.0 6.1.1 6.1.2 6.1.3
- papi 4.2.0 5.1.0.2 5.1.1 5.2.0
- cray-ga / ga 5.0.2 5.1.0 5.1.0.1 5.1.0.2 5.1.0.3 5.1.0.4
Hexagon: change in default versions
The following modules are now default:
- CCE 8.3.10
- Chapel - 1.11.0
- Perftools 6.2.3, PAPI 5.4.0.1
- MPT 7.1.2 (cray-mpich-7.1.2, cray-shmem-7.1.2, cray-mpich-abi-7.1.2)
- cray-lgdb-2.4.2
- cray-libsci-13.0.3
- atp-1.8.1
- modules-3.2.10.3
- parallel-netcdf-1.5.0
- craypkg-gen-1.3.1
- cray-ccdb-1.0.6
Hexagon: new software and libraries available
The following new versions of software and libraries were installed on Hexagon:
- Cray Compiling Environment - CCE 8.3.10
- Chapel - 1.11.0
- Cray Performance Tools - Perftools 6.2.3, PAPI 5.4.0.1
- Cray Message Passing Toolkit - MPT 7.2.0
- Cray Debugging Support Tools - CDST 15.04 - ATP 1.8.1, CCDB 1.0.6, lgdb 2.4.2
- Cray Scientific and Math Libraries - CSML 15.04 - PETSc 3.5.3.0, Trilinos 11.12.1.2, TPSL 1.4.4
- Cray Environment Setup and Compiling support - CENV 15.04 - craypkg-gen 1.3.1, craype 2.3.0, cray-modules 3.2.10.3