Hexagon went down at 12:35 due to a power problem in cabinet 7.
Update 15:10: The machine is up again, running without cabinet 7. We will have downtime on Thursday, May 06 at 12:00 to fix cabinet 7. A reservation is in place in the queue, so only jobs that can complete (according to their requested walltime) before the maintenance starts will be started.
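The reservation rule amounts to a simple check on the requested walltime. A minimal Python sketch, assuming the scheduler compares a job's start time plus its requested walltime against the start of the maintenance window (the actual scheduler logic is not described here). The function name is hypothetical, and the year 2010 in the example is an assumption, since the announcement only gives Thursday May 06, 12:00.

```python
from datetime import datetime, timedelta


def can_start_before_maintenance(now: datetime, walltime: timedelta,
                                 maintenance_start: datetime) -> bool:
    """Return True if a job started at `now` with the requested `walltime`
    would finish no later than `maintenance_start` (hypothetical helper)."""
    return now + walltime <= maintenance_start


# Assumed year 2010; the announcement only says Thursday May 06 at 12:00.
maintenance = datetime(2010, 5, 6, 12, 0)
now = datetime(2010, 5, 5, 15, 0)  # hypothetical time: 21 hours before maintenance

print(can_start_before_maintenance(now, timedelta(hours=20), maintenance))  # True
print(can_start_before_maintenance(now, timedelta(hours=24), maintenance))  # False
```

Requesting a realistic walltime instead of a generous default therefore increases the chance that a job still starts before the maintenance.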
Author Archives: lsz075
Hexagon: Updated software/libraries
Several libraries and compilers have been updated on Hexagon.
MPI
xt-mpt 4.0.3 -> 4.1.0.1
Math-libs
ACML 4.3.0 -> 4.4.0
Compilers
xt-asyncpe 3.7 -> 3.8
NOTES:
xt-mpt
Features:
The algorithms used for shmem_set_lock and shmem_clear_lock have
been improved for much better scaling. In a basic test of calls to set_lock
and clear_lock by a set of PEs all competing for the same lock, MPT
4.0.2 and MPT 4.0.3 perform about the same for a few nodes, but beyond
just a few, the time per PE for MPT 4.0.2 steadily increases with
the number of PEs whereas the time per PE for MPT 4.0.3 stays level.
At just 128 PEs, MPT 4.0.3 is about 4 times faster than MPT 4.0.2
and the difference keeps increasing. In addition, the new algorithm
grants the lock in the same order as the lock was requested whereas
with the old algorithm it was somewhat random which PE waiting for
the lock would get it next.
Adds support for dynamic libraries when using the cce compiler.
Bugs Fixed:
Bug 755075 MPICH2 threads/comm/ctxdup.c fails with "Too many communicators" in 4.0.0.3 vs 3.5.1
Bug 755698 MPI_Allgatherv hangs when using thread-safety
Bug 755490 SHMEM performance over Seastar needs improvements
Bug 755426 Divide by zero by MPIIO if file is not a Lustre file
ACML
See the ACML documentation from AMD.
Hexagon: login6 is going to be rebooted
The Hexagon login node login6 has been evicted from ost8 of the Lustre /work file system. Files located on ost8 of /work are not available from login6.
Please log off from login6 and use the other Hexagon login nodes. Login6 will be rebooted as soon as all jobs started from it have finished.
Update 17/04, 22:00: login6 has been rebooted and is available again.
Hexagon: job checkpoint available
To use the checkpointing feature, the application must be compiled with BLCR and Cray MPT version 3.0.1 or later:
module load blcr
With the module loaded, all necessary options are automatically added by the compiler wrapper. Only the MPI and SHMEM programming models are supported.
The job script must contain at least the following parameter:
#PBS -c enabled
See man qsub for more parameters.
To checkpoint and hold a job, run:
qhold JOBID
To continue:
qrls JOBID
The Cray checkpoint/restart solution uses the BLCR software from Berkeley Lab and inherits its limitations. For more information, refer to the BLCR documentation: http://upc-bugs.lbl.gov/blcr/doc/html/index.html.
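If you hold and release jobs from your own scripts, the two commands above can be wrapped as below. A minimal Python sketch: the helper names are hypothetical, the job ID is made up, and it assumes qhold and qrls are on your PATH on a Hexagon login node and that the job was submitted with #PBS -c enabled. With dry_run=True (the default) the commands are only printed, not executed.

```python
import subprocess


def hold_command(job_id: str) -> list[str]:
    # Checkpoint the job and put it on hold: qhold JOBID
    return ["qhold", job_id]


def release_command(job_id: str) -> list[str]:
    # Release the held job so it continues: qrls JOBID
    return ["qrls", job_id]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command, or execute it and raise CalledProcessError on failure."""
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    job_id = "123456"              # hypothetical job ID
    run(hold_command(job_id))      # prints: qhold 123456
    run(release_command(job_id))   # prints: qrls 123456
```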
Fimm: downtime for the whole cluster
To reconfigure the home file system setup on the Fimm cluster and avoid the missing home folder issue on the compute nodes, we will have downtime for the whole Fimm cluster on 6 April.
The whole Fimm cluster is reserved for maintenance from 11:00 on 6 April. Newly submitted jobs that cannot finish before that time will not start. Running jobs that cannot finish before that time will be killed.
We will provide more information about the new home file system configuration on Fimm and keep you updated about the maintenance.
If you have any questions, please contact hpc-support@hpc.uib.no or support-uib@notur.no.
Hexagon: Updated software/libraries
Several libraries and compilers have been updated on Hexagon.
MPI:
xt-mpt 4.0.2 -> 4.0.3
Math libs:
xt-libsci 10.4.2 -> 10.4.3
PETSc 3.0.0.9 -> 3.0.0.10
libfast 1.0.6 -> 1.0.7
Compilers:
PGI 10.2.0 -> 10.3.0
Intel 11.1.064 -> 11.1.069
NOTES:
xt-mpt:
Features:
The algorithms used for shmem_set_lock and shmem_clear_lock have been improved for much better scaling. In a basic test of calls to set_lock and clear_lock by a set of PEs all competing for the same lock, MPT 4.0.2 and MPT 4.0.3 perform about the same for a few nodes, but beyond just a few, the time per PE for MPT 4.0.2 steadily increases with the number of PEs whereas the time per PE for MPT 4.0.3 stays level. At just 128 PEs, MPT 4.0.3 is about 4 times faster than MPT 4.0.2 and the difference keeps increasing. In addition, the new algorithm grants the lock in the same order as the lock was requested whereas with the old algorithm it was somewhat random which PE waiting for the lock would get it next.
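The announcement says the new algorithm grants the lock in the order it was requested, but does not describe how. A ticket lock is one textbook way to get that FIFO property: each requester takes a numbered ticket, and the lock is handed to tickets strictly in order. The Python sketch below uses threads in place of PEs and illustrates only the ordering guarantee; it is not Cray MPT's shmem_set_lock/shmem_clear_lock implementation, and the TicketLock class is hypothetical.

```python
import threading


class TicketLock:
    """FIFO lock: waiters get the lock in the order they requested it.

    Illustration only, not the Cray MPT algorithm.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._next_ticket = 0   # ticket handed to the next requester
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self) -> int:
        with self._cond:
            ticket = self._next_ticket   # taking a ticket fixes our place in line
            self._next_ticket += 1
            while self._now_serving != ticket:
                self._cond.wait()        # sleep until our number is called
            return ticket

    def release(self) -> None:
        with self._cond:
            self._now_serving += 1       # call the next ticket in request order
            # notify_all wakes every waiter; simple, but O(waiters) per release.
            self._cond.notify_all()


if __name__ == "__main__":
    lock = TicketLock()
    grants = []

    def worker() -> None:
        ticket = lock.acquire()
        grants.append(ticket)            # executed while holding the lock
        lock.release()

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(grants)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because a ticket can only enter once every earlier ticket has released the lock, the grant order always equals the request order, which is the property the release note describes.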
xt-libsci:
Bugs fixed in the Libsci 10.4.3 release:
757748 LIBSCI - */lib/libsci_mc12.so missing for all compilers.
757785 libsci_m12.a missing in gnu/lib/44 and gnu/lib/43 formats
757821 Libsci 10.4.2 is not compatible with PGI 9.0 and earlier
libfast:
This release of libfast_mv 1.0.7 contains two new routines
* frda_sqrt(), an array version of the square root function, sqrt();
* frda_rsqrt(), an array version of the inverse square root function, 1/sqrt().
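The announcement does not give the C or Fortran signatures of frda_sqrt() and frda_rsqrt(), so the sketch below does not call libfast. It is a plain Python reference for the element-wise results the two routines are described as computing, which can be handy for spot-checking output after switching code to them. The function names are hypothetical, and the comparison uses a tolerance because the accuracy of the new routines is not stated here.

```python
import math
from typing import Sequence


def ref_sqrt(xs: Sequence[float]) -> list[float]:
    """Element-wise sqrt(x), the result an array square root should give."""
    return [math.sqrt(x) for x in xs]


def ref_rsqrt(xs: Sequence[float]) -> list[float]:
    """Element-wise 1/sqrt(x); every x must be > 0."""
    return [1.0 / math.sqrt(x) for x in xs]


def close_enough(got: Sequence[float], want: Sequence[float],
                 rel_tol: float = 1e-12) -> bool:
    """True if both arrays have equal length and agree within rel_tol."""
    return len(got) == len(want) and all(
        math.isclose(g, w, rel_tol=rel_tol) for g, w in zip(got, want)
    )


if __name__ == "__main__":
    print(ref_sqrt([1.0, 4.0, 9.0]))   # [1.0, 2.0, 3.0]
    print(ref_rsqrt([4.0, 16.0]))      # [0.5, 0.25]
```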
PETSc:
New hypre-2.6.0b https://computation.llnl.gov/casc/hypre/software.html
PGI:
The following bugs are fixed in the PGI 10.3.0 release.
754306 pgcc compiling #include with -Xa compiler option yields 968 lines of error messages [TPR 16276]
754847 SLES 11 missing macro def for __CPU_ISSET [TPR 16594]
755699 PGI pgf90 OpenMP doesn't issue message for missing SAVE attribute for var in THREADPRIVATE [16504]
756213 On XT the PGI (10.0.0) compiler fails with 'asm' instruction in [TPR 16620]
756425 PGF90-F-0000-Internal compiler error. [16527]
757047 PGI OpenMP pgf90 should give msg if ALLOCATABLE array in THREADPRIVATE doesn't have SAVE attribute [16504]
757169 PGI OpenMP pgf90 ignores task to create a file when task appears in sequential part of program [16602]
757662 PGI 10.2.0 incompatible with glibc >=2.7 CPU_SET [TPR 16594]
Hexagon: nice +5 for all login nodes
All user sessions on the Hexagon login nodes are now automatically "niced" to +5, and each login session is limited to 100 running processes. This does not affect the compute nodes in any way, so jobs are not affected.
This is done primarily to prevent one user's CPU-intensive tasks from affecting other users on the same login node.
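The announcement does not say how these limits are enforced. Assuming the niceness is inherited by the processes in your login session and the process cap is applied as a resource limit (RLIMIT_NPROC, the value shown by ulimit -u), you could inspect both from Python on a login node as sketched below. Note that on Linux RLIMIT_NPROC counts all processes of your user ID, not of a single session, so it only approximates the per-session limit described above. The function name is hypothetical.

```python
import os
import resource


def login_session_limits() -> dict:
    """Report this process's niceness and its process-count limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "niceness": os.nice(0),   # adding 0 leaves niceness unchanged and returns it
        "nproc_soft": soft,       # resource.RLIM_INFINITY means "unlimited"
        "nproc_hard": hard,
    }


if __name__ == "__main__":
    print(login_session_limits())
```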
Please send feedback to support-uib@notur.no.
Hexagon: cdo-login merged into cdo-cnl
The cdo-login module has been merged into the cdo-cnl module.
Please update your scripts!
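If many job scripts load the old module, the rename can be scripted. A minimal Python sketch, assuming your scripts load the module on a plain module command line such as "module load cdo-login"; it rewrites cdo-login to cdo-cnl only on lines that start with module and leaves every other line untouched. If you pin a version (cdo-login/<version>), check with module avail cdo-cnl that the same version exists under the new name. The function name is hypothetical.

```python
import re

# Whole-name match: not preceded or followed by a word character or hyphen,
# so names like "mycdo-login" or "cdo-login2" are left alone.
_OLD_NAME = re.compile(r"(?<![\w-])cdo-login(?![\w-])")


def update_module_lines(script: str) -> str:
    """Replace cdo-login with cdo-cnl on `module ...` lines only."""
    out = []
    for line in script.splitlines(keepends=True):
        if line.lstrip().startswith("module "):
            line = _OLD_NAME.sub("cdo-cnl", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(update_module_lines("module load cdo-login\ncdo -h\n"), end="")
    # module load cdo-cnl
    # cdo -h
```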
Hexagon: crash of HSN, March 11th
An HSN (High Speed Network) link error between two cabinets caused the machine to crash. We are working to bring the machine back up.
Update 17:30: The machine is running again. Jobs that were running at the time of the crash must be resubmitted.
Fimm: uibkvant file system
We will stop providing the uibkvant file system from 12 March (next Friday). The uibkvant file system will be unmounted from fimm.bccs.uib.no at 12:00.
Please make sure you back up all your necessary files and data.
Contact support at support-uib@notur.no if you have any difficulties doing so.