Author Archives: lsz075

About lsz075

IT department

Several libraries have been updated on Hexagon.

MPI
xt-mpt 4.0.3 -> 4.1.0.1

Math-libs
ACML 4.3.0 -> 4.4.0

Compilers
xt-asyncpe 3.7 -> 3.8

NOTES:

xt-mpt
Features:

The algorithms used for shmem_set_lock and shmem_clear_lock have
been improved for much better scaling. In a basic test of calls to set_lock
and clear_lock by a set of PEs all competing for the same lock, MPT
4.0.2 and MPT 4.0.3 perform about the same for a few nodes, but beyond
just a few, the time per PE for MPT 4.0.2 steadily increases with
the number of PEs whereas the time per PE for MPT 4.0.3 stays level.
At just 128 PEs, MPT 4.0.3 is about 4 times faster than MPT 4.0.2
and the difference keeps increasing. In addition, the new algorithm
grants the lock in the same order as the lock was requested whereas
with the old algorithm it was somewhat random which PE waiting for
the lock would get it next.

Adds support for dynamic libraries when using the cce compiler.

Bugs Fixed:
Bug 755075 MPICH2 threads/comm/ctxdup.c fails with "Too many communicators" in 4.0.0.3 vs 3.5.1
Bug 755698 MPI_Allgatherv hangs when using thread-safety
Bug 755490 SHMEM performance over Seastar needs improvements
Bug 755426 Divide by zero by MPIIO if file is not a Lustre file

ACML
See the ACML documentation at AMD.

To use the checkpointing feature, the application must be compiled with blcr and Cray MPT version 3.0.1 or later:

module load blcr

With the module loaded, all necessary options are added to the compiler wrapper automatically. Only the MPI and SHMEM programming models are supported.

The job script must contain at least the following parameter:
#PBS -c enabled

See man qsub for more parameters.

To checkpoint and hold the job, run:
qhold JOBID

To continue:
qrls JOBID
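
As a rough illustration of the steps above, the following sketch writes a minimal job script with checkpointing enabled. Only `#PBS -c enabled`, `module load blcr`, `qhold` and `qrls` come from the text; the file name `ckpt_job.pbs`, the job name, the `mppwidth` and `walltime` values, and the binary `./my_app` are placeholders, and your project may need additional options (see man qsub).

```shell
#!/bin/bash
# Sketch only: write a minimal PBS job script with checkpointing enabled.
# The resource values and ./my_app are placeholders, not site recommendations.
cat > ckpt_job.pbs <<'EOF'
#!/bin/bash
#PBS -N ckpt_example
#PBS -l mppwidth=32
#PBS -l walltime=01:00:00
#PBS -c enabled

cd "$PBS_O_WORKDIR"
aprun -n 32 ./my_app
EOF

# Typical workflow (commented out so this sketch runs without a batch system):
#   module load blcr        # load before compiling the application
#   qsub ckpt_job.pbs       # submit; prints the job ID
#   qhold JOBID             # checkpoint and hold the job
#   qrls JOBID              # release the job so it continues
```

The heredoc delimiter is quoted so that `$PBS_O_WORKDIR` is written literally and expanded only when the job runs.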

The Cray checkpoint/restart solution uses the BLCR software from Berkeley Lab and inherits its limitations. For more information, refer to the BLCR documentation: http://upc-bugs.lbl.gov/blcr/doc/html/index.html

To reconfigure the home file system on the Fimm cluster and avoid the missing home folder issue on the compute nodes, the whole Fimm cluster will be down for maintenance on 6th of April.
The whole cluster is reserved for maintenance from 11:00 on 6th of April. Newly submitted jobs that cannot finish before that time will not start. Jobs that are already running and cannot finish before that time will be killed.

We will send more information about the new home file system configuration on the Fimm cluster and keep you updated on the maintenance.

If you have any questions, please contact hpc-support@hpc.uib.no or support-uib@notur.no.

Several libraries and compilers have been updated on Hexagon.

MPI:
xt-mpt 4.0.2 -> 4.0.3

Math libs:
xt-libsci 10.4.2 -> 10.4.3
PETSc 3.0.0.9 -> 3.0.0.10
libfast 1.0.6 -> 1.0.7

Compilers:
PGI 10.2.0 -> 10.3.0
Intel 11.1.064 -> 11.1.069

NOTES:

xt-mpt:

Features:
The algorithms used for shmem_set_lock and shmem_clear_lock have been improved for much better scaling. In a basic test of calls to set_lock and clear_lock by a set of PEs all competing for the same lock, MPT 4.0.2 and MPT 4.0.3 perform about the same for a few nodes, but beyond just a few, the time per PE for MPT 4.0.2 steadily increases with the number of PEs whereas the time per PE for MPT 4.0.3 stays level. At just 128 PEs, MPT 4.0.3 is about 4 times faster than MPT 4.0.2 and the difference keeps increasing. In addition, the new algorithm grants the lock in the same order as the lock was requested whereas with the old algorithm it was somewhat random which PE waiting for the lock would get it next.

xt-libsci:

Bugs fixed in the Libsci 10.4.3 release:
757748 LIBSCI - */lib/libsci_mc12.so missing for all compilers.
757785 libsci_m12.a missing in gnu/lib/44 and gnu/lib/43 formats
757821 Libsci 10.4.2 is not compatible with PGI 9.0 and earlier

libfast:

This release of libfast_mv 1.0.7 contains two new routines:
* frda_sqrt(), an array version of the square root function, sqrt();
* frda_rsqrt(), an array version of the inverse square root function, 1/sqrt().

PETSc:

Now includes hypre-2.6.0b, see https://computation.llnl.gov/casc/hypre/software.html

PGI:

The following bugs are fixed in the PGI 10.3.0 release.
754306 pgcc compiling #include with -Xa compiler option yields 968 lines of error messages [TPR 16276]
754847 SLES 11 missing macro def for __CPU_ISSET [TPR 16594]
755699 PGI pgf90 OpenMP doesn't issue message for missing SAVE attribute for var in THREADPRIVATE [16504]
756213 On XT the PGI (10.0.0) compiler fails with 'asm' instruction in [TPR 16620]
756425 PGF90-F-0000-Internal compiler error. [16527]
757047 PGI OpenMP pgf90 should give msg if ALLOCATABLE array in THREADPRIVATE doesn't have SAVE attribute [16504]
757169 PGI OpenMP pgf90 ignores task to create a file when task appears in sequential part of program [16602]
757662 PGI 10.2.0 incompatible with glibc >=2.7 CPU_SET [TPR 16594]

All users logging in to the Hexagon login nodes are now automatically "niced" to +5, and each session on a login node is limited to 100 running processes. This does not apply to the compute nodes, so jobs are not affected.

This is done primarily to prevent one user's CPU-heavy tasks from affecting other users on the same login node.
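
To see how these settings look from your own session, a small sketch along these lines may help. It assumes a Linux login node with the standard `nice`, `id`, `ps` and `wc` tools; the function names are made up for illustration, and the process count covers all of your processes on the node, which may span more than one session.

```shell
#!/bin/bash
# Sketch only: inspect the niceness and process count of the current user.

# Print the niceness of the current shell (nice without a command prints it).
current_niceness() {
    nice
}

# Count processes owned by the current user on this node.
# Note: this includes the short-lived ps and wc processes themselves.
count_my_processes() {
    ps -u "$(id -u)" -o pid= | wc -l
}

echo "niceness:  $(current_niceness)"
echo "processes: $(count_my_processes)"
```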

Please send feedback to support-uib@notur.no