Software

The following software/libraries have been updated on hexagon. Users are encouraged to recompile their programs to gain any performance and stability benefits.

MPI
xt-mpt 5.0.1 -> 5.1.0

Math-libs
xt-libsci 10.4.7 -> 10.4.8
PETSc 3.1.01 -> 3.1.02
GA 4.3.1

Compilers and wrappers
PGI 10.6.0 -> 10.8.0
GCC 4.5.0 -> 4.5.1
xt-asyncpe 4.2 -> 4.3

Performance tools and debuggers
xt-craypat/apprentice2 5.1.1 -> 5.1.2
MRNet 2.2.0.1 -> 3.0.0
stat 1.0.0 -> 1.1.0

NOTES:

xt-craypat
General
* more complete support for PGAS, UPC and CAF programs
pat_build
* the 'caf' and 'upc' trace groups now include tracing the 'pgas' group
* fixed a truncation of LD_LIBRARY_PATH that could prevent pat_build
  from finding some shared objects of instrumented executables
* updated the Global Arrays trace wrappers
* fixed tracing of pthread functions under the GNU PrgEnv
* fixed tracing of user-defined functions for GNU 4.5.0 Fortran

Bugs closed since 5.1.1 release
763880 App2/5.1.2: HW counters overview shows negative numbers
763877 App2/5.1.2: HW counters overview shows more than 100%
763439 tracing user-defined functions results in seg fault for GNU Fortran 4.5.0
762905 No CAF specific info is provided when instrumenting a CAF code with -g caf
762904 craypat provides misleading info on PGAS codes running on XE6

MPI
The following features were added to MPT 5.1.0:

- Totalview message queue debugging is now supported on XE systems.
This feature can be enabled by setting the MPICH_MSG_QUEUE_DBG
environment variable. For more information about this feature see
the intro_mpi man page.

- SHMEM users can now get the SHMEM version displayed by setting
the SHMEM_VERSION_DISPLAY environment variable when running on
either XT or XE systems.
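As a minimal sketch, both new variables can be set in a job script before launching; the value 1 is a placeholder (per the notes above, simply setting the variable enables the feature — see the intro_mpi man page for details):

```shell
# Enable Totalview message queue debugging on XE systems:
export MPICH_MSG_QUEUE_DBG=1

# Display the SHMEM library version at startup (XT and XE):
export SHMEM_VERSION_DISPLAY=1

# Then launch as usual, e.g.:
#   aprun -n 64 ./myapp
```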

In addition, the following features were added in MPT updates since
the MPT 5.0.0 release and are also included:

- The MPI single copy optimization feature is now enabled by default.
The feature can be disabled by using the MPICH_SMP_SINGLE_COPY_OFF
env variable. For more information about this feature see the
intro_mpi man page.

- Additional information is now emitted with MPI tracebacks to make
  them easier to correlate with console messages. Also, for certain
  memory registration errors the virtual memory maps are now dumped
  to aid users in debugging their programs.

- Cray SHMEM users can also use the new SHMEM_BTE_THRESHOLD env
  variable to adjust when SHMEM uses the Block Transfer Engine (BTE)
  for off-node transfers.
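As a sketch, both controls can be set in a job script; the threshold value below is an arbitrary example rather than a recommended setting (consult the intro_mpi and intro_shmem man pages for defaults and valid values):

```shell
# Disable the (now default) MPI single-copy optimization if needed:
export MPICH_SMP_SINGLE_COPY_OFF=1

# Arbitrary example value for the BTE threshold for off-node
# SHMEM transfers:
export SHMEM_BTE_THRESHOLD=4096

# Launch as usual, e.g.:
#   aprun -n 128 ./myapp
```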

Several bugs were also fixed in this release and are listed below.

Bugs fixed:
761094,760949,763512,762928 - Fix bytes_transferred field in the gni
netmod LMT RDMA write path
763677,760868 - Fix for LMT path for MPI_Issend hangs


xt-asyncpe
Bugs fixed in this release
760967 MPICH2 tests failing for -sdefault64 -dynamic calling
MPI_Initialized( flag, ierr ) - bad flag returned
761681 Cannot override PGI -tp option from xtpe-* module
763605 mpi applications using pgi and default64 fail
763724 "ftn -default64" is no longer adding "-i8 -r8" to PGI
compilations.
763787 empty parameter reference causes syntax error

xt-libsci
LibSci 10.4.8 includes the following bug fixes:
761427 xt-libsci/10.4.3 with dynamic libraries
763728 Remove "conflict xt-libsci-gemini" from xt-libsci modulefiles

PETSc
The PETSc 3.1.02 release contains the PETSc 3.1-p4 patch set and
provides performance improvements in the ATF-CASK SpMV multiply
kernels for the AIJ matrix format.

Bugs fixed in this release:
763445 Problem with PetscSynchronizedFlush
763494 Problem with variable definition files (PETSC_LIB_DIR is not
set in the conf/variables)

More detailed information about the official PETSc-3.1 release is
available at
http://www.mcs.anl.gov/petsc/petsc-as/documentation/changes/31.html
SuperLU-4.0 information can be found at
http://crd.lbl.gov/~xiaoye/SuperLU/#superlu

Known problem:
--------------
Library entry points are missing from the PETSc complex dynamic
libraries; this affects non-GNU libraries only.
763837 petsc-complex will not link -dynamic

MRNet
Update to the 3.0 release of the MRNet library.

MRNet 3.0 release notes:
http://www.paradyn.org/mrnet/release_3.0/README

MRNet 3.0 Programmer's Guide:
http://www.paradyn.org/mrnet/release_3.0/mrnet_3.0.html

GA
This is the initial release of the Global Arrays library.
Global Arrays is a portable Non-Uniform Memory Access (NUMA) shared-
memory programming environment for distributed- and shared-memory
computers. It augments the message-passing model by providing
shared-memory-like access to distributed dense arrays.

For more information visit:
http://www.emsl.pnl.gov/docs/global/ga.html

GCC
The following bugs are fixed in the gcc 4.5.1 release.
760192 GNU gfortran OpenMP - untied task accesses threadprivate -
non-conforming but no msg [44085]
763453 seg fault in gcc 4.5.0

PGI
Features of PGI 10.8.0 are documented at:
http://www.pgroup.com/doc/pgiwsrn108.pdf
The following bugs are fixed in the PGI 10.8.0 release.
761844 internal compiler error [17079]

Several software libraries and programs have been updated on hexagon.
The full PDF announcement from Cray can be read at
http://docs.cray.com/books/S-9401-1008//S-9401-1008.pdf; a summary is
given below.

MPI and wrappers
xt-mpt 5.0.1 -> 5.0.2
xt-asyncpe 4.1 -> 4.2

Compilers and debug tools
gcc 4.4.4 -> 4.5.0
ATP 1.0.2 -> 1.0.3
gcc-mpc 0.8.1
gcc-gmp 4.3.2
gcc-mpfr 2.3.1 -> 2.4.2

Scientific libraries
netcdf 4.0.1.3 -> 4.1.1.0
hdf5 1.8.4.1 -> 1.8.5.0
petsc 3.1.00 -> 3.1.01
xt-libsci 10.4.6 -> 10.4.7

Performance tools
xt-craypat 5.1.0 -> 5.1.1
apprentice2 5.1.0 -> 5.1.1
xt-papi 3.7.2.0.5 -> 4.1.0

Notes for xt-mpt
The following features were added to MPT 5.0.2:

- The MPI single copy optimization feature is now enabled by default.
The feature can be disabled by using the MPICH_SMP_SINGLE_COPY_OFF
env variable. For more information about this feature see the
intro_mpi man page.

- Additional information is now emitted with MPI tracebacks to make
  them easier to correlate with console messages. Also, for certain
  memory registration errors the virtual memory maps are now dumped
  to aid users in debugging their programs.

- Cray SHMEM users can also use the new SHMEM_BTE_THRESHOLD env
  variable to adjust when SHMEM uses the Block Transfer Engine (BTE)
  for off-node transfers.

Several bugs were also fixed in this release.

The following software has been updated on hexagon:

xt-mpt MPI
5.0.0 -> 5.0.1: Bug fixes

xt-libsci math lib
10.4.5 -> 10.4.6:
Bug fixes.
LibSci 10.4.6 includes new CRAFFT routines to compute
real-to-complex/complex-to-real distributed 3d FFTs of any size.
These routines are crafft_pd2z3d and crafft_pz2d3d. Also,
two routines, crafft_total_size_2d_r2c and crafft_total_size_3d_r2c,
were added to assist users in calculating the local size of
the distributed data on each process.
Users requiring more information on usage should see the
intro_crafft manpage.

xt-asyncpe compiler wrappers
Differences:
Both the content and the logic of the INFO messages generated by the
driver scripts have changed. INFO messages are now handled as part of
"verbose" output and do not display unless
1) the user specifies "-v",
2) the user specifies "-V", or
3) the user sets XTPE_INFO_MESSAGE_ON to some value.
Otherwise, beginning with xt-asyncpe 4.1.7 (release 4.1), INFO
messages are not displayed by default.

The old environment variable, XTPE_INFO_MESSAGE_OFF, which was used
to turn off INFO messages, is deprecated as of xt-asyncpe/4.1. The new
environment variable, XTPE_INFO_MESSAGE_ON, can be set to any value
to make INFO messages display by default.
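For example, to re-enable the messages for a whole session (the value 1 is arbitrary; per the note above, any value works):

```shell
# Make the compiler wrapper INFO messages display by default again:
export XTPE_INFO_MESSAGE_ON=1

# Alternatively, request them per invocation with -v or -V, e.g.:
#   ftn -v prog.f90
```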


Modules
3.1.6.5 -> 3.1.6.6: Bug fixes

PGI compiler
10.5.0 -> 10.6.0: Bug fixes

Totalview debugger
8.8.0 -> 8.8.0a: Activate Replay Engine

Chapel
1.0.2 -> 1.1.1
See /opt/chapel/1.1.1/CHANGES for more information

xt-craypat Performance Tools
5.0.1 -> 5.1.0

* PAPI has been updated to 3.7.2.0.5

* Beginning with the 5.1 release, CrayPat includes license check
support through the FLEXnet license server. Sites installing the 5.1
performance tools software will need to obtain and install a license key
before use.

* New imbalance calculation in the Call Tree display: for all
functions except those represented by MPI collective sync time,
imbalance is calculated as MAX - AVE; for sync time it is calculated
as AVE - MIN.

* Support for the following predefined trace groups has been added:
aio (functions that perform asynchronous IO)
adios (Adaptable I/O System API)
armci (Aggregate Remote Memory Copy)
chapel (Chapel language compile and runtime library API)
dmapp (Distributed Memory Application API for Gemini)
ga (Global Array API)
pblas (Parallel Basic Linear Algebra Subroutines)
petsc (Portable Extensible Toolkit for Scientific Computation)
pgas (Parallel Global Address Space)
realtime (POSIX realtime extensions)

* Support for dynamically linked applications. Dynamically linked programs
can be instrumented and use all of the experiments and features that are
supported for statically linked programs.
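As an illustration, a dynamically linked binary now goes through the same pat_build workflow as a static one; the program name below is hypothetical, and the trace group is one of those listed above:

```shell
pat_build -g ga ./myapp        # instrument (hypothetical binary name)
aprun -n 32 ./myapp+pat        # run the instrumented program
pat_report myapp+pat+*.xf      # generate a report from the trace data
```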

* Back button in the Cray Apprentice2 Call Tree display - accessed by
right-clicking in the display background, it allows the user to revert
to previous displays after filtering the tree.

* Path to maximum load imbalance or "hot path" now highlighted in Cray
Apprentice2 Call Tree display

* Performance improvement to PE sort in Cray Apprentice2 Load Balance
display (off of Overview)

* Program wallclock time added in the Cray Apprentice2 caliper area
for files containing RTS data

* Faster load of initial data into Cray Apprentice2

The following software and libraries have been updated on hexagon:

MPI
xt-mpt 4.1.1 -> 5.0.0
MPI 2.2 compliance, except Dynamic Process Management

Compilers
PGI 10.4 -> 10.5
GCC 4.4.3 -> 4.4.4
xt-asyncpe 3.9 -> 4.0
java (security update) jdk1.6.0_20


Math/libs

xt-libsci 10.4.4 -> 10.4.5

LibSci 10.4.5 includes minor increases in functionality/support
of distributed CRAFFT routines. The enhancement improves the
coverage of 2d and 3d FFT routines by allowing real type work
arrays and input arguments for all transform types.

CRAFFT offers a simpler interface to improve application developer
productivity. In some cases the distributed CRAFFT 2.1 transforms
exhibit up to 10% speedup over comparable FFTW2 distributed
transforms.

Users requiring more information on usage should see the
intro_crafft manpage.


PETSc 3.0.0.10 -> 3.1.00

This new version of PETSc includes several changes including
performance enhancements of the sparse kernels used in the
incomplete LU preconditioning for AIJ and BAIJ matrix formats.
In Cray PETSc, these new kernels are further improved through
the new routines from Cray Adaptive Sparse Kernels (CASK).
In addition, the latest SuperLU-4.0 is included in this new
PETSc product.

More detailed information about the official PETSc-3.1 release is
available at
http://www.mcs.anl.gov/petsc/petsc-as/documentation/changes/31.html
SuperLU-4.0 information can be found at
http://crd.lbl.gov/~xiaoye/SuperLU/#superlu

Libraries on hexagon have been updated.

* MPT 4.1.1
Bug fixes.
* xt-asyncpe 3.9
Bug fixes.
* Cray Scientific and Math Libraries 4.13
LibSci 10.4.4
CRAFFT update
Trilinos 10.2.0
CASK Update
* PGI 10.4
Bug fix release update from PGI.
* Cray Debugger Supporting Tools 1.0.2
ATP 1.0.2
Bug fixes
* TotalView 8.8.0
Replay Engine Feature release.

Hexagon now has an NCL version that can run with aprun. The latest
module version, 5.2.0, is aprun-compatible and is loaded by default
if you do "module load ncl_ncarg".

If you are missing some features and want to run NCL on the login
node, use "module load ncl_ncarg/5.2.0-login" instead.
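In summary, the two variants are selected with one or the other of these module commands (as stated above):

```shell
module load ncl_ncarg                # aprun-compatible NCL 5.2.0 (default)
module load ncl_ncarg/5.2.0-login    # login-node variant instead
```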

Libraries on hexagon have been updated.

MPI
xt-mpt 4.0.3 -> 4.1.0.1

Math-libs
ACML 4.3.0 -> 4.4.0

Compilers
xt-asyncpe 3.7 -> 3.8

NOTES:

xt-mpt
Features:

The algorithms used for shmem_set_lock and shmem_clear_lock have
been improved for much better scaling. In a basic test of calls to set_lock
and clear_lock by a set of PEs all competing for the same lock, MPT
4.0.2 and MPT 4.0.3 perform about the same for a few nodes, but beyond
just a few, the time per PE for MPT 4.0.2 steadily increases with
the number of PEs whereas the time per PE for MPT 4.0.3 stays level.
At just 128 PEs, MPT 4.0.3 is about 4 times faster than MPT 4.0.2
and the difference keeps increasing. In addition, the new algorithm
grants the lock in the same order as the lock was requested whereas
with the old algorithm it was somewhat random which PE waiting for
the lock would get it next.

Adds support for dynamic libraries when using the cce compiler.

Bugs Fixed:
Bug 755075 MPICH2 threads/comm/ctxdup.c fails with "Too many communicators" in 4.0.0.3 vs 3.5.1
Bug 755698 MPI_Allgatherv hangs when using thread-safety
Bug 755490 SHMEM performance over Seastar needs improvements
Bug 755426 Divide by zero by MPIIO if file is not a Lustre file

ACML
See ACML documentation at AMD

Several libraries and compilers have been updated on Hexagon.

MPI:
xt-mpt 4.0.2 -> 4.0.3

Math libs:
xt-libsci 10.4.2 -> 10.4.3
PETSc 3.0.0.9 -> 3.0.0.10
libfast 1.0.6 -> 1.0.7

Compilers:
PGI 10.2.0 -> 10.3.0
Intel 11.1.064 -> 11.1.069

NOTES:

xt-mpt:

Features:
The algorithms used for shmem_set_lock and shmem_clear_lock have been
improved for much better scaling. In a basic test of calls to set_lock
and clear_lock by a set of PEs all competing for the same lock, MPT
4.0.2 and MPT 4.0.3 perform about the same for a few nodes, but beyond
just a few, the time per PE for MPT 4.0.2 steadily increases with the
number of PEs whereas the time per PE for MPT 4.0.3 stays level. At
just 128 PEs, MPT 4.0.3 is about 4 times faster than MPT 4.0.2 and the
difference keeps increasing. In addition, the new algorithm grants the
lock in the same order as the lock was requested, whereas with the old
algorithm it was somewhat random which PE waiting for the lock would
get it next.

xt-libsci:

Bugs fixed in the LibSci 10.4.3 release:
757748 LIBSCI - */lib/libsci_mc12.so missing for all compilers.
757785 libsci_m12.a missing in gnu/lib/44 and gnu/lib/43 formats
757821 Libsci 10.4.2 is not compatible with PGI 9.0 and earlier

libfast:

This release of libfast_mv 1.0.7 contains two new routines:
* frda_sqrt(), an array version of the square root function, sqrt();
* frda_rsqrt(), an array version of the inverse square root function, 1/sqrt().

PETSc:

New hypre-2.6.0b: https://computation.llnl.gov/casc/hypre/software.html

PGI:

The following bugs are fixed in the PGI 10.3.0 release.
754306 pgcc compiling #include with -Xa compiler option yields 968 lines of error messages [TPR 16276]
754847 SLES 11 missing macro def for __CPU_ISSET [TPR 16594]
755699 PGI pgf90 OpenMP doesn't issue message for missing SAVE attribute for var in THREADPRIVATE [16504]
756213 On XT the PGI (10.0.0) compiler fails with 'asm' instruction in [TPR 16620]
756425 PGF90-F-0000-Internal compiler error. [16527]
757047 PGI OpenMP pgf90 should give msg if ALLOCATABLE array in THREADPRIVATE doesn't have SAVE attribute [16504]
757169 PGI OpenMP pgf90 ignores task to create a file when task appears in sequential part of program [16602]
757662 PGI 10.2.0 incompatible with glibc >=2.7 CPU_SET [TPR 16594]