Hexagon: Updated software/libraries, Mar. 2nd

Several libraries and compilers have been updated on hexagon.

NOTE: We have found that the module xtpe-barcelona was not loaded by default for a period of time. If you did not load it manually, programs compiled during that period are not fully optimized for hexagon. Please log out and back in, then recompile your programs.

Note also that the "xt-atp" module has been renamed "atp".

Updated libraries/compilers:

* xt-asyncpe 3.7
Bug fixes, and support for the CCE 7.2 compilers with dynamic shared
libraries (DSLs).
* Libsci 10.4.2
OpenMP/SMP support, and dynamic shared library support for
the CCE compiler.
* Trilinos 10.0.1
Performance enhancements.
* hdf5-netcdf 1.7
Support for the ABI-compliant CCE C++ compiler.
* MPT 4.0.2
Support for the ABI-compliant CCE C++ compiler.
* Cray Debugger tools
ATP 1.0.1
STAT 1.0.0
MRNet 2.2.0.1
Initial release of statview as part of STAT. Bug fixes to
ATP and MRNet.
* PGI 10.1.0 and 10.2.0
Bug-fix releases of PGI.
* GCC 4.4.3
Bug-fix release of the GNU compilers.

More information:

xt-libsci:

Xt-libsci 10.4.2 contains dynamic shared libraries for the Cray compiler (CCE).
This release also contains new dynamic shared libraries for Barcelona,
Istanbul and mc12 hardware.

The multi-threaded libsci implementation has been significantly enhanced
for shared-memory parallel (SMP) programs. The new implementation uses
OpenMP; therefore the previous environment variable GOTO_NUM_THREADS is
no longer used, and the number of threads is set with OMP_NUM_THREADS.
When running with OMP_NUM_THREADS greater than 1, speedups of 2X or more
are common for the multi-threaded Level 2 BLAS routines, and the Level 3
BLAS routines are also significantly faster.
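A minimal sketch of what this change can mean for an existing job script, assuming a POSIX shell: an old GOTO_NUM_THREADS setting is carried over to OMP_NUM_THREADS and then dropped. The default of 4 threads (one per core of a quad-core node) and the aprun line are assumptions to adapt to your own job; aprun's -d option reserves that many cores for each process.

```shell
# Hypothetical job-script fragment for a threaded libsci program on hexagon.
# xt-libsci 10.4.2 ignores GOTO_NUM_THREADS; OMP_NUM_THREADS sets the thread count.

# Carry over an old GOTO_NUM_THREADS setting if OMP_NUM_THREADS is not set yet.
if [ -n "${GOTO_NUM_THREADS:-}" ] && [ -z "${OMP_NUM_THREADS:-}" ]; then
    export OMP_NUM_THREADS="$GOTO_NUM_THREADS"
fi

# Otherwise default to 4 threads (assumption: one thread per core of a quad-core node).
export OMP_NUM_THREADS="${OMP_NUM_THREADS:-4}"

# The old variable no longer has any effect; remove it to avoid confusion.
unset GOTO_NUM_THREADS

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# On the compute nodes, reserve one core per thread for a single process:
# aprun -n 1 -d "$OMP_NUM_THREADS" ./a.out
```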

Loader options for OpenMP support:
To use the OpenMP libraries, link with the options shown below. The
examples are for the quad-core Barcelona processor used in hexagon.

module load xtpe-barcelona

PGI:
    cc -mp foo.c *.o -lsci_quadcore_mp
    ftn -mp foo.f90 *.o -lsci_quadcore_mp
GNU:
    cc -fopenmp foo.c *.o -lsci_quadcore_mp
    ftn -fopenmp foo.f90 *.o -lsci_quadcore_mp
INTEL:
    cc -openmp foo.c *.o -lsci_quadcore_mp
    ftn -openmp foo.f90 *.o -lsci_quadcore_mp
PATHSCALE:
    cc -mp foo.c *.o -lsci_quadcore_mp
    ftn -mp foo.f90 *.o -lsci_quadcore_mp
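If you build with several programming environments, the OpenMP flag listed for each compiler can be kept in one place. The function below is a hypothetical helper (not part of any Cray module) that simply encodes the flags from the loader options above; it is a sketch, assuming a POSIX shell.

```shell
# Hypothetical helper: print the OpenMP compiler flag for a programming
# environment, following the loader options listed for PGI, GNU, Intel
# and PathScale.
omp_flag() {
    case "$1" in
        pgi|pathscale) printf '%s\n' "-mp" ;;
        gnu)           printf '%s\n' "-fopenmp" ;;
        intel)         printf '%s\n' "-openmp" ;;
        *)             printf 'omp_flag: unknown compiler %s\n' "$1" >&2
                       return 1 ;;
    esac
}

# Example (after "module load xtpe-barcelona" and the matching PrgEnv module):
# cc  $(omp_flag pgi) foo.c *.o -lsci_quadcore_mp
# ftn $(omp_flag pgi) foo.f90 *.o -lsci_quadcore_mp
```

Unknown compiler names return a non-zero status, so a build script fails early instead of linking without OpenMP.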

Trilinos:

Trilinos is an object-oriented and componentized framework for
scientific computation, and as such allows greater flexibility,
control, portability and performance than a collection of custom
or independent solvers. The CASK library (Cray Adaptive Sparse
Kernels) is integrated with Trilinos to provide extra performance
with no additional effort required from the user. The Cray
Trilinos package therefore enables the full productivity advantages
of the Trilinos framework while providing solvers tuned specifically
to the Cray XT hardware.

Trilinos release 10.0.1 includes improved Cray Adaptive Sparse
Kernels (CASK) routines for sparse matrix-vector multiplication with multiple vectors. Applications that use Epetra should see some performance benefit from this improvement.