Several libraries have been updated on hexagon.

MPI:
xt-mpt 3.1.1 -> 3.1.2

Libs/math:
xt-libsci 10.3.2 -> 10.3.3
petsc 3.0.0 -> 3.0.0.1
hdf5 1.8.2 -> 1.8.2.1
netcdf_hdf5parallell 4.0 -> 4.0.0.1
netcdf 4.0 -> 4.0.0.1

Compiler:
xt-asyncpe 2.1 -> 2.3 (compiler wrapper)
pgi 8.0.3 -> 8.0.4

NOTES:

xt-mpt:

MPI_Reduce has been optimized to be SMP aware, and this optimization is
enabled by default. The SMP-aware algorithm performs significantly better
than the standard algorithm for most message sizes; improvements of over 3x have been observed for messages below 128K bytes. A new environment variable, MPICH_REDUCE_LARGE_MSG, can be used to adjust the cutoff at which this optimization is enabled. See the man page for more information.
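
For example, the cutoff could be adjusted before launching a job as sketched below (the cutoff value, rank count, and executable name are hypothetical; consult the man page for the actual default and units):

  export MPICH_REDUCE_LARGE_MSG=131072   # hypothetical cutoff
  aprun -n 64 ./myapp                    # hypothetical application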

xt-libsci:

- libGoto 1.29 includes moderate performance improvements in BLAS and LAPACK.

- CRAFFT 1.1 (Cray Adaptive FFT) is a productivity enhancement for the efficient use of Fast Fourier Transforms with little programming effort. CRAFFT 1.1 adds single precision support. See intro_crafft for a description of the double precision API; to use the single precision routines, replace the "z" and "d" in the double precision routine names with "c" and "s". E.g. crafft_d2z1d in double precision becomes crafft_s2c1d in single precision.
The fftw/3.2.0 module must be loaded to use CRAFFT 1.1. If the FFTW module is not loaded, the link stage will fail with unresolved references to FFTW routines.
Prior to running a CRAFFT-linked executable, users must copy the correct FFTW wisdom files into their current run directory. The wisdom files are fftw_wisdom-3.2 for double precision and fftw_wisdom_single-3.2 for single precision, and are found in /opt/xt-libsci/10.3.3/
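
As an illustration, a typical setup before running a CRAFFT-linked executable might look like the commands below (the rank count and application name are hypothetical; the module and wisdom file names are those given above):

  module load fftw/3.2.0                      # required by CRAFFT 1.1
  cp /opt/xt-libsci/10.3.3/fftw_wisdom-3.2 .  # double precision wisdom file
  aprun -n 32 ./my_crafft_app                 # hypothetical application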

netcdf_hdf5parallell:

Known problem:
When building with the PathScale compilers, the '-fsecond-underscore' compiler option is required; omitting it will result in a link error.
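
For example, a sketch of a PathScale build might look like this (myprog.f90 is a hypothetical source file; ftn is the compiler wrapper, and the module name is assumed to match the package name above):

  module load netcdf_hdf5parallell
  ftn -fsecond-underscore -o myprog myprog.f90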

Several libraries and programs have been updated on hexagon. Users are encouraged to recompile their programs to get fixes and performance increases. In particular, codes that use MPI_Bcast will see an improvement with the new xt-mpt release; see the notes below.

MPI:
xt-mpt 3.1.0 -> 3.1.1

COMPILER and tools:
pgi 8.0.2 -> 8.0.3
xt-asyncpe 2.0 -> 2.1 (compiler wrapper)
java 1.6.0-7 -> 1.6.0-11

LIBRARIES:
hdf5 and hdf5-parallell 1.6.7a -> 1.8.2
netCDF 3.6.2 -> 4.0
fftw 3.1.1 -> 3.2.0
PETSc 2.3.3a -> 3.0.0
ACML 4.1.0 -> 4.2.0 (previously installed but not listed)
xt-libsci 10.3.1 -> 10.3.2 (previously installed but not listed)
libfast 1.0 -> 1.0.2 (previously installed but not listed)

NEW LIBRARIES:
netcdf-hdf5parallell 4.0 (combined parallel netCDF/HDF5 library)

NEW TOOLS:
xt-lgdb 1.1 (Cray version of gdb to use for MPI debugging on XT)

NOTES FOR XT-MPT:

- MPI_Bcast has been optimized to be SMP aware and this optimization is enabled by default. The performance improvement varies depending on message size and number of ranks but improvements of between 10% and 35% for messages below 128K bytes have been observed.

- Improvements have been made to the MPICH_COLL_OPT_OFF environment variable, allowing a finer-grained switch to enable/disable the optimized collectives.
The user may now:
- Enable all of the optimized collectives (this is the default)
- Disable all of the optimized collectives (export MPICH_COLL_OPT_OFF=0)
- Disable a selected set of the optimized collectives by providing
a comma-separated list of the collective names,
e.g. export MPICH_COLL_OPT_OFF=MPI_Allreduce,MPI_Bcast,MPI_Alltoallv
If a user chooses to disable any Cray-optimized collective, the standard MPICH2 algorithm is used instead.
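
For instance, to compare the Cray-optimized MPI_Bcast against the standard MPICH2 algorithm, one could disable just that collective before launching (the rank count and executable name are hypothetical):

  export MPICH_COLL_OPT_OFF=MPI_Bcast
  aprun -n 64 ./myapp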

Hexagon crashed at 12:15 today. We are working on getting the system up and running again.

Update 13:30: Hexagon is now running again. Most probably the crash was caused by overuse of memory on several login nodes.

All jobs that were running when it crashed have to be resubmitted. We are sorry for the inconvenience.

Due to a double cooling failure (primary plus backup) of the building-provided chilled water supply, we were forced to shut down hexagon because of over-temperature in the room.

Update 17:30: we hope to have the cooling back Thursday morning.

Update Thursday 10:00: we now have partial cooling and have started the machine and allowed logins. Until we know more about when full cooling will be restored, we have a system reservation on all nodes; you can add jobs to the queue, but they will not start until we remove the reservation.

Update Thursday 11:00: we have now restored one of the cooling machines to operation, so we have full cooling again, and the reservation has been removed.

The /work file system on hexagon is hanging; we are doing debug dumps and will restart the system. Existing jobs will have to be resubmitted.

Update 15:00: one of the disk controllers has problems; the downtime will be longer than anticipated. We will update this note when we have more information.

Update 20:00: we will need to wait for support on Monday before continuing the work to fix the controller.

Update Monday Dec 29th, 12:00: we are currently waiting for a new controller.

Update Monday Dec 29th, 17:15: the shipment with the controller is expected to arrive on Wednesday 31st. We are sorry for this delay.

Update Wednesday Dec 31st, 14:50: we have received notice that the expected delivery of the replacement controller has been delayed even further, to Monday Jan. 5th. We are looking into other ways to get the file system working.

Update Thursday Jan 1st, 04:00: the system is running again with a workaround. We will have to reboot the system again when the replacement controller arrives (so long-running jobs will have to be resubmitted).

Update Monday Jan 5th, 13:50: the new controller has now arrived; the replacement is scheduled for Monday the 12th at 13:30.

The /home/fimm file system on the fimm cluster crashed this morning. We are working on solving the problem.

12:48 Update: The file system is up again. All jobs that were running before the file system crash have to be resubmitted. If you have experienced file loss, please contact support-uib@notur.no.

We are sorry for the inconvenience.