Upgrade

Several libraries have been updated on hexagon.
For maximum performance and stability, users are encouraged to log out and back in, and then recompile their programs and libraries.

Updates:

MPI:
xt-mpt 3.3.0 -> 3.4.0

Libs/math:
xt-libsci 10.3.6 -> 10.3.7
hdf5 1.8.2.3 -> 1.8.3.0
netcdf_hdf5parallell 4.0.0.3 -> 4.0.1.0
netcdf 4.0.0.3 -> 4.0.1.0
petsc 3.0.0.3 -> 3.0.0.4
acml 4.2.0 -> 4.3.0

Compiler:
xt-asyncpe (wrapper) 3.0 -> 3.1
pgi 8.0.6 -> 9.0.1

NOTES:

xt-mpt
Bug fixes related to SHMEM as well as to the Intel compiler.

hdf5/netcdf
New features and bugfixes.

See here for more information:
http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.3-RELEASE.txt

http://www.unidata.ucar.edu/software/netcdf/release-notes-4.0.1.html

petsc
Feature update: CASK 1.1, plus bugfixes.

CASK 1.1 includes new sparse matrix-vector multiplication for
transposed matrices, which improves performance by 5-30% depending
on the nonzero pattern of the matrix.

Support for the Intel compiler.

ACML
New feature release. Note that this release does not support the PathScale compiler.

See ACML site for more information:
http://developer.amd.com/cpu/Libraries/acml/features/pages/default.aspx

PGI
New major release.

See PGI site for more information:
http://www.pgroup.com/doc/pgiwsrn901.pdf

Fimm will be upgraded from Rocks 4.3 to Rocks 5.1 on June 9th at 08:00. This includes an upgrade from CentOS 4.6 to CentOS 5.3. All programs should be recompiled after the upgrade. Jobs currently in the idle queue that will not be able to finish in time will not start; these will most likely have to be resubmitted after the upgrade.

Update: We started the fimm upgrade at 09:00; expected downtime is until tomorrow at 22:00.

Update, 9th June 23:05: The fimm login node is available for login. Fimm is not ready to receive jobs yet. We will continue tomorrow.

Update, 10th June 06:40: All I/O nodes are updated. We are currently working on queue migration from the old cluster; estimated completion of the update is before 12:00 today.

Update, 11th June 14:00: We are still working on a file system issue. The new estimated completion time of the fimm upgrade is tonight at 22:00. Sorry for the inconvenience.

Update, 11th June 14:45: The fimm login node and all file systems are available again. We are working on the queuing system.

Update, 11th June 21:30: The fimm login node and file systems are up. We are currently running some online file system checks and working on getting the queuing system to work correctly. We are sorry for the inconvenience this may cause you.

Update, 12th June 17:00: Fimm has been updated to the final CentOS 5.3 release, the queuing system is starting to receive jobs, and we are currently installing new software. All user programs/code have to be recompiled.
Some user-specific software will be installed on request; you can send an email to support-uib@notur.no to request new software installation.

We have improved the module setup on fimm: based on the compiler module you already have loaded in your environment, loading another software module will automatically select the version built for that compiler.
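For example, a session could look like this (module names and versions are only illustrative; use the modules actually installed on fimm):

module load pgi       # choose the base compiler first
module load netcdf    # the module system then picks the netcdf build matching the loaded compiler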

We are still installing the missing software and will complete it as soon as possible. We are sorry for the inconvenience and appreciate your patience during the software installation.

If you encounter any problems, please contact us by email at support-uib@notur.no.

26 May, 08:30 is the planned time for the Hexagon software upgrade. It will be upgraded from UNICOS/lc 2.0 to CLE 2.1UP02. This is a major software upgrade and will take from several hours to several days. We will do our best to minimize the downtime.
The Lustre file system is going to be upgraded from version 1.4 to 1.6, which will require a check of the /work file system lasting several hours. We kindly ask hexagon users to remove all unused files from the /work file system, as this will help shorten the downtime.
ALL programs which are going to be used after the software upgrade MUST be recompiled! This is very important! Running applications compiled for the current OS release (2.0) can produce unexpected results after the upgrade.

Update: The upgrade time has been moved to 26.05.2009 08:30. Hexagon is therefore reserved from 08:30 on the 26th of May. Long jobs which will not be able to complete before the downtime will not start.

Update: May 27th 00:00: The upgrade will continue tomorrow. The machine will be unavailable until the upgrade is finished.

Update: May 27th 16:00: We have started recompiling software on Hexagon.

Update: May 27th 21:30 Software upgrade finished. Hexagon is back online.

As mentioned before, this is a MAJOR software upgrade. Hexagon is now running CLE 2.1UP02 with the Lustre 1.6 file system.

Notes:

* All programs MUST be RECOMPILED!

* The following programs/modules were removed as they are no longer supported:
gmalloc
gnet
iobuf
libscifft
openGLUT

* The following software was replaced by Cray versions:
all hdf5:
hdf5
hdf5-parallel
all netcdf:
netCDF (for version 3.6.2)
netcdf (for version 4.0.1)
java/jdk 1.6.0

* The following software will be recompiled shortly:
amber
antlr
berkley-upc
cdo
coreutils
git
gnuplot
grads
grib_api
gsl
ncl_ncarg
nco
ncview
nedit
nwchem
imagemagick
jasper
libdap
libnc-dap
matlab
pgplot
python (static)
subversion
vim 7.2 or newer
WPS
WRF

* The following libraries are available by default, without loading a module:
zlib
libxml2
libpng
glib2.4.2

* Module names: the program name and version number structure has changed to:
%ProgramName/%Version
e.g. nwchem-cnl/5.1.1
netCDF/3.6.2
When loading modules, users are advised to use only the program name whenever possible; the optimal version will then be loaded by default:
module load nwchem

* Please update your PBS scripts as well as your environment to load the correct modules; see the sketch below.
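A minimal sketch of a PBS job script using the new module naming (the job name, walltime, core counts and program name are illustrative, not an official template; if the module command is not available in batch, source the modules init script for your shell first):

#!/bin/bash
#PBS -N myjob
#PBS -l walltime=01:00:00
#PBS -l mppwidth=16
cd $PBS_O_WORKDIR
module load nwchem          # load by program name only; the default version is selected
aprun -n 16 ./myprogram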

Several libraries have been updated on hexagon. For maximum performance and stability, users are encouraged to log out and back in, and then recompile their programs and libraries.

MPI
xt-mpt 3.1.2 -> 3.2.0

Libs/math:
xt-libsci 10.3.3 -> 10.3.4
hdf5 1.8.2.1 -> 1.8.2.2
netcdf_hdf5parallell 4.0.0.1 -> 4.0.0.2
netcdf 4.0.0.1 -> 4.0.0.2

Compiler:
xt-asyncpe (wrapper) 2.3 -> 2.4
pgi 8.0.4 -> 8.0.5

NOTES:

xt-mpt

MPI-IO performance improvements for collective buffering on MPI collective
writes.

This optimization is enabled by setting the MPIIO hint romio_cb_write to "enable" and setting the environment variable MPICH_MPIIO_CB_ALIGN to 2. Other values of this environment variable are 0 and 1, where 0 is for the original algorithm in MPT 3.0 and earlier and 1 is for the algorithm introduced in MPT 3.1. The MPICH_MPIIO_CB_ALIGN section of the "mpi" man page gives more details. If you are not already using collective buffering, read the MPICH_MPIIO_HINTS section for more information.
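As a hedged shell example (the file name pattern and rank count are illustrative; see the MPICH_MPIIO_HINTS section of the "mpi" man page for the exact hint-string syntax):

export MPICH_MPIIO_CB_ALIGN=2
export MPICH_MPIIO_HINTS="*:romio_cb_write=enable"
aprun -n 64 ./my_mpiio_app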

xt-libsci

SuperLU has been removed from the xt-libsci release; it is now shipped with PETSc 3.0.0.1 and later releases. The xt-libsci module no longer loads the fftw module by default.

fftw

Performance improvements for some multidimensional r2c/c2r transforms.
Fortran documentation now recommends not using dfftw_execute, because of reports of problems with various Fortran compilers; it is better to use dfftw_execute_dft etc.

The backup server will be unavailable for an hour today, April 1st, from 12:45 to 13:45, due to a firmware upgrade of the tape robot. The normal non-disruptive procedure did not work, forcing us to do it during a downtime. Tape file systems like /migrate and /bcmhsm will also be unavailable during the upgrade.

We are sorry for the short notice.

Update 13:30: The tape robot and backup server are now up again.

Several key software and library packages have now been updated on hexagon.
We recommend that you recompile your programs to get the increased performance and fixes that have been introduced. Note that you need to log out and in again to get the new modules loaded by default.

See below for some excerpts from the release notes.

MPI and compiler wrappers:
xt-mpt 3.0.4 -> 3.1.0
xt-asyncpe 1.2 -> 2.0

Math libs (LAPACK, BLAS etc):
xt-libsci 10.3.0 -> 10.3.1

Notes regarding new MPI version from Cray:

This MPT 3.1 version contains the following new features.

* Move from MPICH2 1.0.4p1 to MPICH2 1.0.6p1
* Cpu affinity support
* Raise the maximum number of MPI ranks from 64,000 to 256,000.
* Raise the maximum number of SHMEM PEs from 32,000 to 256,000.
* Automatically-tuned default values for MPICH environment variables
* Dynamic allocation of MPI internal message headers
* Improvements to start-up times when running at high process counts (40K cores or more)
* Significant performance improvements for the MPI_Allgather collective
* Improvements for some error messages
* Wildcard matching for filenames in MPICH_MPIIO_HINTS
* Support for the Cray Compiling Environment (CCE) 7.0 compiler in
x86 ABI compatible mode
* MPI Barrier before collectives
* MPI-IO collective buffering alignment
* MPI Thread Safety
* Improved performance for on-node very large discontiguous messages

More detail for some of these below.

* Move from MPICH2 1.0.4p1 to MPICH2 1.0.6p1
- Performance improvements for derived datatypes (including packing and
communication) through loop-unrolling and buffer alignment.

- Performance improvements for MPI_Gather when non-power-of-two processes are
used, and when a non-zero ranked root is performing the gather.

- MPI_Comm_create now works for intercommunicators.

- Many other bug fixes, memory leak fixes and code cleanup.

- Includes a number of specific fixes from MPICH2 1.0.7 for regressions
introduced in MPICH2 1.0.6p1.


* Automatically-tuned default values for MPICH environment variables

Several of the MPICH environment variable default values are now dependent
on the total number of processes in the job. Previously, these defaults
were set to static values. This feature is designed to allow higher scaling
of MPT jobs with fewer tweaks to environment variables. For more information
on how the new defaults are calculated, please see the "mpi" man page. As
before, the user is able to override any of these defaults by setting the
corresponding environment variable. The new default values are displayed
via the MPICH_ENV_DISPLAY setting.
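To see the defaults actually in effect for a job, the display variable can be set before launching (the rank count is illustrative):

export MPICH_ENV_DISPLAY=1
aprun -n 1024 ./myapp    # rank 0 prints the MPICH environment variable settings at startup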



* Dynamic allocation of MPI internal message headers

If additional message headers are required during program execution, MPI
dynamically allocates more message headers in quantities of MPICH_MSGS_PER_PROC.
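If an application still exhausts its message headers, the allocation quantity can be raised explicitly (the value below is purely illustrative):

export MPICH_MSGS_PER_PROC=65536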


* Significant performance improvements for the MPI_Allgather collective

This change adds in a new MPI_Allgather collective routine which scales well
for small data sizes. The default is to use the new algorithm for any
MPI_Allgather calls with 2048 bytes of data or less. The cutoff value can be
changed by setting the new MPICH_ALLGATHER_VSHORT_MSG environment variable.
In addition, some MPI functions use allgather internally and will now be
significantly faster. For example MPI_Comm_split will be significantly faster
at high PE counts. Initial results show improvements of around 2X at around 16
cores to over 100X above 20K cores.
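For example, to let larger MPI_Allgather messages also use the new algorithm, the cutoff could be raised (the value is illustrative):

export MPICH_ALLGATHER_VSHORT_MSG=4096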


* Improvements for some error messages

This change fixes a small number of messages specific to Cray platforms that
were incorrect due to the merging of the Cray and ANL messages and message
handling processes.


* Wildcard matching for filenames in MPICH_MPIIO_HINTS

Support has been added for wildcard pattern matching for filenames in the
MPICH_MPIIO_HINTS environment variable. This allows easier specification of
hints for multiple files that are opened with MPI_File_open in the program.
The filename pattern matching follows standard shell pattern matching rules for
meta-characters ?, \, [], and *.
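As an illustrative sketch, hints could be restricted to output files matching a pattern (the pattern and hint are examples only):

export MPICH_MPIIO_HINTS="output_*.dat:romio_cb_write=enable"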


* MPI Barrier before collectives

In some situations a Barrier inserted before a collective may improve
performance due to load imbalance. This feature adds support for a new
MPICH_COLL_SYNC environment variable which will cause a Barrier call to
be inserted before all collectives or only certain collectives. See the
"mpi" man page for more information.


* MPI-IO collective buffering alignment

This feature improves MPI-IO by aligning collective buffering file domains
on Lustre boundaries. The new algorithms take into account physical I/O
boundaries and the size of the I/O requests. The intent is to improve
performance by having the I/O requests of each collective buffering node
(aggregator) start and end on physical I/O boundaries and to not have more
than one aggregator reference for any given stripe on a single collective
I/O call. The new algorithms are enabled by setting the MPICH_MPIIO_CB_ALIGN
environment variable but may become the default in a future release.
Initial results have shown as much as a 4X improvement on some benchmarks.
See the "mpi" man page for more information.


* MPI Thread Safety

The MPI Thread Safety feature provides a high-performance implementation
of thread-safety levels MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, and
MPI_THREAD_SERIALIZED in the main MPI library.

The MPI_THREAD_MULTIPLE thread-safety level support is in a separate
"mpich_threadm" library and is not a high-performance implementation.
Use "-lmpich_threadm" when linking to MPI_THREAD_MULTIPLE routines.

Set the MPI Thread Safety MPICH_MAX_THREAD_SAFETY environment variable
to the desired level (MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED,
MPI_THREAD_SERIALIZED, or MPI_THREAD_MULTIPLE), to control the value
returned in the "provided" argument of the MPI_Init_thread() routine.

See the "mpi" man page and the MPI standard for more information.


* Improved performance for on-node very large discontiguous messages

This feature enables a new algorithm for the on-node SMP device to process large
discontiguous messages. The new algorithm allows the use of our on-node
Portals-assisted call that is used in our MPT 3.0 single-copy feature rather
than buffering the data into very small chunks, as was previously done.
Some applications have seen as much as a 3X speedup with discontiguous messages
in excess of 4M bytes.

Several key software and library packages have now been updated on hexagon.
We recommend that you recompile your programs to get the increased performance and fixes that have been introduced. Note that you need to log out and in again to get the new modules loaded by default.

Compiler and MPI:
xt-mpt 3.0.3 -> 3.0.4
pgi 7.2.5 -> 8.0.1

Profiler with supporting libraries:
xt-craypat 4.3.2 -> 4.4.0
apprentice2 4.3.0 -> 4.4.0
xt-papi 3.6.1a -> 3.6.2
dwarf 8.6.0 -> 8.8.0
elf 0.8.9 -> 0.8.10

The libsci library has been updated to version 10.3.0 and includes optimizations and new libraries. Users are encouraged to recompile their applications to benefit from the optimizations and bugfixes.

Description of new features in xt-libsci 10.3.0:

CRAFFT (Cray Adaptive FFT) is a new feature in libsci-10.3.0. CRAFFT uses
offline and online testing information to adaptively select the best FFT
algorithm from the available FFT options. CRAFFT provides a very simple
user interface into advanced FFT functionality and performance. Planning
and execution are combined into one call with CRAFFT. The library comes
packaged with pre-computed plans so that in many cases the planning stage
can be omitted. Please see the manual page intro_crafft for more information.

Usage note: for the most optimal usage of CRAFFT, please copy the file
/opt/xt-libsci/10.3.0/fftw_wisdom into the Lustre directory from which the
executable is run.
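For example (the target directory under /work is illustrative; use the Lustre directory your job actually runs from):

cp /opt/xt-libsci/10.3.0/fftw_wisdom /work/$USER/my_run_dir/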

LibGoto 1.26 includes enhanced BLAS performance. There are several libsci
library variants installed with the libsci-10.3.0 package.

To use threaded BLAS, the thread-enabled libsci library whose name is
suffixed with '_mp' should be linked explicitly

e.g. ftn -o myexec -lsci_quadcore_mp
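A hedged sketch of running with the threaded library (thread and rank counts are illustrative; the threaded BLAS is assumed to pick up the OpenMP thread count, see the intro_libsci man page for the exact controls):

ftn -o myexec myprog.f90 -lsci_quadcore_mp
export OMP_NUM_THREADS=4    # number of BLAS threads per process (assumed)
aprun -n 2 -d 4 ./myexec    # reserve 4 cores per MPI process for the threads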

Dependencies:
=============

Libsci-10.3 now depends on fftw-3.1.1. If you wish to use fftw
version 2.1.5, then do the following:

module swap fftw/3.1.1 fftw/2.1.5.1

Since the last big software update on June 16th several libraries and programs have been updated.

MPT (MPI) 3.0.2
pgi 7.2.3
pathscale 3.2
CrayPat 4.3.1
libfast 1.0 (new library with some optimized math functions)
fftw 2.1.5.1
PAPI 3.6
Totalview 8.4.1b
gcc 4.2.4 (only for login-node programs)
xt-asyncpe 1.0c (new compiler wrappers)
xt-binutils-quadcore 2.0.1 (binutils for AMD quadcore)
Moab 5.2.3 scheduler (remember to log out and in again)

Users will need to log out and in again to get the above as default modules.
Because all applications that run on the compute nodes are statically compiled, we encourage recompiling applications and libraries, especially if you have experienced problems.

On Wednesday July 16th at 08:00, Fimm will be unavailable while the file system and the queuing system are upgraded. This upgrade will most likely last until 17:00.

Please note that a reservation has been set on the system. Jobs must finish before July 16th; if not, they will stay in the queue until the upgrade has been completed.

Update, July 16th 08:00: The upgrade has started. The machine will be unavailable until the upgrade is complete.

Update, July 16th 15:30: Starting to reinstall the compute nodes. Hopefully the upgrade will be completed within a few hours.

Update, July 16th 20:30: Fimm is now available. All global file systems have been upgraded. The queuing system has not been upgraded.