Several libraries have been updated on hexagon. For maximum performance and stability users are encouraged to log out and in again and then recompile their programs and libraries.

Updated:
xt-asyncpe 3.0
xt-libsci 10.3.6
MPT 3.3.0
libfast_mv 1.0.4
PETSc 3.0.0.3

* xt-asyncpe 3.0
Bug fix and support in the compiler driver scripts for the Intel
compiler. No Catamount support, CLE CNL only.
* xt-libsci 10.3.6
Bug fixes.
* MPT 3.3.0
Bug fixes; improved MPI-IO is now enabled by default.
* libfast_mv 1.0.4
Bug fixes and minor features.
* PETSc 3.0.0.3
Bug fixes and Istanbul support.

Fimm will be upgraded from Rocks 4.3 to Rocks 5.1 on June 9th at 08:00. This includes an upgrade from CentOS 4.6 to CentOS 5.3. All programs should be recompiled after the upgrade. Jobs currently in the idle queue that cannot finish before the upgrade will not start. They will probably have to be resubmitted after the upgrade.

Update: We started the fimm update at 09:00. The expected downtime is until tomorrow at 22:00.

Update: 9th June 23:05 The fimm login node is available for login. Fimm is not ready to receive jobs yet. We will continue tomorrow.

Update: 10th June 06:40 All I/O nodes are updated. We are currently working on the queue migration from the old cluster. The update is expected to be complete before 12:00 today.

Update: 11th June 14:00 We are still working on a file system issue. The new estimated completion time for the fimm update is tonight at 22:00. Sorry for the inconvenience.

Update: 11th June 14:45 The fimm login node and all file systems are available again. We are working on the queuing system.

Update: 11th June 21:30 The fimm login node and file systems are up. We are currently running some online file system checks and working on getting the queuing system to work correctly. We are sorry for the inconvenience this may cause you.

Update: 12th June 17:00 Fimm has been updated to the final CentOS 5.3 release. The queuing system is starting to receive jobs, and we are currently installing new software. All user programs and code have to be recompiled.
Some user-specific software will be installed on request. To request a new software installation, send an email to support-uib@notur.no.

We have improved the module function on fimm. When you load another software module, it now automatically selects the version that matches the base compiler already loaded in your environment.

We are installing the missing software and will complete this as soon as possible. We are sorry for the inconvenience and appreciate your patience during the software installation.

If you encounter any problems, please contact us by email at support-uib@notur.no.

The Hexagon software upgrade is planned for 26 May at 08:30. The system will be upgraded from UNICOS/lc 2.0 to CLE 2.1UP02. This is a major software upgrade and will take from several hours to several days. We will do our best to minimize the downtime.
The Lustre file system will be upgraded from version 1.4 to 1.6, which requires a check of the /work filesystem lasting several hours. We kindly ask hexagon users to remove all unused files from the /work filesystem, as this will shorten the downtime.
ALL programs which are going to be used after the software upgrade MUST be recompiled! This is very important! Running applications compiled for the current OS release (2.0) can produce unexpected results after the upgrade.

Update: The upgrade time has been moved to 26.05.2009 08:30. Hexagon is therefore reserved from 08:30 on the 26th of May. Long jobs which are not able to complete before the downtime will not start.

Update: May 27th 00:00: The upgrade will continue tomorrow. The machine will be unavailable until the upgrade is finished.

Update: May 27th 16:00: We have started recompiling software on Hexagon.

Update: May 27th 21:30 Software upgrade finished. Hexagon is back online.

As mentioned before, this is a MAJOR software upgrade. Hexagon is now running CLE 2.1UP02 with the Lustre 1.6 filesystem.

Notes:

* All programs MUST be RECOMPILED!

* The following programs/modules were removed as they are no longer supported:
gmalloc
gnet
iobuf
libscifft
openGLUT

* The following software was replaced by Cray versions:
all hdf5:
hdf5
hdf5-parallel
all netcdf:
netCDF (for version 3.6.2)
netcdf (for version 4.0.1)
java/jdk 1.6.0

* The following software will be recompiled shortly:
amber
antlr
berkley-upc
cdo
coreutils
git
gnuplot
grads
grib_api
gsl
ncl_ncarg
nco
ncview
nedit
nwchem
imagemagick
jasper
libdap
libnc-dap
matlab
pgplot
python (static)
subversion
vim 7.2 or newer
WPS
WRF

* Libraries such as:
zlib
libxml2
libpng
glib2.4.2
are available by default without modules

* Module names: the program name and version number structure has changed to:
%ProgramName/%Version
e.g. nwchem-cnl/5.1.1
netCDF/3.6.2
When loading modules, users are advised to give only the program name wherever possible; the optimal version is then loaded by default:
module load nwchem

* Please update your PBS scripts and your environment to load the correct modules.
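
The naming scheme in the notes above can be illustrated with a small shell sketch. The helpers `module_name` and `module_version` are hypothetical and not part of the modules package; they only split a `%ProgramName/%Version` string, so that a versioned entry in an older script can be reduced to the program name alone.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the modules package) that split a
# module specification of the form ProgramName/Version.

# Print the program name, i.e. everything before the first "/".
module_name() {
    printf '%s\n' "${1%%/*}"
}

# Print the version, i.e. everything after the first "/".
# Prints an empty line when no version is given.
module_version() {
    case "$1" in
        */*) printf '%s\n' "${1#*/}" ;;
        *)   printf '\n' ;;
    esac
}

spec="netCDF/3.6.2"
name=$(module_name "$spec")
version=$(module_version "$spec")

# Show the name-only load command recommended in the notes.
echo "module load $name"
```

In a PBS script, the printed name-only command is what would replace a line that loads a fixed version, leaving the choice of version to the module defaults.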

We have to shut down Hexagon because of a major water leak in our machine room. Shutdown at 14:20, May 8th.

Update 17:20: Due to a clog in the drainage system, sewage has spilled under the computer floor. Because of the danger of short circuits we need to keep the system shut down. Hexagon will probably not be started until Saturday at the earliest. The Fimm room is currently operational, but may be affected if the sewage rises.

Update 19:30: The clog has now been removed. The sewage under the computer floor is currently being cleaned. Due to the humidity in the room, Hexagon will be down until the machine room is dry again.

Update 21:15: The computer room has now been cleaned. Hexagon will remain down until the floor is dry again.

Update May 9th 11:50: Hexagon has now been started again. All jobs that were running at the time of the crash have to be resubmitted.

We are sorry for any inconvenience.

Because of new equipment we need to expand the electric power in our machine room. We have therefore reserved the fimm cluster from 06:00, 14th of May. Long jobs not able to complete before the downtime will not be able to start.

The exact length of the downtime is currently unknown, but it should not exceed half a day. We will provide more information as soon as we know more.

Update May 12th: We also discovered some issues with our file systems. We will therefore use this opportunity to perform complete file system checks. The down time will therefore be longer.

Update May 13th: The power shutdown will be at 10:00 tomorrow. We will make fimm unavailable from 09:00 because of needed upgrades.

Update May 14th 09:30: Today's power shutdown has been postponed until tomorrow at 07:00. We will still use the current downtime to perform some maintenance.

Update May 14th 18:00: The file system checks are still not finished. We will monitor the progress through the evening. Fimm will be unavailable until after the power shutdown tomorrow.

Update May 15th 08:55: Fimm is now available for usage again.

Several libraries have been updated on hexagon. For maximum performance and stability users are encouraged to log out and in again and then recompile their programs and libraries.

MPI
xt-mpt 3.1.2 -> 3.2.0

Libs/math:
xt-libsci 10.3.3 -> 10.3.4
hdf5 1.8.2.1 -> 1.8.2.2
netcdf_hdf5parallell 4.0.0.1 -> 4.0.0.2
netcdf 4.0.0.1 -> 4.0.0.2

Compiler:
xt-asyncpe (wrapper) 2.3 -> 2.4
pgi 8.0.4 -> 8.0.5

NOTES:

xt-mpt

MPI-IO performance improvements for collective buffering on MPI collective
writes.

This optimization is enabled by setting the MPIIO hint romio_cb_write to "enable" and setting the environment variable MPICH_MPIIO_CB_ALIGN to 2. Other values of this environment variable are 0 and 1, where 0 is for the original algorithm in MPT 3.0 and earlier and 1 is for the algorithm introduced in MPT 3.1. The MPICH_MPIIO_CB_ALIGN section of the "mpi" man page gives more details. If you are not already using collective buffering, read the MPICH_MPIIO_HINTS section for more information.
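
As a sketch of the settings described above, the lines below could be placed in a job script or login shell before the MPI program is started. The value given to `MPICH_MPIIO_HINTS` assumes a `pattern:hint=value` syntax in which `*` matches every file; that syntax is an assumption here and should be checked against the MPICH_MPIIO_HINTS section of the "mpi" man page.

```shell
# Select the collective-buffering algorithm described above (value 2).
export MPICH_MPIIO_CB_ALIGN=2

# Enable collective buffering on writes for all files.
# Assumed syntax: <file name pattern>:<hint>=<value>
export MPICH_MPIIO_HINTS="*:romio_cb_write=enable"
```

Setting `MPICH_MPIIO_CB_ALIGN` back to 0 or 1 selects the older algorithms mentioned above, which may be useful when comparing write performance.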

xt-libsci

SuperLU has been removed from the xt-libsci release. It is now released with PETSc 3.0.0.1 and later. The xt-libsci module no longer loads the fftw module by default.

fftw

Performance improvements for some multidimensional r2c/c2r transforms.
The Fortran documentation now recommends against using dfftw_execute because of reported problems with various Fortran compilers; it is better to use dfftw_execute_dft and related routines.

Hexagon went down again at 02:30. We are investigating the problem.

Update: It turns out another rack has some power issues. We have to investigate further before we can turn the machine back on.

Update 15:10, Hexagon is now running again.

Update Tuesday 12:15, Since few jobs are running we will take the system down at 12:30, earlier than planned.

Update 16:45, Hexagon is now running with all racks included.

Hexagon went down at 09:30 due to a power issue on cabinet c4.

We are investigating.

Update 10:00, We are bringing forward some scheduled maintenance work while we wait for the diagnostics.

Update 13:00, hexagon is running again in degraded mode without cabinet c4. We are waiting for a replacement PDU for this cabinet. When we get the part we will need to shut down the machine to include the cabinet again. This will likely happen on Monday April 6th.

Update 16:50, Maintenance work is now scheduled for Tuesday April 7th at 14:00, after which the machine will be rebooted. To be able to run, a job needs to finish (as specified by its walltime) before 14:00 on Tuesday.

Update Tuesday 12:15, Since few jobs are running we will take the system down at 12:30, earlier than planned.

The backup server will be unavailable for an hour today, April 1st, from 12:45 to 13:45, due to a firmware upgrade of the tape robot. The normal non-disruptive procedure did not work, forcing us to do the upgrade during downtime. Tape file systems like /migrate and /bcmhsm will also be unavailable during the upgrade.

We are sorry for the short notice.

Update 13:30, the tape robot and backup server are now up again.