Hexagon crashed/powered off due to a lightning strike and power-spike at 08:10.
Update 10:00: Cooling is also affected.
Update 12:47: Machine is up.
We used the downtime to carry out maintenance that had been planned for a later date.
Hexagon: Updated software/libraries
Hexagon has updated software/libraries. Because programs are statically linked, they must be recompiled to incorporate the fixes.
MPI
xt-mpt 5.2.3 -> 5.3.0
Libraries
TPSL 1.0.0 -> 1.0.01
Compilers and debuggers
PGI 11.4 -> 11.5
Intel 12.0.3.174 -> 12.0.4.191
Totalview 8.9.0 -> 8.9.1
lgdb 1.2 -> 1.3
ATP 1.1.3 -> 1.2.0
NOTES:
xt-mpt
Bug-fixes.
The following features were added to MPT 5.3.0 over MPT 5.2.3:
- Merged in the ANL MPICH2 1.3.1 release which includes numerous bug fixes
- Improved support for MPI thread safety. It is no longer necessary to link
in a separate library for thread multiple support. See the
MPICH_MAX_THREAD_SAFETY env variable in the intro_mpi man page for more info.
- Improve performance of MPI_Init when using static connections.
- Make several additional MPICH_GNI environment variables available to users
and document them in the intro_mpi man page.
TPSL
Bug fixed in TPSL 1.0.01:
772335 massive unnecessary diagnostic output from SuperLU_dist under
petsc/3.1.05
ATP and lgdb
Bugfixes.
Fimm file system crashed
The Fimm internal network crashed yesterday at around 4:00. As a result, all file systems were lost and all running jobs crashed.
We apologize for the inconvenience.
Hexagon: system crash
Hexagon crashed due to a power spike related to a thunderstorm which left cabinets in a fault state.
Update 15:12: System is up after 1 hour downtime.
Fimm maintenance 30th May 2011
Dear Fimm cluster users,
The Fimm cluster will undergo maintenance from 08:00 to 16:00 on Monday, 30th May 2011.
The following work will be performed:
* Add an extra login node for Fimm
* Reinstall cluster compute node
* Cable rearrangement
* Firmware update
During that time you will be able to access the login node to perform basic operations, but you will not be able to submit jobs or check the queue status. Some file systems will be inaccessible or unstable during the maintenance.
The entire cluster is reserved for maintenance. Running jobs that cannot finish before the maintenance starts will be killed, and users will have to resubmit them.
Submitted jobs that cannot finish before the maintenance starts will remain queued and will start running once the maintenance is over.
If you have any further questions, please contact us at
hpc-support@hpc.uib.no
We apologize for the inconvenience.
Support team.
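The queueing rule described in the announcement can be sketched in a few lines of Python. This is only an illustration, not the scheduler's actual logic: `earliest_start` is a hypothetical helper, it assumes a job is started only if its full requested walltime fits before the maintenance window opens, and it ignores queue contention.

```python
from datetime import datetime, timedelta

# Maintenance window taken from the announcement.
MAINT_START = datetime(2011, 5, 30, 8, 0)
MAINT_END = datetime(2011, 5, 30, 16, 0)


def earliest_start(now, walltime):
    """Return the earliest time a job could start (hypothetical helper).

    Assumption: a job only starts if its whole requested walltime fits
    before the maintenance window; otherwise it waits until the window ends.
    """
    if now >= MAINT_END:
        return now  # maintenance is already over
    if now + walltime <= MAINT_START:
        return now  # the job finishes before maintenance begins
    return MAINT_END  # the job is held until maintenance is over


evening = datetime(2011, 5, 29, 20, 0)
print(earliest_start(evening, timedelta(hours=6)))   # fits before 08:00
print(earliest_start(evening, timedelta(hours=24)))  # held until 16:00
```

Requesting a realistic walltime therefore matters: a job that asks for more time than remains before 08:00 would wait in the queue even if it actually needs far less.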
Hexagon: Updated software/libraries
Hexagon has updated software/libraries.
MPT 5.2.3
Fixes:
- Correct handling of SHMEM locks at large core counts.
ATP 1.1.3
Fixes:
- When the Abnormal Termination Processing (ATP) signal handler recognizes that the application is running on a single core, it does not initialize itself, because ATP adds no value to single-core applications. To take advantage of this change, the application must be relinked.
- In previous versions of ATP, a short-running application that did not fail could exit cleanly before ATP had fully started, which left atpFrontend processes hanging around indefinitely.
PGI 11.4.0
Features of PGI 11.4.0 are documented at: http://www.pgroup.com/doc/pgirn114.pdf
Fixes:
- 769940 pgcc-generated code for complex number operations inferior to Intel
GCC 4.5.3
Fixes:
- 768930 MANPATH for 'man gfortran' incorrect or 'old' version
- 769591 GNU OpenMP gfortran 4.5.2 internal compiler error for '!$omp task if(omp_get_num_threads() > 0)' [47886]
- 769876 Modulefile points to wrong location for man pages since v4.5.0
- 770495 PrgEnv-gnu does not set up correct man paths
- 770512 internal compiler error: in build_int_cst_wide, at tree.c:1178
- 771068 gcc module has the wrong manpath
- 771867 Incorrect MANPATH for gnu compiler man pages
Iobuf 2.0.2
IOBUF is an I/O buffering library that can reduce the I/O wait time for programs that read or write large files sequentially. IOBUF intercepts I/O system calls such as read and open and adds a layer of buffering, thus improving program performance by enabling asynchronous prefetching and caching of file data.
Fixes:
- BUG 771578 - iobuf module doesn't trap fwrite and fails when combined with posix
- BUG 772207 - Program not working when iobuf is being used
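The effect of adding a buffering layer can be illustrated with a small Python sketch. It does not use IOBUF or its API: `CountingRaw` and `write_records` are hypothetical names introduced for illustration, and Python's standard `io.BufferedWriter` stands in for the buffering layer. The sketch only shows why grouping many small sequential writes into fewer large ones reduces the number of low-level I/O calls.

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for a file that counts low-level write calls (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1          # one simulated system call per invocation
        self.data += bytes(b)
        return len(b)


def write_records(stream, n_records, record):
    """Write the same record n_records times, then flush."""
    for _ in range(n_records):
        stream.write(record)
    stream.flush()


record = b"x" * 100

# Unbuffered: every small write reaches the underlying "file".
unbuffered = CountingRaw()
write_records(unbuffered, 1000, record)

# Buffered: small writes are collected and passed on in large chunks.
raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
write_records(buffered, 1000, record)

print("unbuffered low-level writes:", unbuffered.calls)
print("buffered low-level writes:  ", raw.calls)
```

Both variants write identical data; only the number of calls that reach the underlying file differs.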
Hexagon: Updated software/libraries
Hexagon has updated software/libraries.
MPI
xt-mpt 5.2.1 -> 5.2.2
Compilers, wrappers and debuggers
PGI 11.2.0 -> 11.3.0
Intel 12.0.2.137 -> 12.0.3.174
Chapel 1.2.1 -> 1.3.0
Totalview 8.8.0 -> 8.9.0
Java 1.6.0-22 -> 1.6.0-24
Libraries
xt-libsci 10.5.01 -> 10.5.02
NOTES:
xt-mpt
Bugs fixed in this release:
767853 Add error message for hitting request limit on Seastar instead
of segfault
xt-libsci
LibSci 10.5.02 includes bugfixes.
Bug 769615 - scalapack routine pdsyev aborts when global matrix > sqrt (2^31)
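To get a feeling for the size at which Bug 769615 applied, the threshold can be worked out numerically. The sketch below assumes that the limit refers to the global matrix dimension N and that the underlying problem was the element count N*N no longer fitting in a signed 32-bit integer; this interpretation is not stated in the release note. `element_count_fits_int32` is a hypothetical helper.

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def element_count_fits_int32(n):
    """True if an n x n matrix has at most INT32_MAX elements (hypothetical helper)."""
    return n * n <= INT32_MAX


# Largest dimension whose element count still fits in 32 bits.
largest_safe_n = math.isqrt(INT32_MAX)

print(largest_safe_n)                                 # 46340
print(element_count_fits_int32(largest_safe_n))       # True
print(element_count_fits_int32(largest_safe_n + 1))   # False
```

Under this assumption, global matrices with a dimension of 46341 or more were affected, which corresponds to roughly 16 GiB of double-precision data.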
Intel
The following bugs are fixed in the Intel 12.0.3.174 release.
767152 ifort OpenMP atomic subtraction produces incorrect answers [611742]
Java
Security update
Fimm queueing system crashed after reboot
The master node for fimm.bccs.uib.no crashed this morning at around 4:00 after running out of memory. After the reboot this morning, the master node, which hosts the queueing system, crashed again.
We are working on it. In the meantime, you cannot submit jobs or monitor anything related to the queueing system.
Update 14:20: The Fimm queueing system is back online. Sorry for the inconvenience.
Hexagon: Updated software/libraries
Hexagon has updated software/libraries. Users should recompile their applications to gain any stability or performance improvements.
MPI
xt-mpt 5.2.0 -> 5.2.1
Compilers, wrappers and debuggers
PGI 11.1.0 -> 11.2.0
xt-asyncpe 4.8 -> 4.9
ATP 1.1.1 -> 1.1.2
Libraries
Petsc 3.1.04 -> 3.1.05
xt-libsci 10.5.0 -> 10.5.01
TPSL 1.0.0
NOTES:
xt-mpt
Bugfixes.
767319, 768355 Fix adi reference counter problem for builtin datatypes
768385 Add better PMI debug for an ALPs get_rank_from_pipe failure
755921 Add PMI feature that gives SPMD support to DMAPP applications
767095 Add PMI feature to deliver rank and PID attribute file for debuggers
xt-libsci
Purpose:
--------
The LibSci 10.5.01 release adds new C++ interfaces for CRAFFT
providing users an option to use CRAFFT Serial and Distributed
Routines in C++ applications. These interfaces are provided via
C++ function overloading, allowing a user to call a simple, generic
CRAFFT routine name and use optional arguments for advanced
functionality if required. CRAFFT offers a simpler interface for
FFT routines to improve application developer productivity. In
some cases the performance of the CRAFFT distributed transforms
is 10-50% better than FFTW2 MPI transforms. Users requiring more
information on usage should see the intro_crafft manpage.
Also included in the LibSci 10.5.01 release are new generic
interfaces for C users of CASE. CASE is a collection of
simplified interfaces into LAPACK and ScaLAPACK style routines
introduced in LibSci 10.5.00 that solve the symmetric eigenproblem.
The generic interfaces are provided via C++ function overloading,
allowing a user to call a simple CASE routine name and use optional
arguments for advanced functionality if required. This simplifies
calling CASE from C, improving application developer productivity.
Refer to the 'intro_case' man page for more information.
Known Problems:
---------------
Bug 768141 slatms code seg faults with gnu and xtpe-mc12 loaded
When using CASE interfaces and compiling with gnu add
"-Wl,-whole-archive -lpthread -Wl,-no-whole-archive" to the
link line of your code. This fixes a known link issue with gcc.
Bug 767985 10.5.0 pgi libsci_quadcore_mp.so: undefined reference
to `pgf90_dealloc03'
The workaround is to use PGI 10.6.0 or higher
PETSc
petsc/3.1.05 module now automatically loads the tpsl module
which includes Sundials 2.4.0, Hypre 2.6.0b, SuperLU 4.0,
SuperLU_DIST 2.3, MUMPS 4.9.2, Parmetis 3.1.1 and Scotch 5.1.
PETSc 3.1.05 is not supported if the TPSL module is unloaded.
The Cray PETSc 3.1.05 is equivalent to the official
patch release of PETSc-3.1-p5 by Argonne National Laboratory.
TPSL
TPSL (Third Party Scientific Libraries) contains a collection of
third party mathematical libraries for use with PETSc and Trilinos.
These libraries increase the flexibility of PETSc and Trilinos by
providing users with multiple options for solving problems in dense
and sparse linear algebra. The tpsl module is automatically loaded
when PETSc or Trilinos is loaded.
The libraries included are Hypre, SuperLU, SuperLU_dist, MUMPS,
ParMETIS, Sundials, and Scotch.
* MUMPS 4.9.2. MUMPS (MUltifrontal Massively Parallel sparse
direct Solver) is a package of parallel, sparse, direct
linear-system solvers based on a multifrontal algorithm. For
further information, see http://graal.ens-lyon.fr/MUMPS/. MUMPS
can now interface with SCOTCH as well.
* SuperLU 4.0. SuperLU is a sequential version of SuperLU_dist
(not included with petsc-complex), and a sequential incomplete
LU preconditioner that can accelerate the convergence of Krylov
subspace iterative solvers. For further information, see
http://crd.lbl.gov/~xiaoye/SuperLU/.
* SuperLU_dist 2.3. SuperLU_dist is a package of parallel,
sparse, direct linear-system solvers (available in Cray LibSci).
For further information, see
http://crd.lbl.gov/~xiaoye/SuperLU/.
* ParMETIS 3.1.1. ParMETIS (Parallel Graph Partitioning and
Fill-reducing Matrix Ordering) is a library of routines that
partition unstructured graphs and meshes and compute fill-
reducing orderings of sparse matrices. For further information,
see
http://glaros.dtc.umn.edu/gkhome/views/metis/.
* HYPRE 2.6.0b. HYPRE is a library of high-performance
preconditioners that use parallel multigrid methods for both
structured and unstructured grid problems (not included with
petsc-complex). For further information, see
http://www.llnl.gov/CASC/linear_solvers/.
* SUNDIALS 2.4.0 (SUite of Nonlinear and DIfferential/ALgebraic
equation Solvers) consists of 5 solvers: CVODE, CVODES, IDA,
IDAS, and KINSOL. In addition, SUNDIALS provides a MATLAB
interface to CVODES, IDAS, and KINSOL that is called sundialsTB.
For further information, see
https://computation.llnl.gov/casc/sundials/main.html.
* Scotch 5.1.1. Scotch is a software package and libraries for
sequential and parallel graph partitioning, static mapping,
sparse matrix block ordering, and sequential mesh and
hypergraph partitioning. For further information, see
http://www.labri.fr/perso/pelegrin/scotch
Documentation:
--------------
http://graal.ens-lyon.fr/MUMPS/
http://crd.lbl.gov/~xiaoye/SuperLU/
http://www.llnl.gov/CASC/linear_solvers/
http://glaros.dtc.umn.edu/gkhome/views/metis/
http://www.labri.fr/perso/pelegrin/scotch/
https://computation.llnl.gov/casc/sundials/main.html
PGI
The trilinos module is not supported with use of PGI/11. This is due to
an incompatibility in c++ code built with previous versions of PGI. For
more information on this incompatibility see the PGI manual at
http://www.pgroup.com/doc/pgiwsrn.pdf. The trilinos module is still
supported with the use of PGI 10.
ATP
- The ATP signal handler (contained in atpSigHandler.o)
is now compiled with position independent code (PIC).
Without this, linking it into shared, dynamic applications
was not possible.
- An ALPS bug can interact with ATP, causing compute nodes
to not free all of their resources during job exit. Such
nodes can become unusable. This is fixed in CLE 3.1 UP02.
However, ATP 1.1.2 has redundantly fixed this situation, for
the benefit of those who have not yet upgraded to CLE 3.1 UP02.
Limited access to the login nodes, March 24 21:00-23:00
The IT department of UIB has planned network maintenance.
Access to the Hexagon and Fimm login nodes may therefore be limited on March 24th between 21:00 and 23:00.