MOAB has been updated to version 6, with a quite noticeable performance improvement.
Torque has been updated to version 2.4.13; this is mostly bug fixes.
Hexagon: Updated software/libraries
Hexagon has updated software and libraries.
MPI
xt-mpt 5.1.4 -> 5.2.0
Compilers and wrappers
PGI 11.0.0 -> 11.1.0
Intel compiler 12.0.1.107 -> 12.0.2.137
xt-asyncpe 4.7 -> 4.8
Libraries
ATP 1.1.0 -> 1.1.1
STAT 1.1.2 -> 1.1.3
GA 4.3.1 -> 4.3.3
NOTES:
xt-mpt
The following features were added to MPT 5.2.0 over MPT 5.1.4:
- This new version is based on MPICH2 1.3a2 and includes numerous
fixes. It also includes some additional error checking. One class
of these errors involves the use of MPI_IN_PLACE. Some users may
find that using the MPICH_NO_BUFFER_ALIAS_CHECK environment
variable is a useful workaround until their application can be
corrected. See the intro_mpi man page for more information
concerning this environment variable.
In addition the following features were added in MPT updates since
the MPT 5.1.0 release and are also included:
- If a consistent ordering of the reduce operation is needed for
bitwise reproducibility, regardless of system configuration, two
new environment variables have been added to allow that. They are
MPICH_ALLREDUCE_NO_SMP and MPICH_REDUCE_NO_SMP and are described
further in the intro_mpi man page.
- Improvements to MPI-IO collective buffering.
- A check is now made during SHMEM initialization to detect
potential oversubscription of memory, and if detected, the
layout of the memory regions is printed. In addition, a new
environment variable, SHMEM_MEMINFO_DISPLAY, causes SHMEM to
display the layout even if oversubscription is not detected.
See the SHMEM_MEMINFO_DISPLAY environment variable in the
intro_shmem man page for more information.
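The reproducibility note above comes down to floating-point addition not being associative: if partial sums are grouped differently (for example per node), the last bits of the result can change. The following is only a minimal sketch of that effect in plain Python, not of the MPT implementation; the two "layouts" and the function name grouped_sum are hypothetical.

```python
from functools import reduce
from operator import add


def grouped_sum(groups):
    """Reduce each group left to right, then combine the partial sums in order."""
    partials = [reduce(add, group) for group in groups]
    return reduce(add, partials)


# Hypothetical layout A: the first two values are combined first.
layout_a = grouped_sum([[0.1, 0.2], [0.3]])
# Hypothetical layout B: the last two values are combined first.
layout_b = grouped_sum([[0.1], [0.2, 0.3]])

print(layout_a == layout_b)            # False
print(layout_a.hex(), layout_b.hex())  # the two results differ in the last bit
```

Both results are equally valid roundings of the same sum, which is why a fixed reduction order is needed when runs must match bit for bit.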
xt-asyncpe
Purpose:
--------
Bugfix release:
768957 syntax error in /opt/cray/xt-asyncpe/4.7/bin/CC
768968 craype_hugepages2m and craype_hugepages8m are missing
Features:
---------
Support for redesign of petsc and trilinos,
and addition of Third Party Solvers Library product,
(tpsl), which will support both petsc and trilinos.
To see more information and alternatives to using the hugepages
feature, look at the manpage "man intro_hugepages".
ATP
Bugfixes.
STAT
Bugfixes.
Hexagon: crash due to HSN Seastar chip failure
Hexagon had a hardware failure in one of the HSN Seastar chips, which caused the machine to crash.
The machine was back up after 60 minutes, at 14:30.
Hexagon: Updated software/libraries
Hexagon has updated software and libraries. Users should recompile to gain stability and performance improvements.
MPI
xt-mpt 5.1.3 -> 5.1.4
Compilers and wrappers
PGI 10.8 -> 11.0
GCC 4.5.1 -> 4.5.2
GDB 7.2
Intel 12.0.0.084 -> 12.0.1.107
xt-asyncpe 4.6 -> 4.7
Libraries
IOBUF 2.0.1 NEW!
Trilinos 10.6.0 -> 10.6.2.0
NOTES:
IOBUF
IOBUF is an I/O buffering library that can reduce the I/O wait time for
programs that read or write large files sequentially. IOBUF intercepts I/O
system calls such as read and open and adds a layer of buffering, thus
improving program performance by enabling asynchronous prefetching and
caching of file data.
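The effect of such a buffering layer can be illustrated with the Python standard library. This is only a sketch of the general idea (fewer, larger low-level reads for a sequential access pattern), not of IOBUF itself; the class CountingRaw, the function count_low_level_reads, and the sizes used are made up for the illustration.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a file that counts low-level read requests."""

    def __init__(self, data):
        super().__init__()
        self._source = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        # Each call models one read request reaching the file system.
        self.calls += 1
        return self._source.readinto(buffer)


def count_low_level_reads(data, chunk_size, buffer_size=0):
    """Read data sequentially in chunk_size pieces and return the number of
    low-level reads issued. A buffer_size of 0 disables the buffering layer."""
    raw = CountingRaw(data)
    reader = io.BufferedReader(raw, buffer_size=buffer_size) if buffer_size else raw
    while reader.read(chunk_size):
        pass
    return raw.calls


data = bytes(64 * 1024)  # 64 KiB of zero bytes
unbuffered = count_low_level_reads(data, chunk_size=64)
buffered = count_low_level_reads(data, chunk_size=64, buffer_size=8192)
print(unbuffered, buffered)  # the buffered count is far smaller
```

The application still asks for 64 bytes at a time in both cases; only the number of requests that reach the underlying file changes.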
xt-mpt
Bug fixes.
xt-asyncpe
Purpose:
--------
Bug fix release.
These bugs were fixed in this update:
766700 acmlwltime.c:(.text+0x43): undefined reference to
`clock_gettime'
767856 Create a module for Sandy Bridge processors
767863 Provide one or more modules to support and simplify the use of
huge pages
768025 error with PGI compiler and -mp option
768367 add CRAY/lib/72 to Trilinos 10.6.2.0 search path
Features:
---------
Add two modules to support hugepages settings: craype-hugepages2m
craype-hugepages8m.
To see more information and alternatives to using the hugepages
feature, look at the manpage "man intro_hugepages".
Loading either of these modules is sufficient to use the feature.
Trilinos
Purpose:
--------
Trilinos 10.6.2.0 includes several bug fixes and performance
improvements for the GNU, Intel and CCE 7.3 compilers.
Product and OS Dependencies:
----------------------------
xt-asyncpe 4.7 or later
xt-libsci 10.5.1 or later
petsc-3.1.04 or later
PGI 10.0.0 or later
CCE 7.2 or later
Known Problems:
---------------
Bug 765876 - Compilation failure of Trilinos-10.6 testers with PGI
Due to several template handling problems in the PGI C++ compiler, PGI
compiler users might see link-time or run-time errors when using
relatively new capabilities based on C++ templates, such as the Tpetra and
Teuchos packages.
As a workaround, we recommend using another compiler environment
(GNU, Cray or Intel) instead of PGI, or avoiding these new
capabilities.
Bug 768373 - Need dynamic libraries support for Trilinos
Trilinos 10.6.2.0 does not support dynamic libraries; this
support will be available in a future release.
gcc
The following bugs are fixed in the gcc 4.5.2 release.
764595 GNU gfortran OpenMP - internal error - in extract_omp_for_data,
at omp-low.c:335 [46753]
PGI
Features of PGI 11.0.0 are documented at:
http://www.pgroup.com/doc/pgiwsrn110.pdf
The following bugs are fixed in the PGI 11.0.0 release.
728159 PGF90-BUILT EXECUTABLE FAILS ON FORMATTED READ OF INF AND NAN
DATA [TPR 3962]
750947 Internal compiler error for out-of-order declaration information
(PARAMETER statement) [15979]
755877 Upgrade to pgi/9.0.4 from 9.0.3 causes user code to segfault
759473 code segfaults when compiled with higher optimization levels
761988 user code segfaults when compiled with pgi 10.x
762496 PGI pgf90 OpenMP - incorrect output for test that should be
rejected at compile time [17297]
763483 ICE on PGI redimension statement [17177]
766531 PGI bug problem report (TPR#17335) submitted to PGI by User
[17335]
766619 sched.h has no macros to guard against multiple
inclusion[TPR 17432]
767221 -i8 option results in wrong iostat= values [17431]
Fimm: work file system crashed
The work file system crashed due to a disk failure. We are working on it and will keep you updated. All jobs running on the work file system were killed.
Update: 18:00
We are still working on restoring the work file system. There are some issues with the backbone storage system. The estimated downtime is until around lunchtime tomorrow.
Sorry for the inconvenience.
Update: 18:30 21/01/2011
We have to inform you that the work file system on fimm.bccs.uib.no crashed
yesterday (20/01/2011) at 14:50, and we lost all data on it. After the
crash we did our best to rescue the file system, but we were not able to
recover anything.
Since /work is designed as a *temporary* file system to increase the
efficiency of running jobs, it was not backed up. Unfortunately, all your
data on /work is therefore lost.
Sorry for the inconvenience.
We have created a new work file system, and we will create a directory for you
upon request. You can send mail to support-uib@notur.no or contact me
directly by phone, (+47) 55 58 40 43, or by mail.
Saerda Halifu
/bcmhsm and /migrate unavailable, Jan 14th 9:00-16:00
Because the backup server will undergo urgent scheduled maintenance, /migrate and /bcmhsm will be unavailable tomorrow, Friday, January 14th, from 9:00 to 16:00. Restore from backup will also be unavailable during this time slot.
Sorry for the short notice.
Hexagon: scheduled maintenance, Jan. 19th 09:00
Hexagon will have scheduled maintenance starting on Wednesday, January 19th at 09:00. The downtime will last for 2 days, and we expect to have the machine up late in the evening on Thursday, January 20th.
This note will be updated when we have more information.
We will upgrade the machine to the latest Cray software release, fix the cabinet 3 issues and perform other hardware maintenance.
Note that a reservation is set in the queue system. To be allowed to start, a job must have a walltime short enough that it can finish before the maintenance begins.
Update: 20.01.2011 15:15 We finished maintenance. Machine is back online, almost in full configuration.
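As a rough illustration of the walltime note for this maintenance, the longest walltime that still fits before the reservation can be worked out from the expected start time of the job. This is only a sketch: the function name max_walltime, the five-minute safety margin and the example start time are assumptions, and the scheduler makes the actual decision when the job becomes eligible to start.

```python
from datetime import datetime, timedelta


def max_walltime(start_time, maintenance_start, margin=timedelta(minutes=5)):
    """Return the longest walltime (HH:MM:SS) that ends before the reservation."""
    remaining = maintenance_start - start_time - margin
    if remaining <= timedelta(0):
        raise ValueError("no time left before the maintenance window")
    total_seconds = int(remaining.total_seconds())
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


maintenance_start = datetime(2011, 1, 19, 9, 0)  # from the announcement
expected_start = datetime(2011, 1, 18, 15, 30)   # hypothetical job start time
print(max_walltime(expected_start, maintenance_start))  # 17:25:00
```

A job requesting more than this would be held back until after the maintenance.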
Hexagon: crash due to power issue (EPO) on cab 3
Hexagon crashed at 20:20 due to an EPO (emergency power off) problem on cabinet 3. We are running diagnostics and will then restart the machine.
Update 22:50, machine is back up.
Hexagon: Updated software/libraries
Hexagon has updated software/libraries. Users should recompile their applications to gain new features and updates.
MPI
xt-mpt 5.1.2 -> 5.1.3
Compilers and debugging
xt-asyncpe 4.5 -> 4.6
Intel compiler 11.1.073 -> 12.0.0.084
Chapel 1.2.0 -> 1.2.1
ATP 1.0.3 -> 1.1.0
STAT 1.1.1 -> 1.1.2
Performance tools and Math libs
xt-craypat/apprentice2 5.1.2 -> 5.1.3
xt-libsci 10.4.9 -> 10.5.0
NOTES:
xt-mpt
The following features were added to MPT 5.1.3:
- Improvements to MPI-IO collective buffering.
ATP
Abnormal Termination Processing (ATP) is a system that monitors Cray XT
System user applications, and should an application take a system trap,
ATP performs analysis on the dying application. With release 1.1 all of
the stack backtraces of the application processes are gathered into a
merged stack backtrace tree and written to disk as the file
"atpMergedBT.dot". The stack backtrace for the first process to die is
sent to stderr as is the number of the signal that caused the death.
atpMergedBT.dot can be viewed with 'statview', a component of the STAT
package (module load stat). The merged stack backtrace tree provides
a concise, yet comprehensive, view of what the application was doing at
the time of its death.
Further information on ATP can be found in the intro_atp man page.
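The idea of a merged stack backtrace tree can be sketched as a prefix tree in which processes sharing the same call path share the same branch. The code below is only an illustration of that data structure, not ATP's implementation or the atpMergedBT.dot format; the function names, the frame names and the example ranks are hypothetical.

```python
def merge_backtraces(traces):
    """Merge per-rank backtraces (outermost frame first) into one prefix tree.

    Every node records which ranks passed through that frame."""
    root = {"ranks": [], "children": {}}
    for rank, frames in sorted(traces.items()):
        node = root
        node["ranks"].append(rank)
        for frame in frames:
            node = node["children"].setdefault(frame, {"ranks": [], "children": {}})
            node["ranks"].append(rank)
    return root


def render(node, name="<root>", depth=0):
    """Return the tree as indented text lines, one line per frame."""
    lines = [f"{'  ' * depth}{name} ranks={node['ranks']}"]
    for child_name, child in node["children"].items():
        lines.extend(render(child, child_name, depth + 1))
    return lines


# Hypothetical backtraces: ranks 0 and 1 died in the same place, rank 2 elsewhere.
traces = {
    0: ["main", "solve", "MPI_Allreduce"],
    1: ["main", "solve", "MPI_Allreduce"],
    2: ["main", "write_output"],
}
tree = merge_backtraces(traces)
print("\n".join(render(tree)))
```

With thousands of processes, most of them typically collapse into a handful of branches, which is what makes the merged view concise.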
Release notes for release 1.1.0
--------------------------------
1.1.0:
- ATP is now automatically linked in to user applications
and automatically initialized. That is, users do not need
to modify their source code nor their link line (and, in
fact, should not). One must use the Cray compiler drivers
(cc, CC, ftn) to achieve this.
In order for this to occur one must do all of the following:
o have the module atp/1.1.0 or greater loaded (which
is automatically done by the latest PrgEnv modules)
o use the Cray compiler drivers when linking
o relink your application
- It is now necessary to overtly define the environment variable
'ATP_ENABLED' so that the running of an application gets ATP
processing.
- ATP will now perform its analysis in the event of the
queuing system aborting the job due to the wall clock
expiring. Note that the amount of time between when the
queuing system signals that the wall clock has expired
and when the queuing system SIGKILLs the job is
something that sites can customize. If sufficient time
is not configured, this feature may not be able to
complete its task. Thirty seconds is typically more than
generous.
- The environment variable ATP_HOLD_TIME can be used to
define the number of minutes that ATP should hold a dying
application in stasis so that it can be attached via
a debugger.
- ATP is now willing to collect data, even if some nodes
have stopped responding. Since such a system is clearly
sick in some manner, this may not always be successful.
- Fixed a memory corruption bug that could cause various,
unpredictable symptoms.
- ATP no longer needs to be installed on a cross-mounted location.
xt-libsci
The following features were added to LibSci 10.5.0:
* Now includes CASE, a collection of simplified interfaces into high
performance LAPACK and ScaLAPACK style routines that find the eigenvalues
and eigenvectors of a symmetric or Hermitian matrix. CASE is written in
Fortran but has interfaces for C users as well. CASE is provided for
serial or parallel problems, with real or complex data types and single
or double precision. It has generic interfaces callable from Fortran and
specific interfaces callable from Fortran or C. See the 'intro_case' man
page for more information.
The LibSci 10.5.0 release adds new interfaces for CRAFFT, providing users
an option to use CRAFFT Serial and Distributed Routines in C applications.
CRAFFT offers a simpler interface for FFT routines to improve application
developer productivity. In some cases the performance of the CRAFFT
distributed transforms is 10-50% better than FFTW2 MPI transforms. Users
requiring more information on usage should see the intro_crafft man page.
STAT
Eliminates the need to have the STAT daemon installed on the Lustre
file system.
Perftools (craypat/apprentice2)
IMPORTANT NOTE: The perftools modulefile needs to be loaded, otherwise the
following error will occur if a user attempts to load craypat or apprentice2.
ERROR: xt-craypat and apprentice2 have been merged into one module called perftools.
Please run the following to load perftools:
module unload xt-craypat apprentice2
module load perftools
General
* UPC and CAF require CCE version 7.2.7 or later (see bug 763219)
pat_build
* update the following trace groups: adios, dmapp, hdf5, netcdf,
petsc, pgas, pthreads, upc
* allow tracing of functions defined as WEAK (bug 764102)
* remove PAT_BUILD_ADDSYM and PAT_BUILD_TRACE_ARCHIVE environment
variables
* add new directives to control addsym utility features
* improve tracing functions that have aggregates as formal parameters
(bug 764058)
pat_report
* now shows inclusive loop times from CCE -hprofile_generate option
Hexagon: failed seastar in one module
Hexagon had a problem in the high-speed network. We are working to fix the problem. All running jobs failed.
Update: 22:55 Machine is back online.