Updated libraries on hexagon, Dec. 15th

lsz075 • December 15, 2008

Several key software and library packages have now been updated on hexagon.
We recommend that you recompile your programs to get the increased performance and fixes that has been introduced. Note that you need to log out and in again to get the new modules loaded by default.

See below for some excerpts from the release notes.

MPI and compiler wrappers:
xt-mpt 3.0.4 -> 3.1.0
xt-asyncpe 1.2 -> 2.0

Math libs (LAPACK, BLAS etc):
xt-libsci 10.3.0 -> 10.3.1

Notes regarding new MPI version from Cray:

This MPT 3.1 version contains the following new features.

* Move from MPICH2 1.0.4p1 to MPICH2 1.0.6p1
* Cpu affinity support
* Raise the maximum number of MPI ranks from 64,000 to 256,000 ranks.
* Raise the maximum number of SHMEM PEs from 32,000 to 256,000 SHMEM PEs.
* Automatically-tuned default values for MPICH environment variables
* Dynamic allocation of MPI internal message headers
* Improvements to start-up times when running at high process counts(40K
cores or more)
* Significant performance improvements for the MPI_Allgather collective
* Improvements for some error messages
* Wildcard matching for filenames in MPICH_MPIIO_HINTS
* Support for the Cray Compiling Environment (CCE) 7.0 compiler in
x86 ABI compatible mode
* MPI Barrier before collectives
* MPI-IO collective buffering alignment
* MPI Thread Safety
* Improved performance for on-node very large discontiguous messages

More detail for some of these below.

* Move from MPICH2 1.0.4p1 to MPICH2 1.0.6p1
- Performance improvements for derived datatypes (including packing and
communication) through loop-unrolling and buffer alignment.

- Performance improvements for MPI_Gather when non-power-of-two processes are
used, and when a non-zero ranked root is performing the gather.

- MPI_Comm_create now works for intercommunicators.

- Many other bug fixes, memory leak fixes and code cleanup.

- Includes a number of specific fixes from MPICH2 1.0.7 for regressions
introduced in MPICH1 1.0.6p1

* Automatically-tuned default values for MPICH environment variables

Several of the MPICH environment variable default values are now dependent
on the total number of processes in the job. Previously, these defaults
were set to static values. This feature is designed to allow higher scaling
of MPT jobs with fewer tweaks to environment variables. For more information
on how the new defaults are calculated, please see the "mpi" man page. As
before, the user is able to override any of these defaults by setting the
corresponding environment variable. The new default values are displayed
via the MPICH_ENV_DISPLAY setting.

* Dynamic allocation of MPI internal message headers

If additional message headers are required during program execution, MPI
dynamically allocates more message headers in quantities of MPICH_MSGS_PER_PROC.

* Significant performance improvements for the MPI_Allgather collective

This change adds in a new MPI_Allgather collective routine which scales well
for small data sizes. The default is to use the new algorithm for any
MPI_Allgather calls with 2048 bytes of data or less. The cutoff value can be
changed by setting the new MPICH_ALLGATHER_VSHORT_MSG environment variable.
In addition, some MPI functions use allgather internally and will now be
significantly faster. For example MPI_Comm_split will be significantly faster
at high pe counts. Initial results show improvements of around 2X around 16
cores to over 100X above 20K cores.

* Improvements for some error messages

This change fixes a small number of messages specific to Cray platforms that
were incorrect due to the merging of the Cray and ANL messages and message
handling processes.

* Wildcard matching for filenames in MPICH_MPIIO_HINTS

Support has been added for wildcard pattern matching for filenames in the
MPICH_MPIIO_HINTS environment variable. This allows easier specification of
hints for multiple files that are opened with MPI_File_open in the program.
The filename pattern matching follows standard shell pattern matching rules for
meta-characters ?, \, [], and *.

* MPI Barrier before collectives

In some situations a Barrier inserted before a collective may improve
performance due to load imbalance. This feature adds support for a new
MPICH_COLL_SYNC environment variable which will cause a Barrier call to
be inserted before all collectives or only certain collectives. See the
"mpi" man page for more information.

* MPI-IO collective buffering alignment

This feature improves MPI-IO by aligning collective buffering file domains
on Lustre boundaries. The new algorithms take into account physical I/O
boundaries and the size of the I/O requests. The intent is to improve
performance by having the I/O requests of each collective buffering node
(aggregator) start and end on physical I/O boundaries and to not have more
than one aggregator reference for any given stripe on a single collective
I/O call. The new algorithms are enabled by setting the MPICH_MPIIO_CB_ALIGN
environment variable but may become the default in a future release.
Initial results have shown as much as a 4X improvement on some benchmarks.
See the "mpi" man page for more information.

* MPI Thread Safety

The MPI Thread Safety feature provides a high-performance implementation
of thread-safety levels MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, and
MPI_THREAD_SERIALIZE in the main MPI library.

The MPI_THREAD_MULTIPLE thread-safety level support is in a separate
"mpich_threadm" library and is not a high-performance implementation.
Use "-lmpich_threadm" when linking to MPI_THREAD_MULTIPLE routines.

Set the MPI Thread Safety MPICH_MAX_THREAD_SAFETY environment variable
to the desired level (MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED,
MPI_THREAD_SERIALIZED, or MPI_THREAD_MULTIPLE), to control the value
returned in the "provided" argument of the MPI_Init_thread() routine.

See the "mpi" man page and the MPI standard for more information.

* Improved performance for on-node very large discontiguous messages

This feature enables a new algorithm for the on-node SMP device to process large
discontiguous messages. The new algorithm allows the use of our on-node
Portals-assisted call that is used in our MPT 3.0 single-copy feature rather
than buffering the data into very small chunks as was currently being done.
Some applications have seen as much as a 3X speedup with discontiguous messages
in excess of 4M bytes.

Updated software/libraries on hexagon, Dec. 5th

lsz075 • December 5, 2008

Scheduled maintenance for fimm cluster December 15th

lsz075 • December 2, 2008

We will have a scheduled maintenance for fimm cluster on Monday December 15th at 09:00. Estimated downtime is 8 hours. Task is to extend /work and /work2 directories.

Update 15th, 08:15: The login node has been blocked for new connections. It will be made available again as soon as the upgrade has been completed.

Update 16th, 00:10: Fimm is now, after some delay, updated and available for all users. /work and /work2 has been upgraded with more capacity. The queuing system and the scheduler has be upgraded to a newer version. The global file system has also be upgraded together with the latest kernel.
We have also removed the intel compiler from PATH, and replaced it with the pgi compiler. You can however still use the intel compiler, after executing "module swap pgi intel". If you experience any trouble please inform us at support-uib@notur.no.

Disk-controller failure on hexagon, Sat 29. nov

lsz075 • November 29, 2008

One of the disk-controllers for hexagon has failed, forcing us to shutdown the machine. We are investigating possible workarounds.

Update Sat., 23:00, unfortunately no workaround was found, we are waiting for hardware replacement to arrive.

Update Mon., 14:30, we expect new hardware to arrive tomorrow (Tue).

Update Tue., 09:00, we have a workaround in place and have done the scheduled maintenance work that was planned for Thursday. The machine will have to be shutdown again when the replacement disk-controller arrives today, therefore only short jobs will be allowed and users should expect to be logged-out of the login nodes on short notice.

Update Tue., 14:15, controller arrives and we shutdown the machine and replace controller.

Update Tue., 16:00, we are currently running file-system check to be sure that all is OK.

Update Tue., 16:50, machine is running again, thank you for your patience.

Scheduled maintenance for hexagon on Dec. 4th

lsz075 • November 27, 2008

There will be a short scheduled maintenance for hexagon on Thursday December 4th at 13:30. Expected downtime is around 30 minutes. As usual, a reservation will be put in place in the queue-system to avoid having jobs running at the time of maintenance.

Update December 3rd: This maintenance has been canceled since the unexpected downtime over the weekend.

Fimm global file system crash. Nov. 25th

lsz075 • November 25, 2008

At 15:30 global file system on fimm crashed. All file system is down, we are working on solving the problem.

Update 16:55: The file system on fimm is now up again. All running jobs sadly crashed and has to be resubmitted.

We are sorry for the inconvenience.

Maintenance on /migrate and /bcmhsm Nov. 24th

lsz075 • November 24, 2008

We will install a 10Gb card on the server providing /migrate and /bcmhsm today at 14:45. The downtime should be minimal.

Update 15:30: The server is now up again. The server connected to the tape robot is not, so /migrate, /bcmhsm files which are on tape will not be available until this has been solved.

Update 16:00: Tape robot is now available. All systems should now be available.

Scheduled maintenance for hexagon on Nov. 17th

lsz075 • November 11, 2008

We will have a short scheduled maintenance for hexagon on Monday November 17th at 13:30. Estimated downtime is 1 hour. Task is to apply a patch for a memory bug.

Update Monday 17th 12:15, due to empty queue system and an issue with the batch system scheduler we did the restart early, we are sorry for any inconvenience this may have caused.

Update 12:30, machine is now running again.

Hexagon crash on October 21st

lsz075 • October 21, 2008

hexagon got a failed voltage regulator on one of the modules at 16:20, this in turn caused a crash on several of the io-nodes responsible for /work.
We are collecting debug information and rebooting (replacing hardware at next scheduled maintenance).

Update 17:30, hexagon is running again. All jobs must be resubmitted.

Scheduled maintenance on hexagon moved to Nov. 5th.

lsz075 • October 17, 2008

October 27th at 14:00 will hexagon be unavailable due a scheduled maintenance. Some faulty hardware will be replaced and some software will be updated. The maintenance will probably take 5 hours.

Update: The scheduled maintenance have been moved to 10:00 at November the 5th.

Update Nov 5th 10:00: Maintenance is now started.

Update 18:30: We have some problems with the file system /work. We are working on solving this problem.

Update 23:00: Hexagon is now running again.

HPC Syslog

Log over changes and events on UiB's HPC systems