The Maui batch scheduler was upgraded from version 3.2.6p6-snap.1070912066 to 3.2.6p6-snap.1079990700. This should hopefully fix a memory leak that has forced us to restart the scheduler every second week to keep it from crashing.
AIX
IBM High Performance Computing Toolkit
The IBM High Performance Computing Toolkit has been installed on the regatta.
The IBM High Performance Computing Toolkit is a collection of tools and libraries that make it easy to collect performance data from a program. It provides libraries to gather data from programs parallelized with OpenMP, SHMEM and MPI, and it can also collect data from the hardware performance counters. SiGMA shows memory utilisation at a very fine granularity, and the PeekPerf GUI displays all the collected data in a single window.
The Turbo libraries can improve the communication performance of MPI programs simply by linking against them.
Please see http://tre.ii.uib.no/ihpct_doc/ for documentation on the individual parts of the toolkit, and the /usr/local/ihpct/examples directory for examples.
libgoto.a
The high performance BLAS library from Kazushige Goto has been installed in /usr/local/lib/libgoto.a.
For more information see http://www.cs.utexas.edu/users/flame/goto/.
XL Fortran compiler upgrade
The May 2004 XL Fortran V8.1 Compiler and Runtime PTF was installed.
Power outage – machines down
There was a power outage in the machine room, and since there
was no working UPS, all machines crashed.
Downtime:
20040420 18:47-20:19 = 1 hour, 32 minutes
disk crash on backup server – bcmhsm unavailable
A disk crashed on the backup and file server for /net/bcmhsm,
so the /net/bcmhsm filesystem was unavailable for a couple
of minutes while the server crashed and rebooted.
XLF compiler upgrade
The February 2004 XLF compiler fixes were installed. Bugs fixed:
IY43656 - error msg 1587-113 is not very useful
IY49972 - Incorrect output at -O5
IY50157 - Seg Fault at (WHERE(.NOT.LOG) TAB=TAB2)
IY50896 - -qcheck option causes SIGTRAP
IY51019 - vector routines incorrect result with -qhot
IY51075 - Error in calcStk fails
IY51167 - Free heap overwritten when call to MPI_IRECV
IY51237 - -qsmp produces ICE in search_threadprivate_mbr
IY51264 - ICE: lbound(typ%i) on array of derived type
IY51426 - -qipa results in unresolved symbol dbgincut
IY51436 - ICE when compiling three nested modules.
IY51486 - EOSHIFT() function produces INCORROUT
IY51597 - improve real-to-integer conversion performance
IY51634 - 3 objects allocated with same statement
IY52006 - ICE in XLF V8.1.1 for AIX
IY52183 - Missing procedure list entries
IY52363 - ICE in IPA with -C and -qsmp=omp
IY52827 - Large environment causes compiler crash
IY52928 - -O3 causes wrong line numbers to be created
IY53532 - Feb 2004 XL Fortran V8.1 for AIX Compiler PTF
IY53533 - Feb 2004 XL Fortran V8.1 for AIX Runtime PTF
IY53015 - SMP Runtime Lib 1.3.8 January 2004 PTF
IY53435 - XLOPT 132 February 2004 PTF
Tape robot firmware upgrades
The tape robot and drives received a firmware upgrade to correct
some problems we have had with the tape drives. Access to
/migrate and /net/bcmhsm may have been slow, or may even have
failed temporarily, during the upgrade (14:00-15:00).
/migrate available again
The /migrate filesystem is now back online.
Downtime: 20040122 08:28-12:30 = 4 hours 2 minutes.
Maintenance stop for /migrate
The /migrate filesystem will be unavailable on Thursday, January
22, because of maintenance on the tape storage system.
We will unmount /migrate around 08:00 Thursday morning and
bring it back online as soon as we are finished, but it might
take all day. Any processes accessing /migrate that morning
will be killed.
