An important network switch just failed and took down the GPFS filesystems on TRE. We will borrow a new switch from the IT department ASAP.

09:50
New switch in place. Rebooting the nodes to get everything back up.

10:12
Everything on node TRE is up. Rebooting node TO.

10:26
TO is all up. Rebooting node EN.

10:48
All nodes are up. The problems with /migrate and /net/bcmhsm are also resolved.

Total downtime:

09:10-10:48 = 1:38 on en, to, tre and fire.

Fimm was mostly unhurt: only jobs accessing /home/parallab were affected.

Because of major stability problems caused by NFS, the home directories on fimm have been moved from NFS to GPFS. We hope this will fix the performance issues for the home directories and also make fimm more robust. Fimm should no longer depend on external filesystems, and the load on the regatta should no longer affect the fimm cluster.

The new home directory on fimm is under /home/fimm/$department/$username/. This is only accessible on fimm.

The old home directory was /home/parallab/$department/$username/. This will still be the home directory on TRE, but if you only or mainly use FIMM, please move your files from /home/parallab/ to /home/fimm/.
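For users who prefer to script the move, here is a minimal sketch in Python (3.8 or newer) that copies the old home directory into the new one. The function name `copy_home` and the department name in the usage comment are hypothetical placeholders, not part of the site setup. It only copies. Check the result before removing anything from the old location yourself. Run it on fimm, since the new home directory is only accessible there.

```python
import shutil
from pathlib import Path


def copy_home(old_home: Path, new_home: Path) -> None:
    """Copy everything under old_home into new_home.

    Symbolic links are copied as links, file metadata is preserved where
    the filesystem allows it, and the destination may already exist.
    Nothing is deleted from old_home.
    """
    shutil.copytree(old_home, new_home, symlinks=True, dirs_exist_ok=True)


# Example call (assumption: replace "mydept" and "myuser" with your own
# department and username):
#
#   copy_home(Path("/home/parallab/mydept/myuser"),
#             Path("/home/fimm/mydept/myuser"))
```

Copying rather than moving keeps the old files intact on /home/parallab, which remains the home directory on TRE.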

We are very sorry for any inconvenience this sudden change has caused, but the situation on fimm was getting quite bad, and something needed to be done.

TotalView has been upgraded to v6.7.0-1 on FIMM and TRE.

* New Memory Debugging Features

- Heap Debugging Filters
- Export Memory Debugging Information
- Error Event Reporting Controls
- Improved Memory Event Details window
- Graphical Heap Browsing
- Pointer Queries

For more detail, check http://www.etnus.com/TotalView/Latest_Release.html
and http://www.etnus.com/Documentation/rel6/pdf/new_features.pdf

The Bergen Center for Computational Science (BCCS) at the University of Bergen has planned the following two courses on standard techniques for efficient and portable parallel programming.

* MPI programming (distributed memory, message passing)
Date : Tuesday 15 February 9.30am - 3pm
Speaker : Thierry Matthey

* OpenMP programming (shared memory)
Date : Wednesday 16 February, 10am - 3pm
Speaker : Helge Avlesen

Both are 1-day courses and will consist of

* a short introduction to the supercomputer facilities at UiB (morning)

* a 2-3 hour overview of the relevant MPI/OpenMP concepts (morning)

* a hands-on session with exercises (afternoon)

More information about location and registration can be found on

http://www.parallab.uib.no/courses/mpi-openmp

Deadline for registration is Thursday 10 February.

Participation is FREE but registration is necessary.

The regatta node EN crashed around 08:00 this morning. This caused filesystems to hang on fimm, and also made /work disappear from TO and TRE for a couple of hours.

Everything should be back up again now.

Downtime: ~4 hours on EN.

-------------

Update 20050121: The crash was caused by an uncorrectable memory error.

-------------

Update 20050124: IBM wants to replace a 32 GB memory module. They also want to upgrade several firmware components, so we should schedule a stop on all nodes soon.