/home/parallab and /work failure, regatta rebooted

An important network switch just failed and took down the GPFS filesystems on TRE. We will borrow a new switch from the IT department ASAP.

09:50
New switch in place. Rebooting the nodes to get everything back up and running.

10:12
Everything on node TRE is up. Rebooting node TO.

10:26
TO is all up. Rebooting node EN.

10:48
All nodes are up. /migrate and /net/bcmhsm are also resolved.

Total downtime:

09:10-10:48 = 1:38 on en, to, tre and fire.
Fimm was mostly unaffected; only jobs accessing /home/parallab were hit.

HPC Syslog