Downtime

A failure on the robot is being fixed. All backup operations (backup and restore), as well as the /migrate filesystem on tre, to and en, are unavailable.

Update, 15:09: Some parts need to be replaced; they should arrive tomorrow.

Update, 2005-07-12 16:54: Replaced a cable in the robot. Backup/restore and /migrate are available again.

The GPFS filesystems are unavailable because of several failed disks. The problem seems to be identical to what happened on March 30.

http://www.parallaw.uib.no/syslog/56

We had installed a firmware fix for this problem, but that fix seems to be incomplete. A newer, more complete fix will be installed ASAP.

Downtime started Monday June 27 00:42:51.

fimm came back online at 11:32:00.
Downtime: 10 hours 50 minutes
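
The downtime figure can be checked from the two timestamps above. This is a minimal sketch, assuming fimm came back on the same day the downtime started (the log does not state the date of the 11:32:00 entry); the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps taken from the log entries above.
# Assumption: both fall on the same calendar day.
start = datetime.strptime("00:42:51", "%H:%M:%S")
end = datetime.strptime("11:32:00", "%H:%M:%S")

# Split the elapsed time into hours, minutes and seconds.
total_seconds = int((end - start).total_seconds())
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"Downtime: {hours} hours {minutes} minutes {seconds} seconds")
```

Under that assumption the exact figure is 10 hours 49 minutes 9 seconds, which rounds to the 10 hours 50 minutes reported.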

Firmware on the SATABlades has been upgraded to 'firmware 9037'. This will hopefully fix the 'failing disks' problem.

Regatta nodes TO and TRE had downtime from 08:00 to 12:45
for a firmware update.

Regatta node EN had downtime from 08:00 to 16:00
for a firmware update and replacement of a 32GB memory module.
This node had problems booting from its root disks after the hardware changes.
Moving the disks to TO and back again made EN bootable (unclear why).

Linux cluster FIRE had downtime from 08:00 to 16:00 due to its dependency on disks on EN.