GAMESS has been installed on the Linux cluster::

    /localnet/gamess/rungms
norgrid – SCSI backplane replaced
The norgrid.bccs.no node was having trouble scanning the
SCSI buses. The cause seems to have been a faulty SCSI
backplane. IBM replaced it today, and the problem disappeared.
node14 and node15 rebooted
The networking failed on node14 and node15 of the Linux
cluster. Both nodes were complaining about::

    eth0: card reports no resources.

Not sure whether this is a hardware or software bug, but it
has happened once before. This time we lost 4 of flikka's
jobs. They were most likely working hard against NFS; maybe
this triggered the crash?
Will upgrade all nodes to the latest kernel from Red Hat to
see if this fixes the problem.
Node14 downtime: 20:25 20040123 - 07:45 20040126 = 2 days,
11 hours, 20 minutes
Node15 downtime: 12:26 20040124 - 07:45 20040126 = 1 day,
19 hours, 19 minutes
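
The downtime figures can be cross-checked from the logged
timestamps. The following is only a minimal sketch: the helper
name ``downtime`` and the ``HH:MM YYYYMMDD`` parsing format are
assumptions made for illustration, and both timestamps are
assumed to be in the same local timezone.

```python
from datetime import datetime

# Hypothetical helper, not part of any cluster tooling.
# Assumes stamps look like "20:25 20040123" (HH:MM YYYYMMDD).
STAMP_FORMAT = "%H:%M %Y%m%d"


def downtime(start: str, end: str) -> tuple[int, int, int]:
    """Return the elapsed time between two stamps as (days, hours, minutes)."""
    delta = datetime.strptime(end, STAMP_FORMAT) - datetime.strptime(start, STAMP_FORMAT)
    # timedelta keeps whole days separately from the leftover seconds.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return delta.days, hours, minutes


for node, start, end in [
    ("Node14", "20:25 20040123", "07:45 20040126"),
    ("Node15", "12:26 20040124", "07:45 20040126"),
]:
    days, hours, minutes = downtime(start, end)
    print(f"{node} downtime: {days} d, {hours} h, {minutes} min")
```

Computing the interval from the raw timestamps avoids slips in
manual subtraction across midnight.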
linux kernel upgrade + reboot
The Linux cluster frontend had a kernel upgrade and a quick
reboot today. No jobs were affected.
Downtime: 20040106 09:41-09:44 = 3 minutes.