The networking failed on node14 and node15 of the linux
cluster. Both nodes were complaining about::
eth0: card reports no resources.
Not sure if this is a hardware or software bug, but it's
happened once before. This time we lost 4 of flikka's jobs. They were most likely working hard against NFS, maybe this triggered the crash?
Will upgrade all nodes to the latest kernel from redhat to
see if this fixes the problem.
Node14 downtime: 20:25 20040123 - 07:45 20040126 = 2 days,
11 hours, 10 minutes
Node15 downtime: 12:26 20040124 - 07:45 20040126 = 1 day,
10 hours, 11 minutes