The Maui scheduler now detects when a project has run out
of prioritized CPU quota and moves the project to the FREECPU
QoS, where it gets the lowest priority.
Author Archives: lsz075
UPS battery undervoltage
The UPS failed at some point during the last couple of days.
The display said "ALARM: BATTERY UV", so it looks like the
battery voltage dropped too low (UV = undervoltage) and the
UPS therefore went into bypass mode.
I switched it back to battery-backed power, and it
immediately returned to normal operation.
node14 and node15 rebooted
The networking failed on node14 and node15 of the Linux
cluster. Both nodes were complaining about::
    eth0: card reports no resources.
Not sure if this is a hardware or software bug, but it has
happened once before. This time we lost 4 of flikka's jobs.
They were most likely working hard against NFS, which may
have triggered the crash.
We will upgrade all nodes to the latest kernel from Red Hat
to see if this fixes the problem.
Node14 downtime: 20:25 20040123 - 07:45 20040126 = 2 days,
11 hours, 20 minutes
Node15 downtime: 12:26 20040124 - 07:45 20040126 = 1 day,
19 hours, 19 minutes
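Downtime figures like these are easy to get wrong by hand, so they can be recomputed from the two timestamps. The following is a minimal sketch for bash, assuming GNU date (its -d option is not available in every date implementation); the function name downtime is hypothetical and not part of any cluster tooling.

```shell
#!/usr/bin/env bash
# Sketch: elapsed time between two timestamps, assuming GNU date (-d option).
# "downtime" is a hypothetical helper name chosen for this example.
downtime() {
    local start end secs
    # -u treats both timestamps as UTC, so daylight-saving shifts cannot skew the result
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    secs=$(( end - start ))
    printf '%d days, %d hours, %d minutes\n' \
        $(( secs / 86400 )) $(( secs % 86400 / 3600 )) $(( secs % 3600 / 60 ))
}

downtime "2004-01-23 20:25" "2004-01-26 07:45"   # node14
downtime "2004-01-24 12:26" "2004-01-26 07:45"   # node15
```

The timestamps are passed as "YYYY-MM-DD HH:MM" because that form is parsed unambiguously by GNU date; the compact 20040123 form used in the posts would need to be rewritten first.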
/migrate available again
The /migrate filesystem is now back online.
Downtime: 20040122 08:28-12:30 = 4 hours 2 minutes.
Maintenance stop for /migrate
The /migrate filesystem will be unavailable on Thursday,
January 22, because of maintenance on the tape storage system.
We will unmount /migrate around 08:00 Thursday morning and
bring it back online as soon as we are finished, but it might
take all day. Any processes still accessing /migrate when we
unmount it will be killed.
Java 1.4 installed
Java 1.4 has been installed on the regattas. It is available
in 32-bit and 64-bit versions. To use one of them, set your
path to include /usr/java14/bin or /usr/java14_64/bin,
respectively. For example, for the 64-bit version:
For C shell::
    setenv PATH /usr/java14_64/bin:$PATH
    rehash
For ksh/bash::
    export PATH=/usr/java14_64/bin:$PATH
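If the export line goes into a startup file that is read more than once, the directory ends up in the path several times. A minimal sketch for ksh/bash that guards against this; the function name prepend_path is hypothetical, chosen only for this example.

```shell
# Sketch for ksh/bash: prepend a directory to PATH only if it is not already there.
# "prepend_path" is a hypothetical helper name.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;        # not present, put it first
    esac
    export PATH
}

prepend_path /usr/java14_64/bin      # or /usr/java14/bin for the 32-bit version
```

Afterwards, `java -version` should report the newly selected version.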
Linux kernel upgrade + reboot
The Linux cluster frontend had a kernel upgrade and a quick
reboot today. No jobs were affected.
Downtime: 20040106 09:41-09:44 = 3 minutes.