The tre node crashed at 15:30, possibly due to disk issues. We are investigating.
Update 16:30: The machine is now up. Jobs on the "tre" node were lost.
Filesystem hang on tre and en
The GPFS filesystem on tre and en stopped working sometime after Friday 16:30. It was restored by restarting GPFS on Saturday 12:00. All jobs were lost.
/migrate (but not Bjerknes bcmhsm) is being moved
Due in part to the recent /work failure on tre, we are moving the /migrate filesystem from "to" to another machine (bregne). /migrate is unavailable while the move is in progress. Do not try to use /migrate on "to" anymore.
The moved migrate will be mounted on tre as /migrate and on fimm as /migrate.
We will update this notice with further progress.
Update Thursday 16:50: The move is not entirely complete, but I have mounted /migrate again on tre and fimm. Expect the move to complete in a couple of hours.
/work filesystem failure on “tre”
The /work filesystem on tre has been down since 07:30, most probably due to disk failure.
We are looking into the issue.
Update (10:55): The /work filesystem is unrecoverable. We are creating a new /work filesystem from the remaining functioning disks. All data on /work is therefore lost.
Update (12:30): All nodes and filesystems are up again.
NB: You must check that you have the necessary folders in /work ready before you submit your jobs.
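A minimal sketch of such a check, which a job script could run before submission. The helper name ensure_dirs and the directory names are hypothetical, and the per-user layout under /work is an assumption, not the site's documented setup.

```python
import os
import tempfile


def ensure_dirs(base, subdirs):
    """Create any missing directories under base; return the paths created."""
    created = []
    for name in subdirs:
        path = os.path.join(base, name)
        if not os.path.isdir(path):
            # exist_ok guards against another process creating it meanwhile
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrated on a temporary directory. On the cluster, base would be
    # your own directory under /work (assumed layout).
    with tempfile.TemporaryDirectory() as base:
        for path in ensure_dirs(base, ["run1/input", "run1/output"]):
            print("created", path)
```

Running the check a second time creates nothing and returns an empty list, so it is safe to call at the top of every job script.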
GPFS filesystem problem on fimm
The GPFS filesystem hung at 01:30 in connection with a power outage, and at 08:00 it was forcibly unmounted and remounted. All running jobs were killed. The filesystem is now OK again.
Matlab upgraded on fimm
Matlab on fimm has been upgraded from R14-sp3 to R2007a.
Use /local/bin/matlab, which now points to the new release.
Questions and problems should be sent to support-uib@notur.no
Maintenance stop of tre (to fix cooling)
We need to replace a fan motor in the cooling unit that cools "tre".
During this maintenance the temperature is expected to rise so much that we will be forced to shut down the regatta nodes.
The maintenance is expected to take place from 08:00 to 10:00 tomorrow, Friday 20.
Update Friday 11:30: The nodes are now up again.
Problems with cooling
The regatta nodes have problems due to failed cooling in the machine room.
We are investigating.
Update 18:35: All nodes are up again.
Problems with HSM/Backup-server
There is a problem with the HSM/Backup-server jambu. /migrate and /net/bcmhsm are down. We are investigating.
Update 15:15: We are waiting on external support to upgrade/fix the firmware on this machine. It is unknown when the machine will be up again, possibly tomorrow.
Update Friday 11:00: We are still waiting for a part for the machine from abroad. It was expected yesterday afternoon but has not yet arrived.
Update Friday 15:00: The transport company used by the vendor now says it cannot deliver the part until Monday. Unfortunately, this means that HSM and backup will be unavailable until later in the day on Monday 16.
Update Monday 14:45: The HSM/backup-server jambu is now up again, and /migrate and /net/bcmhsm work.
New NOTUR cpu-quota
CPU quota for the period 2007-1 has now been activated on tre (and fimm). Send a request to support-uib@notur.no if your quota access is incorrect. Please note that, according to prior agreements, the project nn4648k has had its quota moved from tre to fimm with a conversion factor of 1:1.
Note: the machine "fire" is no longer available in NOTUR. Users on tre should note that the machine is now in lower maintenance mode and is being prepared for removal from the system.