Update 19.03 18:40: The upgrade is finished and the machine is open for SSH access.

Update, Monday 19th: We are finalizing the upgrade. The machine is up and we expect to allow logins later today. When you log in for the first time, please remember to recompile ALL your applications and libraries so that they are compatible with the new system.


Hexagon will get a major hardware and software upgrade in the first week of March.

The current schedule is for the upgrade to start on March 9th 2012 at 8:00 (a delay of 1 week from the initial announcement) and to last for about 1 week.

NOTE: A reservation is set in the queue system. Jobs will therefore only be allowed to start if their walltime is set so that they can finish before the maintenance begins.
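
As a rough illustration of the reservation rule, the sketch below computes the longest walltime a job could request and still finish before the maintenance starts. The start time is taken from the schedule above; the function names and the safety margin are assumptions made for this example and are not part of the queue system.

```python
from datetime import datetime, timedelta

# Maintenance start from the announced schedule (March 9th 2012, 8:00).
MAINTENANCE_START = datetime(2012, 3, 9, 8, 0)


def max_walltime(now, margin=timedelta(minutes=10)):
    """Return the longest walltime that still ends before the maintenance.

    `margin` is a hypothetical safety buffer, not a documented requirement.
    Returns a zero timedelta if no time is left.
    """
    remaining = MAINTENANCE_START - now - margin
    return max(remaining, timedelta(0))


def format_walltime(delta):
    """Format a timedelta as HH:MM:SS, a common walltime notation."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Largest walltime that fits if the job were to start right now.
    print(format_walltime(max_walltime(datetime.now())))
```

A job requesting more than this value would overlap the reservation and would wait in the queue until after the maintenance.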

The upgraded hexagon will have the following specs:
* Cray XE6m-200
* 204.9 TFlops peak performance
* 22272 cores
* AMD Opteron 6276 (2.3GHz "Interlagos")
* 1392 CPUs (sockets)
* 696 nodes
* 32 cores per node
* 32GB RAM per node (1GB/core)
* New interconnect: Cray Gemini
* New topology: 2.5D Torus
* OS: Cray Linux Environment, CLE 4.0 (based on Novell SLES 11 SP1)
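
The figures in the list above are consistent with each other, which the short check below illustrates. The node, socket, core, clock and memory numbers come from the list; the value of 4 floating-point operations per core per cycle is an assumption chosen because it reproduces the stated peak performance.

```python
# Figures taken from the specification list above.
NODES = 696
SOCKETS = 1392
CORES_PER_NODE = 32
CLOCK_GHZ = 2.3
RAM_PER_NODE_GB = 32

# Assumption: 4 double-precision floating-point operations per core per cycle.
FLOPS_PER_CYCLE = 4

cores = NODES * CORES_PER_NODE
cores_per_socket = cores // SOCKETS
# GHz * flops/cycle gives GFlops per core; divide by 1000 for TFlops.
peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000
ram_per_core_gb = RAM_PER_NODE_GB / CORES_PER_NODE

print(cores)                  # 22272
print(cores_per_socket)       # 16
print(round(peak_tflops, 1))  # 204.9
print(ram_per_core_gb)        # 1.0
```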

Although the user experience will be very much the same after the upgrade (just newer versions of familiar software and a faster machine), please observe the following critical point:

IMPORTANT! All applications MUST be recompiled to be compatible with the new and upgraded hexagon.

You can expect the list of software available via "modules" to be short right after the upgrade and then to grow during the following weeks. Please be patient while we recompile and install the necessary applications and libraries.

We remind you that you have to move all files not related to your current runs out of the /work file system. Please see our previous email for details.

IMPORTANT! The old /work will be available on the new hexagon only until April 9th. On April 11th it will be completely DESTROYED!
It is therefore very important that you move your data off hexagon or transfer it to the new file system. After a reformat, the old /work will be mounted again and used as secondary storage.

You can follow the upgrade at our Syslog:
http://computing.uni.no/syslog

Please contact support-uib at notur.no if you have any questions regarding the upgrade.

Hello,
The Maui job scheduler on fimm is still behaving strangely. Jobs get scheduled to random nodes, which can break jobs already running on those nodes. Please check the results of completed jobs and expect irregular job cancellations over the next few days.

We are working on resolving the problem and will let you know when regular job running conditions are restored.

Hi,

Update: 11:00

The Maui job scheduler on fimm has been taken down due to a problem.
We are working on resolving it and will keep you updated.

Update: 13:20

We have restarted Maui and some other processes. Some of your jobs were killed by the restart; please check your job status and resubmit if necessary.

We are sorry for the inconvenience.

Hexagon cabinets c1 and c8 experienced an Emergency Power Off failure on Dec 2 at 23:41. We are investigating.

Due to the cabinets involved (and the topology of the interconnect), we cannot simply start the machine without the two cabinets. We are looking into the possibilities.

Update: 2011-12-05 12:45 Two cabinets cannot be started because of PDU failures. We have now started the machine without the two cabinets (c6 and c8).

The work file system on the fimm cluster has been taken down due to a misconfiguration of the GPFS file system.

We are working on correcting the configuration and will keep you updated.

10/11/2011 The work file system is back online with more space (3.7TB).

Update 11/11/2011

We are rebalancing data across the disks of the work file system, since we added a new disk to it. This creates load on the GPFS file system on fimm, so file system operations will be slow. We expect the rebalancing to finish during the weekend.