Author Archives: lsz075

About lsz075

IT department (IT-avdelingen)

Hexagon cabinets c1 and c8 experienced an Emergency Power Off failure on Dec 2 at 23:41. We are investigating.

Because of which cabinets are involved (and the topology of the interconnect), we cannot simply start the machine without the two cabinets. We are looking into the possibilities.

Update 2011-12-05 12:45: Two cabinets cannot be started because of PDU failures. We have now started the machine without the two cabinets (c6 and c8).

The work file system on the fimm cluster has been taken down due to a misconfiguration of the GPFS file system.

We are working on correcting the configuration and will keep you updated.

10/11/2011: The work file system is back online with more space (3.7 TB).

Update 11/11/2011

Since we added a new disk to the work file system, we are rebalancing data across its disks. This creates load on the GPFS file system on fimm, so file system operations will be slow. We expect the rebalancing to finish during the weekend.

We have updated the following main software on fimm:

PGI/11.8
GCC/4.6.1
intel/12.1.6_233

openmpi/1.4.4 compiled with pgi/11.8 gcc/4.6.1

netcdf/4.1.3 compiled with pgi/11.8 gcc/4.6.1

HDF5/1.8.7 compiled with pgi/11.8 gcc/4.6.1

szip/2.1 compiled with pgi/11.8 gcc/4.6.1 intel/12.1.6_233

zlib/2.3.1 compiled with pgi/11.8 gcc/4.6.1 intel/12.1.6_233

We have also implemented PrgEnv-pgi and PrgEnv-gcc on fimm, which work the same way as on hexagon. Each is a software environment set that helps you load the right set of software.
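As a rough illustration of how the list above maps compilers to builds, the following Python sketch records which listed packages were compiled with which compiler and prints the matching `module load` lines. It only prints text and does not run anything. The function names `modules_for` and `load_commands` are hypothetical, and it is an assumption that the module names on fimm are spelled exactly as in the list and are loaded with `module load`.

```python
# Builds listed in the announcement: module -> compilers it was compiled with.
# Names are copied from the list above; whether `module load` accepts them in
# exactly this form on fimm is an assumption.
BUILDS = {
    "openmpi/1.4.4": {"pgi/11.8", "gcc/4.6.1"},
    "netcdf/4.1.3": {"pgi/11.8", "gcc/4.6.1"},
    "HDF5/1.8.7": {"pgi/11.8", "gcc/4.6.1"},
    "szip/2.1": {"pgi/11.8", "gcc/4.6.1", "intel/12.1.6_233"},
    "zlib/2.3.1": {"pgi/11.8", "gcc/4.6.1", "intel/12.1.6_233"},
}


def modules_for(compiler):
    """Return the listed modules that have a build for `compiler`, sorted."""
    return sorted(name for name, compilers in BUILDS.items() if compiler in compilers)


def load_commands(compiler):
    """Return illustrative `module load` lines for a compiler and its builds."""
    return ["module load " + m for m in [compiler] + modules_for(compiler)]


if __name__ == "__main__":
    # Print the commands for the Intel builds; nothing is executed.
    for line in load_commands("intel/12.1.6_233"):
        print(line)
```

Loading PrgEnv-pgi or PrgEnv-gcc is meant to spare you from assembling such a set by hand.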


We keep the rest of the software updated.

There is an issue with part of the /work filesystem on Hexagon. We are investigating.

Update Tuesday 09:30: Still diagnosing the issue. No estimated fix time as of now.

Update Tuesday 10:00: The machine goes down for maintenance.

Update Tuesday 13:30: Part of the filesystem has been checked with e2fsck.

Update Tuesday 14:00: The machine is up again after maintenance.

Due to a firmware update of the fimm.bccs.uib.no cluster core switches, we will take down both the internal and external core switches for maintenance tomorrow from 13:00 to 15:00. The actual downtime may be shorter than this.

All running jobs will be killed.

We are sorry for the inconvenience and the short notice.

We will keep you updated.

10:30 The fimm login node is blocked.

16:00 Both the internal and external switches have been updated to the new firmware.

17:10 Maintenance is finished. The fimm cluster is operational.