Author Archives: lsz075

About lsz075

IT department

The /bcmhsm filesystem is experiencing a software problem. To avoid data corruption or unexpected results, we have had to stop it temporarily. The case is under investigation with high priority. Updates will be posted.

NB: If you urgently need files from that filesystem, please send a service request to the support email address with a list of the files you would like restored and the location to restore them to.

Update: 17/12 20:55 filesystem is back online.

The module package on hexagon has been updated to version 3.1.6.5.

New Features:
The "module avail" command is enhanced with the filtering options -U, -L, -T, -P and -D. These options control which entries "avail" lists, by product type. For a complete explanation of their usage and of some configuration options, read "man module".

Bugs fixed in this release:
742630 'SET-ALIAS' MODULEFILE COMMAND DOESN'T WORK
749121 Have PE build and distribute modules as an async product rpm
750364 Suse modules-3.1.6 rpm has a bug
751441 Would like CRAY to repackage these files into a more up-to-date version of the modules
752915 Init files for "modules" need to include setting up its man path
754456 Add options to "module avail" for more productive listings

Due to a hardware update on the fimm login node and master node, there will be a short downtime on the fimm cluster this coming Wednesday, 9th of December. The fimm login node will be unavailable from 13:00 to 16:00. All running jobs that cannot finish before that time will crash and will have to be resubmitted. A reservation has been set on the fimm cluster, so jobs that would not finish before the downtime will not be able to start.

We will keep this information updated.

One of the OST nodes of the /work filesystem has crashed. We are working on it. The /work filesystem is currently unavailable.

Update: 13:12 The OST was recovered; the /work filesystem should be back online.
Update: 25.11 15:44 The same node in the filesystem has crashed again. We are working to fix the filesystem ASAP.
Update: 25.11 16:15 /work is back online. We had to disable quota.
Update: 26.11 5:30 This time another OST crashed. The filesystem is online, and we are investigating the root cause of the OST crashes.

Hi,
Due to a firmware update on the storage system, we have to take down the work file system on fimm.

We will start updating the firmware at 12:00 on Monday (23rd Nov); the update will last 3-4 hours. During that time fimm will be accessible, but without the work file system. All compute nodes are reserved from now on for the update, so jobs that cannot finish before the update will not run.

We will keep this information updated as work progresses.

12:30 UPDATE The work file system has been unmounted from the cluster; preparing for the firmware update.

18:00 UPDATE The firmware update on the storage system failed for some of the disks; we are working on it.

20:45 UPDATE The firmware update has finished. The work file system is mounted on the cluster again.

20:50 UPDATE The reservation has been cancelled; all jobs will start to run.

Hexagon will have a scheduled maintenance on Monday Nov. 23rd from 13:00 to approx. 19:00. Some software updates and hardware replacements will be made. The queue has a reservation in place, so only jobs that can complete before the maintenance (according to their requested walltime) will start.
This note will be updated when we have more information.
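
To check whether a job is likely to start before the maintenance, the reservation rule can be sketched as below. This is only an illustration and not the scheduler's actual implementation: the function name, the year in the example date, and the choice to allow a job that ends exactly at the start of the maintenance are all assumptions.

```python
from datetime import datetime, timedelta


def can_start_before_maintenance(now, walltime, maintenance_start):
    """Return True if a job started at `now` with the requested `walltime`
    would end no later than `maintenance_start`.

    Assumption: a job ending exactly at the maintenance start is allowed;
    the real scheduler may be stricter.
    """
    return now + walltime <= maintenance_start


# Maintenance starts Monday Nov. 23rd at 13:00 (the year is assumed,
# only for illustration).
maintenance_start = datetime(2009, 11, 23, 13, 0)
now = datetime(2009, 11, 23, 9, 0)

# A job asking for 3 hours would end at 12:00, so it may start.
short_job_ok = can_start_before_maintenance(now, timedelta(hours=3), maintenance_start)

# A job asking for 5 hours would end at 14:00, so it is held back.
long_job_ok = can_start_before_maintenance(now, timedelta(hours=5), maintenance_start)
```

In practice this means that asking for a shorter, realistic walltime gives a job a better chance of running before the maintenance window.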

Update: 19:08 Maintenance finished, system is up and open for users.