Upgrade


Update 12_11 21:30:

The migration is complete: we have brought the Lustre file system up with the new MDS server. The /shared and /work file systems are mounted on cyclone.hpc.uib.no and grunch.hpc.uib.no. Hexagon is up and running again. Samba and NFS exports are also running on Leo.hpc.uib.no.

Update 12_11 15:00:

The migration is still ongoing; we will keep you posted.

Update 02_11 09:30:

Due to the delayed delivery of physical parts, we have to postpone the downtime to 12th November. The corresponding node reservation on Hexagon is also moved to 12th November.

Thank you for your consideration!

Dear HPC User,

The metadata server for the /shared file system has to be replaced and upgraded, so the file system must be unmounted from all clients.

This will result in scheduled downtime for the Hexagon, Grunch and Cyclone machines. We will start at 08:00 on the 5th of November and expect to be finished by the end of the working day.

Thank you for your consideration!

  • This is a significant upgrade and some binaries are no longer compatible, so please recompile your code before opening a support case.


  • With older compilers, apply the following recipe before compiling:

    module swap cray-mpich cray-mpich2
    module swap cray-libsci cray-libsci/12.2.0
    module load craype-barcelona


  • Only the current version of PGI works without the recipe above.


  • Modules with the xt prefix are deprecated and replaced by modules with the cray prefix.
    For example, xtpe-interlagos is replaced by craype-interlagos.


  • Cray CCE is currently not working because of a missing license. We are in contact with the vendor.
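
The module rename mentioned in the notes above (xt prefix replaced by cray prefix) can be applied to existing job or build scripts with a small filter. This is only a sketch: rename_xt_modules is a hypothetical helper name, and it assumes that every xt-prefixed module follows the same pattern as the one confirmed example, xtpe-interlagos to craype-interlagos. Check the result against the output of "module avail" before relying on it.

```shell
# Rewrite deprecated xt-prefixed module names to the cray prefix.
# Reads a script on stdin and writes the rewritten script to stdout.
# ASSUMPTION: all xt* modules follow the xtpe-interlagos -> craype-interlagos
# pattern; only that single mapping is confirmed by the notes.
rename_xt_modules() {
    # Only touch lines that mention "module". A name is rewritten only if
    # it starts at the beginning of the line or right after whitespace.
    sed -E '/module/ s/(^|[[:space:]])xt([a-z]*-)/\1cray\2/g'
}

# Example: prints "module load craype-interlagos"
printf 'module load xtpe-interlagos\n' | rename_xt_modules
```

To convert a file, redirect it through the function, for example: rename_xt_modules < job.sh > job.new.sh (both file names are placeholders). Writing to a new file keeps the original untouched so the two can be compared.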
    Update 19.03 18:40: The upgrade is finished and the machine is open for SSH access.

    Update, Monday 19th: We are finalizing the upgrade. The machine is up and we expect to allow logins later today. When logging in for the first time, please remember to recompile ALL your applications and libraries so that they are compatible with the new system.


    Hexagon will get a major hardware and software upgrade in the first week of March.

    The current schedule is for the upgrade to start on March 9th 2012 at 8:00 (a delay of 1 week from the initial announcement) and to last for about 1 week.

    NOTE: A reservation is set in the queue system. A job is only allowed to start if its requested walltime lets it finish before the maintenance begins.
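
The reservation rule above is simple arithmetic: the requested walltime must not exceed the time left until the maintenance starts. A minimal sketch of that calculation follows. max_walltime is a hypothetical helper and not part of the queue system, and the commented usage line assumes GNU date for the -d option.

```shell
# Print the longest walltime (HH:MM:SS) that still ends before maintenance.
# Arguments: current time and maintenance start, both in Unix epoch seconds.
max_walltime() {
    now=$1
    start=$2
    left=$(( start - now ))
    # No time left: the job cannot start before the maintenance.
    if [ "$left" -le 0 ]; then
        echo "00:00:00"
        return 0
    fi
    printf '%02d:%02d:%02d\n' \
        $(( left / 3600 )) $(( left % 3600 / 60 )) $(( left % 60 ))
}

# Usage with the start time announced above (GNU date assumed):
#   max_walltime "$(date +%s)" "$(date -d '2012-03-09 08:00' +%s)"
```

Requesting somewhat less than the printed value leaves a margin for time spent waiting in the queue.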

    The upgraded hexagon will have the following specs:
    * Cray XE6m-200
    * 204.9 TFlops peak performance
    * 22272 cores
    * AMD Opteron 6276 (2.3GHz "Interlagos")
    * 1392 CPUs (sockets)
    * 696 nodes
    * 32 cores per node
    * 32GB RAM per node (1GB/core)
    * New interconnect: Cray Gemini
    * New topology: 2.5D Torus
    * OS: Cray Linux Environment, CLE 4.0 (Based on Novell Linux SLES11sp1)
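
The headline figures in the list above follow from the per-node numbers, which can be checked with a few lines of shell. This is a sketch only: the value of 4 double-precision floating-point operations per core per cycle is an assumption that is not stated in the announcement, chosen because it reproduces the quoted peak.

```shell
# Recompute the headline figures from the per-node numbers in the list.
nodes=696
cores_per_node=32
sockets_per_node=2        # 1392 sockets / 696 nodes
clock_ghz=2.3
flops_per_cycle=4         # ASSUMPTION: DP flops per core per cycle

total_cores=$(( nodes * cores_per_node ))
total_sockets=$(( nodes * sockets_per_node ))

# Peak in TFlops = cores * clock in GHz * flops per cycle / 1000
peak_tflops=$(awk -v c="$total_cores" -v f="$clock_ghz" -v n="$flops_per_cycle" \
    'BEGIN { printf "%.1f", c * f * n / 1000 }')

echo "cores=$total_cores sockets=$total_sockets peak=$peak_tflops TFlops"
```

With these inputs the script prints cores=22272 sockets=1392 peak=204.9 TFlops, matching the list.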

    Although the user experience will be very much the same after the upgrade (just newer versions of familiar software and a faster machine), please observe the following critical point:

    IMPORTANT! All applications MUST be recompiled to be compatible with the new and upgraded hexagon.

    Expect the list of software available via "modules" to be short right after the upgrade and to grow over the following weeks. Please be patient while we recompile and install the necessary applications and libraries.

    We remind you that you have to move all files not related to your current runs out of the /work file system. Please see our previous email for details.

    IMPORTANT! The old /work will be available on the new hexagon only up to April 9th. On April 11th it will be completely DESTROYED!
    It is therefore very important that you move your data off hexagon or transfer it to the new file system. After a reformat, the old /work will be mounted again and used as secondary storage.

    You can follow the upgrade at our Syslog:
    http://computing.uni.no/syslog

    Please contact support-uib at notur.no if you have any questions regarding the upgrade.

    Hexagon will have scheduled maintenance from Monday Feb. 8th at 13:00 until approximately late evening on Tuesday Feb. 9th.
    The following operations will be performed during the maintenance slot:
    * Base software upgrade from CLE2.1 to CLE2.2
    * Optimization of /work filesystem metadata
    * Hardware maintenance
    The queue has a reservation in place, so only jobs that can complete before the maintenance (according to their requested walltime) will start.
    This note will be updated when we have more information.

    NB! Users are encouraged to recompile all binaries after the maintenance, because of the new CLE release.

    Update: We will try to start maintenance at 12:30 instead of 13:00 since only a few jobs are running.

    Update: 09/02/2010 21:25 Maintenance finished. Hexagon is online.
    Please remember to recompile all your programs!
    Before contacting support, please make sure that you have recompiled your code; this will speed up the processing of your case.

    Several libraries and compilers have been updated on hexagon.
    For maximum performance and stability, users are encouraged to log out and in again, and then recompile their programs and libraries.

    Updates:
    xt-asyncpe 3.2
    xt-libsci 10.3.8
    petsc 3.0.0.5
    libfast_mv 1.0.5
    mpt 3.4.1
    java-jdk 1.6.0_15
    PGI compiler 9.0.2
    Intel compiler 11.1.046

    Removed:
    xt-libsci 10.3.5
    petsc 3.0.0.3
    libfast_mv 1.0.3
    mpt 3.1.2