Planned maintenance on hexagon for a software upgrade will take place on Monday June 16th, starting at 14:00 and expected to last approximately 3 hours.

The Cray software release will be upgraded from 2.0.44 to 2.0.53.
This release has more quad-core optimizations as well as a new version of the MPI library. We therefore recommend that you recompile your programs and libraries after the upgrade. We will notify you when we have re-compiled the libraries/modules installed by us.

Update 16th, 14:40 System taken down.
Update 16th, 19:30 System back online with version 2.0.53 and MPT 3.0

Status of the libraries we are re-compiling:

All compute-node (cnl) software has been re-compiled.
Most login node software has been recompiled, except GNUPLOT.
UPC is not re-compiled yet.

Wednesday, 28th of May 08:00: /home/t1home and /home/uibkvant will be unavailable during a firmware upgrade of those file systems.
This downtime will probably last for a couple of hours.

Update 28th of May:

10:30: File system is now being unmounted on all compute nodes, and will be unavailable until upgrade is complete.
11:30: The firmware upgrade is now started.
12:30: One disk controller failed during the upgrade and has to be replaced. We are expecting the replacement's arrival later today. The downtime will therefore be extended.
16:30: /home/uibkvant is now up again. /home/t1home has to wait for the new controller to arrive.
01:30: Will continue tomorrow.

Update 29th of May:

09:00: We continue the work from yesterday.
18:00: Controllers have been revived. Working on recovering data disks.
00:00: Will continue tomorrow. Seems like no data has to be restored from backup.

Update 30th of May:

09:00: We continue with recovering data disks.
14:00: Running file system checks to verify data. If data verification is successful, /home/t1home will be up again soon.
14:50: Verification was successful. /home/t1home is now mounted on all compute nodes.

On Friday 9th of May the backup system will be unavailable for a short time because of an upgrade of our system. File systems like /migrate and /bcmhsm will be unavailable during this upgrade, which will start at 12:00 and is expected to finish at 15:00.

Update: 15:30: Upgrade is finished.

Hexagon has now activated the accounting/allocation manager "Gold". This means that every job needs a valid cpu-account with enough cpu-hours to run the submitted job.
At the time of job-submission a "credit-check" against the specified cpu-account is done and the requested cpu-hours are reserved until the job ends - at which time the actual amount of cpu-hours used is subtracted from the account. The number of cpu-hours reserved and subtracted is calculated as follows:

cpuhours = 4 * blocked nodes * wallclock time

For reservations, "wallclock time" is the "wallclock" parameter specified in the PBS script or on the command line (or the default of 1 hour).
For the account subtraction at job end, "wallclock time" is the wallclock time actually used (start-time -> end-time).
The number "4" comes from 4 cores per node. A node is considered blocked if one or more cores on the node are reserved for the user, since only one job can run on a node at any time.

This means that for a job with e.g. mppnppn=1 and mppwidth=12 running for 1 hour, the cpu-hour usage will be calculated as:

4 * 12 * 1 = 48 cpuhours

whereas a job with mppnppn=4 (the default) and mppwidth=12 running for 1 hour will have its cpu-hour usage calculated as:

4 * 3 * 1 = 12 cpuhours
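
The two calculations above can be reproduced with a short Python sketch. This is only an illustration of the formula in this note, not the code Gold itself runs. The function names are made up for the example. Taking the number of blocked nodes as mppwidth divided by mppnppn, rounded up, is an assumption that matches both examples above.

```python
import math

CORES_PER_NODE = 4  # from this note: 4 cores per node on hexagon


def blocked_nodes(mppwidth, mppnppn=4):
    """Nodes blocked by a job (assumption: mppwidth / mppnppn, rounded up)."""
    return math.ceil(mppwidth / mppnppn)


def cpu_hours(mppwidth, mppnppn=4, wallclock_hours=1):
    """cpuhours = 4 * blocked nodes * wallclock time, as stated in this note."""
    return CORES_PER_NODE * blocked_nodes(mppwidth, mppnppn) * wallclock_hours


# The two examples from this note, each running for 1 hour:
print(cpu_hours(12, mppnppn=1, wallclock_hours=1))  # 48
print(cpu_hours(12, mppnppn=4, wallclock_hours=1))  # 12
```

For the reservation, pass the requested wallclock time. For the final subtraction, pass the wallclock time actually used.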

If your job fails to start, you should use the command:

checkjob -v jobnumber

where jobnumber is the PBS jobnumber given to you upon job-submission. If the command returns "Cannot debit account", check that your job specifies the correct "-A mycpuaccount" and that the account has enough credits to reserve and run the job.
You can check the names and balances of your available cpu-accounts with the "cost" command.

Note also that the Moab scheduler has been updated. Users who are currently logged in need to do a "module swap moab/5.2.1 moab/5.2.2", or log out and in again, so that the moab client commands use the correct version.

The previously postponed maintenance of hexagon (http://www.parallaw.uib.no/syslog/154) is now scheduled for Thursday April 24th from 14:00 to approximately 18:00.

This note will be updated as we learn more about the maintenance.

Update, Thursday 24th:

14:10: System is taken down for diagnostics and init change.
14:30: Hardware work begins.
16:00: Hardware work ends.
17:10: System is up and running.