Configuration changes

On Thursday Oct. 16th at 09:00, /migrate and /bcmhsm will be mounted read-only as /migrate-old and /bcmhsm-old on both fimm and hexagon.

New, writeable versions will be mounted as /migrate and /bcmhsm on fimm and hexagon. They will be EMPTY except for directories. We will then start moving the files of every user from the old file system to the new one. This will be done one user at a time. Each user will be informed individually when all of his/her files have been moved to the new file system.

We would appreciate it if users of /migrate and /bcmhsm would clean up their directories before we mount them read-only. This will reduce the amount of data to move and hence speed up the time-consuming moving process.
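
As an aid to the clean-up, the following is a minimal Python sketch that lists the largest files under a directory. It is only an illustration and not a tool provided on fimm or hexagon; the function name largest_files and the example path are assumptions.

```python
import os


def largest_files(root, top=10):
    """Return up to `top` (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads metadata only and does not follow symlinks
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    sizes.sort(reverse=True)
    return sizes[:top]


# Example use (the path is a placeholder, replace it with your own directory):
# for size, path in largest_files("/migrate/myuser"):
#     print(size, path)
```

The sketch only reports files; deciding what to delete is left to the user.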

We will keep /migrate-old and /bcmhsm-old for some time after the move has been completed. We will announce the date for their removal later.

If you have any questions regarding this, please send your request to support-uib@notur.no

Update Oct. 16th 09:30: The old /migrate and /bcmhsm are now mounted read-only under /migrate-old and /bcmhsm-old, while the new empty versions are mounted under /migrate and /bcmhsm. We will now start moving data from the old file systems to the new ones.

Hexagon has now activated the accounting/allocation manager "Gold". This means that all jobs need a valid cpu-account with enough cpu-hours to run the submitted job.
At job submission a "credit check" is done against the specified cpu-account, and the requested cpu-hours are reserved until the job ends. At that time the cpu-hours actually used are subtracted from the account. The number of cpu-hours reserved and subtracted is calculated as follows:

cpuhours = 4 * blocked nodes * wallclock time

For reservations, "wallclock time" is the "wallclock" parameter specified in the PBS script or on the command line (or the default of 1 hour).
For the subtraction at job end, "wallclock time" is the wallclock time actually used (start-time -> end-time).
The number "4" comes from 4 cores per node. A node is considered blocked if one or more cores on the node are reserved for the user, since only one job can run on a node at any time.

This means that for a job with e.g. mppnppn=1 and mppwidth=12 running for 1 hour, the cpu-hour usage will be calculated as:

4 * 12 * 1 = 48 cpuhours

whereas a job with mppnppn=4 (the default) and mppwidth=12 for 1 hour will have cpu-hour usage calculated as:

4 * 3 * 1 = 12 cpuhours
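
The calculation above can be sketched in Python as follows. This is only an illustration, not part of Gold or Moab. It assumes that the number of blocked nodes is mppwidth divided by mppnppn, rounded up, which is consistent with the two examples above. The function names are made up for this sketch.

```python
CORES_PER_NODE = 4  # from the text: 4 cores per node


def blocked_nodes(mppwidth, mppnppn=4):
    """Nodes blocked by a job, assuming ceil(mppwidth / mppnppn)."""
    return (mppwidth + mppnppn - 1) // mppnppn


def cpu_hours(mppwidth, wallclock_hours, mppnppn=4):
    """cpuhours = 4 * blocked nodes * wallclock time."""
    return CORES_PER_NODE * blocked_nodes(mppwidth, mppnppn) * wallclock_hours


def can_reserve(balance, mppwidth, wallclock_hours, mppnppn=4):
    """Hypothetical credit check: is the balance enough for the reservation?"""
    return balance >= cpu_hours(mppwidth, wallclock_hours, mppnppn)


print(cpu_hours(12, 1, mppnppn=1))  # 48
print(cpu_hours(12, 1, mppnppn=4))  # 12
```

Note that the reservation uses the requested wallclock time, while the final subtraction uses the time actually used, so a job that finishes early is charged less than was reserved.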

If your job fails to start, you should use the command:

checkjob -v jobnumber

where jobnumber is the PBS jobnumber given to you upon job-submission. If the command returns "Cannot debit account" you need to check for correct "-A mycpuaccount" specification for your job as well as enough credits to reserve and run the job.
You can check the names and balance of your available cpu-accounts with the "cost" command.

Note also that the Moab scheduler has been updated. Users currently logged in need to do a "module swap moab/5.2.1 moab/5.2.2", or log out and in again, to have the moab client commands use the correct version.

Early on March 26th hexagon will be shut down for the initial quad-core upgrade. We hope to be able to have parts of the machine up while the second half is upgraded. Nevertheless, the entire machine will be taken down first, before being booted to a smaller size. The physical upgrade will probably take three days. There will then be some more days of tuning and reconfiguring.

One very important part of this is that ALL programs and libraries will have to be re-compiled when hexagon is booted up after the finished upgrade.

Wednesday, 09:00: Upgrade has started. Machine is now down for a while for diagnostics.

Wednesday, 12:30: Half of the machine is now running again, while the other half is being upgraded to quad-core. We expect to take the entire machine down Friday morning. Please consider the machine to be in a testing state, so unannounced downtime might occur.

Wednesday, 16:45: The upgrade is ahead of schedule, therefore the machine will be taken down tomorrow around 10am.

Thursday, 12:00: Two racks are now running and will run until tomorrow morning, Friday 28th, when the entire machine will be shut down at 8am. The machine will then stay down until at least Monday.

Friday, 08:00: The hardware part of the upgrade is now finished. The machine is unavailable until the software upgrade, diagnostics and testing have finished.

Saturday, 17:00: Main part of software upgrade is finished. The machine is running, but is unavailable due to testing.

Tuesday, April the 1st, 18:00: Hexagon is now available again, see http://www.parallaw.uib.no/syslog/153 for more details.

The new machine "hexagon.bccs.uib.no", a Cray XT4 MPP, is now available for users. The machine was installed at the beginning of January this year and has had a few test users until now. The machine will be upgraded from its current dual-core configuration to the final quad-core configuration later this spring. After the upgrade the formal acceptance test will be executed.

More information about the machine can be found on the documentation pages.

The machine is available for "local" users from the University of Bergen, IMR and NERSC. These users can apply for an account by using the form found here

In addition, users that have acquired quota from the NOTUR consortium will have access to the NOTUR part of the machine.

Any questions regarding support, the documentation or other matters related to this can be sent to hpc-support@hpc.uib.no or the alias support-uib@notur.no

The scheduled maintenance of the fimm cluster is now (mostly) complete. Please note the following changes:

- Cluster is now running Rocks 4.3 which is based on CentOS 4.5
- Login to fimm.bccs.uib.no now ends up on one of the compute nodes acting as a login node. Currently this is called compute-1-14.
- Compilers are upgraded to Intel 10.0 and PGI 7.0
- Totalview is upgraded to 8.2
- MPI libraries are upgraded and located in /local
- Several libraries and programs in /local are upgraded

All jobs that were waiting on the old queue need to be submitted again into the new queue after the upgrade.

Send questions to support-uib@notur.no

The IBM p690 Regatta tre.bccs.uib.no / tre.ii.uib.no will be shut down and decommissioned on the morning of Monday October 1st 2007 at 08:00.

All jobs must be finished, and all data and personal files must be copied off the machine before this time. The only exception is data on external disks such as /migrate, /net/bcmhsm and /net/bjerknes*.

Any questions regarding this can be sent to support-uib@notur.no.