Configuration changes

Removed old netcdf libraries on Hexagon Feb. 11th

lsz075 • February 12, 2009

Old netcdf-pgi-cnl and netcdf-pathscale-cnl modules have been removed from the module list on Hexagon.

Use Cray's version instead, available as the module netCDF, or netcdf-cnl, which we have compiled.

Both netCDF and netcdf-cnl check which PrgEnv you have loaded and load the appropriate version.

netcdf-cnl/4.0 is currently not netcdf4 enabled.

Moving of /migrate and /bcmhsm October 16th

lsz075 • October 9, 2008

On Thursday Oct. 16th at 09:00, /migrate and /bcmhsm will be mounted read-only as /migrate-old and /bcmhsm-old on both fimm and hexagon.

New versions will be mounted as /migrate and /bcmhsm on fimm and hexagon. These will be writeable but EMPTY (except for directories). We will then start moving each user's files from the old file system to the new one. This will be done one user at a time. Each user will be informed individually when all of her/his files have been moved to the new file system.

We would appreciate it if users of /migrate and /bcmhsm would clean up their directories before we mount them read-only. This will reduce the amount of data to move and hence speed up the time-consuming moving process.

We will keep /migrate-old and /bcmhsm-old for some time after the move has been completed. We will announce a date for their removal later.

If you have any questions regarding this, please send them to support-uib@notur.no.

Update Oct. 16th 09:30: The old /migrate and /bcmhsm are now mounted read-only under /migrate-old and /bcmhsm-old, while the new empty versions are mounted under /migrate and /bcmhsm. We will now start moving data from the old file systems to the new ones.

Quota set on /var/spool/torque on hexagon

lsz075 • October 8, 2008

We have now activated quota on /var/spool/torque on hexagon.
The soft limit is 3 GB and the hard limit is 6 GB.

If the output of your job exceeds these limits, the job will stop. In that case you must redirect the output of the job to a file.
For example:
aprun -n 1 ./program > /work/$USER/output.txt

Activated “Gold” accounting system on hexagon

lsz075 • April 26, 2008

Hexagon has now activated the accounting/allocation manager "Gold". This means that every job needs a valid cpu-account with enough cpu-hours to run the submitted job.
At the time of job submission a "credit check" is done against the specified cpu-account, and the requested cpu-hours are reserved until the job ends. At that time the cpu-hours actually used are subtracted from the account. The number of cpu-hours reserved and subtracted is calculated as follows:

cpuhours = 4 * blocked nodes * wallclock time

For reservations, "wallclock time" is the "wallclock" parameter specified in the PBS script or on the command line (or the default of 1 hour).
For the subtraction at job end, "wallclock time" is the wallclock time actually used (from start time to end time).
The number "4" comes from 4 cores per node. A node is considered blocked if one or more of its cores are reserved for the user, since only one job can run on a node at any time.

This means that for a job with e.g. mppnppn=1 and mppwidth=12 running for 1 hour, the cpu-hour usage will be calculated as:

4 * 12 * 1 = 48 cpuhours

whereas a job with mppnppn=4 (the default) and mppwidth=12 for 1 hour will have cpu-hour usage calculated as:

4 * 3 * 1 = 12 cpuhours
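The calculation can be sketched as a small shell function. This is only an illustration, not part of Gold: the function name cpuhours is hypothetical, the number of blocked nodes is assumed to be mppwidth divided by mppnppn and rounded up, and the wallclock time is taken in whole hours to keep the arithmetic integer-only.

```shell
#!/bin/sh
# Estimate the cpu-hours charged for a job, following the formula
#   cpuhours = 4 * blocked nodes * wallclock time
# Assumption: blocked nodes = ceil(mppwidth / mppnppn).
# Assumption: wallclock time is given in whole hours.
cpuhours() {
    mppwidth=$1   # total number of processes in the job
    mppnppn=$2    # processes placed on each node (1 to 4)
    hours=$3      # wallclock time in whole hours

    # Integer ceiling division: nodes needed to hold mppwidth processes.
    nodes=$(( (mppwidth + mppnppn - 1) / mppnppn ))

    # 4 cores are charged for every blocked node.
    echo $(( 4 * nodes * hours ))
}

cpuhours 12 1 1   # one process per node, 12 nodes blocked: prints 48
cpuhours 12 4 1   # four processes per node, 3 nodes blocked: prints 12
```

Passing the requested wallclock time gives an estimate of the reservation, while passing the wallclock time actually used gives an estimate of the final subtraction.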

If your job fails to start, you should use the command:

checkjob -v jobnumber

where jobnumber is the PBS job number given to you upon job submission. If the command returns "Cannot debit account", check that your job has a correct "-A mycpuaccount" specification and that the account has enough credits to reserve and run the job.
You can check the names and balance of your available cpu-accounts with the "cost" command.

Note also that the Moab scheduler has been updated. Users who are currently logged in need to run "module swap moab/5.2.1 moab/5.2.2", or log out and in again, for the moab client commands to use the correct version.

Quad-core upgrade of hexagon

lsz075 • March 18, 2008

Early on March 26th hexagon will be shut down for the initial quad-core upgrade. We hope to be able to have parts of the machine up while the second half is upgraded. It will nevertheless mean that the entire machine will be taken down first, before being booted to a smaller size. The physical upgrade will probably take three days. There will then be some more days of tuning and reconfiguring.

One very important consequence is that ALL programs and libraries will have to be re-compiled when hexagon is booted up after the upgrade is finished.

Wednesday, 09:00: Upgrade has started. Machine is now down for a while for diagnostics.

Wednesday, 12:30: Half of the machine is now running again, while the other half is being upgraded to quad-core. We expect to take the entire machine down Friday morning. Please consider the machine to be in a testing state, so unannounced downtime might occur.

Wednesday, 16:45: The upgrade is ahead of schedule, therefore the machine will be taken down tomorrow around 10am.

Thursday, 12:00: Two racks are now running and will run until tomorrow morning, Friday 28th, when the entire machine will be shut down at 8am. The machine will then stay down until at least Monday.

Friday, 08:00: The hardware part of the upgrade is now finished. The machine is now unavailable until the software upgrade, diagnostics and testing have finished.

Saturday, 17:00: The main part of the software upgrade is finished. The machine is running, but is unavailable due to testing.

Tuesday, April the 1st, 18:00: Hexagon is now available again, see http://www.parallaw.uib.no/syslog/153 for more details.

The new Cray XT4 machine “hexagon” is now available

lsz075 • February 1, 2008

The new machine "hexagon.bccs.uib.no", a Cray XT4 MPP, is now available for users. The machine was installed in the beginning of January this year and has had a few test users until now. The machine will be upgraded from its current configuration with dual-cores to the final, quad-core, configuration later this spring. After the upgrade the formal acceptance test will be executed.

More information about the machine can be found on the documentation pages.

The machine is available for "local" users from the University of Bergen, IMR and NERSC. These users can apply for an account by using the form found here

In addition, users that have acquired quota from the NOTUR consortium will have access to the NOTUR part of the machine.

Any questions regarding support, the documentation or other matters related to this can be sent to hpc-support@hpc.uib.no or the alias support-uib@notur.no

IBM regatta p690 “tre” decommissioned

lsz075 • October 1, 2007

As previously announced, the IBM p690 Regatta system "tre" is now decommissioned.

Questions regarding access to files etc. must be sent promptly to

support-uib@notur.no

fimm cluster is upgraded

lsz075 • September 12, 2007

The scheduled maintenance of the fimm cluster is now (mostly) complete. Please note the following changes:

- Cluster is now running Rocks 4.3 which is based on CentOS 4.5
- Login to fimm.bccs.uib.no now ends up on one of the compute nodes acting as a login node. Currently this is called compute-1-14.
- Compilers are upgraded to Intel 10.0 and PGI 7.0
- Totalview is upgraded to 8.2
- MPI libraries are upgraded and located in /local
- Several libraries and programs in /local are upgraded

All jobs that were waiting on the old queue need to be submitted again into the new queue after the upgrade.

Send questions to support-uib@notur.no

Scheduled maintenance / upgrade of fimm

lsz075 • September 4, 2007

There will be an upgrade of fimm from Rocks 4.1 to Rocks 4.3 on Wed. Sep. 12th. Expected downtime is from 08:00 to 14:00. The new system will have an updated OS, compilers and software, and will be integrated with the grid-related activities.

This notice may be updated with more information at a later time.

Important notice: tre.bccs.uib.no will be taken out of service

lsz075 • September 1, 2007

The IBM p690 Regatta tre.bccs.uib.no / tre.ii.uib.no will be shut down and decommissioned in the morning of Monday October 1st 2007 at 08:00.

All jobs must be finished, and all data and personal files must be copied off the machine before this time. The only exception is data on external disks such as /migrate, /net/bcmhsm and /net/bjerknes*.

Any questions regarding this can be sent to support-uib@notur.no.

HPC Syslog

Log of changes and events on UiB's HPC systems