Hexagon has accumulated a number of hardware failures that must be fixed to ensure stable operation. Hexagon will be fully stopped and the login nodes will not be accessible. We expect to finish within 4 hours.
We have also discovered a bug in our SLURM statistics that forces us to delete all jobs from the queue system during this downtime, including PENDING jobs.
Our apologies for any inconvenience this downtime may cause.
Date: March 26
PRACE Workshop: Programming and Optimizing the Intel Knights Landing Manycore Processor, HPC2N, Umeå University, 2018-04-24/25
This two-day workshop is arranged together with PRACE and will have instructors from Intel. The course focuses on programming and optimizing the Intel Knights Landing manycore processor. It addresses the Intel(R) Xeon Phi(TM) architecture, codenamed "Knights Landing" (KNL), and how to use it efficiently.
While the focus is on the KNL, the methods are applicable to many other architectures as well. The course is thus relevant to many different groups, including HPC users and researchers who want to get as much as possible out of their hardware, as well as anyone who wants to deepen their knowledge of vectorization and optimization and learn how to apply it to their own codes.
Bring your laptop for the hands-on!
Lunch and coffee/tea will be provided.
Instructors: Mikko Byckling and Asma Farjallah from Intel.
Date: 24-25 April 2018.
Location: Umeå University, Umeå, Sweden
Deadline for registration: 16 April 2018
More information and registration on the course website: https://www.hpc2n.umu.se/events/courses/knl-spring-2018
We are happy to announce a 2-day introductory HPC course at UiB on January 25-26.
We will shut down Hexagon for maintenance on January 3rd at 09:00 to continue the reconfiguration tasks. We expect to have Hexagon up again the same day at around 16:00.
Update 2018-01-03 19:23:
- Access to Hexagon is re-opened.
- /work file system had to be reformatted. Please accept our apologies for any inconvenience it might have caused.
- /home storage area has been increased, and the default quota is doubled from 10GB to 20GB for each user.
After a series of power blinks, Hexagon's high-performance network, as well as some compute nodes, is in an inconsistent state. We have to restart the whole machine.
- 312 compute nodes
- 9984 processing elements
- /work - 175TB
- /shared - 217TB
- SLURM scheduler
- UiB usernames for UiB users
Please find below a short list of changes:
- All users (except IMR) have to reapply for access at https://skjemaker.app.uib.no/view.php?id=2901837
- SLURM is the new job scheduler
1. Documentation link https://docs.hpc.uib.no/wiki/Job_execution_(Hexagon)
2. External Torque/Moab to Slurm reference https://www.glue.umd.edu/hpcc/help/slurm-vs-moab.html
- Please use Support for all help and support requests.
- The email@example.com mailing list will be migrated to a self-managed mailing list shortly. All current mailing list users will be removed soon. If you want to subscribe, please check our syslog at https://syslog.hpc.uib.no in about a week; we will post a link to the new mailing list there.
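Since SLURM replaces Torque/Moab, batch scripts now use #SBATCH directives instead of #PBS. The sketch below is only illustrative; the job name, resource numbers, and account name are placeholders, not values taken from the Hexagon documentation:

```shell
#!/bin/bash
# Minimal Slurm batch script sketch (all values below are placeholders).
#SBATCH --job-name=myjob        # was: #PBS -N myjob
#SBATCH --nodes=1               # was: #PBS -l nodes=1
#SBATCH --ntasks=32             # number of parallel tasks to launch
#SBATCH --time=01:00:00         # was: #PBS -l walltime=01:00:00
#SBATCH --account=myaccount     # placeholder project/account name

srun ./my_program               # launch the parallel program under Slurm
```

Submit with `sbatch job.sh` (instead of `qsub`), check the queue with `squeue` (instead of `qstat`), and cancel a job with `scancel JOBID` (instead of `qdel`). See the documentation links above for the exact options valid on Hexagon.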
We did not manage to replace all the hardware components we had planned to during this maintenance. We plan a shorter maintenance window sometime in winter/spring to finish this job.
All software is still available as modules; we will review and remove outdated modules in the coming weeks.
There are some major changes, such as the new job scheduler and the hardware configuration, so some things may have stopped working for you and some configurations are not yet finalized. We will continue improving this, and updating the documentation, over the following weeks; we ask for your patience. And of course all feedback is welcome at firstname.lastname@example.org.
The following changes will come in the next months:
- /shared will be bigger in a few weeks
- Bigger /home after the next maintenance window
We are reminding you that tomorrow morning (2017.11.21) Hexagon and Fimm will be shut down for reconfiguration.
ALL DATA on /work, /work/shared (/work-common), /home and /fimm filesystems will be deleted.
Please find more details at https://docs.hpc.uib.no
Login2 was rebooted due to hardware errors with its Ethernet card, which had rendered it unreachable from the network. The problem should be resolved now.
Most of the login nodes currently have high disk (I/O) load, mostly due to ongoing copy processes.
You can find a less busy node with the following workaround:
module load pdsh
pdsh -w login[1-5] uptime
login2: 11:05am up 14 days 19:06, 18 users, load average: 4.62, 4.55, 3.98
login3: 11:05am up 14 days 19:06, 7 users, load average: 2.47, 2.96, 2.89
login1: 11:05am up 14 days 19:06, 9 users, load average: 16.21, 11.97, 13.34
login4: 11:05am up 14 days 19:06, 13 users, load average: 0.68, 0.31, 0.21
login5: 11:05am up 14 days 19:06, 8 users, load average: 40.72, 35.99, 23.38
In this example login4 is the least busy and login5 is heavily overloaded; you can ssh to login4 and work there.
We will see what we can do to reduce the impact of file transfers on interactive user sessions. As a general rule, we recommend running file transfers at night to reduce disk load on the login nodes.
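To automate picking the least loaded login node, the `uptime` lines can be sorted by their 1-minute load average. A minimal sketch, using the sample output above as stand-in data (on Hexagon you would pipe the real `pdsh -w login[1-5] uptime` output in instead):

```shell
# Sample pdsh output; replace with: pdsh -w login[1-5] uptime
pdsh_output='login2: 11:05am up 14 days 19:06, 18 users, load average: 4.62, 4.55, 3.98
login3: 11:05am up 14 days 19:06, 7 users, load average: 2.47, 2.96, 2.89
login1: 11:05am up 14 days 19:06, 9 users, load average: 16.21, 11.97, 13.34
login4: 11:05am up 14 days 19:06, 13 users, load average: 0.68, 0.31, 0.21
login5: 11:05am up 14 days 19:06, 8 users, load average: 40.72, 35.99, 23.38'

# The 1-minute load is the third-from-last field; strip the trailing
# comma, sort numerically, and keep the node with the lowest load.
least_busy=$(printf '%s\n' "$pdsh_output" \
  | awk '{load=$(NF-2); sub(",", "", load); node=$1; sub(":", "", node); print load, node}' \
  | sort -n | head -1 | awk '{print $2}')

echo "Least busy login node: $least_busy"
```

You can then `ssh "$least_busy"` directly from a login node.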
Due to physical rearrangements in the server room, the tape robot hosting /migrate and /bcmhsm will be unavailable today after 12:00 for several hours. Updates will be posted here.
Update 2017-09-11:
Uni Computing is experiencing trouble with the backend holding /migrate and /bcmhsm, and it is not yet known when this will be fixed. As these file systems were supposed to have been decommissioned in June this year, we will not mount them back in their ordinary place even after they are healthy. However, as agreed, we will finish the transfer of IMR/HI files as soon as the file system is healthy, and will issue a separate update about this.
Users other than IMR/HI who need files from those file systems are advised to contact the Uni Computing helpdesk at email@example.com