Author Archives: Alexander Oltu

Hexagon: urgent maintenance on March 26th, 9:00-13:00

Alexander Oltu • March 14, 2018

Hexagon has accumulated a number of hardware failures that have to be fixed to ensure stable operation. Hexagon will be fully stopped and the login nodes will not be accessible. We expect to finish within 4 hours.

We have also discovered a bug in our SLURM statistics. Because of it, we will have to delete all jobs from the queue system during this downtime, including PENDING jobs.

We apologize for any inconvenience this downtime may cause.

Date: March 26
Timeslot: 9:00-13:00

Update:

26.03.18 15:20 The machine is still down due to hardware issues. We are working on it. We will keep you updated.
27.03.18 14:00 Hardware problems are fixed and access to the machine is reopened now.

Intel KNL workshop in Umeå on 24-25 April

Alexander Oltu • February 28, 2018

PRACE Workshop: Programming and Optimizing the Intel Knights Landing
Manycore Processor, HPC2N, Umeå University, 2018-04-24/25

This two-day workshop is arranged together with PRACE and will have
instructors from Intel. The course focuses on programming and optimizing
the Intel Knights Landing Manycore Processor. It addresses the Intel(R) Xeon
Phi(TM) architecture, codenamed "Knights Landing" (KNL), and how to use it
efficiently.

While the focus is on the KNL, the methods are applicable to many other
architectures as well. The course is thus relevant to many different groups of
people, including HPC users and researchers who want to get as much as
possible out of their architecture. The course is also of interest to anyone
who wants to increase their knowledge of vectorization and optimization
and learn more about how to apply them to their own codes.

Bring your laptop for the hands-on!

Lunch and coffee/tea will be provided.

Instructors: Mikko Byckling and Asma Farjallah from Intel.

Date: 24-25 April 2018.

Location: Umeå University, Umeå, Sweden

Deadline for registration: 16 April 2018

More information and registration on the course website:
https://www.hpc2n.umu.se/events/courses/knl-spring-2018

HPC course on January 25-26

Alexander Oltu • January 2, 2018

We are happy to announce a 2-day introductory HPC course at UiB on January 25-26.
https://docs.hpc.uib.no/wiki/HPC_course_2018.1

Hexagon scheduled maintenance on January 3rd

Alexander Oltu • December 18, 2017

We will shut down Hexagon for maintenance on January 3rd at 09:00 to continue the reconfiguration tasks. We expect to have Hexagon up again the same day at around 16:00.

Update 2018-01-03 19:23:

Access to Hexagon is re-opened.
The /work file system had to be reformatted. Please accept our apologies for any inconvenience this may have caused.
The /home storage area has been enlarged and the default quota has been doubled from 10GB to 20GB for each user.

Hexagon reboot after power blink

Alexander Oltu • December 8, 2017

After a series of power blinks, the Hexagon high-performance network and some nodes are in an inconsistent state. We have to restart the whole machine.

Hexagon is up after reconfiguration

Alexander Oltu • December 5, 2017

New configuration:

312 compute nodes
9984 processing elements
/work - 175TB
/shared - 217TB
SLURM scheduler
UiB usernames for UiB users

Please find below a short list of changes:

All users (except IMR) have to reapply for access at https://skjemaker.app.uib.no/view.php?id=2901837
SLURM is the new job scheduler
1. Documentation link https://docs.hpc.uib.no/wiki/Job_execution_(Hexagon)
2. External Torque/Moab to Slurm reference https://www.glue.umd.edu/hpcc/help/slurm-vs-moab.html
Please use Support for help and support.
The hexagon@hpc.uib.no mailing list will shortly be migrated to a self-managed mailing list. All current mailing list users will be removed soon. If you want to subscribe, please check our syslog at https://syslog.hpc.uib.no in a week; we will post a link to the new mailing list there.

We did not manage to replace all the HW components we had planned to replace during this maintenance. We are planning a shorter maintenance sometime in winter/spring to finish this job.

All software provided as modules is still available; we will review it and remove old modules in the coming weeks.

There are some major changes, such as the new job scheduler and the HW configuration, so some things may have stopped working for you, and some configurations are not yet final. We will continue improving this and updating the documentation during the following weeks, and we ask for your patience. All feedback is welcome at support@hpc.uib.no.

The following changes will come in the coming months:

/shared will be bigger in a few weeks
Bigger /home after the next maintenance window

All local HPC will be down for ~2 weeks starting tomorrow

Alexander Oltu • November 20, 2017

This is a reminder that tomorrow morning (2017.11.21) Hexagon and Fimm will be shut down for reconfiguration.

ALL DATA on /work, /work/shared (/work-common), /home and /fimm filesystems will be deleted.

Please find more details at https://docs.hpc.uib.no

Hexagon: login2 rebooted

Alexander Oltu • October 16, 2017

Login2 was rebooted due to hardware errors in its Ethernet card, which had made login2 unreachable from the network. The problem should be resolved now.

Hexagon: slow IO on login nodes

Alexander Oltu • October 11, 2017

Most of the login nodes currently have a high disk (IO) load, mostly due to ongoing copy processes.

As a workaround, you can find less busy nodes as follows:

module load pdsh
pdsh -w login[1-5] uptime
login2: 11:05am up 14 days 19:06, 18 users, load average: 4.62, 4.55, 3.98
login3: 11:05am up 14 days 19:06, 7 users, load average: 2.47, 2.96, 2.89
login1: 11:05am up 14 days 19:06, 9 users, load average: 16.21, 11.97, 13.34
login4: 11:05am up 14 days 19:06, 13 users, load average: 0.68, 0.31, 0.21
login5: 11:05am up 14 days 19:06, 8 users, load average: 40.72, 35.99, 23.38

In this example login4 is the least busy and login5 is heavily overloaded, so you can ssh to login4 and try working there.
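If you check this often, the comparison can be automated. The following is a minimal sketch, not an official tool: the function name least_loaded is hypothetical, and it assumes the uptime output format shown above (a "load average: a, b, c" field at the end of each line). It prints the host with the lowest 1-minute load average.

```shell
# Hypothetical helper: print the host with the lowest 1-minute load average.
# Reads pdsh-style lines on stdin: "host: ... load average: a, b, c"
least_loaded() {
    awk -F'load average: ' '
        NF == 2 {
            split($1, host, ":")    # host[1] is the name before the first colon
            split($2, load, ", ")   # load[1] is the 1-minute average
            if (best == "" || load[1] + 0 < min) {
                min = load[1] + 0
                best = host[1]
            }
        }
        END { if (best != "") print best }
    '
}

# Usage on a login node (after "module load pdsh"):
#   pdsh -w 'login[1-5]' uptime | least_loaded
```

The 1-minute load average is only a rough indicator, so treat the result as a suggestion rather than a guarantee that the chosen node will stay responsive.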

We will see what we can do to reduce the effect of file transfers on interactive user sessions. As a general rule, we recommend running file transfers at night to reduce the disk load on the login nodes during interactive sessions.

/migrate and /bcmhsm offline on September 4th

Alexander Oltu • September 4, 2017

Due to physical rearrangements in the server room the tape robot hosting /migrate and /bcmhsm will be unavailable today after 12:00 for several hours. Updates will be posted here.

Update 2017-09-11:

Uni Computing is experiencing trouble with the backend holding /migrate and /bcmhsm, and it is not yet known when this will be fixed. As these file systems were supposed to be decommissioned in June this year, we will not mount them back in their ordinary place even after they are healthy again. However, as agreed, we will finish the transfer of IMR/HI files as soon as the file systems are healthy. We will issue a separate update for this.
Users other than IMR/HI who need files from those file systems are advised to contact the Uni Computing helpdesk at trouble@computing.uni.no.

HPC Syslog

Log of changes and events on UiB's HPC systems