Author Archives: saerda

leo.hpc.uib.no downtime tomorrow 08:30

saerda • July 3, 2019

leo.hpc.uib.no will be down tomorrow from 08:30 until 13:00. We will run a memory test and other related hardware checks on the machine.
Progress of the maintenance will be posted here.

Update 2019.08.01 12:00: The Lustre developers have provided us with a debugging patch. I am going to apply it and restart NFS on Leo.
Update 2019.07.04 22:51: Leo crashed again after the maintenance. We will stop NFS and provide SMB only from now on, until we find a solution to the NFS problem.
Update 2019.07.04 12:55: Leo is back online; only the NFS server has been restarted. We will keep monitoring the system.
Update 2019.07.04 12:50: Maintenance is over. The network card firmware and the BIOS have been updated, hardware diagnostics completed without any problems, and the memory test passed.
Update 2019.07.04 08:45: Maintenance has started; Leo is going offline.

/shared read-only for a short period

saerda • June 28, 2019

The /shared filesystem became read-only for a short period between 12:00 and 12:30 today while we were debugging the NFS problem on leo.hpc.uib.no.

13:35 Update: /shared should now be mounted in read-write mode.

13:25 Update: The metadata filesystem check is complete, and the /shared filesystem has been mounted again on cyclone.

13:15 Update: We are running e2fsck on the metadata filesystem to check for possible corruption.

13:10 Update: We are backing up the original metadata filesystem for /shared.

12:50 Update: We found errors on the metadata server and have to run a filesystem check on it.

12:45 Update: The problem persists; we are working on it.

leo.hpc.uib.no crashed again

saerda • June 28, 2019

The NFS and Samba server leo.hpc.uib.no crashed yesterday around 23:30.
We have restarted the Samba service. We are investigating the NFS part and will provide more information later today.

cyclone.hpc.uib.no external interface down

saerda • June 27, 2019

cyclone.hpc.uib.no lost its external network interface late yesterday, which left users unable to access cyclone.

The problem has been resolved, and the cause is under investigation.

We apologize for the inconvenience.

Leo NFS problem

saerda • June 27, 2019

Over the last week, we have experienced problems with leo.hpc.uib.no, our NFS server: it crashes because of a Lustre bug that is triggered by an as yet unknown NFS-related cause.

After some debugging, we made some changes to our Lustre configuration, which look promising so far.

leo.hpc.uib.no has been running without problems for the last two days.

We will keep monitoring the system and will post here if anything else happens.

Please don't hesitate to contact us if you encounter any problems with the NFS and Samba exports from leo.hpc.uib.no.

We apologize for the inconvenience.

Downtime February 3rd 2019

saerda • January 24, 2019

The machine room will undergo power maintenance on February 3rd.

The following servers/services will be down during this time:

Hexagon.hpc.uib.no
Grunch.hpc.uib.no
Cyclone.hpc.uib.no
Leo.hpc.uib.no

Everything under /shared/ and /Data will be inaccessible, and the NFS and SMB exports will be offline.

The maintenance will start at 08:00 and will hopefully finish by 14:00. We kindly ask you to save all your work on the servers listed above and log out safely before the servers go down.
We will keep you updated on this page.

Hexagon end of life

saerda • December 13, 2018

Dear Hexagon Users,

Hexagon has been running without a maintenance contract with Cray for more than a year now. Moreover, it is very difficult to get spare parts for it, while hardware failures occur more and more often. We therefore plan to switch off Hexagon on 28.12.2018, and job execution will not be possible after that date. Regarding data, there will be a grace period of two months (until the end of February 2019) to allow users to move their data out of the Hexagon filesystems. At the end of the grace period, the following will happen:

• /home will be reformatted, and only data for users of grunch and cyclone will be retained,

• /work will be removed in its current form,

• the /shared filesystem will be reconfigured and shrunk, and only paid project spaces will be retained (as of today, the uniklima, gfi and skd subfolders),

• scientific applications installed in /shared/apps will be retained.

It is very important that you plan for this as soon as possible. Please do not hesitate to contact me or support@hpc.uib.no if you have any questions regarding this process.

Hexagon work filesystem crash

saerda • December 12, 2018

Dear Hexagon users,

The work filesystem crashed again last night due to hardware errors on some compute nodes and service nodes. To clear the errors, we have to shut down Hexagon and restart it.

Update 12:16: Hexagon has been restarted and is back online.

/shared filesystem crash

saerda • December 4, 2018

Dear shared filesystem users:

Today around 16:00, the /shared filesystem automatically remounted itself read-only due to a bug in the version of the Lustre filesystem we are running.

This made the whole /shared filesystem read-only.

We had to unmount the /shared filesystem and clear the error to work around the bug.

We apologize for any inconvenience and appreciate your understanding.

/work filesystem crash

saerda • October 25, 2018

The /work filesystem on Hexagon has crashed due to a failed MDS (metadata server); we are working on it.

HPC Syslog

Log of changes and events on UiB's HPC systems