Author Archives: saerda

Cyclone and Grunch downtime 4th February

saerda • January 31, 2021

Dear Cyclone and Grunch users,

We plan to have downtime Thursday (4th February) from 10:00 to 14:00. Both cyclone and grunch will not be accessible for users during that time.

Leo will be down, /shared filesystem which is exported via Leo by smb and nfs will also be down and inaccessible.

We have to replace the core switch.

Downtime is anticipated, exact time slot can be changed or extended, we will keep you updated here.

Please don't hesitate to contact us via hjelp.uib.no

Update 22:20 08.02 : Due to technical reasons our last attempt to replace the core switch was not successful, we will try again Tuesday 09.02 at 17:00, we have to take down the whole system , expected downtime is about 2 hours, all servers should be online before 19:00 the same day, we will keep you updated.

Update 18:40 09.02 : The maintenance takes longer than expected. We expect an additional 2 hours downtime, i.e. until 21:00 tonight.

Update 20:35 09.02 : The maintenance takes longer than expected. As a consequence Cyclone and Grunch are not available until tomorrow. We try to make them available again in the course of Wednesday. We are sorry for the inconvenience.
Update 00:30 10.02: The maintenance is finally over and we have successfully replaced old switch to the new one, all services should be online, please contact us via hjelp.uib.no if you discover any problem.

HPC and NIRD toolkit course fall 2020

saerda • October 1, 2020

Dear all,

We are pleased to announce the course on the national HPC systems & NIRD Toolkit and a basic course on UNIX. More details are available on the course webpage at:

https://docs.hpc.uib.no/wiki/HPC_and_NIRD_toolkit_course_fall_2020

The course is free of charge and is on Zoom. Zoom link will be sent to all participants before the course.

For registration:

https://skjemaker.app.uib.no/view.php?id=9042605

The HPC course will be held between 27th -29th October (900-12:30) and the UNIX course between 19th &20th October (900-12:30).

Register before 14th October for the UNIX course and before October 21st for HPC & NIRD Toolkit Course.

Looking forward to see you at the event!

--
The Scientific Computing Group at the UiB IT department

Downtime: Machine room power outage at Thormøhlensgate 55 April 17~19.

saerda • April 2, 2020

Dear Cyclone and Grunch Users,

Our machine room at Thormøhlensgate 55 will have power outage during Friday 17 April 15:00 to Sunday 19. April 24:00.

There will be planned work to do maintenance for power line in the building.
Because of that maintenance, we have to take down all our servers at 15:00 Friday 17. April and hopefully we will take all servers online before 08:00 Monday 20. April.

cyclone and grunch server will be taken down and will not be accessible during this time.

/shared/ filesystem which is exported via leo.hpc.uib.no to the university campus will not be accessible during this time.

We advice all users to plan their work in good time to avoid unnecessary problems.

Update20:30_19.04.2020: Our downtime is over, cyclone and grunch is back online.

Please contact us via hjelp.uib.no if you have any further questions.

Best Regards

Scientific computing team.

Cyclone and Grunch short downtime

saerda • March 11, 2020

Some of the servers and other equipment in our server room have to be relocated. Due to this process, Cyclone and Grunch server will have short downtime today at 11:00, estimated downtime is about 30 minutes.

Please save all your work before 10:45 and log out safely.

Don't hesitate to contact us if you have any questions via hjelp.uib.no.

Update: 13:45: maintenance is over, all services are online, we are sorry for the delay.
Update: 20:00: we are still experiencing a communication issue between the file server, this can cause /shared filesystem unresponsive or slow, we are working on the issue.
Update 22:00: Problem resolved.

Cyclone and Grunch Downtime

saerda • February 21, 2020

Dear Cyclone and Grunch users:

We will have a short downtime at 10:00 21/02/2020, expected downtime is about 30 minutes. Please save your work in advance and log out safely before 09:45 on the same day, after this time users will be removed. We apologize for the short notice and the inconvenience it may cause you.

Feel free to contact us via hjelp.uib.no if you have any questions.

Update: 10:25 Unfortunately we have to extend our downtime until 12:00 today.
Update: 11:55 downtime extended to 13:00 today.
Update: 12:30 Downtime is over. We manage to move home directory of users from the different server back to /shared filesystem, please log in to your account on Cyclone and Grunch, check all your files are there and permission is correct.

Cyclone and Grunch immediate Downtime

saerda • January 20, 2020

Dear Cyclone and Grunch server users:

We discovered possible filesystem corruption on /shared Lustre filesystem today. To eliminate data loss, we have to take down /shared filesystem and run filesystem check immediately to make sure that the filesystem is not corrupted.

We plan to start at 12:50 today.

User login to Cyclone and Grunch will be locked during the process, and all NFS and smb exports will be terminated.

We are sorry for any inconvenience this may have caused, and we will keep you updated on this page.

Please contact us via https://hjelp.uib.no if you have any further questions.

Update 14:00 : The filesystem check is running.

Update 20:50 : The filesystem check is still running, we will continue tomorrow.
Update 21.01.2020_13:40: /shared filesystem is back online, and user access for cyclone and grunch is enabled. NFS and SMB is started. There might be some data loss under /shared/projects/gfi/ folder, make sure that you go through your important files and report back if you missing some files. So far, we identified one file loss from this folder which we will make direct contact with the owner of the file.

NFS server rash on cyclone.hpc.uib.no and grunch.hpc.uib.no server

saerda • November 15, 2019

Dear Cyclone and Grunch server users,

Our NFS server crashed again while we are debugging NFS hang problem.

Both servers are rebooted.

We have changed some of the optimization parameters for our NFS server, and we hope that this will help to eliminate the issue.

Update: 12:50 NFS on both cyclone and grunch are stable so far. we have done some changes for NFS configuration.

Update : 14:00 Unfortunately the problem is not resolved. we are working on it. login to both server is disabled.

Update: 18.11.2019 09:00 : Home directories are mounted on both Cyclone and Grunch, NFS mounts should be stable.

Cyclone home mount crashed.

saerda • November 12, 2019

Cyclone home mount is crashed today around 09:30. We are trying to resolve the problem. We will keep the update posted.

Update 13:05: Problem is fixed.

Cyclone maintenance 25th September

saerda • September 24, 2019

Cyclone will be taken down tomorrow, 25th September, from 12:00 until 14:00 for regular maintenance. We will perform OS related updates and some core libraries are going to be updated too. more information will be posted.
Update 25.09.2019 14:55 cyclone.hpc.uib.no is back online.

Leo SMB stop

saerda • August 5, 2019

Dear cyclone users:

We have to stop smb service on leo for short period of time, this is due to the new patch which we have to install on leo, this will help lustre developer to create a proper fix to our problem.
SMB service will be stopped in a short time.

Update 11:30: debug patch has been applied, and both NFS and SMB service is started. We are expecting the server crash sooner or later, then the machine will generate a new crash dump which we will eventually send it over to Lustre developer.

HPC Syslog

Log over changes and events on UiB's HPC systems