Some of the servers and other equipment in our server room have to be relocated. Because of this, the Cyclone and Grunch servers will have a short downtime today at 11:00; the estimated downtime is about 30 minutes.
Please save all your work before 10:45 and log out safely.
Don't hesitate to contact us via hjelp.uib.no if you have any questions.
Update 13:45: Maintenance is over and all services are online. We are sorry for the delay.
Update 20:00: We are still experiencing a communication issue involving the file server. This can make the /shared filesystem unresponsive or slow. We are working on the issue.
Update 22:00: Problem resolved.
Cyclone and Grunch Downtime
Dear Cyclone and Grunch users:
We will have a short downtime at 10:00 on 21/02/2020; the expected downtime is about 30 minutes. Please save your work in advance and log out safely before 09:45 on the same day. After this time, any users still logged in will be removed. We apologize for the short notice and any inconvenience it may cause you.
Feel free to contact us via hjelp.uib.no if you have any questions.
Update 10:25: Unfortunately, we have to extend the downtime until 12:00 today.
Update 11:55: Downtime extended to 13:00 today.
Update 12:30: Downtime is over. We managed to move users' home directories from the other server back to the /shared filesystem. Please log in to your account on Cyclone and Grunch and check that all your files are there and that the permissions are correct.
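As a minimal sketch of the check requested above, the following Python function walks a directory and reports entries that are not owned by the current user or that the owner cannot read. The function name `find_permission_problems` and the two checks are illustrative assumptions, not an official tool; adjust them to whatever "correct" permissions means for your files.

```python
import os
import stat


def find_permission_problems(root):
    """Return (path, reason) pairs for entries under root that look wrong.

    Assumption for this sketch: an entry is "wrong" if it is owned by
    another user or if the owner's read bit is missing.
    """
    problems = []
    my_uid = os.getuid()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError as exc:
                problems.append((path, f"cannot stat: {exc.strerror}"))
                continue
            if stat.S_ISLNK(info.st_mode):
                continue  # skip symlinks; their own mode bits are not meaningful
            if info.st_uid != my_uid:
                problems.append((path, f"owned by uid {info.st_uid}"))
            elif not info.st_mode & stat.S_IRUSR:
                problems.append((path, "owner cannot read"))
    return problems
```

Calling `find_permission_problems(os.path.expanduser("~"))` on Cyclone or Grunch would list suspicious entries in your home directory; an empty list means nothing was flagged by these two checks.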
Cyclone and Grunch immediate Downtime
Dear Cyclone and Grunch server users:
We discovered possible filesystem corruption on the /shared Lustre filesystem today. To prevent data loss, we have to take down the /shared filesystem and run a filesystem check immediately to make sure that the filesystem is not corrupted.
We plan to start at 12:50 today.
User login to Cyclone and Grunch will be locked during the process, and all NFS and SMB exports will be terminated.
We are sorry for any inconvenience this may cause, and we will keep you updated on this page.
Please contact us via https://hjelp.uib.no if you have any further questions.
Update 14:00: The filesystem check is running.
Update 20:50: The filesystem check is still running; we will continue tomorrow.
Update 21.01.2020 13:40: The /shared filesystem is back online, and user access to Cyclone and Grunch is enabled. NFS and SMB have been started. There might be some data loss under the /shared/projects/gfi/ folder, so please go through your important files and report back if you are missing any. So far, we have identified one lost file in this folder, and we will contact its owner directly.
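One way to look for missing files is to compare the current contents of a folder against a list of paths you expect to be there, for example taken from your own backup or an older listing. The sketch below assumes you have such a reference list; the function name `find_missing_files` and the list itself are hypothetical.

```python
import os


def find_missing_files(root, expected_relative_paths):
    """Return the expected paths (relative to root) that no longer exist."""
    missing = []
    for rel in expected_relative_paths:
        # lexists also counts dangling symlinks as present
        if not os.path.lexists(os.path.join(root, rel)):
            missing.append(rel)
    return sorted(missing)
```

The returned list can be included in a report to the support desk so that the affected files are named precisely.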
NFS server crash on the cyclone.hpc.uib.no and grunch.hpc.uib.no servers
Dear Cyclone and Grunch server users,
Our NFS server crashed again while we were debugging the NFS hang problem.
Both servers have been rebooted.
We have changed some of the optimization parameters for our NFS server, and we hope that this will help to eliminate the issue.
Update 12:50: NFS on both Cyclone and Grunch is stable so far. We have made some changes to the NFS configuration.
Update 14:00: Unfortunately, the problem is not resolved. We are working on it. Login to both servers is disabled.
Update 18.11.2019 09:00: Home directories are mounted on both Cyclone and Grunch; NFS mounts should be stable.
Cyclone home mount crashed.
The Cyclone home mount crashed today around 09:30. We are trying to resolve the problem and will post updates here.
Update 13:05: Problem is fixed.
Cyclone maintenance 25th September
Cyclone will be taken down tomorrow, 25th September, from 12:00 until 14:00 for regular maintenance. We will perform OS-related updates, and some core libraries will be updated too.
More information will be posted.
Update 25.09.2019 14:55 cyclone.hpc.uib.no is back online.
Leo SMB stop
Dear Cyclone users:
We have to stop the SMB service on Leo for a short period of time in order to install a new patch. The patch will help the Lustre developer create a proper fix for our problem.
The SMB service will be stopped shortly.
Update 11:30: The debug patch has been applied, and both the NFS and SMB services have been started. We expect the server to crash again sooner or later; the machine will then generate a new crash dump, which we will send to the Lustre developer.
leo.hpc.uib.no downtime tomorrow 08:30
leo.hpc.uib.no will have downtime tomorrow from 08:30 until 13:00. We will perform a memory check and other related hardware checks on the machine.
Progress of the maintenance will be published here.
Update 2019.08.01 12:00: The Lustre developer provided us with a debugging patch. I am going to apply it and restart NFS on Leo.
Update 2019.07.04 22:51: Leo crashed again after maintenance. We will stop NFS and run SMB from now on until we find a solution for the NFS problem.
Update 2019.07.04 12:55: Leo is back online; only the NFS server has been restarted. We will keep monitoring the system.
Update 2019.07.04 12:50: Maintenance is over. The network card firmware and BIOS have been updated, hardware diagnostics completed without any problems, and the memory test went fine.
Update 2019.07.04 08:45: Maintenance started; Leo is going offline.
Short read-only period for /shared
The /shared filesystem became read-only for a short period between 12:00 and 12:30 today while we were debugging the NFS problem on leo.hpc.uib.no.
13:35 Update: /shared should now be mounted in read-write mode.
13:25 Update: The metadata filesystem check is over; the /shared filesystem is mounted again on Cyclone.
13:15 Update: We are running e2fsck on the metadata filesystem to check for possible corruption.
13:10 Update: We are backing up the original metadata filesystem for /shared.
12:50 Update: We found errors on the metadata server; we have to run a filesystem check on it.
12:45 Update: The problem persists; we are working on it.
leo.hpc.uib.no crashed again
The NFS and Samba server leo.hpc.uib.no crashed yesterday around 23:30.
We have restarted the server, and the Samba service is running again. We are investigating the NFS part and will provide more information later today.