
Most of the login nodes currently have a high disk (I/O) load, mostly due to ongoing file copying.

You can find less busy nodes with the following workaround:

 module load pdsh
 pdsh -w login[1-5] uptime
 login2: 11:05am up 14 days 19:06, 18 users, load average: 4.62, 4.55, 3.98
 login3: 11:05am up 14 days 19:06, 7 users, load average: 2.47, 2.96, 2.89
 login1: 11:05am up 14 days 19:06, 9 users, load average: 16.21, 11.97, 13.34
 login4: 11:05am up 14 days 19:06, 13 users, load average: 0.68, 0.31, 0.21
 login5: 11:05am up 14 days 19:06, 8 users, load average: 40.72, 35.99, 23.38

In this example login4 is the least busy and login5 is heavily overloaded, so you can ssh to login4 and work there.
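
If you do not want to compare the load averages by eye, the pdsh output can be piped through a small filter. The following is only a sketch: the function name least_loaded is made up for this example, and it assumes the uptime output has the format shown above.

```shell
# Print the node with the lowest 1-minute load average.
# Reads `pdsh ... uptime` output on stdin (format assumed as in the example above).
least_loaded() {
    awk '{
        node = $1; sub(/:$/, "", node)             # "login4:" -> "login4"
        load = $0
        sub(/.*load average: */, "", load)         # keep only the three averages
        sub(/,.*/, "", load)                       # keep only the 1-minute value
        print load, node
    }' | LC_ALL=C sort -n | head -n 1 | cut -d" " -f2
}

# Usage:
#   pdsh -w login[1-5] uptime | least_loaded
```

Keep in mind that the load average is only a rough indicator and can change quickly.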

We will look into ways to reduce the effect of file transfers on interactive user sessions. As a general rule, we recommend running file transfers at night to reduce the disk load on the login nodes during interactive sessions.

We will have planned maintenance on Hexagon, starting on May 22nd at 09:00. The maintenance is expected to last one day.
During the maintenance we will carry out software and firmware upgrades as well as service the hardware.

The job submission system has a reservation in place, so jobs that cannot finish before the maintenance starts will not be started.
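
As a rough illustration of what the reservation means for your jobs, the check below tests whether a job with a given walltime would end before the maintenance begins. This is a simplified sketch and not the scheduler's actual logic; the function name fits_before_maintenance is hypothetical.

```shell
# Succeeds if a job starting at <start> with the requested walltime
# ends no later than the maintenance start.
# Arguments: start (Unix epoch seconds), walltime (hours), maintenance start (epoch seconds).
fits_before_maintenance() {
    start=$1; walltime_hours=$2; maintenance=$3
    [ $(( start + walltime_hours * 3600 )) -le "$maintenance" ]
}

# Usage (GNU date assumed for the -d option):
#   fits_before_maintenance "$(date +%s)" 24 "$(date -d '2017-05-22 09:00' +%s)" \
#       && echo "job can start now" || echo "job has to wait until after the maintenance"
```

Requesting a shorter walltime, where your job allows it, increases the chance that it can still run before the maintenance.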

/work-common will be unavailable during the maintenance period and will be unmounted from Grunch and Fimm.

UPDATES:
  • 2017-05-22 09:00: Maintenance has started.
  • 2017-05-22 14:16: /work-common is available again and remounted on Grunch.
  • 2017-05-22 15:59: Maintenance has finished and access to Hexagon is re-opened.

UNINETT Sigma2 is organizing a Software Developer Course for the new HPC-system.

We are pleased to inform you that there will be a second HPC course this autumn in Trondheim, on 30 November - 1 December.
Registration is open at 
https://response.questback.com/uninett/hpctrainingseminar

Please refer to the announcement on www.sigma2.no for further details.

Dear Fimm cluster and Grunch server users:

The Fimm cluster and the Grunch server will have downtime on 25th August from 09:00 to 16:00.

During this downtime we will update the hardware firmware as well as the firmware of the internal and external switches.

On the Grunch server, in addition to the hardware firmware update, we will also enable quota on grunchfs.


We will also update Slurm to version 16.05.4 on fimm.hpc.uib.no.

Neither fimm.hpc.uib.no nor grunch.bccs.uib.no will be accessible during this downtime.

We will post progress updates on this page.

Please contact hpc-support@hpc.uib.no if you have any questions.


26/08/2016 09:40 Update: The firmware update is done on the internal and external switches. Slurm is updated on fimm.hpc.uib.no and the Grunch firmware is also updated. We are currently working on enabling quota on grunchfs, which requires scanning the whole file system; this will take some time before we can make grunchfs available to users on the Grunch server.


26/08/2016 10:40 Update: Grunch quota is enabled, and the Grunch server is online again.

This only concerns users who have data under /export/grunchfs/, not /fimm/home.



Dear Grunchfs file system users:

  On grunch.bccs.uib.no, we are in the process of changing the underlying file system of grunchfs from GPFS to XFS. This is due to the increasing maintenance cost of the GPFS file system. Current usage of grunchfs is 221 TB. In the migration process we have to move the data from grunchfs to a temporary file system. We have created a temporary file system called grunchxfs and mounted it on grunch under /mnt/ro/grunchxfs. We have already done the first rsync between the two file systems. We plan to run two more rsync passes to finish the whole process.
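
For those curious why several rsync passes are used: by default rsync only transfers files that are new or changed since the previous pass, so each later pass is normally much shorter than the first. A minimal sketch of one such pass is shown below. The function name sync_pass and the source path in the usage line are assumptions for illustration, and the options we actually use may differ.

```shell
# One synchronisation pass from the old file system to the new one.
#   -a        preserve permissions, ownership, timestamps and symlinks
#   -H        preserve hard links
#   --delete  remove files from the target that no longer exist in the source
sync_pass() {
    rsync -aH --delete "$1"/ "$2"/
}

# Usage (source path assumed; target is the temporary mount point named above):
#   sync_pass /export/grunchfs /mnt/ro/grunchxfs
```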

 During the second rsync grunchfs will still be online, but during the last rsync we will take it offline. This is necessary to ensure file system consistency. This means that while the last rsync runs, grunchfs will not be accessible to users for some time. We will have a more accurate estimate of the downtime when the second rsync finishes. To make this downtime as short as possible (it depends on the size of the file system), we kindly ask all grunch users to do the following:

  • check and remove any duplicated files and folders,
  • check and remove any unnecessary files and folders,
  • pack (tar) up small files where possible; we have observed that some users have a huge number of small files (inodes).

These steps will significantly speed up the migration to XFS and keep the downtime as short as possible.

Thank you in advance; we appreciate your understanding.
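
To see where you have many small files and to pack them, something along the following lines may help. This is only a sketch; the helper names count_files and pack_dir are made up for this example. pack_dir does not delete anything, so check the archive before removing the original directory yourself.

```shell
# Count regular files below a directory (each file uses one inode).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Pack a directory into <dir>.tar.gz next to it.
# The original directory is left in place.
pack_dir() {
    dir=${1%/}
    tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Usage (the directory name is a placeholder):
#   count_files my_results
#   pack_dir my_results && tar -tzf my_results.tar.gz | head
```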

Please contact hpc-support@hpc.uib.no if you have any questions.

Update 10:30, 08-06-2016 :
  
The second rsync is running and will hopefully finish late today. We are planning the last, offline rsync.

Update 10:40, 10-06-2016 : 
  
We have postponed the Grunch server downtime to next Friday, 17-06-2016, starting at 09:00. All grunch users will have to log off from the Grunch server before 09:00. We will then run the last offline rsync; hopefully grunchfs will be online again by 10:00 on Monday 20-06-2016. Please plan your work accordingly.

Update 11:00 20-06-2016

We will mount the temporary grunchfs again as soon as the last rsync process finishes (around 13:00 today). Since this is a temporary file system, you may experience reduced performance on grunchfs.

Update 13:30 20-06-2016

The rsync processes did not finish as planned, therefore we cannot open access to the Grunch server yet. We are monitoring the process closely and will open access as soon as we can.

Update 15:30 20-06-2016

The rsync processes are still running; we expect them to finish later today.

Sorry for the inconvenience.

Update 10:30 21-06-2016

Maintenance is finally finished and the Grunch server is back online. We will continue with the file system change and will keep you updated.

Update 12:00 28-06-2016 

We plan one last downtime to finish the whole migration process. It will start at 13:00 on 01-07-2016 and end at 13:00 on 04-07-2016. During this time we will run the last offline rsync. Users will not be able to access the Grunch server during this downtime.

We appreciate your understanding and support.

Please do not hesitate to contact us if you have any further questions.


Update 10:36 04-07-2016

Due to unexpected changes and high usage of the grunch file system, our rsync process is delayed. We therefore have to extend the downtime until Wednesday 06-07-2016. By that time we plan to have completely finished moving grunchfs. The downtime may finish earlier; we will open access as soon as the last rsync finishes.

We appreciate your understanding and support.

Please do not hesitate to contact us if you have any further questions.

Update 10:00 06-07-2016

We finished the grunchfs file system transfer yesterday afternoon; grunchfs is now mounted again as XFS.