Hexagon went down because of problems with cooling related to a thunderstorm. We are looking into this issue.
Update 21:00: Machine is up.
Author Archives: lsz075
Hexagon: emergency reboot
We are experiencing issues with the HSN network and are rebooting Hexagon. We expect it to be back soon. In addition, we will apply a few important security patches.
Update 16:41: Machine is back online.
Hexagon: /home is read only
There was an interruption in the storage connection to the /home filesystem, which left /home read-only. We are working to fix this problem ASAP.
Update 16:08: We are running fsck on /home; access is still closed.
Update 19:50: Fsck has finished; booting the machine.
Update 20:18: Machine is back online.
Hexagon: rebooted because of important security update
We had to reboot Hexagon because of an important security update. The machine will be up in an hour. Our apologies for the inconvenience.
Hexagon: updated software/libraries
Hexagon has updated software/libraries.
Please see the following for a full description:
http://docs.cray.com/books/S-9407-1404//S-9407-1404.pdf
http://docs.cray.com/books/S-9404-19//S-9404-19.pdf
cray-mpich 6.3.0 -> 6.3.1
cray-lgdb 2.2.4 -> 2.3.0
cce 8.2.5 -> 8.2.6
atp 1.7.1 -> 1.7.2
chapel 1.8.0 -> 1.9.0
pgi 14.2.0 -> 14.3.0
re-issue of cray-gcc 4.8.0, 4.8.1 and 4.8.2
Fimm: login1 will go down
Dear fimm cluster users:
We have recently added two extra login nodes (login2 and login3) to fimm.bccs.uib.no. The current login1 will go under maintenance for a short period of time and will be added back later.
We kindly ask you to save your current work on login1, log off, and log in again; you will land on either login2 or login3. When you run "ssh fimm", the DNS server picks a login node by round-robin scheduling.
Eventually we will have three login nodes on fimm (login1, login2, login3). All of them have identical hardware, and you can ssh between them.
We have done this to increase redundancy and uptime.
Let us know if this causes any problems for you.
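To illustrate the round-robin selection mentioned above, here is a minimal sketch in Python. It only simulates a strict rotation over the login node names from this announcement; it does not query DNS, and real resolvers and caches may not rotate this predictably. The function name round_robin is made up for the example.

```python
from itertools import cycle

# Login node names from the announcement; login1 is left out
# because it is under maintenance for now.
LOGIN_NODES = ["login2", "login3"]


def round_robin(nodes):
    """Return an endless iterator that hands out nodes in strict rotation."""
    return cycle(nodes)


# Simulate four consecutive logins.
picker = round_robin(LOGIN_NODES)
first_four = [next(picker) for _ in range(4)]
print(first_four)
```

Once login1 is added back, the same rotation would simply run over three names instead of two.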
Hexagon: scheduled maintenance on April 23rd, 9:30
Hexagon is going to have scheduled maintenance on April 23rd. The
maintenance will start at 9:30. The expected downtime is about 12 hours.
During the maintenance we are going to do the following:
* Upgrade the compute node Linux to 4.2UP02.
* Upgrade the management station base OS and Cray software release.
* Apply various security patches.
* Upgrade the storage firmware.
All running jobs will be terminated. The job submission system has a
reservation in place; it will not start jobs that cannot
finish before the maintenance starts.
Update 20:10: The maintenance is over. The machine is back online.
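The effect of the maintenance reservation on job start can be sketched as a simple time check. This is an illustration only, not the scheduler's actual logic: the function name can_start is hypothetical, and the year in the maintenance date is an assumption, since the announcement gives only the day and time.

```python
from datetime import datetime, timedelta

# Maintenance start from the announcement (April 23rd, 9:30).
# The year is an assumption made for this example.
MAINTENANCE_START = datetime(2014, 4, 23, 9, 30)


def can_start(now, walltime, maintenance_start=MAINTENANCE_START):
    """Return True if a job started at `now` with the requested
    walltime would finish before the maintenance starts."""
    return now + walltime <= maintenance_start


# A 12-hour job started the morning before fits in.
fits = can_start(datetime(2014, 4, 22, 9, 0), timedelta(hours=12))
# The same job started late in the evening would run into the maintenance.
too_late = can_start(datetime(2014, 4, 22, 22, 0), timedelta(hours=12))
print(fits, too_late)
```

In practice this means that requesting a shorter walltime can let a job start before the maintenance window.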
/mighsm and /bcmhsm unavailable on Apr 5th
We will have an electrician doing maintenance in one of the server
rooms this Saturday, April 5th. As a result, the /migrate
and /bcmhsm file systems will be unavailable for several hours on
April 5th, from 9:00.
Hexagon and running jobs should not be affected.
Hexagon: updated software/libraries
Hexagon has updated compilers and libraries.
Please read the full description/changelog in this announcement:
http://docs.cray.com/books/S-9407-1403//S-9407-1403.pdf
Updated software:
cray-mpich 6.2.2 -> 6.3.0
pmi 5.0.2 -> 5.0.3
cce 8.2.4 -> 8.2.5
pgi 14.1.0 -> 14.2.0
cray-ccdb 1.0.1 -> 1.0.2
xt-asyncpe 5.25 -> 5.26
totalview 8.12.0.1 -> 8.13.0
papi 5.2.0 -> 5.3.0
perftools (craypat) 6.1.3 -> 6.1.4
cray-libsci 12.1.3 -> 12.2.0
cray-petsc 3.4.2.3 -> 3.4.3.1
cray-tpsl 1.3.04 -> 1.4.0
cray-ga 5.1.0.3 -> 5.1.0.4
cray-trilinos 11.4.1.0 -> 11.6.1.0
In addition, the following modules were removed:
chapel 1.4.0, 1.5.0, 1.7.0, 1.7.0.1
totalview 8.9.2, 8.10.0
pgi 11.10, 12.9, 13.6
xt-asyncpe 5.07 -> 5.15
cce 8.2.0
fftw 3.3.0.2
intel 13.1.163
Fimm and Grunch downtime on April 5th
We will have maintenance in the machine room on the 5th of
April. An electrician will work on the power line in the machine room, which
requires the electricity to be switched off completely.
Therefore the fimm.bccs.uib.no cluster and the grunch server will be
shut down for 3 hours. We have reserved the cluster for maintenance, which
means that submitted jobs which cannot finish by that time
will not run, and jobs which are already running but cannot
finish by that time will be killed.
Maintenance will start at 09:00. We advise
you to save all your work on fimm.bccs.uib.no and
grunch.bccs.uib.no by that time.
We are sorry for the inconvenience and appreciate your understanding.