For all fimm users:
The GPFS file system on fimm crashed today at around 10:15. The home and work file systems were unmounted from all compute nodes, so all running jobs were killed.
The problem is now resolved and the file system is back online. All crashed jobs have to be resubmitted.
We apologize for the inconvenience.
Author Archives: lsz075
Hexagon: thunderstorm causes reboot
Hexagon needs a reboot after a thunderstorm caused a power blink in the building's power supply.
Update 21:10: Hexagon is up again without cabinet 8 (cabinet 8 needs manual intervention).
Hexagon: scheduled maintenance on August 1st
Hexagon will have scheduled maintenance on August 1st, extending into August 2nd.
We will make changes to all the PDUs to increase the reliability of the system, and will additionally install the latest update of the OS (CLE and SMW). The latest software update will also increase stability and decrease startup and debug times after system failures.
The scheduled downtime will start August 1st at 09:00.
Please send any questions to support-uib@notur.no
An email concerning this was sent out on July 25th to all Hexagon users.
Hexagon Sysadmins
Updates:
Update Thursday 10:30: The changes to the PDUs are taking longer than expected, so the maintenance will be extended into Friday, August 3rd.
Update Friday 12:00: Unfortunately, there will be a further delay before the system is up again. We currently expect to boot the system on Saturday, August 4th.
Update Sunday 19:25: The maintenance has finished and the system is back online.
Hexagon: updated software/libraries – 2
Hexagon has some further updates to software/libraries:
xt-mpich2 (MPI) 5.5.1 -> 5.5.2
xt-asyncpe 5.11 -> 5.12
gcc 4.7.0 -> 4.7.1
ATP 1.4.4 -> 1.5.0
Intel compiler 12.1.4.319 -> 12.1.5.339
Totalview 8.9.2 -> 8.10
See http://docs.cray.com/books/S-9401-1207//S-9401-1207.pdf
Hexagon: updated software/libraries
Hexagon has updated software/libraries. Please read the full information about new features and bug fixes in:
http://docs.cray.com/books/S-9401-1206/
In short the following has been updated:
xt-mpich2 (MPI) 5.4.5 -> 5.5.1
PGI 12.4.0 -> 12.5.0
GCC 4.6.3 -> 4.7.0
Cray compiler CCE 8.0.5 -> 8.0.6
xt-asyncpe 5.10 -> 5.11
xt-libsci 11.0.06 -> 11.1
Trilinos 10.8.3.0 -> 10.8.3.1
Petsc 3.2.01 -> 3.2.02
TPSL 1.2.00 -> 1.2.01
fftw 3.3.00 -> 3.3.01
Netcdf 4.1.3 -> 4.2.0
HDF5 1.8.7 -> 1.8.8
Note that Netcdf 4.2.0 no longer provides the legacy libnetcdf_c++ API, only the new libnetcdf_c++4 API.
Programs need to be recompiled to gain any of the new features or bug fixes, because they are statically linked.
Hexagon: scheduled reboot on Friday June 29th
We are going to reboot Hexagon on Friday, June 29th at 10:00 to add cabinet c12 to the system. The job scheduler has a reservation in place so that only jobs that can finish before the maintenance reboot will start. We expect the reboot to take no longer than 1 hour.
Update 10:43: System is up and running.
Hexagon: reboot due to high speed network problems
Hexagon is being restarted due to problems with the high-speed network.
Update 20:00: Hexagon is now up again without cabinet c12. We will do maintenance on this cabinet soon, likely next week.
Hexagon: cabinet power issue
Hexagon lost 1 cabinet on May 30th because of a power failure; thanks to the system's high resiliency, it continued to run. On June 7th at about 08:00, another cabinet also suffered a power failure. The machine is still operating, but 2 login nodes are down, which causes some connection attempts to fail (depending on the DNS round-robin), and some of the nodes have communication problems. We are investigating possible solutions.
Update 12:30: We need to restart the machine to be able to bring it back up.
Update 13:00: Machine is now up again.
Hexagon: updated software/libraries
Hexagon has updated software/libraries.
Please see http://docs.cray.com/books/S-9401-1205//S-9401-1205.pdf for the full release information.
Briefly, these packages were updated:
PGI 12.3.0 -> 12.4.0
Cray compiler (CCE) 8.0.4 -> 8.0.5
ATP 1.4.3 -> 1.4.4
Chapel 1.3.0 -> 1.4.0
xt-asyncpe 5.09 -> 5.10
Intel compiler 12.1.2.273 -> 12.1.4.319
Hexagon: updated software/libraries
Hexagon has updated software and libraries.
xt-mpich2 5.4.5
Cray compiler cce 8.0.4
papi 4.3.0.1
PGI 12.3.0
GCC 4.6.3
lgdb 1.5
PETSc 3.2.01
xt-asyncpe 5.09
perftools 5.3.2
See http://docs.cray.com/books/S-9401-1204//S-9401-1204.pdf for the full changelog.