Hexagon has some further updates to software/libraries:
xt-mpich2 (MPI) 5.5.1 -> 5.5.2
xt-asyncpe 5.11 -> 5.12
gcc 4.7.0 -> 4.7.1
ATP 1.4.4 -> 1.5.0
Intel compiler 12.1.4.319 -> 12.1.5.339
Totalview 8.9.2 -> 8.10
See http://docs.cray.com/books/S-9401-1207//S-9401-1207.pdf
Hexagon: updated software/libraries
Hexagon has updated software/libraries. For full details of new features and bug fixes, see:
http://docs.cray.com/books/S-9401-1206/
In short, the following have been updated:
xt-mpich2 (MPI) 5.4.5 -> 5.5.1
PGI 12.4.0 -> 12.5.0
GCC 4.6.3 -> 4.7.0
Cray compiler CCE 8.0.5 -> 8.0.6
xt-asyncpe 5.10 -> 5.11
xt-libsci 11.0.06 -> 11.1
Trilinos 10.8.3.0 -> 10.8.3.1
Petsc 3.2.01 -> 3.2.02
TPSL 1.2.00 -> 1.2.01
fftw 3.3.00 -> 3.3.01
NetCDF 4.1.3 -> 4.2.0
HDF5 1.8.7 -> 1.8.8
Note that NetCDF 4.2.0 no longer provides the legacy libnetcdf_c++ API; only the new libnetcdf_c++4 API is available.
Because programs are statically linked, they need to be recompiled to pick up any new features or bug fixes.
Hexagon: scheduled reboot on Friday June 29th
We will reboot Hexagon on Friday, June 29th at 10:00 to add cabinet c12 to the system. The job scheduler has a reservation in place, so only jobs that can finish before the maintenance reboot will start. We expect the reboot to take no longer than 1 hour.
Update 10:43: System is up and running.
Hexagon: reboot due to high speed network problems
Hexagon is being restarted due to problems with the high-speed network.
Update 20:00: Hexagon is now up again without cabinet c12. We will do maintenance on this cabinet soon, likely next week.
Hexagon: cabinet power issue
Hexagon lost one cabinet on May 30th because of a power failure. Thanks to the system's high resiliency, it continued to run. On June 7th at about 08:00 a second cabinet also suffered a power failure. The system is still operating, but two login nodes are down, so connection attempts may fail (depending on which address round-robin DNS returns), and some of the nodes have communication problems. We are investigating possible solutions.
Update 12:30: We need to restart the machine to be able to bring it back up.
Update 13:00: Machine is now up again.
Hexagon: updated software/libraries
Hexagon has updated software/libraries.
Please see http://docs.cray.com/books/S-9401-1205//S-9401-1205.pdf for the full release information.
Briefly, these packages were updated:
PGI 12.3.0 -> 12.4.0
Cray compiler (CCE) 8.0.4 -> 8.0.5
ATP 1.4.3 -> 1.4.4
Chapel 1.3.0 -> 1.4.0
xt-asyncpe 5.09 -> 5.10
Intel compiler 12.1.2.273 -> 12.1.4.319
Hexagon: updated software/libraries
Hexagon has updated software and libraries.
xt-mpich2 5.4.5
Cray compiler cce 8.0.4
papi 4.3.0.1
PGI 12.3.0
GCC 4.6.3
lgdb 1.5
PETSc 3.2.01
xt-asyncpe 5.09
perftools 5.3.2
See
http://docs.cray.com/books/S-9401-1204//S-9401-1204.pdf
for full changelog.
Hexagon: immediate maintenance
We encountered errors on the /home file system and therefore have to
shut down the machine immediately for maintenance.
We will use this opportunity to rerun the HPL benchmark on the whole
machine right after the maintenance. This means that your submitted jobs
will start 5 hours after the maintenance is finished.
We apologize for any inconvenience.
Update 13:00: Machine is back online. The inconsistency on /home has been fixed.
Hexagon: system crash
Hexagon is down due to a power issue. We are investigating.
Update 13:00: Hexagon is back online. We found that a panel breaker had tripped, causing the loss of power and the system crash. All jobs that were running at the time need to be resubmitted.
Fimm.bccs.uib.no maintenance
Dear fimm cluster users,
We will have scheduled downtime for the cluster fimm.bccs.uib.no on
April 1st at 08:00. The cluster is reserved for this downtime as of today at 13:30.
The reservation will last 24 hours, until 08:00 on April 2nd, 2012.
We will enforce quotas on the home file system during the maintenance.
We ask all users to check their home file system usage (repquota.sh),
compare their quota (hardquota) with their actual usage, and
remove files accordingly.
If you do not, your home file system will be "locked" and you won't
be able to do anything, even after logging in.
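To decide which files to remove, it can help to see where the space goes. The following is a minimal Python sketch, not a replacement for repquota.sh: it sums apparent file sizes under a directory and lists the largest files. The function names are hypothetical. The hard quota value is an assumption that you supply yourself from the repquota.sh output. The file system's own accounting may differ somewhat from these apparent sizes.

```python
import os


def directory_usage(root):
    """Return (total_bytes, sizes) for regular files under root.

    sizes is a list of (size_in_bytes, path) tuples. Sizes are apparent
    file sizes, so the quota accounting on the file system may differ.
    """
    total = 0
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += size
            sizes.append((size, path))
    return total, sizes


def bytes_over_quota(total_bytes, hard_quota_bytes):
    """Bytes to free to get under the hard quota (0 if already under)."""
    return max(0, total_bytes - hard_quota_bytes)


def largest_files(sizes, count=10):
    """Return the `count` largest (size, path) entries, biggest first."""
    return sorted(sizes, reverse=True)[:count]
```

A typical use would be to call directory_usage(os.path.expanduser("~")), pass the total and your hard quota in bytes to bytes_over_quota, and review the output of largest_files before deleting anything.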
We will also perform hardware and software maintenance, which
includes upgrading firmware, reinstalling all compute nodes, and some
cable and switch changes.
All jobs that have not finished by 08:00 on April 1st, 2012
* WILL BE KILLED *. We kindly ask you to save, remove, or otherwise take
care of any job that will not finish in time.
If you submit a job after the reservation is in place (it is set today at 13:30),
the system will check whether your job can finish before the downtime.
If it can, it will simply run; if not, it will be queued until the
maintenance is over.
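The rule above can be sketched in a few lines of Python. This is an illustration only, not the scheduler's actual implementation. It assumes the decision is based on the walltime you request, and it ignores node availability and job priority. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def can_start_before_downtime(now, walltime, downtime_start):
    """True if a job started at `now` would end by `downtime_start`."""
    return now + walltime <= downtime_start


def earliest_start(now, walltime, downtime_start, downtime_end):
    """Earliest start time under the simplified reservation rule.

    A job that fits before the downtime may start immediately;
    otherwise it waits until the downtime is over.
    """
    if can_start_before_downtime(now, walltime, downtime_start):
        return now
    return max(now, downtime_end)


# Downtime window taken from the announcement above.
DOWNTIME_START = datetime(2012, 4, 1, 8, 0)
DOWNTIME_END = datetime(2012, 4, 2, 8, 0)
```

In practice this means that requesting a shorter, realistic walltime gives a job a better chance of starting before the maintenance window.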
We will post any updates here.
Let us know if you have any further questions.
Update: Downtime extended until 18:00 on April 2nd, 2012.
Update April 2nd, 15:05: Maintenance is finished. Due to a network driver issue we have reserved some of the nodes for further maintenance. The reservation on the cluster has been removed, but fewer nodes are available.