Due to a hardware update on the fimm login node and master node, the fimm cluster will have a short downtime this coming Wednesday, 9th of December. The fimm login node will be unavailable from 13:00 to 16:00. Running jobs that cannot finish before that time will crash and will have to be resubmitted. A reservation has been set on the fimm cluster, so jobs that cannot finish before the downtime will not start.
We will keep this information updated.
/work fs hang on hexagon
One of the OST nodes of the /work FS crashed. We are working on it. The /work FS is currently unavailable.
Update: 13:12 The OST was recovered; the /work FS should be back online.
Update: 25.11 15:44 The same node in the file system crashed again. We are working to fix the FS ASAP.
Update: 25.11 16:15 /work is back online. We had to disable quota.
Update: 26.11 5:30 This time another OST crashed. The FS is online, and we are investigating the root cause of the OST crashes.
Updated software on hexagon, Nov. 20
The following software/libraries have been updated on Hexagon:
* Libsci 10.4.1
Bug fix.
* PETSc 3.0.0.8
Bug fix.
* MPT 3.5.1
Bug fix.
* xt-asyncpe 3.4
Bug fixes.
* GCC 4.4.2
Version update.
* Intel Compiler 11.1.059
New Release.
* TotalView 8.7
Bug fixes and feature release.
Work file system on fimm will be down Monday
Hi,
Due to a firmware update on the storage system, we have to take down the work file system on fimm.
The firmware update will start at 12:00 on Monday (23rd Nov) and will last 3-4 hours. During that time fimm will be accessible, but without the work file system. All compute nodes are reserved from now on for the update, so jobs that cannot finish before the update will not start.
We will keep this information updated as it goes.
12:30 UPDATE The work file system has been unmounted from the cluster; preparing for the firmware update.
18:00 UPDATE The firmware update on the storage system failed for some of the disks; we are working on it.
20:45 UPDATE The firmware update has finished. The work file system is mounted back on the cluster.
20:50 UPDATE The reservation is canceled; all jobs will start to run.
Fimm cluster crashed
Hi,
The fimm cluster crashed around 17:45 due to a file system issue. Since all file systems were unmounted, all running jobs crashed as well. Jobs that were running before the crash have to be resubmitted.
We are investigating the issue and apologize for the inconvenience.
Scheduled maintenance for hexagon, Mon. Nov. 23rd
Hexagon will have scheduled maintenance on Monday, Nov. 23rd, from 13:00 to approx. 19:00. Some software updates and hardware replacements will be made. The queue has a reservation in place so that only jobs that can complete before the maintenance (according to their requested walltime) will start.
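To judge whether a job will be held back by the reservation, the rule above can be sketched as a simple time comparison. This is only an illustration of the rule as stated in this note, not the scheduler's actual logic; the function name `can_start` and the year in the example dates are assumptions.

```python
from datetime import datetime, timedelta


def can_start(now, walltime, maintenance_start):
    """Return True if a job started at `now` with the requested `walltime`
    would finish no later than the start of the maintenance window."""
    return now + walltime <= maintenance_start


# Maintenance starts Monday Nov. 23rd at 13:00 (the year is an assumption).
maintenance_start = datetime(2009, 11, 23, 13, 0)
now = datetime(2009, 11, 23, 9, 0)

print(can_start(now, timedelta(hours=3), maintenance_start))  # True, would end at 12:00
print(can_start(now, timedelta(hours=6), maintenance_start))  # False, would end at 15:00
```

Since the check uses the requested walltime rather than the real runtime, requesting a walltime close to what the job actually needs gives it a better chance of starting before the maintenance.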
This note will be updated when we have more information.
Update: 19:08 Maintenance finished, system is up and open for users.
Hexagon Lustre file system hang, Oct. 18th
Some of the Lustre IO-nodes have hung. We are working on diagnostics.
The cause of the hang was an HSN hardware failure between two nodes.
Update 13:25, Hexagon is now running again after a reboot.
Updated software/libraries on hexagon, Oct. 16th
The following software/libraries have been updated on Hexagon:
* Libsci 10.4.0
Cray Adaptive Fast Fourier Transform (CRAFFT) 2.0
* PETSc 3.0.0.7
Bug fix.
* MPT 3.5.0
MPI I/O Collective buffering enhancement and Bug Fixes.
* FFTW 3.2.2.1
Bug Fix.
* PGI 9.0.4
Bug fixes.
* Intelsup 11.1.056
Module file support for the Intel 11.1.056 compilers.
* netcdf 3.6.2
Re-release of netCDF with a name change to netcdf.
* hdf5-netcdf 1.5
Bug fix.
xt-libsci
The xt-libsci 10.4.0 release contains CRAFFT 2.0.
Cray Adaptive Fast Fourier Transform (CRAFFT) 2.0, packaged
with xt-libsci, adds new functionality to calculate 2d and 3d
double precision, complex-to-complex distributed memory Fourier
transforms. Compared to other parallel FFT libraries, CRAFFT
offers a simpler interface to improve application developer
productivity. In many cases the performance of the CRAFFT 2.0
distributed transforms is better than FFTW2 MPI transforms.
For example, for a 2d FFT with transposed output and power-of-two
sizes, performance can be 10% to 50% better than FFTW2 MPI.
Users requiring more information on usage should see the
intro_crafft manpage.
PETSc
Bug fixed in PETSc 3.0.0.7: 753164 CASK Performance problem
xt-mpt
Features:
SHMEM_SWAP_BACKOFF enabled by default
A backoff algorithm has been in the shmem_swap and shmem_cswap
routines since MPT 3.0.0 but was not enabled by default.
It is now enabled by default with a multiplier value of 100.
This multiplier can be adjusted using the SHMEM_SWAP_BACKOFF
environment variable. The number of shmem_swap and shmem_cswap
calls and the number of backoffs done can be displayed by setting
SHMEM_SWAP_BACKOFF_BACKOFF_STATS to a value greater than 1.
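As a minimal sketch of adjusting the multiplier, the environment variable can be set in the environment passed to the program. The helper name `with_swap_backoff` and the value 200 are assumptions for illustration; only the variable name SHMEM_SWAP_BACKOFF and its default of 100 come from the release note.

```python
import os


def with_swap_backoff(multiplier, base_env=None):
    """Return a copy of the environment with SHMEM_SWAP_BACKOFF set.

    `multiplier` overrides the default value of 100 from the release note.
    `base_env` defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SHMEM_SWAP_BACKOFF"] = str(int(multiplier))
    return env


# Hypothetical example: double the default multiplier.
env = with_swap_backoff(200, base_env={"PATH": "/usr/bin"})
print(env["SHMEM_SWAP_BACKOFF"])  # 200
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the job launcher from a Python driver script. In a shell job script, exporting the variable before the launch line has the same effect.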
MPI I/O Collective buffering enhanced for read
The collective buffering algorithm number 2, which is default,
has been enhanced for reads. This improves read performance in
some cases.
Bugs Fixed:
752391 No libmpich_threadm.a for PrgEnv-gnu
753298 shmem_set_lock fails when -N > 1
753540 MPI-IO related error with xt-mpt/3.3.0 and above
pgi
Features of PGI 9.0.4 are documented at:
http://www.pgroup.com/doc/pgiwsrn904.pdf
The following bugs are fixed in the PGI 9.0.4 release.
730860 SUPPORT FORTRAN 2003 "PROCEDURE" STATEMENT IN PGF90
COMPILER [TPR 3450]
752119 pgf90 -gopt produces symbols that gdb can't process [16040]
752407 PGI internal error when using ipa=fast [16068]
752456 PGI 9 compilation fails with long path to fail [16061]
hdf5 netcdf
Bugs Fixed: 753300 - HDF5 1.8.3.0 missing libraries compared to 1.8.2.3
File system crash on Fimm
The work file system on fimm crashed at 12:30 and was unmounted from most of the compute nodes. Jobs using the work file system have probably all crashed and will have to be resubmitted.
We are investigating the issue and apologize for the inconvenience.
Updated software on hexagon, Sep. 21st
The craypat and apprentice2 software, used for performance profiling of your code, has been updated to a new major release (5.0) on hexagon.
See http://docs.cray.com/books/S-9403-50/S-9403-50.pdf for more information.