We had to reboot login3 because of an out-of-memory (OOM) condition. All jobs executed via login3 have been terminated.
Author Archives: lsz075
Hexagon: Updated software/libraries
The following software and libraries have been updated on hexagon:
MPI
xt-mpt 4.1.1 -> 5.0.0
MPI 2.2 compliance, except Dynamic Process Management
Compilers
PGI 10.4 -> 10.5
GCC 4.4.3 -> 4.4.4
xt-asyncpe 3.9 -> 4.0
java (security update) jdk1.6.0_20
Math/libs
xt-libsci 10.4.4 -> 10.4.5
LibSci 10.4.5 includes minor increases in functionality/support
of distributed CRAFFT routines. The enhancement improves the
coverage of 2d and 3d FFT routines by allowing real type work
arrays and input arguments for all transform types.
CRAFFT offers a simpler interface to improve application developer
productivity. In some cases the distributed CRAFFT 2.1 transforms
exhibit up to 10% speedup over comparable FFTW2 distributed
transforms.
Users requiring more information on usage should see the
intro_crafft manpage.
PETSc 3.0.0.10 -> 3.1.00
This new version of PETSc includes several changes including
performance enhancements of the sparse kernels used in the
incomplete LU preconditioning for AIJ and BAIJ matrix formats.
In Cray PETSc, these new kernels are further improved through
the new routines from Cray Adaptive Sparse Kernels (CASK).
In addition, the latest SuperLU-4.0 is included in this new
PETSc product.
More detailed information about the official PETSc-3.1 release is
available at
http://www.mcs.anl.gov/petsc/petsc-as/documentation/changes/31.html
SuperLU-4.0 information can be found at
http://crd.lbl.gov/~xiaoye/SuperLU/#superlu
Fimm: login node downtime, June 26th
On Saturday, June 26th, the login node on Fimm will be down from approximately 10:00 to 11:00.
Please log out before this time window.
Running jobs will not be affected.
Reason: the GPFS version on the Fimm login node will be updated.
Fimm: login node crash
The Fimm login node crashed because of a GPFS filesystem hang. The login node is up and running again. All open sessions have been killed.
Hexagon: Updated software/libraries
The following libraries have been updated on Hexagon:
* MPT 4.1.1
Bug fixes.
* xt-asyncpe 3.9
Bug fixes.
* Cray Scientific and Math Libraries 4.13
LibSci 10.4.4
CRAFFT update
Trilinos 10.2.0
CASK Update
* PGI 10.4
Bug fix release update from PGI.
* Cray Debugger Supporting Tools 1.0.2
ATP 1.0.2
Bug fixes
* TotalView 8.8.0
Replay Engine Feature release.
Fimm: bjerknode/Compute-1-32 is down
Due to excessive memory usage, compute-1-32 crashed last night. We are trying to reinstall the node but are currently facing a hardware failure.
We are still working on it and hope that compute-1-32 will be up tomorrow. We will keep you updated.
We apologize for the inconvenience.
UPDATE 26th May 11:47: compute-1-32 is up and running.
Fimm: file server crashed
One of the file servers serving the work and home file systems on the Fimm cluster crashed at 14:10. All jobs using the work file system crashed with "Stale NFS file handle" errors.
The file server has been rebooted, and the home and work file systems have been remounted on all compute nodes.
Hexagon: ncl with aprun support
Hexagon now has an NCL version that can run with aprun. The latest module version, 5.2.0, is aprun compatible and is loaded by default when you run module load ncl_ncarg.
If you miss some features and want to run ncl on the login node, use module load ncl_ncarg/5.2.0-login instead.
Hexagon: down, HSN link problem
Due to a failure in an HSN link, Hexagon is down. We are working on the problem.
Update: 20:50 Machine is back online
Hexagon: scheduled maintenance, Mon. May 10
Hexagon will have scheduled maintenance on Thursday, May 6th, from 12:00.
This is to fix a problem with cabinet 7.
The queue has a reservation in place so that only jobs that can complete before the maintenance (according to their requested walltime) will start.
This note will be updated when we have more information.
Update: Maintenance has been moved to Monday May 10th, from 12:00
Update: 10.05, 18:20 Maintenance finished, machine is back online.
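To check whether a job still fits before a maintenance reservation, the rule above can be sketched roughly as follows. This is only an illustration, not the scheduler's actual logic: the function name, the sample times, the year, and the choice to allow a job that ends exactly at the maintenance start are all assumptions.

```python
from datetime import datetime, timedelta


def can_start_before_maintenance(now, walltime, maintenance_start):
    """Return True if a job started at `now` with the requested `walltime`
    would finish no later than `maintenance_start`.

    Hypothetical helper for illustration; the real scheduler may treat
    the boundary case differently.
    """
    return now + walltime <= maintenance_start


# Maintenance start taken from the notice (Monday May 10th, 12:00).
# The year 2010 and the current time below are assumptions.
maintenance_start = datetime(2010, 5, 10, 12, 0)
now = datetime(2010, 5, 10, 8, 0)

for hours in (3, 6):
    fits = can_start_before_maintenance(now, timedelta(hours=hours), maintenance_start)
    print(f"walltime {hours}h: {'can start' if fits else 'held until after maintenance'}")
```

In practice this means that requesting a shorter, realistic walltime gives a job a better chance of starting before the maintenance window.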