Hexagon crashed today at 07:20 due to HSN panic. We are working on getting the system up again.
Update 09:25: Hexagon is now booted. All jobs running at the time of the crash have to be resubmitted.
Moving of /migrate and /bcmhsm October 16th
On Thursday Oct. 16th at 09:00, /migrate and /bcmhsm will be mounted read-only as /migrate-old and /bcmhsm-old on both fimm and hexagon.
New versions will be mounted as /migrate and /bcmhsm on fimm and hexagon. They will be writable but EMPTY (except for directories). We will then start moving every user's files from the old file system to the new one. This will be done one user at a time. Each user will be informed individually when all her/his files have been moved to the new file system.
We would appreciate it if users of /migrate and /bcmhsm would clean up their directories before we mount them read-only. This will reduce the amount of data to move and hence speed up the time-consuming moving process.
We will keep /migrate-old and /bcmhsm-old for some time after the move has been completed. We will announce a date for their removal later.
If you have any questions regarding this, please send your request to support-uib@notur.no
Update Oct. 16th 09:30: The old /migrate and /bcmhsm are now mounted read-only under /migrate-old and /bcmhsm-old, while the new empty versions are mounted under /migrate and /bcmhsm. We will now start moving data from the old file systems to the new ones.
Quota set on /var/spool/torque on hexagon
We have now activated quota on /var/spool/torque on hexagon.
The soft limit is 3 GB and the hard limit is 6 GB.
If the output of your job exceeds these limits, the job will stop. In that case you must redirect the output of the job to a file.
E.g.:
aprun -n 1 ./program > /work/$USER/output.txt
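As a minimal sketch of the redirection above, standard output and standard error can be sent to separate files outside the spool area. The run_program function is a hypothetical stand-in for the real launch line (aprun -n 1 ./program in the notice), and OUTDIR is an assumed variable that would point to /work/$USER on hexagon.

```shell
# Hypothetical stand-in for the real program; on hexagon this would be
# the launch line from the notice: aprun -n 1 ./program
run_program() {
    echo "result line"
    echo "warning line" >&2
}

# Assumed output directory; on hexagon this would be /work/$USER.
OUTDIR="${OUTDIR:-${TMPDIR:-/tmp}/quota-demo}"
mkdir -p "$OUTDIR"

# Send stdout and stderr to files so that neither stream is collected
# in the batch system's spool directory.
run_program > "$OUTDIR/output.txt" 2> "$OUTDIR/error.txt"
```

To keep both streams in one file instead, "> file 2>&1" can be used in place of the two separate redirections.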
New NOTUR cpu-quota
The new NOTUR cpu-quotas have been added to hexagon. They will be activated automatically on Oct. 1st at 00:00. The old quotas will be removed automatically one second earlier. The command "cost" should show your available hours.
New and updated libraries on hexagon
The libsci library has been updated to version 10.3.0 and includes optimizations and new libraries. Users are encouraged to recompile their applications to benefit from the optimizations and bug fixes.
Description of new features in xt-libsci 10.3.0:
CRAFFT (Cray Adaptive FFT) is a new feature in libsci-10.3.0. CRAFFT uses
offline and online testing information to adaptively select the best FFT
algorithm from the available FFT options. CRAFFT provides a very simple
user interface into advanced FFT functionality and performance. Planning
and execution are combined into one call with CRAFFT. The library comes
packaged with pre-computed plans so that in many cases the planning stage
can be omitted. Please see the manual page intro_crafft for more information.
Usage note: for optimal usage of CRAFFT, please copy the file
/opt/xt-libsci/10.3.0/fftw_wisdom into the Lustre directory from which the
executable is run.
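The copy step in the usage note could be sketched as follows. The source path is the one given above; RUN_DIR is a hypothetical variable standing for the Lustre directory the executable is launched from, and the existence check is only there so the sketch does nothing harmful on a machine without libsci installed.

```shell
# Path taken from the usage note above.
WISDOM=/opt/xt-libsci/10.3.0/fftw_wisdom
# Hypothetical run directory; replace with the Lustre directory you launch from.
RUN_DIR="${RUN_DIR:-$PWD}"

if [ -r "$WISDOM" ]; then
    # Place the pre-computed plans next to where the executable is run.
    cp "$WISDOM" "$RUN_DIR/"
    echo "copied fftw_wisdom to $RUN_DIR"
else
    echo "fftw_wisdom not found at $WISDOM; skipping copy" >&2
fi
```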
LibGoto 1.26 includes enhanced BLAS performance. There are several libsci
library variants installed with the libsci-10.3.0 package.
To use threaded BLAS, the thread-enabled libsci library whose name is
suffixed with '_mp' should be linked explicitly
e.g. ftn -o myexec -lsci_quadcore_mp
Dependencies:
=============
Libsci-10.3 and fftw-3.1.1 are now dependent. If you wish to use fftw
version 2.1.5 then do the following
module swap fftw/3.1.1 fftw/2.1.5.1
Scheduled maintenance for hexagon on Aug. 18th
On Monday August 18th at 14:00, Hexagon will be unavailable for approx. two hours while a firmware upgrade for the /home file system is installed. This re-flash is necessary because a firmware flash failed during the last maintenance window.
We apologize for any inconvenience and for the short notice.
Update: The upgrade has been postponed to 16:00.
Update: After another delay, the machine was taken down at 16:50.
Update, 18:50: hexagon is now up again, but unavailable for users while we check the system.
Update, 19:15: hexagon is now available for users.
Updated software on hexagon
Since the last big software update on June 16th, several libraries and programs have been updated:
MPT (MPI) 3.0.2
pgi 7.2.3
pathscale 3.2
CrayPat 4.3.1
libfast 1.0 (new library with some optimized math functions)
fftw 2.1.5.1
PAPI 3.6
Totalview 8.4.1b
gcc 4.2.4 (only for login-node programs)
xt-asyncpe 1.0c (new compiler wrappers)
xt-binutils-quadcore 2.0.1 (binutils for AMD quadcore)
Moab 5.2.3 scheduler (remember to log out and in again)
Users will need to log out and in again to get the above as default modules.
Because all applications that run on the compute nodes are statically compiled, we encourage re-compiling of applications and libraries, especially if you have experienced problems.
Scheduled maintenance for hexagon on Aug. 11th
Hexagon is scheduled for maintenance on Monday, August 11th at 14:00. The currently failed nodes and a module will be replaced. The machine will be unavailable for approximately two hours.
Update: August 11th 14:05, hexagon is shut down for maintenance.
Update: 15:20, the hardware part is finished; firmware update, diagnostics and checking are starting.
Update: 17:35, hexagon is now up and running. Note that because time is reserved for benchmarking (the final part of the acceptance test), it will take some hours before jobs start (but the queue will accept new jobs).
fimm file system crash
The global file systems on fimm crashed today at 14:45. We are working on solving the problem.
Update, 16:40: File systems are now up again. All jobs running at the time of the crash have to be resubmitted.
Batch-system problem on hexagon
The batch system on hexagon has some problems. We are investigating.
Update, 11:50: hexagon has problems with nodes mistakenly shown as down.
Update, 12:20: hexagon is now OK again.