We will have a short scheduled maintenance for hexagon on Monday November 17th at 13:30. Estimated downtime is 1 hour. The purpose is to apply a patch for a memory bug.
Update Monday 17th 12:15: Because the queue was empty and there was an issue with the batch system scheduler, we did the restart early. We are sorry for any inconvenience this may have caused.
Update 12:30, machine is now running again.
Author Archives: lsz075
Hexagon crash on October 21st
A voltage regulator on one of hexagon's modules failed at 16:20, which in turn caused several of the io-nodes responsible for /work to crash.
We are collecting debug information and rebooting; the hardware will be replaced at the next scheduled maintenance.
Update 17:30, hexagon is running again. All jobs must be resubmitted.
Scheduled maintenance on hexagon moved to Nov. 5th.
On October 27th at 14:00, hexagon will be unavailable due to scheduled maintenance. Some faulty hardware will be replaced and some software will be updated. The maintenance will probably take 5 hours.
Update: The scheduled maintenance has been moved to November 5th at 10:00.
Update Nov 5th 10:00: Maintenance has now started.
Update 18:30: We have some problems with the file system /work. We are working on solving this problem.
Update 23:00: Hexagon is now running again.
Hexagon crash on October 14th
Hexagon crashed today at 07:20 due to HSN panic. We are working on getting the system up again.
Update 09:25: Hexagon is now booted. All jobs that were running at the time of the crash have to be resubmitted.
Moving of /migrate and /bcmhsm October 16th
On Thursday Oct. 16th at 09:00, /migrate and /bcmhsm will be mounted read-only as /migrate-old and /bcmhsm-old on both fimm and hexagon.
New versions will be mounted as /migrate and /bcmhsm on fimm and hexagon. These will be writeable, but EMPTY (except for directories). We will then start moving every user's files from the old file system to the new one. This will be done one user at a time. Each user will be informed individually when all of her/his files have been moved to the new file system.
We would appreciate it if users of /migrate and /bcmhsm would clean up their directories before we mount them read-only. This will reduce the amount of data to move and hence speed up the time-consuming moving process.
We will keep /migrate-old and /bcmhsm-old for some time after the move has been completed. We will announce a date for their removal later.
If you have any questions regarding this, please send your request to support-uib@notur.no
Update Oct. 16th 09:30: The old /migrate and /bcmhsm are now mounted read-only under /migrate-old and /bcmhsm-old, while the new empty versions are mounted under /migrate and /bcmhsm. We will now start moving data from the old file systems to the new ones.
Quota set on /var/spool/torque on hexagon
We have now activated quota on /var/spool/torque on hexagon.
The soft limit is 3 GB and the hard limit is 6 GB.
If the output of your job exceeds these limits, the job will stop. In that case, redirect the output of the job to a file.
For example:
aprun -n 1 ./program > /work/$USER/output.txt
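The redirect above only covers standard output. As a minimal sketch, assuming error output is spooled in the same way and should also be kept out of /var/spool/torque, a job script could wrap the launch in a small helper. The function name run_redirected and the directory /work/$USER/myjob are hypothetical and only used for illustration.

```shell
# Hypothetical helper (not part of Torque or the Cray tools): run a command
# with stdout and stderr written to files in a directory of your choice.
run_redirected() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    # stdout and stderr go to separate files under $outdir
    "$@" > "$outdir/output.txt" 2> "$outdir/error.txt"
}

# Intended use inside a job script on hexagon (the path is an example):
#   run_redirected "/work/$USER/myjob" aprun -n 1 ./program
```

Keeping the two streams in separate files makes it easier to see which one is growing if a job approaches the quota.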
New NOTUR cpu-quota
The new NOTUR CPU quotas have been added to hexagon. They will automatically activate on Oct 1st at 00:00. The old quotas will automatically be removed 1 second earlier. The command "cost" should give you the available hours.
New and updated libraries on hexagon
The libsci library has been updated to version 10.3.0 and includes optimizations and new libraries. Users are encouraged to recompile their applications to benefit from the optimizations and bugfixes.
Description of new features in xt-libsci 10.3.0:
CRAFFT (Cray Adaptive FFT) is a new feature in libsci-10.3.0. CRAFFT uses
offline and online testing information to adaptively select the best FFT
algorithm from the available FFT options. CRAFFT provides a very simple
user interface into advanced FFT functionality and performance. Planning
and execution are combined into one call with CRAFFT. The library comes
packaged with pre-computed plans so that in many cases the planning stage
can be omitted. Please see the manual page intro_crafft for more information.
Usage note: for best results with CRAFFT, please copy the file
/opt/xt-libsci/10.3.0/fftw_wisdom into the Lustre directory from which the
executable is run.
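The copy step in the usage note can be sketched as a small shell helper. This is only an illustration: the function name copy_wisdom and the run directory /work/$USER/myrun are hypothetical, and the source path is the one quoted in the note.

```shell
# Hypothetical helper: copy the CRAFFT wisdom file into the run directory.
# The default source path is the one given in the usage note.
copy_wisdom() {
    src=${1:-/opt/xt-libsci/10.3.0/fftw_wisdom}
    dest=${2:-.}
    if [ ! -r "$src" ]; then
        echo "wisdom file not readable: $src" >&2
        return 1
    fi
    cp "$src" "$dest/"
}

# Intended use, from the directory the executable will run in (example path):
#   cd /work/$USER/myrun && copy_wisdom
```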
LibGoto 1.26 includes enhanced BLAS performance. There are several libsci
library variants installed with the libsci-10.3.0 package.
To use threaded BLAS, the thread-enabled libsci library whose name is
suffixed with '_mp' should be linked explicitly
e.g. ftn -o myexec -lsci_quadcore_mp
Dependencies:
=============
Libsci-10.3 now depends on fftw-3.1.1. If you wish to use fftw
version 2.1.5 instead, do the following:
module swap fftw/3.1.1 fftw/2.1.5.1
Scheduled maintenance for hexagon on Aug. 18th
On Monday August 18th at 14:00, hexagon will be unavailable for approx. two hours while a firmware upgrade is installed on the /home file system. This re-flash is necessary due to a failed firmware flash during the last maintenance window.
We apologize for any inconvenience and for the short notice.
Update: The upgrade has been postponed to 16:00.
Update: After another delay, the machine was taken down at 16:50.
Update 18:50: hexagon is now up again, but unavailable for users while we check the system.
Update 19:15: hexagon is now available for users.
Updated software on hexagon
Since the last big software update on June 16th, several libraries and programs have been updated:
MPT (MPI) 3.0.2
pgi 7.2.3
pathscale 3.2
CrayPat 4.3.1
libfast 1.0 (new library with some optimized math functions)
fftw 2.1.5.1
PAPI 3.6
Totalview 8.4.1b
gcc 4.2.4 (only for login-node programs)
xt-asyncpe 1.0c (new compiler wrappers)
xt-binutils-quadcore 2.0.1 (binutils for AMD quadcore)
Moab 5.2.3 scheduler (remember to log out and in again)
Users will need to log out and in again to get the above as default modules.
Because all applications that run on the compute nodes are statically compiled, we encourage re-compiling of applications and libraries, especially if you have experienced problems.