Hexagon will have scheduled maintenance starting on Tuesday, September 28th at 12:00. The initial estimate of downtime is 6 hours. This note will be updated when we have more information.
We will upgrade to the latest Cray software release and replace some hardware.
Note that a reservation is set in the queue system. A job will only be allowed to start if its requested walltime lets it finish before the maintenance begins.
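As a rough illustration of the rule above, the check amounts to comparing a job's projected end time with the start of the reservation. This is only a sketch, not the queue system's actual logic; the function name fits_before_maintenance and the example dates (including the year) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def fits_before_maintenance(now, walltime, maintenance_start):
    """Return True if a job started at `now` with the requested
    `walltime` would finish no later than `maintenance_start`."""
    return now + walltime <= maintenance_start


# Hypothetical values: a reservation starting at 12:00 and a job
# submitted at 08:00 the same day (the year is an assumption).
maintenance = datetime(2010, 9, 28, 12, 0)
now = datetime(2010, 9, 28, 8, 0)

print(fits_before_maintenance(now, timedelta(hours=3), maintenance))  # True
print(fits_before_maintenance(now, timedelta(hours=6), maintenance))  # False
```

In practice this means requesting a shorter walltime is the way to get a job started while the reservation is pending.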
Update: A new hardware bug has been discovered; the maintenance is preliminarily moved to September 28th at 12:00.
Update: Due to an important security bug, we have to do the maintenance as soon as possible, on Sept 17 at 10:30. The actual shutdown of the machine was at 11:00.
Update: 16:00 Maintenance has been performed. The machine is available for users.
Author Archives: lsz075
Hexagon: power spike causes power off
There was a power spike / drop in the building causing hexagon to power off.
We are looking into it.
Update 16:30: The power spike also caused cooling issues. This means that we have to keep the machine off until the cooling can be fixed.
Update 21:00: The second cooling machine has been restarted and the machine is now running.
Maintenance stop of /migrate and /bcmhsm tape-filesystems
There will be a maintenance stop to install new hardware in the tape-robot.
This will make /migrate, /bcmhsm and restore of backup files unavailable from Monday 26th 12:00 until around 15:00.
Any questions can be sent to support-uib@notur.no.
Update, 15:30: Restore and recall of files are now available again.
Hexagon: 10Gb network connection
Hexagon now has two login nodes with 10Gb network connections. We hope this upgrade will speed up transfers to and from hexagon.
These two nodes are dedicated to file transfers only. Please do not use
them for compiling or running applications.
We remind you that hexagon has HPN-enabled OpenSSH. To get the best
possible transfer speeds, please consider reading:
Data high performance tools on hexagon
Address to use: hexagon-ftp.bccs.uib.no
Hexagon: Updated software/libraries
The following software has been updated on hexagon:
xt-mpt MPI
5.0.0 -> 5.0.1: Bug fixes
xt-libsci math lib
10.4.5 -> 10.4.6:
Bug fixes.
LibSci 10.4.6 includes new CRAFFT routines to compute
real-to-complex/complex-to-real distributed 3d FFTs of any size.
These routines are crafft_pd2z3d and crafft_pz2d3d. Also,
two routines, crafft_total_size_2d_r2c and crafft_total_size_3d_r2c,
were added to assist users in calculating the local size of
the distributed data on each process.
Users requiring more information on usage should see the
intro_crafft manpage.
xt-asyncpe compiler wrappers
Differences:
Both the content and the logic of the INFO messages generated by the driver script have changed.
INFO messages are handled as additions to "verbose" output and do not display
unless
1) the user specifies "-v",
2) the user specifies "-V" or
3) the user sets XTPE_INFO_MESSAGE_ON to some value.
Otherwise, beginning with xt-asyncpe 4.1.7
(release 4.1), the INFO messages are not displayed by default.
The old environment variable, XTPE_INFO_MESSAGE_OFF, which was
used to turn off INFO messages is deprecated at xt-asyncpe/4.1. A new
environment variable, XTPE_INFO_MESSAGE_ON, can be set to "something"
to make INFO messages display by default.
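The display rule described above can be modelled in a few lines. This is only a sketch of the stated conditions, not the driver script's own code; the function name info_messages_shown and the reading of "set to some value" as "present and non-empty" are assumptions.

```python
def info_messages_shown(args, env):
    """Model of the three conditions listed above: INFO messages appear
    only with -v, with -V, or when XTPE_INFO_MESSAGE_ON is set."""
    if "-v" in args or "-V" in args:
        return True
    # Assumption: "set to some value" means present and non-empty.
    return bool(env.get("XTPE_INFO_MESSAGE_ON"))


print(info_messages_shown(["-O2", "prog.c"], {}))                      # False
print(info_messages_shown(["-v", "prog.c"], {}))                       # True
print(info_messages_shown(["prog.c"], {"XTPE_INFO_MESSAGE_ON": "1"}))  # True
```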
Modules
3.1.6.5 -> 3.1.6.6: Bug fixes
PGI compiler
10.5.0 -> 10.6.0: Bug fixes
Totalview debugger
8.8.0 -> 8.8.0a: Activate Replay Engine
Chapel
1.0.2 -> 1.1.1
See /opt/chapel/1.1.1/CHANGES for more information
xt-craypat Performance Tools
5.0.1 -> 5.1.0
* PAPI has been updated to 3.7.2.0.5
* Beginning with the 5.1 release, CrayPat includes license check
support through the FLEXnet license server. Sites installing the 5.1
performance tools software will need to obtain and install a license key
before use.
* New imbalance calculation in Call Tree (imbalance for all functions
except for those represented by MPI collective sync time is calculated
as MAX-AVE, sync time is calculated as AVE - MIN)
* Support for the following predefined trace groups has been added:
aio (functions that perform asynchronous IO)
adios (Adaptable I/O System API)
armci (Aggregate Remote Memory Copy)
chapel (Chapel language compile and runtime library API)
dmapp (Distributed Memory Application API for Gemini)
ga (Global Array API)
pblas (Parallel Basic Linear Algebra Subroutines)
petsc (Portable Extensible Toolkit for Scientific Computation)
pgas (Parallel Global Address Space)
realtime (POSIX realtime extensions)
* Support for dynamically linked applications. Dynamically linked programs
can be instrumented and use all of the experiments and features that are
supported for statically linked programs.
* Back button in Cray Apprentice2 Call Tree display - accessed by right
clicking in the display background, allows the user to revert to previous
displays after filtering tree.
* Path to maximum load imbalance or "hot path" now highlighted in Cray
Apprentice2 Call Tree display
* Performance improvement to PE sort in Cray Apprentice2 Load Balance
display (off of Overview)
* Program wallclock time added in Cray Apprentice2 caliper area for
files containing RTS data
* Faster load of initial data into Cray Apprentice2
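The new Call Tree imbalance calculation mentioned in the CrayPat notes above (MAX-AVE for ordinary functions, AVE-MIN for MPI collective sync time) can be illustrated with a small sketch. This is not CrayPat code; the function name imbalance and the sample per-process times are made up for illustration.

```python
def imbalance(times, is_sync=False):
    """Compute imbalance from per-process times for one function.

    Ordinary functions: MAX - AVE.
    MPI collective sync time: AVE - MIN.
    """
    avg = sum(times) / len(times)
    if is_sync:
        return avg - min(times)
    return max(times) - avg


# Hypothetical per-process times in seconds; the average is 4.0.
per_pe = [1.0, 2.0, 3.0, 10.0]

print(imbalance(per_pe))                # 6.0
print(imbalance(per_pe, is_sync=True))  # 3.0
```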
Problems with /bcmhsm and /migrate
We are experiencing problems with the tape robot. Until they are resolved, the /bcmhsm and /migrate filesystems will be unavailable.
Update: the filesystems are available now. /bcmhsm is almost full; please allow it to drain to the tape robot until it is about 50% full before copying more data to it.
Hexagon: failed seastar in one module
A Seastar failed in one module on hexagon and the machine went down. We are working on the problem.
Update: 08:45 The machine is up.
Update: 10:15 There was a problem with ALPS after the reboot, so jobs could not start. It is now fixed.
Fimm: job submission will be suspended on July 2nd at 14:00 for 30 minutes
The Fimm master node will undergo maintenance on July 2nd at 14:00 for 30 minutes.
During that time slot the job submission system will be temporarily suspended, and users will not be able to submit new jobs.
Hexagon: another power spike
There was a new power spike on hexagon and 8 cabinets went down. We are working to bring the machine up ASAP.
Update: 15:33 Machine is up
Hexagon: failure on 4 cabinets
Hexagon had a failure in 4 cabinets and went down. We are working to start the machine ASAP.
The likely cause is a power spike or power outage.
Update: 10:09 Hexagon is up.