To use the checkpointing feature, the application must be compiled with BLCR and Cray MPT version 3.0.1 or later:
module load blcr
With the module loaded, all necessary options are added automatically to the compiler wrapper. Only the MPI and SHMEM programming models are supported.
The job script must contain at least the following directive:
#PBS -c enabled
See man qsub for more parameters.
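As an illustration, a minimal job script carrying this directive could be generated as sketched below. The file name job.pbs, the executable my_app and the aprun arguments are placeholders chosen for this example, not values from this guide; adjust them to the actual job.

```shell
# Write a minimal checkpoint-enabled job script.
# job.pbs, my_app and the aprun arguments are hypothetical placeholders.
cat > job.pbs <<'EOF'
#!/bin/bash
#PBS -c enabled
cd "$PBS_O_WORKDIR"
aprun -n 4 ./my_app
EOF

# Submit the script; qsub prints the job ID used later with qhold and qrls:
#   qsub job.pbs
```

The quoted heredoc delimiter keeps $PBS_O_WORKDIR unexpanded, so it is resolved when the job runs rather than when the script is written.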
To checkpoint and hold the job, run:
qhold JOBID
To release the hold and let the job continue:
qrls JOBID
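The two commands can be wrapped in small helper functions that refuse to run without a job ID. This is only a sketch: the function names ckpt_hold and ckpt_release are made up for this example and are not part of PBS or BLCR.

```shell
# Hypothetical wrappers around qhold and qrls; the names are illustrative only.

# Checkpoint and hold a job. Expects the job ID as the first argument.
ckpt_hold() {
    if [ -z "$1" ]; then
        echo "usage: ckpt_hold JOBID" >&2
        return 2
    fi
    qhold "$1"
}

# Release a held job so that it can continue.
ckpt_release() {
    if [ -z "$1" ]; then
        echo "usage: ckpt_release JOBID" >&2
        return 2
    fi
    qrls "$1"
}

# Usage (JOBID is the ID printed by qsub):
#   ckpt_hold JOBID
#   qstat JOBID          # inspect the job state between the two steps
#   ckpt_release JOBID
```

Checking the job with qstat between the two steps is a reasonable way to confirm that the hold took effect before releasing it.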
The Cray checkpoint/restart solution uses the BLCR (Berkeley Lab Checkpoint/Restart) software from Berkeley Lab and inherits its limitations. For more information, refer to the BLCR documentation: http://upc-bugs.lbl.gov/blcr/doc/html/index.html.