Launch Plugin API

Overview

This document describes the launch plugins, which are responsible for launching parallel tasks in SLURM, and the API that defines them. It is intended as a resource for programmers wishing to write their own launch plugin.

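Like other SLURM plugins, each launch plugin must define identification symbols such as:

const char plugin_name[] = "launch SLURM plugin";

const char plugin_type[] = "launch/slurm";

The minor type (the part after "launch/") names the launch mechanism. The following launch plugins are available: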

  • aprun: Use Cray's aprun command to launch tasks; used on Cray systems with ALPS installed.
  • poe: Use IBM's poe command to launch tasks; used on systems with IBM's Parallel Environment (PE) installed.
  • runjob: Use IBM's runjob command to launch tasks; used on BlueGene/Q systems.
  • slurm: Use SLURM's default launching infrastructure.

The programmer is urged to study src/plugins/launch/slurm/launch_slurm.c for a sample implementation of a SLURM launch plugin.
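For illustration, the identification symbols of a hypothetical launch/example plugin could look like the sketch below. The name "launch example plugin", the minor type "example", and the helper is_launch_plugin_type() are assumptions made for this example; the helper only demonstrates the major/minor naming convention and is not part of SLURM.

```c
#include <string.h>

/* Identification symbols for a hypothetical "launch/example" plugin.
 * "launch" is the major type; "example" is an assumed minor type. */
const char plugin_name[] = "launch example plugin";
const char plugin_type[] = "launch/example";

/* Hypothetical helper: returns 1 if a type string has the "launch/"
 * major type followed by a non-empty minor type, 0 otherwise. */
int is_launch_plugin_type(const char *type)
{
	static const char prefix[] = "launch/";
	size_t len = sizeof(prefix) - 1;	/* length without the NUL */

	if (type == NULL || strncmp(type, prefix, len) != 0)
		return 0;
	return type[len] != '\0';
}
```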

API Functions

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as those described in the dlopen (3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the SLURM init(), and the SLURM fini() is called before the system's _fini().
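A minimal sketch of init() and fini() follows, assuming a hypothetical plugin that keeps its global state in a heap-allocated structure (struct launch_state is invented here). SLURM_SUCCESS and SLURM_ERROR are defined locally as stand-ins for the constants provided by SLURM's own headers, so the sketch compiles on its own.

```c
#include <stdlib.h>

/* Stand-ins for the return codes defined in SLURM's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical plugin-global state. */
struct launch_state {
	int steps_launched;
};

static struct launch_state *state = NULL;

int init(void)
{
	if (state != NULL)
		return SLURM_SUCCESS;	/* already initialized */
	state = calloc(1, sizeof(*state));	/* zeroed storage */
	return (state != NULL) ? SLURM_SUCCESS : SLURM_ERROR;
}

void fini(void)
{
	free(state);	/* free(NULL) is a harmless no-op */
	state = NULL;
}
```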

int launch_p_setup_srun_opt(char **rest)

Description:
Sets up srun's options for use with this launch plugin.

Arguments:
rest: extra parameters on the command line not processed by srun

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
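The sketch below assumes, hypothetically, that rest is either NULL or a NULL-terminated array of strings, and simply records the unprocessed arguments in plugin-global variables (extra_argv and extra_argc, both invented here) so that a later call could pass them to an underlying launcher. SLURM's own plugins may treat rest differently.

```c
#include <stddef.h>

#define SLURM_SUCCESS 0		/* stand-in for SLURM's constant */

/* Hypothetical plugin-global copy of the unprocessed arguments. */
static char **extra_argv = NULL;
static int extra_argc = 0;

/* Assumes rest is NULL or a NULL-terminated array of strings. */
int launch_p_setup_srun_opt(char **rest)
{
	extra_argc = 0;
	extra_argv = rest;
	if (rest == NULL)
		return SLURM_SUCCESS;	/* nothing extra on the command line */
	while (rest[extra_argc] != NULL)
		extra_argc++;		/* count the extra parameters */
	return SLURM_SUCCESS;
}
```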

int launch_p_handle_multi_prog_verify(int command_pos)

Description:
Called to verify a multi-prog configuration file, if verification is needed.

Arguments:
command_pos: the index of the command within opt.argv, where opt is srun's global options variable.

Returns:
1 if handled, or
0 if not.
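The following sketch shows the return convention. Because srun's real global opt structure is not reproduced here, it uses a simplified hypothetical stand-in with only a multi_prog flag, argc and argv. Since the return value only says whether the call was handled, this sketch reports a missing configuration file by printing an error and exiting; that choice, and the multi_prog_file variable, are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for srun's global options. */
struct demo_opt {
	bool multi_prog;	/* was a multi-prog file requested? */
	int argc;
	char **argv;
};
static struct demo_opt opt;

/* Hypothetical: the configuration file accepted by the last call. */
static const char *multi_prog_file = NULL;

int launch_p_handle_multi_prog_verify(int command_pos)
{
	if (!opt.multi_prog)
		return 0;	/* not handled: nothing to verify */

	if (command_pos < 0 || command_pos >= opt.argc ||
	    opt.argv[command_pos] == NULL) {
		fprintf(stderr, "multi-prog configuration file not specified\n");
		exit(1);
	}
	/* A real plugin would parse and validate the file here. */
	multi_prog_file = opt.argv[command_pos];
	return 1;	/* handled */
}
```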

int launch_p_create_job_step(srun_job_t *job, bool use_all_cpus, void (*signal_function)(int), sig_atomic_t *destroy_job)

Description:
Creates the job step.

Arguments:
job: the job to run.
use_all_cpus: whether the step should use all CPUs.
signal_function: the handler function for incoming signals.
destroy_job: pointer to a global flag signifying if the job was canceled while allocating.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
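The sketch below illustrates the role of each argument using a hypothetical stand-in for srun_job_t (the real structure is defined by srun and has different fields). It refuses to continue if destroy_job shows the job was canceled during allocation, and installs signal_function for SIGINT and SIGTERM; the choice of those two signals is an assumption. Where a real plugin would ask SLURM to create the step, the sketch only sets a flag.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

#define SLURM_SUCCESS 0		/* stand-ins for SLURM's constants */
#define SLURM_ERROR   (-1)

/* Hypothetical stand-in for srun's job structure. */
typedef struct srun_job {
	bool all_cpus;
	bool step_created;
} srun_job_t;

int launch_p_create_job_step(srun_job_t *job, bool use_all_cpus,
			     void (*signal_function)(int),
			     sig_atomic_t *destroy_job)
{
	if (job == NULL || destroy_job == NULL)
		return SLURM_ERROR;
	if (*destroy_job)
		return SLURM_ERROR;	/* canceled while allocating */

	if (signal_function != NULL) {
		signal(SIGINT, signal_function);
		signal(SIGTERM, signal_function);
	}
	job->all_cpus = use_all_cpus;
	/* A real plugin would request the job step from SLURM here. */
	job->step_created = true;
	return SLURM_SUCCESS;
}
```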

int launch_p_step_launch(srun_job_t *job, slurm_step_io_fds_t *cio_fds, uint32_t *global_rc)

Description:
Launches the job step.

Arguments:
job: the job to launch.
cio_fds: the already filled-in I/O descriptors (stdin, stdout, stderr) for the step.
global_rc: pointer to srun's global return code.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
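As a sketch of a plugin that hands work off to an external command (as the aprun, poe and runjob plugins do with their respective commands), the example below forks a child that applies the supplied stdio descriptors and execs the command. The srun_job_t and slurm_step_io_fds_t definitions are simplified hypothetical stand-ins (the real types have more fields), and setting *global_rc to 1 when fork() fails is an assumed convention.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define SLURM_SUCCESS 0		/* stand-ins for SLURM's constants */
#define SLURM_ERROR   (-1)

/* Hypothetical stand-ins for srun's types. */
typedef struct srun_job {
	char **argv;	/* command to run, NULL-terminated */
	pid_t pid;	/* launcher process, set on success */
} srun_job_t;

typedef struct {
	struct { int fd; } in, out, err;	/* -1 means "inherit" */
} slurm_step_io_fds_t;

int launch_p_step_launch(srun_job_t *job, slurm_step_io_fds_t *cio_fds,
			 uint32_t *global_rc)
{
	pid_t pid;

	if (job == NULL || job->argv == NULL || job->argv[0] == NULL)
		return SLURM_ERROR;

	pid = fork();
	if (pid < 0) {
		if (global_rc != NULL)
			*global_rc = 1;	/* assumed: nonzero marks failure */
		return SLURM_ERROR;
	}
	if (pid == 0) {
		/* Child: redirect stdio, then replace ourselves. */
		if (cio_fds != NULL) {
			if (cio_fds->in.fd >= 0)
				dup2(cio_fds->in.fd, STDIN_FILENO);
			if (cio_fds->out.fd >= 0)
				dup2(cio_fds->out.fd, STDOUT_FILENO);
			if (cio_fds->err.fd >= 0)
				dup2(cio_fds->err.fd, STDERR_FILENO);
		}
		execvp(job->argv[0], job->argv);
		_exit(127);	/* only reached if exec failed */
	}
	job->pid = pid;	/* parent: remember the launcher process */
	return SLURM_SUCCESS;
}
```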

int launch_p_step_wait(srun_job_t *job, bool got_alloc)

Description:
Waits for the job step to finish.

Arguments:
job: the job to wait for.
got_alloc: true if the resource allocation was created by this srun invocation.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
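Assuming a hypothetical design in which the step runs as a single child process of srun, step_wait could reap that process and record its exit code. The stand-in srun_job_t with its pid and exit_code fields, and the 128 + signal number encoding (a common shell convention), are assumptions of this sketch; got_alloc is ignored here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SLURM_SUCCESS 0		/* stand-ins for SLURM's constants */
#define SLURM_ERROR   (-1)

/* Hypothetical stand-in for srun's job structure. */
typedef struct srun_job {
	pid_t pid;	/* launcher process to wait for */
	int exit_code;	/* filled in by launch_p_step_wait() */
} srun_job_t;

int launch_p_step_wait(srun_job_t *job, bool got_alloc)
{
	int status;

	(void) got_alloc;	/* unused in this sketch */
	if (job == NULL || job->pid <= 0)
		return SLURM_ERROR;

	/* Retry if a signal interrupts the wait. */
	while (waitpid(job->pid, &status, 0) < 0) {
		if (errno != EINTR)
			return SLURM_ERROR;
	}
	if (WIFEXITED(status))
		job->exit_code = WEXITSTATUS(status);
	else if (WIFSIGNALED(status))
		job->exit_code = 128 + WTERMSIG(status);
	job->pid = 0;	/* the process has been reaped */
	return SLURM_SUCCESS;
}
```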

int launch_p_step_terminate(void)

Description:
Terminates the job step.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
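Because launch_p_step_terminate() takes no arguments, a plugin has to remember which step it launched. The sketch below assumes a hypothetical plugin-global launch_pid holding the launcher's process ID; it sends SIGKILL and then reaps the process so no zombie remains. Using SIGKILL rather than a gentler signal is a design choice of this sketch.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SLURM_SUCCESS 0		/* stand-ins for SLURM's constants */
#define SLURM_ERROR   (-1)

static pid_t launch_pid = 0;	/* hypothetical plugin-global */

int launch_p_step_terminate(void)
{
	if (launch_pid <= 0)
		return SLURM_SUCCESS;	/* no step running: nothing to do */

	/* ESRCH means the process is already gone, which is fine. */
	if (kill(launch_pid, SIGKILL) < 0 && errno != ESRCH)
		return SLURM_ERROR;

	/* Reap the child; retry only if interrupted by a signal. */
	while (waitpid(launch_pid, NULL, 0) < 0 && errno == EINTR)
		;
	launch_pid = 0;
	return SLURM_SUCCESS;
}
```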

void launch_p_print_status(void)

Description:
Reports (prints) the status of the job step.
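A possible shape for launch_p_print_status() is shown below, assuming a hypothetical plugin-global launch_pid for the launcher process. The helper query_step_state() and the status strings are invented for illustration; kill() with signal 0 only checks whether the process exists (a zombie still counts as existing).

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t launch_pid = 0;	/* hypothetical plugin-global */

enum step_state { STEP_NONE, STEP_RUNNING, STEP_GONE };

/* Hypothetical helper: signal 0 performs error checking only. */
static enum step_state query_step_state(void)
{
	if (launch_pid <= 0)
		return STEP_NONE;
	/* EPERM: the process exists but we may not signal it. */
	if (kill(launch_pid, 0) == 0 || errno == EPERM)
		return STEP_RUNNING;
	return STEP_GONE;
}

void launch_p_print_status(void)
{
	static const char *names[] = { "no step", "running", "finished" };

	fprintf(stderr, "launch/example: step %ld: %s\n",
		(long) launch_pid, names[query_step_state()]);
}
```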

void launch_p_fwd_signal(int signal)

Description:
Forwards a signal to all underlying tasks.

Arguments:
signal: the signal that needs to be sent.
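Assuming a hypothetical plugin-global launch_pid for the launcher process, forwarding a signal can be as simple as a kill() call. This sketch signals only that one process; a plugin whose tasks run in several processes would need to reach each of them, for example through the external launcher or a process group.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t launch_pid = 0;	/* hypothetical plugin-global */

void launch_p_fwd_signal(int signal)
{
	/* Guard matters: kill(0, sig) would signal our own process group. */
	if (launch_pid <= 0)
		return;
	if (kill(launch_pid, signal) < 0 && errno != ESRCH)
		fprintf(stderr, "launch/example: cannot forward signal %d: %s\n",
			signal, strerror(errno));
}
```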

Last modified 8 May 2014