Test Slurm Script

The test-slurm script deploys a single Slurm execution engine and then gives it a job to complete.

Main slurm job submission

The script is designed to proceed as follows:

  • Claims processing block

  • Sets values for parameters from processing block

  • Sets processing block status to 'WAITING'

  • Waits for resources_available to be True

    • This is the signal from the processing controller that the script can start

  • Sets processing block status to 'RUNNING'

  • Deploys a single Slurm execution engine with the parameters script (the default is a simple Slurm script which prints ‘Hello Slurm’), tasks, nodes, partition and current_working_directory

  • Waits for the processing block status to be 'FINISHED'
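The steps above can be sketched as a small state flow. This is only an illustration under stated assumptions: FakeProcessingBlock, run_test_slurm and fake_deploy are hypothetical names invented for this example, not part of ska-sdp-scripting, and the real script talks to the configuration database rather than to an in-memory object.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeProcessingBlock:
    """Hypothetical in-memory stand-in for a claimed processing block."""

    parameters: dict = field(default_factory=dict)
    status: str = ""  # placeholder; the real initial state is not modelled here
    resources_available: bool = False
    history: list = field(default_factory=list)

    def set_status(self, status: str) -> None:
        self.status = status
        self.history.append(status)


def run_test_slurm(pb, deploy, poll_interval=0.0):
    """Walk through the listed steps against the stand-in processing block."""
    # Set parameter values from the processing block, falling back to defaults.
    params = {"tasks": 1, "nodes": 1, "partition": None, **pb.parameters}

    pb.set_status("WAITING")
    # Wait for the processing controller's signal that the script can start.
    while not pb.resources_available:
        time.sleep(poll_interval)

    pb.set_status("RUNNING")
    # Deploy a single Slurm execution engine; `deploy` is a placeholder callable.
    deploy(pb, params)

    # Wait for the processing block status to become FINISHED.
    while pb.status != "FINISHED":
        time.sleep(poll_interval)
    return params


def fake_deploy(pb, params):
    # Assumption for this sketch only: the job completes immediately.
    pb.set_status("FINISHED")


pb = FakeProcessingBlock(parameters={"tasks": 2}, resources_available=True)
params = run_test_slurm(pb, fake_deploy)
print(pb.history, params)
```

In this sketch the fake deployment marks the block as finished itself; how the status actually reaches 'FINISHED' in a real deployment is outside what the example models.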

A full description of the processing block parameters of this script can be found at the bottom of this page.

(Optional) Uploading and downloading files

Optionally, the script also supports transferring files between an S3 bucket on AWS and the PVC where the script is deployed. Currently, only an example of uploading data is provided. You can enable it by setting the copy_data parameter to true.
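As a rough illustration, the parameter overrides for enabling the upload might look like the following. The keys are taken from the parameter schema on this page; the local_path value is a made-up placeholder, and how the parameters are attached to a processing block is not shown.

```python
import json

# Parameter overrides that enable the upload step.
upload_parameters = {
    "copy_data": True,
    # Hypothetical path, relative to the mount directory on the PVC.
    "local_path": "product/example-output",
    # The documented default bucket, repeated here for clarity.
    "s3_bucket": "skao-sdp-testdata",
}

# Serialise to JSON, the form in which processing block parameters are written.
print(json.dumps(upload_parameters, indent=2))
```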

Processing block parameters

pydantic settings TestSlurmParams

test-slurm script parameters

JSON schema:
{
   "title": "test-slurm",
   "description": "test-slurm script parameters",
   "type": "object",
   "properties": {
      "tasks": {
         "default": 1,
         "description": "Number of slurm tasks",
         "title": "Slurm tasks",
         "type": "integer"
      },
      "nodes": {
         "default": 1,
         "description": "Number of slurm nodes",
         "title": "Slurm nodes",
         "type": "integer"
      },
      "slurm_script": {
         "default": "#!/bin/bash echo 'Hello Slurm'",
         "description": "Script to run in slurm job",
         "title": "Slurm job script",
         "type": "string"
      },
      "partition": {
         "default": null,
         "description": "HPC cluster partition",
         "title": "HPC cluster partition",
         "type": "string"
      },
      "slurm_dir_prefix": {
         "default": "/shared/fsx1/shared",
         "description": "Prefix of the slurm working directory on the HPC cluster",
         "title": "Slurm working directory prefix",
         "type": "string"
      },
      "copy_data": {
         "default": false,
         "description": "Whether data should be copied to S3 from a k8s PVC",
         "title": "Migrate data to S3",
         "type": "boolean"
      },
      "local_path": {
         "default": null,
         "description": "The path to data to be copied to S3, relative to the mount directory on the PVC",
         "title": "Path on the PVC to data to be copied",
         "type": "string"
      },
      "s3_bucket": {
         "default": "skao-sdp-testdata",
         "description": "The name of the S3 bucket where the data are copied to from the PVC local_path",
         "title": "S3 bucket name to copy data to",
         "type": "string"
      }
   },
   "additionalProperties": false
}
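To make the effect of the defaults and of "additionalProperties": false concrete, here is a minimal standard-library sketch of how a set of overrides could be resolved against this schema. It is not the script's actual validation (which is done by the pydantic model documented here); resolve_params, DEFAULTS and TYPES are hypothetical names used only for this example.

```python
# Defaults and types copied from the JSON schema above.
DEFAULTS = {
    "tasks": 1,
    "nodes": 1,
    "slurm_script": "#!/bin/bash echo 'Hello Slurm'",
    "partition": None,
    "slurm_dir_prefix": "/shared/fsx1/shared",
    "copy_data": False,
    "local_path": None,
    "s3_bucket": "skao-sdp-testdata",
}
TYPES = {
    "tasks": int,
    "nodes": int,
    "slurm_script": str,
    "partition": str,
    "slurm_dir_prefix": str,
    "copy_data": bool,
    "local_path": str,
    "s3_bucket": str,
}


def resolve_params(overrides):
    """Merge overrides into the defaults, rejecting unknown keys and wrong types."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Mirrors "additionalProperties": false.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for name, value in overrides.items():
        expected = TYPES[name]
        # Exact type comparison, so "4" is not accepted as an integer
        # and True is not accepted where an integer is expected.
        if type(value) is not expected:
            raise TypeError(f"{name} must be {expected.__name__}")
    return {**DEFAULTS, **overrides}


resolved = resolve_params({"tasks": 4, "partition": "example-partition"})
print(resolved["tasks"], resolved["nodes"], resolved["partition"])
```

The partition name in the usage line is a placeholder. Passing a key that is not in the schema, such as "copy" instead of "copy_data", raises an error in this sketch rather than being silently ignored.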

Config:
  • strict: bool = True

  • extra: str = forbid

  • arbitrary_types_allowed: bool = False

  • validate_assignment: bool = True

  • title: str = test-slurm

Fields:
field copy_data: bool = False

Whether data should be copied to S3 from a k8s PVC

field local_path: str = None

The path to data to be copied to S3, relative to the mount directory on the PVC

field nodes: int = 1

Number of slurm nodes

field partition: str = None

HPC cluster partition

field s3_bucket: str = 'skao-sdp-testdata'

The name of the S3 bucket where the data are copied to from the PVC local_path

field slurm_dir_prefix: str = '/shared/fsx1/shared'

Prefix of the slurm working directory on the HPC cluster

field slurm_script: str = "#!/bin/bash echo 'Hello Slurm'"

Script to run in slurm job

field tasks: int = 1

Number of slurm tasks

Changelog

0.2.2

  • use ska-sdp-scripting 2.0.0 (MR360)

0.2.1

  • use ska-sdp-scripting 1.1.2 (MR295)

0.2.0

  • Add additional parameters and update README (MR302)

  • Add the usage of copying data chart to the script (MR270)

0.1.0

0.0.0

  • Initial version of the test slurm script (MR223)