Processing Script Definitions

Overview

Processing script definitions are the entry point through which the SDP learns which processing scripts it is allowed to run and with which processing block parameters. They are the basis of various validation steps within the Science Data Processor (SDP). These include, for example, checking whether a script is compatible with a given SDP version, or whether the processing block parameters requested in the Assign Resources configuration string are defined correctly for the requested processing script version.

The script definitions (scripts.yaml file), together with their processing block parameter definitions (JSON schema files), are stored in Telescope Model Data, in the tmdata/ska-sdp/scripts directory, and are published in the Central Artefact Repository. Although fixed versions of the definitions are published, the SDP obtains this information directly from the master branch of the ska-sdp-script repository. This allows updates and bug fixes in processing scripts to be published and used in an already deployed SDP. Releases are therefore quick, and users can access the latest versions of processing scripts without redeploying their SDP sub-system. This also follows the concept of processing scripts being released independently of the SDP.

Regressions in processing script definitions

However, the above setup also means that any change to the script definitions on the master branch of the ska-sdp-script repository automatically affects “production” deployments. We need to control these changes and make sure we do not introduce any regressions into the definitions.

A Python script, in conjunction with a bash setup, has been developed to check whether there are any potential regressions in the tmdata/ska-sdp/scripts directory. It compares the existing script definitions with incoming ones (including the JSON files), detects changes, and ensures that no unwanted changes have been introduced.

If any potential regressions have been detected, the Python script will terminate with an error message, advising the developer on how to proceed. There are two options:

- Fixing or reverting accidental changes in scripts.yaml or any of the JSON parameter files.
- Allowing the changes by adding the Allow Scripts Overwrite label to the Merge Request.

The process is automatically executed in GitLab CI as its own job, which fails by default if any regressions are detected. See Allowing changes for overriding this behaviour.

Potential regressions

An unwanted change is a change that could affect the operation of the processing script in the SDP. In scripts.yaml, the combination of name and version (unique together) is compared between the main and incoming versions.

Types of changes:

- Only the maximum SDP version (in sdp_version) changes => No unwanted change.
- Any other key changes (including the minimum SDP version number or removal of a script) => Unwanted change detected.
- A parameter JSON schema file that is referenced in scripts.yaml changes => Unwanted change detected.
- A parameter JSON schema file that is not referenced in scripts.yaml changes:
  1. The parameters file is older than the latest version in use => Unwanted change detected.
  2. The parameters file is newer than the latest version in use => No unwanted change.
- A script version definition is removed => Unwanted change detected.
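
The rules for scripts.yaml entries can be illustrated with a minimal Python sketch. It is not the actual regression-check script. It assumes that each entry is a dictionary with name, version and sdp_version keys, and that sdp_version is a comma-separated specifier such as ">=0.21.0, <1.0.0". The image key, the script names and all function names are hypothetical and used for illustration only.

```python
# Illustrative sketch only: the entry layout and the sdp_version format
# are assumptions, not the confirmed structure of scripts.yaml.

def index_scripts(scripts):
    """Index script entries by their unique (name, version) pair."""
    return {(entry["name"], entry["version"]): entry for entry in scripts}


def non_upper_bounds(specifier):
    """Return the clauses of a version specifier that are not upper bounds."""
    clauses = [clause.strip() for clause in (specifier or "").split(",")]
    return sorted(c for c in clauses if c and not c.startswith("<"))


def find_unwanted_changes(target, incoming):
    """Compare two lists of script entries and describe unwanted changes."""
    problems = []
    old, new = index_scripts(target), index_scripts(incoming)
    for key, old_entry in old.items():
        if key not in new:
            problems.append(f"{key[0]} {key[1]}: script version removed")
            continue
        new_entry = new[key]
        for field in sorted(set(old_entry) | set(new_entry)):
            old_value, new_value = old_entry.get(field), new_entry.get(field)
            if old_value == new_value:
                continue
            # Only the maximum SDP version is allowed to change.
            if field == "sdp_version" and (
                non_upper_bounds(old_value) == non_upper_bounds(new_value)
            ):
                continue
            problems.append(f"{key[0]} {key[1]}: '{field}' changed")
    return problems


target = [
    {"name": "vis-receive", "version": "1.0.0",
     "image": "example/vis-receive:1.0.0", "sdp_version": ">=0.21.0, <1.0.0"},
    {"name": "test-batch", "version": "0.3.0",
     "image": "example/test-batch:0.3.0", "sdp_version": ">=0.19.0"},
]

# Only the maximum SDP version of the first script is raised.
only_max_changed = [{**target[0], "sdp_version": ">=0.21.0, <2.0.0"}, target[1]]

# The image of the first script changes and the second script is removed.
regressed = [{**target[0], "image": "example/vis-receive:1.0.1"}]

print(find_unwanted_changes(target, only_max_changed))
print(find_unwanted_changes(target, regressed))
```

Entries that exist only in the incoming list are ignored in this sketch, since adding a new script version does not alter an existing definition.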

Allowing changes

In certain cases, these changes may be required. For example, errors may be discovered in the original specifications. In these cases there needs to be a way to bypass this check. This should be done intentionally in a way that is ephemerally linked to the current changeset.

The mechanism used to handle this is to execute the GitLab CI job with conditions. Specifically, if the merge request contains the label Allow Scripts Overwrite, the job is allowed to fail (i.e. it will not break the whole CI pipeline). However, please think carefully before applying this label. It should only be added when the merge request is fully ready to merge and after confirming that the changes are intentional. This ensures the label is applied only when the changes are fully approved, preventing accidental merges of unverified changes.
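
For illustration, the presence of the label can also be checked from Python. The sketch below is an assumption about how such a check could look, not a description of the actual job configuration, which relies on GitLab CI job conditions. It reads the predefined GitLab variable CI_MERGE_REQUEST_LABELS, which holds the comma-separated labels of the merge request in merge request pipelines. The function name overwrite_allowed is hypothetical.

```python
import os

ALLOW_LABEL = "Allow Scripts Overwrite"


def overwrite_allowed(env=None):
    """Return True if the merge request carries the overwrite label.

    `env` defaults to the process environment; passing a dict makes the
    function easy to try out locally.
    """
    env = os.environ if env is None else env
    labels = env.get("CI_MERGE_REQUEST_LABELS", "")
    # Labels are comma-separated; compare whole label names only.
    return ALLOW_LABEL in {label.strip() for label in labels.split(",")}


print(overwrite_allowed({"CI_MERGE_REQUEST_LABELS": "bug,Allow Scripts Overwrite"}))
print(overwrite_allowed({}))
```

Comparing whole label names avoids accepting a different label that merely contains the same words.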

Step-by-step process

1. Detect unwanted changes
   - Identify the target branch to compare commits against:
     - If this is part of a merge request (MR), the target branch is the MR’s target branch.
     - Otherwise, the repository’s default branch (master) is used.
2. Ingest files from the target and incoming branches
   - Fetch scripts.yaml from the target branch.
   - Fetch a list of parameter JSON files that were updated in the incoming branch.
3. Classify the changes according to the types of changes described under Potential regressions.
4. Notify the user of any regressions.
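
The first two steps can be sketched in Python as follows. This is a hedged illustration and not the actual implementation: it assumes that the remote is called origin, that the merge request target branch is read from the predefined GitLab variable CI_MERGE_REQUEST_TARGET_BRANCH_NAME, and that the files are obtained with plain git commands. All function names are hypothetical.

```python
import os
import subprocess

SCRIPTS_DIR = "tmdata/ska-sdp/scripts"


def resolve_target_branch(env=None, default="master"):
    """Use the MR target branch if available, else the default branch."""
    env = os.environ if env is None else env
    return env.get("CI_MERGE_REQUEST_TARGET_BRANCH_NAME") or default


def show_file_command(branch, path=f"{SCRIPTS_DIR}/scripts.yaml", remote="origin"):
    """Build the git command that prints a file as it is on the target branch."""
    return ["git", "show", f"{remote}/{branch}:{path}"]


def changed_files_command(branch, remote="origin"):
    """Build the git command that lists files changed on the incoming branch."""
    return ["git", "diff", "--name-only", f"{remote}/{branch}...HEAD", "--", SCRIPTS_DIR]


def changed_parameter_files(branch):
    """Run git and return the changed JSON files (requires a git checkout)."""
    result = subprocess.run(
        changed_files_command(branch), check=True, capture_output=True, text=True
    )
    return [line for line in result.stdout.splitlines() if line.endswith(".json")]


branch = resolve_target_branch({})
print(branch)
print(" ".join(show_file_command(branch)))
```

Keeping the command construction separate from the subprocess call makes the branch selection easy to check without a repository at hand.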

Running the checks locally

As a developer, it is important to be warned about potential errors before pushing code to the repository, thus avoiding pipeline failures. To facilitate this, a make target is provided to run these checks on your local machine. It is implemented in the Makefile and can be executed using the following command:

make run-regression-test

Note

This make target creates a tmp directory (together with a diffs_in_config.txt file) to store the list of JSON files that have been changed. The directory is removed as the last step of the command.