Processing Script Definitions
Overview
Processing script definitions are the entry point for the SDP to know which processing scripts it is allowed to run and with what processing block parameters. They are the basis of various validation steps within the Science Data Processor. These include, e.g., whether a script is compatible with a given SDP version, or whether the requested processing block parameters in the Assign Resources configuration string are defined correctly for the requested processing script version.
The script definitions (scripts.yaml file), together with their processing block
parameter definitions (JSON schema files), are stored in Telescope Model Data, in the
tmdata/ska-sdp/scripts directory,
and are published in the Central Artefact Repository.
Although fixed versions of the definitions are published, the SDP obtains this
information directly from the master branch of the ska-sdp-script repository.
This allows updates and bug-fixes in processing scripts to be published, and used
in an already deployed SDP. This process results in quick releases and allows users
to access the latest versions of processing scripts without the need to redeploy
their SDP sub-system. This also follows the concept of processing scripts being
released independently of the SDP.
Regressions in processing script definitions
However, the above setup also means that any change in script definitions on the master
branch of the ska-sdp-script repository will automatically affect “production”
deployments. We need to control these changes and make sure we do not introduce any
regressions into the definitions.
A Python script, in conjunction with a bash setup, has been developed to check whether there are any
potential regressions in the tmdata/ska-sdp/scripts directory. It compares the existing script
definitions with incoming ones (including the JSON files), detects changes,
and ensures that no unwanted changes have been introduced.
If any potential regressions have been detected, the Python script will terminate with an error message, advising the developer on how to proceed. There are two options:
Fixing or reverting accidental changes in scripts.yaml or any of the JSON parameters files.
Making the updates by allowing these changes (adding the Allow Scripts Overwrite label to the Merge Request).
The process is automatically executed in GitLab CI as its own job, which fails by default if any regressions are detected. See Allowing changes for overriding this behaviour.
Potential regressions
An unwanted change is a change that could affect the operation of the processing script in the SDP.
In scripts.yaml, each definition is identified by its combination of name and version (unique together),
and matching definitions are compared between the existing and incoming versions.
Types of Changes:
Changes only affect the maximum SDP version (sdp_version) => No unwanted changes.
Changes in any other key (including minimum SDP version number or removal of a script) => Unwanted change detected.
Changes in a parameter JSON schema file that is referenced in scripts.yaml => Unwanted change detected.
Changes in a parameter JSON schema file that is not referenced in scripts.yaml:
    The parameters file is older than the latest version => Unwanted changes detected
    The parameters file is newer than the latest in use => No unwanted changes detected
Removed a script's version definition => Unwanted change detected
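The scripts.yaml rules above can be illustrated with a minimal sketch. This is not the actual regression script: it assumes each definition is a mapping with name, version and sdp_version keys, and that sdp_version is a comma-separated specifier string such as ">=0.21.0, <2.0.0" (both assumptions, as are all function names). It treats newly added definitions as acceptable and does not cover the JSON schema file rules.

```python
from typing import Any, Dict, List, Set, Tuple

# Assumption: one scripts.yaml entry is a mapping with at least
# "name", "version" and "sdp_version"; other keys are opaque here.
Script = Dict[str, Any]


def index_scripts(scripts: List[Script]) -> Dict[Tuple[str, str], Script]:
    """Key each definition by its (name, version) pair, which is unique."""
    return {(s["name"], str(s["version"])): s for s in scripts}


def lower_bounds(spec: str) -> Set[str]:
    """Return the clauses of a version specifier that are not upper bounds.

    Assumption: the specifier is comma-separated, e.g. ">=0.21.0, <2.0.0".
    Clauses starting with "<" (including "<=") count as upper bounds.
    """
    clauses = {c.replace(" ", "") for c in spec.split(",") if c.strip()}
    return {c for c in clauses if not c.startswith("<")}


def find_unwanted_changes(current: List[Script], incoming: List[Script]) -> List[str]:
    """Compare existing and incoming definitions and list unwanted changes."""
    old, new = index_scripts(current), index_scripts(incoming)
    problems = []
    for (name, version), old_def in old.items():
        new_def = new.get((name, version))
        if new_def is None:
            problems.append(f"{name} {version}: definition removed")
            continue
        # Compare every key except sdp_version first.
        old_rest = {k: v for k, v in old_def.items() if k != "sdp_version"}
        new_rest = {k: v for k, v in new_def.items() if k != "sdp_version"}
        if old_rest != new_rest:
            problems.append(f"{name} {version}: keys other than sdp_version changed")
        elif lower_bounds(old_def.get("sdp_version", "")) != lower_bounds(
            new_def.get("sdp_version", "")
        ):
            # Only the maximum SDP version may change freely.
            problems.append(f"{name} {version}: minimum SDP version changed")
    return problems
```

An empty result corresponds to "no unwanted changes"; any returned message would cause the check to fail unless the change is explicitly allowed.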
Allowing changes
In certain cases, these changes may be required. For example, errors may be discovered in the original specifications. In these cases there needs to be a way to bypass this check. This should be done intentionally in a way that is ephemerally linked to the current changeset.
The mechanism used to handle this is to execute the GitLab CI job with conditions. Specifically, if the merge request contains
the label Allow Scripts Overwrite, the job is allowed to fail (i.e. it will not break the whole CI pipeline).
However, please think carefully before applying this label.
It should only be added when the merge request is fully ready to merge and after confirming that the changes are intentional.
This ensures the label is applied only when the changes are fully approved, preventing accidental merges of unverified changes.
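The label condition itself lives in the GitLab CI job configuration, which is not reproduced here. Purely as an illustration of the logic, the sketch below checks for the label in Python, using GitLab's predefined CI_MERGE_REQUEST_LABELS variable (a comma-separated list of label names, available in merge request pipelines). The function name is hypothetical.

```python
import os
from typing import Optional

OVERRIDE_LABEL = "Allow Scripts Overwrite"


def overwrite_allowed(labels_csv: Optional[str]) -> bool:
    """Return True if the override label is among the merge request labels."""
    if not labels_csv:
        # Not a merge request pipeline, or the MR has no labels.
        return False
    labels = {label.strip() for label in labels_csv.split(",")}
    return OVERRIDE_LABEL in labels


if __name__ == "__main__":
    # CI_MERGE_REQUEST_LABELS is only set by GitLab in merge request pipelines.
    print(overwrite_allowed(os.environ.get("CI_MERGE_REQUEST_LABELS")))
```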
Step-by-step process
Detection of Unwanted Changes
Identify the target branch to compare commits to:
If this is part of a merge request (MR), the target branch is the MR’s target branch.
In the default case, the repository’s default branch (master) is used.
Ingest Files from the Target and Incoming Branches
Fetch scripts.yaml from the target branch.
Fetch a list of parameter JSON files that were updated in the incoming branch.
Classify changes according to the types of changes listed under Potential regressions.
Notify user of any regressions
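The first two steps can be sketched as follows. This is an assumption-laden illustration rather than the project's implementation: it relies on GitLab's predefined CI_MERGE_REQUEST_TARGET_BRANCH_NAME and CI_DEFAULT_BRANCH variables, assumes the remote is called origin, and uses hypothetical helper names.

```python
import subprocess
from typing import List, Mapping

SCRIPTS_DIR = "tmdata/ska-sdp/scripts"


def target_branch(env: Mapping[str, str]) -> str:
    """Use the MR target branch if present, else the default branch."""
    return (
        env.get("CI_MERGE_REQUEST_TARGET_BRANCH_NAME")
        or env.get("CI_DEFAULT_BRANCH")
        or "master"
    )


def show_command(branch: str) -> List[str]:
    """git command that prints scripts.yaml as it exists on the target branch."""
    return ["git", "show", f"origin/{branch}:{SCRIPTS_DIR}/scripts.yaml"]


def changed_json_command(branch: str) -> List[str]:
    """git command listing JSON files in SCRIPTS_DIR changed relative to the target."""
    return [
        "git", "diff", "--name-only", f"origin/{branch}...HEAD",
        "--", f"{SCRIPTS_DIR}/*.json",
    ]


def run(cmd: List[str]) -> str:
    """Run a command and return its standard output, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

In a checked-out repository, `run(show_command(target_branch(os.environ)))` (after `import os`) would return the target branch's scripts.yaml as text, ready to be parsed and compared with the incoming version.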
Running the checks locally
As a developer, it is important to be warned about potential errors before pushing
code to the repository, thus avoiding pipeline failures. To facilitate this, a make target
is provided to run these checks on your local machine. This is implemented in
the Makefile and can be executed using the following command:
make run-regression-test
Note
This make target will create a tmp directory (together with a diffs_in_config.txt file)
to store a list of the JSON files that have been changed. The directory is removed as
the last step of the command.