Functionality
=============

The supported functionality of the scripting library is as follows.

Starting, monitoring and ending a script
----------------------------------------

At the start
^^^^^^^^^^^^

- Claim the processing block.
- Get the parameters defined in the processing block. They should be checked
  against the parameter schema defined for the script.


Resource requests
^^^^^^^^^^^^^^^^^

- Make requests for input and output buffer space. The script will
  calculate the resources it needs based on the parameters, then request them
  from the processing controller.

Declare script phases
^^^^^^^^^^^^^^^^^^^^^

- Scripts will be divided into phases such as preparation, processing,
  and clean-up. In the current implementation, only one phase can be
  declared, which we refer to as the 'work' phase.

Execute the work phase
^^^^^^^^^^^^^^^^^^^^^^

- On entry to the work phase, the script creates the resource requests and any
  dependencies on data product flows, then waits until the resources have been
  allocated and the dependencies have finished.
  Meanwhile, it monitors the processing block to see if it has been cancelled.
  For real-time scripts, it also checks whether the execution block has been cancelled.
- Deploy execution engines to execute a script or function.
- Monitor the execution engines' deployment states and the processing block state.
  Wait until the execution has finished or failed, or the processing block has been cancelled.
- Continuously update the processing block state with the status of the execution
  engine deployments. The script provides aggregate information about these statuses
  to inform other components about the readiness of deployments.
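
The aggregation mentioned in the last point can be pictured with a minimal
sketch. The function name, the status strings and the keys of the returned
dictionary are hypothetical and are not taken from the scripting library; the
sketch only shows how per-deployment statuses might be reduced to a single
readiness flag.

.. code-block:: python

    def summarise_deployments(statuses):
        """Reduce per-deployment statuses to an aggregate summary.

        ``statuses`` maps a deployment ID to a status string. The status
        names and the keys of the returned dict are illustrative only.
        """
        counts = {}
        for status in statuses.values():
            counts[status] = counts.get(status, 0) + 1
        # Ready only if there is at least one deployment and all are running.
        ready = bool(statuses) and all(
            status == "RUNNING" for status in statuses.values()
        )
        return {"deployments": dict(statuses), "counts": counts, "ready": ready}


    summary = summarise_deployments({"proc-a": "RUNNING", "proc-b": "PENDING"})
    print(summary["ready"])  # prints False: one deployment is still pending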

At the end
^^^^^^^^^^

- Remove the execution engines to release the resources.
- Remove any resource requests still in place.
- Set any ``FLOWING`` data-product flows to ``COMPLETED``.
- Remove every flow added on entry that is neither a data-product nor a
  data-product-persist flow.
- Update the processing block state with information about the success or failure
  of the script.

Receive scripts
---------------

- Get IP and MAC addresses for the receive processes.
- Monitor receive processes. If any get restarted, then the addresses may need to be updated.
- Write the addresses in the appropriate format into the processing block state.

Compatibility with the schemas library
--------------------------------------

We keep the scripting library compatible with the latest version
of the :doc:`schemas library <ska-schemas:index>`.
If you use a configuration string that is based on an older version
of the schemas, you may experience errors or unexpected behaviour.

Data flow entries
-----------------

The :ref:`ProcessingBlock <procblock>` object provides methods to create
specific data flow entries in the Configuration Database.
These cover objects that provide configuration for data generation and
usage for different processes.

Currently, the following types can be created with the scripting library:

- ``data_product``: data products, e.g. Measurement Sets
- ``data_queue``: Kafka queue configuration
- ``qa_display``: QA Display configuration
- ``tango_attribute``: configuration with a Tango attribute as a sink

On exit, the scripting library sets the ``data_product`` and ``data_product_persist``
flow entries to completed. Any other flow types that the scripting library created are simply removed.
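
The exit behaviour can be sketched as a small pure function. This is an
illustration under stated assumptions, not the library's implementation: flows
are modelled as plain dictionaries with hypothetical ``name``, ``kind`` and
``status`` keys, and only flows that are still ``FLOWING`` are marked as
completed, so that a flow which has already failed keeps its status.

.. code-block:: python

    # Flow kinds that survive the exit; all other kinds are removed.
    KEPT_KINDS = {"data_product", "data_product_persist"}


    def finalise_flows(flows):
        """Return the flow entries that remain after the script exits.

        Each flow is a dict with hypothetical ``name``, ``kind`` and
        ``status`` keys. The input list is not modified.
        """
        remaining = []
        for flow in flows:
            if flow["kind"] not in KEPT_KINDS:
                # Other flow types created by the script are simply removed.
                continue
            if flow["status"] == "FLOWING":
                # Work on a copy so the caller's data stays untouched.
                flow = {**flow, "status": "COMPLETED"}
            remaining.append(flow)
        return remaining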

Flow Monitoring and Phase Exit
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The script continuously monitors the execution block state. When the
execution block status becomes ``FINISHED`` or ``CANCELLED``, the scripting library monitors
all data product flows it created to see if their statuses have reached a final status:
``COMPLETED``, ``INCOMPLETE``, or ``FAILED``.
If all flows reach any of these final statuses within a configurable grace period, the script
triggers phase exit.

If the grace period expires and not all flows are in a final status, an error is logged
and the script continues to remove the pods as before.

If a script does not use any data product flows, phase exit is triggered immediately
when the execution block is ``FINISHED`` or ``CANCELLED``.
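
The grace-period logic described above can be sketched as a polling loop. The
function and parameter names are hypothetical, and the real library reads flow
statuses from the Configuration Database rather than from a callback; the
sketch only illustrates the decision between a clean phase exit and a timeout.

.. code-block:: python

    import time

    FINAL_STATUSES = {"COMPLETED", "INCOMPLETE", "FAILED"}


    def wait_for_flows(get_statuses, grace_period, poll_interval=0.1,
                       sleep=time.sleep):
        """Return True if all flows are final before the grace period ends.

        ``get_statuses`` is a callable returning the current list of flow
        status strings. An empty list counts as "all final", which matches
        the immediate phase exit for scripts without data product flows.
        """
        deadline = time.monotonic() + grace_period
        while True:
            if all(status in FINAL_STATUSES for status in get_statuses()):
                return True
            if time.monotonic() >= deadline:
                # The caller would log an error and carry on with clean-up.
                return False
            sleep(poll_interval)

A caller would typically log an error when the function returns ``False`` and
then continue with the normal clean-up.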


Processing Block parameter validation
---------------------------------------

The :ref:`ProcessingBlock <procblock>` class provides a ``validate_parameters``
method. This method retrieves the processing block parameters from the Configuration DB
(which were saved as part of the request set up by the Processing Controller and derived
from the AssignResources configuration string) and validates that information
against a user-defined Pydantic model. Usage:

.. code-block:: python

    from ska_sdp_scripting.processing_block import ProcessingBlock, ParameterBaseModel

    # Example model describing the parameters this script expects
    class MyPydanticModel(ParameterBaseModel):
        args: list
        kwargs: dict

    pb = ProcessingBlock()

    # Returns an instance of the Pydantic model with the parameters loaded
    parameters = pb.validate_parameters(model=MyPydanticModel)

The method raises a Pydantic ``ValidationError`` if the validation fails.
Otherwise, it logs the success and returns the populated model.


Monitoring
----------

There are several avenues of monitoring in place which should help to pinpoint any errors,
deployment failures, or bugs you may come across.

In any of the following scenarios, the ``error_messages`` key in the relevant processing block's
state will be updated with information to help you debug:

- any of the pods associated with the processing scripts and their deployments fail or fail to start.
- an error occurs in a processing script.
- an error occurs in an execution engine.

In every case except an error at the application level, the processing block state's status will
be set to ``FAILED``. This, along with the presence of an ``error_messages`` value, will be picked up by
the subarray and reported appropriately there.

Monitoring is also available for Slurm deployments:

- Slurm deployments are only relevant for batch processing.
- Monitoring for Slurm deployments only happens for errors.
- The state for Slurm deployments contains the following keys:

  - ``num_job`` (equivalent to ``num_pod`` in Kubernetes)
  - ``jobs`` (equivalent to ``pods`` in Kubernetes)
  - `More details <https://confluence.skatelescope.org/pages/viewpage.action?spaceKey=SE&title=Slurm+Deployer+Schema l>`_

- Currently, Slurm deployments do not contain the ``error_state`` key.

  - The monitoring is implemented in a way that allows easy expansion when
    ``error_state`` is eventually added.
  - Default handling ensures that a missing ``error_state`` does not break
    existing functionality.
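
As a rough illustration of such default handling, the sketch below reads a
deployment state dictionary and tolerates a missing ``error_state`` key. The
function name, the shape of the values and the returned dictionary are
assumptions made for this example; only the key names ``num_job``, ``jobs``,
``num_pod``, ``pods`` and ``error_state`` come from the description above.

.. code-block:: python

    def summarise_deployment_state(state):
        """Normalise a Kubernetes or Slurm deployment state dictionary.

        Hypothetical helper: the value shapes are illustrative only.
        """
        if "num_job" in state:
            kind = "slurm"
            count = state["num_job"]
            units = state.get("jobs", {})
        else:
            kind = "kubernetes"
            count = state.get("num_pod", 0)
            units = state.get("pods", {})
        return {
            "kind": kind,
            "count": count,
            "units": units,
            # Slurm states have no error_state yet, so default to None
            # instead of raising KeyError.
            "error_state": state.get("error_state"),
        }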


Buffer Resource Request
-----------------------

The :ref:`ProcessingBlock <procblock>` object provides methods to create
specific resource requests for buffer storage, allowing processing scripts
to declare what storage is required before a job starts. This helps ensure
resources are allocated appropriately by the processing controller.

Currently, the following functionality is supported:

- ``buffer_request``: generates a resource request for buffer storage.
- Resource requests are committed to the Configuration Database during the
  ``__enter__`` phase of a :class:`~ska_sdp_scripting.phase.Phase` object, before any allocation checks are
  performed.
- A feature flag (``RESOURCE_MANAGEMENT_TOGGLE``) controls whether these requests are created.
  By default, this feature is disabled. If enabled, the scripting system generates
  resource requests for each buffer defined by the Processing Block.
- Resource requests are compared with allocations. The phase can complete
  successfully only once the resources have been allocated.
- The resource allocation is time-bound: if the allocation is not completed within
  ``TIMEOUT_RESOURCE_ALLOCATION``, the allocation is considered unsuccessful.
- Once a processing script finishes running, it cleans up all the resource requests
  it generated. This is done via the ``__exit__`` method of the ``Phase`` class.
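
The lifecycle above (commit on entry, wait with a timeout, clean up on exit)
can be sketched with a small context manager. ``BufferRequestPhase`` and its
arguments are hypothetical stand-ins, not the library's ``Phase`` class. Plain
dictionaries replace the Configuration Database, ``enabled`` plays the role of
``RESOURCE_MANAGEMENT_TOGGLE`` and ``timeout`` the role of
``TIMEOUT_RESOURCE_ALLOCATION``.

.. code-block:: python

    import time


    class BufferRequestPhase:
        """Hypothetical stand-in for a phase that manages buffer requests."""

        def __init__(self, buffers, request_store, allocations,
                     enabled=False, timeout=1.0):
            self.buffers = buffers              # buffer name -> requested size
            self.request_store = request_store  # stands in for the config DB
            self.allocations = allocations      # filled in by the controller
            self.enabled = enabled              # feature flag, off by default
            self.timeout = timeout              # seconds to wait for allocation
            self._created = []

        def __enter__(self):
            # Requests are committed on entry, before any allocation check.
            if self.enabled:
                for name, size in self.buffers.items():
                    self.request_store[name] = size
                    self._created.append(name)
            return self

        def wait_for_allocation(self, poll_interval=0.01):
            """Block until every request is allocated or the timeout expires."""
            deadline = time.monotonic() + self.timeout
            while not all(name in self.allocations for name in self._created):
                if time.monotonic() >= deadline:
                    raise TimeoutError("resource allocation timed out")
                time.sleep(poll_interval)

        def __exit__(self, exc_type, exc_value, traceback):
            # Remove only the requests this phase created.
            for name in self._created:
                self.request_store.pop(name, None)
            return False  # do not swallow exceptions

Using a context manager means that the requests are removed even if the body
of the ``with`` block raises an exception.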


Multiple Phases
---------------

The :ref:`ProcessingBlock <procblock>` object provides methods to manage and track multiple
phases for a processing block. Phases represent logical steps in the processing script lifecycle,
such as staging, processing, and delivery, and are especially useful for scheduling and resource management.

Currently, the following phase states are supported by the scripting library:

- ``PLANNED``: phase is defined but not yet started
- ``ACTIVE``: phase is currently running
- ``FAILED``: phase has failed
- ``CANCELLED``: phase was cancelled before completion
- ``COMPLETED``: phase finished successfully

Phases are represented as dictionaries with the following keys:

- ``name``: the phase name (string)
- ``length``: expected duration in seconds (integer)
- ``state``: the current phase state (must be one of the valid states above)

To manage phases:

- Use :meth:`ProcessingBlock.add_phase` to add a phase to the internal list. This does not immediately
  update the Configuration Database.
- After adding all desired phases, call :meth:`ProcessingBlock.set_phases` to persist the current list
  of phases to the Configuration Database.
- To update an existing phase, use :meth:`ProcessingBlock.update_phase`, passing the phase dict and
  any fields to update (such as ``state`` or ``length``). This validates the changes and writes them
  to the Configuration Database.
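
The phase dictionary format and the kind of validation applied on update can
be sketched as follows. The helper names ``validate_phase`` and
``updated_phase`` are hypothetical and do not reproduce the signatures of the
``ProcessingBlock`` methods. Only the keys and the list of valid states come
from the description above.

.. code-block:: python

    VALID_STATES = {"PLANNED", "ACTIVE", "FAILED", "CANCELLED", "COMPLETED"}
    PHASE_KEYS = {"name", "length", "state"}


    def validate_phase(phase):
        """Check that a phase dict has the expected keys, types and state."""
        if set(phase) != PHASE_KEYS:
            raise ValueError(f"phase must have exactly the keys {sorted(PHASE_KEYS)}")
        if not isinstance(phase["name"], str):
            raise ValueError("name must be a string")
        if not isinstance(phase["length"], int):
            raise ValueError("length must be an integer number of seconds")
        if phase["state"] not in VALID_STATES:
            raise ValueError(f"invalid state: {phase['state']!r}")
        return phase


    def updated_phase(phase, **changes):
        """Return a validated copy of ``phase`` with ``changes`` applied."""
        return validate_phase({**phase, **changes})


    planned = validate_phase({"name": "processing", "length": 600, "state": "PLANNED"})
    active = updated_phase(planned, state="ACTIVE")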