.. _data_processing:

Data Processing
===============

The SDP processes raw data and generates various data products via science pipeline workflows.
The execution of these workflows is orchestrated by processing scripts
(script and workflow together are referred to as a pipeline).

An operator tells the SDP what kind of processing is required via the ``processing_blocks``
entry of the AssignResources configuration string. Here, they can choose the processing script
to run and its parameters. The script then makes sure that the chosen workflow(s) are executed
if the right resources are available.

The full list of available scripts can be found at the
:external+ska-sdp-script:ref:`Processing Script Documentation pages `.

Currently, two observation-ready scripts are being developed:

- :external+ska-sdp-script:doc:`Visibility Receive `: It deploys one or more receivers to obtain
  data from the Correlator Beamformer (CBF). It also deploys different processors to process the
  incoming raw data, e.g. to write them into MeasurementSets or to show them on the QA display.

- :ref:`Pointing Offset `: It deploys the
  :external+ska-sdp-wflow-pointing-offset:doc:`pointing offset calibration pipeline `
  to perform near-real-time pointing calibration. The pointing offset results are published
  and used for correcting dish pointing throughout an observation.

Real-time workflows
-------------------

Real-time and near-real-time workflows are executed while the telescope is observing.
These include capturing correlator data, pre-processing, and calibration activities.

A real-time workflow runs while the telescope is scanning and records science data products.
Data may flow through the system continuously (including during READY state for calibration),
but are only saved as a data product when there is an active scan
(the subarray's observing state is SCANNING).
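
The difference between data flowing through the system and data being saved can be
sketched as follows. This is only a minimal illustration, not SDP code: the ``ObsState``
enum, the ``record_scans`` function and the ``(state, payload)`` event pairs are
hypothetical names chosen for this example, and only the READY and SCANNING states
mentioned above are modelled.

```python
from enum import Enum


class ObsState(Enum):
    """Illustrative subset of subarray observing states (hypothetical type)."""

    READY = "READY"
    SCANNING = "SCANNING"


def record_scans(events):
    """Return only the payloads that arrive while the state is SCANNING.

    `events` is an iterable of (ObsState, payload) pairs; this shape is an
    assumption standing in for the real data stream.
    """
    return [payload for state, payload in events if state is ObsState.SCANNING]


# Data arrives in both states, but only SCANNING payloads are kept.
stream = [
    (ObsState.READY, "calibration-0"),
    (ObsState.SCANNING, "scan-1"),
    (ObsState.SCANNING, "scan-2"),
    (ObsState.READY, "calibration-1"),
]
print(record_scans(stream))
```

In this sketch the payloads received during READY are still seen by the function, which
mirrors the continuous data flow, but they never end up in the recorded list.
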
A near-real-time workflow runs after one or more scans have finished,
but before the execution block has been terminated.

Currently available workflows:

- `Receiver `_: part of the
  :external+ska-sdp-realtime-receive-modules:doc:`Realtime Receive Modules ` package.
  Receives data streams and stores them in the `plasma store `_. Various consumers
  (also deployed alongside the receiver) read the data from the plasma store and process them.

- :external+ska-sdp-realtime-calibration:doc:`RCAL `: real-time calibration pipeline
  deployed as a processor alongside the receiver. It generates gain calibration
  solutions to be used by the beamformer.

- :external+ska-sdp-wflow-pointing-offset:doc:`Pointing calibration `: It runs in a
  near-real-time manner, meaning that it does not process data via the plasma store,
  but only once all of the relevant data are available in MeasurementSets.
  It generates pointing offset calibration solutions used for correcting dish pointing
  before and during science observations.

Batch workflows
---------------

Batch processing happens when an observation (execution block) has finished and
all of the necessary resources for processing are available (i.e. storage, CPU/GPU, memory).
These pipelines perform calibration and imaging.

Some of the batch processing is handled by a Slurm cluster. When a processing script requests
a Slurm script to be executed, the Slurm Deployer submits the job to the Slurm cluster and
monitors its state. The ``slurmdeploy`` entry in ``charts/ska-sdp/values.yaml`` is used to
configure the Slurm Deployer. For a test processing script that runs a Slurm job, check out
`the test-slurm script `__.

.. note::

    The Slurm API entry point is currently configured to point to the one on the DP AWS HPC Cluster.
    The entry point is up to date only in SDP versions 1.2.0 and above. If you are using a
    previous version, or a different entry point, make sure to change this in your own
    deployment's ``values.yaml`` file. See :ref:`inst_slurmdpl`.

Workflows that are currently being developed:

- Self-calibration:

  - :external+ska-sdp-wflow-selfcal:doc:`Documentation `
  - `Repository `__

- Distributed self-calibration prototype:

  - :external+ska-sdp-distributed-self-cal-prototype:doc:`Documentation `
  - `Repository `__

- Continuum imaging:

  - :external+ska-sdp-continuum-imaging-pipeline:doc:`Documentation `
  - `Repository `__

- Spectral line imaging:

  - :external+ska-sdp-spectral-line-imaging:doc:`Documentation `
  - `Repository `__

- Batch pre-processing:

  - :external+ska-sdp-batch-preprocess:doc:`Documentation `
  - `Repository `__

In addition, a detailed description of current and future pipelines can be found in the
`SKA Solution Intent `_.