Low CBF Alarms
==============

Low CBF alarms will be generated and configured via the Elettra ``AlarmHandler`` Tango device.

Purpose
*******

*Why do all control systems have alarms?*

To bring an operator's attention to something that needs intervention within a short time.

.. include:: alarms_vs_health.rst

Alerts
------

An alert is like an alarm that is not time critical. Alerts will probably go to a log for later viewing, rather than triggering flashing lights and sirens.

Alerts will also be handled via ``AlarmHandler``. Its configuration will control whether a given scenario triggers an alarm or an alert.

Scope
*****

Alarm management must be a multidisciplinary effort involving operations and maintenance staff, so the Perentie team cannot perform it in isolation.

Our scope is to provide the *ability* to generate alarms on as many diagnostic parameters as we think might be useful for successful commissioning, operation and maintenance.

Alarm Implementation in Low CBF
*******************************

Alarms will be generated and configured via the Elettra ``AlarmHandler`` Tango device.

For the purpose of this document, "Low CBF Tango devices" means Tango devices written specifically for Low CBF: the Controller, Subarray, Connector, Processor, and Allocator. (CNIC, being a test/debug tool, is not expected to have alarms.) In contrast, the ``AlarmHandler`` is **not** a Low CBF device.

Low CBF Tango devices have a range of Tango attributes for various purposes: internal control parameters, health monitoring, telescope operation monitoring, diagnostics, troubleshooting of faults, and so on. However, the Low CBF devices themselves will not raise alarms to an operator (or maintainer, or anyone else). Alarms will be administered and controlled solely via ``AlarmHandler``.

Configuration of ``AlarmHandler`` is outside the scope of this document.

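
The division of responsibility described above, where devices only publish attribute values and a separate rule configuration decides whether a condition is an alarm or an alert, can be illustrated with a minimal Python sketch. It is not ``AlarmHandler``'s implementation or rule syntax: ``Rule``, ``evaluate``, the ``board_temperature`` attribute and both thresholds are hypothetical, chosen only for illustration.

.. code-block:: python

   # Illustrative sketch only: NOT AlarmHandler's implementation or rule syntax.
   # Rule, evaluate, "board_temperature" and the thresholds are hypothetical.
   from dataclasses import dataclass
   from typing import Callable, Dict, List, Tuple


   @dataclass(frozen=True)
   class Rule:
       """A hypothetical rule: a named condition plus how it is reported."""

       name: str
       condition: Callable[[Dict[str, float]], bool]
       severity: str  # "alarm": operator intervention needed soon; "alert": log only


   def evaluate(rules: List[Rule], attributes: Dict[str, float]) -> List[Tuple[str, str]]:
       """Return (severity, rule name) for each rule whose condition holds.

       The device side only supplies attribute values; whether a condition is
       reported as an alarm or an alert is decided solely by the rule list.
       """
       return [(rule.severity, rule.name) for rule in rules if rule.condition(attributes)]


   # Hypothetical configuration: one attribute feeds both an alert and an alarm.
   RULES = [
       Rule("board_warm", lambda a: a["board_temperature"] > 90.0, "alert"),
       Rule("board_overheating", lambda a: a["board_temperature"] > 105.0, "alarm"),
   ]

   if __name__ == "__main__":
       # 95.0 exceeds only the alert threshold, so only the alert fires.
       print(evaluate(RULES, {"board_temperature": 95.0}))

Changing a rule's ``severity`` turns an alert into an alarm without touching the device that publishes the attribute, which is the property the configuration-driven approach relies on.
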
Processor Alarms
----------------

The following health-related ``LowCbfProcessor`` Tango attributes have alarms associated with them. There are `rules `_ associated with these attributes, which are used to configure ``AlarmHandler`` to present alarms to the operator through a relevant front-end (GUI) system.

-----------

.. note::
   This is a work in progress: more alarms may be added, and existing alarms are subject to change.

-----------

.. topic:: Hardware category

   An alarm is raised when the current value of any of the following Tango attributes falls outside its normal operating range:

   * ``hardware_fpga_temperature``
   * ``hardware_fpga_power``
   * ``hardware_hbm_temperature``
   * ``hardware_power_supply_12v_voltage``
   * ``hardware_power_supply_12v_current``
   * ``hardware_pcie_12v_voltage``
   * ``hardware_pcie_12v_current``

   The operational limits are specified in a ``YAML`` file (see `LINK `_):

   .. code-block:: yaml

      monitoring_points:
        fpga_temperature:
          min: 5.0
          max: 105.0
          label: FPGA temperature
          units: "℃"
        fpga_power:
          min: 0.0
          max: 150.0
          label: FPGA power
          units: W
        hbm_temperature:
          min: 5.0
          max: 100.0
          label: HBM temperature
          units: "℃"
        power_supply_12v_voltage:
          min: 11.4
          max: 12.6
          label: 'AUX 12V supply voltage'
          units: V
        power_supply_12v_current:
          min: 0.0
          max: 12.5
          label: 'AUX 12V supply current'
          units: A
        pcie_12v_voltage:
          min: 11.4
          max: 12.6
          label: PCIe 12V supply voltage
          units: V
        pcie_12v_current:
          min: 0.0
          max: 12.5
          label: PCIe 12V supply current
          units: A

.. topic:: Functional category

   +----------------------------+---------------------------------------------------------------------------------+
   | Tango Attribute            | Alarm condition                                                                 |
   +============================+=================================================================================+
   | ``function_driver_ok``     | When FPGA firmware is loaded and XRT (Xilinx) driver cannot read FPGA registers |
   +----------------------------+---------------------------------------------------------------------------------+

.. topic:: Processing category

   These alarms can only be activated while the subarray containing the ``LowCbfProcessor`` devices is in the SCANNING state.

   +-----------------------------------+-----------------------------------------------------------------------+
   | Tango Attribute                   | Alarm condition                                                       |
   +===================================+=======================================================================+
   | ``process_delay_poly_valid``      | When received delay polynomials are invalid                           |
   +-----------------------------------+-----------------------------------------------------------------------+
   | ``process_delay_subscription_ok`` | When subscribed delay polynomials are not received in a timely manner |
   +-----------------------------------+-----------------------------------------------------------------------+
   | ``process_spead_packets_ok``      | When SPS SPEAD packets are not arriving at FPGA input                 |
   +-----------------------------------+-----------------------------------------------------------------------+

Attributes & Quality
--------------------

The set of attributes exposed by Low CBF devices will be static (Tango's dynamic attribute feature will not be used). This means that in some circumstances, some attributes will not be applicable to the current state of the system.

Low CBF Tango devices use the Tango :py:class:`~tango.AttrQuality` mechanism to indicate whether an attribute is *VALID* for use, or *INVALID* (i.e. irrelevant because of the current configuration or state). Alarm rules must therefore take attribute quality into account when they are evaluated.

.. topic:: Processor Attributes

   Some Processor device attributes are specific to certain firmware personalities. These attributes will always be present on the Processor device, but will be marked as *INVALID* if the relevant personality is not loaded. Refer to the :external+ska-low-cbf-proc:doc:`Processor device documentation ` for specific details.
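
Taking quality into account can be sketched in Python for two of the hardware limits listed in the Hardware category ``monitoring_points`` YAML. This is a minimal sketch under stated assumptions, not Low CBF or ``AlarmHandler`` code: ``Quality`` is a local stand-in for :py:class:`~tango.AttrQuality` (so the example runs without PyTango), ``check_limit`` is a hypothetical helper, and treating ``min``/``max`` as inclusive bounds is an assumption. In a live system, the value and quality would come from reading the Processor attribute, e.g. the ``value`` and ``quality`` fields of the ``DeviceAttribute`` returned by ``tango.DeviceProxy.read_attribute()``.

.. code-block:: python

   # Minimal sketch, not Low CBF or AlarmHandler code. Assumptions: Quality is a
   # local stand-in for tango.AttrQuality, check_limit is hypothetical, and the
   # YAML min/max limits are treated as inclusive bounds.
   from enum import Enum
   from typing import Optional


   class Quality(Enum):
       """Stand-in for the two tango.AttrQuality values discussed above."""

       ATTR_VALID = 0
       ATTR_INVALID = 1


   # (min, max) copied from the monitoring_points YAML, keyed by the matching
   # Processor attribute name (the name mapping is an assumption).
   LIMITS = {
       "hardware_fpga_temperature": (5.0, 105.0),
       "hardware_power_supply_12v_voltage": (11.4, 12.6),
   }


   def check_limit(name: str, value: float, quality: Quality) -> Optional[bool]:
       """Return True if out of range, False if in range, None if not evaluable.

       An INVALID reading is irrelevant in the current configuration or state,
       so no verdict is produced from it.
       """
       if quality is Quality.ATTR_INVALID:
           return None
       low, high = LIMITS[name]
       return not (low <= value <= high)


   if __name__ == "__main__":
       print(check_limit("hardware_fpga_temperature", 110.0, Quality.ATTR_VALID))  # True
       print(check_limit("hardware_fpga_temperature", 110.0, Quality.ATTR_INVALID))  # None

Returning ``None`` rather than ``False`` for an *INVALID* reading keeps "not applicable" distinct from "within limits", so a rule built on this check neither fires nor reports a healthy value for an attribute that does not apply to the loaded firmware personality.
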