The near-RT RIC (RIC for short) Self-Health-Check flows fulfill the requirement that all systems need to monitor their own health: internal subsystems, hosted software, and external interfaces.

Internal Self-Check - At configurable intervals, the RIC is to trigger Health-Check requests to its internal common platform modules and hosted xAPPs.  Platform modules and xAPPs are required to support Health-Check requests and to perform a self-check.

Alarms and Notifications - Based on Health-Check results, the RIC is required to maintain a list of alarms that represents the state of the overall RIC health.  Alarm conditions are to be raised and sent as notifications.

...

The RIC is responsible for checking the health of the RIC Platform modules and the xAPP instances hosted on the RIC.  Specific requirements are as follows:

  • Ability to internally initiate self-checks on each of the common platform modules within the RIC.  Examples of platform modules are: O1 Termination, A1 Mediator, E2 Termination, E2 Manager, xAPP Manager, Subscription Manager, etc.
  • Internal self-checks are to be done at default intervals.  Intervals are to be configurable at run-time.
  • Each platform module is required to support health-check requests.  Initially, a module may simply need to send a response message to indicate that connectivity is still up and the messaging pathway is still operational.  (In later releases, additional diagnostics may be needed to ensure RIC lifecycle management is robust and carrier-grade.)
  • Self-check results on platform modules are to be logged.
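
The requirements above (self-checks at a default interval, interval configurable at run-time, results logged) can be illustrated with a minimal Python sketch. It is not taken from the RIC code base: the class name SelfCheckScheduler, the 30-second default, and the idea that each module exposes a callable returning True/False are all assumptions made for illustration.

```python
import logging
import threading
from typing import Callable, Dict

logger = logging.getLogger("ric.healthcheck")  # hypothetical logger name


class SelfCheckScheduler:
    """Runs registered module health checks at a run-time configurable interval."""

    def __init__(self, interval_s: float = 30.0) -> None:  # default interval is an assumption
        self._interval_s = interval_s
        self._checks: Dict[str, Callable[[], bool]] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def register(self, module: str, check: Callable[[], bool]) -> None:
        """Register a health-check callable for a platform module or xAPP."""
        with self._lock:
            self._checks[module] = check

    def set_interval(self, interval_s: float) -> None:
        """Change the interval while running; takes effect on the next cycle."""
        if interval_s <= 0:
            raise ValueError("interval must be positive")
        with self._lock:
            self._interval_s = interval_s

    def run_once(self) -> Dict[str, bool]:
        """Run every registered check once and log each result."""
        with self._lock:
            checks = dict(self._checks)
        results: Dict[str, bool] = {}
        for module, check in checks.items():
            try:
                healthy = bool(check())
            except Exception:
                # A check that raises is treated as an unhealthy module.
                logger.exception("health-check raised for module=%s", module)
                healthy = False
            logger.info("health-check module=%s healthy=%s", module, healthy)
            results[module] = healthy
        return results

    def run_forever(self) -> None:
        """Loop until stop() is called, re-reading the interval each cycle."""
        while not self._stop.is_set():
            self.run_once()
            with self._lock:
                wait_s = self._interval_s
            self._stop.wait(wait_s)

    def stop(self) -> None:
        self._stop.set()
```

In this sketch run_forever() would typically run in its own thread, and set_interval() can be called from another thread (for example, in response to a configuration change) without restarting the loop.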

Implementation Option[1]: The self-check can potentially leverage Kubernetes Liveness and Readiness probes. Liveness probes can be configured to execute a command, issue an HTTP GET request, and/or open a TCP socket against the container/pod.  Readiness probes can be configured to ensure the pod is ready before allowing it to handle traffic.  To further check a module's (pod's) ability to communicate with other modules over RMR (RIC Message Router), each module could subscribe to its own topic, regularly send a hello-world message to itself, and verify that it can both send and receive messages.

[1] Implementation options are suggested at the use case level, to be further refined/finalized during user stories phase.
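
The "send a hello-world message to itself" idea from the implementation option can be sketched as follows. This is only an illustration in Python: LoopbackTransport is an in-memory stand-in invented for this example and does not represent the RMR API, and the topic name and timeout are likewise hypothetical.

```python
import queue
import uuid
from typing import Optional, Tuple


class LoopbackTransport:
    """In-memory stand-in for a message router (NOT the RMR API)."""

    def __init__(self) -> None:
        self._q: "queue.Queue[Tuple[str, str]]" = queue.Queue()

    def send(self, topic: str, payload: str) -> None:
        self._q.put((topic, payload))

    def receive(self, timeout_s: float) -> Optional[Tuple[str, str]]:
        try:
            return self._q.get(timeout=timeout_s)
        except queue.Empty:
            return None


def loopback_self_check(transport, topic: str, timeout_s: float = 1.0) -> bool:
    """Send a unique token to our own topic and confirm it comes back."""
    token = uuid.uuid4().hex  # unique per check, so stale messages do not match
    transport.send(topic, token)
    return transport.receive(timeout_s) == (topic, token)
```

A module would call loopback_self_check() from its health-check handler and report the boolean result; a False result indicates that either the send or the receive path is not working within the timeout.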

Near-RT RIC angle: RIC-139 for platform parts.

Alarms, Clearings and Notifications

  • Anomaly conditions may be encountered as part of the self-check process or during normal operation (e.g., a module cannot send a message to another module/xAPP via RMR).  For each anomaly condition, the RIC and/or the specific platform module/xAPP needs to determine the severity and whether it is mappable to an alarm type.
  • New alarms are to be stored and captured as part of the alarm list of the RIC.
  •  If mappable to an alarm type, the RIC needs to declare the alarm condition.  
  • The alarm type definitions for platform modules and xAPPs should be consistent with 3GPP TS 28.545 Fault Supervision technical specification.  
  • New alarms declared in either case (self-check or normal operation) require a notification to be sent immediately via the O1 VES interface, after verifying against the current RIC alarm list that the alarm is indeed new.
  • Similarly, the RIC needs to send an alarm-clearing notification over O1 VES when an alarm is no longer on the alarm list.
  • RMR
  • Health of xAPPs
    • Ability of RIC to invoke Health-check requests to each of the xAPP instances deployed on the RIC [9]
    • Ability of each xAPP to perform Health-checks on itself and respond back to the RIC [10]
      • Implementation Option: See Implementation Option above for platform modules.
    • Any alarm/alert conditions or clearing of alarms/alerts are sent immediately via the O1 VES interface. [7-8]
    • Normalize alarm conditions

External Interfaces

For external interfaces, the RIC is responsible for checking its interface functions: the O1 Termination, A1 Mediator, and E2 Termination modules.

In addition, heartbeats or keep-alive signals over O1 are verified by the NB clients.

The RIC also checks heartbeat messages coming from RAN resources over the E2 interface.

Note: Since the role of the RIC is to enable near-real-time control loop actions, latency (E2 latency and RIC processing latency) is an important set of telemetry to be collected and reported.  As the RIC matures release over release, latency telemetry should be defined and implemented.

...

  • Any alarm/alert conditions or clearing of alarms/alerts are sent immediately via the O1 VES interface. [16]

...

  • New alarms are to be stored and captured as part of the alarm list of the RIC.  To support queries from NB clients for outstanding alarms via O1 Netconf interface, the RIC needs to make the alarm list available in the yang operational tree.  The yang model may need to be updated to support the alarm queries.  
    • Since alarms are sent as VES events over O1 VES, a mapping or translation function between VES alarms and the Netconf/Yang model might be needed.
  • Similarly, the RIC needs to identify alarms that are no longer present by comparing self-check results against the current alarm list.  Any cleared alarms need to be removed from the RIC alarm list, and clearing notifications need to be sent over O1 VES.
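
The comparison of self-check results against the current alarm list can be sketched as a set difference. The Python below is a minimal illustration, not the RIC implementation: the (source, alarm type) key and the function name reconcile_alarms are assumptions, and actually sending the O1 VES notifications is left to the caller.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical alarm identity: (source module or xAPP, alarm type).
AlarmKey = Tuple[str, str]


def reconcile_alarms(
    current: Set[AlarmKey], detected: Iterable[AlarmKey]
) -> Tuple[Set[AlarmKey], List[AlarmKey], List[AlarmKey]]:
    """Compare self-check findings with the current alarm list.

    Returns (updated_alarm_list, newly_raised, cleared).  Only entries in
    newly_raised and cleared should trigger notifications, so an alarm that
    is already on the list is not notified a second time.
    """
    detected_set = set(detected)
    newly_raised = sorted(detected_set - current)  # on no list yet: notify
    cleared = sorted(current - detected_set)       # no longer present: clear
    return detected_set, newly_raised, cleared
```

For example, if the current list holds an E2 Termination alarm and an xAPP alarm, and the latest self-check reports only the xAPP alarm plus a new A1 Mediator alarm, the function returns the A1 Mediator alarm as newly raised and the E2 Termination alarm as cleared.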

Near-RT RIC angle: RIC-56 for alarms

Health-Check Function

To support these flows, a new Health-Check functional block within the RIC is being proposed.  This Health-Check functional block can be implemented as a separate software module, as a distributed function across one or more existing modules, and/or as existing capabilities already available from the underlying container infrastructure such as Kubernetes' container/pod lifecycle management.  The Health-Check functional block has to perform the following:

  • Trigger health-checks on the common RIC platform functions/modules and on xAPP instances hosted on the RIC (self-checks at configured intervals and on-demand requests)
  • Map failures and anomalies to alarms
  • Send out notifications for new alarms
  • Determine the state of the RIC based on alarms
  • Log health-check results and update alarm list for queries
  • Clear alarms and alerts when conditions clear

Near-RT RIC angle: RIC-56 for alarms, incl list and notifications

Figure 1 below shows the flow of RIC Self-Checks:

  • Alarm or alarm-clear notifications over O1 VES as platform modules and xAPPs encounter anomalies or failures in their operation (apart from self-checks).
  • Health-Check function initiating self-check requests within the RIC to assess its overall health, and issuing alarms, as appropriate based on health results.


Figure 1: RIC Self-Check (PlantUML source)
@startuml
Autonumber
Skinparam sequenceArrowThickness 2
skinparam ParticipantPadding 5
skinparam BoxPadding 10

Box SMO #gold
    Participant SMO_O1 as "O1" <<OAM>>
    Participant RPGE as "Non-RT RIC" <<NONRTRIC>>
End box

Box "O-RAN RIC" #lightpink
    Participant A1TERM as "A1 MED" <<RIC>>
    Participant O1TERM as "O1 TERM" <<RIC>>
    Participant HC as "HealthCk Function" <<RIC>>
    Participant MOD as "Platform Modules" <<RIC>>
    Participant xAPP as "xAPPs" <<RICAPP>>
    Participant E2TERM as "E2 TERM" <<RIC>>
End box

Box "O-RAN Managed Function (MF)" #lightpink
    Participant MF as "Managed Function" <<MF>>
End box
Note over MF : MF = O-CU, O-DU

== Alarms/Alerts from individual RIC Platform Modules and xAPPs ==
Note over RPGE,HC
Note: Apart from Health-Checks, a platform module or xAPP may generate an alarm/alert (or
its clearing) when it encounters a failure/error (e.g., failure to reach another module)
End Note

MOD -> O1TERM : Platform Module Alarm/Alert/Clear
O1TERM -> SMO_O1 : <<O1VES>> Alarm/Alert or Clear
xAPP -> O1TERM : xAPP Alarm/Alert/Clear
O1TERM -> SMO_O1 : <<O1VES>> Alarm/Alert or Clear

== RIC Self-Checks @ Regular Intervals ==

Note over A1TERM,HC #lightsalmon
 Support HealthCheck Telemetry (FM, Heartbeat, PM)
End note

Note over HC
 RIC Self-Checks Initiated
End note

Group loop for each Platform Module
HC -> MOD : Perform HealthCheck
Note Left
 Support Platform Module HealthCheck
End Note
MOD -> HC : HealthCheck Status
End

HC -> O1TERM : Platform Module Alarm/Alert/Clear
O1TERM -> SMO_O1 : <<O1VES>> Alarm/Alert or Clear


Group loop for each xAPP instance deployed
HC -> xAPP : Perform HealthCheck
xAPP -> HC : HealthCheck Status
Note Left #lightsalmon
 Support xAPP HealthCheck
End note
End

HC -> O1TERM : xAPP Alarm/Alert/Clear
O1TERM -> SMO_O1 : <<O1VES>> Alarm/Alert or Clear

MF -> E2TERM : <<E2>> keep-alive
Note Left #lightsalmon : Support E2 Test Message Processing
E2TERM -> HC : <<E2>> REPORT
HC -> HC :
Note Right : Calculate missed heartbeat

HC -> O1TERM : E2 Alarm/Alert/Clear
O1TERM -> SMO_O1 : <<O1VES>> Alarm/Alert or Clear

HC -> O1TERM : Log HC results & update \nalarm-list in yang model
Note Left #lightsalmon : Support Alarm Retrieval from SMO, Dashboard



Alt If HC is performed on-demand, make file available to client
Note over RPGE : Publish Results
HC -> O1TERM : Performance File Available
O1TERM -> SMO_O1 : <<O1VES>> HealthCheck Performance File Available
SMO_O1 -> O1TERM : <<O1>>Get PM Report File 
SMO_O1 -> SMO_O1 : Make PM File Available for Sharing

End


@enduml

Note 1: Salmon-colored notes represent functions corresponding to Bronze release EPICs.  Yellow-colored notes represent comments/notes or implicit functions.

Note 2: Figure 1 above shows the flows assuming that the SMO is the northbound client that triggers the near-RT RIC.  The SMO consists of the O1 OAM adapter (supporting both O1VES and O1NetConf related messages/data) and the non-RT RIC (containing the A1 adapter).  The O-RAN SC implementation of the flows associated with this Health-check use case should create a simulated SMO for invoking requests and processing responses.  The simulated SMO should also provide a Test Driver (shown in Figures 2-4) for initiating requests to the SMO and receiving responses from it.  Alternatively, a Dashboard can also be the NB client that triggers these requests.

External Interfaces

For the health of external interfaces, the RIC is responsible for checking the health of its interface modules and their reachability/connectivity by external systems.  For the RIC interface modules (O1 Termination, A1 Mediator, and E2 Termination), the requirement is already described above as part of the RIC Self-Check.  For connectivity, the RIC needs to support heartbeats or keep-alive signals to ensure northbound connectivity (SMO-RIC, including other NB clients) and southbound connectivity (RIC-RAN Resources).

  • O1 interface - SMO-RIC connectivity over O1 Netconf and O1 VES is described in O1 RIC Health-Check (Flow #2)
  • A1 interface - SMO-RIC connectivity over A1 is not defined at this time as connectivity checks are done by NB clients (OTF or Test Driver) when making policy queries and policy creation/deletion.  See A1 RIC Health-Check (Flow #3).
  • E2 interface - RIC-RAN Managed Functions (O-CU, O-DU) over E2 is depicted in the diagram above.  For the initial implementation, the RIC's E2 Termination module checks keep-alive messages coming from RAN Managed Functions at regular intervals.  If the E2 Termination fails to receive keep-alive messages for a defined time period, it will declare an alarm on that E2 connection to the corresponding Managed Function.  
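
The E2 keep-alive rule described above (declare an alarm when no keep-alive arrives within a defined period) can be sketched as a last-seen table with a timeout. This Python example is illustrative only: the class name KeepAliveMonitor, the Managed Function identifiers, and the 10-second timeout in the test values are assumptions, not part of the E2 Termination implementation.

```python
import time
from typing import Dict, List, Optional


class KeepAliveMonitor:
    """Tracks the last keep-alive seen per Managed Function (e.g. O-CU, O-DU)."""

    def __init__(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def record(self, mf_id: str, now: Optional[float] = None) -> None:
        """Call whenever a keep-alive arrives from the given Managed Function."""
        self._last_seen[mf_id] = time.monotonic() if now is None else now

    def expired(self, now: Optional[float] = None) -> List[str]:
        """Return the Managed Functions whose keep-alive is overdue."""
        now = time.monotonic() if now is None else now
        return sorted(
            mf_id
            for mf_id, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )
```

Each identifier returned by expired() would be a candidate for an alarm on the corresponding E2 connection. The optional now argument exists only so the logic can be exercised with fixed timestamps; in normal use the monotonic clock is used.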

Latency

Since the role of the RIC is to enable near-real-time control loop actions, latency is an important set of telemetry to be collected and reported, covering both E2 latency and RIC processing latency.  E2 latency includes telemetry that measures incoming messages/data to the RIC as well as outgoing messages from the RIC to Managed Functions.  RIC processing latency is telemetry that measures the RIC/xAPP control loop processing time to receive messages, apply analytics, determine control loop actions, and generate outgoing messages to RAN Managed Functions over E2.  As the RIC matures release over release, latency telemetry must be defined and implemented.
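
As one possible way to collect RIC processing latency, the sketch below times a block of work and keeps simple per-metric statistics. It is a Python illustration under stated assumptions: the LatencyRecorder class and the metric name "ric_processing" are hypothetical, and the document does not yet define which latency measures are required.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List


class LatencyRecorder:
    """Collects latency samples in milliseconds, grouped by metric name."""

    def __init__(self) -> None:
        self._samples_ms: Dict[str, List[float]] = {}

    def add(self, name: str, value_ms: float) -> None:
        self._samples_ms.setdefault(name, []).append(value_ms)

    @contextmanager
    def measure(self, name: str) -> Iterator[None]:
        """Time the enclosed block and record its duration under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.add(name, (time.perf_counter() - start) * 1000.0)

    def summary(self, name: str) -> Dict[str, float]:
        """Return count/min/max/mean for a metric, or a zero count if empty."""
        samples = self._samples_ms.get(name, [])
        if not samples:
            return {"count": 0}
        return {
            "count": len(samples),
            "min_ms": min(samples),
            "max_ms": max(samples),
            "mean_ms": sum(samples) / len(samples),
        }
```

A control loop handler could wrap its receive-analyze-respond path in "with recorder.measure("ric_processing"):" and periodically report the summary as a PM measure.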

Near-RT RIC angle: RIC-30, RIC-33, RIC-35, RIC-69 for latency measurements

Note: It may be appropriate to group xAPP health checks for a subset of xAPPs that have dependencies on each other.  But for the initial implementation, each xAPP is treated independently.