On-board Computers (OBCs) are at the centre of space-faring systems. They provide computational performance to the system with high availability and dependability. However, these systems typically consist of expensive, slow, fault-tolerant hardware to cope with errors or failures during a mission. Commercial-off-the-shelf (COTS) components offer higher performance but do not provide these fault-tolerance mechanisms. The ScOSA (Scalable On-board Computing for Space Avionics) architecture combines COTS and rad-hard components in a distributed system, providing more computing performance than current OBCs while maintaining the dependability properties. ScOSA uses a middleware to manage the COTS components as a distributed system of nodes. In the event of a node failure, the middleware mitigates the effects by switching the system to a pre-determined configuration that excludes the failed node. These configurations are computed offline, and their memory usage grows exponentially with the number of nodes in the system, which limits the system's scalability. This paper presents an online reconfiguration algorithm as a solution to this scalability problem. When a node failure occurs, the online algorithm makes scheduling decisions at run-time, eliminating the need for pre-determined configurations. A novel online scheduling mechanism, consisting of six phases and combining fault-tolerance, parallelism, and the use of the real-time state of the system, is a step towards higher dependability in distributed on-board computing. The online reconfiguration is evaluated by comparing it to the offline reconfiguration in terms of time and network traffic, showing that it is not only capable of generating configurations dynamically but also provides a solution to the scalability problem.
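The scalability argument in the abstract can be illustrated with a small Python sketch. It is not the paper's six-phase algorithm: it assumes, purely for illustration, that the offline approach stores one configuration per non-empty set of surviving nodes, and that an online approach can simply move the tasks of a failed node to the least-loaded survivor. The names `count_offline_configs` and `reassign_online` are hypothetical.

```python
def count_offline_configs(num_nodes: int) -> int:
    """Number of stored configurations under the ASSUMPTION that every
    non-empty set of surviving nodes needs its own pre-computed entry."""
    return 2 ** num_nodes - 1


def reassign_online(assignment: dict[str, str], failed: str,
                    nodes: list[str]) -> dict[str, str]:
    """Illustrative run-time reassignment (NOT the paper's algorithm):
    move each task of the failed node to the currently least-loaded survivor."""
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise ValueError("no surviving node left to host tasks")

    # Current load = number of tasks already placed on each surviving node.
    load = {n: 0 for n in survivors}
    for node in assignment.values():
        if node != failed:
            load[node] += 1

    new_assignment = dict(assignment)
    orphaned = sorted(t for t, n in assignment.items() if n == failed)
    for task in orphaned:
        # Ties are broken by node name so the result is deterministic.
        target = min(survivors, key=lambda n: (load[n], n))
        new_assignment[task] = target
        load[target] += 1
    return new_assignment


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, "nodes ->", count_offline_configs(n), "stored configurations")
    tasks = {"t1": "A", "t2": "B", "t3": "B", "t4": "C"}
    print(reassign_online(tasks, failed="B", nodes=["A", "B", "C"]))
```

Under these assumptions the offline table grows as 2^n - 1 entries, while the online variant keeps only the current assignment and computes the next one when a failure is detected.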

Te Hofsté, G., Lund, A., Ottavi, M., Lüdtke, D. (2024). Towards the online reconfiguration of a dependable distributed on-board computer. In B.S. Dietmar Fey (Ed.), Architecture of Computing Systems (pp. 127-141). Cham: Springer [10.1007/978-3-031-66146-4_9].

Towards the online reconfiguration of a dependable distributed on-board computer

Ottavi, M.
2024-01-01

2024
Sector IINF-01/A - Electronics
English
International relevance
Scientific article in conference proceedings
Fault-Tolerance; On-board Computers; Embedded Systems; Reconfiguration; Middleware; Distributed Systems; Dependability; Self-Configuration; Self-Healing
Te Hofsté, G; Lund, A; Ottavi, M; Lüdtke, D
Book contribution

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/463826