A systemic approach to classification for knowledge discovery with applications to the identification of boundary equations in complex systems

Murari, A; Gelfusa, M; Lungaroni, M; Gaudio, P; Peluso, E
2022-01-01

Abstract

Classification, the discrimination between examples belonging to different classes, is a fundamental aspect of most scientific and engineering activities. Machine Learning (ML) tools have proved to perform very well in this task, in the sense that they can achieve very high success rates. However, both the "realism" and the interpretability of their models are low, leading to modest increases in knowledge and limited applicability, particularly in applications related to nonlinear and complex systems. This paper describes a methodology that, by applying ML tools directly to the data, allows the formulation of new scientific models that describe the actual "physics" determining the boundary between the classes. The proposed technique consists of a stack of different ML tools, each applied to a specific subtask of the scientific analysis; together they form a system that combines all the major strands of machine learning, from rule-based classifiers and Bayesian statistics to genetic programming and symbolic manipulation. To take into account the error bars of the measurements generating the data, an essential aspect of scientific inference, the novel concept of the Geodesic Distance on Gaussian manifolds is adopted. The properties of the methodology have been investigated with a series of systematic numerical tests for different types of classification problems. The potential of the approach to handle real data has been tested with various experimental databases, built using measurements collected in investigations of complex systems. The results indicate that the proposed method makes it possible to find physically meaningful mathematical equations that reflect the actual phenomena under study. The developed techniques therefore constitute a very useful information processing system for bridging the gap between data, machine learning models and scientific theories.
2022
Published
International relevance
Article
Anonymous expert reviewers
Sector FIS/01 - Experimental Physics
Sector ING-IND/18 - Nuclear Reactor Physics
English
Machine learning tools
Data driven theory
Support vector machines
Symbolic regression
CART
Knowledge discovery
Boundary equations
Complex systems
Murari, A., Gelfusa, M., Lungaroni, M., Gaudio, P., Peluso, E. (2022). A systemic approach to classification for knowledge discovery with applications to the identification of boundary equations in complex systems. ARTIFICIAL INTELLIGENCE REVIEW, 55(1), 255-289 [10.1007/s10462-021-10032-0].
Murari, A; Gelfusa, M; Lungaroni, M; Gaudio, P; Peluso, E
Journal article
Use this identifier to cite or link to this document: https://hdl.handle.net/2108/314318