Murari, A., Rossi, R., Wyss, I., Puleio, A., Rutigliano, N., Gaudio, P., et al. (2026). Probability density estimation beyond the dichotomy Parametric/Non-Parametric methods. INFORMATION SCIENCES, 750 [10.1016/j.ins.2026.123538].
Probability density estimation beyond the dichotomy Parametric/Non-Parametric methods
Rossi, Riccardo;Wyss, Ivan;Puleio, Alessandro;Rutigliano, Novella;Gaudio, Pasquale;Gelfusa, Michela
2026-01-01
Abstract
Probability density estimation is an essential task not only in statistics but also in machine learning and many applied scientific disciplines. Traditional methods for deriving the probability density function (pdf) from data fall into one of two classes: parametric or non-parametric techniques. The methodology developed in the present work overcomes this dichotomy by combining original deep learning tools with advanced symbolic regression. In the first step, neural networks with a specific architecture, called PDF-Nets, are trained to predict the local pdf from a set of observations using a self-tuning inhomogeneous binning approach. In the second step, symbolic regression extracts a mathematical expression from the PDF-Net results. Several improvements to symbolic regression have been devised for this task, to overcome the weaknesses of the available algorithms, which are oriented more towards classification and regression. The final result is an analytic expression describing the pdf with minimal prior constraints on its mathematical form. A series of systematic numerical tests shows that the developed tools are very competitive in accuracy with the most advanced available techniques, and that their advantage becomes more evident the sparser the data and the higher the level of noise affecting them.
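As a rough illustration of what "inhomogeneous binning" can mean in density estimation, the sketch below builds a piecewise-constant estimate on variable-width bins that each hold a similar number of samples. This is not the PDF-Net architecture or the authors' self-tuning scheme, neither of which is specified in the abstract. The function name `equal_count_density`, the parameter `n_bins`, and the equal-count rule are all assumptions chosen for the example.

```python
import random
from bisect import bisect_left


def equal_count_density(samples, n_bins=20):
    """Piecewise-constant density on variable-width, roughly equal-count bins.

    Illustrative assumption only: the abstract does not describe how the
    inhomogeneous bins are actually chosen.
    """
    xs = sorted(samples)
    n = len(xs)
    # Place edges at evenly spaced ranks, so bins are narrow where data are dense.
    edges = sorted({xs[(k * (n - 1)) // n_bins] for k in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("samples must contain at least two distinct values")

    centers, density = [], []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        start = bisect_left(xs, lo)
        # Bins are half-open [lo, hi), except the last one, which includes hi.
        stop = n if i == len(edges) - 2 else bisect_left(xs, hi)
        count = stop - start
        centers.append(0.5 * (lo + hi))
        # Normalise by bin width so the estimate integrates to one.
        density.append(count / (n * (hi - lo)))
    return centers, density, edges


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
centers, density, edges = equal_count_density(x, n_bins=20)
```

The pairs `(centers, density)` are the kind of local density values that a later curve-fitting or symbolic regression stage could take as input. In the work described above that role is played by the PDF-Net predictions, not by a histogram of this kind.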


