Huang, J., Shao, B., Ke, Y., Gao, Y., Mazzoni, S. (2026). PV-MM-diffusion: An end-to-end multi-modal diffusion model for ultra-short-term probabilistic photovoltaic forecasting. Applied Energy, 413 [10.1016/j.apenergy.2026.127816].
PV-MM-diffusion: An end-to-end multi-modal diffusion model for ultra-short-term probabilistic photovoltaic forecasting
Stefano Mazzoni
2026-01-01
Abstract
Accurate ultra-short-term photovoltaic (PV) power forecasting is critical for the secure and efficient operation of power systems with high levels of renewable generation. Rapid cloud-cover variability, however, induces large and sudden fluctuations in PV output, which existing models struggle to capture and for which they provide limited measures of uncertainty. In this work, we propose PV-MM-Diffusion, an end-to-end multimodal diffusion model that jointly generates multiple future sky images and corresponding PV power distributions via two coupled denoising autoencoders. A cross-modal attention mechanism fuses complementary information from sky imagery and historical PV time series, and an empty-frame placeholder strategy allows the model to operate with image-only, PV-only, or combined inputs. Experiments on a 30 kW rooftop system show that PV-MM-Diffusion substantially improves probabilistic forecasting: it achieves a continuous ranked probability score (CRPS) of 2.63 kW (an 18 % reduction relative to the SkyGPT→U-Net baseline) and a Winkler score (WS) of 21.46 kW (a 40 % reduction). The model delivers tighter and more reliable prediction intervals, especially during extreme ramp events and rapid cloud transitions. These results demonstrate the promise of diffusion-based multimodal frameworks for flexible, uncertainty-aware PV integration in future low-carbon power systems.
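For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how CRPS and the Winkler score are commonly computed for a single forecast. It is not the authors' evaluation code: the function names `crps_ensemble` and `winkler_score`, the sample-based CRPS estimator, and the choice of `alpha` are assumptions made here for illustration, and the abstract does not state which nominal coverage level or estimator the paper uses. The numbers in the usage lines are made-up values in kW.

```python
from typing import Sequence


def crps_ensemble(samples: Sequence[float], observed: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `samples` are draws from the forecast distribution (for example,
    several generated PV power values); `observed` is the measured value.
    Lower is better, and the result has the same unit as the inputs.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one forecast sample")
    # Mean absolute error of the samples against the observation.
    accuracy = sum(abs(x - observed) for x in samples) / n
    # Mean absolute difference between all pairs of samples.
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


def winkler_score(lower: float, upper: float, observed: float, alpha: float) -> float:
    """Winkler score of a central (1 - alpha) prediction interval.

    The score is the interval width plus a penalty of 2 / alpha times the
    distance by which the observation falls outside the interval.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    width = upper - lower
    if observed < lower:
        return width + 2.0 * (lower - observed) / alpha
    if observed > upper:
        return width + 2.0 * (observed - upper) / alpha
    return width


# Hypothetical values in kW, chosen only to exercise the functions.
example_crps = crps_ensemble([11.0, 13.0], observed=12.0)
example_ws_inside = winkler_score(10.0, 20.0, observed=15.0, alpha=0.5)
example_ws_outside = winkler_score(10.0, 20.0, observed=22.0, alpha=0.5)
```

Both scores are negatively oriented, so the reductions reported in the abstract indicate improvement. Dataset-level figures such as those quoted are typically averages of per-forecast scores, although the abstract itself does not spell out the aggregation.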


