Nicosanti, S., Russo Russo, G., Cardellini, V. (2026). Energy- and quantization-aware DNN partitioning in the edge-cloud continuum (work in progress paper). In ICPE Companion '26: companion of the 17th ACM/SPEC International Conference on Performance Engineering (pp. 47-54). New York: ACM [10.1145/3777911.3801106].

Energy- and quantization-aware DNN partitioning in the edge-cloud continuum (work in progress paper)

Nicosanti, Simone; Russo Russo, Gabriele; Cardellini, Valeria
2026-05-03

Abstract

Deep neural networks (DNNs) are pervasive across many domains, and inference requests are often generated at the network edge, where resources are limited and energy efficiency is critical. Techniques such as Post-Training Quantization (PTQ) have also emerged to facilitate inference at the edge, trading some accuracy for lower resource demand. However, running inference entirely on devices can lead to high latency and excessive battery drain, while executing it exclusively in the cloud introduces communication delays and may have a significant environmental impact. Inference tasks must therefore carefully exploit both edge and cloud computing resources, leveraging DNN model splitting (or partitioning). In this work, we formulate a multi-objective optimization problem to distribute DNN model inference across the edge–cloud continuum while integrating PTQ. We develop a prototype architecture to profile DNN models and the underlying computing infrastructure, and we address the problem of estimating quantization noise. Evaluated on YOLO11 vision models, our approach reduces both inference time and energy consumption by up to 30% compared to device-only inference.
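The abstract does not detail the optimization model, so the following is only a toy illustration under stated assumptions, not the formulation used in the paper. It treats a DNN as a chain of layers: layers before a split point run on the device (optionally quantized to INT8), the remaining layers run in the cloud, and the intermediate activation is sent over the uplink. Each candidate is scored by a weighted sum of end-to-end latency and device-side energy, both normalized to device-only FP32 execution, subject to a budget on the estimated accuracy drop caused by quantization. The names `Layer`, `Env`, `evaluate`, and `choose_split`, the per-layer profiles, the INT8 speedup and size factors, and the additive accuracy-drop model are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    # Hypothetical per-layer profile (not measured data).
    device_ms: float   # FP32 latency of this layer on the device
    cloud_ms: float    # latency of this layer in the cloud
    out_kb: float      # FP32 size of this layer's output activation, in kB
    int8_drop: float   # assumed accuracy drop (points) if this layer runs in INT8


@dataclass
class Env:
    input_kb: float              # size of the raw input, in kB
    bandwidth_kb_per_ms: float   # device-to-cloud uplink bandwidth
    device_power_w: float        # device power while computing
    radio_power_w: float         # device power while transmitting
    int8_speedup: float = 0.5    # assumed INT8 latency factor on the device
    int8_size: float = 0.25      # INT8 activations are 1/4 the size of FP32


def evaluate(layers: List[Layer], env: Env, split: int,
             quantize: bool) -> Tuple[float, float, float]:
    """Return (latency_ms, device_energy_mJ, accuracy_drop) for one split.

    Layers [0, split) run on the device, layers [split, n) run in the cloud.
    """
    factor = env.int8_speedup if quantize else 1.0
    device_ms = sum(l.device_ms for l in layers[:split]) * factor
    cloud_ms = sum(l.cloud_ms for l in layers[split:])
    if split == len(layers):
        tx_kb = 0.0  # device-only: nothing is offloaded (result size ignored)
    else:
        # Send the raw input (split 0) or the last device-side activation.
        tx_kb = env.input_kb if split == 0 else layers[split - 1].out_kb
        if quantize and split > 0:
            tx_kb *= env.int8_size
    tx_ms = tx_kb / env.bandwidth_kb_per_ms
    latency = device_ms + tx_ms + cloud_ms
    energy = env.device_power_w * device_ms + env.radio_power_w * tx_ms  # W*ms = mJ
    drop = sum(l.int8_drop for l in layers[:split]) if quantize else 0.0
    return latency, energy, drop


def choose_split(layers: List[Layer], env: Env, w_time: float = 0.5,
                 w_energy: float = 0.5, max_drop: float = 1.0):
    """Exhaustively pick (score, split, quantize, latency, energy, drop)."""
    # Normalize against device-only FP32 execution, which is always feasible.
    base_lat, base_en, _ = evaluate(layers, env, len(layers), False)
    best = None
    for split in range(len(layers) + 1):
        for quantize in (False, True):
            lat, en, drop = evaluate(layers, env, split, quantize)
            if drop > max_drop:
                continue  # violates the accuracy budget
            score = w_time * lat / base_lat + w_energy * en / base_en
            if best is None or score < best[0]:
                best = (score, split, quantize, lat, en, drop)
    return best


# Usage with made-up numbers.
layers = [
    Layer(device_ms=20, cloud_ms=2, out_kb=800, int8_drop=0.2),
    Layer(device_ms=30, cloud_ms=3, out_kb=200, int8_drop=0.3),
    Layer(device_ms=40, cloud_ms=4, out_kb=50, int8_drop=0.4),
    Layer(device_ms=10, cloud_ms=1, out_kb=1, int8_drop=0.1),
]
env = Env(input_kb=600, bandwidth_kb_per_ms=10, device_power_w=2.0, radio_power_w=1.0)
score, split, q, lat, en, drop = choose_split(layers, env)
print(f"split={split} int8={q} latency={lat:.1f} ms energy={en:.1f} mJ")
# -> split=1 int8=True latency=38.0 ms energy=40.0 mJ
```

The weighted sum is used here only as the simplest way to scalarize two objectives. With these invented numbers, quantizing the first layer and offloading the rest wins because INT8 shrinks the transmitted activation by a factor of four, cutting both transmission time and radio energy.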
ACM/SPEC International Conference on Performance Engineering (ICPE '26)
Florence, Italy
2026
17
International relevance
Contribution
May 2026
3 May 2026
Sector IINF-05/A - Information processing systems
English
Deep learning; Edge computing; Model quantization
Conference paper
Files in this item:
icpe2026.pdf (publisher's version, Adobe PDF, 2.62 MB), open access, Creative Commons license

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/465925