Energy- and quantization-aware DNN partitioning in the edge-cloud continuum (work in progress paper)

IRIS

Deep neural networks (DNNs) are pervasive across many domains, and inference requests are often generated at the network edge, where resources are limited and energy efficiency is critical. Techniques such as Post-Training Quantization (PTQ) have emerged to facilitate inference at the edge by trading accuracy for lower resource demand. However, running inference entirely on devices can lead to high latency and excessive battery drain, while executing it exclusively in the cloud introduces communication delays and may have a significant environmental impact. Inference tasks must therefore carefully exploit both edge and cloud computing resources through DNN model splitting (or partitioning). In this work, we formulate a multi-objective optimization problem for distributing DNN model inference across the edge–cloud continuum while integrating PTQ. We develop a prototype architecture to profile DNN models and the underlying computing infrastructure, and we address the issue of estimating quantization noise. Evaluated on YOLO11 vision models, our approach reduces both inference time and energy consumption by up to 30% compared to device-only inference execution.

Nicosanti, S., Russo Russo, G., Cardellini, V. (2026). Energy- and quantization-aware DNN partitioning in the edge-cloud continuum (work in progress paper). In ICPE Companion '26: Companion of the 17th ACM/SPEC International Conference on Performance Engineering (pp. 47-54). New York: ACM [10.1145/3777911.3801106].

Nicosanti, Simone; Russo Russo, Gabriele; Cardellini, Valeria

2026-05-03

Conference name: ACM/SPEC International Conference on Performance Engineering (ICPE '26)
Conference location: Florence, Italy
Conference year: 2026
Conference edition: 17
Conference scope: International
Presentation type: Contributed
Presentation date: May 2026
Publication date: 3 May 2026
DOI: https://dx.doi.org/10.1145/3777911.3801106
Scientific-disciplinary sector (valid from 9 May 2024): IINF-05/A - Sistemi di elaborazione delle informazioni (Information processing systems)
Content language: English
Keywords: Deep learning; Edge computing; Model quantization
Type: Conference contribution
All authors: Nicosanti, S; Russo Russo, G; Cardellini, V
Appears in types: 02 - Conference contribution

Files in this record:

File	Size	Format
icpe2026.pdf (open access; type: publisher's version (PDF); license: Creative Commons)	2.62 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/465925
