APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters


Many scientific computations need multi-node parallelism to meet ever-increasing requirements in both space (memory) and time (speed). The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may result in large overheads due to the complex memory hierarchy. Additionally, the most demanding problems may easily require more than a Petaflops of sustained computing power, which calls for thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the Apelink+ host adapter, featuring a low-latency, high-bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe x8 Gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an overview of future work and intended developments.
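As a rough illustration of the linear-cost claim, the sketch below models a generic 3D torus in which every node links directly to its six nearest neighbors, with wraparound at the edges. It is not APEnet+ code: the function names `torus_neighbors` and `count_links` are hypothetical, and the model assumes every dimension has at least three nodes so that the six neighbors are distinct.

```python
from itertools import product

def torus_neighbors(coord, dims):
    """Return the six nearest neighbors of `coord` on a 3D torus of size `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (+1, -1):
            n = list(coord)
            # The modulo implements the wraparound link that closes each ring.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors

def count_links(dims):
    """Count the distinct bidirectional links in the whole torus."""
    links = set()
    for node in product(*(range(d) for d in dims)):
        for nb in torus_neighbors(node, dims):
            # A frozenset makes the link (a, b) identical to (b, a).
            links.add(frozenset((node, nb)))
    return len(links)

if __name__ == "__main__":
    for dims in [(4, 4, 4), (8, 8, 8)]:
        nodes = dims[0] * dims[1] * dims[2]
        print(dims, "nodes:", nodes, "links:", count_links(dims))
```

Under these assumptions the link count is always three times the node count, so the cabling and adapter cost of such a direct network grows linearly with the size of the cluster, with no central switch hierarchy to scale.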

Ammendola, R., Biagioni, A., Frezza, O., Lo Cicero, F., Lonardo, A., Paolucci, P., et al. (2010). APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters. POS PROCEEDINGS OF SCIENCE.


Ammendola, R; Biagioni, A; Frezza, O; Lo Cicero, F; Lonardo, A; Paolucci, P; Petronzio, R; Rossetti, D; Salamon, A; Salina, G; Simula, F; Tantalo, N; Tosoratto, L; Vicini, P

2010-01-01



Publication date: 2010
Publication status: Published
Relevance: International
Type: Article
Referee: Scientific committee
Disciplinary sector of the article (valid until 24/06/2024): FIS/02 - Theoretical Physics, Mathematical Models and Methods
Content language: English
Keywords: High Energy Physics - Lattice (hep-lat); Distributed, Parallel, and Cluster Computing
Other significant information: https://arxiv.org/abs/1012.0253
Typology: Journal article
Appears in the categories: 01 - Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/2108/193421
