Estimating the number of remaining links in traceability recovery


Although very important in software engineering, establishing traceability links between software artifacts is extremely tedious and error-prone, and it requires significant effort. Even when approaches for automated traceability recovery exist, they provide the requirements analyst with a ranked list of candidate links, usually very long, that must be inspected manually. In this paper we introduce an approach called Estimation of the Number of Remaining Links (ENRL), which aims at estimating, via Machine Learning (ML) classifiers, the number of remaining positive links in a ranked list of candidate traceability links produced by a recovery approach based on Natural Language Processing (NLP) techniques. We evaluated the accuracy of the ENRL approach by considering several ML classifiers and NLP techniques on three datasets from industry and academia, covering traceability links among different kinds of software artifacts, including requirements, use cases, design documents, source code, and test cases. Results from our study indicate that: (i) specific estimation models are able to provide accurate estimates of the number of remaining positive links; (ii) the estimation accuracy depends on the choice of the NLP technique; and (iii) univariate estimation models outperform multivariate ones. © 2016 Springer Science+Business Media New York

Falessi, D., Di Penta, M., Canfora, G., Cantone, G. (2017). Estimating the number of remaining links in traceability recovery. EMPIRICAL SOFTWARE ENGINEERING, 22(3), 996-1027 [10.1007/s10664-016-9460-6].


Falessi, D; Di Penta, M; Canfora, G; Cantone, G

2017-06-01



Publication date	Jun 2017
Publication status	Published
Article DOI	https://dx.doi.org/10.1007/s10664-016-9460-6
Relevance	International relevance
Type	Article
Referee	Anonymous experts
Disciplinary sector of the article (valid until 24/06/2024)	ING-INF/05 - Information Processing Systems
Content language	English
Keywords	information retrieval; traceability link recovery; metrics and measurement
Publication type	Journal article
Appears in types	01 - Journal article

Files in this item:

File	Type	License	Size	Format
EMSE-D-15-00203-remarks_BP2.pdf (authorized users only)	Pre-print document	Not specified	922.31 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/182574
