Snoring: A noise in defect prediction datasets


In order to develop and train defect prediction models, researchers rely on datasets in which a defect is often attributed to the release in which the defect itself is discovered. However, in many circumstances, a defect is discovered only several releases after its introduction. This can bias the dataset: the intermediate releases are treated as defect-free, and only the later release, in which the defect is discovered, is treated as defect-prone. We call this phenomenon 'sleeping defects'. We call 'snoring' the phenomenon whereby classes affected only by sleeping defects are treated as defect-free until the defect is discovered. In this paper, we analyze the magnitude of sleeping defects and snoring classes on data from 282 releases of six open-source projects from the Apache ecosystem. Our results indicate that 1) in all projects, most defects slept for more than 20% of the existing releases, and 2) in the majority of the projects, the missing rate is more than 25% even if we remove 50% of releases.
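The two notions defined in the abstract can be made concrete with a small sketch. The Python below is a hypothetical illustration, not the authors' tooling: the `Defect` record, the `sleeping_fraction` ratio (releases slept divided by the total number of releases), and the `snoring_pairs` rule (a class-release pair is snoring when it is affected only by defects discovered after the release at which the dataset is built, `cutoff`) are assumptions chosen to mirror the abstract's wording. It does not compute the paper's missing rate, whose exact definition is not given in this record.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class Defect:
    """Hypothetical defect record; releases are 0-based indices in release order."""
    defect_id: str
    introduced: int            # release in which the defect was introduced
    discovered: int            # release in which the defect was discovered (reported)
    classes: FrozenSet[str]    # names of the classes affected by the defect


def sleeping_fraction(defect: Defect, n_releases: int) -> float:
    """Fraction of the project's releases during which the defect slept,
    i.e., was present but not yet discovered (assumed definition)."""
    return (defect.discovered - defect.introduced) / n_releases


def snoring_pairs(defects: Iterable[Defect], cutoff: int) -> Set[Tuple[str, int]]:
    """(class, release) pairs that are snoring when the dataset is built at
    release `cutoff`: the class is defective in that release, but every defect
    affecting it there is discovered only after `cutoff`, so it is labeled
    defect-free.

    Assumptions: a defect affects its classes in every release from
    `introduced` through `discovered`, and defects discovered by `cutoff`
    are labeled in all of the releases they affect."""
    known: Set[Tuple[str, int]] = set()   # pairs explained by an already-discovered defect
    hidden: Set[Tuple[str, int]] = set()  # pairs affected by a defect still sleeping at `cutoff`
    for d in defects:
        target = known if d.discovered <= cutoff else hidden
        # Only releases up to the cutoff exist when the dataset is built.
        for release in range(d.introduced, min(d.discovered, cutoff) + 1):
            for cls in d.classes:
                target.add((cls, release))
    # Snoring: affected by sleeping defects only.
    return hidden - known


if __name__ == "__main__":
    defects = [
        Defect("D1", introduced=0, discovered=3, classes=frozenset({"A"})),
        Defect("D2", introduced=1, discovered=2, classes=frozenset({"B"})),
        Defect("D3", introduced=1, discovered=1, classes=frozenset({"A"})),
    ]
    print([sleeping_fraction(d, n_releases=5) for d in defects])  # [0.6, 0.2, 0.0]
    print(sorted(snoring_pairs(defects, cutoff=2)))  # [('A', 0), ('A', 2)]
```

With this example data, class `A` is snoring in releases 0 and 2: D3, the only defect discovered by the cutoff that touches `A`, explains release 1, while D1 is still asleep at release 2.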

Ahluwalia, A., Falessi, D., Di Penta, M. (2019). Snoring: A noise in defect prediction datasets. In IEEE International Working Conference on Mining Software Repositories (pp. 63-67). 1515 Broadway, New York, NY 10036-9998, USA: IEEE Computer Society [10.1109/MSR.2019.00019].


Ahluwalia A.; Falessi D.; Di Penta M.

2019-01-01



Conference name: 16th IEEE/ACM International Conference on Mining Software Repositories, MSR 2019
Conference year: 2019
Conference organizer(s): Association for Computing Machinery (ACM)
Conference relevance: International
Publication date: 2019
DOI of the contribution: https://dx.doi.org/10.1109/MSR.2019.00019
Scientific-disciplinary sector of the contribution (valid until 24/06/2024): ING-INF/05 - Information Processing Systems
Content language: English
Keywords: Dataset bias; Defect prediction; Fix-inducing changes
Type: Conference contribution
All authors: Ahluwalia, A; Falessi, D; Di Penta, M
Appears in types: 02 - Conference contribution

Files in this item:

File	Size	Format
08816788.pdf (authorized users only; type: publisher's version (PDF); license: publisher's copyright)	118.8 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/273900
