Defect categorization underlies many studies of software defect detection. These studies assume that different subjects assign the same category to the same defect. Because this assumption has been questioned, we decided to study the phenomenon with the aim of providing empirical evidence. Because defects can be categorized according to different criteria, and the experience of the professionals involved in applying a criterion could affect the results, we further decided: (i) to focus on the IBM Orthogonal Defect Classification (ODC); (ii) to involve professionals only after stabilizing the process and materials with students. This paper reports our basic experiment. We analyze a benchmark of more than two thousand data points, obtained from twenty-four code segments, each seeded with one defect, and from one hundred twelve sophomores who were trained for six hours and then asked to classify those defects in a controlled environment for three continuous hours. The focus is on discrepancy among categorizers, and on the orthogonality, affinity, effectiveness, and efficiency of categorizations. Results show that: (i) training is necessary to achieve orthogonal and effective classifications and to obtain agreement between subjects; (ii) efficiency is five minutes per defect classification on average; (iii) there is affinity between some categories.
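As an illustration of how discrepancy among categorizers could be quantified, the sketch below computes Fleiss' kappa, a common chance-corrected agreement statistic for several raters. This is an assumption made for illustration: the abstract does not say which agreement measure the authors used. The function name, the toy ratings, and the choice of three ODC defect types are hypothetical and are not data from the study.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Chance-corrected agreement among several raters (Fleiss' kappa).

    ratings[i] lists the category each subject assigned to defect i.
    Every defect must be rated by the same number of subjects (at least 2).
    NOTE: illustrative choice of statistic, not the paper's stated method.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each defect needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: for each defect, the share of rater pairs that agree.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement if raters chose categories at the overall base rates.
    shares = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:  # only one category was ever used
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 3 defects, 4 subjects each, 3 ODC defect types.
CATEGORIES = ["Assignment", "Checking", "Algorithm"]
toy_ratings = [
    ["Assignment", "Assignment", "Assignment", "Checking"],
    ["Checking", "Checking", "Checking", "Checking"],
    ["Algorithm", "Assignment", "Algorithm", "Algorithm"],
]
kappa = fleiss_kappa(toy_ratings, CATEGORIES)
print(f"Fleiss' kappa on toy data: {kappa:.3f}")
```

A value of 1 means all subjects agree on every defect, while values near 0 mean agreement is no better than chance. The same per-defect counts could also be used to see which pairs of categories are most often confused, which is one way to read the affinity between categories mentioned in the abstract.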
Falessi, D., Cantone, G. (2008). Exploring feasibility of software defects orthogonal classification. In J. Filipe, B. Shishkov, M. Helfert (Eds.), Communications in Computer and Information Science (pp. 136-152). Heidelberg: Springer-Verlag [10.1007/978-3-540-70621-2_12].
Exploring feasibility of software defects orthogonal classification
FALESSI, DAVIDE;CANTONE, GIOVANNI
2008-01-01