Test Case Quality Factors

Article History: Received: 10 November 2020; Revised: 12 January 2021; Accepted: 27 January 2021; Published online: 05 April 2021
Abstract: Guaranteeing software quality is very important. Thus, before software is released to end users, its flaws should be detected using high-quality test cases. Currently, the quality of test cases is gauged without any particular model, and the criteria for good test cases remain unclear. Therefore, this study reviews the literature using the Systematic Literature Review (SLR) technique to identify the criteria of good test cases. The SLR considered papers published between 2010 and 2018 in the IEEE Xplore, ACM Digital Library, and Science Direct databases. The search found 310 related papers. After filtering with inclusion and exclusion criteria, 14 papers were retained for analysis. As a result, the study identified 30 quality factors from the selected articles. These quality factors were further inspected, organized, and finalized for incorporation as quality factors in test case evaluation metrics.


Introduction
Software testing is the most important phase for detecting defects when producing high-quality software [1]. It can determine how far risk is reduced. Effective and successful software testing is an issue worth exploring because it strongly affects the success of a project. In fact, Lai [2] found that the success rate of software is always under 40%. The effectiveness of testing relates to the quality of the test cases, which depends on the number of errors revealed [3], [4]. This implies that testing should reveal as many errors as possible so that the requirements are not jeopardized and the software meets an acceptable level of quality [5]-[7]. Various reasons for software failures have been identified, including misunderstanding among team members across different contexts, limited experience in designing test cases, and immature understanding [8]-[10]. These factors are plausible because designing good test cases is a complex art. There is no simple formula for generating test cases [11]. However, testers can focus on two things to improve the quality and productivity of software testing: identifying the most effective quality metrics and measuring test case quality [2]. Both quality and testing metrics are important [12]. In fact, various applications have used test case quality metrics, especially for evaluating existing test suites to ensure that sufficient testing is performed [13].
Accordingly, this study gathers previous work reported in papers published in IEEE Xplore, ACM Digital Library, and Science Direct to identify appropriate and usable testing metrics for measuring and evaluating test case quality. For this purpose, a Systematic Literature Review (SLR) was carried out. Altogether, 310 papers were found that match the purpose of this study. The procedure and details of the SLR protocol are discussed at length in Section 4.
This paper is organized as follows: Section 2 explains the background of test cases. Section 3 reviews related work. Section 4 describes the procedure for this study. Section 5 presents the results, and the last section summarizes the paper.

Background of Test Case
Generally, testing software is costly. Hence, it aims to uncover the maximum number of flaws [5]-[7], [14]. Testing has to be extensive, covering all possible ways the system can be used [15]. Accordingly, deciding how much testing is adequate really matters. [16] recommends continuing to test both functional and non-functional aspects until all critical risks are resolved.
The major risks that are difficult to handle include incomplete requirements analysis, evolving technology and context of use, rapidly changing requirements, and imperfect and inflexible management of resource allocation. In this regard, [2] recommends that designers and developers plan for early detection and prevention of flaws, which could help reduce the likelihood of flaws in the developed software. In such situations, test cases have to be well understood and effectively designed [17].

A Test Case
A test case consists of a set of preconditions, inputs (including actions, where applicable), and the expected results for those inputs. It is constructed to determine whether or not the specified part of the test item has been correctly implemented [18], [19]. It is a significant asset in software testing and in software development generally. A test case has the greatest impact when it can detect flaws with high confidence, especially flaws that are hard to find. Moreover, test cases are better when they yield more reliable results, improved performance, and lower cost in terms of scheduling reliability, testability, and productivity [2], [6], [20], [21].
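The structure described above can be made concrete with a small sketch. The Python below models a test case as preconditions, inputs, and an expected result, and checks it against a test item. The names `SoftwareTestCase`, `run_test_case`, and the `withdraw` function are hypothetical and chosen only for illustration; they are not taken from the cited sources.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SoftwareTestCase:
    """A test case: preconditions, inputs, and the expected result (hypothetical model)."""
    name: str
    preconditions: dict[str, Any]   # state that must hold before the test runs
    inputs: dict[str, Any]          # inputs (or actions) applied to the test item
    expected: Any                   # expected result for those inputs


def run_test_case(case: SoftwareTestCase, item: Callable[..., Any]) -> bool:
    """Return True if the test item produces the expected result."""
    actual = item(**case.preconditions, **case.inputs)
    return actual == case.expected


# Hypothetical test item: withdraw an amount from an account balance.
def withdraw(balance: int, amount: int) -> int:
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


case = SoftwareTestCase(
    name="withdraw within balance",
    preconditions={"balance": 100},
    inputs={"amount": 30},
    expected=70,
)
passed = run_test_case(case, withdraw)
```

Keeping preconditions separate from inputs mirrors the definition in the text; a real test framework would usually express preconditions as setup code rather than as function arguments.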

A Test Case Design
The quality of test cases is paramount in software testing. It substantially determines the soundness of the tests, the flaws discovered, and the ultimate achievements. Good test cases eventually lead to the discovery of flaws, especially in the code [2], [10]. This implies that test cases have to be well designed and comprehensive for the software being tested [17]. Some common tests are carried out across varying software [11]. Beyond the common ones, writing test cases from scratch is very important, but it is also very difficult. In designing test cases, it is important to ensure that testing achieves a certain level of thoroughness [22]. Hence, software testers must have sufficient skills to write good test cases [21], [23], [24], which requires them to have a clear knowledge of the system being tested [25]. According to Paruch et al. [24], testers should be creative, curious, structured, able to understand the big picture, friendly, and able to provide constructive feedback. Regarding why writing test cases is difficult, [11] gives the following reasons:
• People tend to create test cases according to particular styles of testing, for example domain testing or risk-based testing.
• Test cases can be good in a variety of ways, but no test case will be good in all of them.
• Test cases help to discover information, and different types of tests are more effective for different classes of information.
Besides, [10] found that a deep understanding of test case construction, including the mistakes to avoid in it, is necessary for producing good test cases. Meanwhile, [23] discovered that the difficulties also stem from unclear requirements. Unclear requirements have to be avoided because the better the test cases, the more flaws are discovered, which eventually results in higher quality [9], [13].

A Test Case Quality
Software development failures around the world, which result in tremendous losses of money and time, have increased awareness of software quality. Software quality has become a major research area that cannot be ignored [26]. As a result, universal standards have been stipulated for software quality. Specifically, ISO/IEC 9126 and ISO/IEC 25010 define quality as "the extent to which the system satisfies the stated and implied needs of its various users" [27]-[29]. In testing, test case quality is the attribute for fault level in the testing phase [14].
Although there are standards for software quality, the task of measuring it is daunting [30]. The difficulty of software testing varies with the size and complexity of the software being tested [31].
In every software testing effort, the tester must regard the quality of the test cases as a very important goal [9], [32], [33]. Test cases have to be generated carefully. In generating them, the tester has to carefully select [34] and prioritize [13], [35] them so that the software is free of failures in operation [36], [37]. Doing so can increase software productivity [38] and reliability [26], [33], [39].
There are many criteria for the quality of test cases. One of them is the breadth of coverage of the functionalities in the system being tested [40]. In addition, [14] notes that various dimensions have to be considered in ensuring the quality of test cases. The common dimensions include code defect density, failure rate, cumulative failure profile, coverage factor, fault days number, fault density, modular test coverage, minimal unit test case determination, and requirement specification change requests. Additionally, user satisfaction is also a quality attribute [41].
The standards (ISO/IEC 9126 and ISO/IEC 25010) can be used to validate test cases to ensure they are acceptable [42]. Some quality characteristics are given in ISO/IEC 25010:2011. However, [43] found that applying them is quite challenging for some testers because of operational complications. This implies a need for quality factors/metrics that beginner testers can easily refer to when producing high-quality test cases. In response, this study aims to identify good quality factors/metrics for test cases.

Related Work
A metric is a function that assigns a value to an attribute [44]. A software metric, meanwhile, is a way of measuring software, including its development process [45], that makes use of a metric. Further, IEEE 1061-1998 defines a software quality metric as "A function whose inputs are software data and whose output is a single numerical value that can be interpreted as the degree to which software possesses a given attribute that affects its quality" [46]. [2] emphasizes that effective test case quality metrics are paramount in raising the quality and productivity of software. Various researchers have investigated related perspectives on quality and quality metrics. One common example is the work by [47], which examined the quality features of test cases generated with the test-first method. They used the quality of test cases to compare software development approaches: they gauged the quality of code produced by the test-first and test-last approaches and examined how test case quality varied between the two. The total number of failing assertions, mutation score, and code coverage were used as three quality indicators for measuring the designed test cases. Moreover, an interface was enforced, which allowed one participant's test cases to be executed on the other participants' code.
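As a rough illustration of the three indicators named above, the sketch below computes them from raw counts. It assumes the commonly used definitions (mutation score as killed mutants divided by generated mutants, code coverage as executed statements divided by executable statements), which may differ in detail from those used in [47]. The class name `SuiteReport` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuiteReport:
    """Raw counts for one test suite; every number used below is made up."""
    failing_assertions: int   # assertions that fail when the suite is run
    mutants_killed: int       # mutants detected by at least one test
    mutants_total: int        # mutants generated for the code under test
    lines_covered: int        # statements executed by the suite
    lines_total: int          # executable statements in the code under test

    def mutation_score(self) -> float:
        # Assumed definition: fraction of generated mutants that the suite kills.
        return self.mutants_killed / self.mutants_total

    def code_coverage(self) -> float:
        # Assumed definition: fraction of executable statements that are executed.
        return self.lines_covered / self.lines_total


# Two hypothetical suites, used only to show how the indicators are compared.
suite_a = SuiteReport(failing_assertions=2, mutants_killed=45, mutants_total=60,
                      lines_covered=180, lines_total=200)
suite_b = SuiteReport(failing_assertions=5, mutants_killed=30, mutants_total=60,
                      lines_covered=150, lines_total=200)

for name, suite in (("A", suite_a), ("B", suite_b)):
    print(f"suite {name}: failing assertions={suite.failing_assertions}, "
          f"mutation score={suite.mutation_score():.2f}, "
          f"coverage={suite.code_coverage():.2f}")
```

Each indicator satisfies the IEEE 1061-1998 definition quoted above in the sense that it maps software data to a single numerical value.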
In this regard, [2] proposed a test case quality measurement model for Iterative and Incremental Development (IID). The model comprises thirteen features, which are classified into manageability, qualified documentation, reusability quality characteristics and maintainability indicators. [30] proposed a multi-dimensional measurement of test case quality. For them, not only the number of detected flaws is important but also other features such as source code and usage profiles.
Earlier, [44] proposed a set of ten questions regarding software engineering metrics, coupled with a framework on the procedure for performing the evaluation. More recently, [48] proposed a metric-driven approach comprising 20 metrics to assess the inherent quality features of a dataset before it is released to the Linked Open Data Cloud. Based on an SLR and the ISO/IEC 25012 standard, they selected five inherent quality characteristics: syntactic accuracy, semantic accuracy, consistency, uniqueness, and completeness.
Later, [12] underlined the reasons for and effects of using metrics in industrial agile development. They extracted 102 metrics from previous work reported in the literature, considering only the metrics used by agile teams. They found that the use of metrics may lead to dysfunctional behavior because of its negative effects.
Although those metrics have been shared, researchers consider them debatable. Hence, researchers continue to search for appropriate metrics for ensuring the quality of test cases [49]-[52].

Research Methodology
This study uses a Systematic Literature Review (SLR) as its research methodology. It is appropriate because this study aims at understanding a problem rather than attempting to solve it [12]. In particular, this study intends to identify gaps in former research, synthesize existing knowledge on the research topic, provide a repeatable research method that, when applied properly, gives sufficient detail for other researchers to use, and supply background information for exploring a new research topic [12]. For this purpose, this study adapted the guideline provided by [53], which serves as the basis for the SLR protocol. In the execution, this study collected and reviewed work on test case quality published between 2010 and 2018 in order to identify the factors and metrics of good test cases.

Research Questions
The core purpose of this study is to determine the factors that affect the quality of test cases. In particular, this study focuses on the metrics and measurements of test cases that make for high-quality testing. In support of that, the following research questions need to be answered:
RQ1: How much research activity between 2010 and 2018 is related to the quality of test cases?
RQ2: What are the quality factors/metrics for producing a good test case?
RQ3: Is the effectiveness of a test case affected by the quality factors/metrics?

Search and Selection Process
The search and selection process was carried out to select the primary studies. It consists of three steps, with the resulting paper counts detailed in Table 1.
Step 1: Selecting Source Repositories. Suitable databases were selected in this step. This study considered only IEEE Xplore, ACM Digital Library, and Science Direct, which are the most appropriate for the field of study, software engineering. This was decided based on the observation in [34] that IEEE and ACM cover almost all prominent conferences in software engineering, while Science Direct covers nearly all important journals in software engineering. The search began by entering keywords related to the research questions. To obtain the most relevant results, this study combined the keywords with the OR and AND operators and restricted the time span to between 2010 and 2018. Two stages of searching were used. The first used the string ("test case" OR "test case quality") AND ("metrics" OR "factors" OR "indicators"), which resulted in 268 papers, as detailed in Table 1. Having read the articles, this study found that some studies use the term "effectiveness of test cases" instead of "quality of test cases". Therefore, the second stage was performed with the string "test case effectiveness" OR "the effectiveness of test case". It resulted in 42 papers, as detailed in Table 1.
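The two search stages can be written down as plain boolean strings. The sketch below assembles them from their terms and expresses the 2010 to 2018 time span as a simple filter. The helper names `any_of` and `in_time_span` are hypothetical, and the exact query syntax accepted by each database is an assumption, so the strings would likely need adapting per database.

```python
def any_of(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


# Stage 1: quality-oriented terms combined with metric-oriented terms.
stage1 = (any_of(["test case", "test case quality"])
          + " AND "
          + any_of(["metrics", "factors", "indicators"]))

# Stage 2: added after noticing that some studies speak of "effectiveness".
stage2 = any_of(["test case effectiveness", "the effectiveness of test case"])

# Publication years considered by the review (2010 to 2018 inclusive).
YEARS = range(2010, 2019)


def in_time_span(year: int) -> bool:
    """Return True if a publication year falls inside the review window."""
    return year in YEARS


print(stage1)
print(stage2)
```

Building the strings from term lists makes it easy to add a synonym later without retyping the whole query, which is the situation that led to the second stage.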
Step 2: Reading Titles and Abstracts. In this step, the titles and abstracts of the papers were read and the inclusion and exclusion criteria (section C) were applied. Where an abstract was unclear, the content of the paper was scanned. As a result, 39 papers were retained from the first stage and 15 from the second stage, giving 54 papers in total, as detailed in Table 1.
Step 3: Reading Full Text. The full text of each selected paper was then gathered and carefully read. Eventually, based on the selection criteria, 14 papers were selected because they meet the requirements of this study.
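The counts reported across the three steps can be cross-checked with simple arithmetic. The short sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Step 1: papers returned by each search stage.
found = {"stage 1": 268, "stage 2": 42}

# Step 2: papers kept after reading titles and abstracts.
screened = {"stage 1": 39, "stage 2": 15}

# Step 3: papers kept after reading the full text.
final_selected = 14

total_found = sum(found.values())          # 268 + 42
total_screened = sum(screened.values())    # 39 + 15

# Share of the initially found papers that became primary studies.
retention = final_selected / total_found

print(f"found={total_found}, screened={total_screened}, "
      f"selected={final_selected}, retention={retention:.1%}")
```

The totals agree with the 310 and 54 papers reported in the text, and roughly 4.5% of the papers found were retained as primary studies.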

Data Extraction
This study extracted data by carefully and critically reading through the full papers. The data extraction involved two phases. First, standard information [53] was collected, which includes the publication year, author names, title, and summary of the study. Second, information directly related to the research questions of this study was collected.

Results
This section provides the answers to the research questions together with the SLR results.
RQ1: How much research activity between 2010 and 2018 is related to the quality of test cases? The answer to this question is shown in Tables 1 and 2. The total number of papers related to test case quality is 310. However, only 14 papers are deemed the most relevant, as listed in Table 2. Subsection 5.1 provides more details about the selected studies.

Overview of Studies
This section gives an overview of the primary studies related to test case quality. There are 14 papers in the IEEE Xplore, ACM Digital Library, and Science Direct databases between 2010 and 2018 reporting on test case quality (as detailed in Table 1). Most of the studies (8) are published in the ACM Digital Library, followed by IEEE Xplore (5) and Science Direct (1). Further, Table 2 presents the details of the 14 papers. The most similar study is S3, which was conducted in 2017. However, that study focuses only on test case selection techniques rather than the quality of test cases. Thus, for the past eight years, this is the first study performed to identify the quality factors and metrics for producing high-quality test cases as well as good testing.
Almost all studies describe the quality of test cases in terms of structural design (code-based), whilst only one addresses test case generation quality with both specification-based (black box) and white box methods.

RQ2: What are the quality factors/metrics for producing a good test case? The answer to this question is given in Table 3. Referring to Table 3, 30 quality metrics have been identified from the 14 primary studies. The most used metric is coverage [S3, S4, S6, S8, S9, S12, and S14], which has various types such as branch, statement, condition, and method coverage. Among these studies, only one used all types of the coverage metric [S3], while the others used only some of them [S4, S6, S8, S9, S12, and S14]. Coverage is considered a good indicator to use as a proxy for evaluating the quality and completeness of test suites [34]. However, S3 and S12 do not recommend relying on coverage alone, because it is not a sufficient measure of a test suite's effectiveness. For them, coverage has to be used together with other metrics. Meanwhile, S9 and S14 used the branch coverage metric for comparison with their proposed metrics. On the other hand, S10 used mutation rather than coverage because mutation indicates not only where to test but also what to test for. In contrast, S13 tried to improve the quality of test cases by analyzing the mistakes in test cases based on the knowledge of the test case writers, instead of providing quality metrics. They found that most test cases are of deficient quality because their writers lack an understanding of the related knowledge that is essential for test case design.
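The caution against relying on coverage alone can be illustrated with a small, self-contained sketch. It is not taken from S3, S10, or S12; the function `is_adult`, its hand-written mutant, and both suites are hypothetical. The weak suite executes both branches of the code under test, so its branch coverage is complete, yet it cannot distinguish the mutant. Adding a boundary value, which is what the mutant points to, makes the suite kill it.

```python
def is_adult(age: int) -> bool:
    """Code under test (hypothetical)."""
    if age >= 18:
        return True
    return False


def is_adult_mutant(age: int) -> bool:
    """Hand-written mutant: the operator '>=' is replaced by '>'."""
    if age > 18:
        return True
    return False


def weak_suite(fn) -> bool:
    """Passes if fn behaves as expected for 30 and 5.

    Both branches of fn are executed, but the boundary value 18 is not tested.
    """
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Same checks as weak_suite plus the boundary value 18."""
    return weak_suite(fn) and fn(18) is True


# Both suites pass on the original code.
original_passes = weak_suite(is_adult) and strong_suite(is_adult)

# A mutant is "killed" when the suite fails on it.
weak_kills_mutant = not weak_suite(is_adult_mutant)
strong_kills_mutant = not strong_suite(is_adult_mutant)

print(original_passes, weak_kills_mutant, strong_kills_mutant)
```

In this sketch the surviving mutant shows what to test for (the boundary at 18), whereas coverage only shows which parts of the code were executed.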
In general, all quality metrics identified from the selected primary studies are used for producing good test cases. The metrics were identified based on the current release of the system, the previous release, the experience of the test team, the diagnosability of the test cases, or similarity.
RQ3: Is the effectiveness of a test case affected by the quality factors/metrics? According to S4 and S11, test case effectiveness refers to the ability of a test case to detect more flaws, measured by the number of flaws revealed. The more failures a test case reveals, the higher its quality is likely to be. Thus, the results show that the effectiveness of test cases is affected by test case quality metrics. However, the coverage metric should not be used alone because it is a poor predictor of test case effectiveness [S3, S12]. It gives an assessment of its efficiency by pinpointing the root cause of a defect when the fault is recognized [S14].

Conclusion
Towards building high-quality software testing, thirty quality metrics have been identified from 14 primary studies through an SLR. As stated by former studies, the effectiveness of test cases in discovering flaws in most applications is influenced significantly by software quality metrics. In addition, for different applications, the metrics may help create good-quality test cases as well as evaluate test case quality. In future, the scope of the research will be expanded to include additional data repositories to obtain as many related articles as possible. In addition, the plan includes constructing a standard for test case quality that can be used in different applications.