Assessing Statistical Literacy Level of Postgraduate Education Research Students in Malaysian Research Universities

Statistical literacy is an essential component of research literacy that research students need to master, as they are required to read, comprehend, and evaluate research articles. Misinterpretation of data and research findings is among the unfavorable consequences of a lack of research literacy. This, in turn, affects the quality of students' research and may eventually have a ripple effect on other researchers. However, statistical literacy remains under-researched, especially among postgraduate research students. Therefore, this paper examined the statistical literacy level of postgraduate research students. This survey was conducted with a sample of 236 postgraduate education research students using a set of statistical literacy items. The data were analysed using the Rasch analysis approach, which includes item and person measures. The findings suggested that postgraduate students' statistical literacy is at the Moderate Low level. The findings also revealed that the items students found hardest were those related to hypothesis testing (significance values). To help students improve their statistical literacy, this study suggests that instructors and institutions reexamine existing methods of teaching and learning statistics and explore new ones.


Introduction
Reading, evaluating, and interpreting primary research articles is an inevitable part of postgraduate study. Postgraduate students are expected to be both consumers and producers of research (e.g., theses and research articles). Primary research articles generally comprise an abstract, introduction, methodology, findings, and discussion and conclusion. In research articles, especially quantitative ones, findings are commonly presented in the form of charts, graphs, tables, and similar displays. Quantitative approaches usually employ measurement, experimentation, and statistical analysis to test hypotheses and answer research questions. Therefore, reading research articles that involve statistical analyses and results requires students to understand, interpret, and rationalise the methods applied, and to relate them to the data and discussion presented. All of these key skills are associated with statistical literacy.
Research articles are more complex than simple reports, as they are intended for professional audiences such as researchers, lecturers, and practitioners in particular fields. There is some evidence that students perceive the methodology and data interpretation sections as the hardest parts of research articles to understand (Lie et al., 2016; Round & Campbell, 2013). Lacking essential knowledge of statistics, students took longer to understand the figures and data sections (Round & Campbell, 2013). Consequently, students resorted to spending more time reading the text of the articles and treated it as fact instead of understanding the data and figures (Lie et al., 2016). In addition, a lack of statistical literacy has been reported to result in the misinterpretation and misuse of research results (Gardenier & Resnik, 2002), which affects the quality of students' own studies and, eventually, the work of other researchers who refer to their articles. Although the misuse of statistics in empirical research may have various other causes, such as deliberate deception and negligence, most studies agree on one contributing factor: a lack of statistical and research knowledge (Ercan et al., 2007; Gardenier & Resnik, 2002; Repišti, 2015). Acquiring statistical literacy is therefore imperative for students to be able to interpret and use the data presented to them. However, studies on statistical literacy are still insufficient, specifically among postgraduate research students in Malaysia. This may be due to the assumption that postgraduate students, especially doctoral students, should have little or no difficulty with academic reading skills (Burgess et al., 2012; Singh, 2014).
In fact, the difficulty of interpreting research articles is not limited to undergraduate students; it is also experienced at higher levels of study, by postgraduate students and postdoctoral researchers (Hubbard & Dunbar, 2017). Therefore, this paper assesses the statistical literacy of postgraduate research students in Malaysia based on their ability to interpret statistical analyses of data presented as numerical information, charts, tables, and graphs.

The Concepts of Statistical Literacy
The importance of statistical literacy is highlighted by the Australian Bureau of Statistics (ABS, 2013): "Statistics help you to understand and learn from the past, make sense of the present, and make inferences about the future. The value of statistics is only as great as your ability to accurately understand, interpret and evaluate the available information." The ABS (2013) advocated that statistically literate individuals are able to make sense of statistics by thinking critically about the numbers presented, and to understand and discuss the data. The ABS distinguishes being statistically numerate, that is, able to work with numbers, from being statistically literate, that is, able to make sense of the numbers. Additionally, the ABS suggested that statistical literacy is necessary for all data consumers, although the required level of literacy can differ. For instance, people who are required to analyze and interpret data (e.g., data analysts, researchers, and students) require more advanced skills than those who only need to use information involving statistics.
Although statistical literacy has not been clearly defined, a common definition has been used by several authors, such as Gal (2002), Ferligoj (2015), Sharma (2017), and Wallman (1993). Gal (2002) proposed that statistical literacy consists of two components: (i) the ability to interpret and critically evaluate statistical information within varied contexts, and (ii) the ability to discuss statistical information. Both Wallman (1993) and Gal (2002) defined statistical literacy in the context of data consumers rather than of those who engage directly in the empirical investigation of actual data. Their notion of statistical literacy has been used and adapted in many studies in educational contexts, such as Lery et al. (2015), Martinez-Dawson (2010), Sharma (2017), and Yotongyos et al. (2015).
Based on reviews of statistical literacy concepts (Ben-Zvi & Garfield, 2004; Gonulal, 2016; Groß Ophoff et al., 2017; Reston, 2005; Shank & Brown, 2007), this paper proposes three main components of statistical literacy that research students should master: knowledge of statistical concepts and terminologies, knowledge of types of statistical tests, and the ability to interpret statistical analyses. Statistical concepts and terminologies include the concepts of population, samples, and representativeness (Ben-Zvi & Garfield, 2004; Gonulal, 2016; Reston, 2005; Shank & Brown, 2007); types of data (Shank & Brown, 2007); central tendency (mode, median, mean) and other measures such as range, standard deviation, and frequency (Gonulal, 2016; Reston, 2005; Shank & Brown, 2007); and significance testing (p-values and hypotheses). These concepts are the most fundamental knowledge that students should master (Ben-Zvi & Garfield, 2004). This is crucial because, for example, a student whose data are measured on a nominal scale must know that the mean cannot be used to describe them.
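As an illustration of these basic concepts, the minimal sketch below computes the measures of central tendency and dispersion listed above with Python's standard `statistics` module, and shows why only the mode is meaningful for nominal data. The scores and category labels are made-up illustrative values, not data from this study.

```python
import statistics

# Hypothetical interval-scale data: test scores of eight students (illustrative only).
scores = [12, 15, 15, 18, 20, 22, 25, 29]

mean = statistics.mean(scores)          # arithmetic mean
median = statistics.median(scores)      # average of the two middle values (even n)
mode = statistics.mode(scores)          # most frequent value
data_range = max(scores) - min(scores)  # range = maximum - minimum
sd = statistics.stdev(scores)           # sample standard deviation (n - 1 denominator)
print(f"mean={mean}, median={median}, mode={mode}, range={data_range}, sd={sd:.2f}")

# Hypothetical nominal data: categories have no numeric order or distance,
# so the mode is the only meaningful measure of central tendency.
mode_of_study = ["full-time", "part-time", "full-time", "full-time", "part-time"]
print("modal category:", statistics.mode(mode_of_study))

# Numeric codes (1 = full-time, 2 = part-time) can be averaged mechanically,
# but the resulting "mean" does not describe any real category.
codes = [1, 2, 1, 1, 2]
print("mean of codes (not interpretable):", statistics.mean(codes))
```

The last line runs without error, which is exactly the trap: software will happily average category codes, so recognising the measurement scale is the student's responsibility.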
The second component is knowledge of types of statistical tests. This component focuses on whether students can identify which technique is appropriate for analyzing data (Gonulal, 2016; Reston, 2005; Shank & Brown, 2007). There are two broad types of analysis: descriptive statistics and inferential statistics. Descriptive statistics include the mean, mode, median, standard deviation, frequency, and range, while inferential statistics comprise non-parametric and parametric tests. Non-parametric tests include the chi-square, Kruskal-Wallis, and Mann-Whitney U tests, while parametric tests include the t-test, ANOVA, and MANOVA. This second component requires students to have adequate knowledge of the first, especially of statistical concepts.
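The kind of reasoning this component targets can be sketched as a small decision rule. The function below, `suggest_test`, is a hypothetical and deliberately simplified rule of thumb, not an instrument from this study: it assumes one outcome variable compared across independent groups, and ignores other considerations that matter in practice (paired designs, sample size, equal variances, multiple outcomes as in MANOVA).

```python
VALID_LEVELS = {"nominal", "ordinal", "interval", "ratio"}


def suggest_test(level: str, groups: int, normal: bool = True) -> str:
    """Hypothetical helper: a commonly suggested test for comparing independent groups.

    level:  measurement level of the single outcome variable
    groups: number of independent groups being compared
    normal: whether an interval/ratio outcome is approximately normally distributed
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown measurement level: {level!r}")
    if groups < 2:
        raise ValueError("a group comparison needs at least two groups")
    if level == "nominal":
        # Counts in categories: test the association between group and category.
        return "chi-square test of independence"
    # Parametric tests assume interval/ratio data that are roughly normal;
    # otherwise fall back to the rank-based (non-parametric) alternative.
    parametric = level in ("interval", "ratio") and normal
    if groups == 2:
        return "independent-samples t-test" if parametric else "Mann-Whitney U test"
    return "one-way ANOVA" if parametric else "Kruskal-Wallis test"


print(suggest_test("interval", 2))  # e.g. roughly normal test scores, two groups
print(suggest_test("ordinal", 3))   # e.g. ranked responses, three groups
print(suggest_test("nominal", 2))   # e.g. categories such as mode of study
```

Each parametric test is paired with its usual non-parametric counterpart, which mirrors the parametric and non-parametric lists above.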
The last component is the ability to interpret statistical analyses. Knowledge of statistical concepts and an understanding of statistical approaches are prerequisites for interpreting statistical analyses (Ben-Zvi & Garfield, 2004). Thus, this study considers that an assessment of the ability to interpret statistical analyses can cover both knowledge of statistical concepts and knowledge of statistical test approaches. Students should be able to interpret statistical information and draw conclusions from it (Ben-Zvi & Garfield, 2004; Gal, 2002; Sharma, 2017).
Thus, this paper proposes three domains of statistical literacy:
(i) Familiarity with basic statistical concepts and terminologies:
   a. central tendency (mode, median, mean) and dispersion (range, standard deviation)
   b. significance testing (hypothesis testing, p-value)
   c. types of measurement data
(ii) Statistical tests (parametric and non-parametric)
(iii) Interpretation of statistical analyses (data presented in tables and graphs)

Methodology
This survey was conducted in five Malaysian research universities. As an ethical consideration, a letter requesting permission to conduct the survey was emailed to the deputy dean of each faculty involved prior to data collection. The sample was stratified into two groups: (i) doctoral students and (ii) master's students. A total of 236 postgraduate education research students were purposively selected, of whom 126 were doctoral students and 110 were master's students.

Instrument
The test consisted of 15 multiple-choice items, as shown in Table 1, with five items for each of the three components of statistical literacy. All items were validated before the test was distributed to the participants.

Data Analysis
The data were analysed using the Rasch analysis approach. The analyses involved item and person measures, item separation, item strata, and frequencies. The item separation index and person measures were generated with the Winsteps software. The thresholds of statistical literacy level were determined from the item strata, calculated as H = (4Q + 1)/3, where Q is the item separation index obtained from Winsteps.
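The strata formula can be checked with the item separation index of 4.13 reported for this test. The minimal sketch below evaluates H = (4Q + 1)/3; the function name `item_strata` is ours, and rounding H to the nearest whole number is our reading of how six strata were obtained.

```python
def item_strata(separation: float) -> float:
    """Number of statistically distinct strata, H = (4Q + 1) / 3,
    where Q is the item separation index (e.g. as reported by Winsteps)."""
    return (4 * separation + 1) / 3


h = item_strata(4.13)  # item separation index reported for this test
print(f"H = {h:.2f}, i.e. about {round(h)} strata")
```

With Q = 4.13 this gives H ≈ 5.84, which rounds to the six strata used to define the literacy levels.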
With an item separation index of 4.13, H = (4 × 4.13 + 1)/3 ≈ 5.84, giving six strata, which were classified as Very Low, Low, Moderate Low, Moderate High, High, and Very High. Based on a mean (M) of 0.00 and a standard deviation (SD) of 0.62, the threshold and specification of each stratum are shown in Table 2.

Table 2. Thresholds of statistical literacy level (M = 0.00, SD = 0.62)
Level | Threshold | Logit range
Very Low | < -2SD | < -1.24
Low | -2SD ≤ Low < -1SD | -1.24 ≤ Low < -0.62
Moderate Low | -1SD ≤ ML < Mean | -0.62 ≤ ML < 0.00
Moderate High | Mean ≤ MH < 1SD | 0.00 ≤ MH < 0.62
High | 1SD ≤ High < 2SD | 0.62 ≤ High < 1.24
Very High | ≥ 2SD | ≥ 1.24

Respondents were classified according to their person measure logits, and the results were broken down by demographic variables: postgraduate level, gender, and mode of study. Figure 1 shows the average logit for each demographic group. Only the doctoral student group is at the Moderate High level, with a mean ability logit of 0.23. In contrast, the master's student group has the lowest mean ability logit, -0.71 (Low). All other groups (male, female, full-time, and part-time) are at the Moderate Low level, with logits of -0.26, -0.18, -0.23, and -0.16, respectively.
Further analysis was conducted to identify the most difficult and the easiest statistical literacy items. Table 3 shows the difficulty logit of each item. The most difficult items are those assessing the concept of significance values (Item 10, difficulty logit 0.98) and interpreting significance values (Item 9, difficulty logit 0.85). All items related to significance values have positive (difficult) logits. The easiest item is the one measuring knowledge of statistical tests (Item 7), with a difficulty logit of -1.59.
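The classification of group means into literacy levels can be sketched as follows. This is a minimal illustration, assuming cut-offs at the mean and at ±1SD and ±2SD with M = 0.00 and SD = 0.62 as in Table 2, and assuming the three lower bands mirror the three upper ones. The function name `literacy_level` is hypothetical; the group means are the values reported above.

```python
M, SD = 0.00, 0.62  # mean and standard deviation used for the thresholds


def literacy_level(logit: float, m: float = M, sd: float = SD) -> str:
    """Map a person-measure logit to one of six levels.

    Upper bands follow Table 2 (Mean, +1SD, +2SD); the lower bands are
    assumed to mirror them (-1SD, -2SD).
    """
    if logit >= m + 2 * sd:
        return "Very High"
    if logit >= m + sd:
        return "High"
    if logit >= m:
        return "Moderate High"
    if logit >= m - sd:
        return "Moderate Low"
    if logit >= m - 2 * sd:
        return "Low"
    return "Very Low"


# Mean person-measure logits per demographic group, as reported in the text.
group_means = {
    "Doctoral": 0.23, "Master's": -0.71, "Male": -0.26,
    "Female": -0.18, "Full-time": -0.23, "Part-time": -0.16,
}
for group, logit in group_means.items():
    print(f"{group:<10} {logit:+.2f}  {literacy_level(logit)}")
```

Running it reproduces the classification reported above: doctoral students at Moderate High, master's students at Low, and the remaining groups at Moderate Low.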

Figure 1. Statistical Literacy Level based on Demographic Variable
Based on these results, only doctoral students are at the Moderate High level, with an average person measure logit of 0.23, whereas master's students are at the Low level, with an average person measure logit of -0.71. The results also indicate that postgraduate students, regardless of gender and mode of study, are at the Moderate Low level. Overall, these results show that postgraduate students still lack statistical literacy. This is consistent with previous studies such as Yotongyos et al. (2014), Lie et al. (2016), and Round and Campbell (2013), in which students perceived data analysis and results as the hardest sections to understand.
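Because person abilities and item difficulties share the same logit scale in Rasch analysis, the reported logits can be read as success probabilities. The sketch below uses the dichotomous Rasch model, P = exp(θ - b) / (1 + exp(θ - b)), where θ is the person's ability and b the item's difficulty. Treating a group's mean logit as a single person's ability is our simplification: the result describes a hypothetical student located at that mean, not the whole group.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Person logits: reported group means. Item logits: Item 10 (hardest) and Item 7 (easiest).
persons = [("doctoral mean", 0.23), ("master's mean", -0.71)]
items = [("Item 10", 0.98), ("Item 7", -1.59)]
for person, theta in persons:
    for item, b in items:
        p = rasch_probability(theta, b)
        print(f"{person:<14} {item:<8} P(correct) = {p:.2f}")
```

A student at the doctoral mean has roughly a one-in-three chance of answering Item 10 correctly, which illustrates how difficult the significance-value items are even for the strongest group.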
Additionally, most of the difficult statistical literacy items concern hypothesis testing (significance values). The findings revealed that most postgraduate students are still unable to fully grasp the concepts of hypothesis testing (e.g., Item 10; Figure 2), which in turn affects their ability to interpret data on significance values (e.g., Items 11 and 15; Figure 3). Similarly, Lie et al. (2016) and Hubbard and Dunbar (2017) revealed that postgraduate students perceived interpreting "experimental data" as difficult. Gonulal (2016) also found that although doctoral students have a good understanding of both descriptive and inferential statistics, they still struggle to interpret inferential data. Likewise, Sotos et al. (2009) indicated that university students' understanding of the concepts of hypothesis testing remains superficial. Anderson et al. (2013), who examined statistical literacy among Obstetrics and Gynaecology (Ob-Gyn) postgraduate students, also found that only 46% of 4713 participants answered the hypothesis testing item correctly.
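Since the hardest items concern significance values, a small simulation can make the meaning of a p-value concrete. The sketch below uses a two-sided permutation test for a difference in means, chosen here only because it needs nothing beyond the Python standard library; it is not the method of any study cited above. The scores and the conventional α = 0.05 are illustrative assumptions, and the function name `permutation_p_value` is hypothetical.

```python
import random
import statistics


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    The p-value is the proportion of random relabellings of the pooled data whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Count the observed labelling as one of the resamples, so p is never exactly 0.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical scores for two groups (made-up numbers, not data from this study).
group_a = [14, 16, 15, 18, 17, 19, 16, 18]
group_b = [11, 13, 12, 14, 12, 15, 13, 12]

alpha = 0.05
p = permutation_p_value(group_a, group_b)
# A small p means a difference this large is rare if group labels do not matter (H0).
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p = {p:.4f}: {decision} at alpha = {alpha}")
```

Note the wording "fail to reject H0" rather than "accept H0": a large p-value means the data are compatible with the null hypothesis, not that the null hypothesis has been shown to be true.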

Conclusion
Statistical literacy has been associated with research literacy, which is defined as the ability to identify, access, interpret, and evaluate research articles. Although statistical literacy is an essential component of research literacy, developing it remains challenging for university students, including postgraduate students. As statistical literacy has long been associated with quantitative research methodology, this association may help explain the low performance observed in this study. It is possible that students with a quantitative orientation perceive statistics as more essential than students with a qualitative orientation. Besides research orientation, future studies could examine whether other factors, such as students' experience of attending statistics courses, their motivation, and their perceptions of statistics, also contribute to this issue.
In essence, it is worth noting that statistical concepts and terminologies are the most basic knowledge that students should master. This knowledge is fundamental to choosing the right statistical test and interpreting data findings. Therefore, this paper provides insight for instructors, and even institutions, to reassess the learning process and explore various learning methods and tools to improve their students' statistical literacy.