Mathematics Test Development By Item Response Theory Approach And Its Measurement On Elementary School Students

This study is development and measurement research. It aims to develop a sound instrument for measuring elementary school students' skill using the IRT approach, examined in terms of: 1) the validity and reliability of the instrument; 2) the assumption tests of unidimensionality, local independence and parameter invariance; 3) the characteristics of the test items; and 4) the measurement of students' ability. The instrument was developed following the development model of Djemari Mardapi: 1) arranging the test specification, 2) writing the test, 3) examining the test, 4) conducting a trial test, 5) analyzing the test items, 6) improving the test, 7) assembling the test, 8) implementing the test and 9) interpreting the test results. The findings are as follows. 1) The Aiken values for the grade 3 test instrument range from 0.83 to 1, and the reliability coefficient of the grade 3 math skill instrument is 0.883. These results indicate that the items developed have good validity and reliability. 2) The unidimensionality assumption is fulfilled because the test is shown to measure only one dominant dimension, i.e., the same skill. The local independence assumption is also fulfilled because the covariance values among skill intervals are small or close to zero. The correlation computed for the difficulty levels estimated from the response data falls in the high category, so the parameter invariance assumption is fulfilled. 3) Based on the model-fit analysis, the data fit the 2PL model, so the item parameters for the whole package are estimated under the 2PL model, i.e., parameters b and a (difficulty level and discrimination). The item parameter analysis for grade 3 indicates that all items fall in the good category for both difficulty level and discrimination. 4) The measurement of students' skill gives an average skill of 0.451, a maximum of 3.185 and a minimum of -2.282. Judged by the average score, the students in the research sample have a good average math skill.


INTRODUCTION
The Trends in International Mathematics and Science Study (TIMSS) results (2012) cover the 2011 assessment, in which grade VIII Indonesian students took part. In mathematics, Indonesia ranked 38th of 42 countries with a score of 386, down 11 points from the 2007 assessment. Likewise, the Programme for International Student Assessment (PISA) under the Organisation for Economic Co-operation and Development (OECD) (2013) surveyed student skill and education systems. The survey assesses students' math, reading and scientific skill as a reflection of the education system in each country. The results show that the math skill of students in Indonesia ranked 64th out of 65 countries, second from the bottom, with a score of 375.
The results of the TIMSS and PISA surveys show that Indonesian students' math skill is still low in both the content dimension and the cognitive dimension. The content dimension is assessed on the domains of number, algebra, geometry, and data and chance, while the cognitive dimension is assessed on the domains of: 1) knowledge, which covers the facts, concepts and procedures that students should know; 2) application, which focuses on students' skill in applying knowledge and conceptual understanding to solve problems or answer questions; 3) reasoning, which focuses on solving non-routine problems, handling complex contexts and carrying out multi-step problem solving.
In line with this, a report on research conducted by the Center for Development of Mathematics Teachers in several elementary schools in Indonesia revealed that 51 percent of students have difficulty with counting, 50 percent have difficulty mastering concepts, and 49 percent have difficulty solving story problems (PPPG Mathematics Team, 2001: 18). Furthermore, research by the PPPG Mathematics Team in 2002 revealed that in some areas of Indonesia most elementary school students have difficulty solving story problems and translating them into mathematical models (PPPG Mathematics Team, 2002: 71).
To this day there is no unanimous agreement among mathematicians on what should be called mathematics. According to Hans Freudenthal (Marsigit, 2013: 10), mathematics is a human activity and must be connected to reality. Therefore, when students learn mathematics, a process of mathematization takes place.
There are two types of mathematization: (1) horizontal mathematization and (2) vertical mathematization. Horizontal mathematization is the process of moving from the real world into mathematical symbols; it occurs when a student is faced with a real-life problem or situation. Vertical mathematization is a process that occurs within the mathematical system itself, for instance finding a strategy to solve a problem, linking mathematical concepts, or applying formulas or derived formulas. Begle (1979: 6) classifies the direct objects of learning mathematics into facts, concepts, skills, and principles. A fact is a mathematical object that is a convention and can be expressed in symbols. A concept is an idea formed by observing the shared characteristics of a set of objects. A skill is a procedure or set of rules used to solve math problems. A principle is a true statement that contains two or more concepts and states the relationship between them.
For students to be able to solve math problems, facts, concepts, skills and principles are all needed. For instance, if students are asked to calculate the area of a plane figure in the form of an isosceles triangle, they should understand the concept of an isosceles triangle, use certain symbols (facts) when constructing a formula for its area, have the skill to carry out the area calculation, and understand the principles for determining and using the area formula.
Teachers should facilitate students in learning mathematics through the process of experiencing, so that students understand the facts, concepts, skills, and principles that can be used to solve both routine and non-routine problems. The learning process should provide students with first-hand experience to construct knowledge, skill, and ethical attitude (Ramadhan, S., Nasran, S. A., Utomo, H. B., Musyadad, F., & Ishak, S. 2019). Students will then have the competence to use mathematics to solve the problems they face in daily life.
The learning process in an educational unit is organized to be interactive, inspiring, fun, challenging and motivating for learners to participate actively, and to provide sufficient space for initiative, creativity and independence in line with the talents, interests, and physical and psychological development of learners (Regulation of the Minister of Education and Culture No. 65 of 2013). Learners are encouraged to develop their own knowledge through guidance provided by teachers. This view is based on the assumption that mathematics is an activity of human life (Turmudi, 2008: 7), or "mathematics as human sense-making and problem solving activity". In mathematics learning, students should be stimulated to discover things for themselves, conduct their own investigations, prove conjectures themselves, and find answers to questions posed by their friends or teachers.
The scope of material and the level of competence that learners should achieve in an educational unit at a certain level and type of education are formulated in the Content Standards for each subject. Content Standards are criteria regarding the scope of material and the level of competence needed to achieve graduate competence at a certain level and type of education. The scope of material is formulated based on the criteria of mandatory content stipulated in legislation, scientific concepts, and the characteristics of the educational unit and educational program. The level of competence is formulated based on the criteria of learners' level of development, Indonesian competency qualifications, and tiered mastery of competency (Government Regulation No. 32 of 2013).
Core Competency is the translation or operationalization of the SKL in the form of qualities that should be possessed by those who have completed education at a certain educational unit or level of education. It gives an overview of the main competencies, classified into aspects of attitude, knowledge, and skill (affective, cognitive, and psychomotor), that students should learn for a given school level, class and subject. Core Competencies should describe a balance between the achievement of hard skills and soft skills (Ministry of Education and Culture, 2013: 5). Core Competency is designed in four interconnected groups: religious attitude (core competency 1), social attitude (core competency 2), knowledge (core competency 3) and the application of knowledge (core competency 4).
In the curriculum of elementary school mathematics education (Curriculum, 2004), it is stated that efforts to improve the quality of education need to be implemented thoroughly, covering aspects of knowledge, skill, attitude and others. These aspects are developed to improve life skills through a set of competencies, so that students can survive, adapt and succeed in the future. These skills require systematic, logical, critical thinking that can be developed through problem solving in mathematics learning. Therefore, mathematics problems should be constructed to encourage students to maximize their thinking skill and to give them the flexibility to develop problem-solving skill based on their daily experience.
Students' thinking skill is reflected in the way they solve math problems. The steps for solving a math problem therefore tend to be unlimited and to vary in order, depending on students' mastery of the math material. Math problems tend to be expressed through questions or statements combined with various forms such as stories, tables, graphs and diagrams. The aspects of skill measured in math problems include memory, understanding, application, analysis, synthesis, and evaluation, in accordance with Bloom's taxonomy.
According to Djemari Mardapi (2012: 110) there are nine steps in constructing a standardized learning achievement test: (1) compiling the test specification, (2) writing the test, (3) examining the test, (4) conducting the trial test, (5) analyzing the test items, (6) improving the test, (7) assembling the test, (8) implementing the test and (9) interpreting the test results. Compiling the test specification means elaborating the overall characteristics that a test should have. The procedure includes determining the test objective, arranging the test grid, determining the test form and determining the test length.
Hambleton & Swaminathan (1985: 226) state that the process of developing a test with an item response model includes: (1) preparation of the test specification, (2) preparation of the item pool, (3) field testing, (4) selection of test items, (5) compilation of norms (for norm-referenced tests), (6) specification of the cut-off score (for criterion-referenced tests), (7) reliability study, (8) [...]. Item parameter invariance has an important consequence: if the items are relatively numerous, item parameters can be estimated even for items that a testee has not answered. This is known as person-free item calibration. Wright & Stone (1979) describe the calibration process in detail for the Rasch model. When all items fit the two-parameter logistic model, the procedure is to take parts A and B as group 1 and parts B and C as group 2. Group 1 is set to a mean of 0 and a standard deviation of 1, and group 2 is likewise set to a mean of 0 and a standard deviation of 1.
The response function of an item contains two kinds of parameters: item parameters and the skill parameter. According to Baker (2001: 134), calibration in IRT is the process of determining the item parameters and the skill parameter of the item response function. Wells, Subkoviak, & Serlin (2002) state that the calibration process is used to estimate the item parameters and to observe the ability of the item to distinguish between latent trait levels. Meanwhile, Yen & Fitzpatrick (2006: 129) state that calibration is the process of estimating item parameters and skill from item response data in IRT. Calibration is thus the process of estimating the item parameters and the ability parameters so that their positions on the test instrument are known.
To date, a number of problems concerning the quality and implementation of assessment are still often encountered in schools, especially at the elementary level. These problems relate to the objectives, planning, implementation, results and follow-up of assessment conducted by both teachers and schools. Research by Djemari Mardapi, et al. (1999: 45) reveals that many teachers still write test questions without basing them on a test grid and tend simply to use questions from books. Likewise, research by Kumaidi (2005: 5-6) reveals that teachers, when compiling tests, seldom or never prepare the grid first. Many teachers make little use of test result data to improve the learning process; the data are used more to label students as passing or failing, or to supply a report card mark. The mark or status is a label given by the teacher that may subsequently have adverse implications for the student.
These results indicate that teachers, in preparing tests, are used to writing the items directly without good planning. Planning concerns determining the behavioral aspect or skill tested, determining or selecting essential material, determining the proportion of cognitive aspects (memory, understanding, application) for each basic competency or indicator, and so on. In addition, teachers do not use assessment data to find out the extent of their teaching success or the strengths and weaknesses of students that call for remediation or enrichment.
Another problem is that assessment conducted by teachers or schools is often interpreted only as a means of giving grades to students, whereas the real purpose of assessment is to know how far students have mastered the basic material or competency taught. Similarly, students' scores are often based only on the percentage of correct answers on a test, without taking into account the weight of each item that makes up the test. As a result, the assessment results become biased and unable to describe the true competence of the students.
This research attempts to develop a test to measure the math skill of elementary school students that can be used to identify the level of math skill, measure the development of math skill, and build a profile of each student's achievement level, revealing which aspects of the tested skill the student has or has not mastered as well as the student's strengths and weaknesses.

RESEARCH METHOD
This research is development research with a quantitative approach, which aims to produce a product. The product is a mathematics skill test instrument for public elementary school students, produced through an instrument development procedure.
The instrument development model for the test is a modification of the Wilson Model (2005: 18) and the Order and Antonio Model (1998: 34), with the following steps: (1) initial development of the test, (2) test trial, and (3) broad-scale trial. Initial development consists of test design and validation by expert judgement. After the test design is complete, the test is validated by experts; if an item is not yet suitable, the test is revised until it is valid in content (Ramadhan, S., Sumiharsono, R., Mardapi, D., & Prasetyo, Z. K. 2020). The instrument is then tried out on grade III elementary school students. Based on the trial, unfit items are revised and fit items are assembled into the mathematics test. The assembled test is ready to be used for measurement and proceeds to the broad-scale trial.

Determining the Test Goal
In the initial development stage, the first step is to determine the test goal. The instrument is a summative test because it is given at the end of the semester. So, the goal of the test is to determine the math skill of elementary school students.

Determining the Competency Tested
After the test goal is clear, the next step is to choose the competency to be tested. The competency follows the core and basic competencies for the math subject in grade 3 of elementary school. Based on the core and basic competencies, the appropriate indicators are then determined.

Determining the Material Tested
Based on the competency standard, basic competency and indicators, the next step is to describe the appropriate math material for grade 3 of elementary school. This material includes number, geometry and measurement.

Arranging the Test Grids
A good test item requires a test grid. The grid is a matrix containing the specifications of the test items to be written. It guides item writing, so that anyone writing items from the same grid will produce items of relatively the same content and difficulty level.

Writing the Item
As stated above, the test grid has a crucial role in test development. The test items are written based on the test grid.

Arranging the Scoring Guideline
A test can be used only if it is accompanied by a scoring guideline. The scoring guideline is designed to maintain the objectivity of assessment and the certainty of the scores obtained by test participants.

The Content Validity
After the items of the grade III math skill test and the scoring guideline are compiled, a limited trial is conducted. The limited trial aims to find out the readability of the test items, and its results are used as the basis for revising and refining the items. Besides the limited trial, in order to obtain a good instrument, the instrument grid, items, and scoring guideline that have been compiled are subsequently reviewed and validated. Validation, which checks the requirements of concept, construction and language, is carried out through expert judgement.

Item Improvement and Test Assembly
To improve the test items, a qualitative analysis of the grid, the instrument items, and the assessment guideline is conducted first. First, the aspects, sub-aspects, indicators, and instrument grid are examined. Second, all the completed test items are reviewed. Third, all the compiled items are tried out on a limited basis. Finally, the items are improved based on the limited trial results and the expert forum review.

Instrument Trial
The trial stage here is the limited trial, which consists of several steps: 1) determining the trial subjects, 2) implementing the trial and 3) analyzing the trial results.

Wide Scale Trial
The wide scale trial aims not only to determine the characteristics of the instrument but also to determine the individual skill of the respondents. This stage includes: 1) test assembly, 2) wide scale test implementation, 3) result analysis, 4) result interpretation.

Design and Trial Subject
The trial is conducted in grade 3 of elementary schools in Lubuklinggau City, namely SD Negeri 11 Lubuklinggau, SD Negeri 42 Lubuklinggau, and SD Negeri 58 Lubuklinggau.
The test subjects in this research are elementary school students, since students are the main users of the product developed. In addition, student responses are needed to obtain the reliability coefficient and the usability of the developed test. The elementary school students selected as research subjects provide the data on students' math skill. The selected students are in grades III, IV, and V. The instrument developed consists of 40 items and is used to measure grade III. The limited trial involves 160 students and the measurement involves 228 students.

Technique and Instrument of Data Collection
Data collection in this research uses a sample drawn by the stratified random sampling technique. Sampling begins with identifying the strata of the schools that are members of the population. The strata are determined by school category, distinguishing excellent from non-excellent schools. After the strata are identified, three schools are randomly selected in each school category. All grade III students in the selected schools form the research sample.
The data collection instrument used in this research is a test of mathematics skill developed by the researcher based on the Core Competency, Basic Competency and teaching material of elementary school. The Core Competency, Basic Competency and indicators to be tested are determined through a Focus Group Discussion (FGD). The FGD participants are teachers, subject teachers, a study expert (mathematics lecturer) and the researcher. The participants choose the Core Competency, Basic Competency and indicators that are suitable or important to test in a summative test (end of semester).

Data Analyzing Technique
The respondents' answer sheets are corrected and scored by the assessors. The assessors are elementary school math teachers who have attended training. The training was conducted twice, to: (1) establish a shared understanding of the content of the test items, and (2) establish a shared understanding of the scoring procedure. The test result data are analysed quantitatively. Item analysis uses a customized scale. The formatted data are analysed with the BILOG-MG program. The analysis yields the parameters of the test items, so that items can be improved where necessary. Proof of validity based on internal structure can be verified by CTT and IRT (Vendramini & Silvi, 2011: 1). Therefore, based on the analysis of test results, the following are identified: (1) the items that do not fit and (2) [...]. The second characteristic is the difficulty level or difficulty index (b), obtained with the BILOG-MG program. An item is stated to be good if its difficulty index is greater than -2.0 and less than 2.0, i.e., -2.0 < b < 2.0.
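The difficulty criterion can be illustrated with a short sketch. It is not the BILOG-MG procedure itself; it only applies the stated range to b values that are assumed to have been estimated already. The function name and item labels are hypothetical, and the third b value is invented to show a rejected item.

```python
# Illustrative sketch: apply the criterion -2.0 < b < 2.0 to estimated difficulties.
# Item labels are hypothetical; 2.4 is an invented value used only to show a failing item.

def classify_difficulty(b_values, low=-2.0, high=2.0):
    """Return a dict mapping item label -> True when low < b < high."""
    return {item: (low < b < high) for item, b in b_values.items()}


example_b = {"item_01": -1.665, "item_02": 0.509, "item_03": 2.4}
flags = classify_difficulty(example_b)

for item, ok in flags.items():
    print(item, "good" if ok else "needs revision")
```

The bounds are strict inequalities, so an item with b exactly equal to -2.0 or 2.0 would be flagged for revision under this reading of the criterion.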

Information Function and SEM
Based on the analysis with Pascale, the information function and the standard error of measurement (SEM) are obtained. Based on the information function and SEM, this test is suitable for learners with low, medium and high skill (θ).

RESULT AND DISCUSSION

Development Result of Initial Product
One of the educational problems related to the quality of education is students' low mastery of competency as a result of inadequate assessment. The assessment system is not optimal because: (1) the quality of teacher-made tests is still inadequate, (2) the monitoring of the testing network in the regions has not been implemented properly, (3) the reporting of exam results has not been optimal and (4) exam results have not been used optimally.
Based on the description above, this research attempts to develop a math skill test for elementary school students in grades III, IV, and V. The development of the math skill test refers to the Core Competency and Basic Competency of Curriculum 2013. The test is used to: (1) identify the math level of elementary school students, (2) measure the progression of their math skill and (3) profile the achievement level of each student's math skill.

Content Validity
Validity is classified into three types: (1) content validity, (2) criterion-related validity and (3) construct validity (Nunnally, 1978; Allen & Yen, 1979; Fernandes, 1984; Woolfolk & McCane, 1984; Kerlinger, 1986; Lawrence, 1994). Validity can be established through analysis of the test content and empirical analysis of the test scores from item response data (Lissitz & Samuelsen, 2007). The content validity of an instrument is defined as the extent to which its items represent the components of the whole content domain to be measured and reflect the behavioral characteristics to be measured (Nunnally, 1978; Fernandes, 1984).
Content validity is determined through expert agreement. The agreement of experts in the field of study, often called the measured domain, determines the level of content validity (Heri Retnawati). This is because a measurement instrument such as a test or questionnaire is proven valid only if experts believe that it measures mastery of the skill defined in the measured domain. Content validity is analysed with the Aiken formula.
Aiken formulated Aiken's V to calculate the content validity coefficient from the ratings given by a panel of n experts on the extent to which an item represents the construct measured.
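As a minimal sketch, Aiken's V for one item can be computed as V = Σ(r - lo) / (n(c - 1)), where r is each rater's score, lo is the lowest possible rating, c is the number of rating categories and n is the number of raters. The sketch assumes a rating scale from 1 to 5; the function name and the example ratings are illustrative, not data from this research.

```python
# Sketch of Aiken's V for a single item.
# Assumption: ratings run from `lowest` to `lowest + categories - 1` (here 1..5).


def aikens_v(ratings, lowest=1, categories=5):
    """Return Aiken's V = sum(r - lowest) / (n * (categories - 1))."""
    n = len(ratings)
    s_total = sum(r - lowest for r in ratings)
    return s_total / (n * (categories - 1))


# Hypothetical ratings from three raters for three items.
for ratings in ([5, 5, 5], [5, 5, 4], [5, 4, 4]):
    print(ratings, round(aikens_v(ratings), 2))
```

With three raters and five categories, V can only take values in steps of 1/12, which is why coefficients such as 0.83, 0.92 and 1 are the values that occur near the top of the scale.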
An instrument can be stated to be valid if the experts believe that it measures what it is intended to measure. The scores from expert judgement are used to prove the content validity of the instruments in this research. The validation results for the instrument are given in Table 1, which uses the Aiken's V index. Based on these data, the Aiken values for the instrument range from 0.83 to 1. Based on the Aiken table, with 40 items, 5 criteria and 3 raters, the minimum acceptable value is 0.92. Based on the data, all items are proven valid in terms of content validity except items 17 and 19, which therefore need to be revised.

Reliability
The reliability of a test is generally expressed numerically as a coefficient between -1.00 and +1.00 (Retnawati, 2016). Mehrens & Lehmann (1973) state that, although there is no general agreement, it is widely accepted that a test used to make decisions about individual students should have a minimum reliability coefficient of 0.85. Reliability in this research was estimated with the Cronbach's alpha formula and analysed with the support of the SPSS 22 program. Table 2 presents the reliability estimate for the research instrument developed. This research estimates reliability for the grade 3 instrument, which consists of 40 items. Table 2 shows that the grade 3 instrument has a reliability coefficient of 0.883. This estimate shows that the instrument developed is reliable for measuring the math skill of grade 3 elementary school students.
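For readers without SPSS, Cronbach's alpha can be sketched directly from its definition, alpha = k/(k - 1) × (1 - Σ item variances / variance of total scores), where k is the number of items. The tiny score matrix below is invented for illustration and is not the research data.

```python
from statistics import pvariance

# Sketch of Cronbach's alpha from a students-by-items score matrix.
# The same variance definition is used for items and totals, so the ratio
# does not depend on whether population or sample variance is chosen.


def cronbach_alpha(scores):
    """scores: list of rows, one row per student, each row a list of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical dichotomous responses of four students to three items.
demo_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(demo_scores), 3))
```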

Instrument Trial Result
The instrument trial results begin with the assumption tests. The assumption tests are precondition tests that determine whether the data can proceed to the next analysis step. The precondition tests in this research are the unidimensionality test, the local independence test and the parameter invariance test.

Unidimensional Test
One assumption that should be fulfilled is that every test item measures only one skill. One way to test this assumption is factor analysis, which produces the KMO value, the eigenvalues, the explained variance and the factor components. Exploratory factor analysis is conducted with the support of SPSS 22. The results of the factor analysis are presented in Tables 3 and 4. Table 3 gives the KMO test result for the grade 3 instrument and shows a KMO value of 0.819. This value is greater than 0.50, which means that the trial sample used for this instrument is adequate; a matrix is suitable for factor analysis if the KMO value is greater than 0.5. Table 4 gives the eigenvalues of the grade III math test instrument. The number of factors formed is indicated by eigenvalues > 1, which mark the factors used as indicators (Wagiran, 2014: 302). For the grade III instrument, the 40 items form 13 factors, of which factor 1 is dominant with an eigenvalue of 7.331. As the dominant factor, factor 1 has the highest eigenvalue compared with the other factors, so the instrument developed can be stated to be unidimensional.
The dimensions measured in a data set can be shown by the scree plot, i.e., by the number of steep drops. The number of steep drops shows the number of dimensions or factors, whereas gradual changes in eigenvalue do not indicate a dimension (Retnawati, 2016: 142). Unidimensionality can therefore also be viewed from the scree plot. The test is stated to be unidimensional when components 1 and 2 in the scree plot are far enough apart (Furr & Bacharach, 2008: 74). Figure 1 is the scree plot from the exploratory factor analysis of the grade III instrument. It shows that component 1 is far from component 2, while components 2 and 3 are very close. This indicates that there is one dominant factor and that the other factors do not contribute greatly to the explained variance. Based on the scree plot, the instrument developed in this research is considered unidimensional.
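The two checks described above can be sketched numerically, assuming the eigenvalues have already been extracted (for example by SPSS). Counting eigenvalues greater than 1 follows the rule stated in the text; the ratio of the first to the second eigenvalue is only an assumed numeric stand-in for the visual gap in the scree plot and is not a criterion used in this research. Apart from 7.331, the eigenvalues in the example are hypothetical.

```python
# Sketch: summarise a list of eigenvalues from an exploratory factor analysis.


def kaiser_factor_count(eigenvalues):
    """Number of factors with eigenvalue greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def first_to_second_ratio(eigenvalues):
    """Ratio of the largest to the second-largest eigenvalue (assumed gap indicator)."""
    ordered = sorted(eigenvalues, reverse=True)
    return ordered[0] / ordered[1]


# Only 7.331 comes from the text; the remaining values are hypothetical.
eigenvalues = [7.331, 1.9, 1.6, 0.9, 0.7]
print(kaiser_factor_count(eigenvalues))
print(round(first_to_second_ratio(eigenvalues), 2))
```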

Local Independence Test
The local independence assumption is a requirement that should also be fulfilled in IRT analysis. This assumption test aims to determine whether students' responses are independent across items, meaning that a student's answer to one item does not affect the answer to another item. The local independence assumption is automatically proven once the unidimensionality of participants' response data has been proven (Retnawati, 2014: 7). However, it can also be proven through a covariance matrix based on students' skill classified into several groups. The assumption is fulfilled if the covariance values between skill intervals are small or close to zero.
Table 5. Test Result of Local Independence
Table 5 is the covariance matrix based on the skill of grade III students. It gives the variance-covariance values among groups of students' skill. The analysis shows that the variance-covariance values among the skill interval groups that form the diagonal line are small, even close to zero. Therefore, there is no correlation and it can be concluded that local independence is fulfilled.
Figure 3 is the parameter invariance analysis result for grade III. The scatter plot for grade III shows that the estimates lie very close to a straight line, with a correlation of 0.9766, which falls in the very high category. It can be concluded that the parameter invariance assumption is fulfilled.
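The invariance check can be sketched as a correlation between two sets of estimates of the same parameters. The sketch assumes that the sample was split into two groups and that the parameters were estimated separately in each; the source does not state how the two sets behind the reported correlation were formed. The function name and the example values are hypothetical.

```python
import math

# Sketch: Pearson correlation between two sets of estimates of the same parameters.


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical difficulty estimates of five items from two subsamples.
b_group_1 = [-1.60, -0.90, -0.40, 0.10, 0.50]
b_group_2 = [-1.70, -0.80, -0.50, 0.20, 0.45]
print(round(pearson_r(b_group_1, b_group_2), 4))
```

A correlation close to 1 means the points of the scatter plot lie near a straight line, which is the pattern read as evidence of invariance.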

Test of Goodness of Fit
Since the three assumptions for IRT analysis are fulfilled, the goodness of fit test can be carried out on the test that has been developed. The goodness of fit of the 1PL, 2PL and 3PL models was examined by comparing the chi-square (χ²) values. The probability value of each item should fulfil p > 0.05; otherwise the item is revised before the instrument is administered. The goodness of fit test was analysed with the Bilog-MG program. The following table is the result of the goodness of fit analysis.

Table 6 is the result of the goodness of fit test. Table 6 shows that 40 items, that is all of the items, fit the 1PL and 2PL models, whereas 31 items fit the 3PL model. The result indicates that the best-fitting models are 1PL and 2PL. On this basis, the model used in this research is the 2PL model.
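The decision rule described above, under which an item fits a model when its chi-square probability exceeds 0.05, can be sketched as a simple count per model. The p-values below are hypothetical placeholders rather than the Bilog-MG output of this study, and the helper name `count_fitting_items` is an assumption made for illustration.

```python
def count_fitting_items(p_values, alpha=0.05):
    """Count items whose chi-square probability exceeds alpha (item fits)."""
    return sum(1 for p in p_values if p > alpha)


# Hypothetical item-level p-values for three models (illustration only).
p_by_model = {
    "1PL": [0.21, 0.48, 0.07, 0.66],
    "2PL": [0.35, 0.52, 0.11, 0.71],
    "3PL": [0.30, 0.02, 0.09, 0.04],
}

fit_counts = {model: count_fitting_items(p) for model, p in p_by_model.items()}
print(fit_counts)
```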

Parameter Estimation of Item Question
Item characteristics were analysed using the 2PL model. Items that fit the 2PL model were reanalysed to determine their characteristics. The criteria for a good item under the 2PL model are based on the discrimination (ai) and the difficulty level (bi) of the item. The discrimination index of an item is considered good if it lies between 0 and 2, and the difficulty index is considered good if it lies between -2 and +2 (Hambleton & Swaminathan, 1985: 107). The following is the parameter estimation result for the items developed.

Table 7 indicates that all items fulfil the criteria for both difficulty level (bi) and discrimination (ai). The difficulty level has a maximum of 0.509, a minimum of -1.665 and an average of -0.590. The discrimination has a maximum of 0.885, a minimum of 0.459 and an average of 0.670. Based on these results, it can be concluded that all items are in the good category for both difficulty level and discrimination and are ready to be used for measurement.
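For reference, the 2PL model behind the parameters ai and bi is usually written as below, where θ is the student's skill. The form shown omits the optional scaling constant D = 1.7; whether the reported estimates include it depends on the Bilog-MG settings, which the text does not state.

```latex
P_i(\theta) = \frac{\exp\left[a_i(\theta - b_i)\right]}{1 + \exp\left[a_i(\theta - b_i)\right]}
```

When θ equals b_i, the probability of a correct answer is 0.5, and a larger a_i makes the curve steeper around b_i.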

Information Function (IF) and Standard Error of Measurement (SEM)
The information function shows how well the test reveals the latent skill it measures, through the contribution of each item. The test information function is the sum of the information functions of its items. The information function is inversely related to the measurement error, or standard error of measurement: the higher the information, the smaller the error. The test information function is high when the items that make up the test have high information functions. The following is the curve of the relation between the information function and the standard error of measurement for each class, together with the IF and SEM analysis result.
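The relation between item information, test information and the standard error of measurement can be sketched as follows under the 2PL model, with the scaling constant D omitted. The three (a, b) pairs are hypothetical: their values lie within the ranges reported for the items, but the pairing is an assumption made for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (scaling constant D omitted)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Test information is the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)


def sem(theta, items):
    """Standard error of measurement: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical (a, b) pairs inside the ranges reported for the items.
items = [(0.459, -1.665), (0.670, -0.590), (0.885, 0.509)]
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(total_information(theta, items), 3),
          round(sem(theta, items), 3))
```

Because the SEM is the reciprocal of the square root of the information, the two curves mirror each other: the SEM is lowest where the test information peaks.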

Students' Skill Measurement
Students' skill in this research is classified from the logit score into very high, high, medium, low and very low categories. The sample in grade 3 consists of 228 students. The following is the students' skill category in grade 3. The description of students' skill is the result of an analysis of students' skill based on the logit score. The analysis of students' skill in this research used the Bilog-MG software. The following is the analysis result of students' skill in each class. After students' skill was estimated with the Bilog-MG program, the results were visualized as a bar chart to show the distribution of the data and as a pie chart to show the percentages. Figures 5 and 6 show that 6 students (2.6 percent) are in the very high category, 33 students (14.5 percent) in the high category, 72 students (31.6 percent) in the medium category, 87 students (38.2 percent) in the low category and 30 students (13.2 percent) in the very low category.
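The reported percentages can be checked with a short calculation. The count for the medium category follows from the total of 228 students minus the other four categories; the variable names are chosen here for illustration.

```python
# Category counts reported for the 228 grade 3 students.
# The medium count is taken as the remainder after the other four categories.
TOTAL = 228
counts = {"very high": 6, "high": 33, "low": 87, "very low": 30}
counts["medium"] = TOTAL - sum(counts.values())

# Percentage of the sample in each category, rounded to one decimal place.
percentages = {k: round(100 * v / TOTAL, 1) for k, v in counts.items()}
print(counts["medium"], percentages)
```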
The results of the skill analysis were then examined in terms of the maximum, minimum and average scores, in order to describe students' achievement based on the logit score. The following is the result of that analysis.

Table 9 is the result of the descriptive analysis of the skill data. Table 9 shows that the maximum skill score is 3.185, the minimum score is -2.282 and the average score is 0.451.
Looking at the distribution of students' skill in Table 5, it can be concluded that students' skill tends to fall in the medium, low and very low categories. However, according to Table 9, students' skill is in the very good category, as shown by the average score of 0.451. This means that, on average, students have the opportunity to answer correctly items with a difficulty level of up to 0.451. That difficulty level is in the medium category, since medium difficulty ranges from -2 to 2.
The maximum skill score in Table 9 is 3.185. This score indicates that the student is able to answer correctly items with a difficulty level of 3.185. Such students are in the very high skill category because they are able to do items with a high difficulty level, that is, items with a logit score greater than 2.
The minimum score in Table 9 is -2.282. This score means that the student has the opportunity to answer correctly only items with a difficulty level of at most -2.282. Judged by the difficulty level of the items, students with this score are in the low category, since they are only able to do items with a difficulty level of -2.282.
The discussion above shows that there is a considerable gap between students with medium, average skill and students with high skill. Based on the results and discussion above, the average of students' skill is in the medium category, and it can be concluded that the students in the measurement sample have good mathematics skill.
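The interpretation of the skill scores above can be made concrete with the 2PL response function. Under that model, a student whose skill equals an item's difficulty has a probability of 0.5 of answering it correctly. The sketch below uses the reported mean item parameters and the reported minimum, average and maximum skill scores; the "average item" is a hypothetical construct, and the scaling constant D is omitted as an assumption.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (scaling constant D omitted)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Reported mean item parameters and reported skill scores.
MEAN_A, MEAN_B = 0.670, -0.590
skills = {"minimum": -2.282, "average": 0.451, "maximum": 3.185}

# Chance of success on a hypothetical "average" item for each skill score.
on_average_item = {k: p_correct(t, MEAN_A, MEAN_B) for k, t in skills.items()}

# When skill equals item difficulty the chance of success is one half.
at_own_level = {k: p_correct(t, MEAN_A, t) for k, t in skills.items()}

for label in skills:
    print(label, round(on_average_item[label], 3), at_own_level[label])
```

Read this way, a skill score of 3.185 does not mean that every item of difficulty 3.185 is answered correctly, only that such an item is answered correctly about half of the time, while easier items are answered correctly more often.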

CONCLUSION
Based on the results of the development and the discussion of the instrument for measuring elementary school students' mathematics skill, it can be concluded as follows:
1. The range of the Aiken value for the grade 3 instrument is 0.83-1. According to the Aiken table, for 40 items rated on 5 criteria by 3 raters, the minimum accepted value is 0.92. Based on these data, it can be stated that all items are valid in terms of content validity. The analysis shows that the reliability coefficient of the grade 3 mathematics skill instrument is 0.883, so it can be concluded that the instruments developed are reliable.
2. The results of the assumption tests show that the unidimensionality assumption is fulfilled because the test is proven to measure only one dominant dimension, that is, the same skill. The local independence assumption is also fulfilled, since the covariance values among the skill intervals are small or close to zero. The correlation computed between the difficulty levels from the responses is in the high category; therefore the parameter invariance assumption for skill is fulfilled.
3. Based on the analysis of the three instruments, the goodness of fit results favour the 2PL model, so the item parameters of all packages were estimated under the 2PL model, that is, parameters b and a (difficulty level and discrimination). The item parameter analysis for grade 3 shows that all items are in the good category for difficulty level and discrimination. This indicates that all items are acceptable and can be relied on to measure the development of elementary school students' mathematics skill.