A COMPARISON OF MANTEL-HAENSZEL AND STANDARDIZATION METHODS: DETECTING DIFFERENTIAL ITEM FUNCTIONING
Abstract
The purpose of this study was to examine the sensitivity of two methods, the Mantel-Haenszel (MH) method and the Standardization method, in detecting differential item functioning (DIF). Sensitivity was evaluated in terms of the number of DIF items detected. The data were generated with the Wingen3 program and consisted of 3,054 dichotomous responses. Sample sizes were 200 and 1,000 responses for the reference group and 200 and 1,000 responses for the focal group, drawn at random over 35 replications. The ability distributions of the two groups were normal with mean 0 and variance 1. The results showed that the MH method was more sensitive than the Standardization method in detecting DIF for both the 400 and the 2,000 samples. The findings also suggest that the Standardization method may be superior when the sample is small or when the focal and reference groups are unbalanced, with the focal group smaller than the reference group.
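For readers unfamiliar with the two indices compared here, the following is a minimal sketch in Python (not part of the original study) of how the Mantel-Haenszel delta statistic and the Standardization index (STD P-DIF) can be computed from dichotomous responses. The 2PL generating model used in place of Wingen3, the quantile-based score strata, the group sizes, and the injected DIF item are illustrative assumptions, not the authors' actual design.

import numpy as np

rng = np.random.default_rng(0)

def simulate_responses(n_persons, a, b):
    # Dichotomous responses under an assumed 2PL model; abilities ~ N(0, 1) as in the study.
    theta = rng.normal(0.0, 1.0, size=n_persons)
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.uniform(size=p.shape) < p).astype(int)

def matched_strata(ref, foc, n_strata=5):
    # Stratum labels from quantiles of the pooled total score (the matching variable).
    pooled = np.concatenate([ref.sum(1), foc.sum(1)])
    inner = np.unique(np.quantile(pooled, np.linspace(0, 1, n_strata + 1))[1:-1])
    return np.digitize(ref.sum(1), inner), np.digitize(foc.sum(1), inner), len(inner) + 1

def mh_delta(ref, foc, item, n_strata=5):
    # Mantel-Haenszel common odds ratio across matched score strata, reported on the
    # ETS delta scale: MH D-DIF = -2.35 * ln(alpha_MH).
    sr, sf, k_max = matched_strata(ref, foc, n_strata)
    num = den = 0.0
    for k in range(k_max):
        r, f = ref[sr == k, item], foc[sf == k, item]
        n_t = len(r) + len(f)
        if n_t == 0:
            continue
        a_k, b_k = r.sum(), len(r) - r.sum()   # reference group: right, wrong
        c_k, d_k = f.sum(), len(f) - f.sum()   # focal group: right, wrong
        num += a_k * d_k / n_t
        den += b_k * c_k / n_t
    return -2.35 * np.log(num / den) if num > 0 and den > 0 else float("nan")

def std_p_dif(ref, foc, item, n_strata=5):
    # Standardization index: focal-minus-reference difference in proportion correct
    # within each stratum, weighted by the focal group's stratum sizes.
    sr, sf, k_max = matched_strata(ref, foc, n_strata)
    num = w_sum = 0.0
    for k in range(k_max):
        r, f = ref[sr == k, item], foc[sf == k, item]
        if len(r) == 0 or len(f) == 0:
            continue
        num += len(f) * (f.mean() - r.mean())
        w_sum += len(f)
    return num / w_sum if w_sum else float("nan")

# Illustration: 30 items, uniform DIF injected on item 0 (made harder for the focal group).
n_items = 30
a = rng.uniform(0.8, 1.5, n_items)
b = rng.normal(0.0, 1.0, n_items)
b_dif = b.copy()
b_dif[0] += 0.6
ref = simulate_responses(1000, a, b)       # reference group, n = 1,000
foc = simulate_responses(1000, a, b_dif)   # focal group, n = 1,000
print("MH D-DIF  (item 0):", round(mh_delta(ref, foc, 0), 3))
print("STD P-DIF (item 0):", round(std_p_dif(ref, foc, 0), 3))

A negative MH D-DIF and a negative STD P-DIF for item 0 would indicate DIF against the focal group, consistent with the direction of the injected difficulty shift.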
Copyright (c) 2019 Ahmad Rustam, Dali Santun Naga, Yetti Supriyati
This work is licensed under a Creative Commons Attribution 4.0 International License.