Cross-cultural Measurement Invariance of the Items in the Science Literacy Test in the Programme for International Student Assessment (PISA-2015)

Betül Alatlı


This study aimed to investigate the cross-cultural measurement invariance of the PISA 2015 (Programme for International Student Assessment) science literacy test and its items, and to carry out a bias study on the items that violate measurement invariance. The study used a descriptive review model. The sample consisted of 2,224 students from Australia, France, Singapore, and Turkey who took the S12 test booklet. Measurement invariance of the test was analyzed using Multi-Group Confirmatory Factor Analysis (MGCFA). Item-level measurement invariance was examined through Differential Item Functioning (DIF) analyses using the item response theory log-likelihood ratio (IRTLR), Hierarchical Generalized Linear Model (HGLM), and Simultaneous Item Bias Test (SIBTEST) methods. According to the findings, the test exhibited structural invariance across cultures. The highest proportion of items showing DIF (35%) was observed in the Australia-Singapore and Australia-France comparisons. In the pairwise comparisons involving Turkey, the only country in the sample that took a translated form of the test, the proportion of items showing DIF was 24%, which did not differ significantly from the other comparisons. The lowest proportion of items showing DIF (12%) was obtained in the Singapore-France comparison, while the proportion in the France-Turkey comparison was 18%. Overall, 35% of the items showed cross-cultural measurement invariance.
An item bias study based on expert opinions was carried out on the released items that were identified as showing DIF in the comparisons of Turkey with Australia and Singapore. According to the findings, the sources of bias in the skills measured by these items were translation-related differences in the items, a culture group's familiarity with the item content, polysemy in the expressions or words used in the items, and the format or stylistic characteristics of the items.
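The IRTLR approach named in the abstract compares two nested models for each studied item: a compact model in which the item's parameters are constrained to be equal across the two countries, and an augmented model in which they are estimated freely. A minimal sketch of the resulting likelihood-ratio test follows. It assumes the two log-likelihoods have already been obtained from IRT calibration software, and it limits the chi-square p-value to 1 or 2 freed parameters (for example, difficulty only, or discrimination and difficulty under a two-parameter model). The helper names `chi2_sf` and `irt_lr_dif` are hypothetical, and the numbers in the usage example are illustrative, not results from this study.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Hypothetical helper: upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        # With 1 df, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # With 2 df, the chi-square distribution is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def irt_lr_dif(loglik_compact: float, loglik_augmented: float,
               n_freed: int, alpha: float = 0.05):
    """Hypothetical helper: likelihood-ratio DIF test for one studied item.

    loglik_compact:   log-likelihood with the item's parameters equal across groups
    loglik_augmented: log-likelihood with the item's parameters free across groups
    n_freed:          number of item parameters freed in the augmented model (the df)
    Returns (G2, p_value, flagged_as_dif).
    """
    if loglik_augmented < loglik_compact:
        # The augmented model nests the compact one, so it should not fit worse.
        raise ValueError("augmented model log-likelihood is below the compact model's")
    # G2 = -2 * (lnL_compact - lnL_augmented), referred to chi-square with n_freed df.
    g2 = 2.0 * (loglik_augmented - loglik_compact)
    p_value = chi2_sf(g2, n_freed)
    return g2, p_value, p_value < alpha


if __name__ == "__main__":
    # Illustrative (made-up) log-likelihoods for one item, with a and b freed.
    g2, p, flagged = irt_lr_dif(-5234.8, -5228.1, n_freed=2)
    print(f"G2 = {g2:.2f}, p = {p:.4f}, DIF flagged: {flagged}")
```

With these illustrative values, G² is about 13.4 and p is about 0.001, so the item would be flagged at α = 0.05. Because many items are tested per country pair, in practice the significance level is often adjusted for multiple comparisons.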


Keywords: Differential Item Functioning, Item Bias, Measurement Invariance, PISA, Science Literacy





This work is licensed under a Creative Commons Attribution 4.0 International License.

2013-2020 (CC-BY) Australian International Academic Centre PTY.LTD.

International Journal of Education and Literacy Studies  
