Assessing Individual and Group Oral Exams: Scoring Criteria and Rater Interaction

Özlem Yalçın-Çolakoğlu, Merve Selçuk


Criterion-referenced tests of second language speaking performance are administered in different institutions using different procedures. The present study reports on raters’ practices in second language speaking tests, in particular the correspondence between test-takers’ grades when they are assessed individually and when they are assessed in groups. Data were derived from three sources: audio recordings of raters’ (n = 8) decision-making processes while scoring in two test modes, post-test interviews, and two sets of speaking scores for students (n = 92) obtained from individual and group discussion tasks. Although a grading rubric was used, raters were found to also rely on rubric-irrelevant criteria when judging performances, which raises the question of whether the validity of the inferences drawn from the scores is jeopardized.


Keywords: Individual and group tasks, Rater, Rubric, Scoring criteria, Speaking







This work is licensed under a Creative Commons Attribution 4.0 International License.

2010-2019 (CC-BY) Australian International Academic Centre PTY.LTD.

Advances in Language and Literary Studies
