Assessing Yemeni EFL Learners’ Oral Skills via the Conceptualization of Target Language Use Domain: A Testing Framework

Sami A. Al-wossabi


There is an evident lack of a comprehensive basis for evaluating the speaking skills of Yemeni EFL learners at the English Department, Hodeidah University. The present paper presents a detailed framework of oral assessment criteria that describes target language use domains and shows how such domains can be systematically related to test design. The framework takes as its main goal the development and description of a criterion-referenced rating scale representing real-world criterion elements. The aim of the testing framework, therefore, is to ensure maximally appropriate interpretations of test scores and to maximize the validity and fairness of local speaking tests. A five-point Likert-scale questionnaire was administered to elicit the perceptions of 10 trained raters who used the pilot scale. The research findings support the use and appropriateness of the scale, as it helps raters identify underlying aspects of their learners’ oral discourse that cannot be observed in traditional discrete-point tests.



Keywords: target language use (TLU) domain, performance-based tests, real language use, rating scale, test fairness, construct validity, language descriptors





This work is licensed under a Creative Commons Attribution 4.0 International License.

2012-2019 (CC-BY) Australian International Academic Centre PTY.LTD

International Journal of Applied Linguistics and English Literature
