Appraising Pre-service EFL Teachers' Assessment in Language Testing Course Using Revised Bloom's Taxonomy

Teachers need to be seen as “change agents” and not as mere transmitters of knowledge and culture. In developing countries like Iran, one of the most significant concerns in the field of teacher education is the efficiency of pre-service programs. To this end, the current descriptive-evaluative study describes the state of pre-service teachers' assessment in the field of language testing by (a) examining the exam questions to find out whether they are aligned with curriculum objectives and the syllabus (content validity), (b) exploring whether they address higher-order cognitive processes, and (c) identifying which combinations of cognitive process levels and knowledge types in the Revised Bloom's Taxonomy are prevalent in the questions. The results showed an unbalanced coverage of content in the exams. The questions were also found to be inadequate for measuring complex cognitive skills (Analyze and Evaluate): the Remember and Understand categories account for 91.6% of all questions, and no item was found for Create. Three combinations of cognitive process level and knowledge type were dominant in the data set: (1) Remember Factual Knowledge, (2) Understand Conceptual Knowledge, and (3) Apply Procedural Knowledge. These associations confirm Anderson and Krathwohl's (2001) proposition.


Statement of the Problem
Curriculum planning and educational reforms in Iran and around the world have consistently identified the improvement of the assessment and measurement system, and its alignment with the theories and ultimate goals of education, as the most salient element of successful educational reform and of a fundamental shift in the quality of school teaching and learning (Rabihavi, et al. 2011). Assessment determines whether the process of learning leads to success or failure (Dochy, 2009; Kozhageldiyeva, 2005), and determining whether the evaluation process itself has been carried out accurately requires evaluation in turn. Popham (1987) discusses measurement-oriented reforms in education and argues that reforming assessment drives curriculum reform. After examining the role played by assessment in reforming the science curriculum, Orpwood (2001) likewise identified a lack of proper attention to assessment as a source of some of its problems, and he called on experts to pay more attention to the role of assessment.
In recent years the term 'assessment literacy' has been coined to denote what teachers need to know about assessment (Huttner, et al., 2011). Traditionally, it was regarded as the ability to select, design, and evaluate tests and assessment procedures, and to score and grade them on the basis of theoretical knowledge. More recent approaches take a broader view of the concept by taking the implications of assessment for teaching into account. Two key questions are asked: (a) What does an assessment tell students about the achievement outcomes we value? (b) What is likely to be the effect of this assessment on students? (Stiggins, 1991). Assessment-literate teachers need to know and understand the key principles of sound assessment and to translate them into quality information about students' achievement and into effective instruction. Boyles (2005, p. 18) points to language teachers' need for the necessary tools for analyzing and reflecting upon test data in order to make informed decisions about instructional practice and program design. In their view (Boyles, 2005; Stiggins, 1991), assessment literacy extends beyond technical knowledge of how to select and create appropriate assessment instruments for specific purposes to include the ability to analyze empirical data to improve instruction. In other words, being literate in assessment involves a move away from passive interpretation of data toward its active application in ways that affect teaching, and assessment is perceived as a means to promote learning rather than merely to observe and record it, hence 'assessment for learning' (Stiggins, 2002).
Several studies in a range of educational contexts indicate that "the typical teacher can spend as much as a third to a half of his or her professional time involved in assessment-related activities" (Newsfields, 2006; Stiggins, 1999, p. 23). Additionally, in the global economy of the 21st century, students will need not only to understand the basics but also to think critically, to analyze, and to make inferences. Helping students develop these skills will require changes in assessment at the school and classroom level, as well as new approaches to large-scale, high-stakes assessment. These points imply that future teachers need to recognize the importance of higher-order thinking and problem-solving skills. Because these teachers will serve as models of learning for their own students, the mission of teacher preparatory programs and teacher trainers becomes more demanding: these programs need to develop the necessary sensitivity toward assessment among teacher trainees. This study investigates the state of teacher trainees' assessment in the field of language testing in order to identify potential weaknesses and inconsistencies.

Significance of the Study
The Iranian education system faces several problems (Rabihavi, et al. 2011; Molaeenezhad & Zekavati, 2007), one of which is an evaluation practice that relies entirely on the retrieval of knowledge. It is enough for learners to memorize content and use only their capacity for retrieval. On many occasions this evaluation system fails to assess and measure abilities such as reasoning, processing, synthesizing, and accurate judgment. The questions that teachers construct and administer are necessary instruments for measuring such abilities, yet an examination of teachers' exam questions at school found these questions and progress tests to be incompatible with measurement criteria (Rabihavi, et al. 2011). This might be attributed to the state of teacher education in Teacher Training Centers, where teacher educators have not been able to instill due attention to testing standards and psychometric principles in student teachers.
To make sound inferences about students' abilities and direct their teaching accordingly, language teachers need a strong knowledge of assessment practices. Only if they understand the basic principles of classroom assessment will their efforts to improve teaching and learning on the basis of assessment results be effective. By addressing these questions, the findings of the current study may help clarify the problem and can be used to improve and reform the evaluation system and teaching methods. They may also shed light on the curriculum needed in the teacher training domain. Given the significance of assessment courses in general and of language assessment literacy in particular, there is a need to take a closer look at the status quo of these courses in the Iranian context. Brindley (2001) has described the preparation of future language teachers in foreign language testing and evaluation as a hotly debated issue in recent years. O'Loughlin (2006, p. 71) argued that second language assessment is a "notoriously difficult domain of knowledge" for students in second language teacher education programs because of the theoretical complexity of key concepts such as reliability, validity, and practicality, and the need to balance them against each other in designing and using assessment instruments. Developments in language testing differ from those in language teaching; however, the two are closely tied to each other (Johnson & Johnson, 2001, p. 187). Constructing good test items is believed to be a demanding task for teachers, since it involves "a psychological understanding of pupils, sound judgment, persistence, and a touch of creativity" as well as field knowledge and a clear view of the desired outcomes (Gronlund, 1985). Jin (2010, p. 556) raises the urgent need for teachers to be thoroughly trained in language assessment concepts, skills, and strategies.
Assessment is an increasingly important domain of language teachers' expertise, as the professional demands on them to assess their students accurately increase and the theory and practice of assessment continue to mature (O'Loughlin, 2006, p. 71; Newsfields, 2006; Brindley, 2001). Shohamy (2005, p. 107) also argues that professional development in assessment is not merely a question of demonstrating the technical 'tricks of the trade'. She argues for "… the need to expand the role of teacher education programs in which teachers are exposed not only to the procedures and methods of testing and assessment but also to aspects related to the consequences of tests." Very recently, Lam (2014) explored the overall language assessment training landscape in primary and secondary school contexts and investigated the extent to which two assessment courses may facilitate or inhibit the development of pre-service teachers' language assessment literacy in one teacher education institution. His findings reveal that language assessment training in Hong Kong remains inadequate and that the courses are still unable to bridge the theory-practice gap within the assessment reform context. Very little research has yet been undertaken on students' evaluation of language testing and assessment courses (Inbar-Lourie, 2008; Kleinsasser, 2005; O'Loughlin, 2006). Studies by Bailey and Brown (1996), Brown and Bailey (2007), and Jin (2010) are among the few on language testing courses, and they surveyed large groups of informants teaching such courses. Jin (2010) used almost the same questionnaire as Bailey and Brown (1996) and found that most of the topics covered were the same as those listed by Brown and Bailey in their studies. He concluded that in China, LTA (i.e., Language Testing and Assessment) courses cover the essential theoretical and practical aspects of the language testing area.
Two other studies, by O'Loughlin (2006) and Kleinsasser (2005), offered more comprehensive reports on language assessment courses. O'Loughlin focused on a postgraduate elective course titled 'Assessment in the language classroom', which includes practical components (e.g., designing tools for assessing various skills) as well as conceptual themes. The author sought to reveal how students' understanding of key concepts and their ability to evaluate current assessment documents and instruments develop. The findings indicate that the participants (two students) attained the course objectives, yet they differed in their readiness and capacity to grasp new ideas in language assessment. The researcher attributed this difference to factors such as students' cultural background, their prior experience with assessment as learners and teachers, and the characteristics of the input they received in language assessment classes. Hence, he concluded that assessment courses should be planned and managed with a learner-centered approach that takes these factors into account. Kleinsasser (2005), like O'Loughlin (2006), presented direct and detailed reports on language assessment courses and showed the significance of collaboration between teachers and students for improving the quality and usefulness of an LTA course.

Theoretical Framework
To gain an in-depth understanding of the assessment practices of teacher preparatory programs, a robust cognitive framework is required. Bloom's taxonomy, in its original and revised forms, has been extensively applied in testing and evaluation across different disciplines, and its usefulness has been demonstrated repeatedly (Chen, 2004; Squire, 2001; Aviles, 1999). Bloom's taxonomy is a framework for examining the depth of cognitive process levels in educational objectives, and it is used to determine the extent to which assessment tools measure higher-order thinking skills. The Revised Bloom's Taxonomy provides a broader vision of learning that includes not only acquiring knowledge but also being able to use knowledge in a variety of new situations. This revision of the original taxonomy was developed by Anderson and Krathwohl (2001). It has two dimensions, cognitive process and knowledge, which are explained in Appendix A. The cognitive process dimension includes six major categories (Remember, Understand, Apply, Analyze, Evaluate, and Create) and 19 specific subcategories, whereas the knowledge dimension contains four main categories (Factual, Conceptual, Procedural, and Metacognitive). Although relatively few studies in the Iranian teacher education context have used this taxonomy to examine assessment tools (Rabihavi, et al. 2011), numerous studies worldwide have adopted it to take a closer look at evaluation procedures. Masters et al. (2001) investigated randomly selected multiple-choice questions from 17 test banks accompanying selected nursing textbooks. A total of 2,143 items were rated against thirty generally accepted guidelines for writing multiple-choice questions, against the cognitive levels of the original Bloom's taxonomy, and for the distribution of correct answers. Results indicated that the largest share of the questions (47.3%) was written at the lowest cognitive level, Knowledge; 24.8% and 21% of the items were at the Comprehension and Application levels, respectively, and only 6.5% were at the Analysis level. These findings are surprising, since most of the textbooks were intended for upper-division courses.
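Because every exam item in this study is coded on both dimensions, the taxonomy can be treated as a fixed grid of six cognitive processes by four knowledge types. The Python sketch below shows one possible way to encode that grid as a coding scheme. The category and subcategory labels follow Anderson and Krathwohl (2001); the names `COGNITIVE_PROCESSES`, `code_item`, and `HIGHER_ORDER` are hypothetical helpers introduced only for illustration, and the higher-order set simply mirrors how this paper treats Analyze, Evaluate, and Create.

```python
# A sketch of the Revised Bloom's Taxonomy (Anderson & Krathwohl, 2001) as a coding scheme.
# Helper names (code_item, HIGHER_ORDER, ...) are illustrative, not part of the taxonomy.

COGNITIVE_PROCESSES = {
    "Remember": ["Recognizing", "Recalling"],
    "Understand": ["Interpreting", "Exemplifying", "Classifying", "Summarizing",
                   "Inferring", "Comparing", "Explaining"],
    "Apply": ["Executing", "Implementing"],
    "Analyze": ["Differentiating", "Organizing", "Attributing"],
    "Evaluate": ["Checking", "Critiquing"],
    "Create": ["Generating", "Planning", "Producing"],
}

KNOWLEDGE_TYPES = ("Factual", "Conceptual", "Procedural", "Metacognitive")

# The study treats these three processes as higher-order thinking.
HIGHER_ORDER = {"Analyze", "Evaluate", "Create"}

# Reverse lookup: subcategory -> major cognitive process.
SUB_TO_PROCESS = {sub: proc for proc, subs in COGNITIVE_PROCESSES.items() for sub in subs}


def code_item(subcategory, knowledge):
    """Return (process, subcategory, knowledge) for one exam item, validating both labels."""
    if subcategory not in SUB_TO_PROCESS:
        raise ValueError(f"unknown cognitive subcategory: {subcategory}")
    if knowledge not in KNOWLEDGE_TYPES:
        raise ValueError(f"unknown knowledge type: {knowledge}")
    return SUB_TO_PROCESS[subcategory], subcategory, knowledge


def is_higher_order(process):
    """True if the major process counts as higher-order in this study's usage."""
    return process in HIGHER_ORDER


if __name__ == "__main__":
    n_sub = sum(len(subs) for subs in COGNITIVE_PROCESSES.values())
    print(len(COGNITIVE_PROCESSES), n_sub, len(KNOWLEDGE_TYPES))  # 6 19 4
    print(code_item("Organizing", "Conceptual"))  # ('Analyze', 'Organizing', 'Conceptual')
```

Rejecting unknown labels with an error is a deliberate choice here: it catches typos in raters' codes before they reach any frequency counts.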
Agriculture is another field whose tests have been scrutinized with regard to cognitive process levels (Squire, 2001). In that study, 628 agricultural science questions from senior secondary schools in Botswana were analyzed. A large proportion of the questions were at the Knowledge level of Bloom's original taxonomy, and almost no items were found at the higher cognitive levels of Application, Analysis, Synthesis, and Evaluation. Unexpectedly, even the essay-type items in those tests were at the lowest levels. Using the Revised Taxonomy, Chen (2004) examined the knowledge types and cognitive levels of the Computer Science tests in Taiwan's Technical College Entrance Examination from 2001 to 2004. In a similar pattern, a large share of items (44% to 77%) assessed only lower-level thinking that required students to remember factual information, and no item was found at the Evaluate and Create levels.

Research Questions
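Given the above introduction, the research questions are formulated as follows:
(1) To what extent do the exam questions of the language testing courses measure higher-order cognitive processes (Analyze, Evaluate, and Create)?
(2) To what extent are the exam questions aligned with the curriculum objectives and syllabus (content validity)?
(3) Which combinations of cognitive process levels and knowledge types in the Revised Bloom's Taxonomy are prevalent in the exam questions?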

Method
Given the nature and purpose of the study, a descriptive-evaluative research method was adopted. To this end, the curriculum, the syllabus, and the assessment instruments of the pre-service teachers' language testing courses were examined and analyzed.

Context of the Study
The main bodies responsible for training teachers in Iran are Teacher Training Centers and universities, which are jointly managed by the Ministry of Education and the Ministry of Science, Research, and Technology. Prior to 2013, the undergraduate teacher training programs consisted of two two-year periods; on completing the first period, graduates received an Associate's degree in the relevant major. These teachers were then allowed to continue their studies for another two years to obtain a B.A.; this second period was mostly taken up by in-service teachers. In 2013 the teacher training programs shifted to a four-year program, at the end of which graduates receive a B.A.

Sampling Procedures
Five Teacher Training Centers were chosen for this study, for two reasons: (1) the Centers were easily accessible to the researcher (convenience sampling), and (2) within the limits of availability, the selection aimed to cover as many geographically different areas as possible (purposive sampling). The curriculum document provides a short description of each core course, along with its educational objectives, syllabus (the weight given to each section or topic), and suggested materials; however, lecturers are allowed to structure their own lessons with these points in mind. Language testing (language assessment literacy) is divided into two two-credit courses titled Testing (1) and Testing (2). The assessment instruments were designed and developed by the course lecturers themselves. In recent decades teacher-made tests have played a fundamental role in assessing students' learning; however, their efficiency has rarely been investigated (Rabihavi, et al., 2011).

IJALEL 4(4):8-20, 2015

Data Analysis
The exam questions for Testing (1) and Testing (2) were collected, and each question was rated according to the Revised Taxonomy's cognitive processes and knowledge types (see Appendix A). To check the accuracy of the researcher's coding, two Ph.D. graduates in TEFL from Tarbiat Modares University were recruited as additional raters. The coding criteria, the definitions of the Revised Bloom's Taxonomy (Anderson & Krathwohl, 2001), and some sample questions were given to the raters to familiarize them with the coding framework. Cohen's kappa coefficient was then computed to estimate inter-rater reliability and to test whether the agreement between raters was statistically significant.
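Cohen's kappa compares the observed proportion of agreement between two raters, p_o, with the agreement expected by chance from each rater's own label distribution, p_e, as κ = (p_o − p_e) / (1 − p_e). The Python sketch below computes it for one pair of raters; the item codes are hypothetical and are not the study's data. It covers only the coefficient, not the significance test reported in the Results. Established libraries provide the same statistic (for example, `cohen_kappa_score` in scikit-learn's `sklearn.metrics`).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assigned one nominal code per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: proportion of items coded identically by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each code, product of the two raters' marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both raters used one and the same code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten items (not data from this study).
rater_1 = ["Remember", "Remember", "Understand", "Remember", "Apply",
           "Understand", "Remember", "Evaluate", "Remember", "Understand"]
rater_2 = ["Remember", "Remember", "Understand", "Understand", "Apply",
           "Understand", "Remember", "Evaluate", "Remember", "Remember"]

print(round(cohens_kappa(rater_1, rater_2), 4))  # 0.6875
```

Here the raters agree on 8 of 10 items (p_o = 0.80) and the chance agreement is p_e = 0.36, giving κ = 0.44 / 0.64 = 0.6875. Since Cohen's kappa is defined for exactly two raters, a three-rater design like this one is handled pairwise (Rater 1 with Rater 2, Rater 1 with Rater 3), calling the function once per pair.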
Cross-tabulation, a descriptive statistical procedure, was employed to obtain the frequency counts and percentages of the major combinations, as well as the sub-combinations, of the cognitive process and knowledge dimensions.
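A cross-tabulation of this kind counts how many items fall into each (cognitive process, knowledge type) cell and expresses each count as a percentage of all items. The Python sketch below assumes each item has already been coded as a pair of labels; the example data are hypothetical. Building every cell of the six-by-four grid explicitly, rather than only the observed pairs, makes empty cells (such as any Create or Metacognitive combination) visible as zeros. A library routine such as `pandas.crosstab` would produce an equivalent table.

```python
from collections import Counter

PROCESSES = ("Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create")
KNOWLEDGE = ("Factual", "Conceptual", "Procedural", "Metacognitive")


def crosstab(coded_items):
    """Map every (process, knowledge) cell to (count, percent of all items, 1 decimal)."""
    counts = Counter(coded_items)
    total = len(coded_items)
    return {
        (p, k): (counts[(p, k)], round(100 * counts[(p, k)] / total, 1) if total else 0.0)
        for p in PROCESSES
        for k in KNOWLEDGE
    }


def row_totals(table):
    """Marginal item count for each cognitive process, summed over knowledge types."""
    return {p: sum(table[(p, k)][0] for k in KNOWLEDGE) for p in PROCESSES}


# Hypothetical coded items (not the study's data).
items = ([("Remember", "Factual")] * 3
         + [("Understand", "Conceptual")] * 2
         + [("Apply", "Procedural")])

table = crosstab(items)
print(table[("Remember", "Factual")])  # (3, 50.0)
print(table[("Apply", "Procedural")])  # (1, 16.7)
print(row_totals(table)["Create"])     # 0
```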

Results
Agreement among raters was determined by computing Cohen's kappa. The kappa values were 0.88 for Rater 1 and Rater 2 and 0.83 for Rater 1 and Rater 3, indicating a high degree of agreement among raters; both values are significant at the .001 level. The results for the first research question are reported below.
For the first research question, which concerns the cognitive domain of the exam questions, the overall frequencies and percentages for all five centers are reported in the following tables. The purpose of this question was to determine the extent to which the exam questions succeed in measuring higher-level thinking processes (e.g., Analyze, Evaluate, and Create).
The questions were classified according to the Revised Bloom's Taxonomy. Overall, 346 of the 441 items (78.5%) fall within the lowest cognitive process level, Remember. This category has two subcategories, Recognizing and Recalling, which account for 38.3% and 40.1% of all questions, respectively.
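The reported percentages can be checked against the item counts given in this section (441 items in total, 346 at Remember, and 58 at Understand in the paragraph that follows). The first line reproduces the two category shares and the 91.6% combined share stated in the abstract; the second shows that the Recognizing and Recalling figures add up to the Remember share (up to rounding), so they are shares of all 441 questions; the third converts the within-category 36.2% reported for Interpreting into its implied item count and share of all questions. The derived figure of about 21 items is a back-calculation, not a count reported by the study.

```latex
\[
\begin{aligned}
&\frac{346}{441} \approx 78.5\%, \qquad \frac{58}{441} \approx 13.2\%, \qquad \frac{346 + 58}{441} = \frac{404}{441} \approx 91.6\% \\
&38.3\% + 40.1\% = 78.4\% \approx 78.5\% \\
&0.362 \times 58 \approx 21 \text{ items}, \qquad \frac{21}{441} \approx 4.8\% \text{ of all questions}
\end{aligned}
\]
```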
At the next cognitive level, Understand, 58 items (13.2%) were found. Among its seven subcategories, Interpreting accounts for 36.2% of the category. The other processes, Exemplifying, Classifying, Inferring, Comparing, and Explaining, make up 1.1%, 1.1%, 2.7%, 1.6%, and 1.8% of all questions, respectively. No item was found for Summarizing.

To answer the second research question, which concerns content validity, the table of specifications and the objectives and topics set in the curriculum and syllabus were taken into account (Appendix B). For Testing (1), Table 3 shows the number and percentage of items on each subject matter. On close inspection, the majority of questions (62%) are on purely theoretical areas taught in the first half of the semester, while the sections on "Test Construction", "Characteristics of Good Test", and "Theories of Language Testing" are not satisfactorily covered. Given the practical value of these sections for future teachers, this weakness needs to be remedied by devoting more questions to them. Examining the centers individually (Table 4), all of them lack sufficient coverage of these three sections. This exposes a major drawback of teacher preparatory programs: their theoretical orientation. The Testing exams of Shahid Mofatteh and Shahid Bahonar (Tehran) include comparatively more questions on the syllabus topics than those of the other centers.

For Testing (2), Table 5, which includes all of its questions, shows a rather balanced and satisfactory coverage of content areas. However, two issues appear to be neglected: "Cloze and Dictation-type Tests" and "Functional Testing".
Although the syllabus places a high value on these parts of the content, an inadequate number of items addressed them. Comparing the centers (see Table 6) in terms of the content validity of their exam questions, Shahid Rajaei of Urmia is in a better state than the other centers: its questions were evenly distributed among the topics. By contrast, the questions of two centers, those of Arak and Tehran, were the least valid in terms of content. The quality of the questions at Shahid Mofatteh of Shahr-e-Rey is moderate in comparison with the other four centers.
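Content-validity checks of this kind compare how exam items are distributed across topics with the weights the syllabus assigns to those topics. The Python sketch below illustrates one way to flag under-covered topics. The topic "Theoretical foundations", all counts and weights, and the `min_ratio` threshold are hypothetical assumptions for illustration, not values taken from the curriculum or from Tables 3 to 6.

```python
def coverage_report(item_counts, syllabus_weights, min_ratio=0.5):
    """Compare each topic's share of exam items with its syllabus weight.

    item_counts      -- topic -> number of exam items written on that topic
    syllabus_weights -- topic -> intended share of the course (assumed to sum to 1)
    min_ratio        -- hypothetical threshold: a topic is flagged as under-covered
                        when its item share is below min_ratio * its syllabus weight
    """
    total = sum(item_counts.values())
    report = {}
    for topic, weight in syllabus_weights.items():
        n_items = item_counts.get(topic, 0)
        share = n_items / total if total else 0.0
        report[topic] = {
            "items": n_items,
            "share": round(share, 2),
            "weight": weight,
            "under_covered": share < min_ratio * weight,
        }
    return report


# Hypothetical item counts and syllabus weights for one exam.
counts = {"Theoretical foundations": 26, "Test Construction": 4,
          "Characteristics of Good Test": 11, "Theories of Language Testing": 9}
weights = {"Theoretical foundations": 0.4, "Test Construction": 0.2,
           "Characteristics of Good Test": 0.2, "Theories of Language Testing": 0.2}

report = coverage_report(counts, weights)
print([t for t, r in report.items() if r["under_covered"]])  # ['Test Construction']
```

Using a ratio threshold rather than an absolute difference is only one design choice; a stricter check could flag any topic whose item share falls below its syllabus weight at all.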
To answer the third research question, a cross-tabulation was conducted; it is presented in the table below. The classification of the 441 items yielded five major cognitive process levels, thirteen sub-levels, and three types of knowledge. The five major cognitive processes are Remember, Understand, Apply, Analyze, and Evaluate. As for knowledge type, Metacognitive Knowledge is absent. This neglected area of knowledge in tests designed to assess prospective teachers' competencies takes on additional significance when its implications for classroom practice are considered. Metacognitive Knowledge is closely associated with terms such as metacognitive awareness, self-awareness, self-reflection, and self-regulation, which are often used interchangeably. It can play an important role in student learning, and by implication in the ways students are taught and assessed in the classroom; it appears to be related to transfer of learning, the ability to use knowledge gained in one setting or situation in another (Bransford et al., 1999). Teacher educators should cultivate awareness of this knowledge among prospective teachers by including it in assessments. One way is portfolio assessment, which offers students the opportunity to reflect on their work and thereby provides self-assessment information. The lack of any question on this type of knowledge may be attributed to the difficulty of measuring it in formal classroom tests; it is more easily assessed in classroom activities and in discussions of various learning strategies (Pintrich, 2002).

All items classified as Apply, about 6.8% of all questions, were identified as Procedural Knowledge. Only one subcategory of the next cognitive process, Analyze, was found in the tests: Organizing. Of its three items, two measured Conceptual Knowledge and one measured Procedural Knowledge.
For Evaluate, both subcategories, Checking and Critiquing, were present, with two questions each. The two Checking items measured Factual Knowledge, whereas the Critiquing items measured Procedural Knowledge. For the last major cognitive category, Create, no item was found.
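Identifying the dominant combinations for the third research question amounts to finding, for each cognitive process, the knowledge type it is most often paired with. The Python sketch below shows that step, assuming items are already coded as (process, knowledge) pairs; the function name `dominant_knowledge` and the example data are hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict


def dominant_knowledge(coded_items):
    """Map each cognitive process to the knowledge type it is most often paired with.

    coded_items: iterable of (process, knowledge) pairs, one per exam item.
    On a tie, Counter.most_common keeps the knowledge type encountered first.
    """
    by_process = defaultdict(Counter)
    for process, knowledge in coded_items:
        by_process[process][knowledge] += 1
    return {process: counts.most_common(1)[0][0] for process, counts in by_process.items()}


# Hypothetical coded items (illustrative only).
items = ([("Remember", "Factual")] * 5 + [("Remember", "Conceptual")]
         + [("Understand", "Conceptual")] * 3 + [("Understand", "Factual")] * 2
         + [("Apply", "Procedural")] * 2
         + [("Analyze", "Conceptual")] * 2 + [("Analyze", "Procedural")])

print(dominant_knowledge(items))
# {'Remember': 'Factual', 'Understand': 'Conceptual', 'Apply': 'Procedural', 'Analyze': 'Conceptual'}
```

Reporting only the modal knowledge type hides how strong each association is, so in practice it would be read alongside the full cross-tabulated counts.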

Discussion & Conclusion
This study was undertaken to investigate which cognitive levels and knowledge types were tested on the language assessment literacy exams administered over the past two years at five Teacher Training Centers. The results of the item analysis can have a positive washback effect on language assessment literacy classes across teacher preparatory programs. The findings for the first question indicated that a narrow range of cognitive processes (i.e., the lower-level capacities of Remember and Understand) was tested in all centers, and only a very limited number of questions aimed at measuring the complex cognitive abilities of Analyze and Evaluate. This is similar to Masters et al.'s (2001) study, in which the largest share of questions was written at the lowest cognitive level (Knowledge, corresponding to Remember in the Revised Taxonomy). In the current study, the cognitive processes that followed Remember in frequency were Understand and Apply, respectively, mirroring the order found by Masters et al. This finding also corroborates the results of Squire's (2001) study, which found a large proportion of items at the Knowledge level of Bloom's original taxonomy (corresponding to Remember in the Revised Taxonomy) and few items at higher cognitive levels. The result is also compatible with the findings of Rabihavi, et al. (2011), who analyzed the exam questions of two teacher training centers and labelled every item as either Knowledge or Comprehension on Bloom's Taxonomy; none of the items measured the more complex processes. The current findings are also consistent with Lan and Chern's (2010) study of the cognitive process levels and knowledge types measured on the English reading comprehension tests of college entrance examinations in Taiwan. In their study, items on Remember Factual Knowledge and Understand Factual Knowledge, which belong to the lower cognitive levels, made up the majority of the tests, and few items were found at the higher levels of Apply and Analyze.
Kocakaya and Gonen's (2010) analysis of Turkish high-school physics examination questions likewise found only about 27.5 percent of the questions at the higher levels of the cognitive domain (analysis, synthesis, and evaluation). Studies by Hand, Prain and Wallace (2002), Çepni et al. (2003), Karamustafaoglu et al. (2003), and Köğce (2005) also support the view that most traditional examinations target lower-order cognitive skills. In this study, the absence of Create items might be due to the productive nature of this category, which cannot be captured through multiple-choice or fill-in-the-blank formats. This view is consistent with Buckles and Siegfried (2006), who found that multiple-choice questions can measure elements of in-depth understanding when carefully designed, but maintained that the Synthesis and Evaluation levels could not be accurately measured, since creativity or originality cannot simply be tested via multiple-choice questions.
Concerning the third research question, the following associations of cognitive skills with knowledge types surfaced more often than other combinations in the data set: (a) Remember Factual Knowledge, (b) Understand Conceptual Knowledge, and (c) Apply Procedural Knowledge. These findings are in line with Chen's (2004) study in computer science, which revealed the same knowledge-cognitive associations. The result in the present study further confirms Anderson and Krathwohl's (2001) proposition that certain types of knowledge tend to be associated with certain types of cognitive skills.

It is well worth noting that, because instructors' approach to assessment affects students' learning preferences, the findings of this study have direct implications for classroom practice. As long as the majority of questions only test students' ability to remember or understand the content, test-takers will find memorization and reproduction of knowledge sufficient for an acceptable score and will not attend to the depth of the content. They would develop a rote, superficial approach to learning, accumulating information without connecting what they learn. The situation is even more serious given that these test-takers will become teachers and hence learning models for their own students. Teachers trained with this attitude toward assessment may adversely affect their students: not recognizing the importance of higher-order cognitive processes, they will not expose their students to these skills. In this sense, the language assessment literacy course has double significance for future teachers: teacher educators need both to familiarize candidates with essential assessment knowledge and skills and to lead them to realize the necessity of testing higher-level cognitive skills.
Given the above discussion, some potential areas for further research emerge. As the results of this study demonstrated a gap in the assessment of higher-order cognitive skills in teacher preparatory programs, there is a need to develop an alternative evaluation instrument that covers as broad a range of cognitive processes as possible and compensates for the shortcomings of current assessment procedures. Another issue worth investigating is whether test-takers actually apply the expected cognitive skills when answering a question, and how closely what happens in practice corresponds to the judgment of test developers. Procedures such as think-aloud protocols or interviews can be adopted for this purpose. Finally, since "Methodology of Teaching English" is one of the mandatory subjects for English teaching licensure, its assessment also needs to be appraised from a cognitive processing perspective.

Appendix A

The Knowledge Dimension
Factual Knowledge: The basic elements students must know to be acquainted with a discipline or solve problems in it
Conceptual Knowledge: The interrelationships among the basic elements within a larger structure that enable them to function together
Procedural Knowledge: How to do something, methods of inquiry, and criteria for using skills, algorithms, techniques, and methods
Metacognitive Knowledge: Knowledge of cognition in general as well as awareness and knowledge of one's own cognition

The Cognitive Process Dimension (Evaluate, excerpt)
Checking: Detecting inconsistencies or fallacies within a process or product; determining whether a process or product has internal consistency; detecting the effectiveness of a procedure as it is being implemented
Critiquing (Judging): Detecting inconsistencies between a product and external criteria; determining whether a product has external consistency; detecting the appropriateness of a procedure for a given problem