The Validity of Grammaticality Judgment Task on Saudi EFL Learners

The purpose of this paper is to shed light on the validity of a controversial data-gathering tool, the Grammaticality Judgment Task (GJT), in measuring the grammatical competence of Saudi EFL learners. Despite the widespread use of GJTs in SLA research, the task has attracted a great deal of criticism. The present paper is part of a larger study investigating the acquisition of past verb forms by Saudi EFL learners. Thirty-six Saudi EFL learners took part in the study and were divided into three groups: a guided-planning group, a semi-guided planning group, and a control group. The task used in the study consisted of 20 test items: 10 control items and 10 experimental items. The results revealed no statistically significant differences among the three groups on the first GJT, and only a slight difference that just reached significance on the delayed GJT. Moreover, the results did not reflect the participants' actual grammatical competence.


Introduction
The Grammaticality Judgment Task (GJT) is a tool widely used by researchers in Second Language Acquisition (SLA). In this task, L2 learners are presented with a set of sentences and asked to identify those that are grammatically deviant. GJTs are conducted to: (a) assess speakers' reactions to sentence types that occur only rarely in spontaneous speech; (b) obtain negative evidence on strings of words that are not part of the language; (c) distinguish production problems (e.g., slips, unfinished utterances) from grammatical production; and (d) isolate the structural properties of the language that are of interest by minimizing the influence of its communicative and representational functions (Schütze, 1996). However, concerns have been raised about the validity of GJTs: judgments can be influenced by extragrammatical factors, the link between metalinguistic judgments and grammatical knowledge is unclear, and control techniques are lacking (Tremblay, 2005). A further concern is that learners may base their judgments on extraneous factors, such as sentence complexity or semantic irregularity (Ellis, 1991).
The present study investigates the validity of the GJT with Saudi EFL learners. The paper is part of a larger study investigating the acquisition of past verb forms by Saudi EFL learners, in which a GJT was used to measure the participants' ability to identify grammatically deviant sentences. The larger study set out to explore the factors that hinder the acquisition of past verb forms by Saudi EFL learners, and to highlight the extent of L1 interference when these learners report events in English using the past tense. It tested the participants' receptive and productive knowledge using a GJT and a Picture-Cued Storytelling task, and concluded that Saudi EFL learners have difficulty acquiring English past verb forms. However, the GJT did not provide sufficient insight into the participants' L2 competence.

Background
The use of the Grammaticality Judgment Task (GJT) goes back to the late 1970s and early 1980s (Bialystok, 1979; Gass, 1983). Since then, GJTs have been widely used by researchers as a data-collection tool to test theoretical claims (Tremblay, 2005). Grammaticality judgments are one kind (though by no means the only kind) of metalinguistic function, or objectification of language; in other words, one way of objectifying language is to state whether a given sentence is acceptable or not. Grammaticality judgments involve complex behaviour: participants may guess when unsure, lose patience if the test is too long, or try to keep a balance between the number of sentences they judge to be grammatical and deviant (Ellis, 1991).
A large number of SLA studies have employed grammaticality judgment tasks (for an early review, see R. Ellis, 1991a) as a means of measuring L2 learners' knowledge. The GJT appears to be the preferred technique for investigating L2 explicit knowledge, that is, conscious knowledge. There is now a considerable literature on GJTs. Following the reviews of GJT studies by Chaudron (1983) and R. Ellis (1991a), several further studies have been carried out (e.g., Bard Kaplan, 1998; Gass, 1994; Goss, Ying-Hua, & Lantolf, 1994; Leow, 1996; Mandell, 1999) that specifically examined the validity and reliability of GJTs.
The main construct-validity issue concerns what a GJT actually measures. What kind of knowledge do learners draw on when they judge a sentence's grammaticality: implicit knowledge, explicit knowledge, or some combination of the two? As Birdsong (1989) observed, "metalinguistic information [from a GJT] are comparable to cheap hot dogs: they are made up of meat but considerable other ingredients as well" (p. 69). Surprisingly, many SLA researchers who use GJTs fail to address this issue. Sorace (1996), however, explicitly takes up the challenge: "It may comprise an additionally intricate job [than is true of native-speaker judgments] to resolve on the sort of norm consulted by learners within the procedure of making a judgment, especially within a learning setting that encourages the growth of metalinguistic awareness. It is a challenge to see whether subjects disclose their thoughts or what their thoughts should be" (p. 385). It can be hypothesized that learners asked to judge the grammaticality of a sentence quickly are more likely to rely on implicit knowledge, whereas learners given time can gain controlled access to explicit knowledge. Sorace also suggests that a timed procedure is needed to ensure that the test draws on tacit rather than metalinguistic knowledge. Ellis and Han's (1998) research supports this view. They found that scores from timed and untimed versions of the same GJT loaded on separate factors: in a principal-components analysis, the timed GJT loaded on the same factor as an oral production test, whereas the untimed GJT loaded on the same factor as a metalingual-comments score. Ellis and Han labelled these two factors implicit and explicit knowledge, respectively. DeKeyser (2003), however, offers a word of caution, observing that time pressure does not guarantee a measure of implicit knowledge. As noted previously, some learners may develop relatively automatized explicit knowledge, which can also be accessed under time pressure. Conversely, learners will not necessarily use explicit knowledge simply because they have time to do so; they may still choose to rely on implicit knowledge. Indeed, they may have to (or else guess) if they lack the explicit knowledge needed to judge a particular sentence. R. Ellis (1991b) and Goss et al. (1994) found that even when learners had the opportunity to reflect on a judgment, they sometimes chose to respond immediately. At best, then, we can say that an immediate judgment is more likely to reflect implicit knowledge, and a delayed judgment explicit knowledge.
A further problem with asking learners to judge sentences as grammatical or ungrammatical is whether they actually judge the structures the researcher intends them to judge, or other structures in the test sentences. This problem can be addressed by asking learners to indicate or correct what they believe to be ungrammatical in each sentence. However, it is unclear whether this improves the validity of a GJT as a measure of explicit knowledge. In studies of L1 metalinguistic awareness, the ability to repair sentences at a very early age (4 years) is seen as reflecting tacit rather than conscious knowledge of the rules of language; Gombert (1992) describes it as "episyntactic" rather than "metasyntactic" behaviour. Children do later apply conscious knowledge to correct ungrammatical sentences, but the ability to perform this task using tacit knowledge clearly does not disappear. Time can again be expected to be a key factor: asked to locate or correct an error online, L2 learners may rely more on implicit knowledge, whereas given sufficient time they have the opportunity to use explicit knowledge.
The reliability of the GJTs used in particular studies has also been questioned. Birdsong (1989) highlights the danger of response bias (e.g., a general tendency to judge sentences as ungrammatical). Ellis (1991b) reports three studies in which the same GJT was administered to L2 learners twice within a single week. The learners changed 22.5%, 31.0%, and 45% of their judgments from one test to the next. Ellis argues that the GJTs may have been unreliable because the learners' knowledge was uncertain, which led them to apply different sets of judgment strategies inconsistently. Only one of these strategies involved applying explicit knowledge in the form of pedagogical rules of varying accuracy. Ellis's interpretation has been supported by other studies that have investigated how learners arrive at judgments (e.g., Goss, Ying-Hua, & Lantolf, 1994). In addition, L2 learners have been found to use a wider range of strategies than native speakers when completing a GJT (Davies & Kaplan, 1998).
In sum, the GJT is one of the most popular tools in SLA for measuring L2 learners' receptive knowledge and language competence. Despite its popularity, it remains controversial and has drawn a great deal of criticism and questions about its validity. That said, using a GJT as a data-collection tool requires careful design of the test and its items, as well as careful administration. A GJT is more likely to provide a measure of explicit knowledge if (a) learners are given time to judge sentences and to correct ungrammatical ones, (b) learners' responses to the ungrammatical sentences in the test (or to the sentences they judged ungrammatical) are analysed separately from their responses to the grammatical sentences, and (c) learners' uncertainty in judging individual sentences is taken into account.

Participants
Thirty-six Saudi students from different universities in the UK participated in this study. They were adult males and females who had completed a one-year general English language course in the UK, a requirement for reaching the necessary level of English language proficiency. Their average IELTS scores were between 5.0 and 6.0. The participants were studying different disciplines and had lived in the UK for more than two years. Homogeneity in IELTS results and length of residence in the UK was sought in order to maintain a balanced level of L2 proficiency and to ensure that the participants had adequate competence to distinguish grammatically deviant from well-formed sentences in English. The nature of the study was explained to the participants, and they signed a form giving their consent to take part. The participants were divided into three groups.
IJALEL 4(6):78-83, 2015
Guided-Planning Group. This group consisted of twelve participants with different levels of English language proficiency (as judged by their IELTS scores). The guided-planning group was given a detailed explanation of the task; participants were explicitly asked to focus on the past verb forms and were provided with examples. The group was allowed time to plan their answers before the task and was given guidance. Pre-task planning was implemented to have a positive effect on the participants' performance (Willis & Willis, 1988).
Semi-Guided Planning Group. This group consisted of twelve participants with different levels of English language proficiency (as judged by their IELTS scores). The semi-guided planning group was also given detailed instructions, and the participants were asked to pay attention to the past verb forms, but no examples were provided. Participants in this group were also given time to plan their answers before engaging with the task.
No-Planning Group. This was a control group consisting of twelve participants with different levels of English language proficiency (as judged by their IELTS scores). The group received general instructions about the task, with no further details or planning time.

Test Items
The task included 20 test items: 10 control items and 10 experimental items, presented in a counterbalanced order. All items were in the past tense and covered three forms (past simple, past progressive, and past perfect) because the task was part of a larger study investigating the acquisition of past verb forms by Saudi EFL learners. The control items were generally simple because, as Bullock et al. (2005) suggest, they serve as a point of comparison for the experimental items. By contrast, the verbs in the experimental items were manipulated, as they were the focus of the study.

Procedure
The tasks were administered in several UK cities and at different times because the participants were recruited from different cities in the UK. Participants took the tasks individually, or occasionally in groups for the GJT when the researcher travelled to meet several participants at once. They were first given an introduction to the aims and purpose of the research and to the tasks they were about to undertake. The first task was the Grammaticality Judgment Task, in which participants were given a list of sentences and asked to decide whether each was grammatically correct or incorrect. The same task was repeated after eight weeks to find out whether the instruction on the past verb forms had taken effect (Schmitt, 2010).
As mentioned earlier, the GJT reported in this paper was part of a larger study investigating the acquisition of past verb forms by Saudi EFL learners. In each session, participants were given a maximum of 10 minutes to plan their answers and provide their judgments.
The participants were instructed to base their judgments on their intuition and on whether they would or would not use the sentence in normal situations. To keep them from rejecting sentences "on the basis of prescriptive rules of English" (Tremblay, 2005, p. 144), participants were instructed not to reject a sentence simply because they might know a better way to express the same meaning. For this purpose, guidance and examples were provided before they began the task.

Results
A one-way between-groups analysis of variance (ANOVA) was conducted to compare the three groups' scores on the first Grammaticality Judgment Task (GJT1). As shown in Table 1, there was no significant difference among the guided-planning, semi-guided planning, and no-planning groups [F(2, 33) = 0.512, p = 0.604]. Descriptively, however, the guided-planning group performed better in its intuitive judgments of the task items (M = 4.33, SD = 1.155) than the semi-guided planning group (M = 3.92, SD = 1.832) and the no-planning group (M = 3.75, SD = 1.288). In other words, the null hypothesis of no difference among the groups could not be rejected for GJT1. The same analysis was repeated on the delayed test (GJT2), which was given to the same participants eight weeks later. This time, the difference among the groups just reached significance [F(2, 33) = 3.317, p = 0.049; see Table 2]. Again, the guided-planning group performed better (M = 4.83, SD = 1.115) than the semi-guided planning group (M = 4.33, SD = 1.557) and the no-planning group (M = 3.58, SD = 0.793). A complex contrast and Tukey HSD post hoc comparisons were then carried out on GJT2 to determine which groups differed. The complex contrast, which compared the guided-planning group with the other two groups (semi-guided planning and no planning), showed that the guided-planning group performed significantly differently on the delayed GJT (p = 0.047). The Tukey HSD test showed no significant difference between the guided-planning and semi-guided planning groups (p = 0.568) or between the semi-guided planning and no-planning groups (p = 0.288); the only significant pairwise difference was between the guided-planning and no-planning groups (p = 0.039; see Table 3). In summary, there were no statistically significant differences among the three groups on the first GJT (p = 0.604). The delayed GJT, however, showed a slight but significant difference among the three groups (p = 0.049), which was expected because the interval was only eight weeks and the same guidance and planning procedure was followed.
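Because every group had n = 12, the omnibus ANOVAs and the complex contrast can be cross-checked from the reported means and standard deviations alone. The Python sketch below is a minimal consistency check under stated assumptions, not the analysis actually run in the study: the function names are hypothetical; the ANOVA p-value uses the closed form for an F distribution with 2 numerator degrees of freedom, which is valid only because there are exactly three groups; the contrast p-value uses the classical finite series for Student's t with an odd number of degrees of freedom (here 33); and the contrast weights (1, -0.5, -0.5) are an assumption about how the guided-planning group was compared with the other two. With the published two-decimal means, the delayed-test figures agree with the reported F(2, 33) = 3.317, p = 0.049 and contrast p = 0.047, while the first-test F comes out slightly lower (about 0.50 rather than 0.512), presumably because the published means are rounded.

```python
import math


def one_way_anova(means, sds, ns):
    """One-way between-groups ANOVA computed from group summary statistics.

    Returns (F, df_between, df_within, ms_within).
    """
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-groups sum of squares: weighted squared deviations of the group means
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares recovered from each group's standard deviation
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def f_upper_tail_df1_2(f_stat, df_within):
    """P(F > f_stat) for an F(2, df_within) distribution.

    This closed form holds ONLY when df_between == 2, i.e. exactly three groups.
    """
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


def contrast_t(means, ns, weights, ms_within):
    """t statistic for a planned contrast using the pooled within-groups variance."""
    estimate = sum(w * m for w, m in zip(weights, means))
    se = math.sqrt(ms_within * sum(w ** 2 / n for w, n in zip(weights, ns)))
    return estimate / se


def t_two_tailed_p(t_stat, df):
    """Two-tailed p-value for Student's t via the finite series for odd df."""
    if df % 2 != 1:
        raise ValueError("this series form requires an odd number of degrees of freedom")
    theta = math.atan(abs(t_stat) / math.sqrt(df))
    c = math.cos(theta)
    # series = cos(t) + (2/3)cos^3(t) + (2*4)/(3*5)cos^5(t) + ... up to cos^(df-2)(t)
    term, series = c, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        series += term
        term *= (2 * k) / (2 * k + 1) * c * c
    prob_inside = (2 / math.pi) * (theta + math.sin(theta) * series)  # P(|T| < |t|)
    return 1 - prob_inside


# Reported summary statistics, n = 12 per group,
# in the order guided-planning, semi-guided planning, no-planning.
ns = [12, 12, 12]
gjt1 = one_way_anova([4.33, 3.92, 3.75], [1.155, 1.832, 1.288], ns)
gjt2 = one_way_anova([4.83, 4.33, 3.58], [1.115, 1.557, 0.793], ns)
for label, (f_stat, df_b, df_w, _) in (("GJT1", gjt1), ("GJT2", gjt2)):
    p_value = f_upper_tail_df1_2(f_stat, df_w)
    print(f"{label}: F({df_b}, {df_w}) = {f_stat:.3f}, p = {p_value:.3f}")

# Assumed contrast weights: guided-planning vs. the mean of the other two groups
t_stat = contrast_t([4.83, 4.33, 3.58], ns, [1, -0.5, -0.5], gjt2[3])
p_contrast = t_two_tailed_p(t_stat, gjt2[2])
print(f"Contrast on GJT2: t({gjt2[2]}) = {t_stat:.3f}, p = {p_contrast:.3f}")
```

Working from summary statistics is only a check on internal consistency; the Tukey HSD comparisons are not reproduced here because they need the studentized range distribution, and a full reanalysis would require the raw scores.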

Discussion
The objective of this paper is to test the validity of the GJT with Saudi EFL learners, with a focus on past verb forms. The test included 20 items (10 control items and 10 experimental items), presented in a counterbalanced order. The instructions were specific and clear, and the participants were allowed 10 minutes to make their judgments. The results revealed that the participants were unable to determine intuitively whether a sentence was grammatically correct or deviant. They also showed that the participants lacked the ability to assign the correct temporal reference when reporting an event in the past tense.
The results showed no statistically significant difference among the three groups on the first administration of the GJT, although two groups had been allowed to plan their answers before the test and had received guidance. This indicates that the participants were unable to distinguish grammatically deviant from well-formed sentences involving past verb forms. The second administration, however, showed a slight improvement in the performance of the guided-planning and semi-guided planning groups and a slight decline in the performance of the no-planning group. This improvement suggests that repeating the same test with the same planning and guidance procedure leads to better outcomes.
The study focused on three past-tense forms (simple, progressive, and perfect) in order to measure the participants' understanding of each. The participants demonstrated a good understanding of the past simple and were generally able to make clear judgments about ungrammatical sentences in this form. However, they had some difficulty recognizing deviant sentences in the past progressive, and identifying ungrammatical sentences in the past perfect was even harder for them.
Despite the slight but significant difference on the delayed test, the results indicate that neither guided nor semi-guided planning made a substantial change to the participants' performance, and the groups' results were, generally speaking, convergent. Although the sample was relatively small, the results suggest that Saudi EFL learners face problems when using the past tense in English.
The aim of using the grammaticality judgment task was to measure the participants' intuitive judgments of whether the sentences were grammatically acceptable. However, the results show that participants in all three groups were unable to make such clear judgments, even though two of the groups were given guidance and a chance to plan their answers. When the same test was repeated eight weeks later with the same groups, the results showed a slight but statistically significant difference (see Table 2). Although the null hypothesis was rejected on the delayed test (p = 0.049), the performance of the guided-planning and semi-guided planning groups was similar, and the significant result reflected the performance of the no-planning group (see Table 4).

Conclusion
The GJT is one of the most widely used tools for collecting metalinguistic data in SLA research, and despite the criticism it has attracted, it remains an essential one. The findings of this study are consistent with concerns about the validity of the GJT as a measure of grammatical awareness, and the study contributes to the literature on the validity of GJTs with Saudi EFL learners. The outcomes suggest that the design of a GJT plays a major role in determining the reliability of the data collected. In this study, the complexity of the experimental items affected the participants' performance and may have led them to reject those items.
Based on these results, researchers are advised to exercise extra caution when using GJTs for data collection. A GJT is likely to be more effective when it is carefully designed and its items are less complex. A true/false format is not efficient for a GJT, as it may push L2 learners to judge the grammaticality of a sentence by random guessing. The items in the GJT used in this study focused only on past verb forms; this is believed to be another factor that affected the results and gave an unrealistic picture of the participants' grammatical awareness. Hence, it is important to include a variety of verb forms in order to get a clearer idea of L2 learners' grammatical awareness. That said, the participants were relatively successful in identifying the simple past, but they struggled with the past progressive and past perfect.
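The concern about random guessing in a two-choice (grammatical/ungrammatical) format can be made concrete with a simple binomial model. The Python sketch below is purely illustrative and rests on assumptions that are not part of the study: each item is scored only as right or wrong, a guess is right with probability 0.5 on every item, and the test has 10 scored items (a hypothetical figure chosen to match the number of experimental items); the function name `p_at_least` is likewise hypothetical. Under these assumptions, a learner who guesses on every item is expected to score 5 out of 10, and has a 56/1024 (roughly 5%) chance of scoring 8 or more.

```python
from math import comb


def p_at_least(k, n_items, p_correct=0.5):
    """Binomial probability of at least k correct judgments out of n_items."""
    return sum(
        comb(n_items, j) * p_correct ** j * (1 - p_correct) ** (n_items - j)
        for j in range(k, n_items + 1)
    )


# Hypothetical: 10 binary items, each answered by pure guessing (p = 0.5)
n_items = 10
print("Expected score from guessing alone:", n_items * 0.5)
for k in range(5, n_items + 1):
    print(f"P(score >= {k}) = {p_at_least(k, n_items):.3f}")
```

On a binary GJT, then, scores near the guessing baseline are hard to distinguish from genuine judgments, which is one reason for taking learners' uncertainty on individual sentences into account.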

Table 1. One-way ANOVA results for the GJT

Table 2. One-way ANOVA results for the delayed GJT

Table 3. Tukey HSD post hoc test to determine which groups caused the significant difference