The Role of Lexical Cohesion in Writing Quality

Whether repetition bears any relation to the writing quality of a text remains an issue that intrigues a number of scholars in linguistics and in writing studies. Hoey (1991) and Halliday and Hasan (1976) are two prominent works that present detailed and thoughtful analyses of repetition in text. This study uses a model of lexical cohesion proposed by Witte and Faigley (1981), which is itself based on the taxonomies of cohesive ties presented by Halliday and Hasan (1976). The model deals with lexical cohesion and its subclasses, namely, repetition (same type, synonym, near-synonym, super-ordinate item, and general item) and collocation. The corpus includes five argumentative essays written by students in the field of English language literature. Five teaching assistants were asked to rank the papers on a five-point scale based on their perception of the papers' writing quality. The results showed that the paper that received the lowest rating for writing quality was the one that included the largest number of repetitions of the same type. The study concludes by arguing that repetition should not be considered monolithic, and suggests that every type of repetition needs to be examined individually in order to determine what enhances and what weakens writing quality.


Introduction
This paper shares the interest that linguists have found in discussing cohesion, especially after the influential work of Halliday and Hasan (1976). It investigates the relationship between lexical cohesion and writing quality. More particularly, it highlights the specific types of lexical cohesion that either enhance or weaken writing quality. It accepts that both "writing quality" and "cohesion" are still slippery terms because the factors that define them are unstable. The paper therefore follows a specific model proposed by Witte and Faigley (1981), which was itself based on the taxonomies of cohesive ties presented by Halliday and Hasan (1976). The study raises some issues that might be taken further by researchers, such as the mother tongue of the writers and of the raters of the papers, as well as the different disciplines and types of papers.

Literature Review
In their definitions of cohesion, scholars have stressed the importance of the text and of the relationships between the elements in the text. For example, Halliday and Hasan (1976) asserted that cohesion in text is determined by the "relations of meaning that exist within the text, and that define it as a text" (p. 5). Likewise, Hoey (1991) defined cohesion as "the way certain words or grammatical features of a sentence can connect that sentence to its predecessors (and successors) in a text" (p. 3). Carter (1998) provided a similar definition by stating that "the term cohesion embraces the means by which texts are linguistically connected" (p. 80). Also, Cook (1994) compared cohesion to coherence by showing that "cohesion is a manifestation of certain aspects of coherence, and a pointer towards it, rather than its cause or necessary result" (p. 34).
The two agreed-upon categories of cohesion are the lexical and the grammatical. The review here focuses mainly on the lexical type because the paper uses only lexical ties. Halliday and Hasan (1976) provided two major subclasses of lexical cohesion: reiteration and collocation. Reiteration includes five subclasses: same item, synonym, near-synonym, super-ordinate item, and general item. Similarly, Cutting (2008) identified four subclasses: repetition, synonyms, super-ordinates, and general words. Witte and Faigley (1981) scrutinized Halliday and Hasan's (1976) taxonomies in relation to composition and writing research. They analyzed student essays on the topic of "changes in behavior" (p. 195). They asked two readers to rank 90 essays on a four-point scale based on their perception of the writing quality. Then, they analyzed the five essays that were given the highest score and the five essays that were given the lowest score. They examined the essays for errors, syntactic features, and the number of cohesive ties. The results were as follows: the highly rated essays contained fewer errors but had more syntactic complexity. More importantly, the ideas in the highly rated essays were more detailed and more connected. In general, the study showed that writers of low-rated essays relied more on reiteration than on collocation. More relevant to the present study, Witte and Faigley found that "the majority of lexical ties (65%) in the low essays are repetitions of the same item" (p. 197). Although it was not the focus of Witte and Faigley (1981), the findings regarding the relationship between the types of lexical cohesion and writing quality are intriguing and are given exclusive focus in this study.
Chiang (2003) questioned the factors that determine writing quality in nonnative English speakers' papers by looking at four categories: cohesion, coherence, syntax, and morphology. Chiang's study differs from the previous study because it focused on the mother tongue of the raters. Fifteen English speakers and fifteen Chinese speakers were asked to rate sixty Chinese-speaking students' papers. What can be linked to the present paper is the finding that cohesion was rated as the best determinant of writing quality. Also, the insignificant difference regarding the mother tongue of the raters is important to the present study because its readers were both native and nonnative speakers of English.
Lexical repetition was investigated in Hoey (1991), who introduced three relevant works. The first is Hasan (1984), who provided two categories of lexical cohesion: "general," which includes repetition, synonymy, antonymy, hyponymy, and meronymy; and "instantial," which includes equivalence, naming, and semblance. Clearly, these categories are different from those of Halliday and Hasan (1976) mentioned above. The second work that Hoey considered was Winter's (1974, 1979). Hoey introduced this work by showing that Winter "has little interest in such classifications" but "his interest is in how the grammar of sentences contributes to their interpretation in context" (p. 16). For Winter, therefore, "it is much more important to recognize the common function of the variety of cohesive ties than to distinguish them, the common function being to repeat" (pp. 17-18). The third work introduced by Hoey was Philips's (1985), which showed interest in identifying collocation in the text. According to Hoey (1991),
Philips seeks to establish, with the help of a sophisticated statistical methodology, whether, for any given stretch of text, collocations can be found and, if so, how they might be interpreted. Taking as his data scientific textbooks, and as his unit of analysis the orthographically defined chapter, he shows that it is possible to identify the collocates of any given word by statistical means. (p. 21)
The teaching of lexical cohesion was the main focus in McGee (2009). Because he disagreed with including collocation as part of lexical cohesion, McGee examined only reiteration by looking at its four sub-classes: same item, synonym, super-ordinate, and general item. McGee found that reiteration can lead to redundancy and hence weaken writing quality. McGee's finding is of great importance because the common belief tends to consider reiteration an important component that enhances writing quality.
Reiteration was also examined in Reynolds (2001) in terms of cultural background, topic, and writing development. The data consisted of both descriptive and persuasive essays, and the study questioned whether there was any relation between lexical repetition on the one hand, and the topic of the task and the cultural background on the other. The author found that the persuasive topic showed frequent use of bonds. A common belief holds that repetition increases with the length of the essay, as students feel the need to explain old information they have already discussed instead of coming up with new words. Contrary to this belief, Reynolds discovered that when "the total number of words increases, the frequency of repetition links and the proportion of T-units bonded by repetition decrease" (p. 25).
It can be concluded from the review that the investigation of lexical cohesion tends to be disparate. Cutting (2008) did not look into lexical cohesion in depth, as her work serves as a general introduction to the topic. Hoey (1991) developed the notion of lexical cohesion by introducing simple and complex systems of repetition. Unlike Halliday and Hasan (1976), who were interested in the categories and classifications of cohesion, Winter (1974, 1979) saw more benefit in examining the function of the cohesive ties than in worrying about their categorization. Philips (1985) focused on and developed the type of collocation. McGee's (2009) study highlighted the issues of lexical cohesion inside the classroom. Witte and Faigley (1981) as well as Chiang (2003) examined the relation between cohesion and writing quality.

Data
The data were drawn from the website of the Michigan Corpus of Upper-level Student Papers (http://micusp.elicorpora.info), from which five papers were examined. The selection process was systematic in that the following criteria were controlled: the L1 and gender of the writers, the discipline, the type of the paper, the level of the students, and the specific genre. Thus, the data comprised five papers written by female nonnative speakers (the L1 of four is Chinese and of one is Urdu). All the students were final year undergraduates. The papers were argumentative essays in English language literature, and the specific genre was the novel. Table 1 summarizes the corpus. Five teaching assistants were asked to rate the papers based on their perception of the papers' writing quality because the goal of the study is to measure the relationship between lexical cohesion and writing quality.

Procedure
The first fifty T-units of each essay were analyzed. The notion of the T-unit was first introduced by Hunt (1965), who indicated that "these units are the shortest grammatically allowable sentences into which the theme could be segmented. If it were segmented into units any shorter, some fragment would be created" (p. 21). Witte and Faigley (1981) paraphrased the definition as "an independent clause and all subordinate elements attached to it, whether clausal or phrasal" (p. 37). It seems, though, that this definition is somewhat outdated. Hunt (1965) proposed the notion before modern grammar pointed out many inaccurate definitions in traditional grammar books. The question that any student of modern grammar would ask is "what do we mean by independent clause?" For example, Hunt (1965) argued that a clause like "and it is worth sixteen dollars" is an independent clause, and thus he considered it a T-unit (p. 37). However, this is clearly a coordinated clause because of the conjunction "and," which makes it dependent. Modern grammar books do not follow the dependent/independent distinction. Instead, other classifications are given, such as main, subordinate, and coordinate clauses. Furthermore, in modern syntax, a sentence can be rewritten by restoring words that are understood from other parts of the sentence. Here is an example taken from a paper examined in this research: "It is different from the bond that would exist between two women of equal social standing, and different even from the bond that would exist between a mistress and her maid." According to modern grammar books, the subject (and thus the verb) can be restored before the second coordinated clause. Hence, the sentence would look like: "It is different from the bond that would exist between two women of equal social standing, and [it is] different even from the bond that would exist between a mistress and her maid." Fortunately, this is the only sentence that carries this kind of ambiguity in the five papers analyzed for this study. The sentence was therefore divided into two T-units.
The following excerpt illustrates how T-units are segmented in this study. Each T-unit is shown in brackets:
Data Extracted 1: [Newland and Ellen's romance is shaped by their dreams, by the fantasy of their future together.] [When asked if he wants Ellen for a mistress, Newland responds "I want-I want somehow to get away with you into a world where words like that-categories like that-won't exist. Where we will be simply two human beings who love each other…and nothing else on Earth will matter" (238).] [He wants a space devoid of context, not only without expectations and conventions to follow, but entirely without categories to qualify love.] [The idea that "nothing else on Earth will matter" is an ideal, a space in which the two of them can be the sole inhabitants, an idea of Eden or a personal utopia.] [They find this space separate from "anything else on Earth" for mere moments during the story: in her house the first time he visits, in the house of the old Patroon, on the boat in Boston, in the carriage coming back to New York.] [But the moments are fleeting, and, when they end, they still have no action or plan to assure a life together.] [It is a rare occurrence that they find each other in these solitary moments, and the possibility of such a rapport outside of these spaces does not exist.] [Similarly, the promises made during these interactions do not endure;] [on the boat, Ellen promises that "I won't go back", a promise we want to believe.] [But, it becomes shattered when taken outside of the realm of their space apart;] [such a promise cannot exist when there are such duties to others, when there is a society in which they must fit themselves.]
All the quotes were excluded even if they are used to build new clauses. For instance, the quoted words in the following sentence are excluded: "To "neglect such an acquaintance" would mean a certain disregard for the prospects of his daughters." Sometimes, writers use certain words or phrases as part of their discourse without using quotation marks, while some writers use a colon to show that the words are a quote. In this study, all these words were excluded because they do not belong to the students' own writing. Here is an example from Satire to highlight this issue, where all underlined words were excluded:
Data Extracted 2: "Later, he describes how his daughters touch Mr. Thornhill: All my endeavours could scarce keep their dirty fingers from handling and tarnishing the lace on his clothes, and lifting up the flaps of his pocket holes, to see what was there. The image is striking because of the way he describes his own family: dirty fingers which tarnish the clothes suggests a soiling, his family somehow denigrating Mr. Thornhill simply by touching him."
Following Witte and Faigley's (1981) model of questioning the relationship between cohesion, coherence, and writing quality, the same classifications were used in this paper; these were in turn adopted from Halliday and Hasan (1976). These items of lexical cohesion include two major categories. The first is reiteration, which includes the same type, synonyms, super-ordinates, and general items. The second category is collocation. All the lexical ties are counted either within or across T-units.
Here is one example of each category; the case of lexical cohesion is underlined:
A. Reiteration:
(A.1) Same type (a case occurs within the same T-unit, taken from Roxana):
Data Extracted 3: "The bond between Roxana and Amy is not a standard bond between two female companions."
(A.2) Synonym (a case occurs across T-units, taken from Moll):
Data Extracted 4: 1- "The world of the last chapter, particularly Paris is, in a way, the potential space of the dream of Newland and Ellen's love, but there is a sense that passing time and a life already lived makes this era the domain of the new generation." 2- "In the final chapter, "the flower of life" that Archer misses is, in one way, this same dream, one which could not grow in the stifling environment of his youth, before he has a child."
(A.3) Super-ordinate item (a case occurs within the same T-unit, taken from Pride):
Data Extracted 5: "It is this focus on decorum and acting within one's spheres of existence, in all their myriad forms, which certain of Austen's characters, like Lady Bingley, espouse."
To further clarify the issue of the super-ordinate, here is an example taken from Witte and Faigley (1981); the super-ordinate case is underlined: "Some professional tennis players, for example, grandstand, using obscene gestures and language to call attention to themselves. Other professional athletes do similar things, such as spiking a football in the end zone, to attract attention" (p. 193).
Witte and Faigley explained that "professional athletes" is a super-ordinate term for professional tennis players, as "professional athletes in other sports are encompassed by the term" (1981, p. 193). Cutting (2008) gave this example of a super-ordinate: "The candle-light glittered on the luster-glasses, on the two vases that held some of the pink chrysanthemums, and on the dark mahogany. There was cold, deathly smell of chrysanthemums in the room. Elizabeth stood looking at the flowers" (p. 12). In explaining the function of the lexical cohesion in this example, namely the super-ordinate, Cutting (2008) wrote,
Here again there is a repetition of 'chrysanthemums', but then they are referred to with the words 'the flowers'. This is not a synonym of 'chrysanthemums'; it is a more general term known as a super-ordinate, an umbrella term that includes 'pansies', 'tulips', 'roses', etc. This is another way of avoiding repetition and still referring to the referent with a noun. Lawrence could have used an endophoric 'them' instead, and said 'Elizabeth stood looking at them', but this might have given them less prominence; he does want them at the centre of the story (p. 12).
(A.4) General item (a case occurs within the T-unit, taken from Pride):
Data Extracted 6: "But Jane Austen herself seems to suggest a rethinking of rough demarcations, of pushing one's boundaries and frames in as many ways as possible. An inspection of those things and characters she chooses to paint reveals the ideals she is pushing thematically in the novel, of being constrained by one's context and resisting it."
The word "things" in the previous example is considered to be a general item, and it apparently refers to the preceding clause, particularly to "a rethinking of rough demarcations, of pushing one's boundaries and frames in as many ways as possible." Cutting (2008) provided an explanation for general words when she wrote, "these can be general nouns, as in 'thing', 'stuff', 'place', 'person', 'woman' and 'man', or general verbs as in 'do' and 'happen.' In a way, the general word is a higher level super-ordinate: it is the umbrella term that can cover almost everything" (p. 12). It seems that Cutting does not require the "general word" to be linked to another item within the text. In this study, however, all "general words" that show no relation with the text are excluded.

B. Collocation (this case occurs within the T-unit, taken from Roxana):
Data Extracted 7: "It is closest, in fact, to the relationship that exists between a husband and his wife."
In this example, the word "husband" collocates with the word "wife." Clearly, the concept of collocation is a bit vague. In the previous section, it has been shown how Halliday and Hasan's (1976) categories differed from Hasan's (1984), particularly in that the category of collocation was merged with other cohesive ties in Hasan (1984). In elucidating the notion of collocation, Witte and Faigley (1981) provided the following example:
A- The ascent up the Emmons Glacier on Mt. Rainier is long but relatively easy.
B- The only usual problem in the climb is finding a route through the numerous crevasses above Steamboat Prow.
C- In late season a bergschrund may develop at the 13,000-foot level, which is customarily bypassed to the right.
Cutting (2008), in her introduction to lexical cohesion, did not mention collocation at all. Therefore, according to her explanations, the example of "camping trip" either is not part of lexical cohesion or comes under the category of super-ordinate. In this paper, each word and phrase was tested, and if it was found to fit under another word or phrase, it was considered to be an occurrence of the super-ordinate type. This technique was helpful in avoiding ambiguity between those two categories, i.e., super-ordinate and collocation. Some examples of super-ordinate items in this study are: novel/Pride and Prejudice; Darcy/character. Examples of collocation are: female/women; husbands/men.
IJALEL 4(1):261-269, 2015
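As an illustration only, the counting of same-type ties within and across T-units can be sketched in a few lines of Python. The sketch is not the procedure used in this study: the function name `count_same_type_ties`, the pre-tokenized lists of content words, and the rule that each repeated word ties back to its most recent earlier occurrence are all assumptions made for the example. The first T-unit is reduced from Data Extracted 3; the second is invented.

```python
from collections import Counter


def count_same_type_ties(t_units):
    """Count repeated content words, split by where the previous occurrence sits.

    Assumption: a repeated word forms one tie with its most recent earlier
    occurrence; the tie is "within" if both occurrences share a T-unit,
    otherwise "across".
    """
    counts = Counter(within=0, across=0)
    last_seen = {}  # word -> index of the T-unit in which it last occurred
    for unit_index, words in enumerate(t_units):
        for word in words:
            if word in last_seen:
                where = "within" if last_seen[word] == unit_index else "across"
                counts[where] += 1
            last_seen[word] = unit_index
    return dict(counts)


# Content words only; function words are assumed to be filtered out already.
t_units = [
    ["bond", "roxana", "amy", "standard", "bond", "female", "companions"],
    ["bond", "husband", "wife"],  # invented second T-unit
]

ties = count_same_type_ties(t_units)
print(ties)
```

Under these assumptions, the second "bond" in the first T-unit counts as a tie within the T-unit, and the "bond" in the second T-unit counts as a tie across T-units.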

Results
Table 2 summarizes how the raters, the teaching assistants, ranked the writing quality of the papers; the L1 of each rater is shown. The following five tables summarize the findings of the analysis. Table 3 presents the total number of cohesive ties, Table 4 presents the total number of cohesive ties within the T-unit, Table 5 presents the total number of cohesive ties across the T-unit, Table 6 shows the average number of cohesive ties within the T-unit, and finally Table 7 presents the average number of cohesive ties across the T-unit.
In the readers' ranking, number 1 indicates the best ranking while number 5 shows that the paper was chosen as the worst. The ranking numbers are listed one after another, and each position in the sequence represents a specific reader's ranking. So, to see how a specific reader ranked the papers, compare the numbers in that reader's position. Also, the asterisk "*" represents the absence of words, which indicates that no average was calculated, while the number 0 represents a zero average of words; in other words, the cohesive ties occur directly one after the other.
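One simple way to combine five readers' rankings into an overall order is the mean rank per paper. The following is a minimal sketch under stated assumptions: the paper does not say how the rankings were aggregated, so the use of the mean is an assumption, and the rank values below are invented for illustration and are not the data of Table 2.

```python
from statistics import mean

# Hypothetical ranks (1 = best, 5 = worst). Each position in a list belongs to
# the same reader across papers. These values are NOT the study's data.
rankings = {
    "Pride":  [1, 1, 2, 1, 2],
    "Moll":   [2, 3, 1, 2, 1],
    "Space":  [3, 2, 3, 4, 3],
    "Satire": [4, 4, 5, 3, 4],
    "Roxana": [5, 5, 4, 5, 5],
}

# Assumption: the overall order is given by the mean rank (lower is better).
mean_rank = {paper: mean(ranks) for paper, ranks in rankings.items()}
overall_order = sorted(mean_rank, key=mean_rank.get)

for paper in overall_order:
    print(f"{paper}: mean rank {mean_rank[paper]:.1f}")
```

Reading the lists column by column gives one reader's full ranking; for instance, the first position in every list is the first reader's ranking of the five papers.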

Discussion
The aim of this study was to highlight the relationship between the number of lexical cohesive ties used in an academic paper and its writing quality. More specifically, it was to question the relationship between the types of lexical cohesion and writing quality. According to the results shown in Table 3, the writing quality in Pride was the best according to the readers' perception. Moll was in second place, Space was in third position, Satire was fourth, and Roxana was ranked lowest. As already stated, Witte and Faigley (1981) showed that "the high rated essays are much more dense in cohesion than the low-rated essays" (p. 195). According to Table 3, the paper that made the most use of cohesive ties was Roxana. However, this paper was ranked the worst. Therefore, the conclusion drawn in this paper clearly differs from the one in Witte and Faigley. Table 3 indicated that the striking contrast between Roxana and Pride resides in the number of cohesive ties of the same type they had. Roxana contained 160 lexical cohesive ties of the same type while Pride contained only 139. Here is a note from one reader regarding her choice of Roxana as the worst: "repetitive." Such a comment was discouraging at first, as many studies cited in the literature review section have shown that repetition tends to enhance writing quality in one way or another. The results in Table 3 bear out the reader's note: they show that a high number of words of the same type in a text diminishes the quality of the writing. Witte and Faigley had the same results; the only difference is that in this study the paper that includes the highest number of cohesive ties is also the paper that has the most uses of the "same item" category, which makes the final argument in this paper different from that of Witte and Faigley (1981). In other words, the results shown in this paper refute the argument that links the number of cohesive ties to writing quality, and instead support the argument that links writing quality to the type, not the number, of the cohesive ties.
In addition, the choice of Roxana as the worst because of the high number of same lexical items is reinforced by another issue. Table 4 reveals a striking difference between Roxana and Pride according to the same item criterion. It shows that Roxana contained 26 lexical ties of the same type within the T-unit while Pride contained only 11. So, the second part of the argument is that a high number of lexical cohesive ties within the T-unit weakens writing quality. This is also supported by the results shown in Table 5, where there was only a minor difference in the number of same-type ties between Roxana and Pride. Table 4 also shows that the total number of lexical cohesive ties in Roxana remained the highest. Also, by examining Table 4 and Table 3, again, an interesting difference can be noticed: Roxana had more cohesive ties within T-units (as in Table 4), whether of the same item, super-ordinate, or collocation type. But in Table 5, which shows the items across the T-unit, Roxana had a higher number only in the same type category. Therefore, what motivated the readers to give that ranking stemmed not only from the number of same-type uses but also from the location of the cohesive ties.
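The counts reported above can be cross-checked with simple arithmetic. The following sketch assumes that the total of same-type ties is the sum of the within-T-unit and across-T-unit counts; this assumption agrees with the 128 across-T-unit ties reported for Pride in the Discussion. The variable names are illustrative, and the per-T-unit rate relies on the fifty T-units analyzed per essay.

```python
# Same-type tie counts as reported in the Discussion (Tables 3 and 4).
reported = {
    "Roxana": {"total": 160, "within": 26},
    "Pride": {"total": 139, "within": 11},
}
T_UNITS_ANALYZED = 50  # the first fifty T-units of each essay were analyzed

summary = {}
for paper, counts in reported.items():
    # Assumption: total = within + across.
    across = counts["total"] - counts["within"]
    summary[paper] = {
        "across": across,
        "within_share": counts["within"] / counts["total"],
        "ties_per_t_unit": counts["total"] / T_UNITS_ANALYZED,
    }

for paper, row in summary.items():
    print(
        f"{paper}: across={row['across']}, "
        f"within share={row['within_share']:.1%}, "
        f"same-type ties per T-unit={row['ties_per_t_unit']:.2f}"
    )
```

On these figures, Roxana has 134 same-type ties across T-units against 128 for Pride, a small gap, whereas the within-T-unit counts (26 against 11) differ by more than a factor of two.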
The average number of words used within and across the T-unit, shown in Table 6 and Table 7, verifies the findings regarding the same type issue. A reader might argue that, since writing quality has been shown to be determined by the use of lexical cohesion of the same type and its use within the T-unit, the readers should have ranked Moll as the best, because it contains fewer lexical cohesive ties of the same type, especially within the T-unit. Before attempting to unravel this puzzle, here are two notes from the readers regarding Moll: "choppy;" "Poor lexical choices. Little syntactic complexity. Some new information is introduced incohesively, making the discourse incoherent some times." One side of these comments supports what has been pointed out, that writing quality is not all about lexical cohesion, and the other side reinforces the argument of this paper regarding the importance of distributing the cohesive ties across all the different types rather than relying on only one, the same item. It can be seen clearly from Table 3 that Pride has an average of 19.03 words per T-unit while Moll has 17.5; this is one important factor. The second factor is the total number of cohesive ties of the same type across the T-unit (128 for Pride and 109 for Moll), which further supports the argument that shows the relationship between writing quality and the use of cohesive ties within the T-unit. Furthermore, the total number of all cohesive ties within the T-unit is very close in these two papers. However, the gap increases strikingly in Table 5, which shows 150 cohesive ties in Pride against only 119 in Moll.
Collocation added another layer of difference between the highest rated essay and the lowest one. Roxana used collocation ties 21 times while Pride employed them 14 times. Interestingly, this difference supports the argument of this paper because, although Roxana used more collocation ties, those ties were used within the T-unit. When it comes to collocation across the T-unit, Roxana used fewer.

Conclusion
The findings of the study indicate that two important factors influence writing quality in its relation to lexical cohesion: the use of lexical items of the same type, and the location of those lexical items. However, it should be noted that this study was concerned only with lexical cohesion, which might explain why its findings differ from those of Witte and Faigley (1981). It should also be noted that writing quality is not all about lexical cohesion, yet lexical cohesion is an influential factor and must not be ignored, as the results above have shown. A final note regarding the role of lexical cohesion in writing quality is that the use of lexical cohesion is important, but the items need to be distributed across all the different kinds of lexical ties, i.e., same item, synonym, super-ordinate, general, and collocation. In other words, a high number of ties of one kind will clearly affect the writing quality: Roxana drew most of its ties from the same item category and was thus ranked as the worst.
(A.1) Same type (a case occurs within the same T-unit, taken from Roxana): Data Extracted 3
(A.2) Synonym (a case occurs across T-units, taken from Moll): Data Extracted 4
(A.3) Super-ordinate item (a case occurs within the same T-unit, taken from Pride): Data Extracted 5
It is intriguing to investigate (a) what made the readers consider Roxana the worst, and (b) what led them to choose Pride as the best. Roxana had an average of 17.8 words per T-unit while Pride had 19.03.

Table 1. The examined papers

Witte and Faigley (1981) indicated that "all lexical cohesive relationships which cannot be properly subsumed under lexical reiteration are included in a 'miscellaneous' class called collocation" (p. 193). They added that "lexical cohesion through collocation is the most difficult type of cohesion to analyze because items said to collocate involve neither repetition, synonymy, superordination, nor mention of general items" (p. 193). They provided an example (sentences A to C on the ascent of the Emmons Glacier) to illustrate the vagueness of collocation, and commented on it as follows:
It is much more difficult to analyze. For one of the authors of the present article, antecedent knowledge of mountaineering allows Steamboat Prow to collocate with Mt. Rainier and bergschrund to collocate with glacier. For the other author, neither pair is lexically related by collocation apart from the text where they are connected by inference (p. 194).

Table 2. The L1 of the raters and their ranking

Table 3. Total of cohesive ties

Table 4. Total of cohesive ties within the T-unit

Table 5. Total of cohesive ties across the T-unit

Table 6. The average of cohesive ties within the T-unit

Table 7. The average of cohesive ties across the T-unit

quality and the lexical ties within the T-units, as it shows that Pride had more lexical ties than Roxana. Table 4 shows a similar average of words of the lexical ties used within the T-unit. That difference can be attributed to the number of words used in each T-unit (Roxana 17.3, Pride 19.3). However, Table 7 shows a marked distinction between the two papers. The same type occurred every 77.36 words in Roxana while it occurred every 153.96 words in Pride. So, although Roxana used ties of the same type across the T-unit, those ties remained close together.