Measuring Readability

Impacter introduces new measures for readability

Published on 2021-03-22 by Goya van Boven

In an ideal world, all grant applications would be evaluated purely on their academic merits. In reality, however, reviewers are humans like us and are therefore influenced by how smoothly a text reads. Especially when people get tired, statements are easily misunderstood. Like many academics in linguistics, we at Impacter ask ourselves how the difficulty of a text can be measured objectively. Several tests have been proposed to answer this question. The most famous of these, the Flesch-Kincaid Reading Ease (FKRE) test, is also the readability measure currently used in Impacter. The FKRE formula calculates reading ease from the number of sentences in the text, the number of words per sentence and the number of syllables in those words. The basic assumption is that longer words and longer sentences make a text more difficult to read.
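To make the measure concrete, here is a minimal sketch of the standard Flesch Reading Ease formula in Python. The vowel-group syllable counter is a rough approximation (real tools use pronunciation dictionaries) and the tokenization is deliberately naive; this illustrates the published formula, not Impacter's implementation.

    import re

    def count_syllables(word):
        # Rough heuristic: one syllable per group of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_reading_ease(text):
        # Naive sentence and word splitting, purely for illustration.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        # Standard Flesch Reading Ease formula: higher scores mean easier text.
        return (206.835
                - 1.015 * (len(words) / len(sentences))
                - 84.6 * (syllables / len(words)))

    print(flesch_reading_ease("It is raining outside. I don't have to go anywhere."))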

Although this method of measuring is popular, it has also received criticism. Among other things, the test is said to measure only surface features of words and syntax, while neglecting other important properties of a text, such as its vocabulary, grammar and style. A lower FKRE score is said to indicate a more advanced text, but this does not necessarily mean the text is better: long sentences containing long words can be unclear, difficult to follow, or ungrammatical. Similarly, texts with short sentences can lack cohesion and structure, making them more difficult to understand.

This is why I investigated whether better readability measures than the FKRE score could be found. To do so, I compared the scores of various measures on a dataset of research proposals that received funding and proposals that did not. I found three measures that appear to be good predictors of success, which I will discuss below.

Global Cohesion

The first of these measures is global cohesion. Cohesion refers to the presence or absence of cohesive ties: explicit cues that help the reader connect the ideas in a text. These cues have been found to contribute to text quality. Such cues can be connectives, e.g. because or for example, or overlapping words: words that occur in multiple parts of the text. Different kinds of cohesion are defined over different parts of a text. Local cohesion refers to cohesion within a sentence, global cohesion concerns larger parts of a text such as paragraphs, and overall text cohesion is defined over the entire text body.

Research has demonstrated that global cohesion has a positive effect on text quality, as it ensures that structure is maintained throughout the text. Global cohesion can, among other ways, be measured through the overlap of so-called content word lemmas in adjacent paragraphs. A lemma is the dictionary form of a word, under which all its inflected forms are grouped. For example, the verb forms were, am and been all belong to the same lemma, be. Using lemmas ensures that we can still recognize the same concept, even when it appears in a different grammatical form. In our dataset, the positive effect of global cohesion was replicated: the successful proposals showed significantly more content word lemma overlap between a paragraph and the two paragraphs that follow it.
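To give an idea of how such an overlap score could be computed, here is a small sketch using the spaCy library. The en_core_web_sm model, the choice of nouns, verbs, adjectives and adverbs as content words, and the placeholder file name are assumptions of this sketch, not a description of Impacter's pipeline; it measures which fraction of a paragraph's content word lemmas reappears in the next paragraph, and the same set operations extend to looking two paragraphs ahead as described above.

    import spacy

    nlp = spacy.load("en_core_web_sm")
    CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # treated as content words here

    def content_lemmas(paragraph):
        # Collect the lemmas of content words in a single paragraph.
        return {t.lemma_.lower() for t in nlp(paragraph) if t.pos_ in CONTENT_POS}

    def adjacent_paragraph_overlap(paragraphs):
        # Average fraction of a paragraph's content lemmas that reappear
        # in the paragraph that follows it.
        lemma_sets = [content_lemmas(p) for p in paragraphs]
        scores = [len(cur & nxt) / len(cur)
                  for cur, nxt in zip(lemma_sets, lemma_sets[1:]) if cur]
        return sum(scores) / len(scores) if scores else 0.0

    # "proposal.txt" is a placeholder; paragraphs are assumed to be separated by blank lines.
    paragraphs = open("proposal.txt").read().split("\n\n")
    print(adjacent_paragraph_overlap(paragraphs))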

Aiming for the sweet spot: Adjust the global cohesion of your proposal by increasing or decreasing the links between its paragraphs.

Sentence structure complexity

Secondly, I looked into syntactic difficulty, i.e. the complexity of sentence structures. Syntactic difficulty is usually associated with slower processing times and is considered a factor that can decrease readability. However, Pitler and Nenkova (2008) found a positive relationship between readability and several measures of syntactic difficulty. In our dataset this positive effect was found as well, with the number of verb phrases per sentence demonstrating the effect most strongly. Although higher syntactic difficulty results in a more complex sentence, this does not necessarily mean the sentence becomes harder to understand. Take a look at the following two text segments:

a) It is raining outside. I don’t have to go anywhere. I am relieved. 
b) It is raining outside, but I don’t have to go anywhere so I am relieved. 

Although sentence b) contains more verb phrases than the sentences in segment a), most people would find b) easier to read than a). This is reflected in the results of our dataset, where the successful proposals contained significantly more verb phrases per sentence.
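Counting verb phrases properly requires a constituency parser, as in Pitler and Nenkova's work. As a rough proxy, one can count the tokens that spaCy tags as verbs in each sentence; the sketch below only illustrates the direction of the measure and is not the exact verb-phrase count used in our analysis.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def verbs_per_sentence(text):
        # Average number of verbs per sentence: a crude stand-in for the
        # verb-phrase count, which would require a constituency parse.
        doc = nlp(text)
        counts = [sum(1 for t in sent if t.pos_ == "VERB") for sent in doc.sents]
        return sum(counts) / len(counts) if counts else 0.0

    print(verbs_per_sentence("It is raining outside. I don't have to go anywhere. I am relieved."))
    print(verbs_per_sentence("It is raining outside, but I don't have to go anywhere so I am relieved."))

On the two segments above, the single combined sentence in b) should yield a higher average than the three short sentences in a).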

Aiming for the sweet spot: Increase the sentence structure complexity of your proposal by combining sentences that belong together; decrease it by splitting long sentences into multiple shorter ones.

Lexical Diversity

The final measure is lexical diversity, the variation of words used throughout a text. This measure has been found to positively correlate with more sophisticated and more difficult texts. Moreover, it is considered to reflect greater linguistic skill. There are various measures for lexical diversity, the simplest being the Type/Token Ratio (TTR). This represents the ratio of unique words (types) to the total amount of words (tokens). A problem with this, and many other lexical diversity measures, is that it is sensitive to text length: as the text gets longer it gets less likely to run into a word type that has not been used yet. The Measure of Textual Lexical Diversity (MTLD) is the only measure independent of text length. It uses the knowledge that TRR drops as the number of words increases, and calculates the number of words needed for the TTR to exceed a certain threshold: the more words are needed, the greater the lexical diversity. In our dataset the successful proposals used a significantly wider variety of words.
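As a sketch of how these two measures work, the snippet below computes the TTR and a simplified, forward-only MTLD. The 0.72 threshold is the value commonly used in the MTLD literature, the partial-factor correction follows the original method, and the full measure additionally averages a backward pass; this is illustrative only, not Impacter's implementation.

    import re

    def ttr(tokens):
        # Type/Token Ratio: unique words divided by total words.
        return len(set(tokens)) / len(tokens)

    def mtld(text, threshold=0.72):
        # Simplified, forward-only MTLD: count how many "factors" (segments
        # whose TTR has dropped to the threshold) fit in the text. The more
        # words each factor takes, the greater the lexical diversity.
        tokens = re.findall(r"[a-z']+", text.lower())
        factors, segment = 0.0, []
        for token in tokens:
            segment.append(token)
            if ttr(segment) <= threshold:
                factors += 1
                segment = []
        if segment:  # credit the leftover words as a partial factor
            factors += (1 - ttr(segment)) / (1 - threshold)
        return len(tokens) / factors if factors else float(len(tokens))

    print(mtld("the cat sat on the mat while the dog watched the cat"))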

Aiming for the sweet spot: Increase your proposal's Lexical Diversity by introducing more synonyms for recurring words.

In conclusion, we found three measures that assess the readability of a text, each capturing a different aspect and together yielding a more complete estimate of text quality than the FKRE score alone. Combined, these three measures explained 26% of the variation between successful and unsuccessful proposals in our data, which is considerable given the many factors that play a role in assigning research grants. Adding the FKRE score did not improve this result further.

As of today, Impacter will measure these three variables for new calls, starting with the upcoming Vici preproposal.

Although readability is not the only important factor in writing a successful proposal, it can make a difference in how your proposal is received. Therefore: spending a little extra time on improving the readability of your text might be just what your proposal needs to stand out.

Sources

Pitler, E., & Nenkova, A. (2008). Revisiting readability: A unified framework for predicting text quality. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 186-195).