Surface Readability

The current state of the “short words and sentences” approach to readable writing

Pretzel Dialectic
Apr 19, 2021

Most advice today on making your writing more readable is based on a 1948 psychology paper that emphasized shorter words and sentences. In this article I discuss that classic tradition, plus a few recent variations, as useful background for the larger approach I develop in other pieces. I’ll cover:

  • Rudolf Flesch’s model of readability based on short words and sentences
  • Tools like Grammarly that are based on grammatical analysis and ranked word lists
  • Special dialects such as Plain English and Simplified Technical English

The Contributions of Rudolf Flesch

I first heard about readability over lunch with my friend Tom Ladwig, a journalism professor and syndicated columnist. On a napkin he listed ten principles from The Art of Readable Writing³ by Rudolf Flesch, things like “prefer the simple word”. From memory he wrote out Flesch’s formula for measuring readability, which was based entirely on average syllables per word and average words per sentence. He told me that the Nobel Prize-winning author Ernest Hemingway often wrote at the 8th-grade level. “Any good editor will tell you,” he said, “don’t use a 25-cent word when a nickel word will do.”

Rudolf Flesch, a major force in 20th-century readability and literacy, was not a native English speaker. After fleeing the Nazi invasion of his native Austria in 1938, he discovered that his law degree was no good in the US.⁵ He worked at a book-printing company briefly before accepting a refugee scholarship at Columbia University, where he earned a PhD in Library Science in 1942. His doctoral work on readability focused on the value of: 1) shorter words and sentences, and 2) words with fewer prefixes and suffixes.

In those days, testing readability was difficult and impractical. Researchers checked words against ranked vocabulary lists and counted things like the number of unique words, the length of sentences and paragraphs, and grammatical features such as prepositional phrases (while waiting, at the end, of the line, in his bright red hat, and so on).² They also checked physical features like print size, line spacing, and even the weight of the book. Before computers, this was too much work for most publishers and print shops.

Flesch cut through all that with an approach that was simple but very effective. In 1948 he published A New Readability Yardstick¹ in the Journal of Applied Psychology. His readability test boiled things down to just two variables: average syllables per word, and average words per sentence. That’s all. Anyone could calculate the score of a text in a few minutes with a pencil. It wasn’t perfect. Flesch himself noted that an unreadable sample could have a good score — for instance, by using obscure words that happened to be short. But he showed that a good test result usually correlated with faster reading and better comprehension, and he demonstrated this with test scores for some well-known magazines. With that paper, and his 1949 book The Art of Readable Writing, he started a readability revolution that helped some publications drive up their readership by 40% or more.
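To make the arithmetic concrete, here is a minimal Python sketch of that Reading Ease score, which works out to 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The formula constants are Flesch’s; the vowel-group syllable counter is my own rough stand-in for counting syllables by hand, so treat the output as approximate.

```python
import re

def count_syllables(word):
    # Rough stand-in for hand-counting syllables: count runs of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch's 1948 Reading Ease score: higher means easier, roughly 0-100."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Very simple text can score above 100; dense academic prose can dip below 0.
print(round(flesch_reading_ease("The cat sat on the mat. It was warm."), 1))
```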

Flesch went on to publish many more books on literacy and the value of readable language. His research argued for phonics-based reading instruction, rather than memorization. His work helped inspire The Cat in the Hat (1957) and other Dr. Seuss books.⁶ (For more on the debate over phonics versus whole-word recognition, see my pieces on challenged readers and writing for low cognitive load.)

Approaches Based on Flesch

Flesch-Kincaid

Seventy years after that first paper, it’s a tribute to Flesch that most current readability measures are still just variations on his original idea. The most popular approach is probably the Flesch-Kincaid grade-level score, which Flesch helped develop in the 1970s for the US Navy.⁷ Flesch-Kincaid is a common feature in text-editing software. It’s more tolerant of long words than the original formula, and instead of the original 0-to-100 score, it provides an estimate of grade-level difficulty. Here’s the formula:
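Grade level = 0.39 × (total words ÷ total sentences) + 11.8 × (total syllables ÷ total words) − 15.59

A result of 8.0, for instance, suggests text an average 8th grader could read.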

…And Many Others

The site Readable.com⁸ lists 15 readability metrics developed for education or for legal, business, or technical writing. Of those, 13 are variations on the short-word-and-sentence approach. For instance, they may count letters instead of syllables. (This is easier to compute; one example is the Automated Readability Index.⁹) Or they only count one-syllable words. Or only words of three or more syllables. Or they are designed for faster counting, or easier math.
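To give one example of the letter-counting variation, the Automated Readability Index estimates a grade level from characters and words rather than syllables; the commonly published form is roughly:

ARI ≈ 4.71 × (characters ÷ words) + 0.5 × (words ÷ sentences) − 21.43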

But the approach has clear limits. For instance, for readers in the early grades, many one-syllable words like pelt or suave are meaningless. So although it’s common to target an “8th-grade reading level”, Flesch-type scores that don’t use graded word lists are best targeted at adults, not children. Even then, studies show that for text with uncommon terms such as medical or technical jargon, tests based on word length may grossly underestimate the reading difficulty for educated adults.

Word Lists and Computation

Remember the complicated readability metrics that Flesch’s formula replaced? Computers and artificial intelligence make those approaches much more practical, and some of them have come into common use.

Where Flesch used syllables as a gross marker of word difficulty, these approaches use lists of words that have actually been scored for difficulty. For instance, the Readable.com site describes two word-list approaches, Spache and Dale-Chall, in which lists of words are assigned points or categories of difficulty by experts.
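Here is a minimal sketch of the general word-list mechanism (not the official Spache or Dale-Chall formulas), just to show the two ingredients most of them combine: the share of words missing from a “familiar” list, and the average sentence length. The tiny FAMILIAR_WORDS set is a placeholder; the real lists contain thousands of expert-graded words.

```python
import re

# Placeholder list for illustration only; the real Spache and Dale-Chall
# lists contain thousands of words graded by experts.
FAMILIAR_WORDS = {"the", "cat", "sat", "on", "mat", "it", "was", "warm", "and"}

def word_list_stats(text, familiar=FAMILIAR_WORDS):
    """Return (% of unfamiliar words, average words per sentence)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    pct_difficult = 100 * sum(1 for w in words if w not in familiar) / len(words)
    return pct_difficult, len(words) / len(sentences)

# A Dale-Chall-style score then weights these two numbers with fixed constants.
print(word_list_stats("The cat sat on the mat. It was suave and urbane."))
```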

The list below shows some other interesting options, and you can find more on the Web.

  • Lexile is a hybrid approach to readability that combines sentence length with word difficulty.¹³ Its word list assigns difficulty based on how often a word appears in common text. My earlier examples of pelt and suave are less common than apple or banana, so they’re considered harder. Books for young readers are routinely tested and assigned Lexile scores by computers, and every year about half of US students in grades 1–12 receive Lexile scores of their own, so students can easily be matched with books at their level. If you’ve never played with readability tools, you might be surprised that Lexile rates at least two Hemingway novels as more readable than the first Harry Potter novel, and Stephen Hawking’s A Brief History of Time as easier than Robinson Crusoe.
  • The Hemingway Editor¹⁴ is a fun paste-in editor that highlights long words and sentences, plus other simple issues such as too many adverbs and passive voice.
  • Grammarly¹⁰ and Analyze My Writing¹¹ are two examples of tools that can analyze grammatical complexity and even comment on your style and voice. If you’re a writer, you’ll learn while playing with these tools, and probably have fun. Their detailed explanations are interesting and well-written — a good sign that they know their stuff!
  • Acrolinx¹², another AI tool in this increasingly crowded market, is a configurable content-checker you can install into your favorite editing software. Designed for enterprises rather than individuals, it lets you create sophisticated custom rules. These can help you get serious about reading difficulty, handle specialized vocabulary, speak in consistent voices, and avoid words that translate to something offensive in other languages. Any company with a global audience or very specialized needs and voices would be wise to investigate this kind of tool.
  • Beyond Surface Characteristics: A New Health Text-Specific Readability Measurement is a 2007 study in which the authors measured the readability of medical notes that patients might receive from doctors.¹⁵ Their hybrid approach included checking words against a list of medical terms scored for general readers. When compared to Flesch-Kincaid and two other common Flesch-type scores, their hybrid was a better predictor of the difficulty of medical text for non-medical readers. As an interesting side note, the score they created is the “distance” of a text from an ideal text. If you’re mathematical, this might suggest a vector-based approach to readability, measuring a text’s cosine distance¹⁶ from a sampled ideal text on features like domain, genre, and style or voice (a small sketch of this idea follows this list). It’d be fun to test this kind of distance-from-the-ideal metric to judge entries in contests where people submit humorous imitations of authors like Hemingway¹⁷ or Bulwer-Lytton¹⁸.
  • Domain-Specific Iterative Readability Computation is a 2010 study in which the authors sought a way of scoring text for many different domains (e.g., scientific or technical fields).¹⁹ Instead of a list of words scored by frequency or by human experts, they used AI to generate an ontology or knowledge graph, meaning a map of the relationships between specialized terms that appear in text from that domain. We might expect, for instance, that a word with links to many other words is more readable to people in that field than a word that has very few such links. Even in this early test, the approach showed results similar to expert-based word lists, but it has the advantage of not requiring a scored word list created by a human expert.
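To make the distance-from-the-ideal idea concrete, here is a minimal sketch under my own assumptions, not the method from the health-text study: each text is reduced to a tiny, hand-picked feature vector (the features and the sample “ideal” text below are invented for illustration), and cosine distance¹⁶ measures how far a candidate sits from the ideal.

```python
import math

def features(text):
    # Toy feature vector: average word length, average sentence length,
    # and percent of long words. Real systems would use far richer features.
    sentences = [s for s in text.split(".") if s.strip()]
    words = [w.strip(".,;:!?") for w in text.split()]
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sent_len = len(words) / len(sentences)
    pct_long = 100 * sum(1 for w in words if len(w) >= 7) / len(words)
    return [avg_word_len, avg_sent_len, pct_long]

def cosine_distance(a, b):
    """1 minus the cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

ideal = features("The sun rose. We walked to the river. The water was cold.")
sample = features("Notwithstanding inclement meteorological conditions, perambulation commenced.")
print(round(cosine_distance(ideal, sample), 3))
```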

Plain English and Other Simplified Dialects

Finally, I’ll quickly discuss a few attempts to simplify the language that’s available to writers. These simplified dialects use a restricted vocabulary and may add grammar rules like “use active voice” and “avoid pronouns”. Readability is not the only reason they exist. They can create text that is:

  • More understandable for non-native speakers or other challenged readers
  • More understandable for ordinary readers
  • Formally analyzable, typically for computing or automation

Some of these projects are considered movements. For instance, there have been several organized movements to make legal, scientific, or bureaucratic language easier to read. Others fall into the formal category of controlled natural languages.²⁰ There are at least 17 of these in English.

Here are three versions of simplified English that you may see in discussions of readability.

  • Plain English is a movement that Rudolf Flesch was part of, which argued for replacing heavy professional jargon with ordinary English that everyone can understand — especially in the contexts of legalese and bureaucracy.²¹
  • Basic English was a very simple subset of English, designed by the linguist Charles Kay Ogden in 1930 to facilitate teaching English internationally.²² It was popular just after WWII, and lives on in the form of an 850-word English vocabulary list used by some teachers in Asia. Fans claim that this short list has 90% of the power of a full English vocabulary, and can be taught in 40 hours. (I push back a little on this claim in my article Readability Equals Translatability.)
  • Simplified Technical English is a controlled dialect used by tech writers for content intended for international use.²³ Companies that offer STE products claim that it reduces ambiguity, is more readable to non-native speakers, and translates well even with automatic translation.

Last Thoughts

“Readability” is a factor in a larger equation that also includes the cultural and technical literacy required to understand what’s being said. And it involves more than just words and sentences. Every text, from a joke or recipe to a novel or encyclopedia, has genre, goals, structure, presentation, style, voice, and so on. Each of these is a factor in written communication. Flesch did not consider his quick formula to be the whole game, and neither should we. Making something “readable” does not end with short words and sentences. I discuss a more multi-dimensional approach in other articles.

For now, here’s one example. In that original 1948 paper, Flesch actually suggested two metrics: the Reading Ease Score described above, and a simple Human Interest Score. That score looks for personal words and sentences — things like personal pronouns, quoted speech, and sentence fragments, as well as questions, commands, or other speech addressed to the reader. Like his readability score, his interest score sacrificed a lot of detail in favor of a simple predictor that worked pretty well. In his 1951 book How to Test Readability⁴, Flesch said “As a matter of fact, I consider the Human Interest Score more important than the Reading Ease Score… Reading ease simplifies the job of reading; but human interest provides motivation — which is much more important.”
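As a rough sketch of how such a score can be computed: the coefficients below follow the commonly cited form of Flesch’s 1948 Human Interest formula, but my detection of “personal” words and sentences is a crude stand-in for the full definitions in his paper.

```python
import re

# Crude stand-ins for Flesch's lists of personal words and personal sentences.
PERSONAL_WORDS = {"i", "you", "we", "me", "us", "he", "she", "him", "her",
                  "my", "your", "our", "his", "hers", "folks", "people"}

def human_interest(text):
    """Approximate Flesch Human Interest score; higher means more personal."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    # Keep end punctuation with each sentence so we can spot questions.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    pw = 100 * sum(1 for w in words if w in PERSONAL_WORDS) / len(words)
    personal = [s for s in sentences if s.endswith("?") or '"' in s]
    ps = 100 * len(personal) / len(sentences)
    return 3.635 * pw + 0.314 * ps

# Tiny, highly personal samples can land above the nominal 0-100 range.
print(round(human_interest('Do you like it? "I do," she said. We left early.'), 1))
```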

Resources

Articles in This Series

  • A Cognitive Model of Reading - The science of how people read
  • Surface Readability - The value and limits of simple “short words & sentences” approaches to readability, and some interesting variations
  • A Deeper Readability - Techniques that go beyond surface-based approaches, based on cognitive science and other sources
  • Writing for Challenged Readers - About ESL & dyslexic readers, and what they said helps them most
  • Readability Equals Translatability - How the right approach to readability becomes a scalable approach to fast, consistent translation across multiple languages, how that works in a modular, single-source content management system, and whether language must be “dumbed-down” to achieve readability

Bibliography

[1] A New Readability Yardstick, Rudolf Flesch, 1948, Journal of Applied Psychology. You can read it for free in [2] below.

[2] The Classic Readability Studies, William H. DuBay, editor

[3] The Art of Readable Writing, Rudolf Flesch, 1949. This book contains a lot of Flesch’s original, extensive thoughts on readability. Twenty-first century readers are likely to find some of his examples and language archaic and offensive.

[4] How To Test Readability, Rudolf Flesch, 1951

[5] Dr. Rudolf Flesch, 75, Authority on Literacy, NY Times, Oct 7 1986

[6] The Cat in the Hat on Wikipedia

[7] Flesch-Kincaid Readability Tests on Wikipedia

[8] Readability Formulas on Readable.com

[9] Automated Readability Index on Wikipedia

[10] Grammarly.com

[11] AnalyzeMyWriting.com

[12] Acrolinx product site

[13] Lexile on Wikipedia

[14] The Hemingway Editor

[15] Beyond Surface Characteristics: A New Health Text-Specific Readability Measurement, Hyeoneui Kim, Sergey Goryachev, Graciela Rosemblat, Allen Browne, Alla Keselman, and Qing Zeng-Treitler, 2007

[16] Cosine Similarity on Wikipedia

[17] International Imitation Hemingway Competition on Wikipedia

[18] Bulwer-Lytton Fiction Contest website

[19] Domain-Specific Iterative Readability Computation, Jin Zhao and Min-Yen Kan, 2010

[20] Controlled natural languages on Wikipedia

[21] Plain English on Wikipedia

[22] Basic English on Wikipedia

[23] Simplified Technical English on Wikipedia
