Cloze reading and predictability: Language in psychosis
Researchers have begun using large language models (LLMs) such as ChatGPT to investigate psychosis, and the predictability of speech is an especially promising area of this research. In psychosis, a person’s speech is sometimes less predictable: someone might jump from topic to topic, or their speech might seem disconnected. A new sentence might be quite unpredictable given the previous ones. This less predictable speech is often a sign of formal thought disorder. Tools such as LLMs can measure the relation between words and sentences, and thus how predictable they are. In this blog post, we examine the concept of ‘predictability of speech’ in psychosis. It takes us to a man who taught himself a lot of nonsense in the late 19th century, to a graduate student who perfected an elegant procedure in the 1950s that is used to train modern-day programs like ChatGPT, and to the use of those types of programs to measure predictability today.
The story of cloze begins in the 1890s with Ebbinghaus, a German researcher now better known for conducting memory experiments on himself. To examine memory, he created more than 2,000 nonsense syllables such as “vec” and “gux”, and tested his own recall by memorizing them himself. More relevant here is his indirect contribution to modern LLMs. The school authorities of what was then called Breslau, now Wrocław in Poland, wanted to know whether it was wise to extend schooling beyond 5 hours a day. There were concerns about the effect that tiredness might have on the children’s mental ability. Ebbinghaus responded by conducting several tests to measure the mental abilities of these children.
Among these was a test of “omitting letters, syllables, words, or even phrases, from a prose passage and requiring the examinee to restore the passage”. While the method didn’t become well known for measuring schoolchildren’s abilities, it quickly caught on in the English-speaking world. In America, it became popular, in modified formats, as a test of intelligence. Some argued that it was better suited as a measure of “language ability”, but intelligence testing was the biggest game in town.
Fast forward half a century to the 1950s. A graduate student named Wilson L. Taylor, at the University of Illinois, rediscovered this procedure in the context of Gestalt psychology. Taylor was struck by the similarity between filling in the deleted words of a text and the Gestalt principle of ‘closure’, by which we visually perceive the full shape of an object even if parts of it are missing or hidden. He renamed the test the ‘cloze procedure’ after this principle of closure.
The following decade saw the cloze procedure find new applications. By the 1960s, it was being used to examine speech and writing in chronic psychosis, that is, schizophrenia. Two streams of research developed at this time. One tested the ability of participants with schizophrenia to ‘cloze’: to correctly fill in the masked words in an obscured text. The other tested their ability to produce language that ‘clozed’ well: if words in their speech and writing were masked, could others fill in the blanks correctly? Now we have arrived at ‘predictability’ in speech and writing!
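For readers who like to see the mechanics, the masking and scoring described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names are made up for this post, masking every fifth word is a common convention rather than the rule used in any particular study, and counting only exact matches is the simplest possible scoring choice.

```python
def make_cloze(text, every=5, blank="____"):
    """Mask every `every`-th word; return the masked text and the answer key."""
    masked, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % every == 0:
            masked.append(blank)
            answers.append(word)
        else:
            masked.append(word)
    return " ".join(masked), answers


def cloze_score(answers, guesses):
    """Fraction of blanks restored with exactly the original word (ignoring case)."""
    if not answers:
        return 0.0
    hits = sum(a.lower() == g.lower() for a, g in zip(answers, guesses))
    return hits / len(answers)


# Hypothetical sample sentence, invented for this illustration.
sample = "the cat sat on the mat and the dog sat on the rug"
masked, answers = make_cloze(sample, every=5)
print(masked)
# One reader's guesses for the two blanks: one right, one wrong.
print(cloze_score(answers, ["the", "lay"]))
```

The same two functions cover both research streams: give the masked text to a participant to test how well they ‘cloze’, or mask a participant’s own speech and give it to other readers to test how well it ‘clozes’.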
Some early research indicated that problems in both aspects were present in acute phases of psychosis. Although many studies of that era found large differences between subjects with psychosis and controls, interest in cloze unfortunately waned over time. The procedures were cumbersome, requiring tape recordings and the recruitment and training of judges to painstakingly rate the responses.
With the advent of large language models in the last two decades, the approach has become popular again, this time with neural networks trying to fill in the empty spaces in text. The models’ performance on “clozing” a sentence, and the mistakes they make, serve as feedback during training. By repeatedly “clozing” millions of words and sentences, the models learn the relations in human language and get better at the next-word prediction exercise. Once trained, a model can be used to calculate how closely the words a speaker produces match what the model predicted. This procedure has been used to calculate the predictability of speech produced by subjects with psychosis.
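To make the idea of scoring a speaker’s words against a model’s predictions concrete, here is a deliberately tiny sketch. It does not use an LLM: we swap in a simple word-pair (bigram) counting model with add-one smoothing, trained on four invented sentences, purely as a stand-in. The score is ‘surprisal’, the negative log probability of each word given the word before it, so higher values mean less predictable speech. All names and sentences below are our own assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each other word in the corpus."""
    follows = defaultdict(Counter)
    vocab = set()
    for sentence in corpus:
        words = sentence.lower().split()
        vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows, vocab


def surprisal(follows, vocab, prev, word):
    """Surprisal in bits of `word` after `prev`, using add-one smoothing."""
    counts = follows.get(prev, Counter())
    # The extra +1 reserves probability mass for words never seen in training.
    prob = (counts[word] + 1) / (sum(counts.values()) + len(vocab) + 1)
    return -math.log2(prob)


def mean_surprisal(follows, vocab, sentence):
    """Average surprisal over all adjacent word pairs in a sentence."""
    words = sentence.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        raise ValueError("need at least two words")
    return sum(surprisal(follows, vocab, p, w) for p, w in pairs) / len(pairs)


# Invented training sentences standing in for the text an LLM would learn from.
corpus = [
    "i went to the shop",
    "i went to the park",
    "she went to the shop",
    "we walked to the park",
]
follows, vocab = train_bigram(corpus)

predictable = mean_surprisal(follows, vocab, "i went to the shop")
jumbled = mean_surprisal(follows, vocab, "shop the i park went")
print(f"predictable: {predictable:.2f} bits, jumbled: {jumbled:.2f} bits")
```

The jumbled sentence scores higher because none of its word pairs were ever seen in training. A real LLM does the same kind of bookkeeping, but conditions on the whole preceding context rather than on a single word.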
LLM-based cloze procedures also address another problem: objectivity. A human rater might score each piece of text differently, or might become tired. An LLM, once trained, will give the same score each time it is given the same text. The workload is also reduced, since a computer can automatically check hundreds of pieces of writing or speech in seconds, with no more need for tape recorders or large groups of raters. In other words, using LLM scores, we can get quick, objective, speaker-related measurements of predictability.
We and other researchers think that LLMs will expand this line of work. Looking ahead, some areas of research using predictability are especially interesting.
- How does reduced predictability relate to thinking? Some scientists argue that using sentences is essential for conscious thinking. Does reduced predictability affect thought in patients, and does thought in turn affect predictability?
- A piece of age-old wisdom for clinical interviews is that clinicians need to talk with a person with acute psychosis for long enough to detect unpredictable speech, which signals symptoms such as thought disorder. In short conversations, a clinician might miss peculiarities in words and sentences. Can we demonstrate this with predictability measures? Does speech become less predictable with time in the presence of psychosis? Interestingly, the opposite is likely to happen when talking with someone not affected by psychosis; the more they speak, the more predictable their speech may appear.
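The second question, whether speech becomes more or less predictable as a conversation goes on, could be approached by averaging per-word predictability scores over successive stretches of speech and looking at the trend. The sketch below assumes we already have one surprisal value per word (higher means less predictable). The function names are hypothetical, and the two short lists are made-up placeholder numbers chosen to show the arithmetic, not data from any study.

```python
def window_means(values, size):
    """Mean of each consecutive, non-overlapping window of `size` values."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


def slope(ys):
    """Least-squares slope of ys against their position 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Placeholder per-word surprisal values, invented purely for illustration.
rising = [2.0, 2.0, 3.0, 3.0, 4.0, 4.0]   # speech getting less predictable
falling = [4.0, 4.0, 3.0, 3.0, 2.0, 2.0]  # speech getting more predictable

print(slope(window_means(rising, size=2)))
print(slope(window_means(falling, size=2)))
```

A positive slope would mean speech is becoming less predictable over the course of the conversation, a negative slope the opposite. In practice the window size would have to be far larger than two words, and choosing it is itself a research question.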
There are many other questions worth pursuing here. Researchers in the DISCOURSE research consortium, which investigates speech in people with psychosis, are working on applying cloze and LLM-based predictability measures in research. In the future, we hope that this research will help improve clinical care by measuring and predicting subtle changes in the predictability of speech.
By Alban Voppel, PhD