Can Artificial Intelligence Detect Depression in a Person’s Voice?
MIT scientists have trained an AI model to spot the condition through how people speak rather than what they tell a doctor
Diagnosing depression is a tricky business.
There’s no blood test, no scan, no biopsy to provide hard evidence of something gone awry. Instead, the full weight rests on the skill of a trained clinician to make an evaluation based largely on a person’s responses to a series of standard questions. Diagnosis is further complicated by the fact that depression can manifest in multiple ways—from apathy to agitation to extreme eating or sleeping patterns.
So, the notion that artificial intelligence could help predict if a person is suffering from depression is potentially a big step forward—albeit one that brings with it questions about how it might be used.
What makes that possible, says Tuka Alhanai, a researcher at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), is the ability of a machine learning model to identify speech and language patterns associated with depression. More importantly, the model she and fellow MIT scientist Mohammad Ghassemi developed was able to recognize depression with a relatively high degree of accuracy through analyzing how people speak, rather than their specific responses to a clinician’s questions.
It’s what Alhanai refers to as “context-free” analysis; in other words, the model takes its cues from the words people choose and how they say them, without trying to interpret the meaning of their statements.
“Instead of telling the model to focus on answers to particular questions, it’s programmed to figure out on its own what it wants to focus on,” she says.
The potential benefit, Alhanai notes, is that this type of neural network approach could one day be used to evaluate a person’s more natural conversations outside a formal, structured interview with a clinician. That could be helpful in encouraging people to seek professional help when they otherwise might not, due to cost, distance or simply a lack of awareness that something’s wrong.
“If you want to deploy models in a scalable way,” she says, “you want to minimize the amount of constraints you have on the data you’re using. You want to deploy it in any regular conversation and have the model pick up, from the natural interaction, the state of the individual.”
Spotting patterns
The model focused on audio, video and transcripts from 142 interviews of patients, about 30 percent of whom had been diagnosed with depression by clinicians. Specifically, it used a technique called sequence modeling, in which sequences of text and audio data from both depressed and non-depressed people were fed into the model. From that, different speech patterns emerged for people with and without depression. For instance, words such as “sad,” “low” or “down” might tend to be paired with voice signals that are flatter and more monotone.
But it was up to the model to determine which patterns were consistent with depression. Then it applied what it learned to predict which new subjects were depressed. Ultimately, it achieved a 77 percent success rate in identifying depression.
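For the technically curious, the sketch below shows how such a multimodal sequence model might be assembled in PyTorch. It is illustrative only: the architecture, feature types and dimensions here are assumptions made for the sake of the example, not the CSAIL team’s actual implementation.

import torch
import torch.nn as nn

class DepressionSequenceModel(nn.Module):
    """Illustrative multimodal sequence model (an assumption, not the CSAIL code)."""

    def __init__(self, text_dim=300, audio_dim=40, hidden_dim=128):
        super().__init__()
        # One LSTM per modality; the model learns on its own which
        # stretches of a conversation matter, with no question labels.
        self.text_lstm = nn.LSTM(text_dim, hidden_dim, batch_first=True)
        self.audio_lstm = nn.LSTM(audio_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def forward(self, text_seq, audio_seq):
        # text_seq:  (batch, n_text_segments, text_dim), e.g. word embeddings
        # audio_seq: (batch, n_audio_segments, audio_dim), e.g. pitch/energy features
        _, (text_h, _) = self.text_lstm(text_seq)
        _, (audio_h, _) = self.audio_lstm(audio_seq)
        # Fuse the final summary of each modality and score it.
        fused = torch.cat([text_h[-1], audio_h[-1]], dim=-1)
        return torch.sigmoid(self.classifier(fused))  # probability of depression

model = DepressionSequenceModel()
text = torch.randn(8, 7, 300)    # a batch of 8 subjects, 7 text sequences each
audio = torch.randn(8, 30, 40)   # audio sequences for the same subjects
print(model(text, audio).shape)  # torch.Size([8, 1])

In a working system, the inputs would come from transcript segments and acoustic measurements rather than random tensors, and the model’s output would be checked against clinicians’ diagnoses during training.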
The researchers also found that the model needed considerably more data to predict depression solely from how a voice sounded, as opposed to what words a person used. When it focused exclusively on text, the model needed to analyze an average of only seven sequences to predict depression. But when using only voice audio, it required 30 sequences. That suggests the words a person chooses are a better predictor of depression than how they sound.
Algorithmic overreach?
It’s still far too soon to say how an AI model might be incorporated into depression diagnosis. “It’s a step towards being able to analyze more free-form interactions, but it’s only an initial step,” says James Glass, a senior research scientist at CSAIL. He notes that the test sample was “tiny.” He also says the researchers will want to better understand which specific patterns in all the raw data the model identified as indicative of depression.
“These systems are more believable when you have an explanation for what they’re picking up,” he says.
That’s important because the whole idea of using AI in diagnosing mental health conditions has been met with its share of skepticism. It’s already being used in therapy chatbots, such as Woebot, but being involved in actual diagnosis would take the role of machines to another level.
Canadian doctor Adam Hofmann, writing recently in the Washington Post, warned of the possible consequences of what he referred to as “algorithmic overreach.”
“Could false positives, for example, lead people who are not yet depressed to believe they are?” he wrote. “One’s mental health is a complex interplay of genetic, physical and environmental factors. We know of the placebo and nocebo effects in medicine, when blind users of sugar pills experience either the positive or negative effects of a medicine because they have either the positive or negative expectations of it.
“Being told you are unwell might literally make it so.”
Hofmann also raised concerns about how long the conclusions of such AI diagnostic tools could be kept from outside third parties, such as insurers or employers. That anxiety about potential abuse through “depression detectors” was likewise cited in a recent blog post on The Next Web.
Alhanai and Glass have heard the apprehensive speculation about the risks of relying too much on AI models for mental health diagnosis. But they say that their research is geared to helping clinicians, not replacing them.
“We’re hopeful we can provide a complementary form of analysis,” Glass says. “The patient isn’t with the doctor all the time. But if the patient is speaking at home into their phone, maybe recording a daily diary, and the machine detects a change, it may signal to the patient that they should contact the doctor.
“We don’t view the technology making decisions instead of the clinician,” he adds. “We view it as providing another input metric to the clinician. They would still have access to all the current inputs they use. This would just be giving them another tool in their toolbox.”