The Race to Develop Artificial Intelligence That Can Identify Every Species on the Planet
Scientists are building machine-learning-powered software that can recognize a species based solely on a cellphone picture
For about two decades, a team of scientists has been studying and categorizing a special type of mushroom: Hebelomas. But identifying this genus of fungus is extremely difficult. As Pete Bartlett, one of the veteran researchers on the Hebeloma Project, explains, these diminutive fungi have a reputation as "little brown mushrooms" that you probably should avoid eating. Often found in woodlands, different species of Hebeloma—there are more than 100 known species, and they've been found in more than 50 countries—can look remarkably similar.
Now, though, the team is working on a new strategy that involves artificial intelligence. After collecting about 10,000 samples of these troublesome mushrooms, the researchers have built an algorithm that can make an informed guess about which specific species of Hebeloma they’re analyzing. The idea, Bartlett says, is that any “enthusiastic mycologist” with a microscope can take measurements of a mushroom, enter those measurements on the website and receive a ranked list of what species that mushroom is likely to be. These mushrooms, Bartlett says, often interest people because they’re extremely common—and can possibly promote the growth of trees.
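The ranked-list idea can be sketched with a toy nearest-neighbor scorer: compare a sample's measurements against reference measurements for each species and sort by distance. The species names here are real Hebeloma species, but all measurements and feature choices are illustrative assumptions, not the project's actual data or method.

```python
import math

# Hypothetical reference table: representative measurements per species.
# Columns: spore length (um), cap width (mm), stem length (mm) -- illustrative only.
REFERENCE = {
    "Hebeloma crustuliniforme": [9.5, 55.0, 28.0],
    "Hebeloma mesophaeum":      [10.2, 30.0, 40.0],
    "Hebeloma sinapizans":      [12.0, 90.0, 70.0],
}

def rank_species(measurements):
    """Return species ranked by Euclidean distance to the sample (closest first)."""
    scored = [(species, math.dist(measurements, ref))
              for species, ref in REFERENCE.items()]
    return sorted(scored, key=lambda pair: pair[1])

sample = [9.8, 52.0, 30.0]  # a hypothetical field measurement
for species, dist in rank_species(sample):
    print(f"{species}: distance {dist:.1f}")
```

A real system would normalize each measurement's scale and learn weights from the 10,000 collected samples, but the output shape is the same: a list of candidate species ordered by how well they fit.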
“You can tell an A.I. something about what species you’ve classified and how you did it,” explains Bartlett, who serves as the project’s website manager and also has a background in mathematics. “It should be able to get to the same species as well.”
This technology is only meant to be used by a relatively small group—Bartlett estimates that about 100 people, a mix of fairly intense citizen scientists and professionals, come to the website every month. But the prospect of using artificial intelligence, including machine learning, to identify species of animals, plants and fungi is catching on in wider and wider circles.
Back in 2011, researchers at the Smithsonian Institution, along with those at several universities, developed an app called LeafSnap, which aimed to identify plants based on the shape of their leaves. The system used computer vision to automatically extract a leaf's shape and compare it against a pre-established data set of plant leaves. Today, like the researchers behind the Hebeloma Project, computer scientists and others have been looking at using machine learning algorithms trained on databases of images to identify species.
A Microsoft program called Wild Me developed a platform based on crowdsourced images to identify species, from giraffes to sharks to seals—the goal is to use species identification software to better monitor animal populations and fight the risk of extinction. The British nonprofit Conservation AI is working with Nvidia, a leading supplier of A.I. hardware and software, on a similar project that tracks rare species in real time so that conservationists can protect them.
Google, along with researchers at the Australian Acoustic Observatory at the Queensland University of Technology, recently announced that it’s using A.I. to identify birds based on recordings of their calls. One model the team developed identifies the sound of the glossy black cockatoo, for example.
One of the most popular species identification tools is iNaturalist, a social network for citizen scientists that’s supported by the Gordon and Betty Moore Foundation, as well as other major donors and small-dollar donations. Since its creation in 2008, the iNaturalist app has logged more than 145 million observations from around the world.
With that humongous data set, iNaturalist has since developed an algorithm that uses computer vision to identify a species, based on previous images that have been fed into its system. The iNaturalist system also includes a geospatial model that factors in the location where an observation is made. Users upload pictures of plants and animals, and the A.I. then suggests which species they may be looking at.
“It’s really a combination of those three probabilities, [including] what does it look like for the computer vision model [and] what’s expected based on where you are,” says Scott Loarie, the co-director of iNaturalist. “It’s interesting to think about how those different pieces of evidence or different pieces of prediction come together to really give you high confidence on what it is.”
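One simple way to combine evidence like this is to multiply a vision model's score for each species by a location prior (how plausible that species is where the photo was taken) and renormalize. This is a minimal sketch of that idea; the scores are made up, and iNaturalist's actual weighting scheme may differ.

```python
def combine(vision_scores, location_prior):
    """Multiply each vision score by its location prior, then renormalize to sum to 1."""
    raw = {sp: vision_scores[sp] * location_prior.get(sp, 0.0) for sp in vision_scores}
    total = sum(raw.values())
    return {sp: v / total for sp, v in raw.items()} if total else raw

# Illustrative numbers: the vision model is unsure, but the location rules one species out.
vision = {"gray fox": 0.6, "coyote": 0.3, "arctic fox": 0.1}
prior  = {"gray fox": 0.5, "coyote": 0.5, "arctic fox": 0.0}  # arctic fox never seen here

print(combine(vision, prior))
```

The effect is what Loarie describes: a species that looks plausible but is never observed in that location gets pushed down, and confidence concentrates on candidates supported by both kinds of evidence.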
The system now includes about 76,000 identifiable species, Loarie says. While the platform's data includes observations of about 500,000 species overall—it's estimated there are more than 2 million species worldwide—the model can only be trained to identify a species when there have been at least 100 observations collected. The system, Loarie explains, values accuracy over precision, so it may be more "confident" about the genus than the species.
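That 100-observation threshold amounts to a simple filtering step before training: count how many observations each species has and keep only those over the cutoff. A minimal sketch, with invented species names and counts:

```python
from collections import Counter

# Hypothetical observation log: one species label per observation record.
observations = ["monarch butterfly"] * 150 + ["viceroy butterfly"] * 40

counts = Counter(observations)
trainable = {species for species, n in counts.items() if n >= 100}
print(trainable)  # only species with at least 100 observations make the training set
```

Species below the cutoff aren't discarded from the data; they just can't yet be identified by the model, which is one reason the system is more reliable for common species than rare ones.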
The iNaturalist algorithm isn’t the only system built with the platform’s data. Google Lens, a Google image recognition technology, is also partially trained on iNaturalist data in order to recognize images of species.
This data is used for more than just research. Deep Learning Analytics, an algorithms startup that was acquired by General Dynamics Mission Systems in 2019, also made extensive use of iNaturalist data as part of a contract with the U.S. Department of Defense. The idea was to build an app, called BioThreat ID, for the military to identify invasive species like vipers and inedible fungi, according to documents obtained through a public records request.
“You can imagine you have a special operator out in the field on a mission and they are alone. They don’t have any contact with any kind of command structure. They don’t want to give away their position,” says Jeremy Trammell, a senior data scientist at Deep Learning Analytics who worked on the project with help from Virginia Tech professor Jim Egenrieder. “Maybe they’re running low on supplies, and they’re asking the question: Can I eat this?”
Today, the app is functional, but General Dynamics Mission Systems hasn’t made it widely available to download.
As Bartlett explains, building these models doesn’t necessarily require a big data set or collaborating with a Big Tech company. In the case of the Hebeloma Project, the researchers bought an off-the-shelf algorithm, adjusted it based on the measurements they wanted to consider, loaded the data they’ve previously collected into the software and got to work.
“You don’t have to be Google or somebody that actually has an enormous farm of computers learning the entire internet,” explains Bartlett. “This was learning a relatively small amount of data—10,000 mushrooms, 15 measurements in each—and you run it on one computer.”
Like other forms of A.I., however, these tools have limitations. In the case of the Hebeloma Project, the team is looking for ways to go beyond the measurements it currently processes and incorporate image recognition, too. Even with that technology, the supply of images for different species can limit the ability of A.I. to actually identify what it’s analyzing. The iNaturalist system is far more accurate for more commonly sighted species and far less so for rare ones.
Still, the hope is that with better training, this technology will continue to improve. For instance, Loarie, from iNaturalist, says he’s looking forward to large language models eventually being able to provide more context about the species someone is looking at.
“We think the techniques will become more general, acting on images, data from sequences and morphological data all simultaneously to provide determinations,” Bartlett adds. “Also, A.I. techniques will likely be used to determine boundaries between species themselves, rather than requiring humans to do that initial groundwork.”