The microbes living on Earth are so plentiful as to be innumerable. Untold. Countless. Not in the hyperbolic sense, but the literal, gobsmacking sense. “It’s estimated there are 100 million times as many bacteria as there are stars in the universe,” says microbiologist Rob Knight, director of UC San Diego’s Center for Microbiome Innovation. “And we know almost nothing about most of them.”
To map the planet’s microbiome—to classify its sundry members and fathom their relationships—would be beyond ambitious. “It’s a crazy idea. Cataloguing the microbial diversity is an immense problem, you know, because there are approximately a trillion species on the planet,” says microbiologist Jack Gilbert, director of the University of Chicago’s Microbiome Center.
It’s funny to hear Knight and Gilbert talk this way. Because seven years ago, the two of them teamed up with microbiologist Janet Jansson, director of biological sciences at Pacific Northwest National Laboratory, to found the Earth Microbiome Project, a positively massive international effort devoted to—you guessed it—cataloguing the planet’s microbiome.
Today, Gilbert, Knight, Jansson, and a few hundred of their colleagues unveiled the inaugural version of that microbial map: the first reference database of bacteria colonizing the planet. To do it, they developed new protocols, analytical methods, and software for identifying and comparing microorganisms collected from every continent. All told, EMP collaborators collected 27,751 samples from organisms and environments around the world, including the human gut, a bird’s mouth, the soil of an Antarctic volcano, a river in Alaska, and the bottom of the Pacific Ocean. Published this week in Nature, the effort represents the work of upwards of 500 researchers from more than 160 institutions in 43 countries around the globe. It’s the most macro study of the microscopic world ever published.
“This article itself is a textbook of ecology,” says microbiologist Martin Blaser, director of New York University’s Human Microbiome Program, who was unaffiliated with the project. “Students in years to come will read it and say: Here’s where a lot of the rules originated—the rules of ecological relationships, the principles for how nature is organized.”
Those organizing principles are too numerous, and in most cases too nascent, to recount here. (The 27,751 samples collected for this meta-analysis appear in some 100 other studies, half of which have already been published in peer-reviewed journals.) But Knight sums it up: “What was really remarkable about our findings was that this was true across different types of environments, whether we’re talking about microbes on animals, or on plants, or in saline or on non-saline communities,” he says. “Even though the kinds of microbes in these environments are completely different, the ecological principles remain largely the same.” And now microbiologists have a tool to dig up even more of those dynamic principles.
But compiling the catalogue wasn’t easy. As in previous studies, the researchers classified samples of bugs by sequencing the 16S rRNA gene, which carries unique mutations that act like a bacterial barcode. Once researchers have the sequences for all the bugs in a particular sample, they compare them all to each other and cluster bacteria into groups based on their similarities. Because each group is defined relative to the other sequences in the analysis, their identities become interdependent.
That kind of interdependence is fine if you’re assessing the diversity of bacteria from a small set of samples, from a specific region, but it makes it difficult for researchers to compare bacteria between environments, or compare their observations to yours. “It really limits the ability to share information, or to accumulate information across studies,” says microbiologist Jon Sanders, a postdoc in Knight’s lab and coauthor on the Nature paper.
It’s especially a problem if you’re dealing with billions of sequences—which is exactly what the Earth Microbiome Project had. Its researchers sequenced the 16S rRNA genes not from the microbes in one sample, or even a few hundred—but all 27,751 of them. This yielded some 2.2 billion sequences.
To put that number in perspective, 10 years ago, Knight published what was, at the time, the most comprehensive analysis of the planet’s microbial makeup. He and co-author Catherine Lozupone combined 16S rRNA sequences from 111 studies, for a grand total of 21,752 sequences. In Knight’s words, the 2.2 billion sequences in this new paper represent a 100,000-fold expansion in our knowledge of the microbial world.
So with the help of some clever algorithms, the researchers classified the 2.2 billion sequences not by clustering them, but by trimming each one down to a stretch of genetic code 90 base pairs long—a completely independent identifier. When the researchers were through scrubbing their data, they had 307,572 unique microbial sequences, almost 90 percent of which were undocumented in existing 16S rRNA databases.
“I call it the true name of the bacteria,” Sanders says. In practice, a true name means not having to wonder if the microbe you found in a lake in Colorado is the same as the one your colleague found off the coast of San Diego five years ago. And with EMP’s catalogue, researchers can usually identify where a sample originated just by knowing the creatures living in it. “The key thing is, I can see the sequence now and it’s meaningful, and I can see it again in 20 years and it will still be meaningful,” Sanders says. “It gives us the ability to accumulate information about these bacteria across many, many studies going into the future.”
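For readers who want to see the idea concretely, here is a minimal Python sketch of what naming a microbe by its exact trimmed sequence could look like. It is an illustration under stated assumptions, not the project's actual pipeline: the function names and toy reads are hypothetical, only the 90-base-pair length comes from the article, and the error-correction ("scrubbing") step the researchers applied is skipped entirely.

```python
from collections import Counter

TRIM_LENGTH = 90  # base pairs; the length cited in the article


def exact_identifiers(reads, trim_length=TRIM_LENGTH):
    """Count reads by their first `trim_length` bases (hypothetical helper).

    Each trimmed sequence serves as its own identifier, so the result for
    one read never depends on which other reads are in the dataset.
    """
    counts = Counter()
    for read in reads:
        seq = read.strip().upper()
        if len(seq) < trim_length:
            continue  # too short to trim to the common length
        counts[seq[:trim_length]] += 1
    return counts


def shared_identifiers(study_a, study_b):
    """Return the identifiers found in both studies (hypothetical helper)."""
    return set(exact_identifiers(study_a)) & set(exact_identifiers(study_b))


# Toy reads, invented for illustration only.
read_1 = "ACGT" * 25           # 100 bases
read_2 = "ACGT" * 23 + "TTTT"  # 96 bases, identical to read_1 over the first 90
read_3 = "GGCC" * 25           # 100 bases, a different sequence
too_short = "ACGT" * 10        # 40 bases, dropped

lake = [read_1, read_2, too_short]
coast = [read_3, read_1.lower()]

print(len(exact_identifiers(lake)))          # distinct identifiers in the lake sample
print(len(shared_identifiers(lake, coast)))  # identifiers seen in both samples
```

Because the identifier is the sequence itself, two labs working years apart can compare results by simple set intersection, with no need to re-cluster their data together.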
That makes the EMP database more than a resource; it’s also a jumping-off point. “It opens the field toward more complicated types of analysis, and serves as a masterful example of how microbiologists and biologists can work together to address much bigger questions,” says Nikos Kyrpides, director of the Microbiome Data Science Group at the Department of Energy’s Joint Genome Institute, who was not affiliated with the study. “We need to confront the fact that we’re living on a microbial planet, that the magnitude of work we need to do is enormous, and we can only do it if we work collaboratively.”
To that end, Jansson, Knight, and Gilbert say they’re recruiting more contributors from around the world—to collect samples from a greater range of latitudes and elevations. To target fungi, in addition to bacteria. To sequence not just the 16S rRNA gene, but entire genomes. And to bridge EMP’s catalogue with other databases, like the American Gut dataset.
“There will be more to come,” says Jansson. After all, there are 100 million times as many bacteria as there are stars in the universe. That means there are innumerable billions—maybe trillions—of microbes left to meet.