CRISPR has a problem: an embarrassment of riches.
Since the gene editing system became famous, scientists have been looking for variants with better precision and accuracy.
One search method screens for genes related to CRISPR-Cas9 in the DNA of bacteria and other creatures. Another artificially evolves CRISPR components in the laboratory to give them better therapeutic properties, such as greater stability, safety and efficiency in the human body.
This data is stored in databases containing billions of genetic sequences. While there may be exotic CRISPR systems hidden in these libraries, there are simply too many entries to sift through.
This month, a team from MIT and Harvard led by CRISPR pioneer Dr. Feng Zhang took inspiration from an existing big data approach and used AI to narrow the sea of genetic sequences to a handful that resemble known CRISPR systems.
The AI searched open-source databases containing genomes of unusual bacteria, including those found in breweries, coal mines, cold Antarctic coasts and (no joke) dog saliva.
In just a few weeks, the algorithm identified thousands of potential new biological “parts” that could form 188 new CRISPR-based systems, including some that are exceedingly rare.
Several new candidates stood out. For example, some could focus more precisely on the target gene for editing with fewer side effects. Other variations are not directly useful, but could provide insight into how some existing CRISPR systems work, for example systems that target RNA, the ‘messenger’ molecule that directs cells to build proteins from DNA.
“Biodiversity is such a treasure trove,” said Zhang. “By doing this analysis we can kill two birds with one stone: we both study biology and may also find useful things,” he added.
A wild hunt
Although CRISPR is known for its gene-editing capabilities in humans, scientists first discovered the system in bacteria, where it fights viral infections.
Scientists have long been collecting bacterial samples from nooks and crannies around the world. Thanks to increasingly affordable and efficient DNA sequencing, the genetic blueprint of many of these samples – some from unexpected sources such as pond scum – has been mapped and stored in databases.
Zhang is no stranger to the hunt for new CRISPR systems. “A few years ago we started asking ourselves, ‘What is there besides CRISPR, and are there other RNA-programmable systems in nature?’” Zhang told MIT News earlier this year.
CRISPR consists of two components. One is a ‘bloodhound’ guide RNA sequence, usually about 20 bases long, that targets a particular gene. The other is the scissor-like Cas protein. Once in a cell, the bloodhound finds the target and the scissors cut the gene. More recent versions of the system, such as base editing or prime editing, use different types of Cas proteins to perform single-letter DNA swaps or even edit RNA targets.
In 2021, Zhang’s lab traced the origins of the CRISPR family tree, identifying an entirely new family lineage. These systems, called OMEGA, use foreign guide RNAs and protein scissors, but they can still easily cut DNA in human cells grown in petri dishes.
More recently, the team expanded their search to a new branch of life: eukaryotes. Members of this group – including plants, animals and humans – have their DNA tightly wrapped inside a nut-like structure, the nucleus. Bacteria, on the other hand, lack this structure. By screening fungi, algae and mussels (yes, biodiversity is weird and wonderful), the team found proteins they call Fanzors that can be reprogrammed to edit human DNA – initial evidence that a CRISPR-like mechanism also exists in eukaryotes.
But the goal isn’t just to find shiny new gene editors. Rather, it is to harness nature’s ability to edit genes and build a collection of gene editors, each with its own strengths, that can treat genetic disorders and help us understand the inner workings of our bodies.
Collectively, scientists have discovered six major CRISPR systems: for example, some work together with different Cas enzymes, while others specialize in DNA or RNA.
“Nature is amazing. There is so much diversity,” said Zhang. “There are probably more RNA-programmable systems, and we continue to explore and will hopefully discover more.”
That is why the team built the new AI, called FLSHclust. They adapted technology that analyzes mind-bogglingly large data sets – such as software that highlights similarities in large quantities of document, audio or image files – into a tool to hunt for genes linked to CRISPR.
Once complete, the algorithm analyzed gene sequences from bacteria and collected them into groups – a bit like clustering colors in a rainbow, where similar colors are grouped together so it’s easier to find the shade you’re looking for. From here, the team delved into genes associated with CRISPR.
The algorithm searched multiple open-source databases, including hundreds of thousands of bacterial and archaeal genomes and millions of mysterious DNA sequences. In total, it scanned billions of protein-coding genes and grouped them into about 500 million clusters. Here, the team identified 188 genes that no one has yet associated with CRISPR and that could form thousands of new CRISPR systems.
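To make the clustering idea concrete, here is a minimal sketch in Python. The article does not describe FLSHclust's internals, so this swaps in a textbook technique of the same general kind used for spotting similar documents: MinHash signatures combined with locality-sensitive hashing. The function names, parameter values and toy protein sequences are all made up for illustration and are not taken from the team's software.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-letter substrings of a sequence."""
    found = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return found or {seq}  # very short sequences fall back to the whole string


def stable_hash(text, seed):
    """Deterministic 64-bit hash of a string, salted with an integer seed."""
    digest = hashlib.blake2b(text.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, num_hashes=32, k=3):
    """MinHash: for each seeded hash function, keep the smallest k-mer hash."""
    shingles = kmers(seq, k)
    return [min(stable_hash(s, seed) for s in shingles)
            for seed in range(num_hashes)]


def lsh_clusters(seqs, num_hashes=32, bands=16, k=3):
    """Group sequences whose signatures agree on at least one band."""
    rows = num_hashes // bands
    parent = {name: name for name in seqs}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Sequences that share a (band index, band values) key land in one bucket.
    buckets = defaultdict(list)
    for name, seq in seqs.items():
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(name)

    # Merge everything that shares a bucket into a single cluster.
    for members in buckets.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    clusters = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return sorted(sorted(c) for c in clusters.values())


# Toy input (invented): two near-identical proteins and one unrelated one.
toy = {
    "cas_like_a": "MKRNYILGLDIGITSVGYGIIDYETRDVIDAG",
    "cas_like_b": "MKRNYILGLDIGITSVGYGIIDYETRDVIDAA",
    "unrelated": "PQWLHCFSTNEVKMAQPWHLCF",
}
print(lsh_clusters(toy))
```

The appeal of this kind of approach is that similar sequences are very likely to collide in at least one bucket, so the software never has to compare every sequence against every other one. That matters when the input runs to billions of genes.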
Two systems, derived from microbes in animal guts and the Black Sea, used a 32-base guide RNA instead of the usual 20 used in CRISPR-Cas9. As with a search query, the longer the guide, the more specific the match. These longer RNA ‘queries’ suggest the systems could have fewer side effects. Another system resembles an earlier CRISPR-based diagnostic system called SHERLOCK, which can quickly detect a single DNA or RNA molecule from an infectious invader.
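A rough back-of-the-envelope calculation shows why guide length matters. The Python sketch below counts how many distinct sequences a guide of a given length can spell with four RNA letters, then estimates how often an exact match would turn up by pure chance. It rests on loud simplifying assumptions that are not from the study: a genome of about 3 billion bases treated as a random string, one strand only, and exact matches only. Real genomes are not random and real off-target cuts involve partial matches, so treat the numbers as intuition rather than prediction.

```python
def possible_guides(length, alphabet_size=4):
    """Number of distinct sequences of this length over four letters."""
    return alphabet_size ** length


def expected_chance_matches(length, genome_size=3_000_000_000):
    """Expected exact matches by chance in a random genome (assumed size)."""
    return genome_size / possible_guides(length)


for n in (20, 32):
    print(n, possible_guides(n), expected_chance_matches(n))
```

Under these assumptions, going from 20 to 32 bases multiplies the number of possible guides by 4 to the power of 12, or about 16.8 million, so a chance exact match becomes correspondingly less likely.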
When tested in cultured human cells, both systems were able to cut a single strand of the targeted gene and insert small genetic sequences with an efficiency of about 13 percent. It doesn’t sound like much, but it is a baseline that can be improved.
The team also discovered genes for a previously unknown CRISPR system that targets RNA. This version was spotted only after careful scrutiny. It appears that it, and any others yet to be discovered, cannot be easily captured by sampling bacteria around the world and are thus extremely rare in nature.
“Some of these microbial systems were found exclusively in coal mine water,” said study author Dr. Soumya Kannan. “If someone hadn’t been interested in that, we might never have seen those systems.”
It is still too early to know whether these systems can be used in human gene editing. For example, those that randomly chop up DNA would be useless for therapeutic purposes. However, the AI can mine a vast universe of genetic data to find potential ‘unicorn’ gene sequences and is now available to other scientists for further research.
Image credits: NIH