
AlphaGenome reshapes how scientists interpret mutations

DATE POSTED: June 26, 2025

Google DeepMind has introduced AlphaGenome, a new artificial intelligence tool that predicts how DNA sequence variations affect gene regulation. It is now available via API for non-commercial research.

The genome functions as the cellular instruction manual, containing the complete set of DNA that directs an organism’s appearance, function, growth, and reproduction. Small variations within this DNA sequence can alter how an organism responds to its environment or how susceptible it is to disease. Deciphering how these instructions are read at the molecular level, and what happens when a small DNA variation occurs, remains a significant challenge in biology.

AlphaGenome is an AI tool designed to predict, more comprehensively and accurately than earlier models, how single variants or mutations in human DNA sequences influence a broad range of biological processes that regulate genes. This is made possible, among other factors, by technical advances that let the model process long DNA sequences and output high-resolution predictions. AlphaGenome is currently accessible in preview through the AlphaGenome API for non-commercial research, with a full release planned for the future.

The AlphaGenome model accepts DNA sequences up to 1 million base pairs in length as input. It then predicts thousands of molecular properties that characterize the sequence’s regulatory activity. The tool can also score the effects of genetic variants by comparing predictions generated from mutated sequences with those from unmutated sequences. Predicted properties include where genes start and end in different cell types and tissues, where they are spliced, how much RNA is produced, and which DNA bases are accessible, close to one another, or bound by certain proteins.
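
The mutated-versus-unmutated comparison can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the AlphaGenome model or its API: `toy_track` is a made-up stand-in predictor (windowed GC fraction), and the names `apply_snv` and `score_variant` are hypothetical.

```python
# Toy illustration only: `toy_track` is a made-up stand-in, NOT AlphaGenome.

def apply_snv(sequence, position, ref, alt):
    """Return a copy of `sequence` with one base substituted (0-based position)."""
    if sequence[position] != ref:
        raise ValueError(
            f"expected {ref!r} at {position}, found {sequence[position]!r}"
        )
    return sequence[:position] + alt + sequence[position + 1:]


def toy_track(sequence, window=5):
    """Hypothetical per-base signal: GC fraction in a window centred on each base."""
    half = window // 2
    track = []
    for i in range(len(sequence)):
        chunk = sequence[max(0, i - half): i + half + 1]
        track.append(sum(base in "GC" for base in chunk) / len(chunk))
    return track


def score_variant(sequence, position, ref, alt):
    """Per-base difference: prediction for mutated minus prediction for unmutated."""
    ref_track = toy_track(sequence)
    alt_track = toy_track(apply_snv(sequence, position, ref, alt))
    return [a - r for a, r in zip(alt_track, ref_track)]


reference = "ATATATATATAT"
delta = score_variant(reference, position=6, ref="A", alt="G")
```

In this toy case `delta` is nonzero only at the five positions whose window covers the substituted base. A real model would produce many such tracks per sequence, one for each predicted property.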

Training data for AlphaGenome originated from large public consortia, including ENCODE, GTEx, 4D Nucleome, and FANTOM5. These consortia experimentally measured properties covering important modalities of gene regulation across hundreds of human and mouse cell types and tissues.

The AlphaGenome architecture first uses convolutional layers to detect short patterns within the genome sequence. Transformer layers then communicate information across all positions in the sequence. A final series of layers converts the detected patterns into predictions for different modalities. During training, the computation for a single sequence is distributed across multiple interconnected Tensor Processing Units (TPUs).
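
To make the convolutional step concrete, here is a minimal sketch of how a single filter can detect a short pattern in a one-hot encoded sequence. It covers only the pattern-detection idea; the transformer and output layers are not modeled. The filter is hand-written for the 4-mer TATA rather than learned, and all names (`one_hot`, `conv1d`, `tata_filter`) are hypothetical, not taken from AlphaGenome.

```python
# Toy sketch: one hand-written filter, not learned weights and not AlphaGenome code.
BASES = "ACGT"


def one_hot(sequence):
    """Encode each base as a 4-element 0/1 vector in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence]


def conv1d(encoded, kernel):
    """Slide `kernel` (width x 4) along the sequence with no padding."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores


# Hypothetical filter that responds most strongly to the 4-mer "TATA".
tata_filter = one_hot("TATA")

motif_scores = conv1d(one_hot("GGCTATAAGG"), tata_filter)
best_start = max(range(len(motif_scores)), key=motif_scores.__getitem__)
```

Here the filter reaches its maximum score of 4.0 at offset 3, where TATA begins. A trained network learns many such filters from data instead of having them written by hand.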

This model builds upon Enformer, a prior genomics model, and complements AlphaMissense, which specializes in categorizing variant effects within protein-coding regions. Protein-coding regions constitute 2% of the genome, while the remaining 98%, known as non-coding regions, are critical for orchestrating gene activity and contain numerous disease-linked variants. AlphaGenome provides a new perspective for interpreting these extensive sequences and the variants located within them.

AlphaGenome offers several distinctive features compared to existing DNA sequence models. It analyzes up to 1 million DNA letters and produces predictions at the resolution of individual letters. This long sequence context is important for covering distant gene-regulating regions, while base resolution is important for capturing fine-grained biological details. Previous models had to trade off sequence length against resolution, which limited the range of modalities they could jointly model and predict accurately.
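
The scale of that trade-off is easy to work out. The sketch below counts prediction positions per output track for a 1-million-letter input at single-letter resolution and, for contrast, at a coarser bin width. The 128-letter bin is an illustrative assumption, not a figure from this article, and `output_positions` is a hypothetical helper.

```python
def output_positions(input_length_bp, bin_width_bp):
    """Prediction positions per track; a partial trailing bin is dropped."""
    return input_length_bp // bin_width_bp


one_megabase = 1_000_000

# One prediction per DNA letter, as described for AlphaGenome.
base_resolution = output_positions(one_megabase, 1)

# Hypothetical coarser model that averages over 128-letter bins.
binned = output_positions(one_megabase, 128)
```

At single-letter resolution each track holds 1,000,000 values, against 7,812 for the binned case, which is why long context and fine resolution together are costly.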

Technical advancements within AlphaGenome address this limitation without significantly increasing training resources. Training a single AlphaGenome model without distillation took four hours and required half the compute budget used for the original Enformer model. By enabling high-resolution prediction for long input sequences, AlphaGenome can predict a more diverse range of modalities than earlier models, providing scientists with more comprehensive information about the complex steps of gene regulation.

In addition to predicting a diverse range of molecular properties, AlphaGenome can score the impact of a genetic variant on all these properties within a second. It accomplishes this by contrasting predictions from mutated sequences with those from unmutated ones, and summarizing that contrast with a different approach for each modality. For the first time, a model can explicitly predict the location and expression level of splice junctions directly from sequence. This offers insights into the consequences of genetic variants on RNA splicing, a process in which parts of the RNA molecule are removed and the remaining ends are rejoined. Splicing errors are relevant to rare genetic diseases such as spinal muscular atrophy and certain forms of cystic fibrosis.
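
The article does not say which summaries AlphaGenome uses for each modality, so the following is only a sketch of the general idea, with made-up numbers. It assumes two hypothetical summaries: a log fold change of total signal for an RNA-like track and the largest per-base change for an accessibility-like track.

```python
import math


def log_fold_change(ref_track, alt_track, pseudocount=1e-3):
    """Hypothetical summary for an RNA-like track: log2 ratio of total signal."""
    return math.log2(
        (sum(alt_track) + pseudocount) / (sum(ref_track) + pseudocount)
    )


def max_abs_difference(ref_track, alt_track):
    """Hypothetical summary for a local track: largest per-base change."""
    return max(abs(a - r) for a, r in zip(alt_track, ref_track))


# Assumed mapping from modality name to summary function (illustrative only).
SUMMARIZERS = {"rna": log_fold_change, "accessibility": max_abs_difference}


def summarize(ref_by_modality, alt_by_modality):
    """Collapse each modality's ref/alt tracks into one variant score."""
    return {
        name: SUMMARIZERS[name](ref_by_modality[name], alt_by_modality[name])
        for name in ref_by_modality
    }


# Made-up predictions for the unmutated (ref) and mutated (alt) sequence.
ref_predictions = {"rna": [1.0, 1.0, 1.0, 1.0], "accessibility": [0.1, 0.2, 0.1, 0.1]}
alt_predictions = {"rna": [2.0, 2.0, 2.0, 2.0], "accessibility": [0.1, 0.7, 0.1, 0.1]}
variant_scores = summarize(ref_predictions, alt_predictions)
```

With these inputs the RNA-like score is close to 1 (a doubling of total signal) and the accessibility-like score is about 0.5. Reducing each track to one number is what makes variants comparable across many modalities.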

AlphaGenome achieves state-of-the-art performance across a wide range of genomic prediction benchmarks. These benchmarks include predicting which parts of the DNA molecule will be in close proximity, whether a genetic variant will increase or decrease gene expression, and whether it will alter a gene’s splicing pattern. In producing predictions for single DNA sequences, AlphaGenome outperformed the best external models in 22 out of 24 evaluations. For predicting the regulatory effect of a variant, it matched or exceeded the top-performing external models in 24 out of 26 evaluations. These comparisons included models specialized for individual tasks. AlphaGenome was the only model capable of jointly predicting all assessed modalities, demonstrating its generality.

AlphaGenome’s generality allows scientists to simultaneously explore a variant’s impact on multiple modalities with a single API call. This facilitates more rapid hypothesis generation and testing, eliminating the need for multiple models to investigate different modalities. AlphaGenome’s strong performance indicates it has learned a general representation of DNA sequence in the context of gene regulation, providing a foundation for the wider community. Upon full release, scientists will be able to adapt and fine-tune the model on their own datasets to address specific research questions. This approach offers a flexible and scalable architecture for the future, with potential for extended capabilities, better performance, coverage of more species, or additional modalities through expanded training data.

Dr. Caleb Lareau, from Memorial Sloan Kettering Cancer Center, stated, “It’s a milestone for the field. For the first time, we have a single model that unifies long-range context, base-level precision and state-of-the-art performance across a whole spectrum of genomic tasks.”

AlphaGenome’s predictive capabilities could aid several research areas. In disease understanding, it could help researchers pinpoint potential disease causes and interpret the functional impact of variants linked to traits, potentially uncovering new therapeutic targets. Its developers consider the model especially suitable for studying rare variants with potentially large effects, such as those causing rare Mendelian disorders. In synthetic biology, its predictions could guide the design of synthetic DNA with specific regulatory functions, for example, activating a gene only in nerve cells but not muscle cells.

In fundamental research, it could accelerate genome understanding by helping to map crucial functional elements and define their roles, identifying the DNA instructions most essential for regulating a specific cell type’s function.

As an example, AlphaGenome was used to investigate the potential mechanism of a cancer-associated mutation. In an existing study of T-cell acute lymphoblastic leukemia (T-ALL) patients, researchers observed mutations at specific genomic locations. AlphaGenome predicted that these mutations would activate a nearby gene called TAL1 by introducing a MYB DNA binding motif. This replicated the known disease mechanism and highlighted AlphaGenome’s ability to link specific non-coding variants to disease genes.

Professor Marc Mansour, from University College London, commented, “AlphaGenome will be a powerful tool for the field. Determining the relevance of different non-coding variants can be extremely challenging, particularly to do at scale. This tool will provide a crucial piece of the puzzle, allowing us to make better connections to understand diseases like cancer.”

AlphaGenome has current limitations. Accurately capturing the influence of very distant regulatory elements, those over 100,000 DNA letters away, remains a challenge, similar to other sequence-based models. A priority for future work involves increasing the model’s ability to capture cell- and tissue-specific patterns. AlphaGenome has not been designed or validated for personal genome prediction. While it can predict molecular outcomes, it does not provide a complete understanding of how genetic variations lead to complex traits or diseases, which often involve broader biological processes like developmental and environmental factors beyond the model’s direct scope. Efforts are ongoing to improve the models and gather feedback to address these gaps.

AlphaGenome is available for non-commercial use via the AlphaGenome API. Its predictions are intended solely for research use and have not been designed or validated for direct clinical purposes. Researchers globally are invited to communicate potential use-cases for AlphaGenome and to pose questions or provide feedback through the community forum.
