Researchers at the Children’s Hospital of Philadelphia (CHOP) and the University of Pennsylvania’s Perelman School of Medicine have employed a deep learning algorithm to identify potential mutations in the noncoding regions of DNA that could increase disease risk. The research findings, published in the American Journal of Human Genetics, lay the groundwork for future detection of disease-associated variants across various common diseases.
Only a small fraction of the human genome codes for proteins; more than 98% does not. Disease-related variants in these noncoding regions frequently play a role in controlling protein expression, a layer of control known as the “regulatory code.” Genome-wide association studies (GWAS) have contributed significantly to clarifying the clinical significance of many noncoding variants.
Pinpointing the specific disease-causing variants within the broad regions identified by GWAS remains a challenge. Such variants are often located around motifs where transcription factors, specialized proteins that regulate gene expression, bind to DNA. These proteins leave a “footprint” when they bind, which researchers can trace to locate precise binding sites.
“This situation is comparable to a police lineup,” explained senior study author Dr. Struan F.A. Grant from CHOP. “You’re looking at similar suspects together, so it can be challenging to know who the actual culprit is. With the approach we used in this study, we’re able to pinpoint the disease-causing variant through identification of this so-called footprint.”
Using the ATAC-seq sequencing method and the PRINT algorithm, the researchers examined data from 170 human liver samples, identifying 809 footprint quantitative trait loci associated with DNA-protein interactions. These analyses allowed researchers to determine the strength of transcription factor binding at various sites based on the mutations present.
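As a rough illustration of what a footprint quantitative trait locus test measures, the sketch below regresses a per-sample footprint score on the number of copies of a variant allele (0, 1, or 2). This is not the PRINT algorithm or the study's actual pipeline: the data are simulated, the function names `ols_slope` and `simulate` are hypothetical, and the effect size and noise level are arbitrary assumptions. Only the sample count of 170 comes from the article.

```python
import random
from statistics import mean


def ols_slope(dosages, scores):
    """Least-squares slope and intercept of footprint score on allele dosage."""
    x_bar, y_bar = mean(dosages), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, scores))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def simulate(n_samples=170, true_effect=0.3, seed=0):
    """Simulated data (assumption): each allele copy shifts the score by true_effect."""
    rng = random.Random(seed)
    dosages = [rng.choice([0, 1, 2]) for _ in range(n_samples)]
    scores = [1.0 + true_effect * d + rng.gauss(0, 0.1) for d in dosages]
    return dosages, scores


dosages, scores = simulate()
slope, intercept = ols_slope(dosages, scores)
print(f"estimated change in footprint score per allele copy: {slope:.3f}")
```

A slope clearly different from zero would suggest that the variant changes how strongly a transcription factor binds at that site. A real analysis would also need significance testing, covariates, and correction for the many variants tested.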
“This approach helps resolve some fundamental issues we have encountered in the past when trying to determine which noncoding variants may be driving disease,” noted Max Dudek, a PhD student involved in the study. “With larger sample sizes, we believe that pinpointing these causal variants could ultimately inform the design of novel treatments for common diseases.”
Funding for this study came from several sources, including the National Science Foundation Graduate Research Fellowship Program and various National Institutes of Health grants.
The research titled “Characterization of non-coding variants associated with transcription factor binding through ATAC-seq-defined footprint QTLs in liver” was published online on April 17, 2025.


