How All of Us Data Reveals Scientific Insights
A recent genomic analysis published in PLOS Genetics illustrates how large, diverse population cohorts can generate rare disease insights that are often difficult to obtain through disease-specific registries alone. Drawing on genomic data from more than 13,000 New York City participants enrolled in the National Institutes of Health’s All of Us Research Program (All of Us), the study shows how population-scale resources can be used to identify pathogenic variants, founder populations, and ancestry-specific genetic risks that remain poorly documented in many communities.
All of Us is building one of the world’s most robust health databases. With more than 873,000 participants enrolled, including pediatric participants, and over 414,000 whole genome sequences released to researchers, it is helping scientists understand health across populations and across generations.
The work in this published study was led by Dr. Srilakshmi M. Raj, Associate Professor at Albert Einstein College of Medicine, in collaboration with investigators at Montefiore-Einstein and the New York Center for Rare Diseases. Montefiore-Einstein is a NORD Rare Disease Center of Excellence, and the study reflects its broader commitment to rare disease research that is grounded in the needs of its local patient population.
Rare disease insights from a general population cohort
Rare diseases are often investigated through disease-specific registries, case series, or sequencing efforts. These approaches remain essential, but they are often constrained by limited sample sizes and uneven representation of diverse populations – gaps that All of Us was specifically designed to address. Additionally, with over 10,000 known rare diseases, this disease-specific approach can be difficult to scale.
In this study, the investigators took a complementary approach. By analyzing patterns of identity-by-descent across New York City participants in All of Us, they identified seven founder populations, or groups of individuals who share elevated genetic relatedness due to historical bottlenecks or shared ancestry. Within these populations, the team detected 201 pathogenic or likely pathogenic variants that were significantly enriched, including 22 variants that had not previously been recognized as founder alleles.
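To make the idea of an "enriched" variant concrete, here is a minimal sketch of one common way to test whether carriers of a variant are over-represented in a founder group relative to the rest of a cohort. It uses a one-sided Fisher's exact test, which is an assumption chosen for illustration; the article does not say which statistical test the study used. The function name, its parameters, and the toy counts are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(carriers_in_group, group_size, carriers_total, cohort_size):
    """One-sided Fisher's exact test (upper tail of the hypergeometric distribution).

    Returns the probability of seeing at least `carriers_in_group` carriers
    among `group_size` people if the `carriers_total` carriers were scattered
    at random across a cohort of `cohort_size` people.
    """
    non_carriers_total = cohort_size - carriers_total
    max_possible = min(group_size, carriers_total)
    # Number of ways to draw any group of this size from the cohort.
    all_groups = comb(cohort_size, group_size)
    # Number of ways to draw a group containing k carriers, summed over
    # every k at least as extreme as the observed count.
    extreme_groups = sum(
        comb(carriers_total, k) * comb(non_carriers_total, group_size - k)
        for k in range(carriers_in_group, max_possible + 1)
    )
    return extreme_groups / all_groups


# Toy numbers for illustration only (not study data): a cohort of 10 people,
# a founder group of 5, and 3 carriers who all fall inside the group.
toy_p = enrichment_p_value(carriers_in_group=3, group_size=5,
                           carriers_total=3, cohort_size=10)
```

A small p-value suggests the variant is more common in the group than chance alone would explain. Because a real analysis tests many variants across several groups, it would also need a multiple-testing correction before any variant is called significantly enriched.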
These findings emerged because the analysis began with a broad, community-based cohort rather than a disease-specific registry. As Dr. Raj explained, the choice of dataset was closely tied to the mission of Montefiore-Einstein as a health system serving the Bronx: “Montefiore-Einstein is one of only a few academic medical centers that serves as the primary health system for an entire county. When I joined the faculty, I wanted to orient my research toward questions that prioritize the residents of the Bronx.”
She noted that, in the absence of a local population biobank, All of Us offered a way to study the genetics of the community at scale: “All of Us provided the best avenue for us to work with our community to do research that benefited our community.”
Undocumented genetic risks in under-represented communities
A central finding of the study was that several enriched pathogenic variants were concentrated in Caribbean-ancestry founder populations, including Puerto Rican and Garifuna groups. Many of these risks have been under-characterized in existing genomic reference datasets, reflecting broader gaps in the representation of diverse populations in genetic research.
The analysis also demonstrated that self-reported race and ethnicity did not reliably predict rare variant burden in this highly admixed urban population. Instead, population-scale genomic data made it possible to identify shared ancestry and founder effects that cut across conventional demographic categories.
Community engagement was a core component of the work. Dr. Raj emphasized that community leaders from the identified populations were involved as collaborators: “We invited community leaders to participate as co-authors on our manuscript and worked closely with leaders from each of the groups we identified.”
From All of Us data to community impact
The New York City findings served as a foundation for broader analyses. Dr. Raj’s team has since expanded this work nationwide using All of Us data, identifying population-specific genetic architecture and rare disease risks across dozens of groups in the United States. They have also conducted large-scale admixture mapping studies that highlight how genetic risk varies across heterogeneous populations, such as Latinos from different geographic and ancestral backgrounds.
Importantly, this research is being translated back into care. Building on findings of elevated cardiomyopathy risk in the Garifuna population, the team received an NHLBI-sponsored BuildUP Trust Challenge Prize to support community-engaged genetic screening and education. Working with trusted community leaders, they are developing approaches to improve awareness and access to genetic services in an under-studied population with a strong presence in the Bronx.
Implications for rare disease research
This study underscores that population-based cohorts are not a substitute for rare disease registries, but they are an increasingly important complement. When combined with careful analysis, community partnership, and clinical expertise, general population datasets can reveal rare disease risks that would otherwise remain invisible, particularly in communities that have historically been overlooked.
For NORD Rare Disease Centers of Excellence, including Montefiore-Einstein, this work highlights the value of integrating population-scale genomic data into a broader rare disease research and care ecosystem — one that supports discovery, improves diagnostic equity, and remains grounded in the communities it serves.
Dr. Raj is a featured speaker at the 2026 NORD Rare Disease Scientific Symposium, happening April 14-15, where she and other experts from the NORD Rare Disease Centers of Excellence will review their rare disease research findings and share lessons learned for other researchers.
Accessing All of Us data for rare disease research
All of Us data is already enabling new research on rare conditions. Scientists are using the program’s genomic and health history data to uncover rare gene-disease links more reliably than before. Researchers are studying neurofibromatosis, cystic fibrosis, autoimmune diseases, Ehlers-Danlos syndrome, sickle cell disease, and many other conditions.
Researchers interested in conducting similar analyses can apply for access to the All of Us Researcher Workbench at researchallofus.org. The Workbench provides cloud-based tools for analyzing genomic data, electronic health records, physical measurements, and survey responses from a cohort that continues to grow.
This blog post is funded by the Division of Engagement and Outreach, All of Us Research Program, National Institutes of Health. Pyxis Partners Award Number: 1OT2OD038104-01


