November 3 @ 1:00 pm – 2:00 pm
This is an online event held via Zoom: https://uni-sydney.zoom.us/j/85114748391
Polygenic risk scores (PRS), defined as the weighted sum of risk alleles across a collection of genetic variants, have been actively developed in recent years for predicting complex traits. PRS can identify individuals at high disease risk, which can facilitate disease monitoring, prevention, and treatment.
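As a minimal illustration of the weighted-sum definition above, the sketch below scores one individual from risk-allele counts and per-variant weights. The function name, the toy dosages, and the weights are hypothetical and chosen only for illustration; in practice the weights would come from GWAS effect-size estimates.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts for a single individual.

    dosages: risk-allele count per variant (0, 1, or 2)
    weights: per-variant effect-size weights, in the same variant order
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical toy data: four variants, not real effect sizes.
weights = [0.10, -0.05, 0.20, 0.30]
dosages = [1, 1, 0, 2]

score = polygenic_risk_score(dosages, weights)
# 0.10*1 + (-0.05)*1 + 0.20*0 + 0.30*2 = 0.65
print(round(score, 2))
```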
Despite recent progress, two key challenges remain for PRS. First, most PRS have been developed in European populations, due to the dominance of large European cohorts in Genome-Wide Association Studies (GWAS). As a result, PRS often exhibit reduced accuracy in non-European populations. Second, GWAS phenotypes are often defined based on simplified criteria, typically grouping related ICD-10 diagnosis codes into binary disease definitions (case vs. control). This simplification can neglect the intricate phenotypic relationships embedded in detailed patient records, limiting both statistical power and clinical relevance.
In this talk, I will introduce two PRS frameworks developed to address these challenges. First, I present MIXPRS, an ensemble framework for multi-population PRS. MIXPRS integrates across methods and populations using only GWAS summary statistics, employing a data-fission strategy to generate pseudo-training and pseudo-tuning GWAS. By jointly analyzing diverse approaches, MIXPRS achieves robust and improved prediction performance across settings.
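To give a feel for what a data-fission split of summary statistics can look like, here is a minimal sketch of the generic Gaussian construction. It is not a description of the MIXPRS procedure. It assumes each effect estimate is approximately normal with a known standard error and treats variants independently, ignoring linkage disequilibrium. The function name `fission_split` and the parameter `tau` are hypothetical.

```python
import random


def fission_split(beta_hat, se, tau=1.0, rng=None):
    """Split each estimate into two noisy pieces using Gaussian data fission.

    For beta_hat ~ N(beta, se^2) and independent noise ~ N(0, se^2):
      train = beta_hat + tau * noise   has variance se^2 * (1 + tau^2)
      tune  = beta_hat - noise / tau   has variance se^2 * (1 + 1 / tau^2)
    and the two pieces are uncorrelated (independent under normality).
    """
    rng = rng or random.Random()
    train, tune = [], []
    for b, s in zip(beta_hat, se):
        noise = rng.gauss(0.0, s)
        train.append(b + tau * noise)
        tune.append(b - noise / tau)
    return train, tune


# Hypothetical toy summary statistics for three variants.
beta_hat = [0.12, -0.03, 0.07]
se = [0.02, 0.02, 0.03]

pseudo_train, pseudo_tune = fission_split(beta_hat, se, tau=0.5, rng=random.Random(0))
```

A smaller `tau` leaves more of the information in the pseudo-training piece and less in the pseudo-tuning piece. No information is lost in the split: `(train + tau**2 * tune) / (1 + tau**2)` recovers the original estimate exactly.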
Second, I present EEPRS, a framework that integrates trait-specific PRS with PRS based on electronic health record (EHR) embeddings. We first extract phenotype embeddings from EHR data using methods such as Word2Vec and GPT, then perform GWAS on these embeddings to derive embedding-based PRS. By combining the GWAS results from embedding phenotypes with trait-specific PRS, EEPRS captures latent disease relationships and improves prediction accuracy.
Leqi Xu is a PhD candidate in Biostatistics at Yale University, advised by Dr Hongyu Zhao. Her research focuses on developing statistical and computational methods for polygenic risk prediction and the integrative analysis of electronic health records and single-cell data.