Our group is interested in using big data to advance precision medicine and health. Our research bridges bioinformatics and statistical genetics in the context of translational research. Most of us are affiliated with the School of Biomedical Informatics at the University of Texas Health Science Center at Houston.
The two directions we are currently most excited about are:
1. Analysis of big genotyped cohorts
Modern biobanks include genotypes of up to 0.1%-1% of an entire large population. At this scale, genetic relatedness among samples is unavoidable and ubiquitous, yet current methods cannot uncover it efficiently. We developed RaPID[https://github.com/ZhiGroup/RaPID], a method for detecting Identical-by-Descent (IBD) segments, a primary embodiment of genetic relatedness. RaPID detects all IBD segments longer than a given length in time linear in the sample size. In simulations, we showed that RaPID is orders of magnitude faster than existing methods, while offering higher power and accuracy and sharper IBD segment boundaries.
2. Analysis of electronic health records (EHRs) using deep learning
We have access to multiple EHR databases covering over 50 million patients. We develop deep learning methods to uncover the logic of medical practice and to help improve the efficiency of clinical care. One recent project predicts the onset risk of heart failure from EHR data.
RaPID: Ultra-fast detection of Identical-by-Descent (IBD) segments. Development and evaluation of methods and software for detecting IBD segments, a primary embodiment of genetic relatedness, in very large genotyped cohorts.
HapSeq-Rare: Calling and Phasing of Rare Variants. Development and evaluation of methods and software for calling and phasing of rare variants from WGS data using haplotype information in reads.
HapSeq2: Our method for genotype calling and phasing from WGS data.
msBayes: Statistical Quantification of Methylation Levels by Next-generation Sequencing.
RaPID: Random Projection-based IBD Detection.
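The RaPID entry above expands to Random Projection-based IBD Detection. To make the random-projection idea concrete, here is a minimal Python sketch on a toy panel of phased 0/1 haplotypes. It is a simplified illustration under our own assumptions, not RaPID's algorithm or code. The function names and the parameters `window`, `min_windows`, `runs` and `min_hits` are hypothetical. Each run keeps one randomly chosen site per window of consecutive sites. It then finds haplotype pairs whose projected sequences agree over at least `min_windows` consecutive windows. A pair is reported only if it is found in at least `min_hits` runs. Because only one site per window is sampled, an isolated genotyping error breaks a match only in runs where that site happens to be picked.

```python
import random
from collections import defaultdict
from itertools import combinations


def project(haplotypes, window, rng):
    """Hypothetical helper: keep one randomly chosen site per block of `window` sites.

    The same sites are used for every haplotype, so projected sequences stay comparable.
    """
    n_sites = len(haplotypes[0])
    chosen = [rng.randrange(start, min(start + window, n_sites))
              for start in range(0, n_sites, window)]
    return [tuple(h[i] for i in chosen) for h in haplotypes]


def matching_pairs(projected, min_windows):
    """Hypothetical helper: pairs that agree on >= min_windows consecutive projected sites."""
    pairs = set()
    n_proj = len(projected[0])
    for start in range(n_proj - min_windows + 1):
        # Group haplotypes by the exact projected substring starting at this window.
        groups = defaultdict(list)
        for idx, seq in enumerate(projected):
            groups[seq[start:start + min_windows]].append(idx)
        # Every pair inside a group shares a long enough exact match (indices come out sorted).
        for members in groups.values():
            pairs.update(combinations(members, 2))
    return pairs


def random_projection_ibd(haplotypes, window=4, min_windows=20, runs=10,
                          min_hits=7, seed=0):
    """Hypothetical driver: report pairs found in at least `min_hits` of `runs` projections."""
    rng = random.Random(seed)
    hits = defaultdict(int)
    for _ in range(runs):
        for pair in matching_pairs(project(haplotypes, window, rng), min_windows):
            hits[pair] += 1
    return {pair for pair, count in hits.items() if count >= min_hits}


if __name__ == "__main__":
    rng = random.Random(1)
    base = [rng.randrange(2) for _ in range(200)]
    flipped = [1 - a for a in base]   # disagrees with `base` at every site
    one_error = list(base)
    one_error[0] = 1 - one_error[0]   # a single mismatching site at position 0
    panel = [base, list(base), flipped, one_error]
    print(sorted(random_projection_ibd(panel)))  # [(0, 1), (0, 3), (1, 3)]
```

The demo panel is built so the answer does not depend on which sites are sampled. The single mismatch sits at site 0, so even when it is sampled the remaining 49 projected windows still exceed `min_windows`. The sketch groups exact substrings with a dictionary at every start position and does not report segment boundaries. It therefore does not achieve the linear-time scaling described above; a production tool would need a dedicated index for the matching step.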