The Center for Health Security and Phenotyping (CHSP)

The Center for Health Security and Phenotyping (CHSP) at the School of Biomedical Informatics focuses on harmonizing methodologies in computer science, applied mathematics, biostatistics, chemistry, and pharmacology to facilitate and speed up biomedical data research and discovery. Led by Xiaoqian Jiang, PhD, the CHSP consists of faculty, staff, programmers, and graduate students. Its strength in combining secure and privacy-preserving solutions with advanced machine learning models to meet the emerging needs of healthcare makes CHSP unique in exploring massive and sensitive data across different modalities and sources. Below is a sample of the projects we are currently conducting.

Miran Kim (Assistant Professor)

  1. Efficient multi-key homomorphic encryption with packed ciphertexts
    • We will present practical multi-key variants of homomorphic encryption schemes with packed ciphertexts, which can be used in a wide range of applications in secure computation between multiple data owners. We will apply this technology to secure neural networks, where the input data and the pre-trained model are encrypted under different keys.
  2. Secure and differentially private machine learning for distributed data
    • We will develop secure and privacy-preserving machine learning frameworks by harmonizing homomorphic encryption and differential privacy techniques. This secure technology would protect computation on sensitive data from distributed sources as well as the outcomes of data analysis.
  3. Development of secure genotype-phenotype association models with efficient correction for population stratification
    • We will propose a novel framework to develop secure genotype-phenotype association models with efficient correction for population stratification based on the application of homomorphic encryption.
  4. Practical applications for securely outsourced genomic data analysis
    • This project will develop secure technology for using patients’ genomic data in clinical applications while ensuring patient information security and privacy during computation. We will develop homomorphically encrypted genome query algorithms to support secure storage and analysis of human genome data.
  5. Secure outsourced genotype imputation using homomorphic encryption
    • This project will provide a secure framework for genotype imputation in genome-wide association studies (GWAS) based on homomorphic encryption. The model can securely estimate the genotypes of missing variants from encrypted genotype data.
  6. On-chip private computation of deep neural networks for face recognition
    • We will develop on-chip computations with homomorphic encryption for face recognition. Once we can train homomorphic-encryption-friendly neural network models for detection, we will implement a secure evaluation phase that applies the trained models to encrypted data.
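
As a rough illustration of what computing on encrypted data means, the sketch below implements a toy version of the Paillier cryptosystem, a classic additively homomorphic scheme. It is not the multi-key or packed-ciphertext scheme described in the projects above. The tiny fixed primes and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are assumptions chosen for illustration only and offer no real security.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with small fixed primes (insecure, demo only)."""
    n = p * q
    # Carmichael function of n: lcm(p - 1, q - 1)
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    # With generator g = n + 1, the decryption constant is lam^{-1} mod n
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m * n, so no modular exponentiation is needed for g^m
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


# Two parties encrypt their values; a third party adds them without decrypting.
n, private_key = keygen()
total = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
print(decrypt(n, private_key, total))  # 42
```

Only the key holder can read the sum. The party that performs the addition sees ciphertexts only, which is the property that secure outsourced analysis builds on.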

Yejin Kim (Assistant Professor)

Computational phenotyping:
Method developments:

  • Federated TF: Phenotyping methods based on tensor factorization (TF) need a large number of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals when direct patient-level data sharing is not possible due to institutional privacy policies. We developed a novel solution that enables federated TF for computational phenotyping without sharing patient-level data. Our method can help derive useful phenotypes from EHRs while overcoming policy barriers that stem from privacy concerns.
  • Supervised TF: We developed a novel TF method for generating discriminative phenotypes. An important characteristic of a phenotype is that it discriminates a clinical outcome of interest, such as mortality, readmission, or cost. To discriminate a high-risk group (high mortality), we incorporated the estimated probability of mortality from logistic regression into the decomposition process.
  • Similarity-aware TF: We developed a novel TF method for generating distinct phenotypes. Phenotypes should be distinct from each other; otherwise clinicians cannot interpret and use them easily.
  • Multi-modal TF: We developed a multi-modal TF method that incorporates additional data modalities (for example, adding demographic data to diagnosis and medication history).
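
To make the idea behind tensor factorization (TF) concrete, here is a minimal sketch that extracts a single rank-1 component from a three-way tensor using alternating updates (the higher-order power method). It is a generic textbook procedure, not any of the federated, supervised, similarity-aware, or multi-modal methods listed above. The function name `rank1_cp` and the reading of the three modes as patients, diagnoses, and medications are assumptions for illustration.

```python
import math


def _normalize(v):
    """Return v scaled to unit length, together with its original norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v), 0.0
    return [x / norm for x in v], norm


def rank1_cp(tensor, iters=50):
    """Approximate tensor[i][j][k] by weight * a[i] * b[j] * c[k].

    Hypothetical reading of the modes: i = patients, j = diagnoses,
    k = medications. Each factor is updated in turn with the others fixed.
    """
    n_i, n_j, n_k = len(tensor), len(tensor[0]), len(tensor[0][0])
    a, b, c = [1.0] * n_i, [1.0] * n_j, [1.0] * n_k
    weight = 0.0
    for _step in range(iters):
        a, _norm = _normalize([
            sum(tensor[i][j][k] * b[j] * c[k] for j in range(n_j) for k in range(n_k))
            for i in range(n_i)
        ])
        b, _norm = _normalize([
            sum(tensor[i][j][k] * a[i] * c[k] for i in range(n_i) for k in range(n_k))
            for j in range(n_j)
        ])
        # The norm of the last update is the scale of the rank-1 component
        c, weight = _normalize([
            sum(tensor[i][j][k] * a[i] * b[j] for i in range(n_i) for j in range(n_j))
            for k in range(n_k)
        ])
    return weight, a, b, c
```

In a phenotyping setting, the large entries of `b` and `c` would indicate which diagnoses and medications co-occur in one candidate phenotype, and `a` would indicate which patients express it. Practical methods extract many components at once and add constraints such as non-negativity.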


  • Intensive Care Unit (ICU) phenotypes: Using MIMIC-III, a large publicly available dataset from critical care units, we derived representative ICU phenotypes: sepsis with acute kidney injury, cardiac surgery, anemia, respiratory failure, heart failure, cardiac arrest, metastatic cancer (requiring ICU), end-stage dementia (requiring ICU and transitioned to comfort care), intra-abdominal conditions, and alcohol abuse/withdrawal.
  • Combinatorial drug repositioning: Using a Cerner Health Facts dataset (which contains ~50M unique patients from 600 Cerner client hospitals), we derived combinations of drugs that are effective in preventing Alzheimer’s disease.

Reinforcement Learning for optimal treatment and diagnosis

  • A personalized medication regimen for symptom management of Parkinson’s disease (PD): We derived medication regimens that are personalized to individual PD patients to reduce motor function impairment. Our model iteratively suggests the medication option that minimizes the expected decline in motor function.
  • A personalized test procedure for differential diagnosis: We proposed a novel decision process framework that detects the target disease by iteratively applying tests and reducing the ambiguity of disease diagnoses. It is based on a partially observable Markov decision process (MDP) in which multiple tests can be performed simultaneously in partially observed environments. We developed solution schemes for the proposed decision process using integer programming, which allows practical constraints to be incorporated. We applied the model to derive a dynamic immunohistochemistry (IHC) staining test procedure that can detect lymphoid neoplasms with high accuracy while minimizing testing burden (i.e., time and cost).
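
The core step in any partially observed decision process for diagnosis is updating the belief over candidate diseases after each test result. The sketch below shows that Bayesian update only. It does not reproduce the integer programming solution scheme described above, and the function name `update_belief`, the disease labels, and the probabilities are hypothetical values chosen for illustration.

```python
def update_belief(prior, p_positive, result):
    """Bayesian belief update after observing one test result.

    prior:      dict mapping disease -> current probability
    p_positive: dict mapping disease -> P(test is positive | disease)
    result:     True if the test came back positive, False otherwise
    """
    unnormalized = {
        disease: prob * (p_positive[disease] if result else 1.0 - p_positive[disease])
        for disease, prob in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed result is impossible under the current belief")
    return {disease: value / total for disease, value in unnormalized.items()}


# Hypothetical example: two candidate diagnoses and one stain result.
belief = {"disease_A": 0.5, "disease_B": 0.5}
stain = {"disease_A": 0.9, "disease_B": 0.1}
belief = update_belief(belief, stain, result=True)
print(belief)
```

A test-selection policy would call this update after every result and choose the next test (or stop) based on how concentrated the belief has become, trading diagnostic certainty against testing time and cost.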

Prediction or real-time detection of outcome

  • Real-time detection of the end of EEG suppression: Using real-time EEG signal data, we detected the end of EEG suppression after a seizure, which enables automatic monitoring of a patient’s status with minimal human supervision.

Shaghayegh Agah (Postdoc)

Shaghayegh Agah joined SBMI in January 2019 as a Postdoctoral Research Fellow after receiving her PhD in Chemical Engineering from Rice University. Her research projects include:

  1. Systematic analysis on the long-term effects of co-administered drugs on cancer incidence/recurrence:
    By performing novel machine learning and statistical data analysis on several large biomedical databases, we will identify the long-term effect of frequently co-administered drugs on cancer etiology. We will systematically adjust for selection biases including age, ethnicity, gender, genomics, and risk factors associated with targeted cancers to develop a computational framework and identify drug co-administration patterns that lead to reduced cancer occurrence/recurrence rates. We will not only identify associations between cancer and combined medications but also investigate their hidden causal relationships through additional data analysis steps.
  2. In silico study of cellular signaling pathways of drug combinations to predict their preventative (or carcinogenic) effect on cancer development:
    Frequent drug combinations (at a population level) containing potential cancer-protective drug components and chemotherapy drugs will be used to study their signaling pathways using computational or in silico design. The cancer-protective drugs will be extracted from the literature, including case/control and biological experimental studies. Harmonizing public knowledge bases, we will construct the gene regulatory network of targeted cancers. We will look at gene interactions and drug-drug interactions to understand whether the signaling pathways activated or deactivated by certain drug combinations contribute to cancer prevention.
  3. Ligand-based chemical virtual screening to promote drug discovery:
    Optimized computational methods can help reduce the huge cost of drug discovery, which can reach a few billion dollars for one new drug. We investigate the application of machine learning algorithms in ligand-based drug discovery methods. More specifically, we identify targets for certain cancers, namely proteins or genes that are present in cancer pathways and whose activities can be altered by certain drugs. Afterwards, we predict the activity of different compounds present in drug activity databases against the identified targets.

Luyao Chen (Scientific programmer)

Luyao Chen is an experienced programmer with 10+ years of experience (including 8 years at Oracle). He is one of the backbones of our center and supports various collaborative projects, including but not limited to:

  • Speeding up big data analysis and querying by optimizing the Greenplum distributed database and the Neo4j graph database
  • Developing and maintaining a service for propensity score matching, used for cohort selection in most of our studies
  • Drug repurposing: using Cerner data to find drugs or drug combinations for cancer treatment (brain cancer, breast cancer, pancreatic cancer, etc.)
  • Sepsis2 Cerner/SBMI competition: serving as the main technical support for the national competition
  • Developing novel harmonizing algorithms for 48 years of national birth data
  • Preparing data for cross-site diagnosis code embedding
  • Various other collaborative projects within and outside SBMI
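
For readers unfamiliar with propensity score matching, which the list above mentions for cohort selection, the following is a minimal sketch of greedy one-to-one nearest-neighbor matching with a caliper. It assumes the propensity scores have already been estimated (for example, by logistic regression). The function name `match_nearest`, the caliper value, and the patient identifiers are hypothetical, and the center's actual service may work differently.

```python
def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on precomputed propensity scores.

    treated, controls: dicts mapping patient id -> propensity score in [0, 1]
    caliper:           maximum allowed score difference for a valid match
    Returns a list of (treated_id, control_id) pairs; each control is used once.
    """
    available = dict(controls)
    pairs = []
    # Process treated patients in order of increasing score
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


treated = {"t1": 0.30, "t2": 0.70, "t3": 0.95}
controls = {"c1": 0.32, "c2": 0.69, "c3": 0.10}
print(match_nearest(treated, controls))  # [('t1', 'c1'), ('t2', 'c2')]
```

Here `t3` stays unmatched because no remaining control falls within the caliper. Dropping such patients shrinks the cohort but keeps the matched groups comparable.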
