The Center for Secure Artificial intelligence For hEalthcare (SAFE) at the School of Biomedical Informatics focuses on harmonizing methodologies in computer science, applied mathematics, biostatistics, chemistry, and pharmacology to facilitate and speed up biomedical data research and discovery. Led by Xiaoqian Jiang, PhD, SAFE consists of faculty, staff, programmers, and graduate students. Its strength in combining secure and privacy-preserving solutions with advanced machine learning models to meet the emerging needs of healthcare makes SAFE unique in exploring massive and sensitive data across different modalities and sources. Below is a list of sample projects that we are currently conducting.
Miran Kim (Assistant Professor)
Efficient multi-key homomorphic encryption with packed ciphertexts
- We will present practical multi-key variants of homomorphic encryption schemes with packed ciphertexts, which can be used in a wide range of applications in secure computation between multiple data owners. We will apply this technology to secure neural networks, where the input data and the pre-trained model are encrypted under different keys.
Secure and differentially private machine learning for distributed data
- We will develop secure and privacy-preserving machine learning frameworks by harmonizing homomorphic encryption and differential privacy techniques. This secure technology would protect computation on sensitive data from distributed sources as well as the outcomes of data analysis.
Development of secure genotype-phenotype association models with efficient correction for population stratification
- We will propose a novel framework, based on homomorphic encryption, to develop secure genotype-phenotype association models with efficient correction for population stratification.
Practical applications for securely outsourced genomic data analysis
- This project will develop secure technology for using patients’ genomic data in clinical applications while ensuring patient information security and privacy during computation. We will develop homomorphically encrypted genome query algorithms to support secure storage and analysis of human genome data.
Secure outsourced genotype imputation using homomorphic encryption
- This project will provide a secure framework for genotype imputation in genome-wide association studies (GWAS) based on homomorphic encryption. The model can securely estimate the genotypes of missing variants from encrypted genotype data.
On-chip private computation of deep neural networks for face recognition
- We will develop on-chip computations with homomorphic encryption for face recognition. Once we have trained homomorphic-encryption-friendly neural network models for detection, we will implement a secure evaluation phase that runs the trained models on encrypted data.
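To illustrate what "computing on encrypted data" means in the projects above, here is a minimal sketch using the textbook Paillier cryptosystem, which is additively homomorphic. This is an assumption made for illustration only: it is not the multi-key, packed-ciphertext scheme these projects describe, the function names are hypothetical, and the tiny primes offer no security.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key generation (illustrative primes; far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid when the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n (generator g = n + 1)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the private key (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
total = decrypt(n, priv, c_sum)  # the sum was computed without decrypting the inputs
```

The party that calls `add_encrypted` never sees 20 or 22; only the key holder can decrypt the result. The lattice-based schemes used for neural networks and GWAS also support multiplication and pack many values into one ciphertext, which this sketch does not attempt.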
Yejin Kim (Assistant Professor)
- Federated TF: Phenotyping methods based on tensor factorization (TF) need a large number of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals when direct sharing of patient-level data is not possible due to institutional privacy policies. We developed a novel solution that enables federated TF for computational phenotyping without sharing patient-level data. Our method can help derive useful phenotypes from EHRs while overcoming policy barriers that arise from privacy concerns.
- Supervised TF: We developed a novel TF method for generating discriminative phenotypes. One important characteristic of a phenotype is that it discriminates with respect to a clinical outcome of interest, such as mortality, readmission, or cost. To discriminate a high-risk group (high mortality), we incorporated the estimated probability of mortality from logistic regression into the decomposition process.
- Similarity-aware TF: We developed a novel TF method for generating distinct phenotypes. Phenotypes should be distinct from each other; otherwise, clinicians cannot easily interpret and use them.
- Multi-modal TF: We developed a multi-modal TF method that incorporates data from other modalities (for example, combining demographic data with diagnosis and medication history).
- Intensive Care Unit (ICU) phenotypes: Using MIMIC-III, a large publicly available dataset from critical care units, we derived representative ICU phenotypes: sepsis with acute kidney injury, cardiac surgery, anemia, respiratory failure, heart failure, cardiac arrest, metastatic cancer (requiring ICU), end-stage dementia (requiring ICU and transitioned to comfort care), intra-abdominal conditions, and alcohol abuse/withdrawal.
- Combinatorial drug repositioning: Using a Cerner Health Facts dataset (which contains ~50M unique patients from 600 Cerner client hospitals), we derived combinations of drugs that are effective in preventing Alzheimer’s disease.
Reinforcement Learning for optimal treatment and diagnosis
- A personalized medication regimen for symptom management of Parkinson’s disease (PD): We derived medication regimens that are personalized to individual PD patients to reduce motor function impairment. Our model iteratively suggests the medication option that minimizes the expected decline in motor function.
- A personalized test procedure for differential diagnosis: We proposed a novel decision process framework that detects the target disease by iteratively applying tests and reducing the ambiguity of disease diagnoses. It is based on a partially observable Markov decision process (MDP) in which multiple tests can be performed simultaneously in partially observed environments. We developed solution schemes for the proposed decision process using integer programming to incorporate practical constraints. We applied the model to derive a dynamic immunohistochemistry (IHC) staining test procedure that can detect lymphoid neoplasms with high accuracy while minimizing testing burden (i.e., time and cost).
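The idea of "reducing the ambiguity of disease diagnoses" with each test can be sketched as a Bayesian belief update, which is the standard state-estimation step in a partially observable MDP. The following is a minimal sketch under stated assumptions: the disease names, prior, and test likelihoods are invented for illustration, and the integer-programming test selection described above is not shown.

```python
import math

def update_belief(belief, likelihood, result):
    """Bayes' rule: posterior(d) is proportional to P(result | d) * prior(d)."""
    unnorm = {d: likelihood[d][result] * p for d, p in belief.items()}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}

def entropy(belief):
    """Shannon entropy in bits; lower means a less ambiguous diagnosis."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

# Hypothetical prior over three candidate diagnoses.
prior = {"disease_A": 0.5, "disease_B": 0.3, "disease_C": 0.2}

# Hypothetical P(stain result | disease) for a single staining test.
stain_likelihood = {
    "disease_A": {"positive": 0.9, "negative": 0.1},
    "disease_B": {"positive": 0.2, "negative": 0.8},
    "disease_C": {"positive": 0.1, "negative": 0.9},
}

posterior = update_belief(prior, stain_likelihood, "positive")
ambiguity_drop = entropy(prior) - entropy(posterior)
```

A test-selection policy would then choose the next test, or batch of tests, expected to reduce this ambiguity the most relative to its time and cost.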
Prediction or real-time detection of outcome
- Real-time detection of the end of EEG suppression: Using real-time EEG signal data, we detected the end of EEG suppression after a seizure, which makes it possible to monitor a patient’s status automatically with minimal human supervision.
Shayan Shams (Assistant Professor)
Privacy-protecting video and image analysis:
We integrate big data and deep learning techniques to develop artificial intelligence (AI) models on edge devices for live video processing. In this line of research, we use air-gapped embedded devices to monitor older adults and cognitively impaired patients. These algorithms can monitor a patient's status continuously without violating their privacy and with minimal human supervision.
Breast cancer screening and diagnosis:
We are developing an AI-driven, clinically useful multi-modality pipeline for breast cancer screening and diagnosis that incorporates imaging data (mammograms and ultrasound images) and non-imaging information such as EMR and blood biomarkers. Our AI-driven pipeline imitates the clinical screening-to-diagnosis pathway to increase the specificity and sensitivity of breast cancer screening and diagnosis. Additionally, our model will be optimized for embedded edge devices, so it can be deployed in mobile mammography units to extend coverage to underserved communities.
Blueprint for tissue engineering:
The main focus is to develop AI models capable of generating blueprints for human tissue printing. Our end-to-end multi-modality deep learning algorithms use multi-faceted biological data from multi-omics and biomedical imaging, integrating information from each modality to tackle the challenges of soft tissue regeneration.
Periodontal disease screening and diagnosis:
We are developing deep learning models to screen dental X-rays for a variety of periodontal diseases. Our envisioned algorithm detects regions of interest and classifies them as periodontal defects. It provides a per-tooth report and can improve periodontal diagnosis and eliminate the use of periodontal probes. Additionally, the algorithm can draw attention to certain image features and identify important features that were overlooked, compensating for variation in skill and experience.
Measuring glioblastoma tumor volume from magnetic resonance imaging (MRI):
We are developing, and will implement, a comprehensive AI technology for volumetric measurement of glioblastoma (GBM) tumors. It aims to distinguish non-enhancing tumor infiltration from edema and post-radiation changes within FLAIR abnormalities, and to separate true progression from pseudoprogression.
Real-time detection of EEG suppression:
We are developing multi-modality deep learning algorithms that use a patient’s real-time EEG signals and video to detect the end of EEG suppression after a seizure. This work can lead to a framework that automatically monitors patients' status with minimal human supervision.
Social media analysis:
As an innovative social sensing technology, social media data can provide real-time georeferenced information on human interests, responses, perceptions, and behavior in various situations. In this research, we aim to develop algorithms and frameworks to derive practical information from social media such as Twitter. This information can help us identify and investigate risky behaviors that may correlate with communicable disease infection or with harmful habits such as opioid addiction.
Third generation sequence assembly and alignment:
Since low-cost portable third-generation sequencers have made in-field sequencing very affordable, we will develop embedded-device-friendly assembly and alignment programs to provide on-site edge analysis. These algorithms can help make personalized medicine a viable and affordable option.
Luyao Chen (Scientific programmer)
Luyao Chen is an experienced programmer with 10+ years of experience (including 8 years at Oracle). He is one of the backbones of our center and supports various collaborative projects, including but not limited to:
- Speeding up big data analysis and querying by optimizing the Greenplum distributed database and the Neo4j graph database
- Developing and maintaining a service for propensity score matching, used for cohort selection in most of our studies
- Drug repurposing: using Cerner data to identify drugs or drug combinations for cancer treatment (brain cancer, breast cancer, pancreatic cancer, etc.)
- Sepsis2 Cerner/SBMI competition: serving as the main technical support for the national competition
- Developing novel harmonizing algorithms for 48 years of national birth data
- Preparing data for cross-site diagnosis code embedding
- Various other collaborative projects within and outside SBMI
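For readers unfamiliar with the propensity score matching mentioned above, the following is a minimal sketch of one common variant: greedy 1:1 nearest-neighbor matching without replacement, with a caliper. It is not the center's service. The function name, the caliper value, and the patient scores are hypothetical, and the scores are assumed to have been estimated beforehand (for example, by logistic regression on baseline covariates).

```python
def match_nearest(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by at most `caliper`; treated patients with no close control are dropped.
    """
    available = dict(control)  # controls not yet matched
    pairs = []
    # Process treated patients in order of increasing score for determinism.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
control = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}
pairs = match_nearest(treated, control)
```

Here `t3` stays unmatched because no control falls within the caliper. Dropping such patients trades sample size for better covariate balance in the selected cohort.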