For a full list of publications, visit Dr. Cui Tao's Google Scholar profile.
Xinyuan Zhang, Ziqian Xie, Yang Xiang, Imran Baig, Mena Kozman, Carly Stender, Luca Giancardo, Cui Tao; JMIR Dermatol 2022;5(4):e39113
Automatic skin lesion recognition has been shown to be effective in increasing access to reliable dermatology evaluation;
however, most existing algorithms rely solely on images. Many diagnostic rules, including the 3-point checklist, comprise human
knowledge and reflect the diagnosis process of human experts, yet they are not considered by artificial intelligence algorithms. In this paper,
we aimed to develop a semisupervised model that can not only integrate the dermoscopic features and scoring rule from the 3-point checklist
but also automate the feature-annotation process. We first trained the semisupervised model on a small, annotated data set with disease
and dermoscopic feature labels and tried to improve the classification accuracy by integrating the 3-point checklist using a ranking
loss function. We then used a large data set with disease labels but no feature annotations to learn from the trained algorithm to automatically
classify skin lesions and features. After adding the 3-point checklist to our model, its performance for melanoma classification improved
from a mean of 0.8867 (SD 0.0191) to 0.8943 (SD 0.0115) under 5-fold cross-validation. The trained semisupervised model can automatically
detect 3 dermoscopic features from the 3-point checklist, with best performances of 0.80 (area under the curve [AUC] 0.8380),
0.89 (AUC 0.9036), and 0.76 (AUC 0.8444), in some cases outperforming human annotators. Our proposed semisupervised learning framework can
help with the automatic diagnosis of skin disease based on its ability to detect dermoscopic features and automate the label-annotation
process. The framework can also help combine semantic knowledge with a computer algorithm to arrive at a more accurate and more
interpretable diagnostic result, which can be applied to broader use cases.
View details at DOI: 10.2196/39113
Funded by: This research was partially supported by UTHealth Innovation for Cancer Prevention Research Training Program Pre-doctoral Fellowship
(Cancer Prevention and Research Institute of Texas Grant No. RP160015 and No. RP210042)
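The abstract above describes integrating the 3-point checklist through a ranking loss but does not give the formulation. The following is a minimal sketch of one common choice, a pairwise margin ranking loss, and is not taken from the paper. The feature names follow the standard 3-point checklist (asymmetry, atypical network, blue-white structures); the function names and the margin value are illustrative assumptions.

```python
from typing import Dict, Sequence

# The three dermoscopic criteria of the standard 3-point checklist.
CHECKLIST_FEATURES = ("asymmetry", "atypical_network", "blue_white_structures")


def checklist_score(features: Dict[str, int]) -> int:
    """Count how many of the three checklist features are present (0 to 3)."""
    return sum(features[name] for name in CHECKLIST_FEATURES)


def margin_ranking_loss(p_high: float, p_low: float, margin: float = 0.1) -> float:
    """Hinge penalty when the lesion that should rank higher does not lead by `margin`."""
    return max(0.0, margin - (p_high - p_low))


def batch_ranking_loss(
    probs: Sequence[float], scores: Sequence[int], margin: float = 0.1
) -> float:
    """Average the pairwise loss over all ordered pairs whose checklist scores differ.

    probs  -- predicted melanoma probabilities (assumed model outputs)
    scores -- 3-point checklist scores for the same lesions
    """
    losses = []
    for i in range(len(probs)):
        for j in range(len(probs)):
            if scores[i] > scores[j]:
                losses.append(margin_ranking_loss(probs[i], probs[j], margin))
    return sum(losses) / len(losses) if losses else 0.0


# A prediction that agrees with the checklist ordering incurs no penalty,
# while one that contradicts it is penalized.
agrees = batch_ranking_loss([0.9, 0.2], [3, 0])
contradicts = batch_ranking_loss([0.2, 0.9], [3, 0])
```

In a setup like this, the ranking term would typically be added to the ordinary classification loss, so the checklist acts as a soft constraint and does not override the image-based prediction.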
Muhammad Tuan Amith, Licong Cui, Degui Zhi, Kirk Roberts, Xiaoqian Jiang, Fang Li, Evan Yu & Cui Tao; BMC Bioinformatics 23 (Suppl 6), 281 (2022)
Model card reports aim to provide an informative and transparent description of machine learning models to stakeholders.
This report document is of interest to the National Institutes of Health’s Bridge2AI initiative to address the FAIR challenges with
artificial intelligence-based machine learning models for biomedical research. We present our early undertaking in developing an
ontology for capturing the conceptual-level information embedded in model card reports. Sourcing from existing ontologies and
developing the core framework, we generated the Model Card Report Ontology. Our development efforts yielded an OWL2-based artifact
that represents and formalizes model card report information. The current release of this ontology utilizes standard concepts and
properties from OBO Foundry ontologies. Also, the software reasoner indicated no logical inconsistencies with the ontology.
With sample model cards of machine learning models for bioinformatics research (HIV social networks and adverse outcome prediction
for stent implantation), we showed the coverage and usefulness of our model in transforming static model card reports to a computable
format for machine-based processing. The benefit of our work is that it utilizes the expansive and standard terminologies and scientific
rigor promoted by biomedical ontologists and creates an avenue to make model cards machine-readable using semantic web
technology. Our future goal is to assess the veracity of our model and later expand the model to include additional concepts to
address terminological gaps. We discuss tools and software that will utilize our ontology for potential …
View details at DOI 10.1186/s12859-022-04797-6
Funded by: This research was partially supported by NIH award No. RF1AG072799.
Yi Nian, Xinyue Hu, Rui Zhang, Jingna Feng, Jingcheng Du, Fang Li, Larry Bu, Yuji Zhang, Yong Chen & Cui Tao; BMC Bioinformatics 23 (Suppl 6), 407 (2022)
To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide comprehensive
and semantic representation for heterogeneous data, and have been successfully leveraged in many biomedical applications including drug
repurposing. Our objective is to construct a knowledge graph from literature to study the relations between Alzheimer’s disease (AD)
and chemicals, drugs and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression.
We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier
and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110
filtered triples were used to train knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates
that might be helpful for AD treatment or prevention. Among the three knowledge graph completion models, TransE outperformed the other two
(MR = 10.53, Hits@1 = 0.28). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting
evidence for most highly ranked candidates predicted by our model, which indicates that our approach can inform reliable new knowledge.
This paper shows that our graph mining model can predict reliable new relationships between AD and other
entities (i.e., dietary supplements, chemicals, and drugs). The knowledge graph constructed can facilitate data-driven
knowledge discoveries and the generation of novel hypotheses.
View details at DOI 10.1186/s12859-022-04934-1
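To make the reported metrics concrete, here is a minimal sketch of TransE scoring and of the two ranking metrics, assuming MR denotes mean rank. The two-dimensional embeddings and the entity names (`drug_a`, `drug_b`) are invented for illustration and are unrelated to the paper's data.

```python
import math
from typing import Dict, List, Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE plausibility: negative L2 distance between (h + r) and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_of_tail(
    h: Sequence[float],
    r: Sequence[float],
    true_tail: str,
    candidates: Dict[str, Sequence[float]],
) -> int:
    """1-based rank of the true tail among all candidate tails (best score ranks first)."""
    true_score = transe_score(h, r, candidates[true_tail])
    better = sum(1 for vec in candidates.values() if transe_score(h, r, vec) > true_score)
    return better + 1


def mean_rank(ranks: List[int]) -> float:
    """Average rank of the true entity over all test triples (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Toy embeddings (hypothetical values chosen so that head + relation lands on drug_a).
head = [0.0, 0.0]
relation = [1.0, 0.0]
tails = {"drug_a": [1.0, 0.0], "drug_b": [3.0, 3.0]}

ranks = [rank_of_tail(head, relation, name, tails) for name in ("drug_a", "drug_b")]
```

Under this reading, Hits@1 = 0.28 means the true entity was ranked first for 28% of the test triples.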
Samuel Wang, Jingcheng Du, Lu Tang, Cui Tao;
Studies in Health Technology and Informatics, 2022 Jun 6;290:607-611.
Measles is a highly contagious cause of febrile illness typically seen in young children.
Recent years have witnessed the resurgence of measles cases in the United States. Prompt understanding
of public perceptions of measles will allow public health agencies to respond appropriately and promptly.
We proposed a multi-task Convolutional Neural Network (MT-CNN) model to classify measles-related tweets
in terms of three characteristics: Type of Message (6 subclasses), Emotion Expressed (6 subclasses),
and Attitude towards Vaccination (3 subclasses). A gold standard corpus that contains 2,997 tweets
with annotation in these dimensions was manually curated. A variety of conventional machine learning
and deep learning models were evaluated as baseline models. The MT-CNN model performed better than
the baseline conventional machine learning and single-task CNN models, and was then applied
to predict unlabeled measles-related Twitter discussions that were crawled from 2007 to 2019, and
the trends of public perceptions were analyzed along three dimensions.
View details at DOI: 10.3233/SHTI220149
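The multi-task setup above can be illustrated by its training objective: one shared encoder feeds three classification heads, and the per-task losses are combined. The sketch below shows only that combination step, assuming equal task weights and softmax cross-entropy, neither of which is stated in the abstract. The task keys are illustrative names for the three dimensions.

```python
import math
from typing import Dict, Optional, Sequence

# Number of classes per task, as reported in the abstract.
TASK_SIZES = {"message_type": 6, "emotion": 6, "vaccination_attitude": 3}


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, computed with the log-sum-exp trick."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def multitask_loss(
    logits_by_task: Dict[str, Sequence[float]],
    labels_by_task: Dict[str, int],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of the per-task losses for a single tweet.

    Equal weights are an assumption of this sketch, not a detail from the paper.
    """
    total = 0.0
    for task, size in TASK_SIZES.items():
        logits = logits_by_task[task]
        if len(logits) != size:
            raise ValueError(f"{task} expects {size} logits, got {len(logits)}")
        weight = 1.0 if weights is None else weights[task]
        total += weight * cross_entropy(logits, labels_by_task[task])
    return total


# With all-zero logits each head predicts a uniform distribution,
# so the loss is ln(6) + ln(6) + ln(3).
uniform_logits = {task: [0.0] * size for task, size in TASK_SIZES.items()}
labels = {"message_type": 2, "emotion": 0, "vaccination_attitude": 1}
uniform_loss = multitask_loss(uniform_logits, labels)
```

Because the gradient of this summed loss flows into the shared layers from all three heads, each task can benefit from signal in the others.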
Yang Xiang, Jingcheng Du, Kayo Fujimoto, Fang Li, John Schneider, Cui Tao; The Lancet HIV, November 08, 2021
In 2019, the US Government announced its goal to end the HIV epidemic within 10 years,
mirroring the initiatives set forth by UNAIDS. Public health prevention interventions are a crucial part
of this ambitious goal. However, numerous challenges to this goal exist, including improving HIV awareness,
increasing early HIV infection detection, ensuring rapid treatment, optimising resource distribution,
and providing efficient prevention services for vulnerable populations. Artificial intelligence has
had a pivotal role in revolutionising health care and has shown great potential in developing effective
HIV prevention intervention strategies. Although artificial intelligence has been used in a few HIV
prevention intervention areas, there are challenges to address and opportunities to explore.
View details at DOI 10.1016/S2352-3018(21)00247-2
Project: Using big data and deep learning on predicting HIV transmission risk in MSM population
Funded by: NIH award 1R56AI150272-01A1
Jingcheng Du, Qing Wang, Jingqi Wang, Prerana Ramesh, Yang Xiang, Xiaoqian Jiang, Cui Tao;
Journal of the American Medical Informatics Association, Volume 28, Issue 9, September 2021, Pages 1964–1969
Clinical trials are an essential part of the effort to find safe and effective prevention and treatment
for COVID-19. Given the rapid growth of COVID-19 clinical trials, there is an urgent need for a better clinical trial
information retrieval tool that supports searching by specifying criteria, including both eligibility criteria and
structured trial information. We built a linked graph for registered COVID-19 clinical trials: the COVID-19 Trial Graph,
to facilitate retrieval of clinical trials. Natural language processing tools were leveraged to extract and normalize
the clinical trial information from both their eligibility criteria free texts and structured information from
ClinicalTrials.gov. We linked the extracted data using the COVID-19 Trial Graph and imported it to a graph database,
which supports both querying and visualization. We evaluated the trial graph using case queries and graph embedding.
The graph currently (as of October 5, 2020) contains 3392 registered COVID-19 clinical trials, with 17 480 nodes
and 65 236 relationships. Manual evaluation of case queries found high precision and recall scores on retrieving
relevant clinical trials searching from both eligibility criteria and trial-structured information. We observed
clustering in clinical trials via graph embedding, which also showed superiority over the baseline (0.870 vs 0.820)
in evaluating whether a trial can complete its recruitment successfully. The COVID-19 Trial Graph is a novel representation
of clinical trials that allows diverse search queries and provides a graph-based visualization of COVID-19
clinical trials. High-dimensional vectors mapped by graph embedding for clinical trials would be potentially
beneficial for many downstream applications, such as trial end recruitment status prediction and trial
similarity comparison. Our methodology is also generalizable to other clinical trials.
View details at DOI 10.1093/jamia/ocab078
Funded by: This research was partially supported by NIH award Nos. R56AI150272 and R01AI130460
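As a rough illustration of how a linked trial graph supports searching by both eligibility criteria and structured trial information, the sketch below stores trials as typed edges in memory and filters them by a conjunction of conditions. The relation names, trial identifiers, and values are all hypothetical. The actual COVID-19 Trial Graph lives in a graph database whose schema is not described in the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class TrialGraph:
    """Tiny in-memory stand-in for a graph database of clinical trials."""

    def __init__(self) -> None:
        # Maps (trial_id, relation) to the set of connected node labels.
        self.edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self.trials: Set[str] = set()

    def add(self, trial_id: str, relation: str, target: str) -> None:
        """Link a trial node to another node through a typed relationship."""
        self.trials.add(trial_id)
        self.edges[(trial_id, relation)].add(target)

    def search(self, conditions: Iterable[Tuple[str, str]]) -> List[str]:
        """Return the trials that satisfy every (relation, target) condition."""
        wanted = list(conditions)
        return sorted(
            trial
            for trial in self.trials
            if all(target in self.edges.get((trial, rel), set()) for rel, target in wanted)
        )


graph = TrialGraph()
# Structured information (hypothetical relation names and trial identifiers).
graph.add("TRIAL-A", "HAS_PHASE", "Phase 3")
graph.add("TRIAL-B", "HAS_PHASE", "Phase 2")
graph.add("TRIAL-C", "HAS_PHASE", "Phase 3")
# Eligibility criteria, assumed to be already extracted and normalized by NLP tools.
graph.add("TRIAL-A", "EXCLUDES_CONDITION", "pregnancy")
graph.add("TRIAL-B", "EXCLUDES_CONDITION", "pregnancy")
graph.add("TRIAL-C", "INCLUDES_CONDITION", "hypertension")

# One query mixes a structured field with an eligibility criterion.
phase3_excluding_pregnancy = graph.search(
    [("HAS_PHASE", "Phase 3"), ("EXCLUDES_CONDITION", "pregnancy")]
)
```

A graph database would express the same conjunction as a pattern match over nodes and relationships, with the added benefit of visualization.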
Jingcheng Du, Yang Xiang, Madhuri Sankaranarayanapillai, Meng Zhang, Jingqi Wang, Yuqi Si, Huy Anh Pham, Hua Xu, Yong Chen, Cui Tao;
Journal of the American Medical Informatics Association, Volume 28, Issue 7, July 2021, Pages 1393–1400
Automated analysis of vaccine postmarketing surveillance narrative reports is important to
understand the progression of rare but severe vaccine adverse events (AEs). This study implemented and evaluated
state-of-the-art deep learning algorithms for named entity recognition to extract nervous system disorder-related
events from vaccine safety reports. We collected Guillain-Barré syndrome (GBS) related influenza vaccine safety
reports from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016. VAERS reports were selected
and manually annotated with major entities related to nervous system disorders, including investigation,
nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression. A variety of conventional
machine learning and deep learning algorithms were then evaluated for the extraction of the above entities.
We further pretrained domain-specific BERT (Bidirectional Encoder Representations from Transformers) using
VAERS reports (VAERS BERT) and compared its performance with existing models. Ninety-one VAERS reports were
annotated, resulting in 2512 entities. The corpus was made publicly available to promote community efforts
on vaccine AE identification. Deep learning-based methods (e.g., bi-long short-term memory and BERT models)
outperformed conventional machine learning-based methods (i.e., conditional random fields with extensive features).
The BioBERT large model achieved the highest exact match F-1 scores on nervous_AE, procedure, social_circumstance,
and temporal_expression, while VAERS BERT large models achieved the highest exact match F-1 scores on
investigation and other_AE. An ensemble of these 2 models achieved the highest exact match microaveraged
F-1 score at 0.6802 and the second highest lenient match microaveraged F-1 score at 0.8078 among peer models.
View details at DOI 10.1093/jamia/ocab014
Project: Dynamic learning for post-vaccine event prediction using temporal information in VAERS
Funded by: This research was funded by NIH under award Nos. R01AI130460 and R01LM011829
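The difference between the exact match and lenient match scores reported above can be sketched as follows. Entities are assumed to be `(start, end, type)` tuples with an exclusive end offset, and "lenient" is taken to mean overlapping spans of the same type. The paper's precise matching criteria may differ, and the example spans are invented.

```python
from typing import List, Sequence, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset exclusive, entity type)


def spans_overlap(a: Entity, b: Entity) -> bool:
    """True when two character spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def is_match(pred: Entity, gold: Entity, lenient: bool) -> bool:
    """Same type is always required; span must be identical unless `lenient`."""
    if pred[2] != gold[2]:
        return False
    if lenient:
        return spans_overlap(pred, gold)
    return pred[:2] == gold[:2]


def micro_f1(
    gold_docs: Sequence[List[Entity]],
    pred_docs: Sequence[List[Entity]],
    lenient: bool = False,
) -> float:
    """Micro-averaged F1: counts are pooled over all documents before scoring."""
    matched_pred = matched_gold = total_pred = total_gold = 0
    for gold, pred in zip(gold_docs, pred_docs):
        matched_pred += sum(1 for p in pred if any(is_match(p, g, lenient) for g in gold))
        matched_gold += sum(1 for g in gold if any(is_match(p, g, lenient) for p in pred))
        total_pred += len(pred)
        total_gold += len(gold)
    if total_pred == 0 or total_gold == 0:
        return 0.0
    precision = matched_pred / total_pred
    recall = matched_gold / total_gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The second prediction has the right type but a slightly shifted start offset,
# so it counts only under lenient matching.
gold_docs = [[(0, 5, "nervous_AE"), (10, 20, "procedure")]]
pred_docs = [[(0, 5, "nervous_AE"), (12, 20, "procedure")]]

exact_f1 = micro_f1(gold_docs, pred_docs)
lenient_f1 = micro_f1(gold_docs, pred_docs, lenient=True)
```

This is why the lenient scores in the abstract (0.8078) sit well above the exact match scores (0.6802): boundary errors are forgiven while type errors are not.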
Jianping He, Fang Li, Xinyue Hu, Jianfu Li, Yi Nian, Jingqi Wang, Yang Xiang, Qiang Wei, Hua Xu, Cui Tao;
2022 IEEE 10th International Conference on Healthcare Informatics (ICHI)
Biomedical relation extraction plays a critical role in the construction of high-quality
knowledge graphs and databases, which can further support many downstream applications. Pre-trained prompt
tuning, as a new paradigm, has shown great potential in many natural language processing (NLP) tasks.
By inserting a piece of text into the original input, prompting converts NLP tasks into masked language modeling
problems, which could be better addressed by pre-trained language models (PLMs). In this study, we applied
pre-trained prompt tuning to chemical-protein relation extraction using the BioCreative VI CHEMPROT dataset.
The experimental results showed that pre-trained prompt tuning outperformed the baseline approach in
chemical-protein interaction classification. We conclude that prompt tuning can improve the efficiency
of the PLMs on chemical-protein relation extraction tasks.
View details at DOI: 10.1109/ICHI54592.2022.00120
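The idea of turning relation classification into a masked language problem can be sketched without a model: a template appends a sentence containing a mask slot, and a verbalizer maps the word predicted for that slot back to a relation class. The template wording and label words below are illustrative assumptions and are not taken from the paper. The class identifiers follow the CHEMPROT evaluation groups, and the mask-word scores stand in for the output of a pre-trained language model.

```python
from typing import Dict

# Verbalizer: label word predicted at the mask position -> CHEMPROT relation class.
# The label words are hypothetical choices made for this sketch.
VERBALIZER = {
    "activates": "CPR:3",
    "inhibits": "CPR:4",
    "agonizes": "CPR:5",
    "antagonizes": "CPR:6",
    "metabolizes": "CPR:9",
}


def build_prompt(sentence: str, chemical: str, protein: str, mask_token: str = "[MASK]") -> str:
    """Append a cloze-style template so the relation becomes a word to fill in."""
    return f"{sentence} In this sentence, {chemical} {mask_token} {protein}."


def classify(mask_word_scores: Dict[str, float]) -> str:
    """Pick the class whose label word scores highest at the mask position.

    `mask_word_scores` is assumed to come from a masked language model.
    Label words missing from it are treated as maximally unlikely.
    """
    best_word = max(VERBALIZER, key=lambda word: mask_word_scores.get(word, float("-inf")))
    return VERBALIZER[best_word]


prompt = build_prompt("Aspirin blocks COX-1.", "aspirin", "COX-1")
# Made-up scores, standing in for a model's probabilities over the label words.
predicted = classify({"inhibits": 0.7, "activates": 0.1, "metabolizes": 0.05})
```

The appeal of this framing is that the task then resembles what the language model saw during pre-training, so less task-specific adaptation is needed.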