Data Science and Informatics Core for Cancer Research

Cancer, a leading cause of death, has been studied at all levels, from molecules to cells, individuals, and populations. While data generated at each level have been extensively analyzed with level-specific informatics approaches, mining data across levels can provide deeper insight into disease mechanisms and facilitate translation to clinical practice (Figure 1). The challenges, however, include the enormous volume and heterogeneity (variety) of the data. Furthermore, new technologies applied to cancer research can generate data at extremely high speed. These high-volume, high-velocity, and high-variety data ("big data") pose a significant challenge to conventional data analysis methodologies. Data science, an emerging field that develops novel approaches to analyzing big data, can meet these challenges by providing integrative solutions for analyzing cancer data at all levels.

Figure 1 information

The Data Science and Informatics Core for Cancer Research (DSICCR) at UTHealth is funded by the Cancer Prevention and Research Institute of Texas (CPRIT RP170668) to translate cutting-edge data science and informatics research into easily accessible, high-quality, user-friendly software and services that advance cancer research. The DSICCR will build a "big data" infrastructure for cancer research and provide data science and informatics services to cancer researchers working at all levels, including molecular, cell, tissue and organ, individual, and population research (Figure 1). The DSICCR will also educate cancer researchers about the latest data science and informatics methods and their application in cancer research. Through the application of cutting-edge data science, the DSICCR aims to significantly advance cancer research, and thereby help find cures for cancer and reduce cancer deaths in Texas.

The DSICCR is directed and managed by Dr. W. Jim Zheng, Associate Professor at the School of Biomedical Informatics and Director of the Bioinformatics and High Performance Computing Service Center at UTHealth, who brings many years of experience in industry and academic research, development, and operations. With a cohort of co-leaders in clinical informatics, bioinformatics, biostatistics, and data science, the DSICCR provides expertise in genomics, proteomics, image analysis, complex system modeling, electronic health record mining, clinical data warehousing, and clinical decision making. It also has the expertise to configure, deploy, and maintain the large-scale hardware and software infrastructure needed for cutting-edge data science and informatics research. The DSICCR is therefore uniquely positioned to apply data science and informatics approaches to support multi-scale cancer research and clinical projects that generate heterogeneous datasets, especially projects that seek to understand cancer by integrating different data modalities.