BMI 6318 - Big Data in Biomedical Informatics

(web-based and classroom instruction)
3 semester credit hours/meets part of the advanced informatics competencies
Prerequisites: BMI 5007 or consent of instructor

This course introduces students to the technologies used to solve 'Big Data' problems in biomedicine and healthcare. Through hands-on exercises, we will learn how to distill actionable information from large datasets by leveraging multiple machines. We will cover the data science toolboxes for processing datasets with distributed algorithms, how to apply machine learning models in this context, and how to evaluate and report on the resulting analysis. Students are required to complete hands-on exercises, and a working knowledge of Python and SQL is required.

Upon successfully completing this course, students will be able to:

  • Structure extremely large datasets for input and output.
  • Design a data analysis pipeline using 'big data'.
  • Map business needs to a proposed analytical design using very large datasets.
  • Evaluate the results and utility of data analysis and make an effective argument.

These objectives will be pursued through hands-on examples using Python-based data analysis libraries such as pandas and PySpark. We will use modern container technologies (Docker) and databases built to store 'Big Data'.
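
As a rough illustration of what "processing data sets with distributed algorithms" can look like, the sketch below shows the partition, map, and combine pattern that tools such as PySpark apply across multiple machines. It is not course material: it runs on a single machine using only the Python standard library, and the records and the function names (partition, count_codes, merge_counts) are hypothetical, chosen for illustration only.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy records: (patient_id, diagnosis_code). Not real data.
records = [
    ("p1", "E11"), ("p2", "I10"), ("p3", "E11"),
    ("p4", "J45"), ("p5", "I10"), ("p6", "E11"),
]


def partition(rows, n_parts):
    """Split rows into n_parts slices, loosely as a cluster would shard a dataset."""
    return [rows[i::n_parts] for i in range(n_parts)]


def count_codes(rows):
    """Map step: count diagnosis codes within a single partition."""
    return Counter(code for _, code in rows)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Each partition is counted independently, then the partial results are merged.
partials = [count_codes(part) for part in partition(records, 3)]
totals = reduce(merge_counts, partials, Counter())
print(dict(totals))
```

Because each partition is counted without reference to the others, the map step could in principle run on separate workers, with only the small partial counts sent back to be merged.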