(web-based and classroom instruction)
3 semester credit hours/meets part of the advanced informatics competencies
Prerequisites: BMI 5007 or consent of instructor
This course will expose students to the technologies used to solve 'Big Data' problems in biomedicine and healthcare. Through hands-on exercises, we will learn how to distill actionable information from large data sets by leveraging multiple machines. We will cover the data science toolboxes for processing data sets with distributed algorithms, how to apply machine learning models in this context, and finally how to evaluate and report on the analysis. Students will be required to complete hands-on exercises; a working knowledge of Python and SQL is required.
Upon successfully completing this course, students will:
These objectives will be pursued through hands-on examples using Python-based data analysis libraries such as pandas and PySpark. We will use modern container technologies (Docker) and databases built to store “Big Data.”