BMI 5007 Methods in Health Data Science

3 semester credit hours
Lecture contact hours: 2; Lab contact hours: 3
Web-based and classroom instruction
Lab Fee: $30

Course Description:
The course introduces methods in health data science: defining the problem, accessing and loading the data, and formatting it into the data structures required for analysis. It covers the basics of computational thinking needed to define a computational solution, and methods to access healthcare data from a variety of sources (EHR data, UMLS, Medline, etc.) and in different data formats. Students will apply methods for data wrangling and data quality assessment to structure the data for analysis. Students will also be introduced to the basics of algorithm design and evaluation and to the application of data structures to healthcare data. The course uses the Python programming language and basic Python libraries for data science such as numpy, scipy, matplotlib, and pandas.

Students should expect a substantial number of programming exercises each week. This course is neither an introduction to programming nor a course for improving programming skills. Students are expected to have some experience with introductory (beginner-level) Python programming.

Upon successful completion of the course, students will:

  • Abstract a business need for data analysis and define an appropriate computational problem
  • Design simple algorithms and analyze their time complexity
  • List basic data structures, their characteristics, and their applications in biomedicine
  • Retrieve biomedical data from multiple sources and formats – specifically flat files (text), tabular data (CSV), and structured data (JSON, XML)
  • Implement Python programs to load data and apply basic data wrangling to structure the output
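
As a rough illustration of the last two outcomes, the sketch below parses the same two records from CSV, JSON, and XML text using only the Python standard library, and converts each into the same list-of-dictionaries structure. The sample data and the names `patient_id` and `hba1c` are hypothetical and are not taken from the course materials.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data in three formats; not taken from the course.
CSV_TEXT = "patient_id,hba1c\nP1,6.1\nP2,7.4\n"
JSON_TEXT = '[{"patient_id": "P1", "hba1c": 6.1}, {"patient_id": "P2", "hba1c": 7.4}]'
XML_TEXT = (
    "<patients>"
    '<patient id="P1"><hba1c>6.1</hba1c></patient>'
    '<patient id="P2"><hba1c>7.4</hba1c></patient>'
    "</patients>"
)


def load_csv(text):
    """Parse tabular text; every CSV field arrives as a string, so convert."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"patient_id": row["patient_id"], "hba1c": float(row["hba1c"])}
        for row in reader
    ]


def load_json(text):
    """Parse a JSON array of objects into the same record structure."""
    return [
        {"patient_id": item["patient_id"], "hba1c": float(item["hba1c"])}
        for item in json.loads(text)
    ]


def load_xml(text):
    """Parse XML, reading the id attribute and the child element text."""
    root = ET.fromstring(text)
    return [
        {"patient_id": node.get("id"), "hba1c": float(node.findtext("hba1c"))}
        for node in root.findall("patient")
    ]


records_from_csv = load_csv(CSV_TEXT)
records_from_json = load_json(JSON_TEXT)
records_from_xml = load_xml(XML_TEXT)
print(records_from_csv)
```

Whatever the source format, the records end up in one common structure, which is the kind of wrangling step the outcomes describe.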

Pre-Requisite effective Spring 2020

Students must exhibit competence in basic Python programming. Students should be able to write a Python script (.py file) and execute it from a command line.

"Basic" Python programming is defined as the ability to work with:

  1. Variables – define, access
  2. Data types and conversion – int, str, float, bool
  3. Use of appropriate operators – assignment, comparison, logical, arithmetic, identity, and membership (containment) operators
  4. Control flow and loops (if...else, while, for, break, continue)
  5. Lists and dictionaries – creation, access, adding or removing items
  6. Files – input and output operations: open, close, read, write
  7. Errors – try, except, troubleshooting errors
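
To give a sense of the expected level, here is a minimal sketch that touches most of the items above: variables, type conversion, operators, loops, lists, dictionaries, file input and output, and error handling. The file name `bmi5007_demo_values.txt` and the function `parse_numbers` are hypothetical, and the sketch is not an official course example.

```python
import os
import tempfile


def parse_numbers(lines):
    """Convert text lines to floats, skipping blanks and collecting bad values."""
    numbers = []
    skipped = []
    for line in lines:
        text = line.strip()
        if text == "":
            continue  # nothing to convert on a blank line
        try:
            numbers.append(float(text))  # str -> float conversion
        except ValueError:
            skipped.append(text)  # keep the bad value for troubleshooting
    return numbers, skipped


# Hypothetical file name, placed in the system temporary directory.
path = os.path.join(tempfile.gettempdir(), "bmi5007_demo_values.txt")

# Write a small file; the with statement closes the file automatically.
with open(path, "w") as handle:
    handle.write("3\n4.5\n\nabc\n10\n")

# Read the file back and parse its lines.
with open(path) as handle:
    numbers, skipped = parse_numbers(handle.readlines())

summary = {"count": len(numbers), "total": sum(numbers), "skipped": skipped}
print(summary)
```

Saved as a .py file, a script like this can be run from a command line with `python` followed by the file name.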

Required Reading: Complete Chapters 1 to 13 of “How to Think Like a Computer Scientist: Interactive Edition”, available at https://runestone.academy/runestone/books/published/thinkcspy/index.html

Pre-requisite Quiz:

  1. 5 Python scripting questions – live coding from your computer
  2. Python scripts cannot use any imported modules
  3. Total time of 90 minutes
  4. Closed book exam – no online or in-person materials
  5. Proctored exam – video, audio and screen recorded
  6. Total of 5 points.
  7. Must score at least 4 points.

Sample questions:

  1. Write a Python script to calculate and print the mean and mode of the list of numbers [5, 6, 8, 4, 3, 7, 9, 3, 2, 6, 2, 1, 4, 3, 5, 8]
  2. Write a Python script to generate a list of the numbers between 500 and 550 that are multiples of 7.
  3. Write a Python script to count the number of items that have an “A” value in the following dictionary: {“S1” : “A”, “S2” : “B”, “S3” : “A”, “S4” : “C”, “S5” : “A”, “S6” : “B” }
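
One possible approach to the three sample questions is sketched below. It uses no imported modules, in line with the quiz rules. It is an illustration rather than an official answer key: the variable names are hypothetical, and the sketch assumes the `S4` value is the string "C".

```python
# Question 1: mean and mode, computed without any imported modules.
numbers = [5, 6, 8, 4, 3, 7, 9, 3, 2, 6, 2, 1, 4, 3, 5, 8]
mean = sum(numbers) / len(numbers)

counts = {}
for n in numbers:
    counts[n] = counts.get(n, 0) + 1  # tally how often each value occurs

highest = max(counts.values())
# Keep every value that reaches the highest count, in case of ties.
modes = [value for value, count in counts.items() if count == highest]

print("Mean:", mean)
print("Mode:", modes)

# Question 2: multiples of 7 between 500 and 550.
# Neither endpoint is a multiple of 7, so inclusive and exclusive
# readings of "between" give the same result.
multiples = [n for n in range(500, 551) if n % 7 == 0]
print(multiples)

# Question 3: count the dictionary items whose value is "A".
# Assumption: the S4 value is the string "C".
values_by_key = {"S1": "A", "S2": "B", "S3": "A", "S4": "C", "S5": "A", "S6": "B"}
a_count = 0
for value in values_by_key.values():
    if value == "A":
        a_count += 1
print(a_count)
```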

Important Instructions:

  1. Contact the instructor for access to the quiz.
  2. Grading of the quiz takes 2 working days. Plan ahead so that you take the quiz and obtain approval before the regular registration deadline.
  3. Late registration is strongly discouraged for BMI 5007.
  4. A score of 4 or above is required to be approved for the course.