Foundations of Data Science - STA551

Course Objectives

This is a data science survey course. The first part of the course is dedicated to data science foundations. Topics include statistical models, machine learning algorithms, model performance metrics, and major resampling algorithms. The second part focuses on data science processes. Topics include the data science project life cycle; model selection, validation, and performance evaluation; and data science ethics. The last part of the course discusses data science infrastructure and pipelines.

Course Topics

  • Introduction to data science and tools
  • Short review of statistical methods for data science
  • Introduction to machine learning and data mining algorithms for data science
  • Model performance metrics (KPIs)
  • Converting business questions to data science problems
  • Data collection and sampling design
  • Exploratory data analysis
  • Feature extraction and selection
  • Model building
  • Communication and visualization
  • Model deployment best practices
  • Data science ethics

Example Syllabus