Textbook

  • Data Science in Production by Ben Weber (2020, $5 for the ebook/PDF). A sample of the first three chapters is available on the publisher's page.

Lecture Schedule

Lecture 1: Serving ML Models Using Web Servers

Reference: Chapter 2
  • Learning Goals:
    • Be able to set up a Python environment
    • Be able to set up a Jupyter session with SSH tunneling
    • Be able to secure a web server
    • Be able to use Flask to serve an ML model

Lecture 2: Serving ML Models Using Serverless Infrastructure

Reference: Chapter 3
  • Learning Goals:
    • Be able to differentiate hosted vs managed solutions
    • Assess the DevOps effort for web server vs serverless deployments
    • Be able to deploy an ML model using Google Cloud Functions and AWS Lambda functions
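A minimal sketch of the serverless pattern, assuming an AWS Lambda function behind an API Gateway proxy integration, so the request JSON arrives as a string in event["body"]. The model is hypothetical: a logistic regression with made-up coefficients (COEFFICIENTS, INTERCEPT) standing in for a trained model that would normally be loaded from storage. Google Cloud Functions uses a different entry-point signature (an HTTP function receives a Flask request object), so the same predict helper would be wrapped differently there.

```python
import json
import math

# Hypothetical model: made-up logistic regression coefficients, not a trained model.
COEFFICIENTS = {"x1": 0.8, "x2": -0.5}
INTERCEPT = -0.2


def predict(features):
    """Return a probability from the hypothetical logistic regression."""
    z = INTERCEPT + sum(
        weight * float(features.get(name, 0.0)) for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def lambda_handler(event, context):
    """Lambda entry point; assumes an API Gateway proxy-style event."""
    try:
        body = event.get("body") or "{}"
        # Proxy integrations pass the body as a JSON string; direct invokes may pass a dict.
        features = json.loads(body) if isinstance(body, str) else body
        score = predict(features)
    except (ValueError, TypeError, AttributeError) as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": score}),
    }


if __name__ == "__main__":
    # Local smoke test without AWS: build an event by hand.
    print(lambda_handler({"body": json.dumps({"x1": 1.0, "x2": 2.0})}, None))
```

Because the handler is a plain function, it can be exercised locally before deployment, which is part of what keeps the DevOps effort of serverless deployments low compared with managing a web server.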

Lecture 3: Serving ML Models Using Docker

Reference: Chapter 4, up to 4.2
  • Learning Goals:
    • Be able to reason about the pros and cons of container technologies
    • Be able to differentiate containers from virtual machines
    • Be able to create a new Docker image using a Dockerfile
    • Be able to upload the image to a remote registry

Lecture 4: Kubernetes for Orchestrating ML Deployments

Reference: Chapter 4, 4.3 onwards
  • Learning Goals:
    • Understand the uses of Kubernetes
    • Be able to set up a single-node Kubernetes cluster using kubectl and minikube
    • Be able to serve a prediction model in a container in the Kubernetes cluster
    • Be able to deploy a prediction model on Google Kubernetes Engine (GKE)

Lecture 5: ML Model Pipelines

Reference: Chapter 5
  • Learning Goals:
    • Learn how to manage a model building workflow
    • Learn how to set up automated jobs using cron
    • Learn the basics of Apache Airflow
    • Learn a managed workflow tool (Google Cloud Composer)
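The core idea behind workflow tools such as Airflow is that a pipeline is a directed acyclic graph (DAG) of tasks, each run only after its upstream tasks finish; cron, by contrast, only triggers jobs on a schedule. A minimal sketch of that ordering using Python's standard-library graphlib (Python 3.9+), not Airflow's API. The task names are hypothetical, and run_task is a placeholder for real work.

```python
from graphlib import TopologicalSorter

# Hypothetical model-building workflow: each task maps to the set of tasks that
# must finish before it, similar to declaring upstream dependencies in an Airflow DAG.
PIPELINE = {
    "fetch_raw_data": set(),
    "fetch_labels": set(),
    "build_features": {"fetch_raw_data"},
    "train_model": {"build_features", "fetch_labels"},
    "evaluate_model": {"train_model"},
    "publish_model": {"evaluate_model"},
}


def run_pipeline(dependencies, run_task):
    """Run tasks one at a time in an order that respects every dependency."""
    completed = []
    for task in TopologicalSorter(dependencies).static_order():
        run_task(task)  # a real scheduler would also handle retries, logging, and parallelism
        completed.append(task)
    return completed


if __name__ == "__main__":
    run_pipeline(PIPELINE, lambda task: print(f"running {task}"))
```

A cron entry could launch this whole script nightly; a workflow tool adds per-task scheduling, retries, and monitoring on top of the same dependency structure.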

Lecture 6: PySpark Ecosystem

Reference: Chapter 6
  • Learning Goals:
    • Understand the components of a Spark cluster
    • Be able to use PySpark and Spark DataFrames
    • Be able to use models from MLlib
    • Be able to work with a managed solution such as Databricks

Lecture 7: Streaming Model Deployments

Reference: Chapter 8
  • Learning Goals:
    • Understand the difference between a streaming model deployment workflow vs a batch model deployment workflow
    • Learn the basics of streaming with Apache Kafka
    • Be able to differentiate between a batch PySpark workflow and a PySpark streaming workflow
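A minimal sketch contrasting the two workflows with the same scoring function. In a batch workflow the full dataset is available up front and scored in one job; in a streaming workflow each record is scored as it arrives. The score function is a hypothetical fixed linear rule, and stream_predict takes a plain iterable that stands in for a message source such as a Kafka consumer, so the sketch runs without a broker.

```python
from typing import Iterable, Iterator


def score(record: dict) -> float:
    """Hypothetical model: a fixed linear rule standing in for a trained model."""
    return 0.3 * record["x1"] + 0.7 * record["x2"]


def batch_predict(records: list[dict]) -> list[float]:
    """Batch: the whole dataset is known in advance and scored in a single pass."""
    return [score(r) for r in records]


def stream_predict(events: Iterable[dict]) -> Iterator[float]:
    """Streaming: score each event as soon as it arrives.

    `events` stands in for a consumer loop (e.g. reading from a Kafka topic);
    against a live topic this loop would not terminate on its own.
    """
    for event in events:
        yield score(event)


if __name__ == "__main__":
    data = [{"x1": 1.0, "x2": 0.0}, {"x1": 0.0, "x2": 1.0}]
    print(batch_predict(data))
    for prediction in stream_predict(iter(data)):
        print(prediction)
```

The model logic is shared; what changes is when data becomes available and how results are emitted, which is the same distinction between a batch PySpark job and a PySpark streaming job.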

Lecture 8: Online Experimentation

Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/model-ab-testing.html
  • Learning Goals:
    • Know the considerations for A/B testing of models before full rollouts
    • Be acquainted with a few statistical hypothesis tests and how sample sizes are determined
    • Be able to create simple experiments using PlanOut and a Flask-based deployment setup
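A minimal sketch of two pieces of an A/B test, using only the standard library. First, the per-group sample size for a two-sided two-proportion z-test under the usual normal approximation: n = (z_{1-alpha/2} + z_{1-beta})^2 * [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2. Second, deterministic assignment of users to variants by hashing an experiment salt with the user ID, similar in spirit to how PlanOut assigns units but not its API. The function names and the salt format are hypothetical.

```python
import hashlib
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Users needed per arm to detect p_control -> p_treatment (two-sided z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def assign_variant(user_id, experiment_salt, variants=("control", "treatment")):
    """Hash salt + user ID so the same user always lands in the same variant."""
    digest = hashlib.sha1(f"{experiment_salt}.{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


if __name__ == "__main__":
    # Detecting a lift from a 10% to a 12% conversion rate at alpha=0.05, power=0.8.
    print(sample_size_per_group(0.10, 0.12))  # 3839 users per group
    print(assign_variant("user-42", "model-v2-test"))
```

Hashing rather than random draws keeps assignment stable across requests without storing state, so a Flask endpoint can decide per request which model version serves a given user.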