Statistical Models and Methods for Business Analytics

IDS 575

Informal name: Machine Learning Core

Overview

The goal of this class is to cover the foundations of modern statistics and machine learning methods, complementing the data mining focus of IDS 572. You will get up to speed with the requisite background and with the key theoretical underpinnings of modern analytics, which we will study through the lens of statistical machine learning. Lectures will be complemented with hands-on exercises in R.

Previous Editions

Logistics

Textbook and Materials

Software

Schedule (tentative)

08/23 and 08/30: Review of the Basics of Probability, Calculus and Linear Algebra (different logistics: 3.00 to 6.00 PM in DH 210)

08/26 : Supervised Learning: Linear Models and Least Squares, k-Nearest Neighbor Methods

09/09 : Towards Regression: Statistical Decision Theory, Curse of Dimensionality, Linear Regression, Categorical Variables, Interaction Terms

09/16 : Regression I: Bias-Variance Trade-off, Subset Selection, Cross-Validation

09/23 : Regression II: Ridge Regression, LASSO (Least Absolute Shrinkage and Selection Operator)

09/30 : Classification: Linear Discriminant Analysis, Logistic Regression, Model Assessment and Selection: AIC, BIC and Validation

10/07 : The Bootstrap, Maximum Likelihood Estimation and Review of Linear Models

10/21 : Business Applications of Regression, Classification and Likelihood Maximization

10/28 : Expectation Maximization and Sampling (Markov Chain Monte Carlo)

11/04 : Tree Methods, AdaBoost and Gradient Boosting

11/11 : Random Forests, Multivariate Adaptive Regression Splines and Support Vector Machines

11/18 : Kernel Trick, Introduction to Unsupervised Learning, Association Rules

11/25 : Unsupervised Learning: Clustering, Principal Component Analysis and Spectral Clustering

12/02 : Time Series and Supervised Learning, and the ARMA Model

Assignments

  1. 08/26: Assignment 0: Sign up on the forum and self-study chapters 3-5 of R for Data Science. Due on 09/08 (no submission needed).
  2. 09/09: Assignment 1 out. Due on 09/22.
  3. 09/23: Assignment 2 out. Due on 10/06.
  4. 10/07: Assignment 3 out. Due on 10/27.
  5. 11/04: Assignment 4 out. Due on 11/17.
  6. 11/18: Assignment 5 out. Due on 12/01.

The assignments involve reimplementing statistical techniques and understanding their behavior on interesting datasets. Always cite your sources in your assignment solutions. The submission deadline is BEFORE 11.59 PM on the due date. Late submissions incur an automatic 20% penalty per day. Use Blackboard for uploads.

Exams

  1. 10/14: Exam I (BSB 145, during class hours)
  2. 12/09: Exam II (Lecture Center F001, during class hours)

The exams are closed book, but one handwritten 8.5x11-inch cheat sheet is allowed. No computers or communication devices are allowed.

Grades

Miscellaneous Information