# Statistical Models and Methods for Business Analytics

IDS 575

### Informal name: Machine Learning Core

## Overview

The goal of this class is to cover the foundations of modern statistics and machine learning methods, complementing the data mining focus of IDS 572. You will get up to speed with the requisite background and the key theoretical underpinnings of modern analytics, viewed through the lens of statistical machine learning. Lectures are complemented with hands-on exercises in R.

## Previous Editions

- Spring 2019 (has videos!)
- Spring 2018 (has videos!)
- Fall 2017 (has videos!)

## Logistics

- Semester: Fall 2019
- Lectures: Mondays 6.00 PM to 8.30 PM at DH 220
- Optional Recitations: Fridays 5.00 PM to 6.00 PM at Lecture Center C6
- Staff
    - Instructor: Dr. Theja Tulabandhula
    - Teaching Assistant: Parshan Pakiman

- Online communication: Forum (sign up needed!)
- Offline communication:
    - Instructor Office Hours: Thursdays 1.00 to 2.00 PM at UH 2404
    - TA Office Hours: Thursdays 10.30 AM to 12.30 PM at L270 (ETMSW Bldg.)

## Textbook and Materials

- Textbook I: The Elements of Statistical Learning (2nd edition).
- Textbook II: An Introduction to Statistical Learning with Applications in R.
- Refresher on Probability
- Refresher on Linear Algebra

## Software

- The R programming language and the RStudio IDE.
- Learning R: R for Data Science

## Schedule (*tentative*)

#### 08/23 and 08/30: Review of the Basics of Probability, Calculus and Linear Algebra (Different Logistics: from 3.00 to 6.00 PM at DH 210)

- Calculus refresher pdf 1
- Calculus refresher pdf 2
- Probability refresher 1 (source)
- Probability refresher 2
- Linear Algebra refresher 1
- Linear Algebra refresher 2

#### 08/26 : Supervised Learning: Linear Models and Least Squares, k-Nearest Neighbor Methods

- Lecture note
- Machine Learning Mindmap
- Linear regression by gradient descent
- Handwritten note
- Lecture video 1
- Lecture video 2

#### 09/09 : Towards Regression: Statistical Decision Theory, Curse of Dimensionality, Linear Regression, Categorical Variables, Interaction Terms

#### 09/16 : Regression I: Bias-variance Trade-off, Subset Selection, Cross-Validation

#### 09/23 : Regression II: Ridge Regression, LASSO (Least Absolute Shrinkage and Selection Operator)

#### 09/30 : Classification: Linear Discriminant Analysis, Logistic Regression, Model Assessment and Selection: AIC, BIC and Validation

#### 10/07 : The Bootstrap, Maximum Likelihood Estimation and Review of Linear Models

#### 10/21 : Business applications of regression, classification and likelihood maximization

#### 10/28 : Expectation Maximization and Sampling (Markov Chain Monte Carlo)

#### 11/04 : Tree Methods, AdaBoost and Gradient Boosting

- Lecture note
- Handwritten note
- XGBoost, LightGBM and CatBoost implementations.

#### 11/11 : Random Forests, Multivariate Adaptive Regression Splines and Support Vector Machines

#### 11/18 : Kernel Trick, Introduction to Unsupervised Learning, Association Rules

#### 11/25 : Unsupervised Learning: Clustering, Principal Component Analysis and Spectral Clustering

#### 12/02 : Time Series and Supervised Learning, and the ARMA Model

## Assignments

- 08/26: Assignment 0: Sign up on the forum and self-study chapters 3-5 of R for Data Science. Due on 09/08 (no submission needed).
- 09/09: Assignment 1 out. Due on 09/22
- 09/23: Assignment 2 out. Due on 10/06
- 10/07: Assignment 3 out. Due on 10/27
- 11/04: Assignment 4 out. Due on 11/17
- 11/18: Assignment 5 out. Due on 12/01

The assignments involve reimplementing statistical techniques and understanding their behavior on interesting datasets. Always cite your sources in your assignment solutions. Submissions are due BEFORE 11.59 PM on the due date. Late submissions incur an automatic 20% penalty per day. Use Blackboard for uploads.

## Exams

- 10/14: Exam I (BSB 145, and during class hours)
- 12/09: Exam II (Lecture Center F001, and during class hours)

The exams are closed book, but one 8.5x11-inch handwritten cheat sheet is allowed. No computers or communication devices are allowed.

## Grades

- Assignments: 8% + 8% + 8% + 8% + 8%
- Exams: 22% (Exam I) + 30% (Exam II)
- Participation/Quizzes: 8% (online and offline)
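
As a rough illustration of how the weights above combine, here is a minimal sketch of a weighted-score calculation. It assumes that every component is scored on a 0-100 scale and that the five 8% weights apply to Assignments 1 through 5 (Assignment 0 needs no submission). The function name `final_score` and the sample scores are hypothetical; this is not an official grading tool.

```python
# Weights taken from the Grades list above (percent of the final grade).
ASSIGNMENT_WEIGHTS = [8, 8, 8, 8, 8]   # assumed to map to Assignments 1-5
EXAM_WEIGHTS = [22, 30]                # Exam I, Exam II
PARTICIPATION_WEIGHT = 8               # Participation/Quizzes


def final_score(assignments, exams, participation):
    """Weighted course score in percent; each input score is assumed to be on a 0-100 scale."""
    if len(assignments) != len(ASSIGNMENT_WEIGHTS) or len(exams) != len(EXAM_WEIGHTS):
        raise ValueError("expected 5 assignment scores and 2 exam scores")
    weighted = sum(w * s for w, s in zip(ASSIGNMENT_WEIGHTS, assignments))
    weighted += sum(w * s for w, s in zip(EXAM_WEIGHTS, exams))
    weighted += PARTICIPATION_WEIGHT * participation
    return weighted / 100.0


# Sanity check: the listed weights should account for the whole grade.
total_weight = sum(ASSIGNMENT_WEIGHTS) + sum(EXAM_WEIGHTS) + PARTICIPATION_WEIGHT
print(total_weight)

# Hypothetical scores, for illustration only.
print(final_score([90, 85, 100, 70, 95], [80, 75], 100))
```

The listed weights sum to 100%, with the two exams together accounting for 52% of the final grade.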

## Miscellaneous Information

- This is a 4-credit graduate-level course offered by the Information and Decision Sciences department at UIC.
- Please see the academic calendar for the semester timeline.
- Students who wish to observe their religious holidays (http://oae.uic.edu/religious-calendar/) should notify the instructor within one week of the first lecture date.
- Please contact the instructor as early as possible if you require accommodations for access to and/or participation in this course.
- Please refer to the academic integrity guidelines set by the university.