Zoom links

11am lectures, 3pm lectures


Lectures / Reading and Materials
Week 1

Welcome to SML201

Lecture 1 (Rmd source): evaluating R expressions, printing to the console, variables, conditionals, functions.

Lecture 2 (Rmd source): functions review, print vs. return, logical expressions, vectors, vectorized operators

Reading: DataCamp's Intro to R, Ch. 1, 2, 5

Reading: Poldrack Ch. 3

Just for fun: Physician salary data

Week 2

Lecture 1 (Rmd source): indexing with logical vectors, parallel vectors, function composition and pipes, data frames, intro to tidyverse.

Lecture 2 (Rmd source): wrangling data with dplyr/tidyverse: filter, arrange, rename, select, summarize, mutate, and group_by (lecture draft).

Reading (primary): R4DS Ch. 5

Reading (secondary): Poldrack Ch. 5.1-5.3.

Reading (exercises): DataCamp's Introduction to the Tidyverse, Ch. 1 and 3.

Week 3

Lecture 1 (Rmd source): problem solving with dplyr/tidyverse. Using sapply (lecture draft)

Lecture 2 (Rmd source): dplyr odds and ends. Named arguments. Review of sapply. A first look at DataViz with ggplot (lecture draft).

Reading: R4DS Ch. 5 (continue reading)

Reading: DataCamp's Introduction to the Tidyverse, Ch. 1 and 3 (continue practicing)

Reading: Healy Ch. 3

Week 4

Lecture 1 (Rmd source): Introduction to DataViz with ggplot (cont'd).

Intro to Predictive Modeling: Linear Regression (Rmd source) (lecture draft). Predictive modeling slides.

Lecture 2: Predictive modeling slides, cont'd. Logistic regression part 1 (Rmd source). Logistic regression part 2 (Rmd source). Bar charts with ggplot (Rmd source). Splitting datasets and cross-validation (Rmd source) (lecture draft).

Video: SML201: Why the categorical version of a variable works better on the training set

Reading (primary): Healy Ch. 3

Reading (exercises): DataCamp's Introduction to the Tidyverse, data visualization chapters.

Week 5

Bar charts 2 (Rmd source). Histograms (Rmd source). Measuring performance of classifiers (Rmd source). Dataset splits (Rmd source). SSE/MSE/RMSE (Rmd source). Bar charts: summary (Rmd)

Interpreting regression coefficients

Reading (primary): Healy Ch. 4

Week 6

Variable selection and cross-validation (Rmd source)

Videos: Probability, odds, betting odds, odds on a bookmaker website, log odds, log odds 2, variable selection with cross-validation, replicate, sampling datasets

Fairness in Machine Learning. Video. Video: Calibration.

Reading: Poldrack Ch. 6

Reading: Angwin et al., Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks.

Reading: Corbett-Davies et al., A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear.

Reading (advanced): Julia Dressel and Hany Farid, The accuracy, fairness, and limits of predicting recidivism. Sam Corbett-Davies and Sharad Goel, The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning (more technical). Margaret Mitchell et al., Model Cards for Model Reporting.

Week 7

Tidy data (Rmd source). Video

A look at DataViz

Morning lecture: Recording

Afternoon lecture: Recording (better quality)

Software: To knit to PDF, you need to install MiKTeX (Windows) or MacTeX (Mac). Alternatively, you can use RStudio Cloud.

Reading: Healy Ch. 1

Reading: Healy Ch. 2.1

Reading: Healy Ch. 3.2

Reading: Shalizi, Using R Markdown for class reports

Just for fun: Napoleon's march on Moscow

Just for fun: Cross-national differences in happiness: Cultural measurement bias or effect of culture?

Just for fun: LaTeX and Donald Knuth's email habits

The Challenger disaster. Richard Feynman demonstrates the effect of cold temperature on the O-rings

Music during the break: Brian Wilson by Barenaked Ladies

Week 8

Fairness recap

Probability distributions. Code, Rmd source.

Recordings: Tuesday morning lecture, Tuesday afternoon lecture

Recordings: Thursday morning lecture, Thursday afternoon lecture

Intro to Probability, pt. 2 (Rmd source)

Reading: OpenIntro Statistics (4th ed.), Ch. 3 (link to free PDF)

Just for fun: You can load a die but you can't weight a coin

Music during the break: Collect Call and Gimme Sympathy by Metric

Music during the break: Crabbuckit and Man I Used to Be by k-os. Crabbuckit (The Good Lovelies cover)

Week 9

Probability review (Rmd)

P-values (Rmd)

Tuesday lecture recording: morning lecture, afternoon lecture

Thursday lecture recording: morning lecture, afternoon lecture

Reading: Poldrack Ch. 7, Ch. 8.1-8.4, Ch. 9.1-9.3

Music during the break: Free Man in Paris and Both Sides, Now by Joni Mitchell

Week 10

P-values (Rmd) continued

The t-statistic (Rmd)

Hypothesis testing design recipe (Rmd)

Tuesday morning lecture, Tuesday afternoon lecture.

Thursday morning lecture, Thursday afternoon lecture.

Reading: Poldrack 15.1-15.3

Music during the break: Chris Hadfield, Ed Robertson, and the Wexford Gleeks, Is Somebody Singing. The Beatles and Paul McCartney's OPP badge, Sgt. Pepper's Lonely Hearts Club Band intro and With A Little Help From My Friends

Music during the break: The Right Honourable Stephen "Stingo" Harper and Yo-Yo Ma, With a Little Help from My Friends. And another cover with Yo-Yo Ma, with Rosa Passos: Chega de Saudade

Week 11

Hypothesis testing. Code (Rmd).

Hypothesis testing: summary

Confidence intervals

Tuesday lectures: Tuesday morning, Tuesday afternoon

Thursday lectures: Thursday morning, Thursday afternoon


Music during the break: Spadina Bus by The Shuffle Demons. Some Chords by deadmau5.

Week 12

Inference with Linear Regression. Code (Rmd)

Comparing group means. Code (Rmd)

A brief intro to artificial neural networks. Andrej Karpathy's GoogLeNet labelling interface.

Lecture recordings: Tuesday 11am, Tuesday 3pm. Thursday 11am, Thursday 3pm.

Guest lecture: Ganes Kesari

Reading: Poldrack Ch. 14

Reading: Poldrack Ch. 15

Music during the break: Tom Sawyer by Rush.