SML201: Introduction to Data Science

Spring 2020

Course team


Course description

Introduction to Data Science provides a practical introduction to the burgeoning field of data science. The course introduces students to the essential tools for conducting data-driven research, including the fundamentals of programming techniques and the essentials of statistics. Students will work with real-world datasets from various domains; write computer code to manipulate, explore, and analyze data; use basic techniques from statistics and machine learning to analyze data; learn to draw conclusions using sound statistical reasoning; and produce scientific reports. No prior knowledge of programming or statistics is required.

Course assignments

Projects (topics are tentative)
Project 1: Auditing the COMPAS score (11%). Due: March 30, 9 p.m.
Project 2: Cancer and Gene Expression Data (11%). Due: April 13, 9 p.m.
Project 3: Risk Prediction for ICU Patients (12%). Due: May 11, 9 p.m.
Problem sets
Problem set 1: vectors and data frames (3%). Due: Feb. 24, 9 p.m. (moved from Feb. 21)
Problem set 2 (3%). Due: March 2, 9 p.m.

Every student has a total of 6 grace days to use throughout the term, on any deadline except Project 3. Each grace day waives the lateness penalty of 10% per 24 hours; lateness is rounded up to the nearest whole number of days.
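The grace-day rule involves a little arithmetic, so here is a minimal Python sketch of how lateness might be tallied. It assumes, beyond what the policy states, that grace days are spent automatically before any penalty applies and that the 10% penalty means 10 points off a score on a 0-100 scale. `days_late` and `apply_late_policy` are hypothetical helpers for illustration, not course tools.

```python
import math

PENALTY_PER_DAY = 10  # assumed: points deducted per penalised day on a 0-100 scale


def days_late(hours_late: float) -> int:
    """Whole days late, rounded up: 1 hour late counts as 1 day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(score: float, hours_late: float, grace_days_left: int,
                      grace_allowed: bool = True) -> tuple[float, int]:
    """Return (adjusted score, grace days remaining).

    Hypothetical helper. Assumptions not stated in the syllabus:
    - grace days are spent automatically before any penalty applies;
    - `score` is on a 0-100 scale and each penalised day deducts 10 points,
      with the result floored at 0.
    Pass grace_allowed=False for Project 3, where grace days cannot be used.
    """
    late = days_late(hours_late)
    used = min(late, grace_days_left) if grace_allowed else 0
    penalised_days = late - used
    adjusted = max(0.0, float(score) - PENALTY_PER_DAY * penalised_days)
    return adjusted, grace_days_left - used


if __name__ == "__main__":
    # 30 hours late rounds up to 2 days; 1 grace day covers one of them.
    print(apply_late_policy(90, 30, grace_days_left=1))  # (80.0, 0)
    # Project 3: grace days do not apply, so 10 hours late costs one penalised day.
    print(apply_late_policy(90, 10, grace_days_left=6, grace_allowed=False))  # (80.0, 6)
```

Under these assumptions, a submission 30 hours late counts as 2 days; with 1 grace day left, one day is waived and the other costs 10 points.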

In-class tests
Term test 1: Thursday March 12
Term test 2: Tuesday April 28

Precept Assignments
Week of Feb 3: Precept 1: Intro, functions (Rmd source). Solutions (Rmd source)
Week of Feb 10: Precept 2: Vectors and Data Frames (Rmd source). Solutions.

Logistics

Class meetings
The morning section meets in McCormick Hall 101 on Tuesdays and Thursdays, 11:00am-12:20pm. The afternoon section meets in Robertson Hall 001 on Tuesdays and Thursdays, 3:00pm-4:20pm.
Instructor office hours
Regular office hours are TBA. Email the instructor to schedule an appointment.
Preceptor office hours
Preceptors will be available during scheduled office hours or by appointment.

Course information

Evaluation
35%: Projects
6%: Problem Sets
32%: Tests
5%: iClicker quizzes
22%: In-precept assignments
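As a worked example of how these weights combine, the following Python sketch computes a final percentage as a weighted average. The component names and the `course_grade` helper are hypothetical, and it assumes each component has already been reduced to a single 0-100 score; the syllabus does not say how scores within a component are averaged.

```python
# Course weights from the Evaluation list, in percent.
WEIGHTS = {
    "projects": 35,
    "problem_sets": 6,
    "tests": 32,
    "iclicker_quizzes": 5,
    "in_precept_assignments": 22,
}

assert sum(WEIGHTS.values()) == 100  # the weights cover the whole grade


def course_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each on a 0-100 scale.

    Hypothetical helper: assumes every component has already been
    averaged into one 0-100 score.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For instance, with 70 on tests and 80 on everything else, the result is (35*80 + 6*80 + 32*70 + 5*80 + 22*80) / 100 = 76.8.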

Resources

Software

Please install R and RStudio as soon as possible.

Reading

Textbooks
Statistical Thinking for the 21st Century by Russell A. Poldrack (free e-book available from the book website)
Data Visualization: A Practical Introduction by Kieran Healy (free e-book draft available from the book website)
R for Data Science by Garrett Grolemund and Hadley Wickham (free e-book available from the book website)
SML201 students will have access to online DataCamp courses for free, courtesy of DataCamp. See the course Piazza for details on how to sign up.

An inclusive environment

We strive to build and maintain an inclusive environment in class: an environment that allows every student to reach their full potential. Please do not hesitate to contact me and/or your preceptor if you need special accommodations or have any concerns.

Design credit: CS229, Jan 2019.