Course description: This seminar course supports students as they work on a data science project using a dataset of their own choosing. The course introduces several core techniques in data science through lectures and mini-projects. Students will select a dataset that interests them and produce an analysis or a data product, together with a project report, combining domain knowledge with technical expertise.
Lateness penalty: 5% of the possible marks per day, rounded up. Assignments are only accepted up to 72 hours (3 days) after the deadline.
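As a worked illustration of the lateness arithmetic, here is a minimal sketch. It assumes that "rounded up" means any partial day late counts as a full day, and that "72 hours" is an inclusive cutoff; the policy text does not spell out either point. The function name `late_penalty` and its parameters are hypothetical and used only for illustration.

```python
import math


def late_penalty(hours_late, max_marks, percent_per_day=5, cutoff_hours=72):
    """Return the marks deducted for a late submission.

    Returns None when the submission is past the cutoff and would not be
    accepted. Assumption: partial days are rounded up to whole days.
    """
    if hours_late <= 0:
        return 0.0  # on time: no deduction
    if hours_late > cutoff_hours:
        return None  # past the 72-hour window: not accepted
    days_late = math.ceil(hours_late / 24.0)  # partial day counts as a full day
    return max_marks * percent_per_day * days_late / 100.0


if __name__ == "__main__":
    # Example: an assignment marked out of 100.
    for hours in (0, 1, 25, 72, 73):
        print(hours, late_penalty(hours, 100))
```

Under these assumptions, a submission 25 hours late counts as two days late and loses 10% of the possible marks, regardless of the mark actually earned.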
We will be using the Python NumPy/SciPy stack in this course. Python 2 and Python 3 are both acceptable.
The most convenient Python distribution to use is Anaconda. If you use an IDE and install Anaconda, be sure to configure the IDE to use the Anaconda Python interpreter.
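One way to confirm which interpreter your IDE is actually running is to execute a short script from inside the IDE and inspect the output. The sketch below is a suggestion, not an official course tool: the function names `describe_interpreter` and `stack_versions` are hypothetical, and the check for a `conda-meta` directory is a heuristic for detecting a conda-managed environment, not a guarantee.

```python
from __future__ import print_function  # keeps the script valid on Python 2 and 3

import os
import sys


def describe_interpreter():
    """Report the path and version of the running Python interpreter."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        # Heuristic (assumption): conda environments contain a conda-meta folder.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


def stack_versions():
    """Return the installed NumPy and SciPy versions, or None if missing."""
    versions = {}
    for name in ("numpy", "scipy"):
        try:
            versions[name] = __import__(name).__version__
        except ImportError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print(key, "=", value)
    for key, value in sorted(stack_versions().items()):
        print(key, "=", value if value is not None else "not installed")
```

If the printed executable path does not point into your Anaconda installation, the IDE is probably using a different Python, and packages installed through Anaconda may not be visible to it.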
I recommend the Pyzo IDE, available here. Some people favor Jupyter Notebooks, though I recommend developing in an IDE.
If your project requires a substantial amount of compute power, I recommend signing up for AWS Educate to obtain $100 in free credits for AWS. Instructions for running RStudio Server on AWS Educate are here. GCP and Microsoft Azure also offer free credits for students.
Design credit: CS229, Jan 2019.