Course Syllabus
Course Description:
An introduction to statistical data science, using computing tools to gather, manage, and analyze large and complex data sets. Topics include data wrangling and formatting, web scraping, data analysis, statistical modeling techniques, text mining, and language processing.
Course Outline:
There are five major topics in Introduction to Data Science:
1. Data wrangling and formatting using the tidyverse set of R libraries.
2. Exploratory data analysis, including data visualization using ggplot2.
3. Data acquisition using web scraping and APIs.
4. Statistical modeling and inference.
5. Machine learning.
We’ll work through many of these topics simultaneously. This class is all about building skills and techniques to begin your data science journey.
Location
We will meet in person in Olin 207 on the second floor of Olin Hall.
Office Hours
In-person office hours: TTh 11:30 am-12 pm (PT) and 2:30-3 pm (PT) in Olin 219
Zoom office hours: MWF 10-10:30 am
Office hours Zoom link: https://whitman.zoom.us/j/2381981909
Note: Please contact me if you would like to schedule extra office hours.
Preferred method of contact: email ptukhim@whitman.edu.
If I'm free, I'll respond pretty quickly, but don't wait for me; keep working at whatever prompted you to reach out.
Textbook:
There are several useful textbooks to learn data science tools with tidyverse. I will point out the most useful chapters for each topic, but you are more than welcome to use other resources as well. The list of useful resources:
R for Data Science by Hadley Wickham
Modern Data Science with R by Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton.
Modern Dive: Statistical Inference via Data Science by Ismay and Kim
Tidyverse Skills for Data Science by Carrie Wright, Shannon E. Ellis, Stephanie C. Hicks, and Roger D. Peng
Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin
R Markdown: The Definitive Guide by Yihui Xie
Foundations of Data Science by Blum, Hopcroft, and Kannan
Hands-On Machine Learning with R by Bradley Boehmke & Brandon Greenwell
OpenIntro Statistics, 2nd Ed. by Diez, Barr, & Çetinkaya-Rundel
Lecture Notes are based on Data Science in a Box materials
Software:
We will use the (free) statistical package R and the RStudio interface via RStudio Workbench. Familiarity with R/RStudio is not required. You will be able to access RStudio Workbench on any device with internet access by clicking this link. Use your MathLab account credentials to access RStudio Workbench. If you have forgotten your password, you can reset it. If you need help with that, email Dustin Palmer at palmerdl@whitman.edu.
Course Goals
By the end of the semester, we hope
- to develop skills in the use of R for data analysis.
- to learn common tasks associated with data, e.g. filtering, merging, sorting, and pivoting.
- to manipulate numerical, categorical, and time-like data.
- to learn a variety of data visualizations, e.g. scatter plots, bar graphs, histograms, and maps.
- to learn techniques of exploratory data analysis, make hypotheses, and tell an evidence-based story about new data sets.
- to learn to read the documentation and search the web in order to apply new methods and visualizations on your own.
- to write brief reports that tell the story of the data supported by visualizations.
- to understand how data analysis can be used and misused in the areas of public policy, decision-making, and social change.
Time Commitment
Intro to Data Science is a 4-credit class. Generally speaking, you should spend about 3 hours a week in class and 8 hours per week outside of class working on assignments, presentations, and projects. It is a good idea to schedule your time outside of class and stick to that schedule.
Canvas Modules
All of the work you are expected to complete will be organized in Canvas modules. Just follow the modules and you'll be fine. You won't miss anything. What follows is a summary of the kinds of work you'll be doing in the class.
Course Assessment
Your grade will contain the following components.
1. Reading Responses/Class Prep/In-class Discussion (15%): Individual class periods are designed with the assumption that you have read the day's material in advance. To support this activity some days will have a short pre-class "Readback" form for you to fill out. Remember it's timestamped and is due the night before class. No late Readbacks will be accepted.
2. Weekly Labs (35%): Each Thursday we'll start a lab designed to explore new techniques for working with data. Most labs are designed to take about 2-3 hours to complete, so you'll most likely need to finish them outside of class. Labs are due in Canvas the following week.
3. Mini-Projects (15% each): Two Mini-Projects will be assigned during the course.
4. Final Project (including a presentation) (20%): Your Final Project will represent a complete exploration of a large-scale data project, suitable for use in a portfolio of your work. The Final Project is in lieu of a Final Exam. More details to follow.
All assignments must be readable, and when appropriate, all work must be shown to receive credit.
Late work will receive a deduction of 5 percentage points per calendar day; e.g., work that earns 85% would be reduced to 80% if submitted up to 24 hours late. No work is accepted more than 7 calendar days after the deadline (unless other arrangements have been made before the due date). My main recommendation for avoiding the late penalty is to pay close attention to deadlines and start assignments early, so you are not trying to complete them at the last minute.
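The late-work arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an official grading script: it assumes any part of a 24-hour period after the deadline counts as a full day (consistent with the 85% to 80% example), and it returns `None` for work that is too late to be accepted. The function name `late_adjusted_score` is hypothetical.

```python
import math

POINTS_PER_DAY = 5  # percentage points deducted per day late
MAX_DAYS_LATE = 7   # work is not accepted after this many days


def late_adjusted_score(score, hours_late):
    """Return the score after the late deduction, or None if the work is not accepted."""
    if hours_late <= 0:
        return score  # submitted on time, no deduction
    # Assumption: a partial day late counts as a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_DAYS_LATE:
        return None  # more than 7 days late: not accepted
    return max(0, score - POINTS_PER_DAY * days_late)


print(late_adjusted_score(85, 10))   # 80
print(late_adjusted_score(85, 30))   # 75
print(late_adjusted_score(85, 200))  # None
```

If other arrangements were made before the due date, this calculation does not apply.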
You are encouraged to work together on labs and in-class activities, but all work you submit must be your own (unless the assignment specifically states otherwise). The first act of academic dishonesty will result in a score of zero on the item in question. A subsequent offense will result in an F for the course. Students should consult the Academic Honesty Procedures if they have any questions.
Course Grade
In this class, I regard a “B” as the default grade you get for doing what is expected.
An “A” requires going above and beyond: showing intellectual curiosity, striving to understand the “big ideas,” and not stopping at the recipe.
A “C” means you pass – but barely, with serious gaps in your knowledge that you need to address.
Any grade lower than a "C" means that you do not pass the course.
Final letter grades will be determined as follows:
Letter Grade | Weighted Score |
---|---|
A+ | 97-100 |
A | 93-96 |
A- | 90-92 |
B+ | 87-89 |
B | 83-86 |
B- | 80-82 |
C+ | 77-79 |
C | 73-76 |
C- | 70-72 |
D+ | 67-69 |
D | 63-66 |
D- | 60-62 |
F | 0-59 |
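To see how the component weights and the table above combine, here is a minimal Python sketch. It assumes each component is scored from 0 to 100 and that a fractional weighted score is compared against the lower bound of each range without rounding; the table lists whole numbers only, so the actual handling of scores such as 89.5 may differ. The component names and the example scores are hypothetical.

```python
# Component weights from the Course Assessment list (they sum to 100).
WEIGHTS = {
    "reading_responses": 15,
    "weekly_labs": 35,
    "mini_project_1": 15,
    "mini_project_2": 15,
    "final_project": 20,
}

# Lower bound of each letter grade, taken from the table above.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def weighted_score(scores):
    """Combine per-component scores (0-100) into one weighted score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a weighted score to a letter grade using the lower bounds."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical scores for one student.
example = {
    "reading_responses": 90,
    "weekly_labs": 88,
    "mini_project_1": 80,
    "mini_project_2": 85,
    "final_project": 92,
}
total = weighted_score(example)
print(round(total, 2), letter_grade(total))
```

Because the labs carry 35% of the grade, a change of a few points in the lab average moves the weighted score more than the same change in any other single component.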
Important Notes:
- Any student needing accommodations should inform the instructor. Students with disabilities who may need accommodations for this class are encouraged to notify the instructor and contact the Academic Resource Center (ARC) early in the semester so that reasonable accommodations may be implemented as soon as possible. All information will remain confidential.
- Academic dishonesty and plagiarism will result in a failing grade on the assignment. Using someone else's ideas or phrasing and representing those ideas or phrasing as your own, either on purpose or through carelessness, is a serious offense known as plagiarism. "Ideas or phrasing" includes written or spoken material, from whole papers and paragraphs to sentences and, indeed, phrases, but it also includes statistics, lab results, artwork, etc. Please see the student handbook for policies regarding plagiarism.
- In accordance with the College’s Religious Accommodations Policy, I will provide reasonable accommodations for all students who, because of religious observances, may have conflicts with scheduled exams, assignments, or required attendance in class. Please review the course schedule at the beginning of the semester to determine any such potential conflicts and let me know by the end of the second week of class about your need for religious accommodations. You can contact your academic advisor or Adam Kirtley, Whitman’s Interfaith Chaplain, for support in making this request. If you believe that I have failed to abide by this policy, you can pursue the matter through Whitman College's Grievance Policy.
Tentative course schedule