Soc 383 Syllabus

Statistics for an Open Science: Categories? Classification? Prediction?

Spring 2025

Course leaders:

Jeremy Freese
jfreese at stanford
David Broska
dbroska at stanford
Olivia (Olive) Jin
oliviajin at stanford

Class: Mondays & Wednesdays 9:45-11:20am (Wallenberg 319)

Lab: Time Unknown (Location Unknown)

Office hours: Freese, Friday, 1-2:30 (but contact in advance) or by appt (via Zoom); Broska: TBD; Jin: TBD

This syllabus is provisional and anything therein is subject to change.

Overview

This course continues to develop your skills to draw accurate, substantively meaningful inferences from quantitative social data, as well as your ability to evaluate quantitative evidence presented by others.

We will proceed by extended consideration of how social scientists analyze data involving categories. These models and techniques are frequently used in quantitative social science and so understanding them is valuable in its own right. More importantly, close and practical consideration of modeling strategies for these outcomes advances one’s understanding of fundamental principles of statistical inference, data analysis, and modeling in ways that go beyond any specific set of techniques.

There is no pretense that this course, or the overall sequence, suffices to impart the requisite skills for a social science research career that is based primarily on analysis of quantitative data. If any of you think that’s how this biz works: sorry, you are mistaken. Instead, that will likely entail additional training while in graduate school, more first-hand experience with the craft of data analysis, and a commitment to staying fresh with training over your career. Methodological competence, much less expertise, will remain an ongoing project for as long as you are engaged in research.

Goals

We will work to advance your training on the following fronts over this course:

Your understanding of the logic, flexibility, and elegant beauty of maximum likelihood estimation.
Your familiarity with a variety of models that are used frequently in contemporary social science.
Your capacity to teach yourself new models or models not covered in any courses you have taken.
Your abilities to interpret and present results from data analysis.
Your appreciation of the work–and, not incidentally, fun–of applied data analysis, and your facility with its basic craft.
Your comfort with mastering details of a particular dataset.
Your ability to use statistical software–specifically, R–to analyze quantitative data to answer social research questions.

Prerequisites

This course follows Sociology 381 and 382. Accordingly, I presume familiarity with basic linear algebra, the material covered in a standard introduction to social science statistics for undergraduates, and the fundamentals of linear regression.

Materials

Readings

With the 2021 edition of the course I embarked upon a project, which was to put together a full set of online lecture notes for this class (hereafter The Notes). I made strong progress on The Notes then, made even more in preparation for the next editions, and will make still more by the end of this class. The Notes for this year’s class will be complicated by the switch in our sequence from Stata to R.

Accordingly, the primary materials for the quarter are The Notes. The Notes are intended to encompass the basic didactic material for the course.

The vision is that:

- There is not some separate book you are supposed to be following along with apart from The Notes.
- There is not some separate set of slide decks that represent the lecture materials apart from The Notes.
- There is not some body of data analysis knowledge that I am looking for you to gain from the course that is wholly separate from The Notes, which is not to say that there isn’t skill and craft to all this that involves doing data analysis for oneself.
- Years from now, these notes will still be there (or, if they move URLs, Google will help you find them).

The course has had a reading list that includes some pointers to applications of different methods in practice. I haven’t refreshed these recently, as that has been lower on my to-do list than pushing forward the above ambitious vision for The Notes.

The homepage for the notes is here

Software

Computing in the course will be conducted using R.

Tasks

Problem sets

There is not a final paper for this course. The course references an established body of knowledge about data analysis that I take it as this course’s mission to begin to impart. Having you concurrently embark upon and complete a paper that also happens to involve data analysis to me seems like obvious mission creep.

Instead, the work of the course takes the form of problem sets. (I used to call these “exercises,” but ever since coming to Stanford, a puzzling enough number of students would call them “problem sets” that I decided to lean into it rather than fight it.)

The problem sets have two components.

Concept comprehension. These are questions about materials covered in class.
Practice. These ask you to do tasks that follow from course materials on data, often on data of your own choosing.

Each problem set is topically related and associated with some subset from The Notes.

However, given the uncertainty about the pace of the course, we do not know at the outset exactly what will be due when. Instead, on the Wednesday of the week before each due date, we will settle what is to be completed by that date.

In the interests of promoting best practices for reproducible research, everything you turn in that involves doing data analysis must be generated from code (i.e., an .R file) that you will submit along with the problem set. In this class, I would like this to be done using Quarto within R.

The code file must be sufficient to reproduce the submitted numbers from a data file that you could, if asked, provide. The code must be appropriately annotated (using R comments, which begin with #) to indicate what part of the output corresponds to what. We will provide details.
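As a rough illustration of the kind of annotation we have in mind, here is a minimal sketch. It is not a required template: the problem numbering is hypothetical, and the built-in mtcars data stand in for whatever data file you actually use.

```r
# Problem 1 (hypothetical numbering): binary outcome example.
# Data: mtcars ships with R, so this script runs without any external file.

# 1(a): logistic regression of transmission type (am: 0 = automatic,
# 1 = manual) on car weight (wt, in 1000 lbs).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Output for 1(a): coefficient table (estimates, SEs, z values, p-values).
print(summary(fit)$coefficients)

# 1(b): predicted probability of a manual transmission at wt = 3.
p_hat <- predict(fit, newdata = data.frame(wt = 3), type = "response")

# Output for 1(b): the predicted probability, rounded to three digits.
print(round(p_hat, 3))
```

Each comment says which question a chunk of code answers and which printed output belongs to it, so a reader can match the numbers in your write-up to the code that produced them.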

You are encouraged to discuss your work with your fellow students and to learn from them, but you should complete work on your own.

For those components of assignments that involve estimation and interpretation of data, you should not use the same (or virtually the same) variables as another student.

Due date for problem sets

Starting at an agreed-upon date and time, the due date for problem sets will be the same each week, extending into the week after the last class. The explicit deadline for problem sets is Friday at 11:59pm, but students should understand that course staff will not be available to answer questions after 5pm.

Late work: If you are going to be late with your problem sets, let David or Olive know in advance of the deadline. We are unlikely to penalize problem sets that come in within 3 days of the deadline. Otherwise late work may be subject to penalty, and will be penalized in cases in which the student is not in communication with us about the delay. Turnaround for grading of late work may also be much slower, as it will have less immediate priority than current course matters.

Problem set guidelines

Upload a .pdf document created with Quarto to Canvas. It should show the code that you used to arrive at your answer.
If you do not use data from this repository on the course website, please upload the data, too.
To load a Stata data set hosted online, you can adapt the following code:

df <- haven::read_dta("http://www.boydetective.net/cda/dta/californiahomes.dta")

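Note that this requires the package haven to be installed on your computer, e.g., by typing install.packages("haven") into the console (only once). Because the code above calls the function as haven::read_dta, you do not need to invoke library(haven) first; loading the package with library(haven) is needed only if you want to write read_dta without the haven:: prefix.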
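If you would like your code to stop with a clear message when haven is not installed, one possible pattern is sketched below. The helper name load_course_data is hypothetical (it is not something the course provides), and the call that actually downloads the data is left commented out because it needs an internet connection.

```r
# Hypothetical helper: read a Stata .dta file from a path or URL,
# checking first that the haven package is available.
load_course_data <- function(url) {
  # Basic input check: expect a single character string.
  stopifnot(is.character(url), length(url) == 1)

  # requireNamespace() returns FALSE (rather than erroring) if haven is missing.
  if (!requireNamespace("haven", quietly = TRUE)) {
    stop("Package 'haven' is not installed; run install.packages(\"haven\") once.")
  }

  # The haven:: prefix means library(haven) is not needed.
  haven::read_dta(url)
}

# Usage (requires an internet connection):
# df <- load_course_data("http://www.boydetective.net/cda/dta/californiahomes.dta")
# dim(df)    # number of rows and columns
# names(df)  # variable names
```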

Grading

Your final grade will be primarily based on the problem sets. Class participation may serve as a tiebreaker for those near the line between two possible grades. Notably weak participation may result in an outright deduction from one’s final grade. Rudeness or other detractions from a class environment congenial for learning and teaching may also be penalized (although, this being graduate school, my hope is that we have no problems with that).

Academic Accommodations

The class follows legal and Stanford policies regarding academic accommodations and hence endorses the recommended syllabus text included here.

Honor Code and Fundamental Standard

The class follows Stanford policy regarding the Honor Code and Fundamental Standard and hence endorses the suggested syllabus text included here.

Generative AI

Unless otherwise specified, the class follows the Stanford official guidance regarding academic integrity and generative AI that is presented here. This means that you are prohibited from using AI to draft text for you in the same way you would be prohibited from having a friend draft text for you. You can use AI to help enhance your text if you want, as long as you are judicious and mindful about it and are not letting the AI write for you. On the other hand, you are allowed and encouraged to use AI to help create and fix problems with code, with the important caveat that you are ultimately responsible for the code properly doing what the problem sets etc. have asked.