Statistical & Data Sciences

Statistical and Data Sciences at Smith College
The Statistical & Data Sciences (SDS) Program links faculty and students from across the college interested in learning things from data. At Smith, students learn statistics by doing—class time emphasizes problem-solving and hands-on contact with data. Many courses employ student-driven projects that allow students to pursue their interest in fields such as economics, psychology, political science, sociology, engineering, biology, environmental science, neuroscience and geology.

Announcements

Upcoming Talks & Lectures

The Statistical & Data Sciences Program hosts regular talks & lectures that are free and open to the public. No prior exposure to statistics is presumed. Stay tuned to our events page for exciting presentations coming up!

Intermittent Events

Please see the Western Mass Statistics and Data Science Meetup for additional events.

In-Person Attendance

In keeping with Smith’s core identity and mission as an in-person, residential college, SDS affirms College policy (as per the Provost and Dean of the College) that students will attend class in person. SDS courses will not provide options for remote attendance. Students who have been determined to require a remote attendance accommodation by the Office of Disability Services will be the only exceptions to this policy. As with any other ADA accommodation, please notify your instructor during the first week of classes to discuss how we can meet your accommodation needs.

Requirements

  • Identify and work with a wide variety of data types (including, but not limited to, categorical, numerical, text, spatial and temporal) and formats (e.g. CSV, XML, JSON, relational databases, audio, video, etc.). 
  • Extract meaningful information from data sets that have a variety of sizes and formats.
  • Fit and interpret statistical models, including but not limited to linear regression models. Use models to make predictions, and evaluate the efficacy of those models and the accuracy of those predictions.
  • Understand the strengths and limits of different research methods for the collection, analysis and interpretation of data. Be able to design studies for various purposes.
  • Attend to and explain the role of uncertainty in inferential statistical procedures.
  • Read and understand data analyses used in research reports. Contribute to the data analysis portion of a research project in at least one applied discipline.
  • Compute with data in at least one high-level programming language, as evidenced by the ability to analyze a complex data set.
  • Work in multiple languages and computational environments.
  • Convey quantitative information in written, oral and graphical forms of communication to both technical and nontechnical audiences.
  • Assess the ethical implications to society of data-based research, analyses, and technology in an informed manner. Use resources, such as professional guidelines, institutional review boards, and published research, to inform ethical responsibilities.

The program is designed to produce highly skilled, versatile statisticians and data scientists who possess powerful abilities for analyzing data. As such, SDS students learn not only how to build statistical models that generate predictions, but how to validate these models and interpret their parameters. Students learn to use their ingenuity to “wrangle” with complex data streams and construct informative data visualizations.

The major in statistical & data sciences consists of 10 courses, including depth in both statistics and computer science, an integrating course in data science, a course that emphasizes communication and an application domain of expertise. With the exception of SDS 192, 201/220, 291, and 410 (or any mandatory S/U course), students may switch any one of their remaining major courses from graded to S/U.

Advisers
Benjamin Baumer, Shiya Cao, Kaitlyn Cook, Randi Garcia, Albert Y. Kim, Katherine Kinnaird, Scott LaCombe, Lindsay Poirier. If you wish to declare an SDS major and need an adviser, please fill out the form at https://bit.ly/sds_advisor.

Study Abroad Adviser
Scott LaCombe

Requirements

See the major diagram below for prerequisites, and see the Notes on course substitutions following the description of the major.

SDS Major Diagram

 

  1. Foundations and Core (5 courses): The following required courses build foundational skills in mathematics, statistics and computer science that are necessary for learning from modern data.
    • SDS 201 or SDS 220: Introductory Statistics
    • SDS 291: Multiple Regression
    • CSC 110: Introduction to Computer Science or CSC 120: Object-Oriented Programming
    • SDS 192: Intro to Data Science
    • MTH 211: Linear Algebra
  2. Statistical Depth (1 course): One additional course that provides exposure to additional statistical models.
    • SDS 290: Research Design and Analysis
    • SDS 293: Modeling for Machine Learning
    • MTH/SDS 320: Mathematical Statistics
    • SDS 390: Topics in SDS. Offerings may vary; previous versions of this course include:
      • Bayesian Statistics
      • Ecological Forecasting
      • Structural Equation Modeling
      • Statistical Analysis of Social Networks
  3. Programming Depth (1 course): One additional course that deepens exposure to programming.
    • CSC 151: Programming Languages
    • CSC 210: Data Structures
    • CSC 220: Advanced Programming Techniques
    • CSC/SDS 235: Visual Analytics (must take programming intensive track)
    • CSC 240: Computer Graphics
    • SDS 270: Programming for Data Science in R
    • SDS 271: Programming for Data Science in Python
    • CSC 294: Computational Machine Learning
    • CSC/SDS 352: Parallel & Distributed Computing
  4. Communication (1 course): One course that focuses on the ability to communicate in written, graphical and/or oral forms in the context of data.
    • CSC/SDS 109: Communicating with Data
    • FYS 105: Ethics of Big Data
    • FYS 189: Data and Social Justice
    • CSC/SDS 235: Visual Analytics
    • SDS 236: Data Journalism
    • SDS 237: Data Ethnography
  5. Application Domain (1 course): Every student is required to take a course that allows them to conduct a substantial data analysis project evaluated by an expert in a specific domain of application.

    Please consult our continuously updated, nonexhaustive list of previously approved application domain courses, which includes:
    • SDS 300: Applications of Statistical & Data Sciences
    • Dual-prefixed research seminars offered by SDS:
      • GOV/SDS 338: Research Seminar in Political Networks
      • CSC/SDS 354: Seminar: Music Information Retrieval
      • PSY/SDS 364: Research Seminar on Intergroup Relationships
    • Research seminars (normally 300-level) or special studies of at least two credits. Normally, the domain would be outside of mathematics, statistics and computer science.
    • Departmental honors theses in another major (normally not MTH or CSC)

A student and their adviser should identify potential application domains of interest as early as possible, since many suitable courses will have prerequisites. Normally, this should happen during the fourth semester or at the time of major declaration, whichever comes first. The determination of whether a course satisfies the requirement will be made by the student’s major adviser.

  6. Capstone (1 course): Every student is required to complete a capstone experience, which exposes them to real-world data analysis challenges.
    • SDS 410: Capstone
  7. Electives (as needed to reach 10 courses): Provided that the requirements listed above are met, any of the courses listed above may be counted as electives to reach the 10-course requirement. Five College courses in statistics and computer science may be taken as electives. Additionally, the following courses may be counted toward completion of the major:
    • MTH 246: Probability
    • CSC 230: Introduction to Database Systems
    • CSC 252: Algorithms
    • CSC 256: Intelligent User Interfaces
    • CSC 290: Artificial Intelligence
    • CSC 330: Database Systems
    • CSC 390: Seminar on Artificial Intelligence

Notes on course substitutions:

  • CSC 110 or 111 may be replaced by a 4 or 5 on the AP computer science exam.
  • SDS 201 may be replaced by a 4 or 5 on the AP statistics exam.
  • Replacement by AP courses does not diminish the total number of courses required for either the major or the minor (see Electives above). Any one of ECO 220, GOV 203, PSY 201, or SOC 204 may directly substitute for SDS 220 or SDS 201 without the need to take another course, in both the major and minor. Note that SDS 220 and ECO 220 require Calculus.
  • MTH 211 may be replaced by petition in exceptional circumstances.
  • Five-College equivalents may substitute with permission of the program.
  • SDS 107 and EDC 206 are important courses but do not count for the major or the minor.
  • An Honors Thesis (SDS 430D) generally cannot substitute for the capstone SDS 410.

The Major in Mathematical Statistics

Students interested in doctoral programs in Statistics should consider the Major in Mathematical Statistics jointly operated by SDS and MTH.

The Minor in Statistical & Data Sciences

The minor in statistical & data sciences consists of six courses, according to the following requirements:

  • Four courses: all core courses required for the major, but not MTH 211
  • Any course satisfying the programming depth requirement for the major
  • Any course satisfying the communication requirement for the major

Should these three requirements be fulfilled with fewer than six courses, any of the courses in SDS or CSC that count toward the major may be counted towards the minor. Ordinarily, no more than one course graded S/U will be counted toward the minor.

The Minor in Applied Statistics

The interdepartmental minor in applied statistics offers students a chance to study statistics in the context of a field of application of interest to the student. The minor is designed with enough flexibility to allow a student to choose among many possible fields of application.

The minor consists of five courses. Among the courses used to satisfy the student’s major requirement, a maximum of two courses can count toward the minor. Ordinarily, no more than one course graded S/U will be counted toward the minor.

Students who have taken AP statistics in high school and received a 4 or 5 on the AP Statistics Examination will not be required to repeat the introductory statistics course, but they will be expected to complete five courses to satisfy the requirements for the minor in applied statistics.

The student must take one of the following courses and no more than one of these courses will count toward the minor. (Students presenting a 4 or 5 on the AP Statistics Examination will receive exemption from this requirement.)

  • PSY 201: Statistical Methods for Undergraduates
  • SDS 201: Statistical Methods for Undergraduates
  • SDS 220: Introduction to Probability and Statistics
  • ECO 220: Introduction to Statistics and Econometrics
  • GOV 203: Empirical Methods in Political Science
  • SOC 201: Evaluating Information
  • SOC 202: Quantitative Research Methods

The student must also take both of the following courses:

  • SDS 290 Research Design and Analysis
  • SDS 291 Multiple Regression

The student must choose two (or more) application courses. Courses not on the following list must be approved by the student’s SDS adviser if they are to count toward the minor.

  • BIO 232: Evolution
  • BIO 230: Genomes and Genetic Analysis
  • BIO 231: Genomes and Genetic Analysis Lab
  • BIO 266: Principles of Ecology
  • BIO 267: Principles of Ecology Laboratory
  • BIO 334: Bioinformatics & Comparative Molecular Bio
  • ECO 240: Econometrics
  • ECO 311: Seminar: Topics in Economic Development
  • ECO 351: Seminar: The Economics of Education
  • ECO 362: Seminar: Population Economics
  • ECO 363: Seminar: Inequality
  • ECO 396: Seminar: International Financial Markets
  • EGR 389: Techniques for Modeling Engineering Processes
  • GOV 312: Seminar in American Government
  • MTH 246: Probability
  • PSY 319: Research Seminar in Biological Rhythms
  • PSY 325: Research Seminar in Health Psychology
  • PSY 335: Research Seminar in the Study of Youth and Emerging Adults
  • PSY 358: Research Seminar in Clinical Psychology
  • PSY 369: Research Seminar on Categorization and Intergroup Behavior
  • PSY 373: Research Seminar in Personality Psychology
  • PSY 375: Research Seminar on Political Psychology
  • SOC 202: Methods of Social Research

Students planning to minor in applied statistics should consult with their advisers when selecting application courses. Some honors theses and special studies courses may apply if these courses focus on statistical applications in a field.

Also see the concentration in statistics within the mathematics major offered by the Department of Mathematics and Statistics.

It is possible for a Smith student to obtain a master of science in statistics from the University of Massachusetts Amherst in five years (four years at Smith plus one at UMass), through the Fifth Year MS in Statistics Program. Interested students should consult with the director of the program.

Students interested in pursuing graduate work in statistics or data science should consult with their major adviser to plan an appropriate course of study. In either case, a solid foundation in mathematics (calculus I, II, and III, as well as linear algebra) is essential.

Graduate Programs in Statistics

The ASA maintains several lists of graduate programs in statistics that may help you find options that suit your needs.

Graduate Programs in Data Science

Because data science is a newer discipline, its graduate programs are still in their infancy. The ASA maintains a list of graduate programs in “Big Data”, although this should not be conflated with data science. A more comprehensive list of data science degree programs is maintained by datascience.community.


Courses

Please see the SDS section of the online course catalog for the most recent information.

Choosing a First Statistics Course

A student who wishes to study statistics may place themself according to the following guidelines.

A student with prior work in calculus or discrete math at college should start with Introduction to Probability & Statistics (SDS 220, 5 credits). This is the recommended statistics course for biological sciences majors, and satisfies the basis requirement for engineering, environmental science, neuroscience and psychology. This is also the recommended course for a student who took AP statistics but didn't take the exam, or received a score of 3 or below. ECO 220 is also a course at this general level.

A student with four years of high school math (but little or no calculus) should select SDS 201 or PSY 201 (Statistical Methods for Undergraduates). SDS 201 also satisfies the basis requirement for psychology. Other introductory courses at this level include GOV 203 and SOC 201.

A student with less preparation should select SDS 107 (Statistical Thinking) or SDS 109 (Communicating with Data).

A student who received a score of 4 or 5 on the AP Statistics Exam should take SDS 290 (Research Design and Analysis) or SDS 291 (Multiple Regression).

Taking Statistics Away (EGR Majors)

Please see the guidelines for Picker Engineering majors.

Courses Offered Through the Program

  • SDS 107: Statistical Thinking
  • CSC/SDS 109: Communicating with Data
  • FYS 189: Data and Social Justice
  • SDS 192: Introduction to Data Science
  • SDS 201: Statistical Methods for Undergraduates
  • SDS 220: Introduction to Probability and Statistics
  • CSC/SDS 235: Visual Analytics
  • SDS 236: Data Journalism
  • SDS 237: Data Ethnography
  • SDS 270: Programming for Data Science in R
  • SDS 271: Programming for Data Science in Python
  • SDS 290: Research Design and Analysis
  • SDS 291: Multiple Regression
  • SDS 293: Modeling for Machine Learning
  • SDS 300: Applications of Statistical & Data Sciences
  • MTH/SDS 320: Mathematical Statistics
  • GOV/SDS 338: Research Seminar in Political Networks
  • CSC/SDS 352: Parallel and Distributed Computing
  • CSC/SDS 354: Seminar: Music Information Retrieval
  • PSY/SDS 364: Research Seminar on Intergroup Relationships
  • SDS 390: Topics in Statistical & Data Sciences
  • SDS 400: Special Studies
  • SDS 430D: Honors Thesis
  • SDS 410: Capstone

Cross-Listed Courses

  • CSC 111: Introduction to Computer Science Through Programming
  • CSC/MTH 205: Modeling in the Sciences
  • MTH 211: Linear Algebra
  • MTH 246: Probability
  • AST 200: Astronomical Data Science
  • CSC 212: Programming With Data Structures
  • CSC 230: Introduction to Database Systems
  • CSC 252: Algorithms
  • CSC 256: Intelligent User Interfaces
  • CSC 290: Artificial Intelligence
  • CSC 294: Computational Machine Learning
  • CSC 330: Database Systems
  • CSC 390: Seminar on Artificial Intelligence

This page is intended to help EGR majors and their advisers identify appropriate courses at other universities that will satisfy the statistics requirement for the EGR major. It supplements the memo sent to EGR faculty on February 20, 2017. As noted in that memo, “equivalence of courses taken elsewhere [are] determined by...[a] qualified member of the SDS program.” Herein, we delineate the criteria used to determine equivalence in order to promote transparency and ensure a uniform experience for all.

The following criteria are used to verify that a course taken to satisfy the statistics requirement for the EGR major (hereafter “COURSE”) is satisfactory:

  • Rigor: COURSE must be at or above the level of rigor of SDS 220. This is the primary criterion.
  • Statistical reasoning: COURSE must include statistical topics like hypothesis testing, confidence intervals, and regression—not just probability topics like random variables, distributions and expected value.

Exception: Students who have earned a 4 or 5 on the AP statistics exam can waive these requirements. They can fulfill their statistics requirement by taking any non-introductory course in probability or statistics (e.g., MTH 246, SDS 290, SDS 291, SDS 293, etc.).

SDS faculty will use the following set of questions to guide their thinking on whether a course meets the above criteria. Normally, a replacement course would satisfy all or nearly all of these questions.

  • Does COURSE cover most or all of the topics listed in the description for SDS 220?
    • An application-oriented introduction to modern statistical inference: study design, descriptive statistics; random variables; probability and sampling distributions; point and interval estimates; hypothesis tests, resampling procedures and multiple regression.
  • Does COURSE include linear regression as a topic in the syllabus?
  • Does COURSE use a comprehensive textbook?
  • Does COURSE include any prerequisites (e.g., calculus) that indicate mathematical maturity?
  • Is COURSE for statistical practice (like SDS 220) and not just for statistical concepts (like SDS 107)?
  • Does COURSE explicitly mention the use of a statistical computing environment like R, SPSS, Stata, JMP or SAS (that is, something beyond Excel or TI calculators)?
  • Does COURSE omit the word “business” from its course title and textbook? Smith College does not give credit for business classes.

EGR majors should consult this page first, and then present a syllabus (preferably electronic) to the SDS study abroad adviser. Although SDS 220 is a 5-credit course, the number of credits is not a determining factor.

List of Previously Approved Courses

For reference only, we provide a list of previously approved courses. Courses change over time and vary by instructor, so prior approval of a course does not guarantee that it will be approved in the future.

Previously Approved

Previously Not Approved

  • STAT 240, UMass
    • This course is considered equivalent to SDS 201, which does NOT fulfill the statistics requirement for EGR. Note also that STAT 240 is only 3 credits, whereas SDS 220 is 5 credits. 
  • ECE 214: Introduction to Probability and Random Processes, UMass
    • This course would be considered equivalent to MTH 246. Note the Calc III requirement.

Fall 2023

  • SDS 100: Reproducible Scientific Computing with Data (Casey Berger; Will Hopper; Nic Schwab)
  • FYS 189: Data and Social Justice (Lindsay Poirier)
  • SDS 192: Introduction to Data Science (Shiya Cao; Jared Joseph; Jared Joseph)
  • SDS 201: Introductory Statistics (Will Hopper)
  • SDS 220: Introduction to Probability and Statistics (Rebecca Kurtz-Garcia; Rebecca Kurtz-Garcia; Scott LaCombe)
  • SDS 237: Data Ethnography (Lindsay Poirier)
  • MTH 246: Probability (Kaitlyn Cook)
  • SDS 270: Advanced Programming for Data Science (Jared Joseph)
  • SDS 271: Programming for Data Science in Python (Casey Berger)
  • SDS 291: Multiple Regression (Kaitlyn Cook; Will Hopper)
  • SDS 300: Disability Inclusion and Data Analytics (Shiya Cao)
  • PSY/SDS 364: Research Seminar: Intergroup Relationships (Randi Garcia)
  • SDS 390: Ecological Forecasting (Albert Y. Kim)
  • SDS 400: Special Studies
  • SDS 404: Honors Thesis
  • SDS 410: Capstone (Albert Y. Kim)

Spring 2024

  • SDS 100: Reproducible Scientific Computing with Data (Justin Baumann; Katherine Kinnaird; Nic Schwab)
  • SDS 192: Introduction to Data Science (Shiya Cao; Jared Joseph)
  • SDS 201: Statistical Methods for Undergraduates (Will Hopper)
  • SDS 220: Introduction to Probability and Statistics (Albert Y. Kim; Albert Y. Kim; Rebecca Kurtz-Garcia)
  • SDS 270: Programming for Data Science in R (Jared Joseph)
  • SDS 290: Research Design and Analysis (Randi Garcia)
  • SDS 291: Multiple Regression (Will Hopper; Kaitlyn Cook)
  • SDS 293: Modeling for Machine Learning (Katherine Kinnaird)
  • SDS 300: Applications of SDS in Marine Ecology (Justin Baumann)
  • MTH/SDS 320: Mathematical Statistics (Rebecca Kurtz-Garcia)
  • SDS 390: Topics in Biostatistics (Kaitlyn Cook)
  • SDS 400: Special Studies
  • SDS 404: Honors Thesis
  • SDS 410: Capstone (Shiya Cao)

 

 


“Employment of statisticians is projected to grow 27 percent from 2012 to 2022, much faster than the average for all occupations. Growth is expected to result from more widespread use of statistical analysis to make informed business, healthcare, and policy decisions.”
Bureau of Labor Statistics


Emeriti

Katherine Halvorsen
Professor Emerita of Mathematics and Statistics

Research Associate

Nicholas Horton
Research Associate in Statistical & Data Sciences

Program Committee

Shannon Audley
Associate Professor of Education & Child Study
 

R. Jordan Crouser
Associate Professor of Computer Science
 

Glenn Ellis
Professor of Engineering
 

Howard J. Gold
Professor of Government
 

Suzanne Z. Gottschang
Professor of Anthropology and of East Asian Studies
 

Mary Harrington
Tippit Professor in the Life Sciences (Psychology)
 

Caroline Melly
Associate Professor of Anthropology
 

Philip K. Peake
Professor of Psychology
 

Cornelia Pearsall
Professor of English Language & Literature
 

Marney Pratt
Laboratory Instructor in Biological Sciences
 

Susan Sayre
Associate Professor of Economics
 

Vis Taraz
Assistant Professor of Economics
 

Terry-Ann Craigie
Associate Professor of Economics
 

Think statistics is just about calculating things? Think again. 

The fields of statistics and data science are growing exceptionally fast. As technology continues to reshape our world, more and more data are being collected on any number of subjects. There is a growing belief among decision makers that these data can be useful. Yet the process of transforming data into actionable information is challenging.

To analyze modern streams of data, government agencies, nonprofits (NGOs) and private industries seek data analysts with technical skills (programming ability), the ability to reason quantitatively about data and uncertainty, and strong communication skills (in written, oral and visual forms). People with these skills are in high demand.

Statisticians use their deep understanding of mathematics and probability theory to reason about variation and uncertainty in data. For example, if a drug was observed to have a positive effect on patient outcomes in a clinical trial, was that effect large enough—given the sample size and assumptions about how the data was collected—to justify concluding that the drug actually worked? Statisticians build, validate and interpret models. They design experiments and collaborate with scientists of all stripes to make precise estimates of unknown quantities.
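
To make the clinical-trial question above concrete, here is a minimal sketch of one way a statistician might approach it: a permutation test, which asks how often a difference at least as large as the observed one would arise if the treatment labels had been assigned at random. The outcome data, the function name permutation_test and its parameters are hypothetical illustrations, not material from any SDS course, and a permutation test is only one of several reasonable approaches.

```python
import random


def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """One-sided permutation test for a difference in mean outcomes.

    Returns the observed difference (treated mean minus control mean)
    and an estimated p-value: the share of random relabelings whose
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_treated = len(treated)
    n_control = len(control)
    observed = sum(treated) / n_treated - sum(control) / n_control

    pooled = list(treated) + list(control)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the treatment labels
        diff = (sum(pooled[:n_treated]) / n_treated
                - sum(pooled[n_treated:]) / n_control)
        if diff >= observed:
            at_least_as_large += 1

    # The add-one correction keeps the estimate strictly above zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical trial: 1 = patient improved, 0 = patient did not improve.
treated = [1] * 8 + [0] * 2   # 8 of 10 improved on the drug
control = [1] * 4 + [0] * 6   # 4 of 10 improved on the placebo
observed, p_value = permutation_test(treated, control)
print(f"observed difference = {observed:.2f}, estimated p-value = {p_value:.3f}")
```

A small p-value suggests that the observed difference would be unusual under random labeling alone. With only ten patients per group, even a large difference can remain uncertain, which is why sample size matters to the conclusion.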

Data science is an emerging field that combines elements of mathematics, statistics and computer science to extract meaning from data. Data scientists work with large, complex, messy and live data sources. Often working on questions that are not well-defined, data scientists use their creativity and technical ability to dig deep into "Big Data." They build models, make predictions, and develop static and dynamic ways to visualize data.

While at Smith, statistics students have created innovative classroom activities, authored honors theses, developed sophisticated statistical software and contributed to the Office of Institutional Research. Graduates have found internships with the National Institute of Standards and Technology and the New York Mets, and employment with Google, MIT’s Lincoln Laboratory, and MassMutual’s Data Science Development Program. Students have also gone on to graduate school at UC Berkeley, the Harvard School of Public Health, Ohio State University, and the University of Massachusetts.

Contact

Department of Statistical & Data Sciences

Wright Hall 226
Smith College
Northampton, MA 01063

Phone: 413-585-3520
Email: kdunphy@smith.edu

Kelley Dunphy
Administrative Assistant 

Randi L. Garcia 
Chair, Program in Statistical & Data Sciences