Data Science Institute

Curriculum

The curriculum for the Data Science Master's Program consists of nine credits: eight required courses, one of which is the experiential project course, and one elective.

The program can be completed in 12 months (September to August). All students begin the program in September; there is no option for starting in the spring semester. Students may elect to complete the program over 16, 21, or 24 months, and most do so. In some cases, exceptionally well-prepared students might be able to complete their work in nine months.

Fifth-year master's students must complete the program in one year (September to August).

For students taking longer than 12 months, full-time status for visa purposes is two credits per semester (and only one credit in the final semester). For fifth-year master's students, full-time status is three credits per semester. 

Required Courses

Develops all aspects of the machine learning pipeline (exploratory data analysis, visualization, feature engineering, handling missing data, modeling, interpretability, and presentation) in the context of real-world datasets. Classical models and techniques for classification and regression are included (linear and logistic regression with regularization, support vector machines, nearest neighbors, decision trees, random forests, XGBoost). We will use the Python data science ecosystem (e.g., sklearn, pandas, matplotlib). Typically offered in the Fall semester.

This course covers the storage, retrieval, and management of various types of data, along with the computing infrastructure (such as databases and data structures), algorithmic techniques (such as searching and sorting algorithms), and query languages (such as SQL) used to interact with data, in the context of both transaction processing (OLTP) and analytical processing (OLAP). Students will be introduced to measures for evaluating the efficiency of different techniques for interacting with data (such as ‘Big-Oh’ complexity and the number of I/O operations) and to various types of indexes for the efficient retrieval of data. The course will also cover several components of the Hadoop ecosystem for processing "big data." Additional topics include cloud computing, NoSQL databases, and modern data architectures. The course also introduces concepts and techniques from computer science that are essential for data science. Typically offered in the Fall semester.

Examination of probability theory and mathematical statistics from the perspective of computing. Topics selected from random number generation, Monte Carlo methods, limit theorems, stochastic dependence, Bayesian networks, dimensionality reduction. Prerequisites: APMA 1650 or equivalent; programming experience is recommended. Typically offered in the Fall semester.


A modern introduction to inferential methods for regression analysis and statistical learning, with an emphasis on practical applications that involve learning relationships from observed data. Topics will include the basics of linear regression, variable selection and dimension reduction, and approaches to nonlinear regression. Extensions to other data structures, such as longitudinal data, and the fundamentals of causal inference will also be introduced. Typically offered in the Spring semester.

We know we want to build more equitable technology, but how? In this course we’ll review the latest developments in building more equitable algorithms, including definitions of (un)fairness, the challenges of explaining how ML works, how to ensure accountability, and much more. Typically offered in the Spring semester.

Deep Learning belongs to a broader family of machine learning methods. It is a particular class of artificial neural networks that emphasizes learning representations through multiple layers. Deep Learning and the specialized techniques it has inspired (e.g., convolutional neural networks, recurrent neural networks, and transformers) have led to rapid improvements in many applications, such as computer vision, machine translation, sound understanding, and robotics. This course gives students an overview of the prominent techniques of Deep Learning and its applications in computer vision, language understanding, and other areas. It also provides hands-on practice implementing deep learning algorithms in Python. In a final project, students will implement an advanced piece of work in one of these areas.

CSCI 1470 is offered in the Spring semester, and CSCI 1420/DATA 2060 is a recommended prerequisite. CSCI 2470 is typically offered in the Fall semester, and CSCI 1420/DATA 2060 is a required prerequisite.

Data science techniques and tools are all around us. Machine learning is a term used across many different disciplines, and people often use machine learning tools without a thorough understanding of how and why the tools work. This course will provide students with a foundation in machine learning grounded in the mathematical models behind the techniques. The course will cover the theory, computational methods, and visualization inherent in the application of machine learning models. In this course, you will learn the statistical learning framework, common assumptions about the data-generating process, and the mathematics behind machine learning models (including supervised and unsupervised techniques), as well as how to implement machine learning models in Python from scratch. Typically offered in the Fall semester.

The practicum experience is a hands-on thesis project that entails an in-depth study of a current problem in data science. Students will synthesize their knowledge of probability and statistics, machine learning, and data and computational science. Students will work in teams on projects with Brown faculty members or with external companies. The project will be completed as part of a course that includes additional career-oriented skills development. Typically offered in the Fall, Spring, and Summer semesters.

Elective

One elective (1 credit) providing domain knowledge relevant to the student's individual interests. The elective must be a graduate-level course with a four-digit course number whose first digit is not 0. Most graduate-level CSCI and APMA courses qualify. Please contact the Director of Graduate Studies (DGS) if you plan to take a course from a different department.