Data Science Institute

Data Science Grants @ Brown 2023

In 2023, DSI awarded nine grants to data science researchers in a range of disciplines at Brown.

Grants awarded range from $1,750 to $60,000, supporting projects ranging from health applications of machine learning to improving data science education, from using text mining to understand the changing nature of social elites to creating better models for human-robot interactions. These projects will engage researchers at all levels, including high school students and undergraduates.

2023 DSI Seed Grants

PI: Pablo León Villagrá, Cognitive, Linguistic, and Psychological Sciences

Categorization is at the core of human cognition; it mediates all our interactions with the world, providing structure and allowing us to see superficially different entities as belonging to the same class. Previous research has highlighted how rapidly and efficiently children can learn abstract categories, often from as little as one word. This capacity is especially striking early in development: within the first few years of life, children successfully learn complex categories, ranging from the color categories of their language to the structure of biological taxonomies. While some elements of human categorization are likely innate, many aspects must be learned throughout development. Focusing on these developmental changes across childhood offers a unique insight into how individual experiences shape our category representations. However, despite considerable research, there is an ongoing debate about the properties of these categories and whether all children learn categories in the same way.
We propose a novel methodology that addresses the limitations of current developmental studies by creating a tablet application incorporating experimental procedures that can adapt to the child’s knowledge and past performance and thus efficiently produce detailed behavioral data. We use this technique to collect a large dataset of individual children’s categorical development to understand how children’s concepts change over time.

PI: Peihan Miao, Computer Science

To facilitate data analysis on genomic data while protecting patient privacy, our project involves the development of robust privacy-preserving cryptographic protocols using secure multi-party computation (MPC). This method allows multiple distrusting entities to perform joint computations on their private data without exposing the data to one another. Our aim is to create fast, secure, and scalable MPC protocols tailored for storing and analyzing genomic data on public servers or in the cloud, without sacrificing efficiency and accuracy. Towards this goal, we focus on several key problems such as normalization for gene expression data, principal component analysis for genomics, and others that have found numerous applications in genomics research.
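
As a rough illustration of the idea behind secure multi-party computation, and not the protocols this project will develop, the sketch below uses additive secret sharing so that several parties can learn the sum of their private numbers without revealing the individual values. The function names, the modulus, and the example counts are all hypothetical choices made for this illustration.

```python
import secrets

# Modulus for share arithmetic (an illustrative choice, not from the project).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Simulate parties jointly computing the sum of their private inputs."""
    n = len(private_values)
    # all_shares[i][j] is the share of party i's input that is sent to party j.
    all_shares = [share(v, n) for v in private_values]
    # Each party j locally adds the shares it received; one share alone
    # is a uniformly random number and reveals nothing about any input.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Hypothetical private counts held by three institutions.
    counts = [12, 30, 7]
    print(secure_sum(counts))
```

This simulation runs all parties in one process and assumes they follow the protocol honestly. Practical protocols for analyses such as normalization or principal component analysis also need secure multiplication, network communication, and protection against misbehaving parties.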

Cartoon by Edward Koren, New Yorker, 1999

PIs: Zhenchao Qian and Guixing Wei, Population Studies Training Center and Spatial Structures in the Social Sciences

The New York Times has been publishing wedding announcements since 1851. Every announcement comes with a story of love and showcases the couple's unique social background and accomplishments. NYT editors aim to choose couples with any type of achievement. Yet people who appear in the announcements are much more likely than the average American to be graduates of an elite university, congressional staffers, elite lawyers, or investment bankers. It is the place where you meet elites. Tracking wedding announcements over time offers data to understand the rise of new elites and the decline of old ones. This project proposes to analyze the NYT wedding announcements digitally available since 1980 and examine changes in what constitutes the new upper class, with a special emphasis on increasing diversity (minorities, immigrants, and LGBTQ) and changing gender roles (family and work).


Photo of beach and water at sunset/sunrise

PIs: Sarah Lummis and Emanuele DiLorenzo

The goal of the Ocean, Climate, and Ecosystems (OCE) data science research internship program is to provide high school students with an opportunity to acquire some of the basic tools for data science and research, and to conduct and publish a personalized science project. Students will be in person at Brown for two weeks over the summer to get hands-on experience and begin developing a research project. They will continue the project over the course of the next year, supported through weekly meetings with mentors at Brown. The students will return for two weeks in summer 2024 to present their work and help support the next cohort of students as they start their research internship!

COVID-19 SEIR graph

PI: Alice Paul, Biostatistics

This project will create an interactive, online book on R for data analysis in public health. With data science and coding skills becoming more essential to the work done in public health, this resource will help to support data fluency and foster cross-disciplinary research.  While this tool is motivated by the public health community at Brown, health data science is a rich source of interesting and complex data that is ideal for motivating and illustrating data science concepts in a way that would appeal to a broad audience.  Additionally, there will be a corresponding summer workshop for students, faculty, and staff to get introduced to R.

PI: Kim Gallon, Africana Studies, Black Health Heritage Data Lab

This year-long project will collect data on the uptake of digital health technologies among communities of color living in select areas of Providence. The goal of the data collection is twofold: (1) to produce data to drive interventions that will increase community members' use of patient portals and access to personal health records, and (2) to increase data fluency among youth of color in Providence through data work.


PI: Somdatta Goswami, Applied Mathematics

Adriana Coll de Pena, Biomedical Engineering

The current literature on DNA mobility and migration is based on the Ogston and reptation models, but these models have been subject to debate. Characterization of long single-stranded RNA (ssRNA), as well as double-stranded RNA (dsRNA), is limited, and various factors impact the mechanics of separation. Microfluidic electrophoresis can prove to be an efficient method to detect dsRNA impurities in synthetic messenger RNA (mRNA) products. Studying the parameters affecting the electrophoretic mobility of mRNA/ssRNA and dsRNA is important, particularly for the longer fragments. To refine the current mathematical model used to define the electrophoretic mobility of different nucleic acid species with different chain lengths, we aim to propose a physics-informed neural network (PINN) model and a physics-informed deep operator network (PI-DeepONet) model to determine the unknown field variables in the equations that control the electrophoretic mobility. Additionally, a neural operator-based classifier (DeepONet-based classifier) model will be proposed to determine the percentage of dsRNA impurity in the mRNA samples. These models can be used to predict electrophoretic mobility for different chain lengths of nucleic acids or their mixtures and to develop efficient contamination diagnostic assays for new vaccine constructs.

two photos: a dog wearing movement tracking equipment, and a robot

PIs: Daphna Buchsbaum, CLPS, and Stefanie Tellex, Computer Science

The aim of this project is to use human-dog interaction as a model for human-robot interaction. We hypothesize that by modeling human-robot interaction after the way that people interact with dogs, we can improve the speed and accuracy of robots' collaboration with their human partners. Existing state-of-the-art methods that use human-human interaction for modeling human-robot interaction are computationally intensive, and the resulting robot behavior and communication are often unintuitive or opaque to human partners, increasing workload as well as the chance of errors. Dogs present a promising potential model for human-robot cooperation because, through domestication, they have been selected for communicating and partnering with humans.

PI: Kathi Fisler

This project will conduct research on effective pedagogy for teaching students to be responsible data scientists. Guided by established results about learning from cognitive science, we will iterate on designs of exercises that teach students about various core responsible data science concepts such as consent, privacy, and misinformation. The anticipated outcome will be findings on how to design instructional activities in this space, as well as research-based evaluation of specific activities with a population of Brown undergraduates.