Data Science Institute

DSI-Funded Research

DSI has made grants to Brown researchers for data science projects in a range of disciplines.

Stay tuned for future funding opportunities for data science research at Brown.

Projects Funded

Please see here for grants awarded in 2023. 

Predicting the Course of Chemical Reactions with Deep Reinforcement Learning

Brenda Rubenstein, Cancan Huang, Leonard Sprague, Gopal Iyer (Chemistry)

Deep Learning Segmentation of Liver Structures, Tumors, and Effects of Treatment from Medical Images

Ben Kimia, Ben Hsieh (Engineering)

An Introductory Workshop on Machine Learning Methods for Undergraduate Students in MCM and STS

Theo Lepage-Richter (Modern Culture and Media)

Topological Data Analysis of Dynamic Tumor Architecture

Ian Wong, Dhananjay Bhaskar (Engineering), Lorin Crawford (Biostatistics)

Predictive models for microbial response to antibiotic nanoparticles: a hybrid approach of machine learning and mechanistic models

Zhijin Wu, Nicola Neretti (Biostatistics), with Jingyi Chen and Yong Wang (U of Arkansas)

SHARE: Secure Healthcare and Administrative Records Environment 

Seny Kamara (Computer Science), David Yokum, Kevin Wilson (The Policy Lab)

Big Virtual Reality in the Brown YURT for Data Visualization

David Laidlaw (Computer Science)

Machine Learning for Small Data -omic Problems: Identifying the Immunosuppressive Proteomic Signature for Adipose-Derived Stem Cells

Adrienne Parsons (Biotechnology Graduate Program), Lorin Crawford (Biostatistics), Eric Darling (Molecular Pharmacology, Physiology, and Biotechnology)

“Forecasting Patterns of Delirium and Early Recovery After Acute Stroke,” 

Michael Reznik, Rhode Island Hospital and Carsten Eickhoff, Center for Biomedical Informatics

This project develops new ways to diagnose delirium after stroke. Data from wearable (wrist) sensors will be used with machine learning to identify delirium phenotypes. The patterns identified may also be predictive of motor recovery after stroke, and similar machine learning techniques will be used to identify activity-based phenotypes corresponding to post-stroke functional outcomes. (Co-funded by the Office of the Vice-President for Research)

“Markers of Premature Biological Aging in Chronically Homeless Individuals,” 

Eric Jutkowitz, School of Public Health, and John McGeary, VA Hospital and Warren Alpert Medical School

This study examines premature biological aging (PBA) in homeless individuals at the molecular level, in order to fill key gaps in the literature and develop pilot data (i.e., big epigenetic data) to support an R01 grant application. This work will be the first in the country to examine PBA in individuals with prolonged exposure to an environmental stressor such as homelessness. (Co-funded by the Office of the Vice-President for Research)

"A Quantitative Measure of Freedom of Assembly," 

Jesse Shapiro, Economics Department, and collaborators from the University of Michigan and the University of Calgary

The project uses machine-learning methods to predict protest activity in countries around the world from a combination of Google query data and financial indices. An economic model then translates these predictions into measures of freedom of assembly in different countries.

“Deep Learning for Alzheimer's Prediction,” 

Amy Greenwald, Computer Science, and Eric Jutkowitz, School of Public Health, with Katherine Kinnaird (Smith College) and students

This project uses machine learning to predict Alzheimer’s disease using Medicare data. The goal is to make earlier diagnoses possible by identifying patterns in individuals’ healthcare encounters.

“Effects of Twitter Bots on Climate Change Discourse,” 

Timmons Roberts, Tom Marlowe, EEB/IBES, Climate Development Lab

This research focuses on understanding the role of automated bot accounts on Twitter in shaping discourse around climate-change-related events. In particular, we are characterizing the topics of discourse surrounding the Paris Climate Agreement and the United States' subsequent withdrawal from it, to better understand the activities and influence of Twitter bot accounts.

“Developing methods for historical mapping of populations and industrial firms,” 

John Logan, Sociology, S4, Scott Frickel, Sociology, IBES, Andras Szom, CCV

This project develops methods to extract data from historical text records of urban development in the US (census, city directories, records on industrial and commercial firms) and organize them into a spatial database. The project uses innovative applications of OCR methods, integrated in new ways with GIS. In addition to furthering the investigators’ specific projects on urban history in Providence, the methods that are developed can be applied to other US cities.

“Using Data to understand maternal healthcare outcomes on the Thailand-Burma border,” 

Neil Sarkar, Brown Center for Biomedical Informatics, Sudheesha Perera, Warren Alpert Medical School

Located on the Thailand-Myanmar border, the Mae Tao Clinic has played a vital role in providing reproductive health care to Burmese migrants for three decades throughout the world’s longest-running civil war. The clinic’s Health Information System tells this remarkable story through the lens of data, and presents a rare opportunity to quantitatively examine the clinical drivers of maternal health outcomes in a resource-constrained setting. Our goal is to expand upon the clinic's information system, and ultimately produce actionable models of maternal health and sustainable data collection methods that serve as a template for future data-driven reproductive health research.

“Sleeping in the Wild: Perception versus Computation of Sleep Quality over 10,000,000 Nights,” 

Jeff Huang, Computer Science, Nediyana Daskalova, Computer Science

Sleep has been gaining public interest and awareness as an important factor for health, productivity, and happiness. We are collaborating with the developers of the most popular sleep tracking app on Android phones to analyze their data set of over 10 million nights of sleep records from over 60,000 people, possibly one of the largest such collections in history. The findings from this dataset are intended to be widely disseminated to the public via the popular press. This dataset allows us to make societally relevant inferences about sleep quality measurements, jetlag, and social jetlag (weekend differences), and to understand individual-level versus population-level sleep habits. Our findings will validate existing sleep theories and findings on those topics, as well as introduce new knowledge about less common types of sleepers, such as frequent travelers or night shift workers (often difficult to study in a clinical setting). We seek to understand how people subjectively perceive sleep quality in comparison to computable sleep measures over long periods of time (spanning years). Our collaboration spans multiple disciplines and institutions, and is seeded by a partnership between commercial application developers and academics.

“Advancing Neural Network Analysis of Cosmological Data Sets,” 

Jonathan Pober, Physics, Joshua Kellner, Physics, and a collaborator from the University of Pennsylvania

This project uses machine learning techniques to improve understanding of the first galaxies. Specifically, it adds to the field by developing methods to measure the uncertainties in predictions about physical processes in galaxy formation, since uncertainty estimates are needed to constrain the possibilities. The study proposes combining a fully convolutional neural network (FCN) with Extended Kalman Filtering (EKF): “At the end of this grant, we aim to have an FCN-EKF technique that provides an understandable uncertainty for a parameter prediction, which can be confirmed using limiting cases such as cosmological redshift slice cubes containing only a noise signal.”

“Geospatial Remote Sensing Data,” 

James Kellner, EEB, IBES

This is a new collaboration to change the way we interact with geospatial remote sensing data and to fundamentally advance the kinds of information that can be extracted from high-resolution image time series. The project addresses the conceptual problem of automating the extraction of non-quantitative information from image time series, and the practical problem of managing the massive amounts of data involved in creating and storing this kind of information. The project looks at three specific technological problems: (1) image alignment and co-registration, (2) identification of objects within image subsets, and (3) geometric point pattern matching to track objects through time (when relevant). The research will focus on two specific themes: quantifying surface water changes in the Arctic, and monitoring populations of individual rain forest canopy trees in continental South America.