Data Science Institute

Data Science Grants @ Brown 2024

In 2024, DSI awarded nine seed grants to data science researchers across a range of disciplines at Brown.

DSI awarded a total of $283,287 to nine data science projects across the university, on topics ranging from physics to anthropology to medicine. These projects will engage researchers at all levels, including high school students and undergraduates.

2024 DSI Seed Grants

 

David Laidlaw (CS), Ziang Liu (CS PhD Student), James Kellner (EEOB), James Tompkin (CS), Matthew Harrison (DAM)

Our project will explore knowledge-discovery tools with an application to global remote sensing data from NASA’s GEDI instrument.  Within this application area, the tools will help generate insight and answer questions about global ecosystems and ecology.  The tools themselves will expand the repertoire of unsupervised explainable machine learning methods.  We also anticipate that the tools will generalize to other application areas, automating and accelerating scientific discovery from data more broadly.


Albert Larson (DEEPS), Ali Shafqat Akanda (Civil and Environmental Engineering, University of Rhode Island)

Water-borne diseases such as cholera continue to impede global wellness and human health. Decision-support tools like early warning systems can aid in community engagement, and in preparation for and response to anomalous and recurring disasters. Better climatic data in these early warning systems improve outcomes. Ocean environmental variables are known to influence hydroclimatic patterns. For example, the Indian Ocean Dipole is a periodic event that affects rainfall and temperature in South Asia and Eastern Africa, regions that are regularly affected by cholera outbreaks.


Our focus country for this proposed study, Bangladesh, suffers from year-round scarcity of safe water and sanitation access due to a monsoon climate. Water sources are regularly polluted with cholera and other pathogens due to drastic environmental changes throughout the year. Here, we will create a web application that shows the results of using harmonized land and oceanic data to analyze and forecast the incidence of cholera in Bangladesh. The underlying algorithm uses a lagged supervised learning approach to transform meteorological forcings and environmental variables into predictions of population health. We will also test the applicability of this approach to recently affected regions in Eastern African countries to see the coherent impacts of the Indian Ocean Dipole in nearby geographies.
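As a rough illustration of what a lagged supervised learning setup can look like, the sketch below pairs a predictor observed at time t - lag with an outcome observed at time t, then fits a straight line by ordinary least squares. This is not the project's model: the function names, the single predictor, the linear fit, and the numbers are all assumptions made up for the example.

```python
def lagged_pairs(predictor, target, lag):
    """Pair predictor[t - lag] with target[t] for every t where both exist."""
    return [(predictor[t - lag], target[t]) for t in range(lag, len(target))]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic, made-up values for illustration only (not real observations).
# The outcome is constructed so that cases[t] = 10 + 2 * rainfall[t - 2].
rainfall = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cases = [0.0, 0.0, 12.0, 14.0, 16.0, 18.0]

pairs = lagged_pairs(rainfall, cases, lag=2)
intercept, slope = fit_line(pairs)

# The latest rainfall value gives a forecast two time steps ahead.
forecast = intercept + slope * rainfall[-1]
```

In practice a model of this kind would likely use many predictors and several lags at once; the point here is only that the lag turns a forecasting question into an ordinary supervised learning problem.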


Matt LeBlanc and Jennifer Roloff (Physics)

Particle physicists at the Large Hadron Collider (LHC) are using massive datasets to explore the fundamental interactions of the universe. This project seeks to improve the efficiency of Monte Carlo simulations, which are a crucial tool used to interpret these data. Currently, the most advanced simulations produce large amounts of 'pathological' events with negative or anomalously large weights. The effects of these pathological weights can be mitigated by producing larger numbers of simulated events, which strains computational and storage resources. One proposed method to mitigate these effects is an algorithm called 'cell reweighting,' which averages the weights of events that are ‘similar’ according to some user-defined metric that creates a lower-dimensional representation, or embedding, of the dataset.

This project will refine existing cell reweighting algorithms by implementing improved metrics using optimal transport techniques, enhancing the accuracy of the data embedding. Further improvements will be explored by leveraging AI to directly learn embeddings of the data, substantially increasing the speed of such algorithms. This project has the potential to alleviate computational bottlenecks in particle physics research, with implications for other fields requiring efficient low-dimensional representations of data.
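The weight-averaging step behind cell reweighting, as described above, can be sketched in a few lines. This is a simplified illustration and not the project's algorithm: it assumes each event is a single number `x` with a weight, and it groups events into fixed-width bins as a stand-in for the user-defined similarity metric. The function name `cell_reweight` and the sample events are hypothetical.

```python
from collections import defaultdict


def cell_reweight(events, cell_width):
    """Replace each event's weight with the mean weight of its cell.

    events: list of (x, weight) tuples.
    Cells are fixed-width bins in x, a simplifying assumption.
    """
    cells = defaultdict(list)
    for index, (x, _) in enumerate(events):
        cells[int(x // cell_width)].append(index)

    new_weights = [0.0] * len(events)
    for members in cells.values():
        average = sum(events[i][1] for i in members) / len(members)
        for i in members:
            new_weights[i] = average

    return [(events[i][0], new_weights[i]) for i in range(len(events))]


# Made-up events for illustration; two of them carry negative weights.
events = [(0.1, 1.0), (0.2, -0.5), (0.3, 1.0), (1.4, 2.0), (1.6, -1.0)]
smoothed = cell_reweight(events, cell_width=1.0)

total_before = sum(w for _, w in events)
total_after = sum(w for _, w in smoothed)
```

Averaging within a cell leaves the total weight of that cell unchanged, so the overall sum is preserved while negative weights disappear wherever a cell's net weight is positive.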


 

 

Ocean, Climate, and Ecosystems Research Internship 2023-2024 Cohort

Sarah Lummis and Emanuele Di Lorenzo (DEEPS/IBES)

The Ocean, Climate, and Ecosystems (OCE) Research Internship aims to bridge the significant gap in STEM education by providing high school students with hands-on research experience in environmental sciences and data analysis. Despite the escalating environmental challenges our planet faces, opportunities for young scholars to contribute to meaningful scientific research, especially in the crucial fields of oceanography, climate science, and marine ecosystem conservation, remain scarce. This proposal seeks to leverage Brown University's resources and the DSI’s commitment to advancing data fluency to cultivate the next generation of scientists and researchers.

Alice Paul (Biostats) and Serdar Kadioglu (CS)

The Rhode Island Public Transit Authority (RIPTA) is the primary transit service within Rhode Island. The planning, scheduling, and data analysis teams within RIPTA use past ridership data to drive future operational decisions. This project will use this data to help support RIPTA's mission of providing reliable and cost-effective service across the community. First, this project will estimate the impact of the Washington Bridge closure on ridership and service levels. Second, we will work with RIPTA to develop a data-driven way to maximize service under resource constraints. A key objective will also be to make this decision-making tool transparent and explainable to the public.

Jordi A. Rivera Prince (Anthropology), Blanca Payne (Anthropology), Annalisa Heppner (North Burial Ground)

Nearly 325 years of United States history can be found between North Main Street, I-95, and Branch Avenue, in the City of Providence Parks Department’s own North Burial Ground (NBG). However, most of this history is inaccessible because much of the data is not in a structured format. Rather, the data is found directly on the 100,000 tombstones across the 150-acre historic cemetery, and in some cases in handwritten records. NBG is at a key juncture: systematic documentation of the tombstones is urgently needed to preserve a significant piece of Providence's past, especially as natural processes and human action can damage or destroy the tombstones.

The North Burial Ground Documentation Project is a community-engaged research project to record, digitize, and transcribe tombstones and written records at NBG. With support from the DSI Data Science @ Brown Seed Grant, the NBG Documentation Project team will continue documentation efforts and begin building an image processing pipeline to assist with data digitization. The NBG Documentation Project is a collaboration between Brown Anthropology and North Burial Ground, with connection and technical support from the Community-Engaged Data and Evaluation Collaborative (CEDEC) at Brown. Data from the project will be used to explore a variety of social issues throughout Providence's history, including changes in religious beliefs, public health and life expectancy over time, and the histories of communities typically not centered in US colonial narratives.

 
 

 


Seda Salap-Ayca (DEEPS/IBES)

"Mappy Python Diaries" is a living textbook project designed to enhance educational accessibility and interdisciplinary research in Geographic Information Science (GIS) through the use of Open Educational Resources (OER). By leveraging platforms like Jupyter Book and GitHub, the project aims to provide high-quality, cost-free educational materials that promote data fluency and collaborative learning. The project will also demonstrate a commitment to collaboration and community engagement, actively involving undergraduate students in the process of updating educational materials. By giving students hands-on experience in curriculum development and content creation, the project will both enhance the quality of the educational resources and cultivate a culture of knowledge-sharing and mentorship within the academic community.

 


Jun Tao (Medicine and Epidemiology), Ellie Pavlick (CS), Philip Chan (Medicine)

This proposal will collect initial data to develop and fine-tune an AI-based chatbot called CHIA (Chatbot for HIV Prevention and Action). Built on an existing large language model (LLM), e.g., Chat Generative Pre-Trained Transformer (ChatGPT), CHIA will harness motivational interviewing (MI) to promote HIV pre-exposure prophylaxis (PrEP) uptake among men who have sex with men (MSM), with a particular focus on Black/African American and Hispanic/Latino MSM. The study encompasses four core domains: risk assessment, education and awareness, MI-based counseling, and linkage to PrEP care.


Nikos Vasilakis (CS), Christelle Alvarez (Egyptology & Assyriology), Michael Greenberg (CS, Stevens Institute of Technology)

The Digital Pyramid Texts project aims to leverage computational and statistical methods to democratize access to the 4000-year-old Pyramid Texts, humanity's earliest known collection of ritual and religious texts, for both scholars and the general public. This interdisciplinary collaboration brings together Egyptologists and computer scientists to identify and develop innovative techniques for processing, storing, querying, and analyzing large-scale datasets derived from ancient Egyptian inscriptions carved in the pyramids of the Old Kingdom. The DSI grant will facilitate the development and deployment of the inaugural version of the Digital Pyramid Texts platform, which will focus on the digitization and transformation of high-definition images (orthoimages) and facsimiles of these inscriptions into computer-readable objects. We envision a future where anyone can interact meaningfully with these texts, with the simplicity and ease of other visual interfaces that require little to no expertise.