Data Science Institute

Data Science Fellows

The Data Science Fellows program offers a unique opportunity for students to collaborate with a participating faculty member to infuse data science tools and practice into existing undergraduate courses.

Undergraduates interested in data science and collaborative work with a faculty member have the opportunity to enroll in the Data Science Fellows course (DATA 1150, offered each fall semester). This course will prepare the Data Science Fellows to serve as consultants for faculty wishing to enhance data science curricula at Brown.

This program is offered in collaboration with the Sheridan Center for Teaching and Learning as a part of the Sheridan Center's Learning Collaborative program. Students learn a core set of data science practices, active learning pedagogies, and collaborative communication skills.

DATA 1150: Data Science Fellows

Offered Fall semester.

Data science is growing fast, with tools, approaches, and results evolving rapidly. This course is for junior- and senior-level students who are familiar with data science tools and skills and who want to apply these skills and teach others how to implement and interpret data science. Working with a faculty sponsor, students learn communication skills, how to determine the needs (requirements) for a project, and how to teach data science to peers. These adaptable skills are a lasting advantage in your professional development.

Students interested in enrolling in DATA 1150 will need to complete an application form prior to pre-registration, describing their interest in becoming a Data Science Fellow and their experience with data science.  While Data Science Fellows can come from any concentration, they should have some experience or expressed aptitude (formal or informal) in one of the following areas: programming, statistics/applied math, or data visualization.  Any questions about the Data Fellows course should be sent to Linda Clark at Linda_Clark@Brown.edu.

Data Science Fellows will be paid for up to 10 hours per week (at the UTA department rate for the course) during the semester for their work with faculty.

Prerequisites

  • Advanced experience in and familiarity with data science strategies and/or tools (for example, coding, statistics, or visualizations) are required for this course.  
  • (Data Fluency Certificate students must complete DATA 0200 prior to DATA 1150)
  • Junior or Senior standing
  • Permission of the instructor
  • Completion of the data science fellows interest form

Course Topics

  • What is data science?
  • What is consulting/ how to run a consulting meeting
  • Project management
  • Interdisciplinary understanding
  • Data science topics
  • Teaching data science
  • Learning outcomes for courses/classes
  • Discovering and communicating possible problem solutions/ project sustainability
  • Assessing student learning
  • Cognitive load and active learning
  • Giving/receiving feedback/continuous improvement
  • Reporting and presenting

See this information for students from the Sheridan Center for Teaching and Learning

Have you ever thought about introducing your students to a new way of thinking about your discipline through data? Is there a data-related tool out there that might enhance your teaching? Are you a little overwhelmed by thinking about ‘what is data science anyway’? If you find yourself thinking about these issues, please consider applying to work with a Data Science Fellow during Fall 2024 to assist you with your data science related course development.

To apply, complete this interest form. For more information about prior projects, see the examples below.

Request a Data Science Fellow

Data Science Course Design Institute

Data Science in My Course?

Have you ever thought about introducing your students to a new way of thinking about your discipline through data? Is there a data-related tool out there that might enhance your teaching? Are you a little overwhelmed by thinking about ‘what is data science anyway’? If you find yourself thinking about these issues, the Data Science Course Design Institute (DSCDI) is a great opportunity for you to begin developing new data science related content, or enhancing existing content, in your course.

As a participant in the DSCDI you will:

  • Acquire a better understanding of the scope of the field of data science
  • Observe and discuss strategies to teach data science content, with an emphasis on differentiating among different levels of technical skills
  • Write a data science learning outcome for your course
  • Create an outline for a data science related teaching and learning experience

In addition to the content knowledge, participants in the institute receive a $750.00 stipend paid to their faculty research accounts. Participants are also given priority to collaborate with a Data Science Fellow in Fall 2024. The Data Science Fellow is an undergraduate student with data science skills who assists the faculty member with implementing the plan generated in the course design institute. Data Science Fellow stipends are paid by the Sheridan Center, so there is no cost to you or your department.

Past participants in the DSCDI find the implementation of their curricular innovations greatly advanced by the collaborative discussions and information in the DSCDI.  Some examples of past projects include:

  • Developing in-class tutorials on R and Python
  • Enhancing data visualizations of challenging statistical principles
  • Text mining literature corpus to identify key readings to align with learning outcomes
  • Leveraging Python to study Arabic proper name roots and cultural connections
  • Leveraging pre-scripted Python notebooks to facilitate data analysis and results for lab-based courses
  • Developing machine learning examples using domain specific data
  • Exploring image segmentation to identify a process to categorize ancient images for classroom instruction
  • Providing replicable data analysis from scholarly articles
  • Developing a web-based tool to determine foreign language proficiency standards
  • Web scraping and text mining student survey responses to provide formative feedback for curricular content
  • Developing a new DATA First Year Seminar course at Brown
  • Developing a series of instructional videos about using the pandas package in Python
  • Integrating data visualizations and better survey writing into Chinese language courses
  • Designing an easily editable Community Profile Flyer template to quickly create personalized flyers for organizations in Rhode Island that work with Black communities
  • Building an OCR model with the goal of accelerating the pace at which researchers and learners can parse massive amounts of hieroglyph data

The Spring 2024 DSCDI will be completely remote, running from early February 2024 through April 2024. There will be one synchronous meeting (one and a half hours long) each month (February, March, and April). The DSCDI consists of three modules. For each module, participants will be expected to:

  • Read one to two articles (guiding questions provided)
  • Post a reaction to the institute discussion forum
  • Respond to a colleague’s reaction
  • Record developing data science related plans for their own teaching  
  • Complete an ‘entry ticket’ for each synchronous meeting

Return here for the 2025 DSCDI Interest Form.

See this information for faculty from the Sheridan Center for Teaching and Learning

Students who wish to register for DATA 1150 are required to complete an application form prior to spring pre-registration to be considered for the course. Please email Linda Clark (linda_clark@brown.edu) to apply or get more information.

Return here for the 2025 Data Fellows application. 


Explore Data Fellows Projects

2023 Data Fellows

Student: Naphat Permpredanun, Faculty: Rebecca Nedostup

"I have been working with Prof. Rebecca Nedostup to develop a pipeline to compare corpora extraction between generative AI and library databases. This project will be built as an assignment for students in the Chinese History class to learn about corpora extraction under the old and new conventions. The project is divided into two parts: research on the differences between the two sources using different types of questions about Chinese history (such as factual questions, reading suggestions, etc.), and the creation of a pipeline for future researchers or students to use. I used Jupyter notebooks to start aggregating APIs for generative AI (such as ChatGPT, Bard, Claude) and combining them with web scraping of the library database."

Student: Rainy Wortelboer, Faculty: Karianne Bergen

"I have been working with Professor Karianne Bergen to develop a new DATA First Year Seminar (FYS) course at Brown. Currently, there are no first year seminars in DATA, MATH, APMA, or CSCI. Our focus for this semester was to outline the course structure and potential modules. With feedback from Brown students and resources from data science professors at peer institutions, we finalized the focus of our DATA FYS: students will explore the numerous forms and roles of data science, investigate its historical context and ethical implications, and practice communicating data effectively. I drafted modules for the course, including readings, discussion questions, assignments, and active learning activities. Specifically, I created a scaffolded Colab notebook teaching the fundamentals of data analysis, and designed assignments for a potential capstone project, where students will apply data science to their fields of interest, analyzing existing datasets or creating their own, complete with ethical analysis and visualizations."

Student: Emily Sanchez, Faculty: Marissa Gray

"I have been working with Dr. Marissa Gray, a professor of Biomedical Engineering, to develop a series of instructional videos about using the pandas package in Python. Dr. Gray teaches a year-long capstone class where clinicians come in with a problem and students spend the semester trying to understand the problem and find a solution. Since most projects have a data science element, but many students don't have a data science or coding background, understanding data has been a challenge. The videos, as well as pre-populated Google Colab notebooks, will serve as a crash course for the students who have no Python experience, and those who have experience but have not used Python for data science yet."

Students: Kaitlyn Williams and Zach Sickles, Faculty: Liz Chen and Neil Sarkar

Kaitlyn Williams: "I have been working with Dr. Liz Chen and Dr. Neil Sarkar to develop course content for their Biomedical Informatics class about reading and analyzing health data. I am working to get students started with Oscar and GitHub to be able to do projects on their own computers. In addition, I am creating content on how to read and download the data and access what they are supposed to be looking at using some Python commands. I used HackMD to create guides for these assignments and created a virtual environment in Oscar to run assignments. Additionally, I am helping to transfer the BIDSS Manual into a more user friendly and writer friendly format on HackMD, inputting our guidebooks as some of the content for the site."

Zach Sickles: "Working with Kaitlyn, I have also been working with Dr. Liz Chen and Dr. Neil Sarkar to develop course content for their Biomedical Informatics class about reading and analyzing health data. I am working to migrate course labs from Google Docs in Julia to Jupyter Notebooks in Python. In addition, I am creating brand new lab content, such as how to use Jupyter Notebooks and how to explore new data sets in Python, Julia, and R."

Student: Colby Porter, Faculty: Lulei Su

"I have been working with Prof. Lulei Su to integrate data visualizations and better survey writing into Chinese language courses. Prof. Su is hoping to better integrate surveys of university students in China into Brown courses, helping students from Brown learn about Chinese perspectives on certain issues. Building off of previous work on this, I created a guide to writing survey questions with easily-visualizable results, with step-by-step instructions for entering such questions into a Chinese survey platform. Additionally, I created a Google Colab template that uses Plotly to help users easily create a variety of interactive visualizations from the data exported by the survey platform. Finally, I am also helping to create a guide for statistical analysis of student achievement over time."

Student: Johnny Chen, Faculty: Kim Gallon

"I have been working with Dr. Kim Gallon and members of the Black Health Heritage Lab on multiple community outreach projects. With Terina Keller's guidance, I designed a Community Profile Flyer template that is easily editable to quickly create personalized flyers for organizations in Rhode Island that work with Black communities. We have since progressed to designing modular page templates for creating community guides for these organizations. I am also working with Asha Baker to design, develop, and release a browser game that aims to strengthen the data literacy skills of children at the Vincent Brown Recreation Center. The content of the game is informed by and seeks to supplement the skills listed in the Rhode Island Core Standards for Mathematics from the Rhode Island Department of Education. This game is being developed in the Godot game engine with hopes that the game will be playable in both mobile and desktop browsers."

Student: Moon Hwan Kim, Faculty: Pierre de Galbert

"Working alongside Professor Pierre de Galbert, our initiative focuses on integrating open-source tools and GitHub into the academic curriculum. This endeavor involves establishing a GitHub Classroom that integrates a variety of educational resources, including open-source datasets, comprehensive worksheets, detailed homework assignments, and classroom lesson plans. Additionally, we provide PowerPoint presentations that guide students through the setup and installation processes of Stata, Git, and GitHub.

The primary objective of this project is to modernize the classroom experience by leveraging tools that facilitate streamlined code grading and evaluation through GitHub Classroom. This approach not only enhances the efficiency of the educational process but also exposes students and educators to the practices of code and dataset sharing prevalent among scholars in similar fields through GitHub.

A significant outcome for students, particularly those in the education studies department who may not have prior exposure to such tools, is gaining hands-on experience with Git in a research context. This involves engaging in research study replication, which serves as a practical application of their learning. Furthermore, this initiative opens doors to the broader realm of code collaboration, offering students insights into collaborative practices and workflows that are increasingly vital in various academic and professional settings. By embracing these modern methodologies, we aim to enrich the learning experience and equip students with relevant, in-demand skills in today's digital landscape."

Student: Seong-Heon Jung, Faculty: Christelle Alvarez

"I have been working with Professor Christelle Alvarez on a two-pronged project. One part of the project was to hold Data Science and AI workshops for her graduate seminar EGYT 2200 Monumentality and Texts in ancient Egypt. The students both gained hands-on experience with building a machine learning model and participated in discussion about the capabilities of AI and the responsibilities that come with its use. The second, ongoing part of the project is to build an OCR (Optical Character Recognition) model for hieroglyphs. Currently, human transcription is the only way to identify what is written on, say, a wall full of hieroglyphs. We hope this tool will make it possible to transform three-dimensional, photogrammetrically captured images of inscriptions into accessible, searchable, and analyzable content. This could greatly accelerate the pace at which researchers and learners can parse through massive amounts of hieroglyph data, opening up new opportunities in the study of ancient texts."

Student: Felicity Hade, Faculty: Frank Donnelly

"I have been working with Frank Donnelly, head of GeoData@SciLi, to create a value-added dataset of crime offenses in Providence. The Providence Police Department publishes the last 180 days of crime incidents alongside general descriptions of the locations where these offenses occurred. My project aims to achieve the following primary goals: (1) Geocode the data such that the crimes can be visualized as points on a map, (2) categorize offenses by type, and (3) create a process to seamlessly add new data every 180 days. I have implemented this project using a combination of QGIS software and Python."
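Goals (2) and (3) above can be sketched in a few lines of Python. This is a minimal illustration only, not the project's actual code: the field names (case_number, offense), the keyword rules, and the sample rows are all hypothetical placeholders, and the real workflow also involves geocoding in QGIS, which is not shown here.

```python
def categorize(offense, rules):
    """Return the category of the first keyword rule that matches the offense text."""
    text = offense.lower()
    for keyword, category in rules:
        if keyword in text:
            return category
    return "other"


def merge_incidents(existing, new_batch, key="case_number"):
    """Combine an archive with a newer export, keeping one row per incident key."""
    merged = {row[key]: row for row in existing}
    for row in new_batch:
        merged[row[key]] = row  # a row from the newer export replaces the older one
    return list(merged.values())


# Hypothetical keyword rules and made-up placeholder rows
RULES = [("larceny", "property"), ("burglary", "property"), ("assault", "violent")]
archive = [{"case_number": "A1", "offense": "Larceny from building"}]
latest = [
    {"case_number": "A1", "offense": "Larceny from building"},  # overlaps the archive
    {"case_number": "B2", "offense": "Simple assault"},
]

combined = merge_incidents(archive, latest)
for row in combined:
    row["category"] = categorize(row["offense"], RULES)
```

Keying the merge on an incident identifier means that overlapping 180-day exports can be appended repeatedly without creating duplicate rows.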

Student: Layla Lynch, Faculty: Rebecca Kartzinel

"I have been working with Professor Rebecca Kartzinel to create an RShiny interactive visualization tool that can be used to portray selective phylogenetic trees of the angiosperm phylum. The tool's main function is to allow users to select family classifications that they would like to visualize and generate an updated phylogenetic plot by 'keeping the tips' of the selected families. The other main tool function allows users to select a broader classification, such as a well-recognized clade, class, or order, and plot the respective phylogeny."

2022 Data Fellows

Student: Asha Baker, Faculty: Linda Clark

"For my project, I worked with Professor Clark to host an intro to Data Science workshop for students of color on campus. The workshop was a collaboration between the DSI and BCSC with the aim of encouraging more students of color to become involved with Data Science (especially the Data Science Fluency Certificate). The workshop used recent data from TWTP (the BCSC's orientation program for first years of color) and a Google Colab notebook to teach skills such as importing and cleaning data, exploring data through statistics and visualizations, and making meaning of said data. While the workshop was my primary deliverable, I am still working to gather and communicate insights to Dean Harris of the BCSC from the TWTP data in order to support future programming."

Student: Raymond Dai, Faculty: John Cayley

"I have been working with Professor John Cayley to help develop a measure of visual similarity between CJKV characters to assist in a digital art project. Project work involves trying to determine similarity between characters based on metrics such as pixel-edit distance, radical structure, and stroke-order edit distance. This will be used to create a database connecting common characters to their most similar visual neighbors that can be used for course material. It will also be deployed in an art project where random characters in a Mandarin Chinese passage mutate into visually similar characters."
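As a rough illustration of one of the metrics mentioned above, the following Python sketch computes a Levenshtein edit distance between two stroke-order sequences. The stroke labels and the two example sequences are hypothetical placeholders rather than data from the project, and the project's actual implementation may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # substitute x with y, or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical stroke-order encodings for two characters
strokes_a = ["heng", "shu", "pie", "na"]
strokes_b = ["heng", "shu", "gou", "pie", "na"]

distance = edit_distance(strokes_a, strokes_b)
```

A smaller distance means the two stroke sequences differ by fewer edits. Ranking characters by such a distance is one possible way to find a character's closest neighbors.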

Student: Lily Ward-Diorio, Faculty: Andrew Creamer

"For my project, I worked with Andrew Creamer, the Science Data Specialist at Brown. Our goal is to make the data in Brown's Digital Repository more discoverable. First, I researched FAIR principles and the DataCite Schema for storing metadata. Next, we developed a list of descriptors and persistent identifiers (PIDs) that each dataset in the BDR should have. Then, I used OpenRefine to write a script to transform metadata in a spreadsheet format into an XML format using the DataCite Schema."
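The spreadsheet-to-XML step can be sketched as follows. The project used OpenRefine; this sketch swaps in Python's standard library purely for illustration. The column names (doi, creator, title, publisher, year) and the sample row are hypothetical, only a handful of DataCite elements are emitted, and the output is not validated against the schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"


def row_to_datacite(row):
    """Build a small DataCite-style XML record from one spreadsheet row."""
    ET.register_namespace("", DATACITE_NS)  # serialize with a default xmlns

    def tag(name):
        return f"{{{DATACITE_NS}}}{name}"

    resource = ET.Element(tag("resource"))
    identifier = ET.SubElement(resource, tag("identifier"), identifierType="DOI")
    identifier.text = row["doi"]
    creators = ET.SubElement(resource, tag("creators"))
    creator = ET.SubElement(creators, tag("creator"))
    ET.SubElement(creator, tag("creatorName")).text = row["creator"]
    titles = ET.SubElement(resource, tag("titles"))
    ET.SubElement(titles, tag("title")).text = row["title"]
    ET.SubElement(resource, tag("publisher")).text = row["publisher"]
    ET.SubElement(resource, tag("publicationYear")).text = row["year"]
    return ET.tostring(resource, encoding="unicode")


# Hypothetical one-row spreadsheet export with placeholder values
sheet = (
    "doi,creator,title,publisher,year\n"
    "10.0000/example,\"Doe, Jane\",Example Dataset,Example Repository,2022\n"
)
records = [row_to_datacite(row) for row in csv.DictReader(io.StringIO(sheet))]
```

Each spreadsheet row becomes one XML string in records. A real record would need the remaining required DataCite properties and a schema validation pass.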

Student: Geordie Young, Faculty: Karianne Bergen

"For my project, I have been working with Professor Karianne Bergen, a professor of Data Science and Earth, Environmental & Planetary Sciences. The goal of our project is to create the syllabus for her Spring 2023 course, EEPS 1720: Tackling Climate Change with Machine Learning. My work has primarily involved reading academic journals on the topic and working with Professor Bergen to select the readings that are most suitable for the course. Additionally, we have been working to finalize how students' presentations on the readings should be structured throughout the semester."

Student: Filip Aleksic, Faculty: Marissa Gray

"I have been working with Dr. Marissa Gray to develop a course about wearable sensor data in the ENGN department. One aspect of the course will be building simple wearable sensors to collect physiological data, but our focus in this project was to integrate data science and machine learning into the course so that we can analyze the data that comes from the sensors that students develop, along with the demographic data that they provide. To be more specific, we are interested in predicting whether a student is resting, walking, or running based on their heart rate and demographic data. I used Jupyter Notebooks to walk the students through a data science life cycle and to create the assignments for them. This included data collection, data cleaning, data exploration, data visualization, and predictive modeling using various Python packages."

Student: Nuj Naguleswaran, Faculty: Stephanie Gaillard

"This past semester, I have been working with Professor Stephanie Gaillard to develop a specialized tool for learning and understanding French curriculum standards. Specifically, we have been trying to create an interactive website that helps students understand the different levels and skills that need to be learned in the ACTFL/CEFRL (American and European language standards). Students and faculty alike benefit from this tool, as they can use it to understand and teach what mastery milestones exist and how to switch between the two frameworks. The website uses a mix of JavaScript (primarily for the interactive components of the website), HTML/CSS, and a variety of specialized packages."

Student: Frankie Fan, Faculty: Frank Donnelly

"I worked with faculty member Frank Donnelly to create a workflow to extract, transform, and load data into a curated geodatabase that GIS@Scili will publish and update on an ongoing basis. I was responsible for designing and creating a documented and functioning workflow from beginning to end that others can build on. The ultimate goal of the project is to have a publicly available, curated geospatial dataset that GIS@Scili will update on an ongoing basis."

Student: Mia Mitchell, Faculty: Rebecca Kartzinel

"For my project, I have been working with Professor Rebecca Kartzinel to build a website, drawing on an existing collections database from the IBES Greenhouse, that can be queried by the public, greenhouse staff, and students in her class (BIOL 0430), so that people can understand the phylogenetic relationships of the species within the greenhouse. For the front end of the website, I am using Vue JavaScript, and for the back end, I am using Google Firebase. Finally, for the NoSQL data, I plan to use Google Firestore to house my database, which will pull from other plant APIs across the web to fill in gaps about nomenclature and clades."

Student: Sophie Blumenstein, Faculty: Daniel Ibarra

"For my project, I am working with Professor Daniel Ibarra in the first-year seminar 'Historical Climatology and Global Climate Change.' I am assisting on both front-end and back-end aspects of the course by supporting students in finding and analyzing datasets for independent final research projects and by documenting my work to provide infrastructure for future course iterations."

Recent Data Fellows News

Congratulations to all of the graduating students earning their doctoral, master’s, and undergraduate degrees from DSI-related programs this year!
Brown’s Data Science Fellows program pairs faculty members with Brown undergraduates who have data science skills and experience to bring to the project.