This group is made up of about 20 members, both faculty and staff data scientists, from a wide range of disciplines. They advise DSI’s leadership on the data science needs of researchers and trainees in all disciplines. After three years, our inaugural co-chairs are stepping down. Many thanks to Meenakshi Narain (Professor of Physics) and Thomas Serre (Professor of CLPS and Brain Science) for all they have done to help shape DSI; we are looking forward to continuing to work with them!
We’re excited to have Margot and Stephen on board and to be expanding our advisory board’s reach on campus. We asked our new co-chairs to answer some questions about their work:
1. What types of data do you typically work with in your research? How has the scale of data changed over the course of your career?
Margot: My research relies primarily on data from large household surveys (e.g., the U.S. Census or the Consumer Expenditure Survey), as well as administrative data at the state level (e.g., data from the National Vital Statistics System, or the State-by-State Spending on Kids Dataset that I recently assembled with colleagues at the Urban Institute). In my field, there has also been an increasing availability of, and reliance on, administrative data sources, such as data from state agencies. Partly this is due to an increasing emphasis on causal inference, but it is also due to the desire for very large sample sizes and the tremendous cost of fielding a large household survey. Administrative data sources have the benefit of large sample sizes and high generalizability, but the downside of limited information. Increasingly, social scientists are also relying on less generalizable but potentially rich sources of information about human behavior from online sources (e.g., Twitter, Facebook, parenting blogs) using computational tools. The changing scale and sources of data raise thorny and interesting questions for this demographer!
Stephen: Our group does work that covers data from text to images and videos to radio astronomy. We’re also interested in structured and semi-structured data like knowledge graphs. Since I started over a decade ago, the scale of available data has certainly increased dramatically. But in addition, the rise of data-hungry methods like deep learning has shown us that we often don’t have enough of the data that we need. The data that is available is often messy, noisy, and unlabeled. So the need for high-quality data, like training data for machine learning, is a problem that’s only getting more important. It’s not enough to collect big data. We also often need to shape the data to be useful.
2. What methods do you use when working with data?
Margot: Much of my research relies on large, longitudinal data sources, and so the methods I use are those that are suitable for these types of data. I use a variety of methods for analyzing social data—survey data analysis that is more descriptive in nature, as well as quasi-experimental methods to estimate causal effects. Some of my research also relies on traditional demographic techniques such as decomposition, standardization and microsimulation.
Stephen: Our group focuses on a couple of areas for teaching computers when labeled training data is scarce. The first is programmatic weak supervision, where users write heuristic rules that vote on the labels for unlabeled data. These rules can disagree and abstain, so we create unsupervised probabilistic models that can learn to resolve their disagreements and create high-quality training data. The second is zero-shot learning, which is a type of machine learning where we train models to solve tasks just from descriptions of those tasks. When successful, this is really useful because then we can provide the model with descriptions of new tasks that it has never seen before. The model can then potentially solve them without the need for labeled training data.
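For readers who want a concrete picture of heuristic rules that vote, disagree, and abstain, here is a minimal Python sketch. It is an illustration only, not the group's method: the spam/ham task, the three rules, and every name in it are hypothetical, and a simple majority vote stands in for the unsupervised probabilistic model described above, which would instead learn how much to trust each rule.

```python
from collections import Counter

# Hypothetical label values for a toy spam-filtering task.
ABSTAIN = -1
HAM = 0
SPAM = 1

# Each "labeling function" is a heuristic rule that either votes for a
# label or abstains. These three rules are invented for illustration.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_reply(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]

def collect_votes(text):
    """Apply every rule to one unlabeled example."""
    return [lf(text) for lf in LABELING_FUNCTIONS]

def majority_label(votes):
    """Resolve disagreements by majority vote, ignoring abstentions.

    This is a deliberately simple stand-in: it treats every rule as
    equally reliable, whereas a learned probabilistic model would
    estimate each rule's accuracy from the pattern of agreements.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN  # no rule had an opinion
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels: leave the example unlabeled
    return ranked[0][0]

if __name__ == "__main__":
    examples = [
        "Claim your prize at http://example.com now please",
        "Thanks, see you",
        "Free prize inside",
    ]
    for text in examples:
        votes = collect_votes(text)
        print(text, votes, majority_label(votes))
```

Examples on which the rules tie or all abstain stay unlabeled in this sketch; handling such cases better is part of what a learned model of the rules is meant to do.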
3. What social challenges motivate your work?
Margot: Most of my current research is motivated by the recognition that both families and governments provide vital support for child health and development, especially for the 50% of American children who live in families with low incomes. Public sector investments in families such as those all over the news during the pandemic (e.g., the Child Tax Credit, SNAP, the Earned Income Tax Credit) have the potential to narrow striking gaps in children’s opportunities by both increasing families’ resources and directly affecting children. My research examines the interaction between governments and families in affecting children’s opportunities and development. Some of my newest research is also focused on understanding the substantial challenges involved in helping families obtain the public benefits for which they are eligible, as well as identifying promising strategies for reducing these challenges and increasing participation in public safety net programs that are vital to the health and development of low-income children, children of color and children in immigrant families. In some of this research, I am partnering with Code for America, a data-driven organization committed to breaking down barriers between people and government.
Stephen: A lot of the applications we work on fall under the category of information extraction, meaning we have unstructured data and we want to pull out key information in a structured form that can be queried and analyzed. This problem arises in lots of domains where researchers, regulators, and reformers need access to better quality data, from healthcare to studying the criminal justice system.
4. What are your hopes for data science at Brown?
Margot: I’m excited by the potential for growing connections between DSI and the social sciences at Brown, as a supplement to the traditional disciplinary cores of data science. Using data to inform and improve social policies and to address social problems is at the heart of much research in sociology, economics and other fields, and it would be wonderful to continue to expand research and training opportunities for faculty and students on campus working in these areas. More broadly, it’s exciting to think about increasing connections between data-driven social scientists and the broader network of data scientists at Brown.
Stephen: I hope that data science will continue to grow as a bridge across disciplines. Brown is a particularly good place for forming transdisciplinary teams. We have the opportunity to connect the rapid advances in data science methods with the applications that can most benefit from them.
5. What’s your favorite thing about life in Rhode Island?
Margot: Since it’s summer right now, Del’s Lemonade (and Mr. Lemon!) and the beach! More seriously, though, I like that Providence offers lots of urban amenities without as many of the downsides of bigger cities, like terrible traffic. The proximity of beaches, mountains and cities within our broader region makes this a great place to live and work!
Stephen: Providence is by far the most walkable place I’ve ever lived. I love being able to get essentially wherever I need to go on foot through pretty neighborhoods.