Data Science Institute

Rapid Analysis and Visualization of Output from Topic Models

Research Project

A series of methods in genomics use multilocus genotype data to assign individuals membership in latent clusters that often correspond to geographic regions or methods of subsistence. These methods belong to the broad class of topic models, which also includes latent Dirichlet allocation, a method used to analyze text corpora.

Inference with topic models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from topic models are bottlenecks for investigators across multiple disciplines, from ecology to text data mining. We developed and are extending Pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with an interactive visualization. Pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared to competing methods.
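To illustrate why the Assignment Problem arises here, the following is a minimal sketch, not Pong's implementation. It assumes two runs produce membership matrices (rows are individuals, columns are clusters) whose cluster labels may be permuted, and it matches clusters across runs by maximizing a column similarity. The similarity measure (negative sum of squared differences), the function names, and the toy matrices are all hypothetical. For brevity the sketch searches every permutation, which is only feasible for a handful of clusters; an efficient solver such as `scipy.optimize.linear_sum_assignment` would replace that step in practice.

```python
from itertools import permutations


def column_similarity(q_a, q_b, i, j):
    """Similarity of cluster i in q_a and cluster j in q_b.

    Assumed metric: negative sum of squared differences in membership.
    """
    return -sum((row_a[i] - row_b[j]) ** 2 for row_a, row_b in zip(q_a, q_b))


def align_clusters(q_a, q_b):
    """Return perm where cluster k of q_a is matched to cluster perm[k] of q_b.

    Brute-force solution of the assignment problem: every one-to-one
    matching is scored and the best one is kept.
    """
    k = len(q_a[0])
    sim = [[column_similarity(q_a, q_b, i, j) for j in range(k)] for i in range(k)]
    best = max(
        permutations(range(k)),
        key=lambda p: sum(sim[i][p[i]] for i in range(k)),
    )
    return list(best)


def relabel(q_b, perm):
    """Reorder the columns of q_b so they line up with the clusters of q_a."""
    return [[row[perm[i]] for i in range(len(perm))] for row in q_b]


# Toy membership matrices: 4 individuals, 3 clusters (hypothetical values).
run_1 = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8], [0.5, 0.5, 0.0]]
run_2 = [[0.1, 0.0, 0.9], [0.7, 0.1, 0.2], [0.2, 0.8, 0.0], [0.5, 0.0, 0.5]]

perm = align_clusters(run_1, run_2)
aligned_run_2 = relabel(run_2, perm)
```

Once the columns are aligned, the two runs can be compared or plotted side by side with consistent cluster labels and colors.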

Research Lead

Sohini Ramachandran, Director of the Center for Computational Molecular Biology; Associate Professor of Biology; Associate Professor of Computer Science at Brown University

 

Funding Sources

NSF (CAREER to SR)