Most of my work involves analyzing public genomic datasets to answer important questions in biology and medicine. Here's a brief summary of some of my published work and work in progress.

Prioritizing metagenomic gene products in disease

It is now well established that the gut microbiome is disrupted in disease phenotypes such as inflammatory bowel disease (IBD), with changes in both microbial community composition and functional pathways. However, there is still a gap in translating this knowledge into actionable interventions in preventive, diagnostic or wellness schemes. In this project, we describe a computational protocol for discovering putative bioactive microbial peptides directly from gut metagenomes; these candidate peptides can then be interrogated further with physiological assays.

Tumor evolution

I have developed computational methods for transforming whole-genome tumor profiles into evolutionary trees, or phylogenies. A tumor phylogeny is a graphical representation of the changes that occur on the way from the root, which represents an ancestral healthy or normal state, to more advanced tumorigenic states. The specific changes depend on the starting data: they can be genomic aberrations, including mutations, structural rearrangements and copy number changes, or RNA-level information. In addition to providing insights into the flow of aberrant genome information during tumor progression, the tumor phylogeny highlights parallel cellular pathways that can lead to similar outward cancerous phenotypes, thus capturing tumor subtypes and their underlying mechanisms of action. For a quick primer, please see my book chapter.

To build evolutionary trees from genomic data, we usually want to start with genomic regions that serve as tumor progression markers. Often, these are regions whose copy number differs from the diploid copy number typical of healthy, normal tissue. Before the development of single-cell technologies, it was often important to delineate the heterogeneity of tumor tissues, which consist of mixed cell populations.
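To make the tree description above concrete, here is a minimal sketch of a rooted tumor phylogeny stored as a parent-to-children mapping, with a helper that lists every path from the normal root to a leaf. The node names (`gain_A`, `loss_B`, and so on) and the `root_to_leaf_paths` function are hypothetical and chosen only for illustration; they do not come from any of the published datasets.

```python
from typing import Dict, List

# Hypothetical toy phylogeny: each node maps to its child states.
# The root "normal" stands for the ancestral healthy state.
TREE: Dict[str, List[str]] = {
    "normal": ["gain_A", "loss_B"],
    "gain_A": ["gain_A+gain_C"],
    "loss_B": ["loss_B+gain_C"],
    "gain_A+gain_C": [],
    "loss_B+gain_C": [],
}


def root_to_leaf_paths(tree: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Return every path from `root` to a leaf, in depth-first order."""
    children = tree.get(root, [])
    if not children:
        return [[root]]
    paths: List[List[str]] = []
    for child in children:
        for tail in root_to_leaf_paths(tree, child):
            paths.append([root] + tail)
    return paths


if __name__ == "__main__":
    for path in root_to_leaf_paths(TREE, "normal"):
        print(" -> ".join(path))
```

In this toy tree, two distinct paths both end in a state that carries `gain_C`, which is the kind of parallel pathway toward a similar phenotype that a tumor phylogeny is meant to expose.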


In Subramanian et al. (2012), I presented the application of a generalized tumor phylogeny pipeline to breast tumor array comparative genomic hybridization (aCGH) copy number variation data. The resulting tree illustrated parallel pathways for the co-amplification of the gene ERBB2, consistent with the literature on treatment-sensitive and treatment-resistant patients. The pipeline involved developing statistical methods for inferring meaningful progression markers and for discretizing that information into a form amenable to downstream tree reconstruction algorithms.
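As a rough illustration of what discretizing copy number data can look like, the sketch below maps log2 ratios to loss, neutral and gain states with fixed cutoffs. This is plain thresholding, not the statistical method used in the paper, and the function name `discretize_log_ratios` and the cutoffs of -0.3 and 0.3 are assumptions made for this example.

```python
from typing import List, Sequence


def discretize_log_ratios(
    log_ratios: Sequence[float],
    loss_threshold: float = -0.3,  # assumed cutoff, not taken from the paper
    gain_threshold: float = 0.3,   # assumed cutoff, not taken from the paper
) -> List[int]:
    """Map each log2 ratio to -1 (loss), 0 (neutral) or 1 (gain)."""
    states: List[int] = []
    for value in log_ratios:
        if value <= loss_threshold:
            states.append(-1)
        elif value >= gain_threshold:
            states.append(1)
        else:
            states.append(0)
    return states


if __name__ == "__main__":
    # Hypothetical probe values along one chromosome arm.
    print(discretize_log_ratios([-0.8, 0.05, 0.6, 0.1]))
```

The resulting vector of discrete states per sample is the kind of input that character-based or distance-based tree reconstruction algorithms typically accept.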


To improve the accuracy of progression marker inference, I developed a novel multi-sample Hidden Markov Model, “HMM-CNA”, in Subramanian et al. (2013). HMM-CNA infers evolutionarily meaningful progression markers from cohorts of patient tumor profiles, and on simulated data these markers accurately reconstruct evolutionary histories.
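For readers unfamiliar with how a Hidden Markov Model segments copy number data, here is a minimal single-sample sketch using the standard Viterbi algorithm with three states and Gaussian emissions. It is not HMM-CNA, which is a multi-sample model; the state means, the shared standard deviation `SIGMA` and the self-transition probability `STAY` are all assumed values picked for illustration.

```python
import math
from typing import List, Sequence

STATES = ("loss", "neutral", "gain")
MEANS = (-0.5, 0.0, 0.5)  # assumed mean log2 ratio per state
SIGMA = 0.2               # assumed shared emission standard deviation
STAY = 0.9                # assumed probability of staying in the same state


def log_gauss(x: float, mean: float, sigma: float) -> float:
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def viterbi(observations: Sequence[float]) -> List[str]:
    """Return the most likely state sequence for the observed log2 ratios."""
    if not observations:
        return []
    n = len(STATES)
    log_stay = math.log(STAY)
    log_move = math.log((1 - STAY) / (n - 1))

    # Uniform prior over states for the first observation.
    score = [math.log(1 / n) + log_gauss(observations[0], MEANS[s], SIGMA) for s in range(n)]
    backpointers: List[List[int]] = []

    for x in observations[1:]:
        new_score: List[float] = []
        pointers: List[int] = []
        for s in range(n):
            candidates = [score[p] + (log_stay if p == s else log_move) for p in range(n)]
            best_prev = max(range(n), key=candidates.__getitem__)
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + log_gauss(x, MEANS[s], SIGMA))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


if __name__ == "__main__":
    print(viterbi([0.0, 0.02, 0.5, 0.52, 0.48, 0.0, 0.01]))
```

With these assumed parameters, a run of three elevated values is called as a gain segment, while a single elevated value surrounded by neutral ones is smoothed over, because switching state twice costs more than one poorly fitting observation.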


In Subramanian et al. (2015), we extended phylogeny inference to single-cell sequencing data by designing a novel yet simple heuristic for retrieving features from single-cell DNA sequences for phylogeny building.