Indiana University
IU School of Medicine
Yunlong Liu Lab
Research - Yunlong Liu Lab

Research

The Liu Laboratory (Laboratory for Computational Genomics) uses systems biology approaches to understand regulatory mechanisms of gene expression, including transcriptional regulation, post-transcriptional regulation, and epigenetic regulation.  This area involves several interdisciplinary components, including functional genomics, genetics, computational and statistical modeling, computer science/engineering, and data management.  


Transcriptional regulation

Identifying the binding sites of functional transcription factors is an important part of understanding the regulatory mechanisms that control cellular responses to biological perturbations. These binding sites are a key part of cellular control signals, and their dysfunction can contribute to the occurrence and progression of many diseases. Most current strategies search for common sequence motifs in the promoter regions of co-regulated genes, based on the assumption that binding sites for one or a few transcription factors reside in the regulatory regions of those genes. During the past several years, we have developed a model-based approach, MotifModeler [1], that identifies functional binding sites from array-derived gene expression data. This approach has several unique features. First, it incorporates the combinatorial effects of binding sites for different transcription factors. Second, in addition to identifying binding sites, MotifModeler estimates the functional effects (stimulatory or inhibitory roles on gene expression) of predicted motifs under the conditions compared. Third, the prediction is based on mRNA expression levels under contrasting conditions, e.g., presence or absence of a drug, so that it finds sequence motifs relevant to particular biological effects rather than just consensus motifs across multiple species. We have applied this approach to several biological systems, such as interferon stimulation of peripheral blood monocytes [1], anabolic responses in bone cells to mechanical loading and administration of bone morphogenetic proteins (BMPs) [2], regulatory mechanisms responsible for fetal alcohol syndrome [3], and androgen dependency in prostate cancer [4]. In these applications, MotifModeler identified transcriptional mechanisms in complex biological systems that motif searches based on co-regulation alone do not capture.
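
The idea of estimating stimulatory or inhibitory motif effects from expression contrasts can be illustrated with a generic additive linear model. The sketch below is not MotifModeler itself; it is a minimal example that assumes each gene's log expression ratio is an intercept plus a sum of per-motif effects, and fits those effects by ordinary least squares. The function name, the toy motif matrix, and the ratio values are all hypothetical.

```python
def fit_motif_effects(motif_matrix, log_ratios):
    """Least-squares fit of log_ratio ~ intercept + sum_j effect_j * motif_j.

    motif_matrix: one row per gene; each row holds 0/1 (or counts) for
                  every candidate motif in that gene's promoter.
    log_ratios:   log expression ratio (treated vs. control) per gene.
    Returns [intercept, effect_1, ..., effect_m].
    """
    rows = [[1.0] + [float(x) for x in row] for row in motif_matrix]
    n = len(rows[0])
    # Build the normal equations (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_ratios)) for i in range(n)]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("motif columns are collinear; effects cannot be separated")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical toy data: motif A raises expression, motif B lowers it.
motifs = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 1]]
ratios = [0.0, 1.0, -0.5, 0.5, 1.0, -0.5]
intercept, effect_a, effect_b = fit_motif_effects(motifs, ratios)
```

A positive fitted effect corresponds to a stimulatory motif and a negative one to an inhibitory motif under the conditions compared. Because all motifs are fitted jointly, a gene carrying both motifs contributes to both estimates, which is the simplest form of a combinatorial model.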


Alternative splicing

Alternative splicing is a major source of proteome diversity, and deregulation of the splicing process is associated with a variety of disease states and conditions. In collaboration with Dr. Jeremy Sanford’s group at the University of California at Santa Cruz, we recently completed a transcriptome-wide mapping of the direct mRNA targets of Splicing Factor 2 (SF2), a nucleo-cytoplasmic shuttling RNA-binding protein that is required for regulating and selecting splice sites in eukaryotic mRNA. This is one of the first mappings of the binding patterns of an RNA-binding protein using next generation sequencing technology; the work is published in Genome Research [5]. In addition, we have initiated an effort to identify cis-acting RNA elements from exon array-derived alternative splicing data. The preliminary data were presented at the 8th IEEE International Conference on BioInformatics and BioEngineering.

We recently initiated an effort to elucidate alcohol-induced alternative splicing in liver (funded by the NIAAA). In this project, we hypothesize that one of the ways alcohol damages the liver and leads to liver disease is by altering the amounts and types of proteins produced in the liver. We propose to study one mechanism that regulates the production of liver proteins: alternative splicing. Alternative splicing can produce different forms of a protein and can alter the amount of a given protein, and in some circumstances it has been shown to be associated with disease. We will investigate the effects of alcohol exposure on alternative splicing and protein production in liver. This work has potential implications for the diagnosis and understanding of alcoholic liver disease. In this study, we will test the hypothesis that distinct combinations of cis-acting elements dictate alcohol-induced patterns of alternative splicing. Our three specific aims are: (1) identify alcohol-induced alternative pre-mRNA splicing in rat hepatoma cells and rat liver; (2) computationally discover alcohol-induced cis-acting regulatory elements; and (3) test whether the model generalizes to human cells.


microRNA regulation

We are also interested in investigating the functions of microRNAs, which play critical roles in multiple biological processes, including cell cycle control, cell growth and differentiation, apoptosis, and embryonic development. Our work on microRNA regulation follows three major directions. First, we developed a methodology that identifies functional microRNAs responsible for global gene expression changes under specific biological conditions [3,4]. Using array-derived gene expression data, this algorithm generates testable hypotheses about putative microRNA functions. Second, we are using ChIP-seq-derived genome-wide binding patterns of RNA Polymerase II to identify microRNA regulatory regions, as part of an effort to identify regulatory circuits that include microRNAs. Third, we recently launched a web-based database system, miR2Disease, that documents known relationships between microRNA deregulation and human disease, and thereby produced the first microRNA-disease network [6].
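
As a rough illustration of how expression data can raise hypotheses about microRNA function, the sketch below compares the average expression change of a microRNA's predicted targets with that of all other genes. This is a deliberately simple stand-in, not the algorithm described in [3,4]; the function name, gene identifiers, and values are hypothetical.

```python
from statistics import mean


def target_shift(log_ratios, targets):
    """Mean log ratio of predicted targets minus mean of all other genes.

    log_ratios: dict mapping gene -> log expression ratio (condition vs. control).
    targets:    set of genes predicted to carry the microRNA's binding site.
    A clearly negative value suggests the targets are repressed as a group.
    """
    inside = [v for g, v in log_ratios.items() if g in targets]
    outside = [v for g, v in log_ratios.items() if g not in targets]
    if not inside or not outside:
        raise ValueError("need at least one target and one non-target gene")
    return mean(inside) - mean(outside)


# Hypothetical values for illustration only.
ratios = {"g1": -1.0, "g2": -0.5, "g3": 0.25, "g4": 0.25}
shift = target_shift(ratios, {"g1", "g2"})
```

In practice such a score would need a significance test and a correction for the number of microRNAs examined; the sketch only shows the direction-of-effect reasoning.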


Epigenetic regulation

Epigenetics, part of the NIH Roadmap, is an emerging frontier of science that studies changes in the regulation of gene activity and expression that do not depend on gene sequence. We are interested in understanding the regulatory roles of two important epigenetic markers, DNA methylation and histone modification. Despite recent genome-wide mapping of epigenetic markers in several tissues, there is a significant gap between knowing "where they bind" and understanding "how these binding patterns affect gene expression". We are using systems biology approaches to construct an integrated model that addresses this question. We found that integrating transcription factor binding data with DNA methylation data explains gene expression profiles considerably better than either data type alone.


Establishing bioinformatics infrastructure for next generation sequencing

As an emerging technology, next generation sequencing will play an increasing role in genomic and genetic research. It generates enormous amounts of data; each run produces more than 1 terabyte of sequence data. Consequently, data management and bioinformatic analysis of the results can be a major bottleneck for these studies. To meet this challenge, my laboratory has developed the MyTrack system (http://watson.compbio.iupui.edu/mytrack), a web-based data portal that allows users to manage, maintain, visualize, and share functional genomic data produced through various high throughput technologies, including next generation sequencing, array-related technology, and SNP genotyping. Built on a local version of the UCSC Genome Browser, the system is designed to meet the growing need for data and project management associated with high throughput sequencing. The successful development of such a system is critical for convincing reviewers that our institution can manage and share the large amount of information this technology will produce, and it will therefore have a substantial impact on future grant applications from the IU School of Medicine.

We have extensive experience in the analysis of next generation sequencing data. We have been involved in data analysis for several projects in collaboration with investigators at IUSM, the Ohio State University, the University of California at Santa Cruz, and Harbin Institute of Technology. These projects involve different kinds of biological applications, including transcriptome analysis (RNA-seq, with Drs. Clare [7], Sanford, and Nephew), protein-DNA binding analysis (ChIP-seq, with Drs. Huang [8,9], Nephew, and Dong), protein-RNA binding analysis (CLIP-seq, with Drs. Sanford and Edenberg [5,10]), and epigenetic regulation (ChIP-seq on histone marks, with Dr. Huang, and methyl-seq for epigenetic studies, with Drs. Huang and Lin).

My group recently won the 2009 CAMDA (Critical Assessment of Massive Data Analysis) Best Presentation Award. CAMDA is an annual conference, initially designed to evaluate bioinformatics methods for analyzing microarray data, which last year shifted to methods for next generation sequencing data (http://www.camda2009.org). For this contest, we designed a statistical approach that uses ChIP-seq-derived RNA polymerase II binding data to identify the promoter regions and transcription start sites of primary microRNAs, thereby enabling the investigation of regulatory elements that control microRNA transcription and helping to explain the microRNA-mediated regulatory network. This work was also featured in GenomeWeb news.
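
To make the promoter-finding idea concrete, the sketch below scans the region upstream of an annotated microRNA and reports the fixed-size window containing the most RNA polymerase II ChIP-seq reads. It is a naive count-based illustration under stated assumptions (plus-strand microRNA, reads given as start coordinates on one chromosome, arbitrary window sizes), not the statistical approach presented at CAMDA; all names and coordinates are hypothetical.

```python
from bisect import bisect_left


def densest_upstream_window(read_starts, mirna_start,
                            search_span=50000, window=1000, step=100):
    """Return (window_start, read_count) for the upstream window with most reads.

    read_starts: Pol II ChIP-seq read start coordinates on one chromosome.
    mirna_start: genomic start of the annotated microRNA (plus strand assumed).
    Windows are half-open intervals [start, start + window).
    Returns (None, -1) if no window fits upstream of the microRNA.
    """
    reads = sorted(read_starts)
    best = (None, -1)
    lo = max(0, mirna_start - search_span)
    for start in range(lo, mirna_start - window + 1, step):
        # Count reads in [start, start + window) with two binary searches.
        left = bisect_left(reads, start)
        right = bisect_left(reads, start + window)
        count = right - left
        if count > best[1]:  # ties keep the most upstream window
            best = (start, count)
    return best


# Hypothetical reads: a cluster near 10,000 and one stray read at 30,000.
reads = [10050, 10100, 10200, 10300, 30000]
best = densest_upstream_window(reads, mirna_start=40000)
```

A realistic analysis would compare read counts against a background model and account for neighboring genes' promoters; the window scan only shows where an enrichment signal would be looked for.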


References:

1. Liu Y, Taylor MW, Edenberg HJ (2006) Model-based identification of cis-acting elements from microarray data. Genomics 88: 452-461.

2. Chen AB, Hamamura K, Wang G, Xing W, Mohan S, et al. (2007) Model-based comparative prediction of transcription-factor binding motifs in anabolic responses in bone. Genomics Proteomics Bioinformatics 5: 158-165.

3. Wang G, Wang X, Wang Y, Yang JY, Li L, et al. (2008) Identification of transcription factor and microRNA binding sites in responsible to fetal alcohol syndrome. BMC Genomics 9 S1: S19.

4. Wang G, Wang Y, Feng W, Wang X, Yang JY, et al. (2008) Transcription factor and microRNA regulation in androgen-dependent and -independent prostate cancer cells. BMC Genomics 9 S2: S22.

5. Sanford JR, Wang X, Mort M, Vanduyn N, Cooper DN, et al. (2009) Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts. Genome Res 19: 381-394.

6. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, et al. (2009) miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res 37: D98-104.

7. Radovich M, Clare SE, Pardo I, Hancock BA, Sledge GW, et al. (2009) Next-Generation Whole Transcriptome Sequencing of Triple-Negative Breast Tumors and Normal Tissues. San Antonio Breast Cancer Symposium. San Antonio, TX.

8. Feng W, Liu Y, Wu J, Nephew KP, Huang TH, et al. (2008) A Poisson mixture model to identify changes in RNA polymerase II binding quantity using high-throughput sequencing technology. BMC Genomics 9 S2: S23.

9. Camerlengo T, Ozer HG, Teng MP, F., Yang P, Li L, et al. Enabling Data Analysis on High-throughput Data in Large Data Depository Using Web-based Analysis Platform – A Case Study on Integrating QUEST with GenePattern in Epigenetics Research; 2009; Washington DC.

10. Wang X, Wang G, Shen C, Li L, Mooney SD, et al. (2008) Using RNase sequence specificity to refine the identification of RNA-protein binding regions. BMC Genomics 9 S1: S17.