I am interested in solving statistical and algorithmic problems in computational biology. The beauty of this exciting area lies in the fact that it has a direct impact on the real world, and that statistics and algorithms really matter for mining its data. I have worked on sequencing DNA by hybridization using gapped probes, calculating statistics for sequencing proteins by tandem mass spectrometry, exploring statistical properties of assemblies for whole-genome shotgun methods, estimating the repeat structure of a genome from random reads in assembly, finding transcription factor binding sites (TFBS) by comparative genomics, and analyzing microarray expression data, among other problems. Currently, I am mainly working on the following problems: (1) developing algorithms to find TFBS by comparative genomics, especially in mammals; (2) building better models to explain the evolution of non-coding DNA sequences; and (3) exploring better ways of integrating heterogeneous data to predict gene function, predict protein-protein interactions, build gene regulatory networks, and so on.

