General Research Interests:

As a strong proponent of systems theory, I am consistently interested in extracting valuable, holistic, and general insights from large biological data sets.
  • Genotype-phenotype connections: From a genetics perspective, one of my main goals is to uncover hidden links between genotype and phenotype. Because current models still rest on rather rudimentary assumptions, new biologically inspired models (accounting for heterogeneity, regulatory networks, etc.) are very likely to yield new discoveries in the vast existing datasets.

  • RNA & protein kinetics: Transcription initiates the process of translating genetic information into function. Despite extensive coverage in molecular biology textbooks, our understanding of this process remains far from complete. I focus on fundamental transcription processes, particularly those related to variation in gene structure (such as alternative splicing and gene fusion). I would also like to study the various post-transcriptional and post-translational modifications that profoundly affect functional outcomes. These modifications tend to operate independently of genetic information and respond adaptively to environmental changes, from the cellular to the individual level.

  • Systematic approaches in genetic enhancement: We now have a remarkable opportunity to reconfigure the genetic code to drive breeding improvement, whether by enhancing existing phenotypes or by engineering entirely new traits. Integrating the knowledge gained from the areas above with additional systematic genomic strategies can greatly accelerate this process. Such strategies include improving the predictability of gene editing, helping to optimize gene-editing tools, and using genomic selection for precision breeding predictions.

  • The game and business between plants and microbes: My belief in the role of microbes was strengthened by the significant improvement in my diarrhea problems after I took supplements containing intestinal flora (I have no conflict of interest to declare). My curiosity now turns to the importance of microbes for plants: how they fight, trade, or take what they want while in a temporary equilibrium with each other. Microorganisms are emerging as a key biotic factor in understanding genotype-by-environment (GxE) interactions.

Current Projects:

  • Contributions of genetic heterogeneity to complex traits: Genome-wide association studies (GWAS) are now a routine tool for studying complex traits, yet it is widely recognized that only a tiny fraction of the genetic variation can be explained by mappable loci. Although polygenicity remains the simplest explanation, genetic heterogeneity may also contribute. I am developing a new pipeline to handle genetic heterogeneity in plants.

  • Discovering DNA methylation directly from PacBio long reads: Whole-genome bisulfite sequencing (WGBS) is considered the "gold standard" for identifying DNA methylation, but typically <80% of BS-seq reads (100-150 bp) can be unambiguously mapped, leaving many regions unexplored. The unmappable and multi-mappable sequences (e.g. repeated elements) are of particularly high interest for epigenetics. Long-read sequencing (LRS) technology provides nearly complete genome coverage and, fortunately, also captures the modification information. In this project, I would like to improve the current approach to reading epigenetic modifications directly from the raw sequencing data. By constructing the pan-methylome in Arabidopsis, we will then explore the links between complex genetic variation and DNA methylation.

  • The 1001 Phenomes Project: The availability of the 1001 genomes has been critical for the Arabidopsis community in performing GWAS. These GWAS have been conducted on a variety of phenotypes. With few exceptions, however, these phenotypes have been measured on subsets of at most a few hundred accessions. In this project, we generated high-quality phenotypes for 1000 accessions from the 1001 collection. The large number of accessions and the finely controlled environment should make these data very powerful for finding genetic associations and for highlighting the dynamics of genetic architecture.

  • The 1001g+ Project: With the rapid development of long-read sequencing technologies, we are approaching the time when every individual will have her/his/its own complete genome. Using Arabidopsis as a model, I am exploring how many mistakes have been made so far and what hidden or new biology can be uncovered with (nearly) complete genomes at the population and species level.