We are a computational genomics lab based in the Department of Biomedical Informatics and Data Science (BIDS) at Yale University, School of Medicine. Our research is dedicated to creating highly efficient computational methods for genomic applications, including genome assembly, read alignment, variant calling, and string indexing. We have developed a series of de novo genome assembly algorithms (e.g. hifiasm) that have been used extensively in large-scale sequencing projects, such as the Human Pangenome Reference Consortium, the Vertebrate Genomes Project, and the Darwin Tree of Life project. Within these projects, we also work closely with collaborators to explore the applications of genome assemblies. Our lab is always open to new collaboration opportunities with both basic science and clinical research groups.


  • We are currently seeking postdocs, students, and staff to join our team. For more information, please view our available vacancies.

Research interests

Complex genome reconstruction with de novo assembly

De novo assembly, especially haplotype-resolved de novo assembly, has been a central problem in bioinformatics for four decades and remains one of its most challenging tasks. It draws on multiple advanced techniques, such as sketching, alignment, and many branches of graph theory, and demands programming skills of the highest level. We have developed a series of de novo assembly algorithms, including hifiasm, hifiasm (Hi-C) and hifiasm (UL), which are designed to produce optimal genome assemblies by combining different data types. These algorithms have been widely used and have become the dominant long-read genome assemblers. Currently, we are particularly interested in developing de novo assembly algorithms for complex genomes with polyploid alterations, such as cancer genomes and polyploid plant genomes.


Comprehensive variant calling and interpretation

For the human genome, variant calling is typically performed through read alignment, in which fragmented reads are aligned back to the human reference genome. However, the generic reference genome often lacks information specific to an individual, leading to potential inaccuracies and biases, especially in highly repetitive and structurally divergent regions. Consequently, there is a rapidly growing demand for de novo genome assembly, a methodology that reconstructs the genome without relying on a reference. Leveraging our computational expertise, we aim to develop innovative variant calling and interpretation methods that are based on de novo genome assembly.


Selected publications

  • Cheng H, Asri M, Lucas J, Koren S, Li H#. “Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph.” Nat Methods (2024).

  • Cheng H, Jarvis ED, Fedrigo O, Koepfli KP, Urban L, Gemmell NJ, Li H#. “Haplotype-resolved assembly of diploid genomes without parental data.” Nat Biotechnol (2022).

  • Cheng H, Concepcion GT, Feng X, Zhang H, Li H#. “Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm.” Nat Methods (2021).

  • Cheng H, Wu M, Xu Y#. “FMtree: a fast locating algorithm of FM-indexes for genomic data.” Bioinformatics (2018).