What Is a Pan-Genome?

Over the long course of evolution, individuals within a species have accumulated distinct genetic traits as a result of natural and artificial selection, among other factors. In the classical view of intraspecific genetic variation, the genome of each individual is described as a small set of variants relative to a common reference genome. A common application is the analysis of variant-trait associations based on single-nucleotide polymorphisms (SNPs) in population genetic studies, such as quantitative trait locus (QTL) mapping and genome-wide association studies (GWAS).

 

In recent years, comparative analyses of genomes or genomic fragments from multiple individuals of the same species have shown that a single reference genome is not sufficient to capture the genetic diversity of a species: generally, only 50%-80% of resequencing data from different ecotypes can be aligned to a reference genome.

 

These findings suggest that genomes within a species may differ in more substantial ways, including through diverse structural variants (SVs), each of which may span one or more genes. A large number of studies have also shown that SVs play a key role in important agronomic traits (e.g., resistance to biotic and abiotic stresses, flowering time, plant architecture, yield, and seed or fruit quality). These results imply that the functional gene content of a species is more variable than previously thought.

 

Thus, much meaningful genetic information may be lost if only a single reference genome is used to study the genetic variation and domestication of a species. Together, these factors have driven the construction and study of plant and animal pangenomes.

 

Differences Between Plant and Animal Pangenomes

The focus of pangenome studies differs among taxa. Bacterial genomes consist mainly of coding genes with relatively little non-coding sequence, so bacterial pangenome studies focus on protein-coding gene content. Plant and animal genomes, by contrast, contain large amounts of non-coding sequence, some of it functional, so eukaryotic pangenomes are studied in both sequence-based and gene-based forms. Unlike plant pangenomes, the published animal pangenomes are mostly sequence-based.

 

Quantitatively, plant pangenomes are much larger than animal pangenomes. Studies of human pangenomes have found 5 Mb of additional sequence in Asian and African human genomes relative to the human reference genome, which is based primarily on samples of European origin. One human pangenome study reported that the human pangenome contains about 10% more sequence than the reference genome. Other animal pangenomes include those of pigs (72.5 Mb of extra sequence) and mice (14-75 Mb of non-reference sequence in each of 16 mice), all of which are within three times the size of the human pangenome.

 

Why is the plant pangenome much larger than the mammalian pangenome?

The answer can be found by considering the mutational and population genetic processes that generate diversity within a species. Like single-nucleotide variation, structural variation first arises by mutation; neutral variants are then subject to genetic drift, while other variants are either fixed (positive selection) or lost (negative selection). The key parameters for analyzing the pan-genome are therefore the mutation rate of structural variants, the effective population size, which governs genetic drift, and the relative proportion of neutral SVs. Compared with animals, plants have more inbred lines, possess more agronomic traits, and can reproduce in more ways; these numerical advantages lead to the expectation that plants have larger pan-genomes.

 

At the mutational level, the large number of recently amplified transposable elements (TEs) can generate sequence duplications and deletions, providing ample substrate for non-allelic (unequal) homologous recombination. Active TEs can also mobilize adjacent sequences and produce structural changes. Hybridization with other taxa can likewise add new sequences; from the perspective of natural genetic variation, the effect of such gene flow is similar to that of mutation. In addition, because flowering plant species have a history of ancient genome duplication, the remaining functional redundancy may allow more SVs (especially deletions) to be neutral. A second key parameter is effective population size: larger populations produce more mutations and retain more variation because drift is weaker. The larger effective population size of plants compared with mammals explains the more than 10-fold increase in persistent single-nucleotide variation.
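The link between effective population size and persistent neutral variation can be made concrete with the standard population genetic expectation for a diploid population at mutation-drift equilibrium, theta = 4 * Ne * mu. The sketch below is only an illustration: the function name and the values for Ne and mu are hypothetical placeholders, not estimates for any real plant or mammal.

```python
def expected_neutral_diversity(effective_size: float, mutation_rate: float) -> float:
    """Expected neutral diversity (theta = 4 * Ne * mu) for a diploid
    population at mutation-drift equilibrium."""
    return 4.0 * effective_size * mutation_rate


# Hypothetical per-site, per-generation mutation rate (assumed equal in both groups).
MU = 1e-8

# Hypothetical effective population sizes, chosen only to differ 10-fold.
mammal_theta = expected_neutral_diversity(10_000, MU)
plant_theta = expected_neutral_diversity(100_000, MU)

fold_difference = plant_theta / mammal_theta
print(f"mammal-like theta: {mammal_theta:.1e}")
print(f"plant-like theta:  {plant_theta:.1e}")
print(f"fold difference:   {fold_difference:.1f}")
```

Under this simple model, diversity scales linearly with effective population size when the mutation rate is held constant, so a 10-fold larger Ne gives a 10-fold higher expected neutral diversity.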

 

Applications of Animal Pangenomes

For animals used in livestock breeding, a single species often comprises many strains, subspecies, or breeds as a result of artificial selection or natural environmental selection. The large phenotypic and genotypic differences between populations before and after domestication may be hidden in the genome of each strain, so it is particularly important to have a pan-genome that captures both what these strains have in common and how breeds differ.

 

Usually, in addition to pan-genome construction, analyses of core, dispensable, and private genes are performed to study species-related agronomic traits and functional genes. Structural variation is the main focus of pangenome research: information on presence/absence variation (PAV), inversions, translocations, and copy number variation is used to find the key variant loci that cause trait differences among strains. In addition, the pan-genome can be combined with 3D epigenetic regulation, population genetics and evolution, genome-wide association analysis, transcriptional co-expression, differential metabolites, and database construction and storage for in-depth data mining.
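The core, dispensable, and private categories above can be derived from a gene presence/absence (PAV) matrix. The following is a minimal sketch under simple assumptions: a gene is "core" if present in every individual, "private" if present in exactly one, and "dispensable" otherwise. The function name, gene names, and breed names are hypothetical and chosen only for illustration.

```python
from typing import Dict


def classify_genes(pav: Dict[str, Dict[str, bool]]) -> Dict[str, str]:
    """Label each gene from a PAV table mapping gene -> {individual: present?}."""
    labels = {}
    for gene, calls in pav.items():
        n_present = sum(calls.values())  # True counts as 1
        n_total = len(calls)
        if n_present == 0:
            labels[gene] = "absent"       # not observed in any individual
        elif n_present == n_total:
            labels[gene] = "core"         # shared by all individuals
        elif n_present == 1:
            labels[gene] = "private"      # found in a single individual
        else:
            labels[gene] = "dispensable"  # found in some but not all
    return labels


# Hypothetical PAV calls for three genes across three breeds.
pav_table = {
    "geneA": {"breed1": True, "breed2": True, "breed3": True},
    "geneB": {"breed1": True, "breed2": False, "breed3": True},
    "geneC": {"breed1": False, "breed2": True, "breed3": False},
}

for gene, label in classify_genes(pav_table).items():
    print(gene, label)
```

In practice, studies may relax these thresholds (for example, treating genes present in nearly all individuals as core) to tolerate assembly and annotation gaps; the strict cutoffs here are an assumption made to keep the sketch short.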

 

References:

  1. Golicz, Agnieszka A., et al. "Pangenomics comes of age: from bacteria to plant and animal applications." Trends in Genetics 36.2 (2020): 132-145.
  2. Gong, Ying, et al. "A review of the pangenome: how it affects our understanding of genomic variation, selection and breeding in domestic animals?" Journal of Animal Science and Biotechnology 14.1 (2023): 1-19.