Genome-wide association study and genomic selection to assist forest breeding of eucalyptus
Forest tree breeding is a domestication method that maintains the biodiversity of the targeted species, helps mitigate climate change and provides a wide range of commercial plant products to meet human needs. However, conventional tree domestication is a very time- and labor-intensive process that considers only phenotypes during breeding. With recent advances in next-generation sequencing (NGS), genome-wide association studies (GWAS) and genomic selection (GS) have the potential to accelerate forest tree breeding. The trees of interest here are two Eucalyptus species (E. grandis and E. urophylla). Eucalyptus is among the most widely planted commercial trees in the world, owing to its fast growth rate, wide climatic adaptability and the wood and fiber properties favored by the pulp and paper industry. In this study, we aim to shorten the breeding cycle of Eucalyptus by applying GWAS and GS methods to single nucleotide polymorphism (SNP) data collected from 1118 Eucalyptus trees genotyped with the EUChip60K SNP chip. Evaluating the GWAS and GS methods requires HPC resources with parallel computation, because the more complex multivariate analyses must be run on large genotypic data sets. The GWAS and GS computational models will be used to identify the relationship between genetic data and complex phenotypic traits. GWAS models scan the whole Eucalyptus genome for variants, covering both common and rare alleles, and are suited to large populations, in which the effects of individual polymorphisms on a trait can be detected and estimated. GS, in contrast, builds a prediction model from phenotypic and genotypic data collected from a training population. The model is then used to predict the genomic breeding values of progeny in future generations. This study is a continuation of our previous project (SNIC 2015/7-50).
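The genome scan behind GWAS can be illustrated with a minimal single-marker sketch. It is not the analysis pipeline of this project: the data are simulated, the function name single_marker_scan and all parameter values are hypothetical, and a real analysis would also correct for population structure and relatedness. Each SNP is coded as an allele dosage (0, 1 or 2) and the trait is regressed on one SNP at a time.

```python
import math
import random


def single_marker_scan(genotypes, phenotypes):
    """Regress the phenotype on each SNP separately.

    genotypes: one list of allele dosages (0, 1 or 2) per tree.
    phenotypes: one trait value per tree.
    Returns one (slope, t_statistic) pair per SNP.
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    results = []
    for j in range(len(genotypes[0])):
        x = [row[j] for row in genotypes]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:  # a monomorphic SNP carries no information
            results.append((0.0, 0.0))
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, phenotypes))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, phenotypes))
        se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
        t_stat = slope / se if se > 0 else float("inf")
        results.append((slope, t_stat))
    return results


# Simulated toy data (assumption): 200 trees, 50 SNPs, one causal SNP.
random.seed(1)
n_trees, n_snps, causal = 200, 50, 7
genotypes = [[random.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_trees)]
phenotypes = [2.0 * row[causal] + random.gauss(0, 1) for row in genotypes]

scan = single_marker_scan(genotypes, phenotypes)
# Index of the SNP with the largest absolute t-statistic.
top_snp = max(range(n_snps), key=lambda j: abs(scan[j][1]))
print("strongest association at SNP", top_snp)
```

On real data, the same per-SNP test would be repeated across tens of thousands of markers and many traits, which is one reason the scan benefits from parallel computation.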
Previously, we evaluated how the size and genetic composition of the training set and the density and distribution of the SNPs affect the fit of different models, with the aim of finding the model and parameters that give the highest prediction accuracy for each trait. (The manuscript “Evaluating the accuracy of genomic prediction of growth and wood traits in two Eucalyptus species and their F1 hybrids” has been submitted to BMC Plant Biology.)
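The kind of evaluation described above can be sketched as follows. This is only an illustration under stated assumptions, not the models or results of the manuscript: ridge regression on SNP dosages stands in for a genomic prediction model, the data are simulated, the shrinkage parameter lam is chosen by hand, and prediction accuracy is taken as the Pearson correlation between predicted and observed phenotypes in a held-out validation set. All function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(z, y, lam):
    """Estimate SNP effects from (Z'Z + lam*I) beta = Z'y on centered data."""
    n, p = len(z), len(z[0])
    col_means = [sum(row[j] for row in z) / n for j in range(p)]
    zc = [[row[j] - col_means[j] for j in range(p)] for row in z]
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    ztz = [[sum(zc[i][a] * zc[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    zty = [sum(zc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return mean_y, col_means, solve(ztz, zty)


def predict(model, z):
    """Predict phenotypes for new genotypes from the fitted SNP effects."""
    mean_y, col_means, beta = model
    return [mean_y + sum((g - m) * b for g, m, b in zip(row, col_means, beta))
            for row in z]


def pearson(u, v):
    """Pearson correlation between two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


# Simulated toy data (assumption): 150 trees, 30 SNPs with small additive effects.
random.seed(7)
n_trees, n_snps = 150, 30
effects = [random.gauss(0, 0.5) for _ in range(n_snps)]
geno = [[random.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_trees)]
pheno = [sum(g * e for g, e in zip(row, effects)) + random.gauss(0, 1) for row in geno]

# The last 30 trees are held out; training sets of increasing size are compared.
val_z, val_y = geno[120:], pheno[120:]
accuracy = {}
for n_train in (30, 60, 120):
    model = fit_ridge(geno[:n_train], pheno[:n_train], lam=4.0)
    accuracy[n_train] = pearson(predict(model, val_z), val_y)
    print(n_train, round(accuracy[n_train], 3))
```

The same loop could be repeated over SNP subsets of different density or over training sets of different genetic composition, which is where the computational cost grows quickly on real data.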