Bias in estimates of variance components in populations undergoing genomic selection: A simulation study


Summary

- The objectives of this study were to examine the effects of GS on estimates of VC in the analysis of different sets of phenotypes and to investigate VC estimation using different methods.
- (2) Pheno 1 + 2: phenotypes from both the conventional phase and GS phase (1–35 years).
- Results: In general, both the ssGBLUP and ssBR models with all the phenotypic and genotypic information (Pheno 1 + 2) yielded biased estimates of additive genetic variance compared to the P-base model.
- When the phenotypes from the conventional breeding phase were excluded (Pheno 2.
- P-AM led to underestimation of the genetic variance of P-base.
- Compared to the VCs of G-base, when phenotypes from the conventional breeding phase were ignored (Pheno 2), the ssBR model yielded unbiased estimates of the total genetic variance and marker-based genetic variance, whereas the residual variance was overestimated.
- Conclusions: The results show that neither of the single-step models (ssGBLUP and ssBR) can precisely estimate the VCs for populations undergoing GS.
- Overall, the best solution for obtaining unbiased estimates of VCs is to use P-AM with phenotypes from the conventional phase or phenotypes from both the conventional and GS phases.
- 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0.
- 2 Nordic Cattle Genetic Evaluation, DK-8200 Aarhus, Denmark. Full list of author information is available at the end of the article.
- Although several genomic prediction models are currently available, the choice of statistical model and of the data used to estimate VCs in the genomics era remains unclear.
- genotyped individuals are implicitly imputed, ssBR requires the explicit imputation of the markers for non-genotyped individuals, followed by fitting of the marker effects in the model.
- and a lack of information from the previous conventional breeding scheme (Pheno 2.
- Table 2 presents the means and standard deviations (SDs) of the estimated VCs and heritabilities over replicates.
- In general, the use of single-step methods with all the phenotypes and genotypes (Pheno 1 + 2) yielded biased estimates of the total genetic variance of P-base.
- P-AM led to the underestimation of genetic variance (P <.
- As expected, we obtained unbiased VC estimates and heritability when using P-AM with phenotypes only from the conventional phase (Pheno 1.
- In contrast to the VCs from P-base, when using phenotypes only from the conventional phase (Pheno 1.
- the genetic variance was significantly underestimated, and the residual variance was significantly overestimated.
- Furthermore, when including phenotypes from the GS phase in the model (Pheno 1 + 2.
- the genetic variance was significantly underestimated, although the estimated residual variance was unbiased.
- In contrast to the VCs from G-base, when ignoring data from the conventional breeding phase (Pheno 2.
- For ssBR, the convergence of the Gibbs sampler was assessed by estimating Monte Carlo error (MCE) (via batch means).
- The MCEs for the estimates of the total genetic variance and residual variance were on the order of $10^{-3}$, and those for the estimates of marker variances were on the order of $10^{-6}$.
- In contrast to VCs from P-base, when using phenotypes only from the conventional phase (Pheno 1.
- unbiased total genetic variance and residual variance were obtained.
- Conversely, when including phenotypes from the GS phase in the model (Pheno 1 + 2.
- the marker-based genetic variance was unbiased, but the total genetic variance was significantly underestimated (P <.
- In contrast to the VCs from G-base, when ignoring data from the previous conventional breeding phase (Pheno 2.
- ssBR yielded unbiased estimates of the total genetic variance and marker-based genetic variance, whereas the residual variance was overestimated.
- The first question addressed in this study was the impact of the choice of phenotypes from different phases of a breeding program on the estimation of VCs.
- Selection of data to be included in the estimation of genetic variance.
- We showed that P-AM yielded unbiased estimates of VC when including the phenotypes from the GS phase.
- We also demonstrated that when using phenotypes only from the conventional selection phase (Pheno 1.
- This was mainly caused by ignoring the information from the selection decisions.
- In the present study, based on the scenario of Pheno 2 with P-AM, our results confirmed the reduction of genetic variance due to ignoring the information from the conventional breeding phase; i.e., there are no phenotypes to account for the selection conducted in the previous period, and previous selection cannot be properly handled in the current model.
- Therefore, the use of P-AM including only the phenotypes from the GS phase (Pheno 2) resulted in biased estimates of VC.
- It can be expected that using such a more recent base population would lead to smaller estimates of genetic variance than using P-base.
- Consequently, it may be improper to compare VCs estimated in the GS phase (e.g., Pheno 2) with the VCs in P-base, the base population to which the pedigree generally refers.
- Our results confirmed that when phenotypes only from the GS phase (Pheno 2) were used, both ssGBLUP and ssBR yielded smaller genetic variance estimates compared with the VCs in P-base (Table 2).
- In the present simulation study, we directly used the allele frequencies calculated from P-base to avoid the compatibility issue between the $\mathbf{G}$ and $\mathbf{A}_{22}$ matrices in ssGBLUP.
- [34] by regressing the gene contents of ancestors on the genotypes of the progenies.
- Two estimates of genetic variance in ssBR.
- [21, 22], the ssBR model is essentially a marker effect model with all markers fitted in the model.
- Consequently, this feature results in a model with two estimates of additive genetic variance, i.e., the total genetic variance ($\sigma^2_\varepsilon$) and the marker-based genetic variance, which can be obtained by multiplying P m.
- When using phenotypes only from the conventional phase (Pheno 1.
- the estimated total genetic variance was unbiased.
- however, the marker-based genetic variance was biased upwards, although the allele frequencies from P-base were used.
- This result can be explained by the fact that only a small proportion (1.2%) of animals in the pedigree (only progeny-tested bulls) were genotyped, resulting in poor imputation for non-genotyped animals and biased estimation of marker variance.
- Apart from the estimated marker variance, the total genetic variance shows a relationship with the conditional variance of the breeding values of non-genotyped individuals ($\mathbf{g}_1$) given the breeding values of genotyped individuals ($\mathbf{g}_2$.
- A residual vector ($\boldsymbol{\varepsilon}$) accounting for the remaining portion of the breeding values that could not be modelled by the imputed markers was added to the marker-based breeding values to obtain the final $\mathbf{g}_1$.
- In addition, as pointed out by [21], in the single-step method, we do not observe $\mathbf{g}_2$, but $\mathbf{M}_2$.
- This indicates that the conditioning is on the observed marker information, and the conditional genetic variance estimated in ssBR is therefore actually only an approximation of the genetic variance.
- This study contributes to a better understanding of the effects of GS on VC estimation.
- The results show that neither of the single-step models (ssGBLUP and ssBR) can precisely estimate the VCs for populations undergoing GS.
- Furthermore, this study has demonstrated that when the complete data are analysed with both pre-GS data and data from the GS phase, the classic P-AM can yield unbiased estimates of VC.
- Therefore, an implication of these findings is that the best solution for obtaining unbiased estimates of VC is to use P-AM with phenotypes from the conventional phase or phenotypes from both conventional and GS phases.
- Populations that were similar to the Danish Jersey dairy cattle population in terms of the breeding scheme and population structure were simulated over a 35-year period with 5 replicates for each scenario.
- A recurrent mutation rate was set for both markers and QTLs to establish mutation-drift equilibrium in the historical generations.
- The number of recombinations per chromosome (per Morgan) was sampled from a Poisson distribution with a mean equal to the length of the chromosome, and crossovers were uniformly located along the chromosome.
- This part of the simulation was implemented with QMSim software [36].
- Generation 3000 was used as the base population, in which 40,000 SNPs were randomly chosen from the pool of 300,000 markers, and 2000 QTLs were randomly chosen from the pool of 3000 QTLs.
- (2) In the next phase, 20 years of conventional.
- Only cows in the first lactation, however, were assumed to have phenotypes.
- (3) In the last phase, 15 years of genomic selection were simulated.
- (2) Pheno 1 + 2: phenotypes from both the conventional phase and genomic selection phase (1–35 years) were used.
- An overview of the subsets used and the average number of individuals in the pedigree, phenotypes, and genotypes for each scenario over 5 replicates are shown in Table 1.
- The genetic variance of P-base was calculated from the variance of “true” breeding values (TBVs) based on animals from the founder population, while the genetic variance of G-base was calculated from the variance of TBVs based on animals from years 18, 19, and 20, i.e., the last three years before the start of GS (animals in G-base were related).
- $\mathbf{A}\sigma^2_g$, where $\mathbf{A}$ is the numerator relationship matrix, and $\sigma^2_g$ is the additive genetic variance.
- The model equation of the regular ssGBLUP model [16, 17] was the same as model (1) but used an $\mathbf{H}$ matrix that combines the marker-based ($\mathbf{G}$) and pedigree-based ($\mathbf{A}$) relationship matrices to replace the numerator relationship matrix ($\mathbf{A}$) in the classical animal model.
- where $\mathbf{A}_{22}^{-1}$ is the inverse of the pedigree-based relationship matrix for the genotyped individuals, and $\mathbf{G}$ is constructed according to [19].
- a Pheno 1: phenotypes from only the conventional phase (1–20 years) were used.
- Pheno 1 + 2: phenotypes from both the conventional phase and genomic selection phase (1–35 years) were used.
- estimated from the animals in P-base, whereas for the scenario of Pheno 2, the allele frequencies were estimated from the animals in G-base.
- as implemented in the DMU package [39].
- the imputation is conducted from the following linear relationship: M 1.
- $\sigma^2_\alpha$ is the variance of the marker effects under the assumption that all markers exhibit common genetic variance and can explain all additive genetic variance.
- The use of ssBR allows the inference of the additive genetic variance from two sources of information: first, the total genetic variance, approximated by the estimated imputation residual variance, $\sigma^2_\varepsilon$.
- Similar to ssGBLUP, for the scenarios of Pheno 1 and Pheno 1 + 2, allele frequencies were calculated based on the stored genotypes in the base population (P-base).
- Table 2. Mean (SD) of the “true” variance components and heritability in the base (founder) population (P-base) and the base population for the genomic phase (a), and estimates of variance components and heritabilities from P-AM, ssGBLUP, and ssBR based on three scenarios of phenotyping (b).
- $\sigma^2_g$ is the genetic variance used in P-AM, $\sigma^2_\varepsilon$ is the total genetic variance used in ssBR.
- $\sigma^2_e$ is the residual variance.
- $\sigma^2_\alpha$ is the marker variance.
- $\sum_{j=1}^{m} 2p_j(1 - p_j)$ is used to calculate genetic variance via the estimated marker variance in ssBR, where $p_j$ is the observed allele frequency at locus $j$, and $m$ is the total number of markers.
- b Pheno 1: phenotypes from only the conventional phase (1–20 years) were used.
- The significance test was performed to determine whether the estimated parameter differs from the simulated parameter.
- The length of the chain was set to 50,000, with a burn-in of 20,000 iterations.
- The convergence of the posterior distribution for each parameter investigated was assessed using the boa and coda packages [42, 43].
- PM maintained the DMU software package used in the statistical analysis.
- JRT helped with the design of the simulation.
- ACS maintained the ADAM software package used in the simulation.
- GPA and JJ contributed to the interpretation of the results and helped coordinate the project.
- The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
- The datasets analysed during the current study are available in the figshare repository (https://doi.org/10.6084/m9.figshare.10547921.v3).
- Estimation of genetic variance in the age of genomics.
- Short communication: genomic prediction using different single-step methods in the Finnish red dairy cattle population.
- Comparing estimates of genetic variance across different relationship models.
- Inferring the trajectory of genetic variance in the course of artificial selection.
- Extension of the bayesian alphabet for genomic selection
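
The summary states that, in ssBR, the marker-based genetic variance is calculated from the estimated marker variance together with the sum of 2p_j(1 − p_j) over all m markers. The arithmetic can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the allele frequencies, and the marker variance are hypothetical and are not values from the study.

```python
def marker_based_genetic_variance(allele_freqs, marker_variance):
    """Return marker_variance * sum_j 2 * p_j * (1 - p_j).

    allele_freqs    : allele frequencies p_j, one per marker (hypothetical input)
    marker_variance : common marker-effect variance (sigma^2_alpha in the summary)
    """
    for p in allele_freqs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"allele frequency out of range: {p}")
    if marker_variance < 0.0:
        raise ValueError("marker variance must be non-negative")
    # Expected heterozygosity summed over markers scales the per-marker variance.
    scale = sum(2.0 * p * (1.0 - p) for p in allele_freqs)
    return marker_variance * scale


# Hypothetical toy input: three markers and an assumed marker variance.
toy_freqs = [0.1, 0.5, 0.9]
toy_sigma2_alpha = 2.0
toy_variance = marker_based_genetic_variance(toy_freqs, toy_sigma2_alpha)
```

Because the sum depends on the allele frequencies supplied, the choice of base population for those frequencies (P-base or G-base in the summary) changes the resulting variance even when the marker variance is the same.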

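The summary reports that convergence of the Gibbs sampler for ssBR was assessed by estimating the Monte Carlo error (MCE) via batch means, with a chain of 50,000 iterations and a burn-in of 20,000. A minimal sketch of a non-overlapping batch-means estimator is given below. The number of batches, the function name, and the simulated chain are assumptions made for illustration; the study itself used the boa and coda packages, whose exact settings are not stated in this summary.

```python
import math
import random
import statistics


def batch_means_mce(chain, n_batches=20):
    """Monte Carlo error of the chain mean from non-overlapping batch means.

    n_batches is an assumed tuning choice, not a value from the study.
    """
    batch_size = len(chain) // n_batches
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least 2 batches with at least 1 sample each")
    # Drop the earliest samples that do not fill a whole batch.
    used = chain[len(chain) - batch_size * n_batches:]
    means = [
        statistics.fmean(used[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
    # Var(chain mean) is estimated by Var(batch means) / number of batches.
    return math.sqrt(statistics.variance(means) / n_batches)


if __name__ == "__main__":
    # Stand-in for a sampled variance component: independent normal draws,
    # which a real Gibbs chain would not be (hypothetical data).
    rng = random.Random(1)
    chain = [rng.gauss(0.3, 0.05) for _ in range(50_000)]
    post_burn_in = chain[20_000:]  # discard the burn-in stated in the summary
    print(batch_means_mce(post_burn_in))
```

Batch means are used instead of the naive standard error because successive Gibbs samples are autocorrelated; averaging within batches absorbs most of that dependence when the batches are long enough.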