
GCA: An R package for genetic connectedness analysis using pedigree and genomic data


Summary

- Results: We developed the GCA R package to perform genetic connectedness analysis for pedigree and genomic data.
- The software implements a large collection of connectedness statistics computed as functions of either the prediction error variance or the variance of unit effect estimates.
- The GCA R package is available on GitHub, and its source code is provided as open source.
- Conclusions: The GCA R package allows users to easily assess the connectedness of their data.
- Keywords: Genetic connectedness, Prediction error variance, Variance of unit effect estimates.
- Genetic connectedness quantifies the extent to which estimated breeding values can be fairly compared across units or contemporary groups [1, 2].
- The objective of this article is to describe a large collection of connectedness statistics implemented in the GCA package, give an overview of the software architecture, and present several examples using simulated data.
- A list of the connectedness statistics supported by the GCA R package is shown in Fig. 1.
- These statistics can be classified into core functions derived from either the prediction error variance (PEV) or the variance of unit effect estimates (VE).
- PEV-derived metrics include the prediction error variance of the difference (PEVD), the coefficient of determination (CD), and the prediction error correlation (r).
- Each PEV-based metric can be summarized at the unit level in three ways: from the average PEV within and across units (group average), as the average of the pairwise statistic over all pairs of individuals in different units (individual average), or using a contrast vector (contrast).
- For each VE-derived metric, three correction factors accounting for the number of fixed effects can be applied.
- Further, the overall connectedness statistic can be obtained by averaging the pairwise connectedness statistics across units.
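As a minimal sketch of this averaging step, the Python snippet below assumes the pairwise statistics have already been computed and are stored in a dictionary keyed by unordered pairs of units. The function name `overall_connectedness` and the example values are hypothetical illustrations, not part of the GCA package.

```python
from itertools import combinations
from statistics import mean


def overall_connectedness(pairwise, units):
    """Average a pairwise connectedness statistic over all pairs of units.

    `pairwise` maps a frozenset {unit_i, unit_j} to the statistic computed
    for that pair (for example, a PEVD or CD value).
    """
    return mean(pairwise[frozenset(pair)] for pair in combinations(units, 2))


# Hypothetical pairwise PEVD values for three units A, B, and C.
pevd = {
    frozenset({"A", "B"}): 0.40,
    frozenset({"A", "C"}): 0.55,
    frozenset({"B", "C"}): 0.70,
}
overall = overall_connectedness(pevd, ["A", "B", "C"])
print(overall)
```

Keying by `frozenset` makes the lookup independent of the order in which a pair of units is listed.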
- Prediction error variance (PEV).
- The mixed model equations (MME) of the linear mixed model are given below.
- The inverse of the coefficient matrix is given by.
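For reference, a standard form of Henderson's MME and the block partition of their inverse is sketched below. It assumes the model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$, with fixed unit effects $\mathbf{b}$, random genetic values $\mathbf{u} \sim N(\mathbf{0}, \mathbf{K}\sigma^2_u)$, residuals $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$, and $\lambda = \sigma^2_e/\sigma^2_u$; this notation, and taking VE as $\mathbf{C}^{11}\sigma^2_e$, are assumptions that may differ in detail from the original equations.

```latex
\begin{bmatrix}
\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\
\mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \mathbf{K}^{-1}\lambda
\end{bmatrix}
\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix}

\begin{bmatrix}
\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\
\mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \mathbf{K}^{-1}\lambda
\end{bmatrix}^{-1}
=
\begin{bmatrix}
\mathbf{C}^{11} & \mathbf{C}^{12} \\
\mathbf{C}^{21} & \mathbf{C}^{22}
\end{bmatrix},
\qquad
\mathrm{PEV} = \mathrm{Var}(\mathbf{u} - \hat{\mathbf{u}}) = \mathbf{C}^{22}\sigma^2_e,
\qquad
\mathrm{VE} = \mathbf{C}^{11}\sigma^2_e
```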
- Fig. 1 An overview of connectedness statistics implemented in the GCA R package.
- The statistics can be computed from either prediction error variance (PEV) or variance of unit effect estimates (VE).
- Connectedness metrics include prediction error variance of the difference (PEVD), coefficient of determination (CD), prediction error correlation (r), variance of differences in unit effects (VED), coefficient of determination of VE (CDVE), and connectedness rating (CR).
- The subscripts 0, 1, and 2 denote the correction factors accounting for the fixed effects in the model.
- $\mathbf{C}^{22}$ represents the lower-right block of the inverse of the coefficient matrix.
- $\mathrm{Var}(\mathbf{u} \mid \hat{\mathbf{u}})$ can be viewed as the posterior variance of $\mathbf{u}$.
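To make the link between the MME and the PEV matrix concrete, the pure-Python sketch below builds the MME coefficient matrix for a toy data set, inverts it, and reads off $\mathbf{C}^{22}\sigma^2_e$ as the PEV matrix and $\mathbf{C}^{11}\sigma^2_e$ as the (co)variances of the unit effect estimates. It assumes the standard model $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$ with $\lambda = \sigma^2_e/\sigma^2_u$; the function names, the relationship matrix `K`, and the variance components are hypothetical, and a dense Gauss-Jordan inverse is used only because the example is tiny.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(a):
    """Invert a small nonsingular matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def pev_and_ve(X, Z, K, var_u, var_e):
    """Return (PEV, VE) as (C22 * var_e, C11 * var_e) from the MME coefficient matrix."""
    lam = var_e / var_u
    Xt, Zt = transpose(X), transpose(Z)
    K_inv = invert(K)
    # Upper block row: [X'X  X'Z]
    top = [r1 + r2 for r1, r2 in zip(matmul(Xt, X), matmul(Xt, Z))]
    # Lower block row: [Z'X  Z'Z + K^{-1} * lambda]
    bottom = [r1 + [v + lam * k for v, k in zip(r2, rk)]
              for r1, r2, rk in zip(matmul(Zt, X), matmul(Zt, Z), K_inv)]
    C = invert(top + bottom)
    p = len(X[0])  # number of fixed (unit) effects
    ve = [[v * var_e for v in row[:p]] for row in C[:p]]
    pev = [[v * var_e for v in row[p:]] for row in C[p:]]
    return pev, ve


# Hypothetical toy data: individuals 0-1 in unit A, 2-3 in unit B, one record each.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]                             # unit incidence matrix
Z = [[1 if i == j else 0 for j in range(4)] for i in range(4)]   # individual incidence matrix
K = [[1.0, 0.5, 0.25, 0.0],
     [0.5, 1.0, 0.25, 0.0],
     [0.25, 0.25, 1.0, 0.25],
     [0.0, 0.0, 0.25, 1.0]]                                      # hypothetical relationship matrix
var_u, var_e = 0.5, 1.0
pev, ve = pev_and_ve(X, Z, K, var_u, var_e)
```

Each diagonal PEV entry should fall below its prior variance $K_{ii}\sigma^2_u$, because observing data can only reduce the uncertainty about a genetic value.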
- Kennedy and Trus [11] argued that the mean PEV within a unit ($\mathrm{PEV}_{\mathrm{Mean}}$), defined as the average PEV among individuals in the same unit, can be approximated by VE.
- $\mathrm{VE}_0 \approx \mathrm{PEV}_{\mathrm{Mean}} \quad (1)$
- The authors of [12] pointed out that the agreement between $\mathrm{PEV}_{\mathrm{Mean}}$ and $\mathrm{VE}_0$ depends on the number of fixed effects, other than the management group, fitted in the model.
- When the unit effect is the only fixed effect included in the model, the exact $\mathrm{PEV}_{\mathrm{Mean}}$ can be obtained as follows.
- $\mathrm{VE}_2 = \mathrm{PEV}_{\mathrm{Mean}} \quad (3)$
- This equation is suitable when two or more fixed effects are fitted in the model.
- Below we describe the connectedness metrics implemented in the GCA package, together with relationships among them that have not been clearly articulated in the literature.
- Prediction error variance of the difference (PEVD).
- The PEVD metric measures the prediction error variance of the difference in breeding values between individuals from different units [11].
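- The pairwise PEVD between individuals i and j is $\mathrm{PEVD}_{ij} = \mathrm{PEV}_{ii} + \mathrm{PEV}_{jj} - 2\,\mathrm{PEC}_{ij} = (C^{22}_{ii} + C^{22}_{jj} - 2C^{22}_{ij})\,\sigma^2_e \quad (5)$, where $\mathrm{PEC}_{ij}$ is the off-diagonal element of the PEV matrix, i.e., the prediction error covariance between the errors of the genetic values of individuals i and j.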
- Group average PEVD: Using the average PEVD, derived from the average prediction error variances and covariances within and between units, as a connectedness measure can be traced back to Kennedy and Trus [11].
- It can be calculated by inserting the $\mathrm{PEV}_{\mathrm{Mean}}$ of the ith and jth units and the mean prediction error covariance ($\mathrm{PEC}_{\mathrm{Mean}}$) between the ith and jth units into Eq. (5).
- Contrast PEVD: The PEVD of a contrast between a pair of units can also be used to summarize PEVD [14].
- A flow diagram of the computational procedure is shown in Fig. 2.
- Fig. 2 A flow diagram of three prediction error variance of the difference (PEVD) statistics.
- The group average PEVD (PEVD_GrpAve) is shown in A.
- A1: Prediction error variance (PEV) matrix including variances and covariances of seven individuals.
- A2: Calculate the mean of the prediction error variances/covariances within each unit (PEV_mean) and the mean of the prediction error covariances across units (PEC_mean).
- A3: Group average PEVD is calculated by applying the PEVD equation using PEV_mean and PEC_mean.
- The individual average PEVD (PEVD_IdAve) is shown in B.
- B1: Prediction error variance (PEV) matrix including variances and covariances of seven individuals.
- The PEVD of contrast (PEVD_Contrast) is shown in C.
- PEVD_Contrast is calculated as the product of the transpose of the contrast vector (x), the PEV matrix, and the contrast vector.
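The three PEVD summaries in Fig. 2 can be sketched in pure Python as follows, assuming a PEV matrix is already available (for instance, $\mathbf{C}^{22}\sigma^2_e$ from the MME) and that each unit is given as a list of row indices into that matrix. PEV_mean is taken as the mean of the whole within-unit block of variances and covariances and PEC_mean as the mean of the between-unit block, following panels A2 and A3; the contrast vector is assumed to hold $1/n_i$ for members of unit i and $-1/n_j$ for members of unit j. The PEV values and function names are hypothetical, not the GCA package's API.

```python
from itertools import product
from statistics import mean


def block_mean(pev, rows, cols):
    """Mean of the PEV sub-block selected by the given row and column indices."""
    return mean(pev[a][b] for a in rows for b in cols)


def pevd_pair(pev, a, b):
    """Pairwise PEVD between individuals a and b: PEV_aa + PEV_bb - 2 * PEC_ab."""
    return pev[a][a] + pev[b][b] - 2 * pev[a][b]


def pevd_grp_ave(pev, unit_i, unit_j):
    """Group average PEVD: PEV_mean(i) + PEV_mean(j) - 2 * PEC_mean(i, j)."""
    return (block_mean(pev, unit_i, unit_i) + block_mean(pev, unit_j, unit_j)
            - 2 * block_mean(pev, unit_i, unit_j))


def pevd_id_ave(pev, unit_i, unit_j):
    """Individual average PEVD: mean pairwise PEVD over all cross-unit pairs."""
    return mean(pevd_pair(pev, a, b) for a, b in product(unit_i, unit_j))


def pevd_contrast(pev, unit_i, unit_j):
    """Contrast PEVD x' PEV x, with x = 1/n_i on unit i, -1/n_j on unit j, 0 elsewhere."""
    n = len(pev)
    x = [0.0] * n
    for a in unit_i:
        x[a] = 1.0 / len(unit_i)
    for b in unit_j:
        x[b] = -1.0 / len(unit_j)
    return sum(x[a] * pev[a][b] * x[b] for a in range(n) for b in range(n))


# Hypothetical PEV matrix for four individuals; 0-1 in unit A, 2-3 in unit B.
pev = [[0.60, 0.10, 0.05, 0.02],
       [0.10, 0.55, 0.04, 0.03],
       [0.05, 0.04, 0.58, 0.12],
       [0.02, 0.03, 0.12, 0.62]]
unit_a, unit_b = [0, 1], [2, 3]
grp = pevd_grp_ave(pev, unit_a, unit_b)
ida = pevd_id_ave(pev, unit_a, unit_b)
con = pevd_contrast(pev, unit_a, unit_b)
```

With PEV_mean defined over the whole within-unit block, as assumed here, the group average and contrast PEVD coincide algebraically (about 0.6275 for this toy matrix), while the individual average is larger (about 1.105) because it uses only the diagonal PEVs within units.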
- The CD metric measures the precision of genetic values and can be interpreted either as the square of the correlation between the predicted and true differences in genetic values or as one minus the ratio of the posterior to the prior variance of the genetic values u [15].
- Group average CD: Similar to the group average PEVD statistic, $\mathrm{PEV}_{\mathrm{Mean}}$ and $\mathrm{PEC}_{\mathrm{Mean}}$ can be used to summarize CD at the unit level.
- The graphical derivation of the group average CD is illustrated in Fig. 3.
- This summary method has not been used in the literature, but shares the same spirit as the group average PEVD.
- The group average CD (CD_GrpAve) is shown in A.
- The individual average CD (CD_IdAve) is shown in B.
- The CD of contrast (CD_Contrast) is shown in C.
- CD_Contrast is calculated by scaling the prediction error variance of the difference (PEVD) of the contrast by the product of the transpose of the contrast vector (x), the relationship matrix (K), and the contrast vector.
- A flow diagram of the individual average CD is shown in Fig. 3.
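Under the same assumptions as the PEVD summaries, the contrast CD can be sketched as one minus the ratio of the contrast's PEVD to its prior variance, $1 - \mathbf{x}'\mathrm{PEV}\,\mathbf{x} / (\sigma^2_u\,\mathbf{x}'\mathbf{K}\mathbf{x})$, and the individual average CD as the mean of pairwise CDs over all cross-unit pairs. The PEV matrix, the relationship matrix `K`, $\sigma^2_u$, and the function names below are hypothetical; the sketch illustrates the formulas rather than reproducing GCA's implementation.

```python
from itertools import product
from statistics import mean


def contrast_vector(n, unit_i, unit_j):
    """x = 1/n_i for members of unit i, -1/n_j for members of unit j, 0 elsewhere."""
    x = [0.0] * n
    for a in unit_i:
        x[a] = 1.0 / len(unit_i)
    for b in unit_j:
        x[b] = -1.0 / len(unit_j)
    return x


def quad_form(x, m):
    """Compute x' M x for a square matrix M given as nested lists."""
    n = len(x)
    return sum(x[a] * m[a][b] * x[b] for a in range(n) for b in range(n))


def cd_pair(pev, K, var_u, a, b):
    """Pairwise CD: 1 - PEVD_ab / (var_u * (K_aa + K_bb - 2 * K_ab))."""
    pevd = pev[a][a] + pev[b][b] - 2 * pev[a][b]
    prior = var_u * (K[a][a] + K[b][b] - 2 * K[a][b])
    return 1 - pevd / prior


def cd_id_ave(pev, K, var_u, unit_i, unit_j):
    """Individual average CD over all pairs of individuals in different units."""
    return mean(cd_pair(pev, K, var_u, a, b) for a, b in product(unit_i, unit_j))


def cd_contrast(pev, K, var_u, unit_i, unit_j):
    """Contrast CD: 1 - x' PEV x / (var_u * x' K x)."""
    x = contrast_vector(len(pev), unit_i, unit_j)
    return 1 - quad_form(x, pev) / (var_u * quad_form(x, K))


# Hypothetical PEV and relationship matrices; individuals 0-1 in unit A, 2-3 in unit B.
pev = [[0.60, 0.10, 0.05, 0.02],
       [0.10, 0.55, 0.04, 0.03],
       [0.05, 0.04, 0.58, 0.12],
       [0.02, 0.03, 0.12, 0.62]]
K = [[1.0, 0.5, 0.25, 0.0],
     [0.5, 1.0, 0.25, 0.0],
     [0.25, 0.25, 1.0, 0.25],
     [0.0, 0.0, 0.25, 1.0]]
var_u = 1.0
unit_a, unit_b = [0, 1], [2, 3]
cd_ida = cd_id_ave(pev, K, var_u, unit_a, unit_b)
cd_con = cd_contrast(pev, K, var_u, unit_a, unit_b)
```

Values closer to 1 indicate that a difference between units is predicted more precisely relative to its prior uncertainty.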
- Prediction error correlation (r).
- The prediction error correlation between individuals i and j, known as the pairwise r statistic, is calculated from the elements of the PEV matrix [16].
- Group average r: Known as flock connectedness in the literature, this statistic is a correlation calculated from $\mathrm{PEV}_{\mathrm{Mean}}$ and $\mathrm{PEC}_{\mathrm{Mean}}$.
- Contrast r: This summary method has not been used in the literature, but shares the same concept as the contrast PEVD and contrast CD.
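The pairwise r statistic and its group and individual averages can be sketched as below, assuming the usual correlation form $r_{ij} = \mathrm{PEC}_{ij}/\sqrt{\mathrm{PEV}_{ii}\,\mathrm{PEV}_{jj}}$ and, for the group average, the same form applied to $\mathrm{PEC}_{\mathrm{Mean}}$ and the two units' $\mathrm{PEV}_{\mathrm{Mean}}$ values. The PEV matrix and function names are hypothetical illustrations, not the GCA package's API.

```python
import math
from itertools import product
from statistics import mean


def r_pair(pev, a, b):
    """Prediction error correlation: PEC_ab / sqrt(PEV_aa * PEV_bb)."""
    return pev[a][b] / math.sqrt(pev[a][a] * pev[b][b])


def r_id_ave(pev, unit_i, unit_j):
    """Individual average r: mean pairwise r over all cross-unit pairs."""
    return mean(r_pair(pev, a, b) for a, b in product(unit_i, unit_j))


def r_grp_ave(pev, unit_i, unit_j):
    """Group average r: PEC_mean(i, j) / sqrt(PEV_mean(i) * PEV_mean(j))."""
    def block_mean(rows, cols):
        return mean(pev[a][b] for a in rows for b in cols)

    pec_mean = block_mean(unit_i, unit_j)
    return pec_mean / math.sqrt(block_mean(unit_i, unit_i) * block_mean(unit_j, unit_j))


# Hypothetical PEV matrix; individuals 0-1 in unit A, 2-3 in unit B.
pev = [[0.60, 0.10, 0.05, 0.02],
       [0.10, 0.55, 0.04, 0.03],
       [0.05, 0.04, 0.58, 0.12],
       [0.02, 0.03, 0.12, 0.62]]
unit_a, unit_b = [0, 1], [2, 3]
r_ida = r_id_ave(pev, unit_a, unit_b)
r_grp = r_grp_ave(pev, unit_a, unit_b)
```

A larger prediction error correlation between units indicates that their genetic values are estimated with shared information, i.e., better connectedness.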
- The variance of differences in unit effects (VED), a function of VE, can also be used to measure connectedness.
- All PEV-based metrics follow a two-step procedure in the sense that they first compute the PEV matrix at the individual level and then apply one of the summary methods to derive connectedness at the unit level, or vice versa.
- Moreover, since the number of fixed effects is often smaller than the number of individuals in the model, the computational requirements for VED are expected to be lower [12].
- Note that all VE-derived approaches can be classified by the number of fixed effects to be corrected.
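As a minimal sketch, the uncorrected VED between units i and j can be computed from a VE matrix, assumed here to hold the error variances and covariances of the unit effect estimates (for example, $\mathbf{C}^{11}\sigma^2_e$ from the MME), as $\mathrm{VE}_{ii} + \mathrm{VE}_{jj} - 2\,\mathrm{VE}_{ij}$, with the overall statistic taken as the mean over all unit pairs. The VE values and function names are hypothetical, and none of the correction factors ($\mathrm{VE}_0$, $\mathrm{VE}_1$, $\mathrm{VE}_2$) is applied in this sketch.

```python
from itertools import combinations
from statistics import mean


def ved(ve, i, j):
    """Uncorrected VED between units i and j: VE_ii + VE_jj - 2 * VE_ij."""
    return ve[i][i] + ve[j][j] - 2 * ve[i][j]


def overall_ved(ve):
    """Mean VED over all pairs of units."""
    return mean(ved(ve, i, j) for i, j in combinations(range(len(ve)), 2))


# Hypothetical VE matrix for three units.
ve = [[0.20, 0.05, 0.02],
      [0.05, 0.25, 0.04],
      [0.02, 0.04, 0.30]]
pairwise = {(i, j): ved(ve, i, j) for i, j in combinations(range(len(ve)), 2)}
overall = overall_ved(ve)
```

Because VE has one row per unit rather than one per individual, this matrix is typically much smaller than the PEV matrix.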
- Fig. 4 A flow diagram of three prediction error correlation (r) statistics.
- The group average r (r_GrpAve) is shown in A.
- A1: Prediction error variance (PEV) matrix of seven individuals.
- A3: Group average r is a correlation calculated from PEV_mean and PEC_mean.
- B1: Prediction error variance (PEV) matrix of seven individuals.
- B2: Calculate pairwise correlation coefficients of individuals between units using PEV and prediction error covariance (PEC).
- B3: Individual average r is calculated as the average of pairwise prediction error correlation coefficients of individuals across units.
- The r of contrast (r_Contrast) is shown in C.
- r_Contrast is calculated from the product of the transpose of the contrast vector (x), the r matrix, and the contrast vector.
- Similarly, the correction function based on $\mathrm{VE}_c$ can be used to define a statistic analogous to the group average CD.
- The GCA R package is implemented entirely in R, which is an open source programming language and environment for performing statistical computing [18].
- The initial versions of the algorithms and the R code were used in previous studies [4, 8, 9] and have been further enhanced for efficiency, usability, and documentation in the current version to facilitate connectedness analysis.
- The GCA R package provides a comprehensive and effective tool for genetic connectedness analysis and whole-genome prediction, thereby contributing to genetic evaluation and prediction.
- Installing the GCA package.
- The current version of the GCA R package is available at GitHub (https://github.com/QGresources/GCA).
- The package can be installed using the devtools R package [20].
- A simulated cattle data set using QMSim software [21] is included in the GCA package as an example data set.
- The data set is stored as an R object in the package.
- Application of the GCA package.
- Detailed usage of the GCA R package can be found in the vignette document (https://qgresources.github.io/GCA_.
- The GCA R package provides users with a comprehensive tool for analyzing genetic connectedness using pedigree and genomic data.
- Users can easily assess the connectedness of their data and be mindful of the uncertainty associated with comparing genetic values of individuals across different management units or contemporary groups.
- Moreover, the GCA package can be used to measure the level of connectedness between training and testing sets in the whole-genome prediction paradigm.
- This measure can be used as a criterion for optimizing the training data set.
- This paper also summarized the relationships among various connectedness metrics, which had not been clearly articulated in the previous literature.
- In summary, we contend that the availability of the GCA package to calculate connectedness allows breeders and geneticists to make better decisions when comparing individuals in genetic evaluations and when inferring linkage between any pair of groups of individuals in genomic prediction.
- PEV: Prediction error variance.
- PEVD: Prediction error variance of differences.
- r: Prediction error correlation.
- We thank the Morota lab members for testing the GCA package.
- The GCA R source code is provided as free and open source.
- The webpage https://github.com/QGresources/GCA was created as a nexus of all genetic connectedness related functions and examples available in the GCA R package.
- GM is a member of the editorial board for BMC Genomics.