
A fast-linear mixed model for genome-wide haplotype association analysis: Application to agronomic traits in maize



- In the haplotype association analysis, both haplotype alleles and blocks are tested.
- Results: Based on FaST-LMM, the fastLmPure function in the R/RcppArmadillo package is introduced to speed up genome-wide regression scans through re-weighted least squares estimation.
- When large or highly significant blocks are tested based on EMMAX, the genome-wide haplotype association analysis takes only one to two rounds of genome-wide regression scans.
- In genome-wide association studies (GWAS), single nucleotide polymorphisms (SNPs) are the smallest genetic units analyzed.
- Large genetic units can be obtained through the combination of multiple SNPs in different forms.
- Genome-wide association analysis for large genetic units shows major advantages over SNPs in relation to: 1) explaining large percentages of phenotype variations by the combined effects of multiple SNPs and 2) facilitating the study of mechanisms related to complex traits by biologically meaningful genetic units such as genes and pathways [9].
- However, the high computing intensity of LMM has motivated the development of simpler algorithms [10–17] to reduce the computational burden, allowing LMM to become a widely used and powerful approach in genome-wide association studies (GWAS).
- 2020 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
- Instead of REML, the efficient mixed-model association (EMMA) [15] avoids a redundant and computationally expensive matrix operation at each iteration in the computation of the likelihood function by the spectral decomposition of phenotype and marker indicators.
- Finally, the second derivatives for the log-likelihood function are considered in the genome-wide efficient mixed-model association (GEMMA) [17] algorithm, specifically based on the spectral decomposition, in order to determine the global optimum.
- Based on the FaST-LMM [16], we transform the genome-wide mixed model association analysis to a linear regression scan, along with searching for variance components, and extend the FaST-LMM for SNPs to different genetic units by constructing a unified test statistic.
- To speed up genome-wide regression scans, we introduce the fastLmPure function in the R/RcppArmadillo package to infer the effect of tested genetic units.
- When only large or highly significant blocks obtained from EMMAX are tested, the genome-wide haplotype association analysis will reduce the analysis to one or two rounds of genome-wide regression scans.
- The Single-RunKing software [19] was developed to implement extremely fast genome-wide mixed model association analysis for different genetic units.
- The high computing efficiency of the software is demonstrated by re-analyzing 17 agronomic traits from the maize genomic datasets [20].
- Haplotype blocks of the genomic dataset were constructed using the Four Gamete Test method (FGT) [21], which is implemented in the Haploview software [22].
- More than 90% of the haplotype blocks contained fewer than 10 SNPs, with the largest block containing 71 SNPs.
- Figure 2 shows the distribution of the number of haplotype alleles included in the blocks: 85% of haplotype blocks yielded 3 to 6 alleles, and the largest number of haplotype alleles in a single block was 13.
- GWAS for genetic units.
- The inner picture is an enlargement of the horizontal coordinates from 25 to 70.
- Fig. 3 QQ and Manhattan plots of three genetic units for TMAL trait.
- When haplotype blocks were analyzed, their last haplotype alleles were removed to make the regression of the block identifiable.
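The identifiability step above can be illustrated with a small sketch. It assumes one haplotype allele label per individual and treats "last" as the last label in sorted order; both choices, along with the function name `haplotype_design`, are hypothetical and only serve the illustration.

```python
def haplotype_design(alleles):
    """Build 0/1 indicator columns for one haplotype block, omitting the last allele.

    `alleles` holds one haplotype allele label per individual (an assumption
    made for this sketch). Dropping one allele keeps the indicator columns
    from summing to the intercept column, so the regression stays identifiable.
    """
    levels = sorted(set(alleles))
    kept = levels[:-1]  # the last allele (in sorted order) becomes the reference
    rows = [[1 if a == k else 0 for k in kept] for a in alleles]
    return kept, rows


# Hypothetical block with three haplotype alleles across five individuals.
kept, rows = haplotype_design(["AAT", "AGT", "CGT", "AAT", "CGT"])
```

With three alleles, the block contributes two columns; individuals carrying the dropped allele have zeros in both.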
- At a significance level of 5%, the critical thresholds by the Bonferroni correction were determined as and 6.259 to declare significance for SNPs, haplotype alleles, and blocks, respectively.
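As a rough illustration of how such thresholds arise, the sketch below assumes they are expressed on the -log10(p) scale and computed as -log10(alpha / m) for m tests. The function names and the example number of tests are hypothetical, not values from the study.

```python
import math


def bonferroni_threshold(num_tests, alpha=0.05):
    """Return the Bonferroni critical value on the -log10(p) scale."""
    return -math.log10(alpha / num_tests)


def implied_num_tests(threshold, alpha=0.05):
    """Invert the formula: how many tests would give this -log10 threshold."""
    return alpha * 10 ** threshold


# Hypothetical example: 100,000 tested genetic units at a 5% level.
threshold = bonferroni_threshold(100_000)
```

Because each type of genetic unit has its own number of tests, each gets its own threshold under this formula.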
- The agronomic traits were all associated with genome-wide SNPs, haplotype alleles, and blocks using the LM with unified test statistics and the Single-RunKing software based on the FaST-LMM.
- of and 5.1181 min, respectively, which were significantly lower than that of the linear model implemented in the R/lm function and 54.8637 min).
- Fig. 4 QQ and Manhattan plots of three genetic units for CD trait.
- For no trait were SNPs, haplotype alleles, and blocks all detected together; at most two types of genetic units were detected for any specific trait.
- Fig. 5 QQ and Manhattan plots of three genetic units for KNPR trait.
- Table 1 Three types of significant genetic units identified for 17 traits using the Single-RunKing software.
- and chr4.S explained 7.33 and 7.38% of the phenotypic variation, respectively.
- The four haplotype alleles accounted for 0.54 to 10.16% of the phenotypic variation, while the three haplotype blocks accounted for and 10.69%, which is considerably larger than the corresponding SNPs or haplotype alleles detected.
- Additionally, all the detected genetic units were mapped on the annotated genes, especially Chr3Block4589 on two genes with known biological meaning.
- Using spectral decomposition of phenotypes and markers, the FaST-LMM transforms the LMM of the tested marker into an LM.
- In GWAS implemented in the Single-RunKing software, computational efficiency is greatly improved in three ways: 1) by using the bare-bones linear model fitting function, known as R/fastLmPure, to rapidly estimate genetic effects of the tested SNPs, 2) by replacing genomic variance with heritability to narrow down the search of solutions, and 3) by focusing on large or highly significant SNPs obtained with EMMAX.
- The Single-RunKing software was developed to transform the genome-wide mixed model association analysis into bare-bones regression scans, where the optimal polygenic heritability of the tested markers is searched by the re-weighted least squares estimation of the genetic effects.
- Given the genomic heritability, the EMMAX method needs a genome-wide regression scan of only one round.
- Based on the EMMAX method, the Single-RunKing software will run genome-wide regression scans within two rounds if only large or highly significant markers are tested.
- In genome-wide mixed model association analysis, constructing the kinship matrix from all markers consumes increasingly more memory and computing time as re-sequencing techniques produce ever more high-throughput SNPs.
- Counterproductively, the use of all or too many SNPs to calculate kinship matrices may yield proximal contamination due to the over-estimation of polygenic variance, especially for large genetic units.
- Additionally, the CMLM reduces the dimension of the RRM by clustering individuals into several groups based on the selected genetic markers.
- If the resource population is too large, a random sample of the population can also be used to rapidly estimate genomic heritability.
- Overall, in order to improve computing efficiency, all simplified procedures of the genome-wide mixed model association analysis can be incorporated into the Single-RunKing software.
- In real data analysis, the genetic units SNP, haplotype alleles, and blocks were analyzed, of which the former is included in the latter.
- As with the analysis of variance, three possible outcomes were detected among the three genetic units: the first consists of both the former and the latter, the second of only the former or only the latter, and the third of neither the former nor the latter.
- When applied in the genome-wide mixed model association analysis, the haplotype blocks explained more phenotypic variation than the corresponding detected SNPs or haplotype alleles, owing to the combined effects of multiple SNPs.
- fastLmPure, was used to rapidly estimate effects of genetic units and maximum likelihood values of the FaST-LMM.
- When only large or highly significant genetic units are tested based on EMMAX, the extended Single-RunKing software for genetic units needs only one to two rounds of genome-wide regression scans.
- The algorithm was applied to the genome-wide association analysis of agronomic traits in maize.
- FaST-LMM for genetic units.
- where y is a vector of the phenotypic values from n individuals, which is adjusted for systematic factors that include population stratification.
- β is the additive genetic effect of the tested genetic units, such as the SNP, haplotype (or block), and copy number variations.
- Following the FaST-LMM algorithm [16], we spectrally decompose $K = U S U^{T}$, where S is the diagonal matrix containing the eigenvalues of K in descending order, and U is the matrix of the eigenvectors corresponding to the eigenvalues.
- When genetic units such as haplotypes (or blocks) and CNVs can be divided into more than three genotypes, it is required that one of those genotypes is constricted to.
- With $\hat{\beta}$ and $\hat{\sigma}_{\varepsilon}^{2}$, the maximum likelihood value of the LM is estimated as:
- which represents the polygenic heritability $h^{2}$ in the weighted diagonal matrix W.
- At the same time, the genetic effect of the tested genetic unit is statistically inferred by $\hat{\beta}$ and $\hat{\sigma}_{\varepsilon}^{2}$ corresponding to the optimized $h^{2}$.
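For orientation, the quantities named above can be written out under a commonly used parameterization. This is a sketch, not necessarily the exact form used by the authors: it assumes $\mathrm{Var}(y) = \sigma_{\varepsilon}^{2}(\lambda K + I)$ with $\lambda = h^{2}/(1-h^{2})$, writes $s_i$ for the eigenvalues on the diagonal of S, and uses $X$ as a hypothetical design matrix holding the intercept and the tested genetic unit.

```latex
\begin{aligned}
\tilde{y} &= U^{T} y, \qquad \tilde{X} = U^{T} X, \qquad \lambda = \frac{h^{2}}{1-h^{2}},\\
W &= \operatorname{diag}\!\left(\frac{1}{\lambda s_{i} + 1}\right), \quad i = 1,\dots,n,\\
\hat{\beta} &= \left(\tilde{X}^{T} W \tilde{X}\right)^{-1} \tilde{X}^{T} W \tilde{y},\\
\hat{\sigma}_{\varepsilon}^{2} &= \frac{1}{n}\left(\tilde{y}-\tilde{X}\hat{\beta}\right)^{T} W \left(\tilde{y}-\tilde{X}\hat{\beta}\right),\\
-2\log L &= n\log\!\left(2\pi\hat{\sigma}_{\varepsilon}^{2}\right) - \log\lvert W\rvert + n.
\end{aligned}
```

Under these assumptions, W depends on the data only through $h^{2}$ and the eigenvalues, which is why the search over $h^{2}$ reduces to repeated weighted regressions.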
- As stated earlier, the FaST-LMM [16] transforms the genome-wide mixed model association analysis into linear regression scans by re-weighted least squares estimations for effects of genetic units, along with optimization of polygenic heritabilities.
- To speed up computational efficiency, the regression analysis for the tested genetic unit is implemented with the bare-bones linear model fitting function, known as fastLmPure, in the R/RcppArmadillo package [19].
- The fastLmPure function returns only the genetic effect and the standard error of the tested genetic unit, and statistics, such as $\sigma_{\varepsilon}^{2}$, $-2\log L$, Student's t, and p value, need to be calculated after running the fastLmPure function.
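The post-processing step described above might look like the following sketch, written in Python rather than R. The inputs `coef` and `stderr` stand in for the effect and standard error returned by the fitting function; the function name, the `log_det` argument, and the use of a normal approximation for the p value (reasonable only for large samples, since the standard library has no Student t distribution) are assumptions of this illustration.

```python
import math


def post_scan_stats(coef, stderr, y, fitted, log_det=0.0):
    """Derive summary statistics from an effect estimate and its standard error.

    y and fitted are the (already weighted) responses and fitted values;
    log_det is the log-determinant term of the covariance, 0 for plain OLS.
    """
    n = len(y)
    rss = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    sigma2 = rss / n  # maximum likelihood estimate of the residual variance
    neg2logl = n * math.log(2 * math.pi * sigma2) + n + log_det
    t_stat = coef / stderr
    # Two-sided p value from the standard normal, used here as a
    # large-sample stand-in for the Student t distribution.
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
    return {"sigma2": sigma2, "neg2logl": neg2logl, "t": t_stat, "p_approx": p_approx}


# Hypothetical values for a single tested genetic unit.
stats = post_scan_stats(2.0, 1.0, [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9])
```

Keeping these calculations outside the fitting call is what lets the fitting function itself stay minimal.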
- Starting from the estimated genomic heritability of quantitative traits, we can search downward to rapidly determine maximum likelihood estimates for the polygenic heritability of the tested genetic unit.
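One possible reading of this downward search is sketched below for a single tested unit. It assumes the phenotype, the intercept column, and the marker column have already been rotated by $U^{T}$, that weights take the form $1/(\lambda s_i + 1)$ with $\lambda = h^{2}/(1-h^{2})$, and that the search stops at the first step that fails to lower $-2\log L$. The function names, the step size, and the stopping rule are hypothetical.

```python
import math


def wls_neg2logl(y, x, ones, eigvals, h2):
    """Weighted least squares on rotated data; returns (beta, -2 log L).

    Requires 0 <= h2 < 1 and non-negative eigenvalues.
    """
    lam = h2 / (1.0 - h2)
    d = [lam * s + 1.0 for s in eigvals]  # assumed variance scale per observation
    w = [1.0 / di for di in d]
    # Normal equations for two columns: intercept (ones) and tested unit (x).
    a11 = sum(wi * c * c for wi, c in zip(w, ones))
    a12 = sum(wi * c * xi for wi, c, xi in zip(w, ones, x))
    a22 = sum(wi * xi * xi for wi, xi in zip(w, x))
    b1 = sum(wi * c * yi for wi, c, yi in zip(w, ones, y))
    b2 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = a11 * a22 - a12 * a12
    mu = (b1 * a22 - a12 * b2) / det
    beta = (a11 * b2 - a12 * b1) / det
    rss = sum(wi * (yi - mu * c - beta * xi) ** 2
              for wi, yi, c, xi in zip(w, y, ones, x))
    n = len(y)
    sigma2 = rss / n
    neg2logl = n * math.log(2 * math.pi * sigma2) + sum(math.log(di) for di in d) + n
    return beta, neg2logl


def search_downward(y, x, ones, eigvals, h2_start, step=0.01):
    """Step h2 down from h2_start while -2 log L keeps decreasing."""
    best_h2 = h2_start
    best_beta, best_val = wls_neg2logl(y, x, ones, eigvals, best_h2)
    h2 = h2_start - step
    while h2 >= 0.0:
        beta, val = wls_neg2logl(y, x, ones, eigvals, h2)
        if val >= best_val:
            break  # no further improvement: keep the previous h2
        best_h2, best_beta, best_val = h2, beta, val
        h2 -= step
    return best_h2, best_beta, best_val
```

Starting at the genomic heritability bounds the search interval to values at or below it, so only a handful of weighted regressions are needed per unit in this sketch.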
- Once the polygenic heritability for each genetic unit is fixed at the genomic heritability, the fast regression scan mentioned earlier simplifies to the EMMAX algorithm [11], whose genome-wide scanning speed is the highest because the fastLmPure function runs without optimization of polygenic heritabilities.
- To further enhance computing efficiency, we only selected genetic units of large effects or those with high significance levels (0.05 or 0.01) from the EMMAX algorithm to optimize the estimation of their polygenic heritabilities [19].
- Thus, the computing time complexity for the genome-wide mixed model association analysis becomes O(imn), with i being the number of genome-wide regression scans (1 <.
- QQ and Manhattan plots of three genetic units for LNAE trait.
- QQ and Manhattan plots of three genetic units for DTH trait.
- QQ and Manhattan plots of three genetic units for PH trait.
- QQ and Manhattan plots of three genetic units for EH trait.
- QQ and Manhattan plots of three genetic units for ELW trait.
- QQ and Manhattan plots of three genetic units for ELL trait.
- QQ and Manhattan plots of three genetic units for TBN trait.
- QQ and Manhattan plots of three genetic units for EL trait.
- QQ and Manhattan plots of three genetic units for GW trait.
- QQ and Manhattan plots of three genetic units for CW trait.
- QQ and Manhattan plots of three genetic units for KW trait.
- QQ and Manhattan plots of three genetic units for DTS trait.
- QQ and Manhattan plots of three genetic units for DTA trait.
- EMMA: Efficient mixed-model association.
- GEMMA: Genome-wide efficient mixed-model association.
- GWAS: Genome-wide association studies.
- We are grateful to the two anonymous reviewers for their insightful comments that greatly improved the presentation of the manuscript.
- The funding bodies had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
- Linkage disequilibrium in the human genome.
- High-resolution haplotype structure in the human genome.
- Segmental duplications and copy-number variation in the human genome.
- Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.
- Gene and pathway-based second-wave analysis of genome-wide association studies.
- Variance component model to account for sample structure in genome-wide association studies.
- Mixed linear model approach adapted for genome-wide association studies.
- FaST linear mixed models for genome-wide association studies.
- Genome-wide efficient mixed-model analysis for association studies.
- Genome-wide barebones regression scan for mixed-model association analysis.
- Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel.
- Statistical properties of the number of recombination events in the history of a sample of DNA sequences.
- Improved linear mixed models for genome-wide association studies.
- Advantages and pitfalls in the application of mixed-model association methods.
- A SUPER powerful method for genome wide association study.
- Estimating effects and making predictions from genome-wide marker data.
- Common SNPs explain a large proportion of the heritability for human height.
