
Assessing runs of Homozygosity: A comparison of SNP Array and whole genome sequence low coverage data



- Background: Runs of Homozygosity (ROH) are genomic regions where identical haplotypes are inherited from each parent.
- By allowing heterozygous SNPs per window, using the PLINK homozygosity function and non-parametric analysis, we obtained non-significant differences in the number of ROH, mean ROH size and total sum of ROH between data sets from the two technologies for almost all populations.
- Runs of Homozygosity (ROH) are contiguous regions of the genome where an individual is homozygous across all sites.
- However, it was not until the first arrays with more than 300 K SNPs were used that the analysis of ROH started to shed light on the understanding of human demographic history and on deciphering the genetic structure of traits and complex diseases [6–8].
- Currently array-based genotyping covers around 1.9 to 2.2 million SNPs, allowing meaningful detection of ROH longer than 1 Mb, and even though this is an important improvement over previous arrays, it covers only ~ 2% of the total common SNPs present in the human genome [9, 10].
- 1 Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa Full list of author information is available at the end of the article.
- 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
- Thus, analyzing the effect of different lengths of ROH may reveal the relative contributions of multiple rare and common variants to the demographic history of human populations, and may help to explore and test new approaches to understanding complex traits [11].
- The number and length of ROH reflect individual and population history, while the homozygosity burden can be used to investigate the genetic architecture of complex disease.
- They have contributed to studies of different diseases and risk factors, from cancer to cognition, and have been tested for association with either the burden of ROH (total sum of ROH), their abundance (number of ROH), or for association of individual ROH with a phenotype.
- The application and usefulness of ROH is not limited to humans.
- The second is to obtain appropriate parameters of ROH calling that allow meaningful comparison between ROH obtained from both technologies.
- Observational approaches use algorithms that scan each chromosome by moving a fixed size window along the whole length of the genome in search of stretches of consecutive homozygous SNPs [32].
- In the algorithm, a variable number of heterozygote positions or missing SNPs can be specified per window in order to tolerate genotyping errors and failures.
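The sliding-window idea described above can be sketched in a few lines. This is a simplified illustration, not PLINK's implementation: the genotype encoding (0 = homozygous, 1 = heterozygous, None = missing), the function names and the default values are assumptions made for the example. A window counts as homozygous when it holds no more than the allowed number of heterozygous and missing calls, and a SNP joins a run when a sufficient fraction of the windows covering it are homozygous.

```python
def homozygous_windows(genos, window_snps, max_het, max_missing):
    """Flag every window (by start index) that tolerates its het/missing calls."""
    flags = []
    for start in range(len(genos) - window_snps + 1):
        window = genos[start:start + window_snps]
        n_het = sum(1 for g in window if g == 1)
        n_missing = sum(1 for g in window if g is None)
        flags.append(n_het <= max_het and n_missing <= max_missing)
    return flags


def call_runs(genos, window_snps=5, max_het=1, max_missing=1, threshold=0.05):
    """Return (first_index, last_index) pairs of consecutive SNPs inside a run."""
    flags = homozygous_windows(genos, window_snps, max_het, max_missing)
    runs = []
    run_start = None
    for i in range(len(genos)):
        # Windows that contain SNP i start between lo and hi (inclusive).
        lo = max(0, i - window_snps + 1)
        hi = min(i, len(flags) - 1)
        covering = flags[lo:hi + 1]
        in_run = bool(covering) and sum(covering) / len(covering) >= threshold
        if in_run and run_start is None:
            run_start = i
        elif not in_run and run_start is not None:
            runs.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(genos) - 1))
    return runs
```

With `max_het=0` a single heterozygous call splits or truncates a run, whereas `max_het=1` lets the same stretch be reported as one run, which is the tolerance the text refers to. A real caller would also apply minimum length and SNP-count filters.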
- The simplicity of the approach used by PLINK allows efficient execution on data from large consortia [12].
- However, with the sparse nature of the WES target design, long ROH detection is not possible.
- A number of factors influence the quality of ROH calling, including the marker density, the distribution of markers across the genome, the quality of the genotype calling (error rates) and minor allele frequency.
- Currently, ROH studies have overwhelmingly been carried out using genome-wide scan data from SNP arrays, both because of the availability of these data and because array data are considered the gold standard, with very low genotype calling error rates (typically.
- Indeed, it is expected that long ROH will keep their homozygous status independently of the SNP coverage.
- ROH boundaries will be fuzzier in comparison with WGS and, because arrays have fewer SNPs, they will systematically underestimate ROH shorter than 1 Mb.
- For cost reasons, low coverage sequencing is often employed to maximize the number of participants in a study and strengthen its power.
- 4× average) has a high probability that only one of the two chromosomes of a diploid individual has been sampled at a specific site [42, 43].
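A back-of-the-envelope calculation shows why low depth matters. The sketch below assumes that read depth at a site follows a Poisson distribution with the given mean and that each read comes from either chromosome with probability 1/2; both are modelling assumptions for illustration, not statements from the article. Under these assumptions the probability that a site is covered by at least one read, all from the same chromosome, has the closed form 2(e^(-c/2) - e^(-c)) for mean depth c.

```python
import math


def prob_single_chromosome(mean_depth, max_reads=100):
    """P(site is covered and every read comes from the same chromosome).

    Assumes Poisson-distributed depth and unbiased sampling of the two
    chromosomes (illustrative assumptions).
    """
    p_depth = math.exp(-mean_depth)  # P(depth = 0)
    total = 0.0
    for n in range(1, max_reads + 1):
        p_depth *= mean_depth / n          # P(depth = n), built iteratively
        total += p_depth * 0.5 ** (n - 1)  # 2 * (1/2)^n: all n reads from one copy
    return total


def prob_single_chromosome_closed_form(mean_depth):
    """Closed form of the same sum: 2 * (exp(-c/2) - exp(-c))."""
    return 2.0 * (math.exp(-mean_depth / 2.0) - math.exp(-mean_depth))
```

At a mean depth of 4 the model gives roughly 0.23, so under these assumptions close to a quarter of sites would be seen through one chromosome only, and a true heterozygote at such a site would look homozygous.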
- Hence, parameters of ROH calling algorithms require tuning to the characteristics of the underlying data in order to obtain meaningful comparable results between studies using different technologies.
- In order to have a meaningful comparison of ROH obtained from array and WGS low coverage data it is important to first analyze the differences in presence of heterozygous SNPs and variant calling between both technologies.
- As expected, WGS included more heterozygous SNPs since the SNP array captured only data from ~ 2.5 M nucleotide positions in the autosomal genome, whereas the WGS provided data for the entire length of the genome.
- Of the discordant calls, on average 0.1% (±0.03) of the SNPs were called heterozygous by the array and homozygous by WGS, and […] of the SNPs were called heterozygous by WGS but homozygous by the array.
- PLINK, by allowing a flexible number of heterozygous SNPs per window (the default value being 1 heterozygous SNP per window), already takes into account possible calling errors that may wrongly break a long ROH.
- By allowing this heterozygous SNP, the software produces an error that depends on the number of SNPs (in homozygous state) per ROH.
- This figure shows that for most of the populations the ep(P,h) produced by allowing a single heterozygous SNP per window in array data is equivalent to allowing 4 to 5 heterozygous SNPs in WGS data.
- These differences are provoked by differences in the mean number of SNPs per.
- For example, the TSI population has, on average, 368 SNPs in the homozygous state per ROH in the array data, less than half of the average number of SNPs per ROH in array data across all populations (714.7).
- 1 it seems appropriate to compare ROH from both technologies allowing 1 to 5 heterozygous SNPs in WGS data in order to obtain
- Table 1 Mean number of heterozygous SNPs (per called SNP) in array and WGS low coverage data for 20 world populations.
- Fig. 1 Effect of allowing heterozygous SNPs per window evaluated by ep(P,h) as a measure of the empirically observed actual number of heterozygous SNPs found in population P when we allow h heterozygous SNPs.
- Violin plots show the distribution of mean number of ROH (Fig. 2), mean ROH size (Fig. 3) and mean total sum of ROH (Fig. 4).
- Figure 5a–c show the correlations with the array data as heat-maps between number of ROH (5a), mean ROH size (5b), and total sum of ROH (5c) for each population and a different number of allowed heterozygous SNPs in the WGS data (values and probabilities shown in Additional file 3).
- Results of the statistical comparison between ROH obtained from array and WGS (with a different number of heterozygous SNPs allowed) by the Mann-Whitney-Wilcoxon (MWW) test are shown as a heat-map of significance (p values.
- In general, by allowing 3 heterozygous SNPs per window in WGS, the statistical outcomes in the number of ROH, mean ROH size and total sum of ROH are similar between array and WGS data.
- Fig. 5d–f also show that for the Asian populations, especially the JPT, the differences between array and WGS data in the number of ROH and total sum of ROH are significant for every number of heterozygous SNPs allowed.
- Fig. 2 Violin plots of the mean number of ROH longer than 1 Mb.
- Once we established that the best PLINK condition to obtain comparable results is to allow 3 heterozygous SNPs per window when dealing with WGS low coverage, we compared the mean sum of ROH in both technologies for different ROH length categories (Fig.
- Figure 6 shows that for ROH longer than 1 Mb, the array and WGS mean total lengths are very similar, with some exceptions like the JPT, in the case of ROH longer than 8 Mb.
- This gap between array and WGS data can be corrected for small ROH by changing PLINK parameters and relaxing the number of SNPs needed to call a ROH (--homozyg-snp 30, data not shown).
- Runs of homozygosity are an excellent tool for exploring different aspects of human genetics.
- Large genomic datasets, using array and whole genome sequence data, are now becoming available and offer the researcher a unique opportunity to better understand the influence of ROH on complex disease architecture and demographic history.
- Unlike mean number and total sum of ROH, for most of the populations, mean ROH size remains equivalent between technologies when allowing 3 or more heterozygous SNPs per window.
- As a consequence, the mean total sum of ROH increases as more heterozygous SNPs are allowed.
- In fact, for the Dai and Han populations from China (CDX, CHS), the Kinh population from Vietnam (KHV) and the Japanese population (JPT), it was not possible to obtain the same mean number and total sum of ROH between array and WGS data.
- This may be explained by population structure, but perhaps the inferior performance of the Infinium Omni 2.5–8.
- This could also explain why it was not possible to obtain the same number of ROH in the Baganda population from Uganda (BAG) or the same mean ROH size in the Zulu population from South Africa (ZUL).
- In Table 2 we present a comparison of the performance of three different technologies (SNP array, WGS low coverage and WES data) in detecting short, medium and long ROH.
- Fig. 4 Violin plots of mean total sum of ROH longer than 1 Mb (in Gb).
- Only SNPs of the 22 autosomes were included in this analysis.
- This filtering limits the effects of ascertainment bias caused by the small number of individuals in the SNP discovery panel, in the case of the array, and the calling errors associated with a low depth coverage of whole genome sequence data.
- Identification and Characterization of ROH:
- Minimum number of SNPs that a ROH is required to have.
- Length in Kb of the sliding window – --homozyg-density 50.
- Fig. 5 Heatmaps of correlations and MWW tests of mean number of ROH, mean ROH size and mean total sum of ROH between array data allowing 1 heterozygous SNP per window and WGS data allowing 1 to 5 heterozygous SNPs per window (y-axis).
- Number of SNPs that the sliding window must have.
- Number of heterozygous SNP allowed in a window.
- Number of missing calls allowed in a window.
- The minimum length of a ROH was set to 300 kb.
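The settings listed above could be assembled into a PLINK 1.9 call along the following lines. This is a sketch rather than the authors' command: the flags are standard PLINK 1.9 `--homozyg` options, but only the values 300 kb, density 50, the number of heterozygous SNPs per window and `--homozyg-snp 30` appear in this extract; mapping the 300 kb minimum length to `--homozyg-kb` is an interpretation, and the file prefixes are hypothetical. Parameters whose values are not preserved here (window size in SNPs, missing calls per window) are deliberately left out, so PLINK's defaults would apply.

```python
def plink_roh_args(bfile, out, window_het, min_snps=None):
    """Build an argument list for a PLINK 1.9 ROH scan (illustrative only)."""
    args = [
        "plink",
        "--bfile", bfile,                         # hypothetical input prefix
        "--homozyg",
        "--homozyg-kb", "300",                    # minimum ROH length in kb
        "--homozyg-density", "50",                # value quoted in the text
        "--homozyg-window-het", str(window_het),  # het SNPs tolerated per window
    ]
    if min_snps is not None:
        args += ["--homozyg-snp", str(min_snps)]  # e.g. 30 for short ROH
    args += ["--out", out]                        # hypothetical output prefix
    return args


# One het per window for array data, three for low coverage WGS.
array_cmd = plink_roh_args("array_data", "roh_array", window_het=1)
wgs_cmd = plink_roh_args("wgs_lowcov", "roh_wgs", window_het=3)
```

The lists can be handed to `subprocess.run` once PLINK is installed; building them separately keeps the two technology-specific settings visible side by side.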
- PLINK allows a variable number of heterozygous SNPs to be set per window, with a default value of 1 heterozygous genotype per window, in order to tolerate genotype calling errors.
- We define ep(P,h) as a measure of the empirically observed actual number of heterozygous SNPs found in population P when we allow h heterozygous SNPs.
- arithmetic mean of the actual number of heterozygous SNPs found in all ROHs in R(P,h,x,y) found in the population under study.
- This observed number of heterozygous SNPs differs from the parameter used for detecting ROHs depending on the population and technology platform characteristics..
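As a rough illustration of ep(P,h), the sketch below averages the observed heterozygous counts over a set of called ROH. It assumes the counts are recovered from PLINK's `.hom` output by multiplying the `NSNP` and `PHET` columns (number of SNPs in the run and proportion of heterozygous calls); that recovery step and the helper names are assumptions for the example, not the authors' code.

```python
from statistics import mean


def het_count(nsnp, phet):
    """Approximate number of heterozygous SNPs in one ROH from NSNP and PHET."""
    return round(nsnp * phet)


def ep(het_counts):
    """Arithmetic mean of observed heterozygous SNPs over all ROH in a set."""
    if not het_counts:
        raise ValueError("no ROH supplied")
    return mean(het_counts)


# Hypothetical (NSNP, PHET) pairs for ROH called in one population with h fixed.
example_rohs = [(500, 0.004), (1200, 0.0), (800, 0.0025), (300, 0.0)]
example_ep = ep([het_count(n, p) for n, p in example_rohs])
```

Because the window-level allowance h is applied per window rather than per run, the observed mean can differ from h, which is the discrepancy the definition is meant to capture.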
- Mean number of ROH as the mean number of ROH longer than 1 Mb.
- Mean ROH size as the mean size of ROH longer than 1 Mb.
- Total sum of ROH as the mean total sum of ROH longer than 1 Mb.
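The three summary statistics can be computed from a list of called ROH as sketched below. The input layout (pairs of individual ID and ROH length in kb) and the choice to average per-individual mean sizes only over individuals with at least one qualifying ROH are assumptions made for this example.

```python
from collections import defaultdict


def roh_summary(rohs, individuals, min_kb=1000.0):
    """Population means of ROH count, ROH size and total ROH length (> min_kb)."""
    lengths = defaultdict(list)
    for iid, kb in rohs:
        if kb > min_kb:  # keep only ROH longer than 1 Mb by default
            lengths[iid].append(kb)
    counts = [len(lengths[i]) for i in individuals]
    totals = [sum(lengths[i]) for i in individuals]
    sizes = [sum(lengths[i]) / len(lengths[i]) for i in individuals if lengths[i]]
    n = len(individuals)
    return {
        "mean_number": sum(counts) / n,
        "mean_size_kb": sum(sizes) / len(sizes) if sizes else 0.0,
        "mean_total_kb": sum(totals) / n,
    }


# Hypothetical calls: individual "c" has no ROH above the threshold.
example = roh_summary(
    [("a", 1500.0), ("a", 2500.0), ("a", 500.0), ("b", 3000.0)],
    individuals=["a", "b", "c"],
)
```

Passing the full list of individuals matters: anyone without a qualifying ROH still contributes a zero to the mean number and the mean total sum.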
- Fig. 6 Mean sum of ROH in different length categories.
- Statistical comparisons between mean number of ROH, mean ROH size and mean total sum of ROH for different populations, technologies and PLINK conditions were performed by Pearson's correlation and the Mann-Whitney-Wilcoxon non-parametric test (MWW).
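For readers unfamiliar with the two statistics, the sketch below computes Pearson's r and the Mann-Whitney U statistic with the standard library only. It stops at the statistics themselves; p values such as those shown in the heat-maps would normally come from a statistics package (for example `scipy.stats.mannwhitneyu`), and the pairing of inputs shown here is a hypothetical illustration rather than the authors' analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mann_whitney_u(x, y):
    """U statistic for x: pairs where x exceeds y, with ties counted as 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

The U statistics of the two samples always sum to the product of the sample sizes; a value near half that product indicates little separation between, say, array-based and WGS-based ROH counts.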
- Mean number of SNP (in homozygous state) per ROH in array data with 1 heterozygous SNP per ROH and WGS data with 1 to 5 heterozygous SNPs per ROH.
- Means and standard deviations of number of ROH, ROH size and total sum of ROH for different populations, technologies and allowed heterozygous SNPs per ROH (DOCX 35 kb).
- FCC is a National Research Foundation of South Africa (NRF) postdoctoral fellow and MR holds a South African Research Chair in Genomics and Bioinformatics of African populations hosted by the University of the Witwatersrand, funded by the Department of Science and Technology and administered by the NRF.
- 1 Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
- 2 Division of Human Genetics, School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
- Information Engineering, University of the Witwatersrand, Johannesburg, South Africa..
- Can be adjusted to detect ROH by modifying the number of SNPs required in a ROH.
- Allowing 3 heterozygous SNPs per ROH would grant meaningful outcomes.
- Able to detect, but only in selected genomic regions, and boundaries of ROH could be fuzzy if they reach into non-exonic regions [49].
- Extended tracts of homozygosity in outbred human populations.
- Long runs of homozygosity are enriched for deleterious variation.
- Runs of homozygosity: windows into population history and trait architecture.
- Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia.
- Runs of homozygosity implicate autozygosity as a schizophrenia risk factor.
- Association of Long Runs of Homozygosity with Alzheimer disease among African American individuals.
- Intellectual disability is associated with increased runs of homozygosity in simplex autism.
- Runs of homozygosity and inbreeding in thyroid cancer.
- Runs of Homozygosity:.
- The distribution of runs of homozygosity and selection signatures in six commercial meat sheep breeds.
- Genomic patterns of homozygosity in worldwide human populations.
- Detecting autozygosity through runs of homozygosity: a comparison of three autozygosity detection algorithms.
- Increased rate of deleterious variants in long runs of homozygosity of an inbred population from Qatar.
- H3M2: detection of runs of homozygosity from whole-exome sequencing data.
- Excess of homozygosity in the major histocompatibility complex in schizophrenia.
- A genome-wide homozygosity association study identifies runs of homozygosity associated with rheumatoid arthritis in the human major histocompatibility complex.
- Runs of homozygosity in European populations.
- Detection of runs of homozygosity from whole exome sequencing data: state of the art and perspectives for clinical, population and epidemiological studies.
- Genome-wide homozygosity signatures and childhood acute lymphoblastic leukemia risk.
- Single nucleotide polymorphism arrays: a decade of biological, computational and technological advances.
- Genomic and geographic distribution of SNP-defined runs of homozygosity in Europeans
