
Copy number variation in human genomes from three major ethno-linguistic groups in Africa


Summary

- Background: Copy number variation is an important class of genomic variation that has been reported in 75% of the human genome.
- We used GenomeSTRiP and cn.MOPS to identify copy number variant regions (CNVRs).
- Conclusions: Novel CNVRs in the current study increase representation of African diversity in the database of genomic variants.
- Niger-Congo-A and Niger-Congo-B [9].
- Niger-Congo-B speaking populations from Uganda (UBB, n = 33) and the Democratic Republic of Congo (DRC, n = 50).
- Niger-Congo-B speaking populations from Uganda (UBB) and Democratic Republic of Congo (DRC).
- Comparison of cn.MOPS and GenomeSTRiP.
- GenomeSTRiP detected 16,149 CNVRs compared to 9213 detected by cn.MOPS.
- We defined as high confidence CNVRs those called by both GenomeSTRiP and cn.MOPS.
- This identified 7608 GenomeSTRiP CNVRs that overlapped or were within cn.MOPS loci (Additional file 3).
- Characteristics of CNVRs identified by GenomeSTRiP and cn.MOPS.
- The CNVRs discovered by GenomeSTRiP (median length 5.2 kb) were much shorter than those discovered by cn.MOPS (median length 32 kb) (Table 2) and were more similar in length to those in the database of genomic variants (DGV.
- The total lengths of CNVRs were 108 Mb and 1145 Mb in GenomeSTRiP and cn.MOPS, respectively.
- 24% of CNVRs were common to all three major linguistic groups represented in the data, 55% were unique to single linguistic groups and 21% were shared between pairs of major populations (Fig.
- GenomeSTRiP CNVR overlapping cn.MOPS CNVR were selected and singletons assessed for removal.
- We found that 7384 of the 7608 CNVRs in the final analysis set overlapped known CNVRs in the human DGV; the remaining 224 had not been previously reported and were defined as novel CNVRs.
- Unique CNVR boundaries in the.
- The novel CNVRs also overlap SNPs associated with traits in the genome wide association study catalogue (Additional file 2: Fig S5 and Additional file 6).
- Table 2 CNV statistics using GenomeSTRiP and cn.MOPS algorithms.
- Parameter GenomeSTRiP cn.MOPS GenomeSTRiP that overlap cn.MOPS.
- Descriptive statistics of CNVR found using GenomeSTRiP and cn.MOPS.
- Note that: GenomeSTRiP has about 5.3 times the number of CNVs compared with cn.MOPS (11,275 cf.
- GenomeSTRiP CNVRs were shorter (median length 5.3 kb) than cn.MOPS (median length 32.4 kb).
- Total length of cn.MOPS CNVRs was about 10.6 times greater (1146 Mb cf.
- a Count of any overlap (minimum 1 bp) between GenomeSTRiP and cn.MOPS CNVR.
- b The expected length of CNVs that would be found by both methods was obtained by 100 simulations using all the observed lengths of CNVs allocated to random places in the genome.
- We assumed that if a haplotype is associated with a CNV then the number of alleles (0, 1, 2) of that haplotype will be correlated with the observed number of copies reported in samples in the dataset.
- There was no difference between populations in the proportion tagged.
- Haplotypes that tag the CNVR detected in each of the five populations tested are shown in Additional file 8.
- in the CIV in the DRC in the GAS in the UBB and in the UNL.
- 3.0) in the UNL population in a separate study of the same data [21].
- A majority of the CNVRs are shared between populations, but Nilo-Saharans appear to have the fewest CNVRs, most of which are shared with the Niger-Congo-A and Niger-Congo-B groups.
- Protein coding genes were under-represented with 75% of the expected number.
- The mean frequency of CNVs in the CNVRs with SNPs under selection (19%) was twice that of CNVRs without SNPs under selection (8.5%) (χ² = 11,673.
- There were 2693 CNVRs with SNPs that tag haplotypes in the UNL population and 372 CNVRs with SNPs with evidence of selection.
- The cn.MOPS CNVRs were much larger, with a mean of 4.5 GenomeSTRiP CNVRs overlapping each cn.MOPS CNVR (Table 2).
- The histogram in the legend indicates the number of correlations with each value of Pearson's r; there are large numbers of correlations between 0.5 and 0.6 and also between 0.9 and 1.
- The differences in CNVs detected by GenomeSTRiP and cn.MOPS are consistent with reports observing that different algorithms for detecting CNVs from whole genome sequencing data show major differences in the CNVs detected [25].
- None of the novel CNVs in our data were common and less than 2% were shared between populations.
- There was a threefold variation in CNV and CNVR frequency per Mb between chromosomes in our dataset and a nearly twofold variation in the 1000 Genomes data, even after correction for chromosome length (Fig.
- The density of CNVRs per Mb for each chromosome was correlated in the 1000 Genomes and our datasets ( r suggesting that CNVR density may be an intrinsic property of chromosomes.
- The centre of the circle has the least frequency of <.
- b Comparison of frequencies in the various populations.
- All populations are represented in the plot with different colours.
- There is also strong correlation between alleles of a SNP and CNVs in the CCL4 (Cysteine-Cysteine Ligand 4) chemokine gene [28].
- In the current study, SNP haplotypes tagged 41% of CNVRs.
- The weak association between CNV genotype and population structure in the PCA analysis was consistent with both these hypotheses.
- Therefore, the number of CNVRs associated with SNP haplotypes may be an indicator of the proportion of stable, non-recurrent homologous, high frequency CNVRs.
- affecting the frequency of the RhD selection [32].
- Variants in the Human Leucocyte Antigen, class II, DQ beta 1 (HLADQB1) have been associated with pre-eclampsia in Iranian women [33].
- Given the association of HLADQB1 and KIR in pre-eclampsia and infectious disease which may impact infant birth and survival, they may be the actual targets of positive selection, resulting in the signatures of selection which have been seen in these loci.
- We found that CNVs distinguish major continental populations when we included Asians, South Asians, Americans, Europeans and Africans from the 1000 Genomes in the same PCA plot.
- Africans in the 1000 Genomes (AFR) are closer to our data (TGN).
- The Africans in the 1000 Genomes overlay the TrypanoGEN African samples, indicating similar CNVs in the datasets.
- There was no specific pattern observed as fewer bi-allelic insertions were available in the data.
- In a study of parent-child trios, up to 7% of variant loci in the child could not be associated with variants in the parents, which is indicative of novel or recurrent variants or, alternatively, problems in variant genotyping [39].
- These include known CNVRs that have been described in the DGV, and novel ones (3.
- that are not reported in the DGV, reflecting the diverse nature of these African populations.
- Some of the CNVRs described may have medical significance as they occur in Mendelian disease-causing genes and overlap SNPs significantly associated with various traits in the GWAS catalogue.
- Finally, we show that CNVs distinguish between continental populations but do not stratify within the continent, such as the Africans in the current study.
- The study was conducted in the context of the TrypanoGEN project [42], which aims to determine host genetic susceptibility to Human African Trypanosomiasis.
- n = 33) whereas Central African populations were Niger-Congo B speakers (n = 50) from the Democratic Republic of the Congo.
- The samples in the current study are a subset of those described in the TrypanoGEN bio-bank [42].
- Of the six methods benchmarked recently by Trost and colleagues [43], only cn.MOPS and GenomeSTRiP use population scale data.
- Due to limited African CNV datasets, we referenced an evaluation of CNVR detection algorithms for sensitivity and false discovery rate against CNVR in the HuRef CNV Benchmark [43].
- From these evaluations, two algorithms (GenomeSTRiP and cn.MOPS) integrated data from multiple samples.
- and a false discovery rate of 0.49, whereas cn.MOPS had sensitivity of 0.38.
- We therefore used GenomeSTRiP [27] and cn.MOPS [44] to detect CNVs in binary alignment map (BAM) files of our data.
- GenomeSTRiP has previously been used to detect CNVs in the 1000 Genomes project of human populations [27].
- To validate detected CNVs we tested for overlap with published CNVs in the public Database of Genomic Variants (DGV.
- PLINK was used for population clustering as described in the documentation.
- We investigated population differentiation by comparing F_ST between CNVs in the different populations.
- 0.8 with at least one other SNP in the region were assembled into haplotypes.
- 0.05 for the null hypothesis that the slope of the regression line is zero.
- List of samples in the study.
- Correlation of GenomeSTRiP and cn.MOPS and supplementary figures.
- GenomeSTRiP CNVR that intersect cn.MOPS CNVR after QC.
- Haplotypes that tag CNVR in each of the populations.
- CM: cn.MOPS.
- cn.MOPS: Copy number mixture of Poissons.
- GAS: Guinea Niger Congo A speakers.
- NCA: Niger Congo A.
- NCB: Niger Congo B.
- cn.MOPS is also available from Bioconductor [51].
- All study participants gave written informed consent to participate in the study.
- Global variation in copy number in the human genome.
- The clinical context of copy number variation in the human genome.
- Ethnologue: Languages of the World.
- The database of genomic variants: a curated collection of structural variation in the human genome.
- Linkage disequilibrium patterns of the human genome across populations.
- Population-genetic nature of copy number variations in the human genome.
- Population structure in copy number variation and SNPs in the CCL4L chemokine gene.
- Evolutionary genetics of the human Rh blood group system.
- Origins and functional impact of copy number variation in the human genome.
- Recurrent DNA copy number variation in the laboratory mouse.
- cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate.
- cn.mops.
- http://bioconductor.org/packages/cn.mops/.
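
The summary defines high confidence CNVRs as GenomeSTRiP calls that overlap cn.MOPS calls, counting any overlap of at least 1 bp. The snippet below is a minimal sketch of that overlap rule only, not the authors' pipeline. It assumes half-open `[start, end)` coordinates; the function names `overlap_bp` and `high_confidence` and the toy coordinates are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def overlap_bp(a_start, a_end, b_start, b_end):
    """Shared base pairs between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def high_confidence(primary, secondary, min_overlap=1):
    """Keep regions in `primary` that share at least `min_overlap` bp
    with any region in `secondary` on the same chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in secondary:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in primary:
        candidates = by_chrom.get(chrom, [])
        if any(overlap_bp(start, end, s, e) >= min_overlap for s, e in candidates):
            kept.append((chrom, start, end))
    return kept


# Toy, made-up coordinates (not taken from the study).
genomestrip_calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
cnmops_calls = [("chr1", 150, 5000), ("chr2", 200, 900)]

confident = high_confidence(genomestrip_calls, cnmops_calls)
print(confident)
```

In this toy input both chr1 calls fall inside one long second-method call, which mirrors the observation that several short GenomeSTRiP CNVRs can overlap a single cn.MOPS CNVR. The chr2 call only touches its neighbour at a boundary and is dropped under the half-open convention. A quadratic scan like this is fine for a sketch; a sorted sweep or interval tree would be the usual choice for tens of thousands of regions.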
