
Structural variation of the malaria-associated human glycophorin A-B-E region


Summary

- Background: Approximately 5% of the human genome shows common structural variation, which is enriched for genes involved in the immune response and cell-cell interactions.
- Glycophorins A and B are receptors for invasion by the protist parasite Plasmodium falciparum, a causative agent of malaria.
- Many other structural variants exist across the glycophorin gene cluster, and they remain poorly characterised.
- Results: Here, we analyse sequences from 3234 diploid genomes from across the world for structural variation at the glycophorin locus, confirming 15 variants in the 1000 Genomes project cohort, discovering 9 new variants, and characterising a selection of these variants using fibre-FISH and breakpoint mapping at the sequence level.
- Conclusions: We identify and validate large structural variants in the human glycophorin A-B-E gene cluster which may be associated with different clinical aspects of malaria.
- Structural variation is responsible for much of the difference in DNA sequence between individual human genomes [1–3], yet analysis of the phenotypic importance of structural variation has lagged behind the rapid progress made in studies of single nucleotide variation [4–6].
- We might expect that direct disruption of ligand-receptor interactions by a glycophorin B-glycophorin A fusion receptor might be responsible for the protective effect of the DUP4 variant.
- Instead, it seems likely that DUP4 is associated with more complex alterations in the protein levels at the red blood cell surface resulting in increased red blood cell tension, mediating its protective effect against P. falciparum.
- Given the effect size of the DUP4 variant in protection against malaria (odds ratio ~ 0.6) and the frequency of the allele (up to 13% in Tanzania), it is potentially of great clinical significance, although it appears to be geographically restricted to East Africa [11].
- Because of the clinical importance of the DUP4 glycophorin variant, and the insights it can give into the mechanisms underlying malaria, it is timely to identify and characterise other structural variants in the glycophorin region.
- Previously, other structural variants in the glycophorin region have been identified in the 1000 Genomes project samples by using sequence read depth analysis of 1.6 kb bins combined with a Hidden Markov Model approach to identify regions of copy number gain and loss [11].
- Although only DUP4 has been found to be robustly associated with clinical malaria phenotypes, it is possible that some of the other structural variants are also protective, but are rare, recurrent, or both, making imputation from flanking SNP haplotypes and genetic association with clinical phenotypes challenging.
- To detect copy number changes in the glycophorin genomic region, we use sequence read depth analysis of 3234 diploid genomes from across the world, followed by direct analysis of structural variants using fibre-FISH and breakpoint mapping using paralogue-specific PCR and Sanger sequencing.
- This will allow future development of robust yet simple PCR-based assays for each structural variant and detailed analysis of the phenotypic consequences of particular structural variants on malaria infection and other traits.
- Together, this allows us to gain some insight into the evolutionary context of the extensive structural variation at the glycophorin locus.
- Structural variation using sequence read depth analysis.
- Previous work by us and others has shown that unbalanced structural variation - that is, variation that causes a copy number change - can be effectively discovered by measuring the relative depth of sequence reads across the glycophorin region [11, 12].
- We extended our analysis to Gambian genomes and identified 51 samples with DEL1 or DEL2 variants, as well as DEL16, which is characterised in the Brazilian cohort below.
- Further diploid genomes from Brazil, sequenced to high coverage, were analysed; given the extensive admixture from Africa in the Brazilian population, these are likely to be enriched for glycophorin variants from Africa.
- Fibre-FISH analysis of structural variants.
- Sequence read depth analysis shows copy number gain and loss with respect to the reference genome to which the sequence reads are mapped, but it does not determine the physical structure of the structural variant.
- The repeated nature of the glycophorin region means that the green and red probes from the GYPB repeat cross-hybridise with the other repeats, but the GYPA repeat is distinguishable from the GYPB and GYPE repeats by a 16 kb insertion resulting in a small gap of signal in the green probe (Fig.
- For most variants the fibre-FISH results confirmed the structure previously predicted [11] and expected if the variants had been generated by non-allelic homologous recombination (NAHR) between the glycophorin repeats (Figs.
- However, three variants showed a complex structure that could not be easily predicted from the sequence read depth analysis.
- Two other structural variants (DUP5 and DUP26) also showed complex patterns of gains or losses, and fibre-FISH clearly shows the physical structure of the variant, including inversions.
- The more frequent of these two complex structural variants, DUP5, seems to be restricted to Gambia, as it is found once in the GWD population from the 1000 Genomes project and twice in the Jola population from the Gambian Genome Variation project (Table 1).
- To distinguish the distal end of the GYPB repeat from the distal end of the GYPE repeat, a pink-coloured probe from a short GYPE-repeat-specific PCR product was also used for fibre-FISH, and clearly shows only a single copy of the distal end of the GYPB repeat in the DUP5 variant, at the same position as the reference.
- The DUP26 variant was observed once, in sample HG03729, an Indian Telugu individual from the United Kingdom, sequenced as part of the 1000 Genomes project.
- Sequence read depth analysis predicts an extra copy of the glycophorin repeat, partly derived from the GYPB repeat and partly from the GYPA repeat (Fig.
- Defining the precise breakpoint of the variants can allow a more accurate prediction of potential phenotypic effects of each variant by assessing, for example, whether a glycophorin fusion gene is formed or whether key regulatory sequences are deleted.
- read depth at both ends of the deletion or duplication, and by designing PCR primers to specifically amplify across the junction fragment (Fig.
- The sequence alignment spanning the two 1 kb windows is examined manually for paired sequence reads where the gap between the aligned pairs is consistent with the size of the variant, or where both sequence pairs align but one aligns with multiple sequence mismatches.
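The size-consistency criterion for paired reads can be illustrated with a small sketch. It is not the authors' procedure, which was a manual inspection of the alignments; it only shows the check for a deletion. The `ReadPair` record, the 500 bp typical insert size and the 20% tolerance are hypothetical choices made for illustration and do not come from the source.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical minimal record for one aligned read pair."""
    name: str
    pos1: int  # leftmost aligned position of the first read
    pos2: int  # leftmost aligned position of its mate


def pairs_supporting_variant(pairs, variant_size, typical_insert=500, tolerance=0.2):
    """Return pairs whose aligned span exceeds the typical insert size by
    roughly the variant size (within a fractional tolerance).

    For a deletion, pairs that straddle the breakpoint map to the reference
    further apart than usual, by about the length of the deleted segment.
    """
    supporting = []
    for pair in pairs:
        span = abs(pair.pos2 - pair.pos1)
        excess = span - typical_insert
        if abs(excess - variant_size) <= tolerance * variant_size:
            supporting.append(pair)
    return supporting


# Illustrative, made-up coordinates: only pair "b" spans ~100 kb more than expected.
example_pairs = [
    ReadPair("a", 1_000, 1_450),
    ReadPair("b", 1_000, 101_400),
    ReadPair("c", 2_000, 40_000),
]
hits = pairs_supporting_variant(example_pairs, variant_size=100_000)
print([p.name for p in hits])
```

In practice such a filter would only shortlist candidate pairs; the mismatch-rich alignments mentioned in the text still need inspection against the paralogous repeats.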
- Breakpoint analysis of NA12249, the sample carrying the DUP27 variant, showed that the DUP27 breakpoint is in the same intron as that of DUP2 (Supplementary Fig.
- Fig. 1. Structure of the glycophorin reference allele.
- A representation of the reference allele assembled in the GRCh37/hg19 assembly is shown, with the three distinct paralogous ~ 120 kb repeats of the glycophorin region coloured green, orange and purple, carrying GYPE, GYPB and GYPA respectively.
- Sequence read depth (SRD) analysis of selected deletions (DEL1, DEL2, DEL6, DEL7) is shown on the left.
- Individuals homozygous for DEL1 or DEL2 are shown in the plot with a very low sequence read depth.
- Above each plot the coloured bars show the glycophorin repeat regions, as in Fig.
- A schematic diagram next to the corresponding fibre-FISH image shows the structure of each allele inferred from the fibre-FISH and SRD analysis.
- Although the variants are the same across most of the sequence, two variants are GYPB-like in DUP2 and GYPA-like in DUP27.
- It is unlikely that DUP8 has a phenotype, given the involvement of the 5′ end of GYPE, which is not expressed.
- Other variants involve breakpoints within 1 kb of a gene coding region and could potentially affect expression levels of the neighbouring gene.
- We investigated whether the breakpoints we had found co-localised with known meiotic recombination hotspots previously determined by anti-DMC1 ChIP-Seq of the testes of five males [24].
- Importantly, the recombination hotspot dataset mapped hotspots in individuals carrying different alleles of the highly variable PRDM9 protein, a key determinant of recombination hotspot activity, with different alleles activating.
- The overlap between the PRDM9 C allele hotspot and the structural variant breakpoints is statistically significant (two-tailed Fisher’s exact test, p = 0.012) and reflects the observation that there are more different rare structural variants in sub-Saharan African populations, with high frequencies of the C allele, than in European populations where the C allele is almost absent (allele frequency .
- These losses and gains are consistent with an origin by non-allelic homologous recombination (NAHR) between glycophorin repeats, with particular involvement of the PRDM9 C allele, which is at appreciable frequencies in African populations and directs high recombination rates at its cognate recombination hotspots.
- Sequence read depth (SRD) analysis of selected duplications (DUP2, DUP3, DUP7, DUP8, DUP14 and DUP29) is shown on the left.
- 1, with an additional green-labelled PCR product specific to the glycophorin E repeat for HG03686.
- We then used window-based analysis of sequence read depth and paralogue-specific allele-specific PCR and Sanger sequencing to refine copy number breakpoints.
- Our approach has the advantage that it does not rely on an HMM detecting a sudden change in sequence read depth, which may be compromised by poor mappability of some sequence reads in the breakpoint region and by the assumption that somatic variation is absent, so that the expected copy number is an integer value.
- This is because, for these sizes, the relative increase or decrease in the number of mapped reads at the glycophorin region is likely to be below the threshold used to call a copy number change.
- We identify nine new structural variants at the human glycophorin locus, characterise breakpoints and mutational mechanisms for known and novel structural variants, and show that recombination hotspot activity has influenced the nature of the structural variants observed.
- For some of the variants, targeted high-coverage sequencing using very long reads will help refine the breakpoints.
- (a) Sequence read depth (SRD) analysis of three individuals heterozygous for the DUP5 variant.
- (b) Representative fibre-FISH images from the DUP5 index sample HG02585.
- 1, except the red probe is fosmid G248P89366H1 and the pink probe is the glycophorin E repeat-specific PCR product.
- (e) Sequence read depth (SRD) analysis (left) and fibre-FISH analysis (right) of the index sample HG03729, heterozygous for the DUP26 variant.
- 1, except with the addition of the glycophorin E repeat-specific PCR product labelled in pink (c, d) or green (e).
- DNA sequences from the 1000 Genomes project and the Simons diversity project had been previously aligned to reference GRCh37 (hg19) to generate the alignment BAM files.
- sample accession number SAMN00001619), and aligned to GRCh37 using standard approaches: FastQC v0.11.5 and Cutadapt v01.11 to trim reads and adapters, mapping using BWA-MEM v0.7.15, processing of the BAM files using SAMtools v1.8, local realignment using GATK v3.6, and duplicate reads marked using Picard v.1 and removed using SAMtools.
- Samples from the Brazilian genomes and the Gambian genome diversity project had been aligned to GRCh38.
- The reference region has no segmental duplications and shows no copy number variation according to the gold standard track of the Database of Genomic Variants (DGV) [40].
- A ratio of the number of reads mapping to the glycophorin region to the number of reads mapping to the reference region allows an estimate of the total increase or decrease of sequence depth spanning the.
- (a) Sequence read depth analysis, indicating position of PCR primers (not to scale).
- (d) Multiple sequence alignment of the variant-specific PCR product, with homologous sequence on the GYPA repeat and the GYPE repeat.
- (e) A model of the generation of the variants by NAHR.
- The glycophorin region is shown together with the glycophorin genes.
- Below are the breakpoint regions for each structural variant, labelled in blue for the distal breakpoint in the variant, and red for the proximal breakpoint in the variant.
- Because the size of the regions used for sequence read count is ~ 320 kb, and spans the whole glycophorin region, we would not expect copy number losses within the region to necessarily show read depth ratios of 0 or 0.5 for homozygous or heterozygous losses respectively, unless the whole 320 kb region is deleted.
- The main peak of the histogram below 0.9 is at ~ 0.8, and above 1.1 is at 1.2, suggesting that the copy number gains or losses identified in those peaks in the histograms are ~ 100 kb and heterozygous.
- Samples showing ratios of ~ 0.6 for losses or ~ 1.4 for gains represent either larger copy number changes in the heterozygous state, or homozygous ~ 100 kb copy number alterations.
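The arithmetic behind these ratios can be sketched as follows. This is a back-of-envelope model, not the authors' calculation: it assumes uniform read depth, a diploid genome, and a ratio normalised so that the reference copy number gives 1.0. The function name and its parameters are introduced here for illustration only.

```python
def expected_depth_ratio(variant_kb, copy_change, region_kb=320.0, ploidy=2):
    """Expected read depth ratio over the whole counted region.

    variant_kb  -- length of the segment gained or lost
    copy_change -- net copies gained (+) or lost (-) summed over both
                   chromosomes: -1 for a heterozygous loss, -2 for a
                   homozygous loss, +1 / +2 for the corresponding gains
    region_kb   -- length of the region over which reads are counted
    """
    return 1.0 + copy_change * variant_kb / (ploidy * region_kb)


scenarios = [
    ("heterozygous ~100 kb loss", 100, -1),
    ("heterozygous ~100 kb gain", 100, +1),
    ("homozygous ~100 kb loss", 100, -2),
    ("homozygous ~100 kb gain", 100, +2),
    ("heterozygous loss of whole region", 320, -1),
    ("homozygous loss of whole region", 320, -2),
]
for label, size_kb, change in scenarios:
    print(f"{label}: {expected_depth_ratio(size_kb, change):.3f}")
```

Under these assumptions a heterozygous change of about 100 kb within the ~320 kb region shifts the ratio by roughly 0.16, in the neighbourhood of the histogram peaks near 0.8 and 1.2, and only deletion of the entire region gives 0.5 or 0.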
- The presence and nature of structural variants were assessed by examination of the quality of the plots, ensuring that copy number gains and losses showed a consistent gain or loss of sequence read depth across a contiguous region.
- For the 1000 Genomes project, 6 samples were identified as harbouring copy number gains or losses across the glycophorin region, but failed to pass this subsequent 5 kb window step because sequence read depth was noisy across the region and no consistent region showing loss or gain of read depth was seen.
- The twenty samples included samples that showed the putative 15 kb deletion in the Simons diversity samples, but not in the 1000 Genomes samples, further supporting our assertion that this was an artefact.
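The idea of requiring a consistent gain or loss across contiguous windows can be sketched in code. The source describes a visual assessment of plots, so the following is only an illustrative stand-in: the function name is hypothetical, and reusing 0.9 and 1.1 as per-window thresholds is an assumption made here, not a stated parameter of the study.

```python
def longest_consistent_run(window_ratios, low=0.9, high=1.1):
    """Find the longest run of adjacent windows that all show a loss
    (ratio < low) or all show a gain (ratio > high).

    Returns (kind, start, end) with `end` exclusive, or None when no
    window passes either threshold.
    """
    def classify(ratio):
        if ratio < low:
            return "loss"
        if ratio > high:
            return "gain"
        return None

    # A trailing None closes any run that reaches the final window.
    kinds = [classify(r) for r in window_ratios] + [None]

    best = None
    run_kind, run_start = None, 0
    for i, kind in enumerate(kinds):
        if kind != run_kind:
            if run_kind is not None:
                if best is None or i - run_start > best[2] - best[1]:
                    best = (run_kind, run_start, i)
            run_kind, run_start = kind, i
    return best


# Made-up per-window ratios: three adjacent low windows, one isolated high one.
ratios = [1.0, 0.98, 0.52, 0.49, 0.55, 1.02, 1.3, 0.97]
print(longest_consistent_run(ratios))
```

A noisy sample of the kind excluded in the text would yield only short, alternating runs, which a minimum run length could filter out.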
- Fibre-FISH.
- The probes used in this study included four WIBR-2 fosmid clones selected from the UCSC Genome Browser GRCh37/hg19 assembly and a 3632-bp PCR product that is specific for the glycophorin E repeat [12].
- Annealing specificity of the PCR primer was enhanced by incorporating a locked nucleic acid at that particular 3′ position of the PCR primer [21].
- A breakpoint was called in the transition region between three paralogous sequence.
- We used the same nomenclature as reference [11] when our variant could be identified as the same variant in the same sample from the 1000 Genomes project.
- Other variants, which either had not been unambiguously identified in the 1000 Genomes previously or were identified in other sample cohorts, were given DEL or DUP numbers following on from variants catalogued previously.
- A list of the samples carrying particular variants is also included as supplementary data.
- The statistical significance of the overlap was calculated using the fisher command in BEDTools, which uses a Fisher's exact test on the number of overlaps observed between two BED files.
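For readers unfamiliar with the test itself, a minimal pure-Python sketch of a two-tailed Fisher's exact test on a 2x2 table follows. It is not the BEDTools implementation, which derives its contingency table from interval overlaps and the genome size; here the table is supplied directly, and the counts in the example are arbitrary illustrative numbers, not data from the study.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Arbitrary illustrative table, not taken from the paper.
p_value = fisher_exact_two_tailed(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

In an overlap setting, the four cells would count intervals that fall in both sets, in only one set, or in neither, which is why a genome size is needed to fill the last cell.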
- Histograms of sequence read depths of the glycophorin region.
- Histograms of normalised sequence read depths of the four cohorts used for this study, with red indicating putative deletions and blue putative duplications.
- The figure shows an alignment of the DUP2 variant sequence and the DUP27 variant sequence from the index samples NA18593 and NA12249 respectively.
- Variable nucleotides in the alignment are coloured depending on whether they.
- The funding agencies had no role in the design, analysis or interpretation of data.
- Large multiallelic copy number variations in humans.
- A novel gene member of the human glycophorin A and B gene family.
- Glycophorin variants and Plasmodium falciparum: protective effect of the Dantu phenotype in vitro.
- Red blood cell tension controls Plasmodium falciparum invasion and protects against severe malaria in the Dantu blood group.
- Origins and functional impact of copy number variation in the human genome.
- Two Prevalent ∼ 100-kb GYPB Deletions Causative of the GPB-Deficient Blood Group MNS Phenotype S – s – U – in Black Africans.
- Variants of the protein PRDM9 differentially regulate a set of human meiotic recombination hotspots highly active in African populations.
- Capsid region involved in hepatitis A virus binding to glycophorin A of the erythrocyte membrane.
- The database of genomic variants: a curated collection of structural variation in the human genome
