A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds


- Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera.
- Results: Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds.
- 98% of the sequence to chromosomes.
- All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome.
- 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0).
- Due to these various drawbacks, the current state-of-the-art for genome assembly is to use a hybrid approach combining multiple technologies [18–21].
- Here we used four complementary technologies to generate a highly contiguous de novo assembly of the honeybee.
- N50 of the HAv1 is 5.167 Mbp, compared to 0.046 Mbp for Amel_4.5 (Fig.
- We performed scaffolding of the Amel_HAv1 contigs using BioNano data to produce version Amel_HAv2.
- Six of the sixteen chromosomes were recovered as single scaffolds and each chromosome was represented by an average of 2.2 scaffolds.
- A visual overview of the 16 chromosomes is presented in Fig.
- We find a small fraction (0.9%) of the markers to be ambiguous.
- In Amel_4.5, 16.7 Mbp (7.3%) of the sequence is marked as repetitive and unplaced contigs have higher levels of repeat sequence than chromosome-anchored contigs (Table 2).
- 219.4 in Amel_HAv3 vs.
- We compared the respective completeness of the Amel_4.5 and Amel_HAv3 assemblies by counting the number of universal single-copy orthologues detected in either assembly with BUSCO [39].
- Overall, Amel_HAv3.
- An overview of the 16 linkage groups or chromosomes of Amel_HAv3 after anchoring and orienting the contigs according to the genetic map [38].
- extended upstream to the tip of the chromosome as dots when the area started at the first genetic map marker.
- in terms of the proportion of conserved genes located in genome scaffolds.
- After aligning the sequences, we found that most of the length difference is explained by three major intergenic indels: i) a 16 bp deletion between COX3 and tRNA-Gly.
- In both this and the previous assemblies (Amel_HAv3 and Amel_4.5), we find 12.8 Mbp of simple repeats/low complexity regions with RepeatMasker, representing 5.6% of the overall sequence and about 75% of all repeat-masked output (Additional file 1: Table S6).
- 1.4% of the assembly.
- 3b-c), although some repeat classes occupy larger proportions of the genome.
- These repeats have previously been estimated to represent 1–2% of the honeybee genome using Southern blotting and FISH, and to be clustered close to centromeres (AvaI) and the short-arm telomeres (AluI) [30, 43].
- 3c), although we are unable to fully assemble and map the complete sets because many of the repeats occur in unplaced contigs (89% of AluI and 41% of AvaI repeats, respectively).
- 500 bp in Amel_4.5 vs.
- About 16.4 Mbp of sequence that had previously been unplaced in Amel_4.5 now aligned against Amel_HAv3 chromosomes, corresponding to 7.5% of the total Amel_HAv3 assembly (Fig.
- Chromosomal regions built from sequences that were unplaced in Amel_4.5 or unaligned to Amel_4.5 sequence represent 12% of the genome but contain 17% of simple repeats, 42% of DNA transposons, 25% of LTRs, 35% of satellites and 59% of AvaI repeats (Fig.
- The telomeric repeat motif TTAGG is expected to occur as tandem arrays at the tip of the distal long-arm telomeres of all honeybee chromosomes.
- a Location of the longest AluI cluster.
- b Location of the longest AvaI cluster.
- 5.7 kbp) at the very ends of the long arms of 14 chromosomes (all except chromosomes 5 and 11.
- While TTAGG/CCTAAs are rare across the genome (about 8 motifs per 10 kbp or ~0.4% of the genomic background.
- 2), the outermost 1–2 windows of these chromosomes contain on average 1043 motifs per 10 kbp (52% of the sequence.
- The longest telomeric repeat region was assembled for chromosomes 3 and 8, containing 2142 and 1994 copies of the motif, respectively.
- For the metacentric chromosome 1, we detected TTAGG repeats at both ends of the.
- We extracted and aligned the sequences of all distal telomeres with TTAGG arrays using MAFFT (n = 15, including both telomeres on chromosome 1), together with ~4 kbp of the upstream subtelomeric region, and scanned the sequences for shared properties.
- Taking the sequence at chromosome 8 as reference, we find that the first 2 kbp downstream of the start of the telomere is enriched for TCAGG, CTGGG and TTGGG variants (Fig.
- These polymorphisms are gradually replaced by the canonical TTAGG repeat moving towards the distal ends of the telomeres, where the average pairwise divergence between telomeres is accordingly much reduced: from 12% at <.
- a The proportions of the Amel_HAv3 assembly with or without matching sequence in Amel_4.5 are displayed at the top.
- We recover a relatively conserved 3 kbp subtelomeric region upstream of the junction (avg..
- The subtelomeres contain two larger shared motifs just upstream of the telomere junction (Fig.
- 6): i) a ~350 bp (213–520 bp) fragment is located 100 bp upstream of the junction and has moderate similarities towards a 4.5 kbp LINE/CR1 retrotransposon originally characterized in Helobdella robusta (CR1-18_HRo.
- a A model of the subtelomeric and telomeric regions as inferred from alignment and sequence analysis of the distal ends of 14 chromosomes (two telomere sequences from chromosome 1).
- 10-kbp telomeric region is indicated in the last box and the proportions of the canonical TTAGG repeat and variants are indicated for every 100-bp window.
- In Amel_4.5, we find subtelomeres on short contigs (average length of 27 kbp) located at the tips of the outermost scaffolds of 13 chromosomes.
- Although many AvaI and AluI repeats remain unmapped (see above), we find that the mapped repeats cluster toward the tips of the short arms of most acrocentric chromosomes and the center of metacentric chromosome 1 and possibly submetacentric chromosome 11 (Fig.
- The distribution of the putatively centromeric AvaI repeats in Amel_HAv3 overlaps or co-occurs with experimental mapping of centromeres from patterns of recombination and heterozygosity in half-tetrads of the clonal Cape honeybee A.
- The high contiguity in Amel_HAv3 now facilitates further characterization of the putative centromeric regions.
- 2) are embedded in megabase-scale regions with reduced GC content compared to the rest of the genome (22.7% vs.
- The low-GC centromere-associated regions together span 42 Mbp of the genome and are among those that appear to have been particularly poorly assembled before: these regions constitute 19.3% of the genome but contain 38% of all sequence that is unmatched against Amel_4.5 and 95% of all sequence that was unplaced in Amel_4.5 (Additional file 5: Figure S4A).
- We do not find TTAGGs associated with proximal telomeres, suggesting they are either not present at the short-arms of the honeybee chromosomes or only occur in unmappable sequence.
- Fosmid reads containing AluI repeats were found to likely have AluI mate pairs, indicating very long strings of AluIs that exceed the length of the arrays in the hybrid assembly.
- Because our assembly of these regions between the centromeres and the short-arm telomeres remains incomplete, most of the unplaced contigs are inferred to belong in these regions.
- The other assemblies in this list of twelve are all also based on whole-genome shotgun sequencing using PacBio with the exception of the release 6 reference sequence of Drosophila melanogaster, which is based on sequencing of BAC clones without the use of long-read technologies [49].
- A particular advantage of the honeybee for genome assembly is its haplodiploid mode of sex determination, which results in the availability of haploid (male) drones and thereby eliminates the difficulties posed by heterozygous sites.
- Most of the new sequence incorporated into this genome assembly compared with the previous one is anchored as Mbp-scale blocks of low-GC heterochromatin around the centromeres of most chromosomes.
- These regions make up about 19% of the genome and are enriched for repetitive sequence and DNA transposons (Fig.
- Honeybee centromeres have been shown to contain extended arrays of the 547 bp AvaI repeat that appears to make up about 1% of the genome.
- It was not possible to demonstrate an association between AvaI and centromeres in previous assemblies due to the relative absence of the AvaI repeat and poor contiguity of these regions [33, 34].
- Short-arm telomeres (which are close, or proximal, to the centromeres) consist of tandem arrays of the 176 bp AluI element that make up as much as 2% of the genome.
- Although TTAGG repeats may be present beyond the AluI arrays on the short-arm telomeres, we are unable to conclusively map any TTAGGs to this end of the chromosomes and only anchor them to the distal telomeres on the long arms.
- About 90% of the TCAGG and CTGGG variants co-occur in the higher order repeat TCAGGCTGGG, which has also been detected in previous assemblies [58].
- The origin of this diversity is unclear, but the localization of the variants towards the inner telomere suggests they are older, more degenerate sequences compared to the more homogeneous sequence of the outer telomere.
- These individuals were brothers of the individuals from the DH4 line used for previous honeybee genome assembly builds [33, 34].
- The library was sequenced on 29 SMRT cells of the RSII instrument using the P6-C4 chemistry, which generated 10.2 Gb of filtered data.
- Full details of the pipeline are presented below and summarized in Additional file 6: Figure S1.
- PBJelly closed 87 (67%) gaps within scaffolds due to joins made by ARCS+LINKS and 16 (48%) of the gaps that were introduced between adjacent scaffolds on the basis of proximity according to the genetic map.
- All of the conflicts could be traced back to the original FALCON assembly and were confirmed to be chimeric.
- Therefore, we chose to resolve these conflicts in favor of the BioNano optical maps.
- This version of the assembly was designated Amel_HAv2.
- The physical positions and order of the markers along and between contigs were compared to their expected order in the linkage map.
- We compared the chromosome-anchored and unplaced sequences of the published reference assembly (Amel_4.5;.
- In order to locate distal telomeres, we estimated the density of the short telomeric repeat motif TTAGG/.
- order and orientation of the canonical set of coding sequences, rRNAs and tRNAs along the chromosome (NCBI accession NC .
- Contigs and chromosomes in Amel_HAv3.
- Summary of the congruence and conflict observed between the hybrid assembly (Amel_HAv3) and the genetic map markers from AmelMap3 [38].
- A map of the mitochondrial sequence in the hybrid assembly (Amel_HAv3).
- A) Summary statistics are presented in the center of the circularized sequence, followed by a 100 bp sliding-window (20 bp steps) bar-plot of GC-content relative to the mitochondrial average (15.
- The order and orientation of the coding genes (pink), rRNAs (green), tRNAs (blue) are illustrated as arrows.
- B) Alignments between Amel_HAv3 and Amel_4.5 illustrate base-level coordinates and composition of the structural variants highlighted in A.
- The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data or in writing the manuscript.
- Long-read sequence assembly of the gorilla genome.
- The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum.
- High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development.
- Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome.
- Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus).
- Genetic and genomic analyses of the division of labour in insect societies.
- A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.
- Whole-genome scan in thelytokous-laying workers of the Cape honeybee (Apis mellifera capensis): central fusion, reduced recombination rates and centromere mapping using half-tetrad analysis.
- A microsatellite-based linkage map of the honeybee, Apis mellifera L.
- Insights into social insects from the genome of the honeybee Apis mellifera.
- A third-generation microsatellite-based linkage map of the honey bee, Apis mellifera, and its comparison with the sequence-based physical map.
- The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization.
- Causes and consequences of crossing-over evidenced via a high-resolution recombinational landscape of the honey bee.
- The Release 6 reference sequence of the Drosophila melanogaster genome.
- A new standard for crustacean genomes: the highly contiguous, annotated genome assembly of the clam shrimp Eulimnadia texana reveals HOX gene order and identifies the sex chromosome.
- On the Origin of the Eukaryotic Chromosome: The Role of Noncanonical DNA Structures in Telomere Evolution
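
The telomere figures quoted in the summary above (about 8 TTAGG/CCTAA motifs per 10 kbp, or ~0.4% of sequence, in the genomic background, against an average of 1043 motifs per 10 kbp, or 52%, in the outermost windows) follow from counting a 5 bp motif on both strands in fixed windows. The sketch below illustrates that arithmetic only. It is not the authors' pipeline: the function names `motif_density` and `motif_fraction`, the non-overlapping window layout, and the plain string input are assumptions made for illustration.

```python
def motif_fraction(n_motifs, motif_len=5, window=10_000):
    """Fraction of a window occupied by n_motifs copies of a motif."""
    return n_motifs * motif_len / window


def motif_density(seq, motif="TTAGG", window=10_000):
    """Count a motif and its reverse complement in non-overlapping windows.

    Returns a list of (window_start, motif_count, covered_fraction).
    Hypothetical helper: motifs that straddle a window boundary are not
    counted, which is negligible for 5 bp motifs in 10 kbp windows.
    """
    complement = str.maketrans("ACGT", "TGCA")
    rev_comp = motif.translate(complement)[::-1]  # TTAGG -> CCTAA
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        count = chunk.count(motif)
        if rev_comp != motif:
            count += chunk.count(rev_comp)
        results.append((start, count, count * len(motif) / len(chunk)))
    return results


if __name__ == "__main__":
    # Synthetic example: one window of pure telomeric repeat, one without.
    toy = "TTAGG" * 2000 + "A" * 10_000
    for start, count, frac in motif_density(toy):
        print(start, count, f"{frac:.1%}")
    # Check of the figures quoted in the text.
    print(f"{motif_fraction(8):.2%}", f"{motif_fraction(1043):.1%}")
```

With these definitions, 8 motifs per 10 kbp cover 40 bp (0.4%) and 1043 motifs cover 5215 bp (about 52%), matching the percentages given in the text.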
