Telomere-to-telomere assembly of the genome of an individual Oikopleura dioica from Okinawa using Nanopore-based sequencing

- Results: Here, we present a chromosome-scale genome assembly (OKI2018_I69) of the Okinawan O.
- 99% of the assembly is contained within five megabase-scale scaffolds.
- We found telomeres on both ends of the two largest scaffolds, which represent assemblies of two fully contiguous autosomal chromosomes.
- Each of the other three large scaffolds has telomeres at one end only, and we propose that they correspond to sex chromosomes split into a pseudo-autosomal region and X-specific or Y-specific regions.
- At the sequence level, multiple genomic features such as GC content and repetitive elements are distributed differently along the short and long arms of the same chromosome.
- As a member of the tunicates, a sister taxonomic group to vertebrates, O.
- O. dioica's genome size is 65–70 Mbp [8, 9], making it one of the smallest among all sequenced animals.
- Here, we present a chromosome-length assembly of the Okinawan O.
- Based on k-mer counting of the Illumina reads, the genome was estimated to contain ~ 50 Mbp (Fig.
- We corrected sequencing errors and local misassemblies of the draft contigs with Nanopore reads using.
- More than 99% of the Hi-C reads could be mapped to the contig assembly.
- The resulting assembly consisted of 8 megabase-scale scaffolds containing 99% of the total sequence (Fig.
- One of the small scaffolds is a draft assembly of the mitochondrial genome that we discuss below.
- Most of the other smaller scaffolds are highly repetitive and might represent unplaced fragments of centromeric or telomeric regions.
- We annotated telomeres by searching for the TTAGGG repeat sequence and found that most of the megabase-scale scaffolds have single telomeric regions; therefore, we reasoned that they represent chromosome arms.
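The telomere annotation step above can be illustrated with a minimal Python sketch. It only assumes the TTAGGG repeat unit named in the text; the function name `telomeric_ends`, the minimum run of three units, and the 1000 bp end window are hypothetical choices for illustration, not the parameters or tool used by the authors.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # repeat unit named in the text


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def telomeric_ends(seq, min_units=3, window=1000):
    """Report which ends of a scaffold carry a run of telomeric repeats.

    min_units and window are illustrative assumptions, not values from the paper.
    """
    fwd = TELOMERE_UNIT
    rev = revcomp(fwd)  # CCCTAA, the repeat as read on the opposite strand
    pattern = re.compile(
        f"(?:{fwd}){{{min_units},}}|(?:{rev}){{{min_units},}}"
    )
    seq = seq.upper()
    return {
        "start": bool(pattern.search(seq[:window])),
        "end": bool(pattern.search(seq[-window:])),
    }
```

Under these assumptions, a scaffold reporting a repeat run at only one end would be a candidate chromosome arm, while one reporting runs at both ends would be a candidate telomere-to-telomere chromosome.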
- Table 1) comprises telomere-to-telomere assemblies of the autosomal chromosomes 1 (chr 1) and 2 (chr 2).
- We assume that the sex-specific regions belong to the long arm of the PAR, as the long arm does not contain any telomeric repeats (Fig.
- Alignment of the Illumina polishing reads to the OKI2018_I69 assembly estimated an error rate of 1.3%, showing high sequence accuracy.
- 3c) shows bright, off-diagonal spots that suggest spatial clustering of the telomeres and centromeres both within the same and across different chromosomes [18].
- The two sex-specific regions have lower apparent contact frequencies compared with the rest of the assembly which is consistent with their.
- The chromosome arms themselves show few interactions between each other, even when they are part of the same chromosome.
- b Estimated total and repetitive genome size based on k-mer counting of the Illumina paired-end reads used for polishing the OKI2018_I69 assembly.
- c Pairwise genome alignment of the contig assemblies of I69 and I28 O.
- 3 OKI2018_I69 assembly of the Okinawan O.
- a Treemap comparison between the contig (left) and scaffold (right) assemblies of the O.
- Table 1 Comparison of the OKI2018_I69 assembly with the previously published O.
- 4 Chromosome-level features of the Okinawan O.
- It should be noted that differences in GC content affect the density of the GATC DpnII restriction enzyme recognition sites used for Hi-C library preparation.
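The relation between GC content and DpnII site density can be explored with a small windowed count. This is a sketch under stated assumptions: the function name `window_stats` and the 10 kbp default window are hypothetical and are not taken from the paper; only the GATC recognition site comes from the text.

```python
def window_stats(seq, window=10000):
    """Return (start, GC fraction, GATC site count) for consecutive windows.

    The window size is an illustrative assumption. Sites that span a window
    boundary are not counted in this simple version.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        sites = chunk.count("GATC")  # GATC cannot overlap itself
        rows.append((start, gc, sites))
    return rows
```

Comparing the second and third columns between windows from long and short arms would show whether regions with different GC content also differ in restriction site density.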
- Interspersed repeats make up 14.4% of the assembly (9.25 Mbp.
- Of the annotated elements, the most abundant type is the long terminal repeat (LTRs;.
- 4.6%) with Ty3/gypsy Oikopleura transposons (TORs) dominating 2.97 Mbp of the sequence.
- interspersed nuclear elements (SINEs) make up a smaller portion of the OKI2018_I69 sequence (<.
- Indeed, 44% of the predicted repeats in the Okinawan O.
- Annotation of the genome yielded 18,794 transcript isoforms distributed among 17,260 protein-coding genes.
- The rest of the genes are either lost from the Okinawan O.
- On the other hand, the higher number of genes might be artifacts of the OdB3 and OSKA2016 annotations.
- The completeness of the annotation compares to the genome: BUSCO recovered 75.3% complete and 4.8% fragmented metazoan genes (Fig.
- Indeed, we found a high frequency of the non-canonical (non-GT/AG) introns in the OKI2018_I69 (11.
- reported that 12% of the introns were non-canonical in the OdB3 [9].
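The notion of a non-canonical (non-GT/AG) intron can be made concrete with a short sketch that checks the donor and acceptor dinucleotides of each annotated intron. The function names and the 0-based, half-open `(start, end, strand)` coordinate convention are assumptions made for this illustration; the authors' actual procedure is not described in this section.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def intron_is_canonical(seq, start, end, strand="+"):
    """True if the intron begins with GT and ends with AG on its own strand.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    intron = seq[start:end].upper()
    if strand == "-":
        # Read the intron in its transcribed orientation
        intron = intron.translate(_COMPLEMENT)[::-1]
    return intron[:2] == "GT" and intron[-2:] == "AG"


def noncanonical_fraction(seq, introns):
    """Fraction of introns whose splice sites are not GT/AG."""
    introns = list(introns)
    if not introns:
        return 0.0
    non_canonical = sum(not intron_is_canonical(seq, *i) for i in introns)
    return non_canonical / len(introns)
```

Note that this treats every non-GT/AG pair (for example GC/AG) as non-canonical, following the definition given in the text.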
- However, closer examination is required to understand whether this is the case for the rest of the genes.
- The ribosomal DNA gene encoding the precursor of the 18S, 5.8S and 28S rRNAs occurs as long tandem repeats that form specific chromatin domains in the nucleolus.
- We identified 4 full tandem copies of the rDNA gene at the tip of the PAR’s short arm, separated by 8738 bp (median distance).
- the real number of the tandem rDNA copies could range between 20 (MiSeq) and 100 (Nanopore) copies.
- 5 Quality assessment of the OKI2018_I69 genome assembly.
- The synteny-based approach with OdB3’s linkage groups as a reference was only required to guide final pairing of chromosome arms into single scaffolds of chr 1, chr 2 and PAR, as we found that these scaffolds mostly align to one of the autosomal LGs or PAR.
- The repeat landscape and proportions of various repeat classes in the genome are indicated and color-coded according to the classes shown on the right side of the figure.
- The non-repetitive fraction of the genome is shown in black.
- Table 2 Comparison of the annotations of the three O.
- 7 Draft scaffold of the mitochondrial genome in the OKI2018_I69 assembly.
- a Predicted gene annotation of the draft mitochondrial genome sequence.
- b Self-similarity plot of the draft mitochondrial genome sequence.
- A tandem repeat can be seen, which complicates the complete assembly of the mitochondrial genome from whole-genome sequencing data.
- dioica populations that will elucidate the relation of the Okinawan populations to the North Atlantic and North Pacific ones.
- By comparison, in flies and mosquitoes, the degree of contact between two arms of the same chromosome appears to be reduced but nonetheless more frequent than between different chromosomes [18].
- As we prepared our Hi-C libraries from adult animals, where polyploidy is high [38], we cannot rule out that it could be a possible cause of the low inter-arm interactions in our contact matrix.
- The view of the OKI2018_I69 genome assembly can be found here:.
- We believe that the current version of the assembly will serve as an essential resource for a broad range of biological studies, including genome-wide comparative studies of Oikopleura and other species, and provides insights into chromosomal evolution.
- Q32850), and the integrity of the genomic DNA was validated using Agilent 4200 TapeStation (Agilent, 5067–.
- The quality of the reads before and after filtering was checked with FASTQC v0.11.5 [44].
- Read pairs that lacked one of the reads after the filtering were discarded in order to preserve paired-end information.
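The pair-preserving filter described above amounts to keeping only read IDs that survive in both mate files. The following is a minimal sketch, assuming each mate file has already been parsed into a dictionary keyed by read ID; the function name `keep_complete_pairs` and the data layout are hypothetical and do not reflect the specific tool used by the authors.

```python
def keep_complete_pairs(reads1, reads2):
    """Keep only pairs whose two mates both survived filtering.

    reads1 and reads2 map read ID -> record for the forward and reverse
    mates (an assumed layout for this sketch). Output order follows reads1.
    """
    shared = [read_id for read_id in reads1 if read_id in reads2]
    return [(reads1[read_id], reads2[read_id]) for read_id in shared]
```

Reads whose mate was removed by quality filtering are dropped, so every remaining record still has a partner for paired-end alignment.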
- Next, one round of the HaploMerger2 processing pipeline [48] was applied to eliminate redundancy in contigs and to merge haplotypes.
- 2 (“Draft chromosome scale assembly based on scaffolds of the reference genome sequence”) in Denoeud et al.
- 500×) than the rest of the assembly, and were therefore removed from the final assembly.
- The completeness and quality of the assembly were checked with QUAST v5.0.2 [52] and by searching for the set of 978 highly conserved metazoan genes (OrthoDB version 9.1) [23] using BUSCO v .
- Staged embryos were initiated by gently mixing 10 μl of the spawned male sperm with the awaiting eggs in FASW at 23 °C.
- Further processing for mRNA selection was performed with Oligo-d(T)25 Magnetic Beads (NEB, E7490) and the integrity of the RNA was validated once more with Agilent 4200 TapeStation (Agilent .
- Adapters for the creation of DNA libraries for the Illumina platform were added per manufacturer’s guidance (NEB, E7805) as were unique indexed oligonucleotides (NEB, E7600) to each of the three staged samples.
- The quality and completeness of the transcriptome assembly were verified with rnaQUAST v1.5.1 [61] and BUSCO.
- The quality of the predicted gene models was assessed with BUSCO.
- A draft annotation of the mitochondrial genome was obtained by submitting the corresponding scaffold (chr_.
- For visualization of the results, we converted the alignments to GFF3 format and collated the colinear “match_part” alignment blocks in “match”.
- For each of GC content, sequencing depth, repeat content, gene count, and DpnII restriction sites, the significance of the differences between long and short arms was assessed with Welch’s two-sided T test as well as a nonparametric Mann-Whitney test implemented in R (Suppl.
- The results of the two tests were largely in agreement, but groups were only indicated as significantly different if they both produced significance values below 0.05 ( p <.
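The two-test criterion above can be sketched as follows. The authors ran their tests in R; this Python version is only an illustration of the underlying quantities, namely Welch's t statistic with its Welch-Satterthwaite degrees of freedom, the Mann-Whitney U statistic with midranks for ties, and the rule that both p-values must fall below 0.05. All function names are hypothetical, and computing the p-values themselves is left to a statistics package.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample a, using midranks for ties."""
    values = sorted(list(a) + list(b))
    rank = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank[values[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        i = j
    rank_sum_a = sum(rank[v] for v in a)
    return rank_sum_a - len(a) * (len(a) + 1) / 2


def both_significant(p_welch, p_mann_whitney, alpha=0.05):
    """Flag a difference only when both tests give p below alpha."""
    return p_welch < alpha and p_mann_whitney < alpha
```

Requiring agreement between a parametric and a nonparametric test is a conservative choice: a feature is flagged only when the difference between arms is robust to the distributional assumptions of either test.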
- Contaminations found in smaller scaffold of the OKI2018_I69 assembly.
