
Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome



- CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high quality CanFam_GSD assembly.
- The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds.
- Full list of author information is available at the end of the article.
- Basenjis are an ancient breed that sits at the base of the currently accepted dog phylogeny [10].
- One explanation for the unusual vocalisation of the Basenji is that the larynx is flattened [12].
- The shape of the dingo and NGSD larynx is not reported.
- The basal position of the Basenji makes it useful as a general reference for variant analysis, but the generation of clade-specific genomes is likely to be important for canine nutrition and disease studies.
- Following a final round of Pilon [21] BGIseq-polishing, scaffolds were mapped onto the CanFam3.1 [13] using PAFScaff v .
- Seven rounds of iterative Diploidocus tidying of the remaining sequences removed 277 (832 kb) as low coverage/quality and 481 (1.58 Mb) as probable haplotigs, retaining 483 core scaffolds and 165 probable repeat-heavy sequences [14] as China v1.0 (Fig.
- CanFam3.1 using PAFScaff v0.4.0.
- His sire is an American bred dog while his dam was imported from the Haut-Ule district of the DRC Congo N E, in 2006.
- mapped onto CanFam3.1 as China v1.1.
- It was observed that the mitochondrial chromosome was missing and China v1.1 Chromosome 29 contained a 33.2 kb region consisting of almost two complete copies of the mitochondrial genome that were not found in other dog genome assemblies.
- of the assembly and show a high level of synteny with CanFam3.1 and CanFam_GSD (Fig.
- The completeness and accuracy of the genome as measured by BUSCO v3 [33] (laurasiatherian, n = 6253) is also superior to CanFam3.1 and approaches that of CanFam_GSD (92.9% Complete, 3.75% Fragmented, 3.34% Missing).
- Table 1 Genome assembly and annotation statistics for Basenji assemblies vs CanFam3.1 and CanFam_GSD.
- CanFam_Bas (China) Wags CanFam3.1 CanFam_GSD.
- 3D-F, Additional File 1), confirming the robustness of determined DNA methylation profile of the blood DNA.
- However, the Wags assembly of the X chromosome is smaller in size (59 Mb vs 125 Mb) and shows multiple rearrangements as a result of lower sequence coverage on the sex chromosomes (~21x).
- In addition, the Wags assembly includes 3.6 Mb of the Basenji dog Y for future comparative studies of this unique chromosome.
- Both proteomes compare favourably with CanFam3.1 in terms of completeness (Table 1).
- Approximately 90% of the Quest For Orthologues (QFO) reference dog proteome [39] is covered by each GeMoMa proteome, confirming comparable levels of completeness (Supplementary Table 3, Additional File 2).
- When the CanFam_Bas GeMoMa proteome was compared to Wags, CanFam3.1 and CanFam_GSD, over 91%.
- To investigate this further, the Wags, CanFam3.1 and CanFam_GSD genomes were mapped onto CanFam_Bas and the coverage for each gene calculated with Diploidocus v0.10.0.
- Of the 27,129 predicted genes are found at least 50% covered in all four dogs, whilst only are completely unique to CanFam_Bas.
- A considerably greater proportion of the missing genes in Wags (64.2% versus 11.4% in CanFam3.1 and 15.8% in CanFam_GSD) were on the X chromosome.
- Only 7 of the 302 missing Wags genes (2.3%) had no long read coverage, whilst of genes missing in CanFam_GSD were confirmed by an absence of mapped long reads.
- Two copies of the Amy2B gene were identified in a tandem repeat on Chromosome 6 of the CanFam_Bas assembly.
- Wags assembly has a single copy of the Amy2B region, which includes 90% of the Amy2B coding sequence (data not shown).
- Single-copy depth analysis of Wags estimates 4.97 (90% at 253.8x) and at 253.5x) copies of the AMY2B coding sequence and tandem repeat unit, respectively.
- During the assembly of the female Basenji genome (China v1.0), the mitochondrial genome was erroneously assembled into a NUMT fragment on chromosome 29.
- 5A, Additional File 1), except for low coverage in a region of the D-loop as has been previously reported in primates [41].
- An additional 26 NUMTs are partially covered in CanFam3.1 and 9 are entirely absent.
- Whilst this could represent a breed difference, 19 of the 35 additional incomplete NUMTs in CanFam are also incomplete in Wags, whilst Wags has a further ten incomplete NUMTs that are present in CanFam3.1 (Supplementary Table 5, Additional File 2).
- To discover unique large-scale structural differences in assembled genomes of the three breeds (Basenji, German Shepherd and Boxer), we performed pairwise alignments of CanFam_Bas, CanFam3.1 and CanFam_.
- There was, however, a large inversion in CanFam3.1 that was not present in CanFam_Bas or CanFam_GSD (Fig.
- We next overlapped the CanFam SV calls relative to CanFam3.1 and found 18,063 long read deletion calls overlapped between Basenji and GSD.
- For Basenji this represented 70.00% of the total 25,260 Basenji deletions while for GSD this represented 73.25% of the total 24,138 GSD deletions (Fig.
- of the total 15,434 Basenji insertions and 36.46% of the total 14,111 GSD insertions (Fig.
- (Supplementary Table 8, Additional File 2) were mapped on to three reference genomes Basenji (CanFam_BAS), Boxer (CanFam3.1) and GSD (CanFam_GSD).
- To investigate this result further and test for interactions we focused upon breeds within each of the monophyletic clades close to or associated with the three reference genomes.
- High-quality set of consensus structural variant (SV) calls generated from the intersection of the ONT and PacBio SV calls for each breed versus reference comparison, limited to SVs >.
- Overall, the CanFam_Bas and CanFam_GSD performed equally well while the relative mapping was lowest for CanFam3.1 (Fig.
- In this case CanFam_BAS detected a higher number of changes than did either CanFam3.1 or CanFam_GSD (Fig.
- In combination these data attest to the quality of the CanFam_Bas assembly.
- In total, 64.2% of the missing genes are on the X chromosome, compared to under 16% in the other two individuals.
- An exhaustive search of the genome detected a 33.2 kb region consisting of almost two complete copies of the mitochondrial genome on Chromosome 29 that was not present in the other dog assemblies analysed.
- Fig. 6: Comparative short read mapping and single nucleotide variant calling for 58 dog breeds versus three reference genomes: CanFam_Bas (Bas), CanFam3.1 (3.1) and CanFam_GSD (GSD).
- The Boxer is a member of the European Mastiff clade and the GSD is a member of the New world clade.
- Previous studies have shown that Basenjis may have 4–18 copies [6], placing these estimates at the lower end of the range.
- Additional work is needed to establish the source of the differences between ddPCR and read depth estimates.
- The high variation in Amy2B copy number suggests at least three possible evolutionary histories of the gene in Basenjis.
- Second, the ancestral founding population of the modern Basenji may have been polymorphic for Amy2B.
- This analysis identified over 70,000 SVs in CanFam_Bas relative to CanFam3.1 and over 64,000 SVs in GSD relative to CanFam3.1 (Supplementary Table 7, Additional File 2).
- Further, each consensus set contains several hundred SVs overlapping annotated exons, highlighting the importance of selecting an appropriate reference genome for analysis of specific genomic regions.
- Next, we examined the overlap of the consensus calls for SVs over 100 bp of GSD and Basenji relative to CanFam3.1.
- The basal position of the Basenji makes it useful as a reference for variant analysis as there are clear biases affecting related breeds seen for both the GSD and Boxer reference genomes.
- CanFam_Bas offers improved genome contiguity relative to CanFam3.1 and can serve as a representative basal breed in future canid studies.
- An overview of the China assembly workflow is given in Supplementary Fig.
- Any scaffolds with median coverage less than three (e.g., less than 50% of the scaffold covered by at least three reads) were filtered out as low-coverage scaffolds.
- Correction of mitochondrial insertion into chromosome 29: NUMT analysis identified a 33.2 kb region consisting of almost two complete copies of the mitochondrial genome, not present in other dog genome assemblies.
- ONT reads that mapped onto both flanking regions of the 33.2 kb putative NUMT were extracted and reassembled with Flye (v .
- Reads mapping to at least 5 kb of the assembled region including some immediate flanking sequence were extracted (66 reads, 1.50 Mb) and polished with one round of Racon (v m 8 -x.
- The polished NUMT region was mapped on to the Chromosome 29 scaffold with GABLAM (v blast+ v2.9.0 [67] megablast) and stretches of 100% sequence identity identified each side of the NUMT.
- To assemble the mitochondrion, ONT reads were mapped onto a construct of three tandem copies of the CanFam3.1 mtDNA with minimap2 (v -ax map-ont --secondary=no).
- Reads with hits were extracted using SAMTools (v1.9) [66] fasta and mapped onto a double-copy CanFam3.1 mtDNA with GABLAM (v blast+ v2.9.0 [67] megablast).
- coverage of the mtDNA.
- The polished mtDNA assembly was mapped onto CanFam3.1 mtDNA with GABLAM (v blast+ v2.9.0 [67] megablast) and circularised by extracting a region from the centre of the assembly corresponding to a single full-length copy with the same start and end positions.
- At each stage of the assembly, summary statistics were calculated with SLiMSuite SeqList (v quality was assessed with Merqury (v20200318) (Meryl v20200313, bedtools v SAMTools v1.9 [66], java v8u45, igv v2.8.0) and completeness assessed with BUSCO (v3.0.2b) [33] (BLAST+ v HMMer v Augustus v3.3.2, EMBOSS v6.6.0, laurasiatherian lineage (n = 6253)).
- The mappability of the MethylC-seq library was 86%.
- Sire is AM Ch C-Quests Soul Driver, HM827502/02, and his dam is Avongara Luka, HP345312/01, a native female dog imported from the Haut-Ule district of the DRC Congo N E, in 2006.
- Additional polishing of the assembly for residual indels was done by aligning 32x coverage of Illumina data and the Pilon algorithm [21].
- The variance and standard deviation of the estimate were calculated using X_reg for all single-copy BUSCO genes.
- The copy number of the pancreatic alpha-amylase gene Amy2B was calculated using Diploidocus (v0.10.0) (runmode = regcnv) [27] using a modification of the single locus copy number estimation (above) to account for multiple copies of the gene in the assembly.
- First, the AMY2B protein sequence from CanFam3.1 (UniprotKB: J9PAL7) was used as a query and searched against the genome with Exonerate (v to identify assembled copies of the Amy2B gene.
- Estimated N_reg values were converted into a number of copies by multiplying by the proportion of the query found covered by that region.
- To further investigate the robustness of the method and improve the Amy2B copy number estimate in CanFam_Bas, analysis was repeated with ONT reads at least 5 kb in length and at least 10 kb in length.
- In addition, assembly coverage for each NUMT fragment was calculated for Wags, CanFam3.1 and CanFam_GSD.
- To make a fair comparison of the influence of genome quality and completeness on annotation, CanFam3.1 was annotated with the same pipeline.
- Annotation completeness was estimated using BUSCO v3 [33] (laurasiatherian, n = 6253, proteins mode), run on a reduced annotation consisting of the longest protein per gene.
- Bas gene was calculated for Wags, CanFam3.1 and CanFam_GSD.
- Reads were mapped against China v1.0, CanFam3.1 and CanFam_GSD.
- SNVs and small indels were called from the Illumina reads of the 58 representative breeds against three reference genomes (Basenji China v1.0, CanFam3.1, and CanFam_GSD).
- RAZ selected the female China, obtained samples and thereby provided a substantial contribution to the acquisition of the data.
- GSJ obtained the ethics approval for Wags, selected the individual, obtained the sample and thereby provided a substantial contribution to the acquisition of the data from the male Basenji.
- All authors agree to be personally accountable for their own contributions and ensure that questions related to the accuracy or integrity of any part of the work are appropriately investigated, resolved, and the resolution documented in the literature.
- For China, all experimentation was performed under the approval of the University of New South Wales Ethics Committee (ACEC ID: 18/18B) and with the owner's written consent.
- Demographic history, selection and functional diversity of the canine genome.
- Genomic regions under selection in the feralization of the dingoes.
- Interspecific gene flow shaped the evolution of the genus Canis.
- The larynx of the basenji dog.
- Genome sequence, comparative analysis and haplotype structure of the domestic dog.
- Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C.
- A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.
- Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat.
- De novo assembly of the Aedes aegypti genome using hi-C yields chromosome-length scaffolds.
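
The low-coverage scaffold filter mentioned in the summary (scaffolds with median coverage below three were removed) can be illustrated with a minimal Python sketch. This is not the Diploidocus implementation: the function name `split_low_coverage`, the input layout (scaffold name mapped to a list of per-base read depths) and the example values are all hypothetical, and treating a scaffold with no depth values as low coverage is an assumption made here.

```python
from statistics import median


def split_low_coverage(depths_by_scaffold, min_median_depth=3):
    """Split scaffolds into (kept, low_coverage) by median per-base read depth.

    depths_by_scaffold is a hypothetical mapping of scaffold name to a list
    of per-base read depths (one integer per position).
    """
    kept, low_coverage = {}, {}
    for name, depths in depths_by_scaffold.items():
        # Assumption: a scaffold with no depth values counts as low coverage.
        if depths and median(depths) >= min_median_depth:
            kept[name] = depths
        else:
            low_coverage[name] = depths
    return kept, low_coverage


# Made-up depths, for illustration only.
example = {
    "scaffold_a": [5, 4, 3, 0, 0],  # median 3, so it is kept
    "scaffold_b": [0, 1, 2, 9, 9],  # median 2, so it is filtered out
}
kept, low_coverage = split_low_coverage(example)
print(sorted(kept), sorted(low_coverage))
```

Using the median rather than the mean means a few very deep positions cannot rescue a scaffold that is mostly uncovered, which matches the "less than 50% of the scaffold covered by at least three reads" reading given in the text.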

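
The Amy2B conversion step described in the summary (an estimated per-region copy number multiplied by the proportion of the query covered by that region) can be sketched as below. The function names, the tuple layout and the example numbers are hypothetical, and summing over several assembled regions is an assumption about how multiple copies in the assembly might be combined, not a description of what Diploidocus does internally.

```python
import math


def copies_from_region(n_reg, query_fraction_covered):
    """Convert a per-region copy number estimate into query-equivalent copies.

    n_reg: estimated copy number of the assembled region.
    query_fraction_covered: proportion (0 to 1) of the query sequence that
    the region covers.
    """
    if not 0.0 <= query_fraction_covered <= 1.0:
        raise ValueError("query_fraction_covered must be between 0 and 1")
    return n_reg * query_fraction_covered


def total_copies(regions):
    """Sum query-equivalent copies over (n_reg, fraction) pairs (assumption)."""
    return sum(copies_from_region(n, frac) for n, frac in regions)


# Made-up values, for illustration only: one region covering the full query
# and one covering half of it.
example_regions = [(2.0, 1.0), (3.0, 0.5)]
print(total_copies(example_regions))
```

Scaling by the covered proportion keeps a region that contains only part of the coding sequence from being counted as a full copy.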