- Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome.
- CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high quality CanFam_GSD assembly.
- The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds.
- Full list of author information is available at the end of the article.
- Basenjis are an ancient breed that sits at the base of the currently accepted dog phylogeny [10].
- One explanation for the unusual vocalisation of the Basenji is that the larynx is flattened [12].
- The shape of the dingo and NGSD larynx is not reported.
- The basal position of the Basenji makes it useful as a general reference for variant analysis, but the generation of clade-specific genomes is likely to be important for canine nutrition and disease studies.
- Following a final round of Pilon [21] BGIseq-polishing, scaffolds were mapped onto CanFam3.1 [13] using PAFScaff v .
- Seven rounds of iterative Diploidocus tidying of the remaining sequences removed 277 (832 kb) as low coverage/quality and 481 (1.58 Mb) as probable haplotigs, retaining 483 core scaffolds and 165 probable repeat-heavy sequences [14] as China v1.0 (Fig.
- CanFam3.1 using PAFScaff v0.4.0.
- His sire is an American bred dog while his dam was imported from the Haut-Ule district of the DRC Congo N E, in 2006.
- mapped onto CanFam3.1 as China v1.1.
- It was observed that the mitochondrial chromosome was missing and China v1.1 Chromosome 29 contained a 33.2 kb region consisting of almost two complete copies of the mitochondrial genome that were not found in other dog genome assemblies.
- of the assembly and show a high level of synteny with CanFam3.1 and CanFam_GSD (Fig.
- The completeness and accuracy of the genome as measured by BUSCO v3 [33] (laurasiatherian, n = 6253) is also superior to CanFam3.1 and approaches that of CanFam_GSD (92.9% Complete, 3.75% Fragmented, 3.34% Missing).
- Table 1 Genome assembly and annotation statistics for Basenji assemblies vs CanFam3.1 and CanFam_GSD.
- CanFam_Bas (China), Wags, CanFam3.1, CanFam_GSD.
- 3D-F, Additional File 1), confirming the robustness of determined DNA methylation profile of the blood DNA.
- However, the Wags assembly of the X chromosome is smaller in size (59 Mb vs 125 Mb) and shows multiple rearrangements as a result of lower sequence coverage on the sex chromosomes (~21x).
- In addition, the Wags assembly includes 3.6 Mb of the Basenji dog Y for future comparative studies of this unique chromosome.
- Both proteomes compare favourably with CanFam3.1 in terms of completeness (Table 1).
- Approximately 90% of the Quest For Orthologues (QFO) reference dog proteome [39] is covered by each GeMoMa proteome, confirming comparable levels of completeness (Supplementary Table 3, Additional File 2).
- When the CanFam_Bas GeMoMa proteome was compared to Wags, CanFam3.1 and CanFam_GSD, over 91%.
- To investigate this further, the Wags, CanFam3.1 and CanFam_GSD genomes were mapped onto CanFam_Bas and the coverage for each gene calculated with Diploidocus v0.10.0.
- Of the 27,129 predicted genes are found at least 50% covered in all four dogs, whilst only are completely unique to CanFam_Bas.
- A considerably greater proportion of the missing genes in Wags (64.2% versus 11.4% in CanFam3.1 and 15.8% in CanFam_GSD) were on the X chromosome.
- Only 7 of the 302 missing Wags genes (2.3%) had no long read coverage, whilst of genes missing in CanFam_GSD were confirmed by an absence of mapped long reads.
- Two copies of the Amy2B gene were identified in a tandem repeat on Chromosome 6 of the CanFam_Bas assembly.
- Wags assembly has a single copy of the Amy2B region, which includes 90% of the Amy2B coding sequence (data not shown).
- Single-copy depth analysis of Wags estimates 4.97 (90% at 253.8x) and at 253.5x) copies of the AMY2B coding sequence and tandem repeat unit, respectively.
- During the assembly of the female Basenji genome (China v1.0), the mitochondrial genome was erroneously assembled into a NUMT fragment on chromosome 29.
- 5A, Additional File 1), except for low coverage in a region of the D-loop as has been previously reported in primates [41].
- An additional 26 NUMTs are partially covered in CanFam3.1 and 9 are entirely absent.
- Whilst this could represent a breed difference, 19 of the 35 additional incomplete NUMTs in CanFam are also incomplete in Wags, whilst Wags has a further ten incomplete NUMTs that are present in CanFam3.1 (Supplementary Table 5, Additional File 2).
- To discover unique large-scale structural differences in assembled genomes of the three breeds (Basenji, German Shepherd and Boxer) we performed pairwise alignments of CanFam_Bas, CanFam3.1 and CanFam_.
- There was, however, a large inversion in CanFam3.1 that was not present in CanFam_Bas or CanFam_GSD (Fig.
- We next overlapped the CanFam SV calls relative to CanFam3.1 and found 18,063 long read deletion calls overlapped between Basenji and GSD.
- For Basenji this represented 70.00% of the total 25,260 Basenji deletions while for GSD this represented 73.25% of the total 24,138 GSD deletions (Fig.
- of the total 15,434 Basenji insertions and 36.46% of the total 14,111 GSD insertions (Fig.
- (Supplementary Table 8, Additional File 2) were mapped on to three reference genomes Basenji (CanFam_Bas), Boxer (CanFam3.1) and GSD (CanFam_GSD).
- To investigate this result further and test for interactions we focused upon breeds within each of the monophyletic clades close to or associated with the three reference genomes.
- High-quality set of consensus structural variant (SV) calls generated from the intersection of the ONT and PacBio SV calls for each breed versus reference comparison, limited to SVs >.
- Overall, the CanFam_Bas and CanFam_GSD performed equally well while the relative mapping was lowest for CanFam3.1 (Fig.
- In this case CanFam_Bas detected a higher number of changes than did either CanFam3.1 or CanFam_GSD (Fig.
- In combination these data attest to the quality of the CanFam_Bas assembly.
- In total, 64.2% of the missing genes are on the X chromosome, compared to under 16% in the other two individuals.
- An exhaustive search of the genome detected a 33.2 kb region consisting of almost two complete copies of the mitochondrial genome on Chromosome 29 that was not present in the other dog assemblies analysed.
- Fig. 6 Comparative short read mapping and single nucleotide variant calling for 58 dog breeds versus three reference genomes: CanFam_Bas (Bas), CanFam3.1 (3.1) and CanFam_GSD (GSD).
- The Boxer is a member of the European Mastiff clade and the GSD is a member of the New World clade.
- Previous studies have shown that Basenjis may have 4–18 copies [6], placing these estimates at the lower end of the range.
- Additional work is needed to establish the source of the differences between ddPCR and read depth estimates.
- The high variation in Amy2B copy number suggests at least three possible evolutionary histories of the gene in Basenjis.
- Second, the ancestral founding population of the modern Basenji may have been polymorphic for Amy2B.
- This analysis identified over 70,000 SVs in CanFam_Bas relative to CanFam3.1 and over 64,000 SVs in GSD relative to CanFam3.1 (Supplementary Table 7, Additional File 2).
- Further, each consensus set contains several hundred SVs overlapping annotated exons, highlighting the importance of selecting an appropriate reference genome for analysis of specific genomic regions.
- Next, we examined the overlap of the consensus calls for SVs over 100 bp of GSD and Basenji relative to CanFam3.1.
- The basal position of the Basenji makes it useful as a reference for variant analysis as there are clear biases affecting related breeds seen for both the GSD and Boxer reference genomes.
- CanFam_Bas offers improved genome contiguity relative to CanFam3.1 and can serve as a representative basal breed in future canid studies.
- An overview of the China assembly workflow is given in Supplementary Fig.
- Any scaffolds with median coverage less than three (e.g., less than 50% of the scaffold covered by at least three reads) were filtered out as low-coverage scaffolds.
- Correction of mitochondrial insertion into chromosome 29: NUMT analysis identified a 33.2 kb region consisting of almost two complete copies of the mitochondrial genome, not present in other dog genome assemblies.
- ONT reads that mapped onto both flanking regions of the 33.2 kb putative NUMT were extracted and reassembled with Flye (v .
- Reads mapping to at least 5 kb of the assembled region including some immediate flanking sequence were extracted (66 reads, 1.50 Mb) and polished with one round of Racon (v m 8 -x.
- The polished NUMT region was mapped on to the Chromosome 29 scaffold with GABLAM (v blast+ v2.9.0 [67] megablast) and stretches of 100% sequence identity identified each side of the NUMT.
- To assemble the mitochondrion, ONT reads were mapped onto a construct of three tandem copies of the CanFam3.1 mtDNA with minimap2 (v -ax map-ont --secondary=no).
- Reads with hits were extracted using SAMTools (v1.9) [66] fasta and mapped onto a double-copy CanFam3.1 mtDNA with GABLAM (v blast+ v2.9.0 [67] megablast).
- coverage of the mtDNA.
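The low-coverage scaffold filter described above (median coverage below three, i.e. fewer than 50% of positions covered by at least three reads) can be illustrated with a minimal sketch. This is not the Diploidocus implementation: the function names, the per-base depth lists and the treatment of empty scaffolds are all assumptions made for illustration. `median_high` is used so that the median test matches the "fewer than 50% of positions at depth three or more" wording exactly, including for even-length scaffolds.

```python
from statistics import median_high


def is_low_coverage(depths, min_depth=3):
    """Return True when fewer than half of the positions reach min_depth.

    `depths` is a hypothetical list of per-base read depths for one scaffold.
    """
    if not depths:
        # Assumption: a scaffold with no depth data is treated as low coverage.
        return True
    # median_high < min_depth  <=>  more than half of positions are below min_depth
    return median_high(depths) < min_depth


def split_scaffolds(depth_by_scaffold, min_depth=3):
    """Partition scaffold names into (kept, dropped) lists."""
    kept, dropped = [], []
    for name, depths in depth_by_scaffold.items():
        if is_low_coverage(depths, min_depth):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Illustrative depths only; these values are not taken from the assembly.
example = {
    "scaffold_a": [10, 12, 9, 11],
    "scaffold_b": [0, 1, 2, 5, 0],
    "scaffold_c": [2, 2, 3, 3],
}
kept, dropped = split_scaffolds(example)
print("kept:", kept)
print("dropped:", dropped)
```

In practice the depths would come from a read-mapping depth report rather than in-memory lists; only the threshold logic is shown here.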
- The polished mtDNA assembly was mapped onto CanFam3.1 mtDNA with GABLAM (v blast+ v2.9.0 [67] megablast) and circularised by extracting a region from the centre of the assembly corresponding to a single full-length copy with the same start and end positions.
- At each stage of the assembly, summary statistics were calculated with SLiMSuite SeqList (v quality was assessed with Merqury (v20200318) (Meryl v20200313, bedtools v SAMTools v1.9 [66], java v8u45, igv v2.8.0) and completeness assessed with BUSCO (v3.0.2b) [33] (BLAST+ v HMMer v Augustus v3.3.2, EMBOSS v6.6.0, laurasiatherian lineage (n = 6253)).
- The mappability of the MethylC-seq library was 86%.
- Sire is AM Ch C-Quests Soul Driver, HM827502/02, and his dam is Avongara Luka, HP345312/01, a native female dog imported from the Haut-Ule district of the DRC Congo N E, in 2006.
- Additional polishing of the assembly for residual indels was done by aligning 32x coverage of Illumina data and the Pilon algorithm [21].
- The variance and standard deviation of the estimate were calculated using X_reg for all single copy BUSCO genes.
- The copy number of the alpha-amylase gene Amy2B was calculated using Diploidocus (v0.10.0) (runmode = regcnv) [27] using a modification of the single locus copy number estimation (above) to account for multiple copies of the gene in the assembly.
- First, the AMY2B protein sequence from CanFam3.1 (UniprotKB: J9PAL7) was used as a query and searched against the genome with Exonerate (v to identify assembled copies of the Amy2B gene.
- Estimated N_reg values were converted into a number of copies by multiplying by the proportion of the query found covered by that region.
- To further investigate the robustness of the method and improve the Amy2B copy number estimate in CanFam_Bas, analysis was repeated with ONT reads at least 5 kb in length and at least 10 kb in length.
- In addition, assembly coverage for each NUMT fragment was calculated for Wags, CanFam3.1 and CanFam_GSD.
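The conversion of region estimates into Amy2B copies described above (multiplying each estimate by the proportion of the query covered by that region) can be sketched as follows. The definition of the region estimate as region read depth divided by single-copy read depth is an assumption here, since the single-locus method it refers to is not reproduced in this text; summing over assembled regions is likewise an assumed reading of "to account for multiple copies of the gene in the assembly". Function names and input values are hypothetical and are not Diploidocus code or results from the study.

```python
def region_copy_number(region_depth, single_copy_depth, query_coverage):
    """Estimate copies contributed by one assembled region.

    Assumption: the raw region estimate is region_depth / single_copy_depth.
    It is then scaled by the proportion of the query (0..1) covered by the region.
    """
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    if not 0.0 <= query_coverage <= 1.0:
        raise ValueError("query_coverage must be a proportion between 0 and 1")
    n_reg = region_depth / single_copy_depth
    return n_reg * query_coverage


def total_copy_number(regions, single_copy_depth):
    """Sum the estimates over every assembled region containing the gene.

    `regions` is a list of (region_depth, query_coverage) pairs.
    """
    return sum(
        region_copy_number(depth, single_copy_depth, coverage)
        for depth, coverage in regions
    )


# Illustrative values only: one region at 200x covering 90% of the query,
# one at 100x covering all of it, with an assumed single-copy depth of 50x.
regions = [(200.0, 0.9), (100.0, 1.0)]
print(round(total_copy_number(regions, single_copy_depth=50.0), 2))
```

Scaling by query coverage keeps a partial gene copy from counting as a full one, which is why a region holding 90% of the coding sequence contributes fewer copies than its depth ratio alone would suggest.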
- To make a fair comparison of the influence of genome quality and completeness on annotation, CanFam3.1 was annotated with the same pipeline.
- Annotation completeness was estimated using BUSCO v3 [33] (laurasiatherian, n = 6253, proteins mode), run on a reduced annotation consisting of the longest protein per gene.
- Bas gene was calculated for Wags, CanFam3.1 and CanFam_GSD.
- Reads were mapped against China v1.0, CanFam3.1 and CanFam_GSD.
- SNVs and small indels were called from the Illumina reads of the 58 representative breeds against three reference genomes (Basenji China v1.0, CanFam3.1, and CanFam_GSD).
- RAZ selected the female China, obtained samples and thereby provided a substantial contribution to the acquisition of the data.
- GSJ obtained the ethics approval for Wags, selected the individual, obtained the sample and thereby provided a substantial contribution to the acquisition of the data from the male Basenji.
- All authors agree to be personally accountable for their own contributions and ensure that questions related to the accuracy or integrity of any part of the work are appropriately investigated, resolved, and the resolution documented in the literature.
- For China, all experimentation was performed under the approval of the University of New South Wales Ethics Committee (ACEC ID: 18/18B) and with the owner's written consent.
- Demographic history, selection and functional diversity of the canine genome.
- Genomic regions under selection in the feralization of the dingoes.
- Interspecific gene flow shaped the evolution of the genus Canis.
- The larynx of the basenji dog.
- Genome sequence, comparative analysis and haplotype structure of the domestic dog.
- Canfam_GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping and Hi-C.
- A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.
- Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat.
- De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.