- Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (>.
- Moreover, SV detection software was originally developed and tested using the human genome or the genome of the model plant Arabidopsis thaliana, so this software may not efficiently detect SVs in pear.
- Sequencing of the genome of Pyrus bretschneideri cv.
- bretschneideri) and is one of the primary pear cultivars grown in China.
- We have conducted a systematic analysis using ‘Yali’ genome NGS and long-read sequencing data to compare the performances of several commonly used SV-calling software packages using short reads, namely Pindel [25], BreakDancer [33], IMR/.
- Then, the reliability of selected ‘Yali’ pear SVs was verified using visualization tools.
- Sequencing and mapping of the ‘Yali’ genome.
- Short-read sequencing of the ‘Yali’ pear genome was conducted on the Illumina HiSeq™ 2000 platform using paired-end sequencing, at a sequencing depth of 60×.
- The quality of the raw resequencing data was determined using FastQC (https://.
- Of the clean reads, 97.15% were mapped to the ‘Dangshansuli’ pear genome using Burrows-Wheeler Aligner (BWA) software [37].
- Long-read sequencing data for ‘Yali’ were generated using the PacBio platform, at a sequencing depth of 30×.
- SVs between ‘Yali’ and the reference genome detected using different algorithms and sequencing data.
- Depending on the performances of the nine SV callers, which are based on different algorithms (Table 1), up to eight types of SVs in the ‘Yali’ genome were detected: insertions, deletions, inversions, duplications, translocations, MNPs (multiple nucleotide polymorphisms), CTXs and ITXs (Table 1).
- The number of SVs detected by the nine callers, categorized based on type and length,.
- Of the nine SV callers, SVIM detected the highest number of SVs.
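The categorization of SV calls by type and length mentioned above can be illustrated with a small tally over VCF records. This is only a sketch under stated assumptions: it relies on the standard VCF INFO keys `SVTYPE`, `SVLEN` and `END`, which not every caller emits in the same way, and the length bins, function names and toy records are hypothetical rather than taken from the study.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value dictionary (flags map to '')."""
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def length_bin(length: int) -> str:
    """Assign an SV length to a bin. The bin edges are illustrative assumptions."""
    if length < 1000:
        return "<1 kb"
    if length < 10000:
        return "1-10 kb"
    return ">=10 kb"


def tally_svs(vcf_lines: Iterable[str]) -> Counter:
    """Count SV records by (SVTYPE, length bin)."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        info = parse_info(fields[7])
        svtype = info.get("SVTYPE")
        if svtype is None:
            continue  # not an SV record
        if "SVLEN" in info:
            # SVLEN is negative for deletions in many callers, so take the absolute value
            length = abs(int(info["SVLEN"].split(",")[0]))
        elif "END" in info:
            length = int(info["END"]) - pos
        else:
            continue  # no usable length information
        counts[(svtype, length_bin(length))] += 1
    return counts


# Toy records (hypothetical, not data from the study).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-500;END=600",
    "chr1\t5000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=8000",
    "chr2\t200\t.\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=120",
    "chr2\t900\t.\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=20900",
]

for (svtype, size_bin), n in sorted(tally_svs(toy_vcf).items()):
    print(svtype, size_bin, n)
```

Because callers differ in which INFO keys they report, a real comparison would need per-caller handling before the counts are comparable.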
- Detailed information about the number of SVs called by each software package is shown in Fig.
- ‘Yali’ pear genome.
- Of the SVs, 79.47% were deletions, and no insertions longer than 400 bp were identified (Fig.
- IMR/DENOM [34] uses local de novo assemblies and iterative read mapping to the reference sequence to identify SVs [38].
- Table 1 Comparison of the nine types of SV detection software. Data type Detection.
- An overview of the nine SV callers, including the types of SVs detected (INS: insertion, DEL: deletion, INV: inversion, DUP: duplication, TRA: translocation, ITX: intra-chromosomal translocation, CTX: inter-chromosomal translocation) and the mutation signals used (SR: split reads, RP: read pairs, AS: assembly).
- 1 kb) but it could not detect large deletions in ‘Yali’ (>.
- Moreover, Platypus could not call insertions longer than 300 bp, and over 50% of the SVs identified ranged from 50 bp to 75 bp in length.
- Moreover, more than 97% of the inversions and more than 94% of the duplications called by DELLY were greater than 1 kb in length.
- Fig. 1 The number and types of SVs called by seven software packages (Pindel, DELLY, BreakDancer, IMR/DENOM, Platypus, Lumpy, MetaSV) using next-generation sequencing data (60× sequencing depth) and by two software packages (Sniffles, SVIM) using long-read sequencing data (30× sequencing depth).
- The Integrative Genomics Viewer (IGV) browser was first used to confirm the presence of the SVs called by each caller.
- The accuracies of Pindel (58%) and BreakDancer (58%) were lower than those of the other callers.
- For the IMR/DENOM and Platypus software packages, which are based on assembly, the average accuracies of SV detection (81% and 66%, respectively) were higher than those of the other types of software, demonstrating that callers based on assembly algorithms detect SVs with higher confidence.
- The accuracy of the SVs called by MetaSV (70.
- According to the performances of the seven software packages using NGS data, Pindel, BreakDancer, IMR/.
- IMR/DENOM .
- Pindel-IMR/DENOM 1 502 0 0.
- DELLY-IMR/DENOM .
- BreakDancer-IMR/DENOM 0 4729 0 0.
- DELLY-BreakDancer-IMR/DENOM 0 4423 0 0.
- A high percentage, 66.42%, of the duplications identified by Pindel were also identified by DELLY, but only 2.11%.
- DENOM can only detect insertions and deletions (Table 1), the number of inversions and duplications overlapping with those identified by the other three software packages was 0.
- Of the deletions identified by IMR/DENOM, 8.53% were also identified by Pindel, and 66.54% of the Pindel deletions overlapped with the IMR/DENOM deletions.
- Of the DELLY insertions, 26.06% were identified by IMR/DENOM, and 12.21% of IMR/DENOM insertions were identified by DELLY.
- However, 45.02% of the DELLY deletions overlapped with those identified by IMR/DENOM, while over 85% of IMR/DENOM deletions were identified by DELLY.
- Although 100% of the BreakDancer deletions also overlapped with those identified by DELLY, only 5.68% of DELLY deletions were identified by BreakDancer.
- When combinations of three software packages were compared, few of the insertions called by Pindel, DELLY and IMR/DENOM overlapped, and no insertions called by these programs overlapped with those called by BreakDancer.
- Although Pindel, DELLY and IMR/DENOM shared fewer than 10% of deletions with each other, when the outputs of Pindel, DELLY and BreakDancer were compared, all of the deletions identified by BreakDancer, 66% of the deletions identified by Pindel and 36.27% of the deletions identified by DELLY overlapped.
- DENOM, and BreakDancer and IMR/DENOM.
- DELLY, BreakDancer and IMR/DENOM.
- and Pindel, BreakDancer and IMR/DENOM.
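The pairwise percentages above are asymmetric (for example, 100% of BreakDancer deletions overlap DELLY deletions, but only 5.68% of DELLY deletions overlap BreakDancer deletions) because each percentage is taken relative to the size of a different call set. A minimal Python sketch of such a comparison follows. The overlap criterion is not stated in this section, so the 50% reciprocal-overlap threshold, the function names and the toy intervals are all assumptions made for illustration; this is not the authors' procedure.

```python
from typing import List, Tuple

# An SV call as (chromosome, start, end); coordinates here are illustrative only.
Call = Tuple[str, int, int]


def reciprocal_overlap(a: Call, b: Call, min_frac: float = 0.5) -> bool:
    """True if a and b share a chromosome and their overlap covers at least
    min_frac of each call (min_frac = 0.5 is an assumed threshold)."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return shared >= min_frac * (a[2] - a[1]) and shared >= min_frac * (b[2] - b[1])


def percent_shared(query: List[Call], other: List[Call], min_frac: float = 0.5) -> float:
    """Percentage of calls in query matched by at least one call in other."""
    if not query:
        return 0.0
    hits = sum(
        any(reciprocal_overlap(q, o, min_frac) for o in other) for q in query
    )
    return 100.0 * hits / len(query)


# Toy call sets (hypothetical, not data from the study).
small_set = [("chr1", 1000, 2000)]
large_set = [
    ("chr1", 1050, 2050),
    ("chr1", 5000, 6000),
    ("chr2", 100, 900),
    ("chr3", 10, 500),
]

print(percent_shared(small_set, large_set))  # share of the small set found in the large set
print(percent_shared(large_set, small_set))  # share of the large set found in the small set
```

The all-against-all comparison is quadratic in the number of calls, which is acceptable for a sketch; sorted intervals or an interval tree would be the usual choice for whole-genome call sets.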
- We then annotated the SVs detected by five individual callers, three using NGS data, each based on a different algorithm (Pindel, DELLY, and IMR/DENOM), and two using long-read sequencing data (SVIM, which detected more SVs, and Sniffles, which detected higher-confidence SVs), and observed the number of genes within SVs commonly identified by these callers (Fig.
- These 264 genes will be the main targets for future functional studies of the variants between ‘Yali’ and ‘Dangshansuli’ pear.
- DENOM and Platypus, the number of SVs increased as sequencing depth increased to 50×.
- For Pindel, BreakDancer, DELLY, Lumpy, Sniffles and SVIM, the number of SVs called clearly increased as the sequencing depth increased.
- Fig. 2 Comparison of the number of genes within SVs identified using NGS-based software and long-read sequencing-based software.
- The yellow bars indicate the number of SVs identified by an individual software package and the black bars indicate the number of SVs identified by combinations of software packages.
- The goal of this study was to detect SVs with higher accuracy using the ‘Yali’ resequencing data.
- IMR/DENOM and Platypus are based on assembly.
- The overlapping SVs identified by multiple software packages were more.
- Fig. 3 The numbers of four SV types (insertion (a), inversion (b), deletion (c), duplication (d)) identified by nine software packages at different sequencing depths.
- In our study, we compared the sensitivities, accuracies and computational resource requirements of seven common software packages using ‘Yali’ pear NGS data and two software packages using ‘Yali’ pear long-read sequencing data to provide insights for choosing the most appropriate SV-calling program.
- Using SAMtools, the mean sequence insert size of ‘Yali’ was found to be 320 bp.
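The section does not say how SAMtools was invoked to obtain the mean insert size, so the following is not the authors' command. It is a hedged Python sketch of the underlying calculation on SAM-format text: it averages the template length (TLEN, column 9) over properly paired reads (FLAG bit 0x2), counting each pair once by keeping only positive TLEN values. The function name and the toy records are assumptions for illustration.

```python
from typing import Iterable, Optional


def mean_insert_size(sam_lines: Iterable[str]) -> Optional[float]:
    """Mean TLEN over properly paired reads, counting each pair once.

    Returns None when no properly paired read with positive TLEN is found.
    """
    total = 0
    pairs = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2: each segment properly aligned; tlen > 0 keeps the leftmost mate only
        if flag & 0x2 and tlen > 0:
            total += tlen
            pairs += 1
    return total / pairs if pairs else None


# Toy SAM records (hypothetical, not data from the study).
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t350\t300\t*\t*",
    "r1\t147\tchr1\t350\t60\t50M\t=\t100\t-300\t*\t*",
    "r2\t99\tchr1\t900\t60\t50M\t=\t1200\t350\t*\t*",
    "r2\t147\tchr1\t1200\t60\t50M\t=\t900\t-350\t*\t*",
    "r3\t65\tchr1\t2000\t60\t50M\t=\t7000\t5050\t*\t*",  # not properly paired, ignored
]

print(mean_insert_size(toy_sam))
```

Restricting the average to properly paired reads keeps discordant pairs, which are themselves SV signals, from inflating the estimate.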
- Of the two software packages using long-read sequencing data, SVIM showed higher sensitivity, probably because SVIM collects, clusters and combines SV signatures from read alignments [32].
- selected five software packages for finding overlapping SVs, and finally validated the accuracy of the overlapping SVs.
- In our study, we found that the depth of both NGS and long-read sequencing clearly affected the number of SVs called (Fig.
- A combination of multiple software packages is recommended for the detection of more types of SVs with higher accuracy.
- ‘Yali’ plants were grown in an experimental nursery at.
- FastQC was used to check the raw sequencing data in FASTQ format during the first major step of sequence data preprocessing (fastqc -o yali_fastq yali_1.fq yali_2.fq).
- Trimmomatic, a fast, multithreaded command-line tool for trimming paired-end and single-end reads produced by Illumina NGS technology [36], was used to trim reads with the following parameters: java -jar trimmomatic-0.36.jar PE -phred33 -trimlog logfile yali_1.fq yali_2.fq yali.read_1.fq yali.trim.read_1.fq yali.read_2.fq yali.trim.read_2.fq ILLUMINACLIP:/Trimmomatic/adapters/.
- The paired-end sequencing reads of ‘Yali’ were aligned to the ‘Dangshansuli’ reference genome using the ‘align’ step (bwa aln -t 20 dangshansuli.fasta yali.read_1.fq >.
- yali.read_.
- bwa aln -t 20 reference yali.read_2.fq >.
- bwa sampe dangshansuli.fasta yali.read_1.sai yali.read_2.sai yali.read_1.fq yali.read_2.fq > yali.sam).
- NGMLR was used to map long reads to the ‘Dangshansuli’ reference genome (ngmlr -t 50 -r dangshansuli.fasta -q yali_pacbio.fastq -o yali.sam).
- Description of the four types of SV software.
- Seven types of SV-calling software using NGS data and two types of SV-calling software using long-read sequencing data, each based on different algorithms, were used to detect the SVs between ‘Yali’ and.
- The first step is ‘bam2pindel.pl’, the purpose of which is to extract read pairs for use by Pindel (bam2pindel_bwa.pl -i yalisortrmdup.bam -o output_prefix -s yali -om).
- BreakDancer-max was able to predict five types of SVs from the ‘Yali’ sequencing data: insertions, deletions, inversions, and inter- and intra-chromosomal translocations.
- IMR/DENOM [34].
- The input files consisted of the indexed reference FASTA file and the ‘Yali’ sorted BAM file.
- The generic option ‘-t’ can be changed to detect other types of SVs.
- Before detecting SVs, the BAM files *.bam, *.splitters.bam and *.discordants.bam were obtained using the BWA-MEM alignment in speedseq [48] (speedseq align -R "@RG\tID:id\tSM:sample\tLB:lib" dangshansuli.fasta yali.read_1.fq yali.read_.
- The output contained important information such as the type and size of each SV, the chromosome ID, and the SV length (lumpyexpress -B yali.bam -S yali_splitters.bam -D yali.discordants.bam -o yali.output).
- lumpy_vcf yali.vcf --bam yali.bam --outdir.
- Default parameters were used (sniffles -m yalisortrmdup.bam -v yali.vcf).
- Validation of SVs detected using NGS data and long-read sequencing data.
- reference genome sequence, and the BAM file of ‘Yali’ was loaded to visually confirm the presence of the identified deletions and insertions.
- GO analysis of 264 genes within SVs commonly identified by five SV callers.
- KEGG analysis of 264 genes within SVs commonly identified by five SV callers.
- The number of genes within SVs detected by software packages using NGS data (a) and long-read sequencing data (b).
- GO analysis of 403 genes within SVs commonly identified by three SV callers using NGS data.
- KEGG analysis of 403 genes within SVs commonly identified by three SV callers using NGS data.
- GO analysis of 4495 genes within SVs commonly identified by two SV callers using long-read sequencing data.
- KEGG analysis of 4495 genes within SVs commonly identified by two SV callers using long-read sequencing data.
- Verification of SVs in ‘Yali’ through comparisons with the ‘Dangshansuli’ reference genome.
- We gratefully thank the Changli Fruit Research Institute, Hebei Academy of Agricultural Sciences, China for providing the ‘Yali’ pear as experimental material.
- Funding from the National Science Foundation of China (31725024 and 31672111) supported the next-generation sequencing of ‘Yali’ pear, and funding from the Earmarked Fund for the China Agriculture Research System (CARS-28) supported the long-read sequencing of ‘Yali’ pear.
- DNA extraction and library preparation for ‘Yali’ were performed using funding from the “333 High Level Talents Project” of Jiangsu Province (BRA2016367).
- The ‘Yali’ pear plant samples were obtained from the Changli Fruit Research Institute, Hebei Academy of Agricultural Sciences, China.
- The genome of the pear (Pyrus bretschneideri Rehd.).
- DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies