Comparison of multiple algorithms to reliably detect structural variants in pears

- Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (>.
- Moreover, SV detection software was originally developed and tested using the human genome or the genome of the model plant Arabidopsis thaliana, so this software may not efficiently detect SVs in pear.
- Sequencing of the genome of Pyrus bretschneideri cv.
- bretschneideri) and is one of the primary pear cultivars grown in China.
- We have conducted a systematic analysis using ‘Yali’ genome NGS and long-read sequencing data to compare the performances of several commonly used SV-calling software packages using short reads, namely Pindel [25], BreakDancer [33], IMR/.
- Then, the reliability of selected ‘Yali’ pear SVs was verified using visualization tools.
- Sequencing and mapping of the ‘Yali’ genome.
- Short-read sequencing of the pear ‘Yali’ genome was conducted on the Illumina HiSeq™ 2000 platform using paired-end sequencing, at a sequencing depth of 60×.
- The quality of the raw resequencing data was determined using FastQC (https://.
- Of the clean reads, 97.15% were mapped to the ‘Dangshansuli’ pear genome using Burrows-Wheeler Aligner (BWA) software [37].
- Long-read sequencing data for ‘Yali’ were generated using the PacBio platform, and the sequencing depth was 30×.
- SVs between ‘Yali’ and the reference genome detected using different algorithms and sequencing data.
- Depending on the capabilities of the nine SV callers, which are based on different algorithms, up to eight types of SVs were detected in the ‘Yali’ genome: insertions, deletions, inversions, duplications, translocations, MNPs (multiple nucleotide polymorphisms), CTXs and ITXs (Table 1).
- The number of SVs detected by the nine callers, categorized based on type and length,.
- Of the nine SV callers, SVIM detected the highest number of SVs.
- Detailed information about the number of SVs called by each software package is shown in Fig.
- ‘Yali’ pear genome.
- Of the SVs, 79.47% were deletions, and no insertions longer than 400 bp were identified (Fig.
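Counting SVs by type and length class, and computing the share of one type such as deletions, reduces to tallying VCF records. A minimal sketch, assuming each call carries `SVTYPE` and `SVLEN` keys in its INFO column (callers differ in how they encode these) and using hypothetical length-bin edges rather than the bins of the study; the three records are invented:

```python
from collections import Counter

# Hypothetical length-bin edges in bp; not the bins used in the study.
BIN_EDGES = [50, 100, 400, 1_000, 10_000]


def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'SVTYPE=DEL;SVLEN=-120;PRECISE' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True  # flags such as PRECISE have no value
    return fields


def length_bin(length: int) -> str:
    """Label of the half-open bin [lo, hi) that contains `length`."""
    lo = 0
    for hi in BIN_EDGES:
        if length < hi:
            return f"{lo}-{hi}"
        lo = hi
    return f">={lo}"


def tally(vcf_lines) -> Counter:
    """Count SV records per (SVTYPE, length bin) in uncompressed VCF text."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        info = parse_info(line.rstrip("\n").split("\t")[7])
        svtype = info.get("SVTYPE", "UNKNOWN")
        svlen = abs(int(info.get("SVLEN", 0)))  # deletions often carry a negative SVLEN
        counts[(svtype, length_bin(svlen))] += 1
    return counts


def type_fraction(counts: Counter, svtype: str) -> float:
    """Share of all counted SVs that have the given type."""
    total = sum(counts.values())
    typed = sum(n for (t, _), n in counts.items() if t == svtype)
    return typed / total if total else 0.0


# Invented example records (not data from the study).
records = [
    "##fileformat=VCFv4.2",
    "chr1\t100\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-120",
    "chr1\t900\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=60",
    "chr2\t500\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-2500",
]
counts = tally(records)
print(sorted(counts.items()))
print(round(type_fraction(counts, "DEL"), 3))  # 0.667
```

Records without an explicit `SVLEN` (common for translocation or breakend calls) land in the smallest bin in this sketch, so such types would need separate handling in practice.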
- IMR/DENOM [34] utilizes local de novo assemblies and iterative read mapping to the reference sequence to identify SVs [38].
- Table 1 Comparison of the nine types of SV detection software Data type Detection.
- An overview of the nine SV callers, including the types of SVs detected (INS: insertion, DEL: deletion, INV: inversion, DUP: duplication, TRA: translocation, ITX: intra-chromosomal translocation, CTX: inter-chromosomal translocation) and the mutation signals used (SR: split reads, RP: read pairs, AS: assembly).
- 1 kb) but it could not detect large deletions in ‘Yali’ (>.
- Moreover, Platypus could not call insertions longer than 300 bp, and over 50% of the SVs identified ranged from 50 bp to 75 bp in length.
- Moreover, more than 97% of the inversions and more than 94% of the duplications called by DELLY were greater than 1 kb in length.
- Fig. 1 The number and types of SVs called by seven software packages (Pindel, DELLY, BreakDancer, IMR/DENOM, Platypus, Lumpy, MetaSV) using next-generation sequencing data (60× sequencing depth) and by two software packages (Sniffles, SVIM) using long-read sequencing data (30× sequencing depth).
- The Integrative Genomics Viewer (IGV) browser was first used to confirm the presence of the SVs called by each caller.
- The accuracies of Pindel (58%) and BreakDancer (58%) were lower than those of the other callers.
- For the IMR/DENOM and Platypus software packages, which are based on assembly, the average accuracies of SV detection (81% and 66%, respectively) were higher than those of the other types of software, suggesting that callers based on assembly algorithms detect SVs with higher confidence.
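Accuracy in this comparison can be read as the share of inspected calls that visual checking in IGV confirmed. The sketch below assumes a per-caller dictionary of inspected calls with a confirmed/not-confirmed flag, and an unweighted mean over SV types for the "average accuracy"; both the data structure and the averaging scheme are assumptions, and the numbers are invented:

```python
from statistics import mean


def accuracy(confirmed_flags) -> float:
    """Fraction of inspected SV calls that were confirmed (True counts as 1)."""
    if not confirmed_flags:
        raise ValueError("no calls were inspected")
    return sum(confirmed_flags) / len(confirmed_flags)


def average_accuracy(by_type: dict) -> float:
    """Unweighted mean of per-type accuracies, e.g. {'DEL': [...], 'INS': [...]}."""
    return mean(accuracy(flags) for flags in by_type.values())


# Hypothetical inspection results for one caller (True = confirmed in IGV).
checked = {
    "DEL": [True, True, True, False],  # 3/4 = 0.75
    "INS": [True, False],              # 1/2 = 0.50
}
print(round(average_accuracy(checked), 3))  # 0.625
```

Pooling all six inspected calls instead would give 4/6 ≈ 0.667, so the averaging scheme matters whenever different numbers of calls are checked per SV type.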
- The accuracy of the SVs called by MetaSV (70.
- According to the performances of the seven software packages using NGS data, Pindel, BreakDancer, IMR/.
- IMR/DENOM .
- Pindel-IMR/DENOM 1 502 0 0.
- DELLY-IMR/DENOM .
- BreakDancer-IMR/DENOM 0 4729 0 0.
- DELLY-BreakDancer-IMR/DENOM 0 4423 0 0.
- A high percentage, 66.42%, of the duplications identified by Pindel were also identified by DELLY, but only 2.11%.
- Because IMR/DENOM can only detect insertions and deletions (Table 1), the number of inversions and duplications overlapping with those identified by the other three software packages was 0.
- Of the deletions identified by IMR/DENOM, 8.53% were also identified by Pindel, and 66.54% of the Pindel deletions overlapped with the IMR/DENOM deletions.
- Of the DELLY insertions, 26.06% were identified by IMR/DENOM, and 12.21% of IMR/DENOM insertions were identified by DELLY.
- However, 45.02% of the DELLY deletions overlapped with those identified by IMR/DENOM, while over 85% of IMR/DENOM deletions were identified by DELLY.
- Although 100% of the BreakDancer deletions also overlapped with those identified by DELLY, only 5.68% of DELLY deletions were identified by BreakDancer.
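These overlap percentages are asymmetric because each is computed relative to a different caller's call set: a small set nested inside a large one gives 100% in one direction and a small percentage in the other. The sketch below assumes two calls match when they share chromosome and SV type and have at least 50% reciprocal overlap; this criterion is an assumption (the study's matching rule is not given here), and the intervals are invented:

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    svtype: str


def reciprocal_overlap(a: SV, b: SV, min_frac: float = 0.5) -> bool:
    """True if a and b share chrom/type and the overlap covers >= min_frac of each."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return shared >= min_frac * (a.end - a.start) and shared >= min_frac * (b.end - b.start)


def fraction_found_in(query, reference, min_frac: float = 0.5) -> float:
    """Percentage of `query` calls matched by at least one `reference` call."""
    if not query:
        return 0.0
    hits = sum(any(reciprocal_overlap(q, r, min_frac) for r in reference) for q in query)
    return 100.0 * hits / len(query)


# Invented call sets: one small set nested inside a larger one.
breakdancer = [SV("chr1", 1000, 2000, "DEL")]
delly = [
    SV("chr1", 1100, 2000, "DEL"),
    SV("chr1", 5000, 5500, "DEL"),
    SV("chr2", 300, 900, "DEL"),
    SV("chr3", 10, 400, "DEL"),
]
print(fraction_found_in(breakdancer, delly))  # 100.0
print(fraction_found_in(delly, breakdancer))  # 25.0
```

The nested loop is quadratic, which is fine for illustration; genome-wide call sets would normally be sorted or indexed first. Insertions have almost no span on the reference, so a breakpoint-distance window usually suits them better than reciprocal overlap.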
- When comparing the combination of three software packages, few of the insertions called by Pindel, DELLY and IMR/DENOM overlapped, and no insertions called by these programs overlapped with those called by BreakDancer.
- Although Pindel, DELLY and IMR/DENOM shared fewer than 10% of deletions with each other, when the outputs of Pindel, DELLY and BreakDancer were compared, all of the deletions identified by BreakDancer, 66% of those identified by Pindel and 36.27% of those identified by DELLY overlapped.
- DENOM, and BreakDancer and IMR/DENOM.
- DELLY, BreakDancer and IMR/DENOM.
- and Pindel, BreakDancer and IMR/DENOM.
- We then annotated the SVs detected by five individual callers, three using NGS data, each based on a different algorithm (Pindel, DELLY, and IMR/DENOM), and two using long-read sequencing data (SVIM, which detected more SVs, and Sniffles, which detected higher-confidence SVs), and counted the number of genes within SVs commonly identified by these callers (Fig.
- These 264 genes will be the main targets for future functional studies of the variants between ‘Yali’ and ‘Dangshansuli’ pear.
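Counting genes within SVs commonly identified by several callers can be read as: for each caller, collect the genes whose annotated interval overlaps any of its SVs, then intersect those gene sets. A minimal sketch, assuming a hypothetical gene-coordinate dictionary and a simple any-overlap rule (the study's exact annotation rule is not stated here); gene IDs and coordinates are invented:

```python
def genes_hit(svs, genes) -> set:
    """IDs of genes whose interval overlaps any SV on the same chromosome.

    svs:   iterable of (chrom, start, end) tuples, half-open coordinates
    genes: dict mapping gene_id -> (chrom, start, end)
    """
    hit = set()
    for gene_id, (g_chrom, g_start, g_end) in genes.items():
        for chrom, start, end in svs:
            if chrom == g_chrom and start < g_end and g_start < end:
                hit.add(gene_id)
                break  # one overlapping SV is enough for this gene
    return hit


def shared_genes(calls_by_caller: dict, genes: dict) -> set:
    """Genes hit by the SVs of every caller (set intersection across callers)."""
    gene_sets = [genes_hit(svs, genes) for svs in calls_by_caller.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical annotation and call sets.
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 1000, 1800),
    "geneC": ("chr2", 50, 400),
}
calls = {
    "Pindel": [("chr1", 450, 600), ("chr2", 10, 60)],
    "DELLY": [("chr1", 300, 320), ("chr1", 1700, 2000)],
    "SVIM": [("chr1", 480, 1200)],
}
print(sorted(shared_genes(calls, genes)))  # ['geneA']
```

Requiring every caller to hit a gene is strict; a looser variant would keep genes hit by at least k of the callers.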
- For IMR/DENOM and Platypus, the number of SVs increased as sequencing depth increased to 50×.
- For Pindel, BreakDancer, DELLY, Lumpy, Sniffles and SVIM, the number of SVs called increased markedly as the sequencing depth increased.
- Fig. 2 Comparison of the number of genes within SVs identified using NGS-based software and long-read sequencing-based software.
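Comparing callers at several depths requires subsampling the full data to lower coverage. How the study subsampled is not described in these excerpts; one common approach keeps each read pair with probability target depth / observed depth, which is what `samtools view -s SEED.FRACTION` does (it makes the same decision for both mates of a pair). The sketch below only computes those fractions and formats the `-s` value; the seed, target depths and output file names are hypothetical:

```python
def subsample_fraction(observed_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so the expected depth falls to target_depth."""
    if not 0 < target_depth <= observed_depth:
        raise ValueError("target depth must be positive and no larger than the observed depth")
    return target_depth / observed_depth


def samtools_s_arg(seed: int, fraction: float, digits: int = 4) -> str:
    """Format the SEED.FRACTION value taken by `samtools view -s`."""
    text = f"{fraction:.{digits}f}"
    if not text.startswith("0.") or set(text[2:]) == {"0"}:
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    return f"{seed}.{text[2:]}"


# Hypothetical target depths for 60x short-read data; seed 42 is arbitrary.
for target in (10, 20, 30, 40, 50):
    arg = samtools_s_arg(42, subsample_fraction(60, target))
    print(f"{target}x: samtools view -b -s {arg} -o yali_{target}x.bam yali.bam")
```

Random subsampling only approximates the target depth; fixing the seed makes each subsample reproducible.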
- The yellow bars indicate the number of SVs identified by an individual software package and the black bars indicate the number of SVs identified by combinations of software packages.
- The goal of this study was to detect SVs with higher accuracy using the ‘Yali’ resequencing data.
- IMR/DENOM and Platypus are based on assembly.
- The overlapping SVs identified by multiple software packages were more.
- Fig. 3 Numbers of SVs of four types (insertion (a), inversion (b), deletion (c), duplication (d)) identified by nine software packages at different sequencing depths.
- In our study, we compared the sensitivities, accuracies and computational resource requirements of seven common software packages using ‘Yali’ pear NGS data and two software packages using ‘Yali’ pear long-read sequencing data, to provide insights for choosing the most appropriate SV-calling program.
- Using SAMtools, the mean sequence insert size of ‘Yali’ was found to be 320 bp.
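The mean insert size can be recomputed from the template length (TLEN, column 9) of properly paired alignments. The sketch below is a simplified stand-in, not the SAMtools routine used in the study; it assumes uncompressed SAM text, counts each pair once through its positive TLEN, and drops pairs above a hypothetical 10 kb cutoff. The records are invented:

```python
from statistics import mean

PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def insert_sizes(sam_lines, max_size: int = 10_000):
    """Yield one positive TLEN per properly paired primary alignment pair."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        cols = line.rstrip("\n").split("\t")
        flag, tlen = int(cols[1]), int(cols[8])
        if not (flag & PROPER_PAIR) or (flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY)):
            continue
        if 0 < tlen <= max_size:  # the leftmost mate carries the positive TLEN
            yield tlen


# Invented SAM records: two proper pairs and one improperly paired read.
sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t100M\t=\t320\t320\t*\t*",
    "r1\t147\tchr1\t320\t60\t100M\t=\t100\t-320\t*\t*",
    "r2\t99\tchr1\t500\t60\t100M\t=\t700\t300\t*\t*",
    "r2\t147\tchr1\t700\t60\t100M\t=\t500\t-300\t*\t*",
    "r3\t65\tchr2\t50\t60\t100M\tchr3\t80\t0\t*\t*",
]
print(mean(insert_sizes(sam)))  # 310
```

In practice `samtools stats` reports insert-size statistics directly; the cutoff here only keeps a few chimeric or mis-mapped pairs from dominating the mean.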
- Of the two software packages using long-read sequencing data, SVIM showed higher sensitivity, probably because SVIM collects, clusters and combines SV signatures from read alignments [32].
- selected five software packages for finding overlapping SVs, and finally validated the accuracy of the overlapping SVs.
- In our study, we found that the depth of NGS and long-read sequencing strongly affected the number of SVs called (Fig.
- A combination of multiple software packages is recommended for the detection of more types of SVs with higher accuracy.
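One simple way to combine callers is to keep an event only when a minimum number of callers report it. The sketch below assumes calls have already been merged into shared event keys (for example by a reciprocal-overlap rule); the keys, caller assignments and threshold are illustrative only:

```python
from collections import defaultdict


def consensus(calls_by_caller: dict, min_support: int = 2) -> dict:
    """Return event keys reported by at least `min_support` distinct callers.

    calls_by_caller maps caller name -> iterable of hashable event keys,
    e.g. ("chr1", 1000, "DEL") after calls have been merged into shared events.
    """
    support = defaultdict(set)
    for caller, events in calls_by_caller.items():
        for event in events:
            support[event].add(caller)
    return {event: sorted(callers) for event, callers in support.items()
            if len(callers) >= min_support}


# Hypothetical merged events per caller.
calls = {
    "DELLY": {("chr1", 1000, "DEL"), ("chr2", 500, "INV")},
    "Pindel": {("chr1", 1000, "DEL")},
    "IMR/DENOM": {("chr1", 1000, "DEL"), ("chr3", 42, "INS")},
}
print(consensus(calls))
# {('chr1', 1000, 'DEL'): ['DELLY', 'IMR/DENOM', 'Pindel']}
```

Raising `min_support` trades sensitivity for precision, and it interacts with what each caller can detect: because IMR/DENOM calls only insertions and deletions, requiring support from all three callers here would discard every inversion.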
- ‘Yali’ plants were grown in an experimental nursery at.
- FastQC was used to check raw sequencing data in FASTQ format during the first major step of sequence data preprocessing (fastqc -o yali_fastq yali_1.fq yali_2.fq).
- Trimmomatic, a fast, multithreaded command-line tool for trimming paired-end and single-end reads produced by Illumina NGS technology [36], was used to trim reads with the following parameters: java -jar trimmomatic-0.36.jar PE -phred33 -trimlog logfile yali_1.fq yali_2.fq yali.read_1.fq yali.trim.read_1.fq yali.read_2.fq yali.trim.read_2.fq ILLUMINACLIP:/Trimmomatic/adapters/.
- The paired-end sequencing reads of ‘Yali’ were aligned to the ‘Dangshansuli’ reference genome using the ‘align’ step (bwa aln -t 20 dangshansuli.fasta yali.read_1.fq > yali.read_1.sai; bwa aln -t 20 dangshansuli.fasta yali.read_2.fq > yali.read_2.sai; bwa sampe dangshansuli.fasta yali.read_1.sai yali.read_2.sai yali.read_1.fq yali.read_2.fq > yali.sam).
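The BWA commands above depend on each other: each `bwa aln` run writes a `.sai` file per mate, and `bwa sampe` pairs them into one SAM file. The sketch below strings the same commands together with Python's `subprocess`, assuming `bwa` is on the PATH and `dangshansuli.fasta` has already been indexed with `bwa index`; the wrapper functions are hypothetical, and only the file names come from the text:

```python
import subprocess
from pathlib import Path


def bwa_aln_sampe_steps(ref, fq1, fq2, out_sam, threads=20):
    """Return (argv, stdout_path) pairs for the bwa aln / aln / sampe workflow."""
    sai1 = str(Path(fq1).with_suffix(".sai"))  # yali.read_1.fq -> yali.read_1.sai
    sai2 = str(Path(fq2).with_suffix(".sai"))
    return [
        (["bwa", "aln", "-t", str(threads), ref, fq1], sai1),
        (["bwa", "aln", "-t", str(threads), ref, fq2], sai2),
        (["bwa", "sampe", ref, sai1, sai2, fq1, fq2], out_sam),
    ]


def run_steps(steps):
    """Run each command in order, writing its stdout to the paired output file."""
    for argv, out_path in steps:
        with open(out_path, "wb") as handle:
            subprocess.run(argv, stdout=handle, check=True)


steps = bwa_aln_sampe_steps("dangshansuli.fasta", "yali.read_1.fq", "yali.read_2.fq", "yali.sam")
for argv, out_path in steps:
    print(" ".join(argv), ">", out_path)
# run_steps(steps)  # uncomment on a machine with bwa installed
```

Keeping command construction separate from execution makes the dry run above safe to inspect before any alignment is started; `check=True` stops the chain if one step fails, so `sampe` never runs on a missing `.sai` file.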
- NGMLR was used to map long reads to the ‘Dangshansuli’ reference genome (ngmlr -t 50 -r dangshansuli.fasta -q yali_pacbio.fastq -o yali.sam).
- Description of the four types of SV software.
- Seven types of SV-calling software using NGS data and two types of SV-calling software using long-read sequencing data, each based on different algorithms, were used to detect the SVs between ‘Yali’ and.
- The first step is ‘bam2pindel.pl’, the purpose of which is to extract read pairs for use by Pindel (bam2pindel_bwa.pl -i yalisortrmdup.bam -o output_prefix -s yali -om).
- BreakDancer-max can predict five types of SVs from ‘Yali’ sequencing data: insertions, deletions, inversions, and inter- and intra-chromosomal translocations.
- IMR/DENOM [34].
- The input files consisted of the indexed reference fasta file and the ‘Yali’ sorted BAM file.
- The generic option ‘-t’ can be changed to detect other types of SVs.
- Before detecting SVs, the BAM files *.bam, *.splitters.bam and *.discordants.bam were obtained using the BWA-MEM alignment in speedseq [48] (speedseq align -R "@RG\tID:id\tSM:sample\tLB:lib" dangshansuli.fasta yali.read_1.fq yali.read_.
- The output contained important information such as the SV type, the chromosome ID and the SV length (lumpyexpress -B yali.bam -S yali_splitters.bam -D yali.discordants.bam -o yali.output).
- lumpy_vcf yali.vcf --bam yali.bam --outdir.
- Default parameters were used (sniffles -m yalisortrmdup.bam -v yali.vcf).
- Validation of SVs detected using NGS data and long-read sequencing data.
- erence genome sequence, and the BAM file of ‘Yali’ was loaded to visually confirm the presence of the identified deletions and insertions.
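Inspecting many candidate SVs one at a time in IGV is slow; IGV can instead replay a batch script (for example via its `-b` command-line option). The sketch below writes such a script using the IGV batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot` and `exit`; the file paths, flanking size and snapshot names are hypothetical:

```python
def igv_batch(genome_fasta, bam, svs, snapshot_dir, flank=500) -> str:
    """Build IGV batch-script text that snapshots each SV with flanking sequence.

    svs: iterable of (chrom, start, end, label) with 1-based inclusive coordinates.
    """
    lines = [
        "new",
        f"genome {genome_fasta}",
        f"load {bam}",
        f"snapshotDirectory {snapshot_dir}",
    ]
    for chrom, start, end, label in svs:
        lo = max(1, start - flank)  # do not step past the chromosome start
        lines.append(f"goto {chrom}:{lo}-{end + flank}")
        lines.append(f"snapshot {label}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"


# Hypothetical candidate SVs to inspect.
script = igv_batch(
    "dangshansuli.fasta",
    "yali.bam",
    [("chr1", 12000, 12450, "chr1_DEL_12000"), ("chr5", 300, 310, "chr5_INS_300")],
    "igv_snapshots",
)
print(script)
```

Flanking each event on both sides shows the coverage drop or soft-clipped reads at the breakpoints, which is what a visual check of a deletion or insertion usually relies on.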
- GO analysis of 264 genes within SVs commonly identified by five SV callers.
- KEGG analysis of 264 genes within SVs commonly identified by five SV callers.
- The number of genes within SVs detected by software packages using NGS data (a) and long-read sequencing data (b).
- GO analysis of 403 genes within SVs commonly identified by three SV callers using NGS data.
- KEGG analysis of 403 genes within SVs commonly identified by three SV callers using NGS data.
- GO analysis of 4495 genes within SVs commonly identified by two SV callers using long-read sequencing data.
- KEGG analysis of 4495 genes within SVs commonly identified by two SV callers using long-read sequencing data.
- Verification of SVs in ‘Yali’ through comparisons with the ‘Dangshansuli’ reference genome.
- We gratefully thank the Changli Fruit Research Institute, Hebei Academy of Agricultural Sciences, China, for providing the ‘Yali’ pear as experimental material.
- Funding from the National Science Foundation of China (31725024 and 31672111) supported the next-generation sequencing of ‘Yali’ pear, and funding from the Earmarked Fund for the China Agriculture Research System (CARS-28) supported the long-read sequencing of ‘Yali’ pear.
- DNA extraction and library preparation for ‘Yali’ were performed using funding from the “333 High Level Talents Project” of Jiangsu Province (BRA2016367).
- The ‘Yali’ pear plant samples were obtained from the Changli Fruit Research Institute, Hebei Academy of Agricultural Sciences, China.
- The genome of the pear (Pyrus bretschneideri Rehd.).
- DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies
