
Comparison of multiple algorithms to reliably detect structural variants in pears


Summary

- Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (>…).
- Moreover, SV detection software was originally developed and tested on the human genome or the genome of the model plant Arabidopsis thaliana, so it may not detect SVs efficiently in pear.
- Sequencing of the genome of Pyrus bretschneideri cv. …
- … bretschneideri) and is one of the primary pear cultivars grown in China.
- We conducted a systematic analysis of ‘Yali’ NGS and long-read sequencing data to compare the performance of several commonly used SV-calling software packages for short reads, namely Pindel [25], BreakDancer [33], IMR/DENOM …
- Then, the reliability of selected ‘Yali’ pear SVs was verified using visualization tools.
- Sequencing and mapping of the ‘Yali’ genome.
- Short-read sequencing of the pear ‘Yali’ genome was conducted on the Illumina HiSeq™ 2000 platform using paired-end sequencing, at a sequencing depth of 60×.
- The quality of the raw resequencing data was determined using FastQC (https://…).
- Of the clean reads, 97.15% were mapped to the ‘Dangshansuli’ pear genome using the Burrows-Wheeler Aligner (BWA) software [37].
- Long-read sequencing data for ‘Yali’ were generated using the PacBio platform, and the sequencing depth was 30×.
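The 60× and 30× figures above are mean sequencing depths. Mean depth is usually estimated as total sequenced bases divided by genome size, C = N·L/G, counting each mate of a pair as one read. The Python sketch below shows only that arithmetic; the 100-bp read length and 500-Mb genome size are hypothetical placeholders, not values reported for ‘Yali’ or ‘Dangshansuli’.

```python
import math


def sequencing_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth C = N * L / G: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_depth(target_depth: float, read_length: int, genome_size: int) -> int:
    """Number of reads needed to reach target_depth, rounded up to a whole read."""
    return math.ceil(target_depth * genome_size / read_length)


# Hypothetical inputs (placeholders only): 100-bp reads on a 500-Mb genome.
GENOME_SIZE = 500_000_000
READ_LENGTH = 100

needed = reads_for_depth(60, READ_LENGTH, GENOME_SIZE)
print(needed)                                              # 300000000
print(sequencing_depth(needed, READ_LENGTH, GENOME_SIZE))  # 60.0
```

Halving the number of reads halves the expected depth, which is why 30× long-read data carries half the bases of a 60× short-read data set on the same genome.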
- SVs between ‘Yali’ and the reference genome detected using different algorithms and sequencing data.
- The types of SVs detected in the ‘Yali’ genome depended on the nine SV callers, which are based on different algorithms (Table 1); up to eight types were detected: insertions, deletions, inversions, duplications, translocations, MNPs (multiple nucleotide polymorphisms), CTXs and ITXs (Table 1).
- The number of SVs detected by the nine callers, categorized by type and length, …
- Of the nine SV callers, SVIM detected the highest number of SVs.
- Detailed information about the number of SVs called by each software package is shown in Fig. …
- ‘Yali’ pear genome.
- Of the SVs, 79.47% were deletions, and no insertions longer than 400 bp were identified (Fig. …).
- IMR/DENOM [34] utilizes local de novo assemblies and iterative read mapping to the reference sequence to identify SVs [38].
- Table 1 Comparison of the nine SV detection software packages (columns: data type, detection …).
- An overview of the nine SV callers, including the types of SVs detected (INS: insertion, DEL: deletion, INV: inversion, DUP: duplication, TRA: translocation, ITX: intra-chromosomal translocation, CTX: inter-chromosomal translocation) and the mutation signals used (SR: split reads, RP: read pairs, AS: assembly).
- … 1 kb), but it could not detect large deletions in ‘Yali’ (>…).
- Moreover, Platypus could not call insertions longer than 300 bp, and over 50% of the SVs identified ranged from 50 bp to 75 bp in length.
- Moreover, more than 97% of the inversions and more than 94% of the duplications called by DELLY were longer than 1 kb.
- Fig. 1 Number and types of SVs called by seven software packages (Pindel, DELLY, BreakDancer, IMR/DENOM, Platypus, Lumpy, MetaSV) using next-generation sequencing data (60× sequencing depth) and by two software packages (Sniffles, SVIM) using long-read sequencing data (30× sequencing depth).
- The Integrative Genomics Viewer (IGV) browser was first used to confirm the presence of the SVs called by each caller.
- The accuracies of Pindel (58%) and BreakDancer (58%) were lower than those of the other callers.
- For the IMR/DENOM and Platypus software packages, which are based on assembly, the average accuracies of SV detection (81 and 66%, respectively) were higher than those of the other types of software, indicating that callers based on assembly algorithms detect SVs with higher confidence.
- The accuracy of the SVs called by MetaSV (70.…
- Based on the performance of the seven software packages using NGS data, Pindel, BreakDancer, IMR/DENOM …
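The accuracies above appear to be the share of a caller's SVs that were confirmed by inspecting the reads in IGV. Assuming that definition (the summary does not give the number of calls inspected), each accuracy is a simple ratio. The Python sketch below uses made-up per-type tallies for one hypothetical caller to show the arithmetic, including an unweighted average over SV types.

```python
from statistics import mean


def accuracy(confirmed: int, inspected: int) -> float:
    """Share of inspected SV calls that the reads shown in IGV supported."""
    if inspected <= 0:
        raise ValueError("inspected must be positive")
    if not 0 <= confirmed <= inspected:
        raise ValueError("confirmed must lie between 0 and inspected")
    return confirmed / inspected


# Hypothetical tallies for one caller, per SV type: (confirmed, inspected).
tallies = {"DEL": (40, 50), "INS": (30, 50)}

per_type = {sv_type: accuracy(c, n) for sv_type, (c, n) in tallies.items()}
print(per_type)                 # {'DEL': 0.8, 'INS': 0.6}
print(mean(per_type.values()))  # about 0.7 (unweighted mean over SV types)
```

An unweighted mean over SV types equals the pooled ratio (total confirmed / total inspected) only when every type was sampled equally, as in this toy example; with uneven sampling the two can differ.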
- IMR/DENOM
- Pindel-IMR/DENOM: 1, 502, 0, 0
- DELLY-IMR/DENOM
- BreakDancer-IMR/DENOM: 0, 4729, 0, 0
- DELLY-BreakDancer-IMR/DENOM: 0, 4423, 0, 0
- A high percentage (66.42%) of the duplications identified by Pindel were also identified by DELLY, but only 2.11% …
- Because IMR/DENOM can only detect insertions and deletions (Table 1), none of its calls overlapped with the inversions or duplications identified by the other three software packages.
- Of the deletions identified by IMR/DENOM, 8.53% were also identified by Pindel, and 66.54% of the Pindel deletions overlapped with the IMR/DENOM deletions.
- Of the DELLY insertions, 26.06% were identified by IMR/DENOM, and 12.21% of IMR/DENOM insertions were identified by DELLY.
- However, 45.02% of the DELLY deletions overlapped with those identified by IMR/DENOM, while over 85% of IMR/DENOM deletions were identified by DELLY.
- Although 100% of the BreakDancer deletions also overlapped with those identified by DELLY, only 5.68% of DELLY deletions were identified by BreakDancer.
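The pairwise percentages above are asymmetric because each is divided by a different caller's total: a small call set can be almost entirely contained in a large one, as with the BreakDancer and DELLY deletions. The summary does not state the overlap criterion, so the Python sketch below assumes a common one, at least 50% reciprocal overlap between calls of the same type on the same chromosome; the coordinates are toy values. Reciprocal overlap suits deletions, inversions and duplications; insertions, which have near-zero reference span, usually need a breakpoint-distance rule instead.

```python
from __future__ import annotations

from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int  # 0-based start
    end: int    # exclusive end
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Shared length over the longer SV; >= 0.5 means each SV is at least half covered by the other."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def fraction_supported(calls: list[SV], others: list[SV], min_ro: float = 0.5) -> float:
    """Share of `calls` that overlap at least one SV in `others`."""
    if not calls:
        return 0.0
    hits = sum(any(reciprocal_overlap(sv, o) >= min_ro for o in others) for sv in calls)
    return hits / len(calls)


# Toy data: a small call set that is fully contained in a larger one.
small = [SV("chr1", 100, 200, "DEL")]
large = [
    SV("chr1", 110, 210, "DEL"),
    SV("chr1", 5000, 5600, "DEL"),
    SV("chr2", 300, 400, "DEL"),
    SV("chr3", 10, 90, "DEL"),
]
print(fraction_supported(small, large))  # 1.0
print(fraction_supported(large, small))  # 0.25
```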
- When comparing the combination of three software packages, few of the insertions called by Pindel, DELLY and IMR/DENOM overlapped, and no insertions called by these programs overlapped with those called by BreakDancer.
- Although Pindel, DELLY and IMR/DENOM shared fewer than 10% of deletions with each other, a comparison of Pindel, DELLY and BreakDancer showed that all of the deletions identified by BreakDancer, 66% of those identified by Pindel and 36.27% of those identified by DELLY overlapped.
- … IMR/DENOM, and BreakDancer and IMR/DENOM.
- DELLY, BreakDancer and IMR/DENOM.
- and Pindel, BreakDancer and IMR/DENOM.
- We then annotated the SVs detected by five individual callers, three using NGS data, each based on a different algorithm (Pindel, DELLY and IMR/DENOM), and two using long-read sequencing data (SVIM, which detected more SVs, and Sniffles, which detected higher-confidence SVs), and counted the genes within SVs commonly identified by these callers (Fig. …).
- These 264 genes will be the main targets for future functional studies of the variants between ‘Yali’ and ‘Dangshansuli’ pear.
- For IMR/DENOM and Platypus, the number of SVs increased as sequencing depth increased to 50×.
- For Pindel, BreakDancer, DELLY, Lumpy, Sniffles and SVIM, the number of SVs called clearly increased as the sequencing depth increased.
- Fig. 2 Comparison of the number of genes within SVs identified using NGS-based software and long-read sequencing-based software.
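The summary does not say how the lower-depth data sets were produced. One common approach, shown here only as an assumption, is to keep a random fraction of read pairs equal to target depth / original depth (for example 30/60 = 0.5) and re-run each caller. Hashing the read name, rather than drawing a fresh random number per record, gives both mates of a pair the same decision; `samtools view -s` likewise subsamples whole templates. The helper names and the seed below are hypothetical.

```python
import hashlib


def downsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of read pairs to keep to go from current_depth to target_depth."""
    if not 0 < target_depth <= current_depth:
        raise ValueError("target depth must be positive and no larger than current depth")
    return target_depth / current_depth


def keep_pair(read_name: str, fraction: float, seed: int = 0) -> bool:
    """Keep or drop a read pair; both mates share read_name, so they get the same decision."""
    digest = hashlib.md5(f"{seed}:{read_name}".encode()).digest()
    # Use the top 53 bits of the hash so the float is exact and strictly below 1.
    value = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return value < fraction


frac = downsample_fraction(60, 30)
print(frac)  # 0.5
names = [f"read{i}" for i in range(10_000)]
kept = sum(keep_pair(name, frac) for name in names)
print(kept / len(names))  # close to 0.5
```

Changing `seed` gives a different but equally sized random subset, which makes it possible to repeat each depth level several times.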
- The yellow bars indicate the number of SVs identified by an individual software package and the black bars indicate the number of SVs identified by combinations of software packages.
- The goal of this study was to detect SVs with higher accuracy using the ‘Yali’ resequencing data.
- IMR/DENOM and Platypus are based on assembly.
- The overlapping SVs identified by multiple software packages were more …
- Fig. 3 Numbers of four SV types (insertion (a), inversion (b), deletion (c), duplication (d)) identified by the nine software packages at different sequencing depths.
- In our study, we compared the sensitivities, accuracies and computational resource requirements of seven common software packages using ‘Yali’ pear NGS data and two software packages using ‘Yali’ pear long-read sequencing data to provide guidance on choosing the most appropriate SV-calling program.
- Using SAMtools, the mean sequence insert size of ‘Yali’ was found to be 320 bp.
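A mean insert size such as the 320 bp measured here is the baseline that read-pair (RP) callers like BreakDancer compare against: pairs that map much farther apart than expected hint at a deletion between the mates, and pairs much closer than expected hint at an insertion. The Python sketch below estimates the mean and standard deviation from SAM TLEN values and flags outliers beyond k standard deviations. The cut-off k = 3 and the 30-bp standard deviation in the example are assumptions; real callers also check read orientation and require clusters of supporting pairs before calling an SV.

```python
from __future__ import annotations

import statistics


def insert_size_stats(tlens: list[int]) -> tuple[float, float]:
    """Mean and standard deviation of |TLEN| over pairs with a nonzero template length."""
    sizes = [abs(t) for t in tlens if t != 0]
    return statistics.mean(sizes), statistics.stdev(sizes)


def classify_pair(tlen: int, mean: float, sd: float, k: float = 3.0) -> str:
    """Compare one pair's insert size with mean +/- k * sd."""
    size = abs(tlen)
    if size > mean + k * sd:
        return "larger than expected (deletion-like)"
    if size < mean - k * sd:
        return "smaller than expected (insertion-like)"
    return "concordant"


print(insert_size_stats([300, -340, 320, -320]))  # mean 320, SD about 16.3

MEAN, SD = 320, 30  # 320 bp is from the text; the 30-bp SD is hypothetical
for tlen in (320, 900, -200):
    print(tlen, classify_pair(tlen, MEAN, SD))
```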
- Of the two software packages using long-read sequencing data, SVIM showed higher sensitivity, probably because SVIM collects, clusters and combines SV signatures from read alignments [32].
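SVIM's collect, cluster and combine strategy can be illustrated with a much simplified Python sketch restricted to deletions visible inside single long-read alignments: collect each large `D` operation from the CIGAR strings, cluster signatures whose reference positions lie close together, and combine each cluster into one call with a read-support count. This is not SVIM's actual algorithm, which also uses split alignments, handles more SV types and clusters and scores signatures more carefully; the thresholds and toy alignments are hypothetical.

```python
from __future__ import annotations

import re
from statistics import mean

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")  # CIGAR operations that advance along the reference


def collect(ref_start: int, cigar: str, min_len: int = 50) -> list[tuple[int, int]]:
    """Return (reference position, length) for every deletion of at least min_len bp."""
    signatures = []
    pos = ref_start
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op == "D" and n >= min_len:
            signatures.append((pos, n))
        if op in CONSUMES_REF:
            pos += n
    return signatures


def cluster(signatures: list[tuple[int, int]], max_dist: int = 100) -> list[list[tuple[int, int]]]:
    """Chain signatures (sorted by position) whose positions lie within max_dist of the previous one."""
    clusters: list[list[tuple[int, int]]] = []
    for sig in sorted(signatures):
        if clusters and sig[0] - clusters[-1][-1][0] <= max_dist:
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


def combine(clusters: list[list[tuple[int, int]]], min_support: int = 2) -> list[tuple[int, int, int]]:
    """Merge each well-supported cluster into (mean position, mean length, support)."""
    return [
        (round(mean(p for p, _ in c)), round(mean(n for _, n in c)), len(c))
        for c in clusters
        if len(c) >= min_support
    ]


# Toy alignments: (0-based reference start, CIGAR) for four reads.
alignments = [
    (1000, "500M300D500M"),
    (1010, "490M302D510M"),
    (995, "505M298D495M"),
    (5000, "100M60D100M"),
]
signatures = [s for start, cig in alignments for s in collect(start, cig)]
print(combine(cluster(signatures)))  # [(1500, 300, 3)]
```

The single-read deletion at position 5100 is dropped because it has only one supporting read, which is the intuition behind requiring read support before reporting a call.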
- … selected five software packages for finding overlapping SVs, and finally validated the accuracy of the overlapping SVs.
- In our study, we found that the depth of NGS and long-read sequencing strongly affected the number of SVs called (Fig. …).
- A combination of multiple software packages is recommended for the detection of more types of SVs with higher accuracy.
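The recommendation to combine several packages can be sketched as a simple consensus filter: keep an SV only if at least a chosen number of callers report a matching event. In the Python sketch below, two calls are assumed to match when they share chromosome and type and both breakpoints lie within a tolerance (100 bp here). That rule, the threshold and the toy coordinates are assumptions rather than the criteria used in the study; only the caller names come from the text.

```python
from __future__ import annotations

from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str


def same_event(a: SV, b: SV, tol: int = 100) -> bool:
    """Assumed matching rule: same chromosome and type, both breakpoints within tol bp."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )


def consensus(callsets: dict[str, list[SV]], min_callers: int = 2, tol: int = 100):
    """Return (representative SV, supporting callers) for events seen by >= min_callers callers."""
    kept: list[tuple[SV, list[str]]] = []
    for calls in callsets.values():
        for sv in calls:
            if any(same_event(sv, rep, tol) for rep, _ in kept):
                continue  # already reported through an earlier caller
            supporters = sorted(
                name
                for name, other_calls in callsets.items()
                if any(same_event(sv, o, tol) for o in other_calls)
            )
            if len(supporters) >= min_callers:
                kept.append((sv, supporters))
    return kept


# Toy call sets; only the caller names come from the text.
callsets = {
    "Pindel": [SV("chr1", 1000, 1500, "DEL")],
    "DELLY": [SV("chr1", 1020, 1490, "DEL"), SV("chr2", 10, 900, "INV")],
    "IMR/DENOM": [SV("chr1", 990, 1510, "DEL")],
}
for sv, callers in consensus(callsets):
    print(sv, callers)
```

Raising `min_callers` trades sensitivity for confidence: here the deletion is kept at any threshold up to 3, while the inversion seen only by DELLY is dropped at the default of 2.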
- ‘Yali’ plants were grown in an experimental nursery at …
- FastQC was used to check raw sequencing data in FASTQ format during the first major step of sequence data preprocessing (fastqc -o yali_fastq yali_1.fq yali_2.fq).
- Trimmomatic, a fast, multithreaded command-line tool for trimming paired-end and single-end reads produced by Illumina NGS technology [36], was used to trim reads with the following parameters: java -jar trimmomatic-0.36.jar PE -phred33 -trimlog logfile yali_1.fq yali_2.fq yali.read_1.fq yali.trim.read_1.fq yali.read_2.fq yali.trim.read_2.fq ILLUMINACLIP:/Trimmomatic/adapters/…
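The `-phred33` flag tells Trimmomatic that each quality character encodes a Phred score as its ASCII code minus 33. Trimmomatic's sliding-window step (SLIDINGWINDOW) cuts a read once the average quality inside a window drops below a threshold. The trimming parameters used for ‘Yali’ are truncated above, so the Python sketch below uses a hypothetical 4-base window and a threshold of 15; it is a simplified illustration of the idea, not Trimmomatic's own implementation.

```python
from __future__ import annotations


def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string: score = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4, min_avg: float = 15.0) -> tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality falls below min_avg."""
    scores = phred33_scores(quality)
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window < min_avg:
            return seq[:i], quality[:i]
    return seq, quality  # no failing window (or read shorter than the window)


# Hypothetical read: six Q40 bases ('I') followed by four Q2 bases ('#').
print(phred33_scores("I#"))                            # [40, 2]
print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```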
- The paired-end sequencing reads of ‘Yali’ were aligned to the ‘Dangshansuli’ reference genome using the ‘align’ step, and the two alignments were then paired (bwa aln -t 20 dangshansuli.fasta yali.read_1.fq > yali.read_1.sai; bwa aln -t 20 dangshansuli.fasta yali.read_2.fq > yali.read_2.sai; bwa sampe dangshansuli.fasta yali.read_1.sai yali.read_2.sai yali.read_1.fq yali.read_2.fq > yali.sam).
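A mapping rate such as the 97.15% reported for the clean reads can be checked from the SAM file produced by `bwa sampe`: count each read once by skipping secondary and supplementary records (flags 0x100 and 0x800), and treat a read as mapped when flag 0x4 is unset. `samtools flagstat` is the usual tool for this; the Python sketch below only illustrates the flag logic on hypothetical records.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
UNMAPPED = 0x4


def mapping_rate(sam_lines) -> float:
    """Fraction of reads (primary records only) whose UNMAPPED flag is not set."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blanks
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only once
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical records: one mapped pair (99/147), one unmapped pair (77/141)
# and one supplementary alignment (2145) that must not be counted.
records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t100M\t=\t300\t300\t*\t*",
    "r1\t147\tchr1\t300\t60\t100M\t=\t100\t-300\t*\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r1\t2145\tchr5\t900\t60\t50M\t=\t100\t0\t*\t*",
]
print(mapping_rate(records))  # 0.5
```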
- NGMLR was used to map long reads to the ‘Dangshansuli’ reference genome (ngmlr -t 50 -r dangshansuli.fasta -q yali_pacbio.fastq -o yali.sam).
- Description of the four types of SV software.
- Seven types of SV-calling software using NGS data and two types of SV-calling software using long-read sequencing data, each based on different algorithms, were used to detect the SVs between ‘Yali’ and …
- The first step is ‘bam2pindel.pl’, which extracts read pairs for use by Pindel (bam2pindel_bwa.pl -i yalisortrmdup.bam -o output_prefix -s yali -om).
- BreakDancer-max was able to predict five types of SVs from ‘Yali’ sequencing data: insertions, deletions, inversions, and inter- and intra-chromosomal translocations.
- IMR/DENOM [34].
- The input files consisted of the indexed reference fasta file and the ‘Yali’ sorted BAM file.
- The generic option ‘-t’ can be changed to detect other types of SVs.
- Before detecting SVs, the BAM files *.bam, *.splitters.bam and *.discordants.bam were obtained using the BWA-MEM alignment in speedseq [48] (speedseq align -R “@RG\tID:id\tSM:sample\tLB:lib” dangshansuli.fasta yali.read_1.fq yali.read_2.fq).
- The output contained important information such as the SV type, the chromosome ID and the SV length (lumpyexpress -B yali.bam -S yali.splitters.bam -D yali.discordants.bam -o yali.output).
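Counting calls by type and length, as in the breakdown of SVs by type and length reported above, amounts to reading the SVTYPE and SVLEN keys from the INFO column of each caller's VCF. The Python sketch below assumes a VCF that uses these standard keys (LUMPY's VCF output typically does; other callers may differ), treats breakends (BND) as having no length, and uses hypothetical length bins.

```python
from __future__ import annotations

import bisect
from collections import Counter

# Hypothetical length-bin boundaries (bp) and their labels.
BIN_EDGES = [100, 1_000, 10_000]
BIN_LABELS = ["<100 bp", "100-999 bp", "1000-9999 bp", ">=10000 bp"]


def parse_info(info: str) -> dict[str, str | bool]:
    """Split a VCF INFO column into key/value pairs; flags without '=' map to True."""
    fields: dict[str, str | bool] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def tally_svs(vcf_lines) -> Counter:
    """Count calls per (SVTYPE, length bin) using the SVTYPE and SVLEN INFO keys."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        info = parse_info(line.rstrip("\n").split("\t")[7])  # INFO is the 8th column
        svtype = info.get("SVTYPE", "unknown")
        svlen = info.get("SVLEN")
        if svtype == "BND" or not isinstance(svlen, str):
            counts[(svtype, "no length")] += 1
            continue
        length = abs(int(svlen.split(",")[0]))  # deletions usually carry a negative SVLEN
        counts[(svtype, BIN_LABELS[bisect.bisect_right(BIN_EDGES, length)])] += 1
    return counts


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;SVLEN=-80;END=1080",
    "chr1\t5000\t2\tN\t<DUP>\t.\t.\tSVTYPE=DUP;SVLEN=1500;END=6500;IMPRECISE",
    "chr2\t100\t3\tN\tN]chr3:500]\t.\t.\tSVTYPE=BND",
]
for key, n in sorted(tally_svs(vcf).items()):
    print(key, n)
```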
- lumpy_vcf yali.vcf --bam yali.bam --outdir …
- Default parameters were used (sniffles -m yalisortrmdup.bam -v yali.vcf).
- Validation of SVs detected using NGS data and long-read sequencing data.
- … reference genome sequence, and the BAM file of ‘Yali’ was loaded to visually confirm the presence of the identified deletions and insertions.
- GO analysis of 264 genes within SVs commonly identified by five SV callers.
- KEGG analysis of 264 genes within SVs commonly identified by five SV callers.
- The number of genes within SVs detected by software packages using NGS data (a) and long-read sequencing data (b).
- GO analysis of 403 genes within SVs commonly identified by three SV callers using NGS data.
- KEGG analysis of 403 genes within SVs commonly identified by three SV callers using NGS data.
- GO analysis of 4495 genes within SVs commonly identified by two SV callers using long-read sequencing data.
- KEGG analysis of 4495 genes within SVs commonly identified by two SV callers using long-read sequencing data.
- Verification of SVs in ‘Yali’ through comparisons with the ‘Dangshansuli’ reference genome.
- We gratefully thank the Changli Fruit Research Institute, Hebei Academy of Agricultural Sciences, China for providing the ‘Yali’ pear as experimental material.
- Funding from the National Science Foundation of China (31725024 and 31672111) supported the next-generation sequencing of ‘Yali’ pear, and funding from the Earmarked Fund for the China Agriculture Research System (CARS-28) supported the long-read sequencing of ‘Yali’ pear.
- DNA extraction and library preparation for ‘Yali’ were performed using funding from the “333 High Level Talents Project” of Jiangsu Province (BRA2016367).
- The ‘Yali’ pear plant samples were obtained from the Changli Fruit Research Institute, Hebei Academy of Agricultural Sciences, China.
- The genome of the pear (Pyrus bretschneideri Rehd.).
- DBG2OLC: efficient assembly of large genomes using Long erroneous reads of the third generation sequencing technologies
