
A clinically validated whole genome pipeline for structural variant detection and analysis


Summary

- WGS provides unique opportunities for detection of structural variants.
- Results: We have developed a clinically validated pipeline for highly specific and sensitive detection of structural variants based on 30X PCR-free WGS.
- Using a combination of breakpoint analysis of split and discordant reads, and read depth analysis, the pipeline identifies structural variants down to single base pair resolution.
- Compound and potential compound combinations of structural variants and small sequence changes are automatically detected.
- Analytical and clinical sensitivity and specificity of the pipeline have been validated using analysis of Genome in a Bottle reference genomes and known positive samples confirmed by orthogonal sequencing technologies.
- Conclusion: Consistent read depth of PCR-free WGS enables reliable detection of structural variants of any size.
- Keywords: Whole genome sequencing, Structural variants, Clinical validation, Pipeline, Diagnostic console, WGS, CNV, Deletion, Duplication, Break point.
- 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
- Full list of author information is available at the end of the article.
- Short-read-based Whole Genome Sequencing (WGS) is slowly but surely becoming an integral part of the landscape of clinical diagnostic testing for rare genetic disorders.
- This approach, however, ignores many of the advantages of WGS which provides unique opportunities for detection of structural variants (SVs), pathologic short tandem repeats and mitochondrial variants, which otherwise require separate assays.
- SVs are a diverse group of variants which consists of copy number variants (CNVs), namely duplications or deletions of human genetic sequences resulting in an abnormal number of alleles.
- While detection of small sequence changes has become fairly standardized using “gold standard” tools such as BWA [5] and GATK [6] which are almost universally used for sequence alignment and variant calling, the situation for SV detection is quite different.
- The SV component of the Variantyx Genomic Intelligence pipeline uses a combination of breakpoint analysis (using split and discordant reads) and read depth analysis to identify structural variants, often down to single base pair resolution.
- While some of the tools are used as published, others are significantly.
- In addition, the results of the raw variant calls have been augmented and filtered using in-house developed annotations, techniques, and data sources.
- In general, structural variants can be divided into two categories: those resulting in unbalanced changes in number of copies of human DNA, and those resulting in balanced changes so the total number of copies remains the same.
- depth-based analysis and break point analysis [13].
- The first is to identify regions in which read depth is significantly different from the typical depth in the same region in samples which are known not to have copy number variation in this region.
- Only break point analysis can be used to identify balanced SVs, including inversions and translocations, as well as insertions of foreign DNA such as transposable elements.
- Both read depth and break point signals are utilized by Variantyx Genomic Intelligence algorithms, while results on larger break point derived variants must be confirmed by depth signal to be considered a true positive.
- Structural variants are called with the use of Samblaster [11] for read extraction, LUMPY [14] for read-based SV calling and SVtyper [11] for genotyping, using default parameters.
- The rolling average read depth-based model rolls up 100 bp segments into buckets of 10,000 and 2500 bp.
- We found these sizes optimal: the 10,000 bp bucket allows detection of uninterrupted stretches of read depth deviation in larger CNVs, while the 2500 bp bucket allows detection of smaller CNVs and refines the exact position of larger ones.
- Break point analysis allows detection of smaller SVs and all types of balanced variants.
- The most common SVs, deletions and tandem duplications, have a single break point, which appears in the sample reads as the unexpected juxtaposition of two non-contiguous reference coordinates marking the start and end of the structural event.
- Other events such as insertion of DNA naturally have two break points, one at the start and one at the end of the inserted fragment.
- However, in such cases, one of the breakpoints may not be detected, because the number of split or discordant.
- Translocations of chromosome arms also have one break point but are hard to distinguish from an insertion with an undetected second break point.
- While exact quantitative benchmarking of the variant calling results of the Variantyx pipeline relative to other SV calling tools and algorithms has not been performed, some comparison can be made.
- We use a machine learning algorithm to detect CNVs based on a large number of human genomes sequenced under the same standard operating procedures, resulting in highly repeatable normalized read depth.
- Raw output of the SV calling pipeline includes a significant number of false positives that must be removed prior to introduction to the Diagnostic Console.
- Many of these false positive calls can be filtered out based on a number of criteria specific to variant type.
- In particular, all variants called based on break point analysis must be supported by at least 20 observations (combined split and discordant reads), out of which 5 must be split reads.
- In addition, CNVs over 5000 bp long called using break point analysis must have at least 30% overlap with those called using read depth analysis.
- This rule is based on the fact that in most regions of the genome (with notable exceptions such as sex chromosomes) the frequency of SNVs is higher, and if at least one allele is present the threshold will be exceeded.
- Unfortunately, typically it is not the case (particularly in large deletions) since some of the reads are still getting aligned within such.
- When two alleles of DNA are present, the number of heterozygous SNVs typically significantly exceeds the number of alternative SNVs.
- In typical true heterozygous deletions long enough to include a statistically significant number of SNVs, the observed ratio does not exceed 10%.
- duplications must represent no more than 20% of total number of duplications.
- A heterozygous SNV is defined as one in which the fraction of reads supporting each of the two alleles is between 40 and 60% (See Fig.
- Applying the filtration removes most false positive calls at the cost of losing a relatively small number of true positive variants (it removes 38 TP and 297 FP in the analyzed buckets).
- The most significant impact is on the largest bucket, where it removes all 84 false positives and keeps all 5 true positive SVs.
- This filter also removed 4 true positive variants from the bucket.
- It is very well known and represents the industry standard in clinical genetics of small sequence changes; however, despite including over 20,000 curated pathogenic SVs, it is currently not widely used in SV annotation.
- of genomic coordinates of the SVs included in HGMD Professional, making the data not readily available for annotation.
- It often happens that no SVs similar to the one detected have been previously reported in peer-reviewed literature, yet the SV intersects gene(s) or a region with known pathogenic small sequence changes.
- Recessive structural variants can be compound to small sequence changes, and detection of such combinatory compound heterozygous pairs is often challenging.
- Such compound (in the case of family analysis, when paternal and maternal alleles are identifiable) and potential compound (when one of the variants is de novo or the patient is tested as a singleton) pairs are presented in a dedicated section of the Diagnostic Console, along with compound pairs of small sequence variants.
- No changes in technical parameters such as number of split reads or depth call overlap are recommended as part of Unity test structural variants Diagnostic Process (Additional file 1:.
- Analytical validation includes comparison of variants called by the assay with a known true positive set of variants to determine sensitivity, specificity and positive predictive value.
- Indeed, in case of small sequence changes available true positive variant sets can be used for accurate benchmarking.
- See Additional file 1: Figure S2 for analytical validation statistics of the small sequence changes component of the Variantyx Unity test.
- Unfortunately, no true positive variant set of acceptable quality is available for analytical validation of SVs.
- Close examination of a representative group of “true positive” SVs called by different approaches revealed a large number of false positives and false negatives, making use of this data unacceptable in analytical validation of a clinical test.
- Thus, we decided to directly pursue clinical validation of the Unity test.
- To perform clinical validation, we gathered a statistically significant number of true positive clinical samples (those having a causative pathogenic SV confirmed by orthogonal detection techniques) and true negative clinical samples (those of healthy individuals, or of affected individuals whose causative genetic variants are of types other than SV).
- The majority of the true positive samples were obtained from public collections, while some originated from different sources [25].
- A total of 60 clinical validation cases underwent the complete Unity test cycle, starting from de novo WGS sequencing all the way to clinical interpretation by board certified clinical geneticists and generation of a patient report.
- Due to the large number of detected pathologic genetic variants, it was impossible to pass off the synthetic sample as real patient data.
- Out of 60 clinical validation patients, 17 were true positive for pathogenic SV; in some cases multiple SVs were present, and in others the SV was a compound heterozygote with recessive small sequence changes.
- In general, a significant number of true positive samples found in public repositories belong to cases diagnosed nearly two decades ago by rather narrow, by today’s standards,.
- Fig. 3 Structural variant annotation on variant and gene level as presented in Diagnostic Console.
- Across all patient and synthetic samples which underwent clinical interpretation there were a total of 25 SVs, all of which were detected by the Variantyx Genomic Intelligence platform and 24 of which were clinically reported, resulting in 96% clinical sensitivity for detection of pathogenic SVs.
- It is important to note that true positive samples that include SVs beyond the scope of the Variantyx Unity test, such as balanced translocations, were not included in clinical validation.
- The uniformity and consistent read depth of PCR-free WGS allow reliable detection of SVs and clinical utilization of the SV workflow as part of comprehensive WGS-based genetic testing that could be used as the first-line diagnostic test.
- A synthetic DNA sample for Unity test validation was purchased from SeraCare (Seraseq Inherited Cancer DNA Mix v1).
- Additional file 1: Table S1 Samples for clinical validation of Variantyx Unity test.
- Table S2 Variantyx Unity test thresholds.
- Figure S1 Causative heterozygous deletion of 45 bp detected and reported by Variantyx Unity test.
- Figure S2 Analytical validation statistics of small sequence changes by Variantyx Unity test based on a combination of 3 different Genome in a Bottle samples.
- Method S1 Variantyx diagnostic procedure for reporting pathogenic structural variants.
- Fig. 4 Structural variant filtration in Diagnostic Console.
- SV: Structural Variant.
- TP: True Positive.
- WGS: Whole Genome Sequencing.
- The authors thank Haim Neerman for participation in multiple fruitful discussions of the pipeline and clinical validation, Rohan Jyoti for designing the user interface of the Diagnostic Console and Grace Hawk for editing and proofreading the manuscript.
- The full contents of the supplement are available online at https://.
- NM, SM, LK and TF performed clinical validation of the pipeline.
- Jacobs PA, Browne C, Gregson N, et al.
- Estimates of the frequency of chromosome abnormalities detectable in unselected newborns using moderate levels of banding.
- Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications.
- Trost B, Walker S, Wang Z, et al.
- A comprehensive workflow for read depth- based identification of copy-number variation from whole-genome sequence data.
- doi: https://doi.org/10..
- Chiang C, et al.
- Detection of genomic structural variants from next-generation sequencing data.
- Layer RM, et al.
- Li Y, et al.
- Boeva V, et al.
- https://omim.org/.
- Zook JM, et al.
- Genome in a Bottle.
- doi: https://doi.org .
- Chaisson MJP, et al.
- doi: https://doi.org/.
- Alka C, et al.
- Falik Zaccai TC et al
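
The read depth roll-up described in the summary (100 bp segments averaged into 2500 bp and 10,000 bp buckets) might look roughly like the sketch below. This is not the Variantyx implementation: the function name `roll_up`, the use of non-overlapping buckets, and the toy depth values are all assumptions made for illustration.

```python
from statistics import mean

SEGMENT_BP = 100  # segment size stated in the summary


def roll_up(segment_depths, bucket_bp):
    """Average consecutive 100 bp segment depths into buckets of bucket_bp.

    Assumption: buckets are non-overlapping; the summary does not say
    whether the real model uses overlapping (sliding) windows.
    """
    if bucket_bp % SEGMENT_BP != 0:
        raise ValueError("bucket size must be a multiple of the segment size")
    per_bucket = bucket_bp // SEGMENT_BP
    return [
        mean(segment_depths[i:i + per_bucket])
        for i in range(0, len(segment_depths), per_bucket)
    ]


# Toy 10,000 bp region at 30X with a 2500 bp stretch at half depth,
# as a heterozygous deletion might appear (values are invented).
depths = [30] * 50 + [15] * 25 + [30] * 25

small_buckets = roll_up(depths, 2500)    # the half-depth bucket stands out
large_buckets = roll_up(depths, 10000)   # the same signal is diluted
```

In this toy region the 2500 bp buckets isolate the half-depth stretch, while the single 10,000 bp bucket averages it away, which is consistent with the summary's remark that the smaller bucket helps with smaller CNVs.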
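
The false positive filters quoted in the summary (at least 20 supporting observations of which at least 5 are split reads, and at least 30% overlap with a read depth call for break point CNVs over 5000 bp) can be expressed as a small predicate. The sketch below is illustrative only: the `BreakpointCall` record, the half-open coordinate convention, and the choice to measure overlap relative to the break point call are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass

MIN_SUPPORT = 20          # combined split + discordant reads (from the summary)
MIN_SPLIT = 5             # minimum split reads (from the summary)
LARGE_CNV_BP = 5000       # longer break point CNVs need depth confirmation
MIN_DEPTH_OVERLAP = 0.30  # required overlap with read depth calls


@dataclass
class BreakpointCall:     # hypothetical record, for illustration only
    chrom: str
    start: int            # assumed 0-based, inclusive
    end: int              # assumed exclusive
    sv_type: str          # e.g. "DEL", "DUP", "INV"
    split_reads: int
    discordant_reads: int


def overlap_fraction(call, depth_intervals):
    """Fraction of the call covered by read depth CNV intervals.

    Assumes depth_intervals are (chrom, start, end) tuples that do not
    overlap one another.
    """
    covered = 0
    for chrom, start, end in depth_intervals:
        if chrom == call.chrom:
            covered += max(0, min(call.end, end) - max(call.start, start))
    return covered / (call.end - call.start)


def passes_filters(call, depth_intervals):
    """Apply the read support and depth overlap rules to one call."""
    support = call.split_reads + call.discordant_reads
    if support < MIN_SUPPORT or call.split_reads < MIN_SPLIT:
        return False
    is_cnv = call.sv_type in {"DEL", "DUP"}
    if is_cnv and call.end - call.start > LARGE_CNV_BP:
        return overlap_fraction(call, depth_intervals) >= MIN_DEPTH_OVERLAP
    return True


large_del = BreakpointCall("chr1", 10_000, 20_000, "DEL", 8, 15)
small_del = BreakpointCall("chr1", 1_000, 1_045, "DEL", 6, 14)
weak_del = BreakpointCall("chr1", 1_000, 1_045, "DEL", 4, 30)
```

For example, `passes_filters(large_del, [("chr1", 12_000, 16_000)])` accepts the call because 40% of it is covered by a depth interval, whereas the same call with no overlapping depth interval is rejected.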
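
The summary defines a heterozygous SNV as one with 40 to 60% of reads supporting each allele, and states that in true heterozygous deletions the observed ratio does not exceed 10%. The surrounding text is truncated, so the sketch below rests on an assumption: that the ratio in question is the share of heterozygous SNVs among all SNVs inside the candidate deletion. The function names and the `(ref_reads, alt_reads)` input format are hypothetical.

```python
def is_heterozygous(ref_reads, alt_reads):
    """True if each allele is supported by 40-60% of the reads."""
    total = ref_reads + alt_reads
    if total == 0:
        return False
    # Integer arithmetic avoids floating point edge cases at the bounds.
    return 40 * total <= 100 * alt_reads <= 60 * total


def heterozygous_ratio(snvs):
    """Share of heterozygous SNVs among (ref_reads, alt_reads) pairs."""
    if not snvs:
        return None
    het = sum(1 for ref, alt in snvs if is_heterozygous(ref, alt))
    return het / len(snvs)


def consistent_with_het_deletion(snvs, max_ratio=0.10):
    """Assumed reading of the 10% rule: few heterozygous SNVs in the region."""
    ratio = heterozygous_ratio(snvs)
    return ratio is not None and ratio <= max_ratio


# Invented read counts: 19 SNVs supported only by alternative reads, 1 balanced.
deleted_region = [(0, 30)] * 19 + [(15, 15)]
# Invented read counts: half of the SNVs look heterozygous.
diploid_region = [(15, 15)] * 5 + [(0, 30)] * 5
```

With these invented counts, `deleted_region` has a heterozygous ratio of 5% and is consistent with a heterozygous deletion under the assumed rule, while `diploid_region` at 50% is not.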
