
NanotatoR: A tool for enhanced annotation of genomic structural variants


Summary

- nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release.
- Conclusions: The extensive annotation enables users to rapidly identify potentially pathogenic SVs, a critical step toward the use of OGM in the clinical setting.
- The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
- If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
- With the advent of the high-throughput short-read sequencing (SRS) techniques, identification of molecular underpinnings of genetic disorders has become faster, more accurate and cost-effective [1].
- a change or variation of a single bp in the genome) or small insertions and deletions (INDELs.
- Importantly, OGM has allowed refinement of intractable, low-complexity regions of the genome and discovery of genomic content missing in the reference genome assembly [19].
- The nanotatoR pipeline takes as input Bionano-annotated SV files, in the format of either unmodified SMAP (BNG’s SVcaller output) or text (TXT) files that retain information from the SMAPs, but also append additional fields.
- For trio analysis (proband, mother, father), BN_VAP performs molecule checks for SVs identified in the proband (self-molecules) and checks whether the SV is present in the parents’.
- Finally, all of the sub-functions are compiled into a Main function.
- In order for the SVs to be considered “same”, nanotatoR, by default, checks whether two independent variants of the same type (e.g.
- Currently, the 50% size similarity cutoff is not implemented by default for inversions and translocations, as sizes have only recently been provided in the SVcaller output.
- If the query SV matches with multiple variants in the BNDB from the same BNDB sample, nanotatoR counts these as a single variant/sample, with an allele count of 2 for homozygous/unknown and 1 for heterozygous matches.
- 1.1d Frequency calculations: for DECIPHER and DGV, SV frequency is calculated by dividing the number of query-matched database variants (step 1.1a) by the total number of alleles in the database, i.e.
- which is stored in the form of a text file.
- SVs overlapping gaps in hg19/hg38 are annotated in the output SMAPs as “nbase” calls (e.g.
- For duplications, inversions, and translocations, nanotatoR evaluates whether chimeric scores “pass” the thresholds set by the Bionano SVcaller during de novo genome assembly [34].
- SVs with the same family ID as the query are not included in the overall internal frequency calculation as described in the previous paragraph.
- “Internal_Homozygotes” are appended to each of the annotated input files (non-relevant fields contain dashes).
- A function to encode more complex intra-family relatedness and identify individual samples, termed nanoID, is in development and will be available by default in the next release of nanotatoR.
- Expression values for overlapping and non-overlapping SV genes are extracted from the genome-wide expression matrix and appended into separate columns in the overall SV input file.
- For example, an overlap of gene X with an SV in the proband, with an expression value of 10, would be represented as gene X (10), and printed in the “OverlapProbandEXP” column.
- The number of columns appended to the output depends on which functions were run.
- inversions or translocations) and integration of the primary gene list (either provided by the user or generated by nanotatoR as described in section 4) into the input SV-containing file.
- and near SVs that are present in the primary gene list are printed in separate columns.
- indel_dup_denovo if not present in the parents (Found_in_parents_BSPQI_molecules/.
- indel_dup_both if present in the proband, as well as both parents (e.g.
- indel_dup_cmpdHET if present in the heterozygous state in parents.
- 2 variants with genomic coordinates overlapping the same gene, one present in the mother, the other in the father.
- A visual representation of the steps is shown in Fig.
- where it was accepted in the April 2019 cycle.
- The output is in the form of an Excel workbook subdivided into variant types and inheritance modes in familial cases.
- The user can either filter the data based on input parameters or perform the filtration steps in the final Excel sheets.
- Theoretical examples of the nanotatoR annotation process and output are illustrated in Fig.
- Fig. 1 Workflow of the nanotatoR pipeline: The nanotatoR pipeline is divided into three layers.
- Bionano SVcaller identified a total of 9387 SVs in the proband (NA24385), shown in the last tab of the 9-tab report, termed “all”.
- The vast majority (8680) were in the “indel_dup” tabs.
- 114 (out of 279 unfiltered) inversions were reported in the “inv” tab, and 0 (out of 84) translocations in the “trans” tab.
- All the translocations called by the Bionano SVcaller in this sample were in the categories “trans_interchr_.
- Out of the 952 insertions, deletions and duplications, only 8 were de novo.
- 68 in the mother only.
- and 82 in the father only.
- These numbers can be used to evaluate pathogenicity of the SVs.
- 150 SVs would be reported in the “indel_dup_.
- Fig. 2 nanotatoR annotates genes overlapping or near an SV: (a) The cartoon shows three hypothetical scenarios: one deletion in the region upstream of Gene X (yellow) which may contain regulatory regions, indicated as solid purple in the reference genome (top) and lilac in the patient’s genome (bottom).
- However, analysis of the genomes with OGM had revealed a deletion variant affecting the UGT2B17 gene in the son and the mother [41].
- 169 SVs overlapped the primary gene list and were shown in the “all_PG_OV” tab.
- To investigate the frequency of the variant in the 8-sample internal cohort database, we first selected all variants within −10 kb of the start breakpoint and +10 kb of the end breakpoint (i.e.
- Fig. 3 Filtration and annotation of SV distribution in the NA24385 trio dataset: Out of the total 9387 variants found by SVcaller, 8804 passed the nanotatoR filtration of “Present in self molecules” and “Pass chimeric score” conditions.
- This annotation can be used to evaluate relevance of the variants to the condition studied.
- shown in the “GM24385_Variant_UGT2B17” tab in Table S3) and the other two are found in his mother.
- Bionano SVcaller also called another variant in the mother with the same end breakpoint, and a similar, but not identical, start breakpoint.
- As described in Methods section 1.2b, nanotatoR selects the variants that pass size similarity and breakpoint criteria, and reports the zygosity for each in the parents (“GM24385_del_example_Zygosity” tab in Table S3).
- In addition to the family, the variant was found in two other samples of the internal cohort.
- As the internal control cohort was composed of 8 samples, of which 3 were part of the Ashkenazi family, the total number of alleles in the internal cohort was calculated as 10 = 2 × (8 − 3), where 2 is for diploid genomes, 8 is the total number of samples in the cohort and 3 is the number of related individuals.
- Next, we calculated the filtered and unfiltered frequency (Formula 2, Function 1.1d) of the deletion overlapping UGT2B17 in the Bionano reference database.
- To generate the primary gene list for the trio sample, we downloaded the ClinVar and GTR databases, using the downloadClinvar = TRUE and downloadGTR = TRUE parameters in the gene_list_generation function.
- The time taken for each of the functions is reported in Supplementary Table S5.
- Further breakdown of the indel_.
- For dual-enzyme labeling, fewer variants were called in the SVmerge dataset (6814), of which ~9% were filtered out by nanotatoR default filtration.
- Of the remaining are “indel_.
- Breakdown of the indel SVs between deletions, insertions and duplications is shown in Fig.
- Proportions of insertions and deletions are similar in the two data sets, while dual-enzyme labeling called more duplications and inversions than single-enzyme labeling in this example.
- The 4 deletion variants identified in the study overlapped GSTM1, LCE3B, LCE3C, CR1 and SIGLEC14 genes.
- SV breakpoints reported in the original publication and in the nanotatoR-annotated data sets are shown in Supplementary Table S8.
- SV type and gene names are highlighted in the all_PG_.
- Example III: Duchenne muscular dystrophy cohort. We previously published a validation of OGM technology for identifying variants in the DMD gene in a cohort of patients with Duchenne muscular dystrophy [27].
- Each of these types of variants was placed in the correct final Excel output tab with.
- (b) SV distribution in the NA12878 DLE and SVmerge filtered datasets: the numbers of deletions (dark blue), insertions (light blue), duplications (grey) and inversions (orange) are shown in the pie charts.
- For DLE, the majority of the identified SVs were insertions (64.9.
- While the total number of variants called is different between DLE and SVmerge, a similar pattern is seen in the SVmerge dataset.
- Many more duplications and inversions were called with dual labeling than with the single DLE labeling method.
- As a result of the erroneous input, internal frequency is currently overestimated for variants on the X chromosome.
- Using nanotatoR, we were able to automate steps that previously had to be taken manually to identify the pathogenic SVs in the DMD gene in the data sets:
- The details of the variants can be found in Barseghyan et al.
- Deletion variant on chromosome 4 identified in sample NA24385: Cartoon of the chromosome 4 region deleted in the NA24385 genome (a) and screenshot of the matching UCSC genome browser output (b).
- Content of the various tabs is detailed in the first tab of the Excel workbook.
- SVs reported in the indel, inv, and trans tabs have undergone the default nanotatoR filtration (found in self molecules, passed chimeric score threshold).
- Content of the tabs and column names is detailed in the first tab.
- The columns used to calculate the filtered and unfiltered frequencies are highlighted in the del_totalData tab.
- The columns used to calculate the filtered and unfiltered frequencies are highlighted in the GM24385_data_all tab.
- nanotatoR annotation of structural variants in the DLE-labeled sample NA12878 dataset.
- of the Excel book.
- SVs reported in the indel, inv, and trans tabs have undergone the default nanotatoR filtration (found in self molecules, passed chimeric score threshold).
- In the all_PG_OV tab, the cells containing the query genes are highlighted in purple.
- nanotatoR annotation of structural variants in the dual-labeled sample NA12878 dataset (SVmerge output).
- In the all_PG_OV tab, the cells containing the query genes are highlighted in purple.
- Overlapping genes coordinates identified in the original paper by Mak et al.
- Singleton samples CDMD_1003 and CDMD_1159 were analyzed with single DLE labeling and are shown in the SingleLabel_Solo tab.
- Singleton samples CDMD_1155, CDMD_1156, and CDMD_1187 were analyzed with dual enzyme labeling and are shown in the.
- Mother-proband dyads CDMD_1131, CDMD_1157, and CDMD_1163 were analyzed with dual enzyme labeling and are shown in the DualLabelDuo tab.
- Columns in each sheet of the workbook are either a direct output of SVcaller or appended by nanotatoR, as indicated in column 3 of the table in the first tab.
- Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy.
- Long-read single-molecule maps of the functional methylome.
- The database of genomic variants: a curated collection of structural variation in the human genome
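
The allele-counting rule and the internal cohort arithmetic in the summary above can be sketched in Python as follows: one variant per sample, 2 alleles for homozygous or unknown matches and 1 for heterozygous matches, and 10 = 2 × (8 − 3) alleles for the 8-sample cohort. This is a minimal illustration, not nanotatoR's own code (nanotatoR is an R package). The function names, the (sample_id, zygosity) tuple layout and the sample data are hypothetical. Taking the larger allele value when one sample has several matches with differing zygosity, and returning the frequency as a fraction rather than a percentage, are assumptions that the summary does not spell out.

```python
# Hypothetical sketch of the counting rules described in the summary.
# None of these names come from the nanotatoR package itself.

# Allele contribution per matched sample, as stated in the summary:
# 2 for homozygous/unknown matches, 1 for heterozygous matches.
ALLELES_BY_ZYGOSITY = {"homozygous": 2, "unknown": 2, "heterozygous": 1}


def cohort_allele_total(n_samples, n_related, ploidy=2):
    """Total alleles in the cohort after excluding related individuals."""
    if n_related > n_samples:
        raise ValueError("n_related cannot exceed n_samples")
    return ploidy * (n_samples - n_related)


def matched_allele_count(matches):
    """Sum allele counts over samples, counting each sample only once.

    `matches` is an iterable of (sample_id, zygosity) pairs. If a sample
    matches several times, the largest allele value is kept (assumption).
    """
    per_sample = {}
    for sample_id, zygosity in matches:
        alleles = ALLELES_BY_ZYGOSITY[zygosity]
        per_sample[sample_id] = max(per_sample.get(sample_id, 0), alleles)
    return sum(per_sample.values())


def sv_frequency(matched_alleles, total_alleles):
    """Frequency as a fraction: matched alleles / total alleles."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return matched_alleles / total_alleles


# Cohort from the summary: 8 samples, 3 of them related family members.
total = cohort_allele_total(n_samples=8, n_related=3)

# Made-up matches: sample S1 matches twice but is counted once.
example_matches = [
    ("S1", "heterozygous"),
    ("S1", "heterozygous"),
    ("S2", "homozygous"),
]
matched = matched_allele_count(example_matches)
print(total, matched, sv_frequency(matched, total))
```

Keying the count on the sample identifier keeps a sample that matches several database variants from inflating the numerator, which is the behaviour the summary describes for BNDB matches.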
