
Survival marker genes of colorectal cancer derived from consistent transcriptomic profiling



- The stability and robustness of the gene survival markers were assessed by cross-validation, and the best-ranked genes were also validated with two external independent cohorts, one of them a microarray cohort with 482 samples.
- Up-regulation of the top genes was also proved in a comparison with normal colorectal tissue samples.
- This risk predictor yielded an optimal separation of the individual patients of the cohort according to their survival, with a p-value of 8.25e-14 and Hazard Ratio 2.14 (95% CI .
- 1 Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Cientificas (CSIC) and University of Salamanca (USAL), Salamanca, Spain Full list of author information is available at the end of the article.
- 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
- Colorectal cancer (CRC) is one of the most frequent tumors and causes great morbidity worldwide.
- Furthermore, the specific purpose of our work was to find consistent biomolecular targets that, in addition to facilitating sample stratification, could be related to the prognosis of the disease using survival data.
- Finally, after internal and external cross-validation, the genes selected as best survival markers were used to construct a risk predictor to allow stratification of the patients with respect to their relative risk.
- demonstrate a good integration of the global transcriptomic profiles of different sample sets, avoiding the typical batch effects that can alter any unified analysis.
- The phenotypic and clinical information about the final collection of 1273 samples, i.e., the available data about age, gender, survival time, location of the tumor, degree and TNM staging, presence of mutation in some cancer genes (TP53, KRAS, BRAF), etc.
- We performed the integration and combined normalization of the CRC expression datasets using 5 different procedures.
- The procedures applied different normalization algorithms to provide a homogeneous signal matrix, avoiding bias due to batch effect on the global expression profile of the CRC samples.
- (v) fRMA plus scaling of the data using mean-centered expression values.
- A group of 79 samples were discarded because they did not have survival data or they presented anomalous data distributions with respect to the other samples of the same series.
- Figure 1 presents the heatmaps derived from an unsupervised clustering of the samples using in each case the expression data matrix derived from each one of the 5 procedures applied.
- In this way, each heatmap is composed of 210 samples, with samples taken from each one of the 7 datasets (identified by the ID number, GSE, from GEO).
- mix of the overall expression signal coming from different datasets.
- This approach can reveal major effects associated with the global expression signal of the samples, but it is not very sensitive for detecting minor changes in a small number of genes.
- For this reason we applied a second approach to compare the results provided by the 5 normalization procedures in order to select the one that produces the best unification of the 7 CRC datasets, preserving a good signal-to-noise ratio in the expression distributions.
- Fig. 1 Symmetric heatmaps representing the similarity between the overall gene expression signal of the samples compared with each other.
- from each one of the 7 GSE datasets).
- (e) fRMA plus scaling of the data using mean-centered expression values.
- By contrast, the analysis of the data provided by the other 3 procedures (RMA plus Combat, fRMA plus Combat and fRMA plus mean-centered scaling, Fig..
- significant effect attributed to belonging to one of the series.
- Once we had produced a large and well-integrated meta-dataset of CRC samples, with global expression profiles and clinical survival data for all cases, we proceeded to identify the subset of genes that undergo significant changes with colorectal tumor progression.
- 0.05) in either direction (i.e., genes up-regulated with the progression of the disease, in late versus early CRC stages; or genes down-regulated with the progression of the disease).
- Fig. 2 Plots presenting the distribution of the 1273 samples from 7 datasets (GSEs) obtained by Principal Component Analysis (PCA) of the global gene expression profile of each sample.
- Each plot presents the values of the two main dimensions (dim 1 versus dim 2) and corresponds to the PCA results obtained using the expression data calculated with different preprocessing and normalization methods.
- Table 2 Results of the linear regression analyses on the global expression matrix calculated for the 1273 samples from 7 datasets (GSEs) combined using 5 different preprocessing and normalization methods.
- (E) fRMA plus scaling of the data using mean-centered expression values.
- Thus, when the p-values of the factors are significant (<.
- To do this, we carried out Kaplan-Meier (KM) analysis of the survival times of the set of 1273 colorectal cancer samples for each one of the 2524 genes found in the previous exploration.
- To do this, our algorithm performs, for each gene, multiple splits of the sample cohort into two groups and looks for the split that provides the best separation between groups (i.e.
- Figure 3 shows the Kaplan-Meier plots corresponding to the survival profiles of the two populations of individuals that were segregated according to the expression values of the gene tested.
- The separation of the two populations in both cases is very significant, with KM p-values <.
- however, it was necessary to perform an internal cross-validation of these results to assess how stable and reliable the signal was for each one of the selected genes.
- We carried out a cross-validation of the top-200 genes selected in either of the two conditions (i.e.
- This internal cross-validation was done using, for each gene, a resampling strategy that randomly selected 80% of the samples 100 times (i.e.
- Fig. 3 Kaplan-Meier plots of the survival analysis of the set of 1273 samples from colorectal cancer (CRC) patients.
- A short view of these data is shown in Table 3, which presents the 50 genes selected as best survival markers of CRC: the first part of the table corresponds to the top 25 genes, where up-regulation corresponds to shorter survival and higher risk (HR >.
- the second part of the table corresponds to the top 25 genes, where up-regulation corresponds to longer survival and lower risk (HR <.
- As indicated, the stability and robustness of the gene survival markers were assessed via a resampling strategy with random selection of 80% of the dataset 100 times.
- For the final ranking of the genes included in these tables we also required that they give a significant adjusted p-value in more than 80 out of 100 bootstrap iterations (i.e.
- The consistency of the results obtained with the internal cross-validation gives strong support to the top genes found (presented in Table 3), but we had to consider the value of using other external independent CRC cohorts to corroborate these findings.
- The results indicated a good performance in more than two thirds of the genes tested.
- In Additional file 4: Table S4 we present the KM p-values and HR of the genes that were validated from the top 10 previously found: 7 genes of the top 10 for the case of up-regulation associated with poor survival (PTPN14, LAMP5, TM4SF1, LCA5, CSGALNACT2, SLC2A3 and GADD45B) and 6 genes of the top 10 previously found for the case of up-regulation associated with good survival (EPHB2, DUS1L, NUAK2, FANCC, MYB and CHDH).
- We performed several multivariate survival analyses (OS, overall survival) on this dataset using combinations of the top genes proposed in Table 3.
- one group of high-risk, associated with the overexpression (or up-regulation) of the genes.
- This analysis was repeated with several other combinations of the top up-regulated genes associated with poor survival (present in Table 3), yielding similar results.
- For example, combining DCBLD2, LAMP5, TM4SF1, NPR3 and GADD45B, the separation of the high- and low-risk groups improved slightly: KM p-value = 2.21e-07 and HR confidence interval, CI .
- All the integrated datasets presented so far in this study corresponded to CRC samples, because we wanted to provide genes that are disease markers present in the transformed tumor cells of the intestinal epithelium, and genes that mark the progression and aggravation of this type of cancer.
- After this integration, we could explore the expression level of the top up-regulated genes (identified as markers of poor survival), comparing the expression distribution on a set of cancer samples versus a set of normal tissue samples.
- The results were always very similar and the boxplots of the expression distributions for.
- These results indicate that the gene markers identified in our survival studies are, most of the time, also up-regulated in CRC tumors with respect to normal colorectal tissue.
- Finally, to obtain a more accurate evaluation of the prognostic value of all the genes selected as best candidates
- Table 3 Genes selected as top-50 best survival markers of colorectal cancer (CRC).
- The first part of the table corresponds to the top-25 genes where up-regulation corresponds to shorter survival and higher risk (i.e., HR >.
- the second part of the table corresponds to the top-25 genes where up-regulation corresponds to longer survival and lower risk (HR <.
- The stability and robustness of the gene survival markers were assessed by cross-validation, applying to each gene a resampling strategy with random selection of 80% of the samples 100 times (i.e.
- Therefore, it allows the best splitting of the cohort in two groups.
- The analysis of the beta factors assigned by the regression to each of the top 100 genes, i.e., to each variable within the multivariate vector (data included in Additional file 7: Table S5), allows the identification of the genes that were the most influential factors in this risk analysis and therefore facilitated the selection of the best “gene survival markers”.
- This complexity causes the molecular characterization of CRC to remain deficient, with a lack of clear gene markers associated with specific CRC subtypes and with the prognosis of the disease [17–19].
- A recursive algorithm using 10-fold cross-validation finds the value of the risk score (marked with a vertical black line) that allows the best splitting of the cohort into two groups.
- The analysis of the beta factors assigned by the regression to each of the top 100 genes (i.e., to each variable within the multivariate vector) allows the identification of the genes that are the most influential factors in this risk analysis and therefore helps in the selection of the best “gene survival markers”.
- The correlation between gene expression and survival is an excellent tool to investigate prognosis of the disease and to build risk predictors that will be applicable to individual patients.
- A clear limitation comes from the fact that, in most previous studies, the number of tumor samples used to select the genes that enter into the construction of the prognostic predictors is small (i.e., the size of the patient cohorts is rarely greater than a few hundred individuals).
- Finally, we are investigating the biological meaning of the genes found as best predictive and prognostic markers.
- The analysis of the literature reveals some relevant observations.
- Moreover, a recent integrative analysis of multiple colon cancer gene-expression-based subtype classifiers reported that one of the three highest scoring genes included in several classifiers was GADD45B [36].
- The variability due to the different staging of the tumors is another factor that can bring limitations to any CRC study.
- A final reason for the limitations of the results may be an over-adjustment to the tested data sets.
- The final proposed set of gene survival markers includes an open list of one hundred up-regulated genes, with a robust statistical estimation of the value of each one.
- In fact, our results showed that a selection of the top 5 genes applied to independent external cohorts provided very good separation of CRC samples in two distinct groups of high and low risk.
- Beforehand, to make the best use of the information obtained from the microarrays, we considered it important to ascertain the quality of the data.
- We used the R function image to create chip images of the raw intensities to discover spatial artefacts in the samples.
- We used the function fitPLM provided in the affyPLM package to create the PLMset class object used as the input for the NUSE analysis.
- After applying these quality assessment methods, we discarded 79 of the initial samples collected and proceeded with the remaining 1273 (Table 1).
- To create a table with all the phenotypic characteristics of the patients selected which involved all samples.
- Batch effect is one of the main problems when several datasets are combined to be studied together, because different batches usually add large unwanted variability to the data.
- The table includes the IDs of the samples in GEO and all the available data about age, gender, survival time, location of the tumor, degree and TNM staging, presence of mutation in some cancer genes (TP53, KRAS, BRAF), etc.
- This table is an expansion of the data in Table 3.
- Validation of the survival data done in an independent set of samples taken from The Cancer Genome Atlas (TCGA), that included 269 colorectal carcinomas with survival information and RNA-seq global expression profiling.
- The table includes the KM p-values and HR of the genes that were validated from the top-10 survival marker genes previously found and presented in Table 3.
- Of the top-10 for the case of up-regulation associated with poor survival, 7 were validated (PTPN14, LAMP5, TM4SF1, LCA5, CSGALNACT2, SLC2A3 and GADD45B).
- Of the top-10 found for down-regulation associated with poor survival, 6 genes were validated (EPHB2, DUS1L, NUAK2, FANCC, MYB and CHDH).
- Comparison of the distributions of the expression signal corresponding to ten genes in 25 samples from normal colorectal epithelium (green boxplots) versus 25 samples from CRC (red boxplots).
- Beta factors assigned by regression analysis to each of the top-100 survival marker genes.
- The factors allowed the identification of the genes that were the most influential variables in this risk analysis (i.e.
- We acknowledge the funding provided to the JDLR research group by the Spanish Government with grants of the ISCiii co-funded by FEDER (references PI15/00328 and AC14/00024).
- provided by the “Junta de Castilla y Leon” (JCyL) with the support of the “Fondo Social Europeo” (FSE).
- The funding boards had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.
- The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/.
- ARM and MMM also contributed to the design of the work and helped in the preparation of the manuscript.
- Moreover, the Ethical Committees of our Research Centers (CiC-IBMCC and IMDEA-Food) supervised the adequate use of the data corresponding to human samples.
- Molecular subtypes in cancers of the gastrointestinal tract.
- Colon cancer subtypes: concordance, effect on survival and selection of the most representative preclinical models
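
The summary above lists "scaling of the data using mean-centered expression values" as one of the five normalization procedures. A minimal sketch of that idea follows. It is not the authors' pipeline: it assumes the centering is done per gene within each dataset, and the function name `mean_center_by_dataset` and the nested-dict layout are hypothetical choices made for illustration.

```python
from statistics import mean


def mean_center_by_dataset(matrix, dataset_of):
    """Center each gene on its mean within each dataset (assumed scheme).

    matrix:     {sample_id: {gene: expression_value}}
    dataset_of: {sample_id: dataset_id}, e.g. a GEO series ID
    Returns a new matrix with the same layout.
    """
    # Group sample IDs by the dataset they belong to.
    by_dataset = {}
    for sample, ds in dataset_of.items():
        by_dataset.setdefault(ds, []).append(sample)

    centered = {}
    for samples in by_dataset.values():
        genes = list(matrix[samples[0]])
        # Per-gene mean computed only over the samples of this dataset.
        gene_mean = {g: mean(matrix[s][g] for s in samples) for g in genes}
        for s in samples:
            centered[s] = {g: matrix[s][g] - gene_mean[g] for g in genes}
    return centered
```

After this step every gene has mean zero inside each dataset, so a constant per-dataset offset no longer separates the series. Differences in variance between datasets are not corrected by centering alone.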

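The per-gene survival analysis described above (multiple splits of the cohort into two groups, keeping the split with the best separation, then resampling 80% of the samples 100 times) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it assumes a two-group log-rank test as the separation criterion, scans cut-offs between the 20th and 80th percentiles, and uses raw rather than adjusted p-values. The names `logrank_p`, `best_split` and `resampling_stability` are hypothetical.

```python
import math
import random


def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)  # at risk, both groups
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def best_split(expr, times, events, lo=0.2, hi=0.8):
    """Scan expression cut-offs; keep the one with the smallest p-value."""
    ranked = sorted(expr)
    n = len(ranked)
    best_cut, best_p = None, 1.0
    for cut in ranked[int(lo * n):int(hi * n) + 1]:
        groups = [1 if x >= cut else 0 for x in expr]  # 1 = high expression
        p = logrank_p(times, events, groups)
        if p < best_p:
            best_cut, best_p = cut, p
    return best_cut, best_p


def resampling_stability(expr, times, events, n_iter=100, frac=0.8,
                         alpha=0.05, seed=0):
    """Fraction of random subsamples in which the best split is significant."""
    rng = random.Random(seed)
    idx = list(range(len(expr)))
    k = int(frac * len(idx))
    hits = 0
    for _ in range(n_iter):
        sub = rng.sample(idx, k)  # draw 80% of the samples without replacement
        _, p = best_split([expr[i] for i in sub],
                          [times[i] for i in sub],
                          [events[i] for i in sub])
        hits += p < alpha
    return hits / n_iter
```

A gene would pass the stability filter described above when `resampling_stability` returns more than 0.8. Choosing the cut-off that minimizes the p-value inflates significance, which is one reason to work with adjusted p-values and to validate in external cohorts, as the summary describes.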