
Transcriptomic and presence/absence variation in the barley genome assessed from multi-tissue mRNA sequencing and their power to predict phenotypic traits


Summary

- One of the major approaches used in plant breeding to increase yield gains is to exploit the natural genetic variation present in the crop species' gene pool.
- Barley was domesticated more than 10,000 years ago in the Fertile Crescent [2].
- To exploit the natural genetic variation present in the gene pool of barley, genomic tools such as single.
- 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0.
- to characterize most of the barley accessions of the German ex situ genebank using a genotyping-by-sequencing approach [7].
- It is now accepted that a significant proportion of the genes of plant genomes are not expressed (expression presence/absence variation.
- However, up to now, little information is available concerning the extent and distribution of PAV in the barley genome [18, 26].
- Prediction of phenotypic variation in the context of genomic selection, which is nowadays an essential component of plant breeding programs, is performed based on SNP genotyping profiles.
- Only the use of microarray-based transcriptome information for the prediction of phenotypic traits in maize resulted, for a subset of the traits, in increased prediction accuracies, especially when combined with SNP genotyping information [29].
- Furthermore, to our knowledge, the prediction accuracy of PAV has not yet been evaluated, even though single PAV have been shown to contribute to phenotypic.
- Out of the 73,187 expressed genes across seedling, leaf, and apex samples, 11,523 genes mapped to regions of the Morex reference genome where no gene had previously been annotated (Additional file 1: Figure S1).
- A total of 3,482 genes mapped to the unknown chromosome of the Morex reference sequence, of which 581 were newly annotated genes.
- The average length of the newly annotated genes was 5,470 bp.
- We additionally identified 1,502 new contigs, with an average gene length of 494 bp, that did not map to any of the seven barley chromosomes.
- In total, 96% of the homologous genes were found in other cereals of the Triticeae tribe but not in more distantly related species (Fig.
- In addition, only 280 of the newly identified genes had an unknown gene annotation compared to the eight plant species.
- Altogether, 67% of the newly identified genes were expressed in all three tissues (Fig.
- Fig. 1 Characterization of the contigs established by a de novo transcriptome assembly of unmapped reads across all 23 inbreds.
- b Expression of 1,502 newly identified genes in the three different tissues.
- c Number of inbred lines in which the contigs of the de novo transcriptome assembly were expressed.
- A total of 38,810 barley genes were detected as ePAV, of which 28,340 had previously been annotated in the reference genome (Additional file 1: Table S2).
- 2), and in fact, 80.6% of the newly annotated genes and 78.8% of the newly identified genes were also detected as ePAV (Additional file 1: Table S2).
- In 50 replications, 20% of the gene length of each gene was used for transcript calling and ePAV detection.
- Presence and absence of the 73,187 genes across all inbreds.
- Fig. 3 Distribution of expression presence/absence variation (ePAV) across the physical map of the barley chromosomes.
- Using such SNP located within genes, which we refer to as gPAV-SNP, we calculated the proportion of gPAV-SNP that were also detected by our procedure as ePAV, and considered this value as an estimation of the power to detect gPAV by our ePAV detection procedure (Additional file 1:.
- Finally, the similarity between presence/absence patterns in the 23 inbreds of ePAV and gPAV-SNP was very high, ranging from 70% to 90%.
- Both PCA revealed the existence of two clusters of inbreds defined by the row type of the inbreds (Additional file 1:.
- However, these analyses also reveal that the relationship of the inbreds within clusters differs between the two sources of molecular variation.
- presence/absence variation (ePAV) and sequence variants (SV) r .
- In order to obtain unbiased estimates of the prediction accuracy, we randomly subdivided the 23 inbreds into a training and a validation set in each of 1,000 cross-validation runs.
- Prediction accuracies of SV, T, and ePAV were compared to the prediction accuracy of the SNParray data set that we used as the baseline predictor.
- Across the three traits, the seedling transcriptome (T_s) resulted in the highest median prediction accuracy of all the examined single predictors.
- We also evaluated the pairwise combinations of single predictors and observed for all traits an increase of the prediction accuracy compared to using T_s.
- Therefore, a grid search in which the relative weights of the relationship matrices of two or three predictors varied in increments of 0.1 prior to summing them up was used to identify those combinations of SV, ePAV, and T_s that resulted in the highest prediction accuracies.
- For all three traits, the highest median of the prediction accuracy was observed when using more than one predictor (Fig.
- Even with a sequencing depth that corresponds to 0.5% of that of our study, prediction accuracies higher than that of the prediction with the SNParray data set were obtained.
- However, the variability of the prediction accuracy across the different runs of the resampling simulations increases with a reduced sequencing depth.
- Both numbers are in the range of what was previously reported for barley [17, 18] as well as maize [20, 21].
- of the total number of genes were detected as ePAV (Additional file 1: Table S2).
- Prediction accuracy for the barley inbreds for leaf angle, heading date, and plant height of single predictors as well as the optimal combination identified in a grid search (Opt) using the original number of reads of the seedling sample as well as using data sets for which the number of reads was randomly reduced to 10, 5, 1, and 0.5% of the original number of reads per seedling sample.
- The number of variants gives the mean number of features available for predictions in each scenario or, for the combined predictors, the weight of sequence variants (SV_s)/expression presence/absence variation (ePAV_s)/gene expression (T_s) resulting in the highest prediction accuracy.
- A possible explanation is that selection is less efficient in lowly recombining regions of the chromosome compared to highly recombining pericentromeric regions to purge presence/absence variation that was created by evolutionary processes during plant polyploidization and speciation [30].
- This allowed us to estimate that by characterizing the expression of genes in one tissue, we are able to detect about 30% of the existing gPAV.
- Instead, the gPAV can also be caused by partial insertions/deletions of the corresponding gene.
- α* was approximately 90% in our data set (Table 1), meaning that 10% of the ePAV are caused by the physical absence of the gene and not by impairment of its transcription.
- Another explanation could be the differences in the methodologies between both studies.
- Number of dispensable genes in the barley genome: We can estimate from the above-described estimates of 1-β* and α* that about 10% of the about 38,000 ePAV, i.e.
- Therefore, our results suggest that more than 10% of the barley genes show PAV on a genomic level.
- This figure is similar to what was observed in the analysis of 80 Arabidopsis accessions but was higher compared to other cereal species.
- Instead, the superiority of the ePAV information compared to the SNParray for the prediction of phenotypic traits might arise because only 10% of the ePAV are caused by gPAV, so that ePAV also cover gene expression differences.
- However, we also observed differences in the prediction accuracy of T depending on the tissue that was used for mRNA extraction.
- In the set-up used in our study, with unreplicated plants for sample collection, such heterogeneous environmental factors together with genotype*environment interaction reduce the precision of the measurement of the predictor.
- Our findings indicate that the transcriptome of seedlings grown on filter paper is a good proxy of the gene activity for a broad range of developmental stages of plants grown in a diverse set of environments.
- [35], we have observed rather small differences between the optimal weight of the three predictors across the three examined traits, even though these were assessed at completely different developmental stages.
- In the above described grid search, the SV and ePAV data sets were extracted from the mRNA sequencing data of multiple tissues.
- Due to the above-described quantitative genetic advantage of the seedling sample but also the logistical advantages of using seedling samples that are generated on filter paper in petri dishes:.
- At the original sequencing depth, the prediction accuracy was not affected when the SV and ePAV features were extracted from the seedling sample alone instead of from the three tissues.
- Therefore, we performed down-sampling simulations to examine the reduction of the prediction accuracy if the sequencing depth is reduced.
- about 2×10^5 2×150 bp reads, is not the reduction of the median of the prediction accuracy but the increasing standard deviation (Fig.
- This increase is caused by the increasing sampling variance of the low depth sequencing.
- about 1×10^6 2×150 bp reads, the obtained prediction accuracy was in more than 95% of the resampling runs higher than that obtained with the SNParray data set.
- Seeds of the 23 spring barley inbreds were sown in controlled greenhouse conditions with 16 hours light and eight hours dark at 22 °C.
- A fragment of the youngest fully developed leaf from two different plants was collected for each inbred.
- For a total of six inbreds, apices were harvested at stage 47 of the Zadoks scale [47].
- The petri dishes were placed in the greenhouse with the above described environmental conditions.
- At each of the four agro-ecologically diverse environments in Germany, the 23 barley inbreds were replicated and 19 times, respectively.
- This data set is designated in the following as SNParray.
- of the 23 inbreds (Casale et al.
- Trinity was used to perform a de novo assembly of the unmapped reads of all inbreds [49].
- All contigs that had a homology (e-value ≤ 1e-5, identity ≥ 98.0%) to an annotated protein in at least one of the species Arabidopsis thaliana, Brachypodium distachyon, Sorghum bicolor, Zea mays, Oryza sativa, Triticum aestivum, Triticum dicoccum, and Secale cereale were retained.
- using a gene annotation file that comprised low and high confidence genes of transcripts defined in the barley reference genome [6] and the newly identified genes of the de novo assembly.
- Genes which mapped to the reference sequence and were expressed in at least two samples, but which were not available in the IBSC reference annotation file, were designated in the following as newly annotated genes.
- designated in the following as T, where the indexes l, s, a were used to separate the tissues leaf, seedling, and apex.
- For each tissue, a presence call was made for each inbred-gene combination in the matrix of presence/absence calls, if T > 10% of the maximum value of T for a gene-tissue combination (cf.
- For all inbred-gene combinations with a presence call for at least one tissue, a presence call was kept in the across-tissue matrix of presence/absence calls.
- An absent call was kept in the across-tissue matrix of presence/absence calls for all inbred-gene combinations with only no or absent calls across tissues.
- Genes that have an across-tissue call of present and of absent for at least two inbreds each were designated in the following as ePAV (cf.
- For the SNP from the SNParray data set for which no missing data was observed, the Q90 of the major allele frequency was calculated per population to consider random deviations from an allele frequency of 0.5.
- These 1,972 SNP, which have a present call and an absent call for at least one inbred each, were designated in the following as gPAV-SNP (Additional file 1: Figure S2).
- A total of 14,843 barley genes comprised one SNP from the SNParray in their coding sequence and were designated in the following as genic SNP.
- were detected as ePAV out of the total number of detected ePAV.
- We estimated 1-β* and α* firstly for ePAV determined based on T of the entire gene as well as based on T calculated for 10 bp windows surrounding the genic SNP.
- For each gene, a random 20% of the entire gene length was selected, and transcript calling and ePAV detection were performed.
- If the allele call was different between the tissues of the same inbred, the call with the higher sequencing depth was retained.
- Missing values in the matrices of SV and ePAV were mean imputed.
- Each of the three phenotypic traits leaf angle, heading date, and plant height was analyzed across the four environments using mixed models.
- The performance of the barley inbreds was predicted using different predictors: (i) SNParray, (ii) SV, (iii) ePAV, (iv) T_l, (v) T_s.
- The dimension of W is determined by the number of barley inbreds and the number of features in the corresponding predictor (m_SNParray = 44,045, m_SV, m_ePAV = 38,810, m_Tl = 60,888, m_Ts = 67,844).
- The median of the prediction accuracy across the 1,000 cross-validation runs was calculated.
- From the original data set of seedling samples, the number of reads was randomly reduced to 10, 5, 1, and 0.5% of the original number of reads per inbred.
- S1: Characterization of the not annotated contigs established by the transcript calling.
- Population structure of the 23 barley inbreds.
- The sequence data have been deposited in the NCBI Sequence Read Archive (SRA) under accession PRJNA534414.
- A chromosome conformation capture ordered sequence of the barley genome.
- Transcriptome analysis of the vernalization response in barley (Hordeum vulgare) seedlings.
- Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome.
- Extension of the bayesian alphabet for genomic selection
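
The presence/absence calling rule summarized above (a presence call when T exceeds 10% of a gene's maximum within a tissue, a presence call across tissues when at least one tissue is present, and an ePAV designation when at least two inbreds are present and at least two are absent) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the inbreds × genes array layout, and the toy values are assumptions, the 10% threshold is read from a truncated sentence, and missing tissue samples are ignored.

```python
import numpy as np

def presence_calls(T, frac=0.10):
    """Presence call per inbred-gene pair for one tissue.

    T has shape (n_inbreds, n_genes) (assumed layout). A gene is called
    present in an inbred if T exceeds `frac` of the maximum T observed
    for that gene in this tissue.
    """
    T = np.asarray(T, dtype=float)
    gene_max = T.max(axis=0)
    return T > frac * gene_max

def across_tissue_calls(tissue_calls):
    """Present across tissues if present in at least one tissue."""
    return np.logical_or.reduce(tissue_calls)

def is_epav(calls, min_each=2):
    """ePAV: at least `min_each` present and `min_each` absent inbreds."""
    n_present = calls.sum(axis=0)
    n_absent = (~calls).sum(axis=0)
    return (n_present >= min_each) & (n_absent >= min_each)

# Toy expression values: 5 inbreds x 3 genes in two tissues (hypothetical).
leaf = np.array([[10, 0, 5], [9, 0, 5], [0.5, 0, 5], [0.2, 0, 5], [8, 1, 5]])
seedling = np.array([[10, 0, 5], [9, 0, 5], [0.1, 0, 5], [0.3, 0, 5], [8, 0, 5]])

calls = across_tissue_calls([presence_calls(leaf), presence_calls(seedling)])
epav = is_epav(calls)
print(epav)
```

In this toy data only the first gene qualifies: it is present in three inbreds and absent in two, whereas the second is present in a single inbred and the third is present everywhere.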
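
The grid search over relative weights of relationship matrices described above can be sketched as follows. Several parts are assumptions made for illustration and are not taken from the source: the weights are constrained to sum to 1, prediction uses a simple kernel ridge (GBLUP-like) solver with a fixed shrinkage value `lam`, five inbreds form the validation set in each run, and the relationship matrices and phenotype are simulated.

```python
import itertools
import numpy as np

def weight_grid(n_predictors, steps=10):
    """All weight vectors in increments of 1/steps that sum to 1 (assumed constraint)."""
    for combo in itertools.product(range(steps + 1), repeat=n_predictors):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)

def combine(kernels, weights):
    """Weighted sum of relationship matrices."""
    return sum(w * K for w, K in zip(weights, kernels))

def predict(K, y, train, test, lam=1.0):
    """Kernel ridge prediction; lam is an assumed shrinkage value."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

def cv_accuracy(K, y, n_runs=100, n_test=5, seed=0):
    """Median correlation between predicted and observed values over random splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        accs.append(np.corrcoef(predict(K, y, train, test), y[test])[0, 1])
    return float(np.median(accs))

def grid_search(kernels, y, **cv_kwargs):
    """Return (best median accuracy, best weights) over the weight grid."""
    scored = [(cv_accuracy(combine(kernels, w), y, **cv_kwargs), w)
              for w in weight_grid(len(kernels))]
    return max(scored, key=lambda s: s[0])

# Simulated stand-ins for the SV, ePAV, and T_s relationship matrices.
rng = np.random.default_rng(42)
features = [rng.normal(size=(23, 50)) for _ in range(3)]
kernels = [X @ X.T / X.shape[1] for X in features]
y = features[0] @ rng.normal(size=50) + rng.normal(size=23)

best_acc, best_weights = grid_search(kernels, y, n_runs=20)
print(best_acc, best_weights)
```

Generating the weights from integer steps avoids accumulating floating-point error when checking that a combination sums to 1; with three predictors and increments of 0.1 the grid has 66 combinations.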
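
The down-sampling to 10, 5, 1, and 0.5% of the original number of reads can be approximated on a count matrix as shown below. This is a hedged sketch rather than the procedure used in the study: it thins per-gene read counts binomially, keeping each read independently with the given probability, instead of subsampling raw reads before mapping, and the function name and toy counts are hypothetical.

```python
import numpy as np

def downsample_counts(counts, fraction, seed=None):
    """Binomially thin read counts: each read is kept with probability `fraction`."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    return rng.binomial(counts, fraction)

# Fractions of the original read number named in the text.
fractions = [0.10, 0.05, 0.01, 0.005]

# Toy read counts: 2 inbreds x 3 genes (hypothetical values).
counts = np.array([[1000, 0, 250], [40, 5000, 10]])

reduced = {f: downsample_counts(counts, f, seed=1) for f in fractions}
for f, c in reduced.items():
    print(f, c.sum())
```

Repeating the thinning with different seeds gives the resampling runs over which the spread of the prediction accuracy could be compared across sequencing depths.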
