
Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data


Summary

- Using whole genome sequence (WGS) data, we assess two methods for detecting mixed infection: (i) a combination of the number of heterozygous sites and the proportion of heterozygous sites to total SNPs, and (ii) Bayesian model-based clustering of allele frequencies from sequencing reads at heterozygous sites.
- We found that both approaches were effective in distinguishing between pure strains and mixed infections where the minor strain was present at a relatively high proportion (> …).
- A large dataset of clinical isolates (n = 1963) from the Karonga Prevention Study in Northern Malawi was tested to examine correlations of mixed infection with patient characteristics and outcomes.
- The frequency of mixed infection in the population was found to be around 10%, with an association with year of diagnosis but no association with age, sex, HIV status or previous tuberculosis.
- Keywords: Mycobacterium tuberculosis, Tuberculosis, Bioinformatics, Epidemiology, Genomic analysis, Mixed infection.
- Such calls can represent sequencing errors, but heterozygous calls may also be biologically relevant and indicate the presence of mixed infection [4–6].
- Mixed infection occurs when two or more strains of the same species of pathogen are present in an individual host at any one time.
- Additionally, attempts to reconstruct the transmission of bacterial pathogens can be complicated, as only one strain of a mixed infection may be represented and true transmission links may not be established [5].
- Previous attempts to determine the presence of mixed M. tuberculosis …
- In this setting we assess the prevalence of mixed infection in an unselected population and examine correlations with patient characteristics and outcomes.
- In mixed infection samples, mapped sequences at these sites will be a combination of reads from one strain carrying a SNP at this position and reads from one or more additional strains that do not, resulting in more than one allele call.
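The excerpt does not state the exact criteria used to call a heterozygous site. A minimal sketch, assuming a per-position table of allele read counts and two hypothetical thresholds (`MIN_DEPTH` and `MIN_MINOR_FREQ`, not values taken from the paper), flags a position when more than one allele is supported by enough reads:

```python
from typing import Dict, List

# Hypothetical thresholds: the excerpt does not give the paper's calling criteria.
MIN_DEPTH = 10          # assumed minimum read depth at a position
MIN_MINOR_FREQ = 0.05   # assumed minimum read frequency for an allele to count


def is_heterozygous(allele_counts: Dict[str, int],
                    min_depth: int = MIN_DEPTH,
                    min_minor_freq: float = MIN_MINOR_FREQ) -> bool:
    """True if more than one allele at this position passes the frequency threshold."""
    depth = sum(allele_counts.values())
    if depth < min_depth:
        return False  # too few reads to make a call
    supported = [a for a, c in allele_counts.items() if c / depth >= min_minor_freq]
    return len(supported) > 1


def heterozygous_sites(pileup: Dict[int, Dict[str, int]]) -> List[int]:
    """Sorted genome positions whose reads support more than one allele."""
    return sorted(pos for pos, counts in pileup.items() if is_heterozygous(counts))


# Toy pileup (hypothetical positions and counts)
pileup = {101: {"A": 45, "G": 5}, 202: {"C": 50}, 303: {"T": 3, "C": 2}}
print(heterozygous_sites(pileup))
```

Position 303 is skipped because its depth falls below the assumed minimum, even though two alleles are present.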
- Detecting mixed infection using the heterozygous base calls. The first approach to detect mixed infection used the number of heterozygous base calls across the genome to set a minimum threshold for distinguishing mixtures (denoted as the “heterozygous sites method”).
- This simple method allows for rapid identification of potential mixtures in large datasets without requiring the more complex interrogation of the sequence reads to calculate allele frequencies at heterozygous sites.
- Detecting mixed infection with Bayesian model-based clustering.
- An alternative approach for detecting mixed infection was employed that estimated the number of strains present in a sample through Bayesian model-based clustering of allele frequencies at heterozygous sites, implemented through the mclust package in R [27].
- The allele frequencies of heterozygous sites in mixed infection samples will cluster at similar frequencies in a set number of groups, depending on the number and proportion of strains present.
- In contrast, the allele frequencies of heterozygous sites in pure samples will be more randomly distributed without clustering, even though samples with high clonal heterogeneity may contain a high number of heterozygous sites.
- Our model aimed to determine whether the allele frequencies of heterozygous sites in a sample can be optimally clustered into groups corresponding to a mixed infection of two strains, or whether the sample is a non-mixed, pure strain.
- A sample is classified as a mixed infection of two strains (G = 2) where (i) the number of heterozygous sites is > 10 …
- Samples were classified as likely containing a single strain (unmixed) where (i) the number of heterozygous sites is ≤ 10, or (ii) the number of heterozygous sites is > 10 …
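The paper uses R's mclust, which ranks candidate Gaussian mixture models by BIC. The sketch below is a simplified, pure-Python analogue and not mclust itself: it compares a one-component (pure) model with a two-component, unequal-variance model (G = 2) on the allele frequencies at heterozygous sites, keeps the > 10 site requirement, and estimates the major proportion from the two cluster means. The variance floor, the initialisation and the restriction to G ∈ {1, 2} are assumptions; how well this separates pure samples with high clonal heterogeneity has not been checked.

```python
import math
from typing import List, Optional, Sequence, Tuple

VAR_FLOOR = 1e-6  # assumed floor on variances so a component cannot collapse to a point


def _log_normal(x: float, mu: float, var: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def _log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over per-component log terms."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _fit_two(xs: List[float], n_iter: int = 200) -> Tuple[float, List[float]]:
    """EM fit of a two-component 1-D Gaussian mixture; returns (log-likelihood, means)."""
    n = len(xs)
    s = sorted(xs)
    mean = sum(xs) / n
    v0 = max(sum((x - mean) ** 2 for x in xs) / n, VAR_FLOOR)
    mu, var, w = [s[n // 4], s[(3 * n) // 4]], [v0, v0], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each allele frequency
        resp = []
        for x in xs:
            logs = [math.log(w[k]) + _log_normal(x, mu[k], var[k]) for k in range(2)]
            total = _log_sum_exp(logs)
            resp.append([math.exp(lp - total) for lp in logs])
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, VAR_FLOOR)
            w[k] = max(nk / n, 1e-12)
    loglik = sum(
        _log_sum_exp([math.log(w[k]) + _log_normal(x, mu[k], var[k]) for k in range(2)])
        for x in xs
    )
    return loglik, mu


def classify_by_clustering(freqs: List[float], min_sites: int = 10) -> Tuple[str, Optional[float]]:
    """Return ("mixed", estimated major proportion) or ("pure", None)."""
    n = len(freqs)
    if n <= min_sites:
        return "pure", None  # too few heterozygous sites to separate from clonal variation
    mean = sum(freqs) / n
    var1 = max(sum((x - mean) ** 2 for x in freqs) / n, VAR_FLOOR)
    ll1 = sum(_log_normal(x, mean, var1) for x in freqs)
    ll2, mu = _fit_two(freqs)
    # BIC = -2 logL + (number of parameters) * ln(n); lower is better in this convention.
    # G = 1 has 2 parameters; G = 2 has 5 (two means, two variances, one weight).
    bic1 = -2 * ll1 + 2 * math.log(n)
    bic2 = -2 * ll2 + 5 * math.log(n)
    if bic2 < bic1:
        # Clusters near p and 1 - p: average the two estimates of the major proportion p
        return "mixed", (max(mu) + 1 - min(mu)) / 2
    return "pure", None
```

mclust itself reports BIC with the opposite sign (higher is better) and searches more covariance structures; the decision logic is the same idea.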
- Table 1 shows the sample information for each artificial mixture along with the results of both mixture detection approaches, arranged by the known major strain proportion and then by the number of heterozygous sites.
- For the heterozygous sites method, our analysis did not identify a clear threshold that discriminates between mixed samples and pure strains, though with a heterozygous SNP threshold of ≥ 20 sites, all but one sample with a major proportion of … and 0.90 (11/…
- … 1.5% heterozygous to total SNP proportion for samples containing between 11 and 19 heterozygous sites correctly identifies the 0.90 major proportion sample with fewer than 20 heterozygous sites (ERR221649) as a mixed infection, still with no pure samples incorrectly classified.
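Read together, the thresholds reported for the heterozygous sites method suggest a simple decision rule: ≥ 20 heterozygous sites indicates a mixture, and 11–19 sites indicates a mixture only when heterozygous sites exceed 1.5% of all SNPs. A minimal sketch under that reading; the function name and whether each comparison is strict are assumptions, not the paper's code:

```python
def classify_by_heterozygous_sites(n_het: int, n_total_snps: int,
                                   count_threshold: int = 20,
                                   lower_count: int = 11,
                                   prop_threshold: float = 0.015) -> str:
    """Classify a sample as "mixed" or "pure" from heterozygous site counts."""
    if n_het >= count_threshold:
        return "mixed"  # enough heterozygous sites on their own
    # Borderline counts (11-19) also need a high heterozygous-to-total SNP proportion
    if n_het >= lower_count and n_total_snps > 0 and n_het / n_total_snps > prop_threshold:
        return "mixed"
    return "pure"  # includes samples with <= 10 sites, whatever their proportion


print(classify_by_heterozygous_sites(15, 800))   # 15/800 = 1.875% of SNPs
print(classify_by_heterozygous_sites(6, 100))    # 6%, but too few sites
```

The second call mirrors the case in the text of a sample above 1.5% with only 6 heterozygous sites, which cannot be distinguished from clonal variation.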
- … heterozygous sites randomly distributed between 0 and 1.
- Panel b demonstrates the characteristic pattern of mixed infection with two different strains, with the read frequencies forming two distinct clusters with means around 0.90 and 0.10, implying a 0.9/0.1 mixture.
- … 1.5% heterozygous sites to total SNP proportion in samples with 11–19 heterozygous sites.
- One 0.95/0.05 sample had a heterozygous proportion over 1.5% but contained only 6 heterozygous sites so was indistinguishable from clonal variation.
- In total, 9/36 mixed samples were misidentified as pure strains using this approach, performing worse than the heterozygous sites method (3/36 mixed samples misidentified).
- The allele frequencies at heterozygous sites in these samples are shown in Fig. 2.
- The Bayesian mixture method also allows for an estimation of the mixing proportions of samples identified as mixed infection.
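A simpler way to see how a mixing proportion follows from allele frequencies, offered here as an illustrative alternative rather than the paper's clustering-based estimator: in a two-strain mixture with major proportion p ≥ 0.5, heterozygous sites sit near p or near 1 − p, so folding each frequency onto [0.5, 1] gives values near p. The sketch returns their mean and (population) standard deviation, analogous to the spread shown as error bars in the comparison figures; the function name is hypothetical.

```python
import statistics
from typing import List, Tuple


def fold_major_proportion(freqs: List[float]) -> Tuple[float, float]:
    """Fold allele frequencies onto [0.5, 1]; return (mean, population SD)."""
    folded = [f if f >= 0.5 else 1 - f for f in freqs]
    return statistics.mean(folded), statistics.pstdev(folded)


mean, sd = fold_major_proportion([0.9, 0.1, 0.88, 0.12])
print(round(mean, 3), round(sd, 3))
```

Folding cannot tell a 0.5/0.5 mixture from noise around 0.5, which is one reason a clustering model is used instead.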
- Fig. 2: The plotted allele frequencies of reads at heterozygous sites in samples misidentified as pure strains in artificial mixtures of two strains using the Bayesian model-based clustering approach.
- The characteristic pattern of mixed infection that would be expected in samples of more than two non-clonal strains, e.g. …
- Identifying mixed infection in replicate samples.
- Using the heterozygous sites method with a threshold of …
- All replicate samples were identified as pure strains using the Bayesian clustering approach, including the four samples deemed mixed infection using the heterozygous sites method.
- Table 2 shows the sensitivity and specificity of both the heterozygous sites and Bayesian clustering approaches with the artificial mixture and replicate samples.
- At present, there is no gold standard test for detecting mixed infection in M. tuberculosis.
- Fig. 3: A comparison of the major strain proportion estimated through Bayesian model-based clustering (blue) against the known majority strain proportion (red) in all in vitro artificial mixture samples (N = 48).
- The standard deviation of allele frequencies of heterozygous sites around the mean of the estimated major proportion is shown by the error bars in black.
- The heterozygous sites method had a higher sensitivity (true positive rate) than the Bayesian clustering method in detecting the artificially mixed samples (91.7% vs 75.0%).
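The two sensitivities follow directly from the artificial mixture counts given in the text (36 mixed samples, 3 missed by the heterozygous sites method and 9 by Bayesian clustering). A small worked check; the helper names are illustrative:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: detected mixtures / all true mixtures."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: samples called pure / all truly pure samples."""
    return tn / (tn + fp)


# 36 artificial mixtures; 3 missed by the heterozygous sites method, 9 by clustering
het_sens = sensitivity(tp=36 - 3, fn=3)
bayes_sens = sensitivity(tp=36 - 9, fn=9)
print(f"{het_sens:.1%} vs {bayes_sens:.1%}")  # prints 91.7% vs 75.0%
```

Specificity needs the number of truly pure samples, which this excerpt does not give, so it is left as a function only.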
- A final evaluation of both the heterozygous sites and Bayesian clustering methods was carried out using 168 in silico mixed samples (and the pure parental strains) with a priori known mixture proportions of … (Additional file 1).
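The excerpt does not say how the in silico mixtures were built. One common way, sketched here as an assumption rather than the paper's procedure, is to sample reads without replacement from each parental strain in the target proportion and shuffle them together. The function name and the toy read identifiers are hypothetical.

```python
import random
from typing import List, Sequence


def mix_reads(major_reads: Sequence[str], minor_reads: Sequence[str],
              major_prop: float, n_reads: int, seed: int = 0) -> List[str]:
    """Draw n_reads reads, a fraction major_prop from the major strain, without replacement."""
    rng = random.Random(seed)  # fixed seed so a simulated mixture is reproducible
    n_major = round(n_reads * major_prop)
    n_minor = n_reads - n_major
    mixed = rng.sample(list(major_reads), n_major) + rng.sample(list(minor_reads), n_minor)
    rng.shuffle(mixed)
    return mixed


# Toy read identifiers standing in for FASTQ records from two parental strains
major = [f"A{i}" for i in range(1000)]
minor = [f"B{i}" for i in range(1000)]
mixed = mix_reads(major, minor, major_prop=0.9, n_reads=500)
print(sum(r.startswith("A") for r in mixed), len(mixed))
```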
- … bovis samples were then used to assess the prevalence of mixed infection in this population.
- Both the heterozygous sites and Bayesian clustering approaches were applied to this dataset to identify isolates likely to be mixed infections.
- There was high concordance between the number of mixed infections identified with the heterozygous sites (195/1963 …
- … 1.5%; thus the number of heterozygous sites was the classifying factor for these samples under this approach.
- There were nine occurrences where mixed infections were found using the heterozygous sites approach, but samples were deemed single strains when applying the Bayesian clustering method.
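Agreement between the two callers can be tallied per sample, and the reported prevalence checked arithmetically (195/1963 is about 9.9%). A minimal sketch; the sample identifiers and calls below are hypothetical toy data, not the Malawi results:

```python
from collections import Counter
from typing import Dict


def concordance(het_calls: Dict[str, bool], bayes_calls: Dict[str, bool]) -> Counter:
    """Tally per-sample agreement between two mixed-infection callers (True = mixed)."""
    tally = Counter()
    for sample in het_calls.keys() & bayes_calls.keys():
        h, b = het_calls[sample], bayes_calls[sample]
        if h and b:
            tally["both_mixed"] += 1
        elif h:
            tally["het_only"] += 1  # the discordant direction described in the text
        elif b:
            tally["bayes_only"] += 1
        else:
            tally["both_pure"] += 1
    return tally


print(f"{195 / 1963:.1%}")  # prints 9.9%
t = concordance({"s1": True, "s2": True, "s3": False},
                {"s1": True, "s2": False, "s3": False})
print(dict(t))
```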
- Figure 5a shows a frequency histogram of the number of heterozygous sites found in all samples, with each sample's classification as mixed infection or pure strain by the Bayesian clustering method.
- Allele frequency of reads at heterozygous sites plots for the nine discrepant samples are shown in Fig. 5b.
- Associations with mixed infection.
- Of the possible risk factors assessed, only the year of collection had a significant association with mixed infection of TB strains (p = 0.009).
- Patients with smear-negative pulmonary tuberculosis (SNPT) were also found to be more likely to harbour a mixed infection than patients with smear-positive pulmonary tuberculosis.
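The excerpt does not state which statistical tests produced these associations. As an illustration only, a 2×2 comparison such as smear status against mixed infection could be checked with a Pearson chi-square test (no continuity correction, 1 degree of freedom), whose p-value is erfc(√(χ²/2)). The counts below are hypothetical, and the sketch assumes every row and column total is positive.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test for a 2x2 table; returns (statistic, p-value), df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical rows: smear-negative, smear-positive; columns: mixed, pure
stat, p = chi_square_2x2([[20, 80], [10, 90]])
print(round(stat, 3), round(p, 3))
```

A logistic regression would be the natural choice when adjusting for several covariates at once, such as year of collection.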
- Table 2 The sensitivity and specificity of the heterozygous sites and Bayesian model-based clustering approaches for detecting mixed infection in artificial mixture and replicate samples.
- Calculations assume that the 4 technical replicates of one sample that were classified as mixed by the heterozygous sites method came from a pure sample.
- Table 2 columns: number of mixed samples detected by the heterozygous sites method and by Bayesian model-based clustering.
- No other disease characteristics were found to be significantly associated with mixed infection.
- We have developed methods that can be used to detect the signals of mixed infection in M. tuberculosis …
- We found that the signal from heterozygous sites alone was sufficient to identify mixtures in both artificially mixed and clinically derived samples, with mixed infection confidently predicted in samples with a low number of heterozygous sites (12 and 11 SNPs with the heterozygous sites and Bayesian clustering approaches, respectively).
- There were key differences between the heterozygous sites and Bayesian clustering approaches that led to different numbers of mixed samples being reported in different datasets.
- In the artificial in vitro mixed samples, we found that the heterozygous sites method had better sensitivity in detecting mixed samples, with only 3/36 mixtures not identified compared to 9/36 samples misidentified using Bayesian clustering.
- The signal from the allele frequencies of reads in these samples was indistinguishable from the clonal heterogeneity that can be found in pure samples, so the Bayesian clustering could not effectively identify the characteristic patterns of mixed infection in these samples.
- In the replicate samples, the heterozygous sites method identified four samples as mixed infection that were not found to be mixed using the Bayesian clustering method.
- Fig. 4: A comparison of the major strain proportion estimated through Bayesian model-based clustering against the known majority strain proportion in the in silico two-strain mixture samples (N = 168).
- The standard deviation of allele frequencies of heterozygous sites around the mean of the estimated major proportion is shown by the grey crosses.
- … Portuguese isolate were identified as mixed infection with the heterozygous sites approach.
- In these cases, as well as with the nine samples in the clinical Malawi dataset where there was a different classification between detection methods, it may be that an isolate has relatively high levels of clonal variability, resulting in false positives when using the heterozygous sites approach.
- The Portuguese samples were either multidrug-resistant or extensively drug-resistant and, while SNPs in known drug resistance loci were removed from the analysis, other associated sites under selection may have been retained and appear as heterozygous sites.
- Consequently, drug resistant samples may have a relatively high number of heterozygous sites with variable allele frequencies.
- The Bayesian clustering method will correctly differentiate these samples from mixed infections, in which allele frequencies at heterozygous sites are consistent across the genome, but the heterozygous sites method may incorrectly identify them as mixed infections.
- Multidrug resistance has also been linked to increased mutation rates and hypermutant strains in TB, particularly in ‘Beijing’ strains [30, 31], which may also increase levels of heterogeneity in clonal isolates and lead to samples being incorrectly classified as mixed infection when using the number of heterozygous sites alone.
- As such, it appears that the heterozygous sites method is more sensitive in identifying mixed infection but may overestimate the number of mixed infections in a population.
- The Bayesian clustering method, though, will have a lower sensitivity in detecting mixed infection but a higher specificity in correctly identifying pure strains.
- Samples where the minority strain proportion was very low proved more difficult to accurately identify in both the in vitro and in silico artificially mixed samples, and this problem has been highlighted in previous attempts to detect mixed infection [4, 5].
- In the in vitro artificial mixtures with a majority strain proportion of 0.95, only 9/12 could be identified as mixed infection using heterozygous site counts and proportions, and 4/12 through Bayesian clustering.
- No in silico artificial mixtures with a 0.05 minority proportion could be distinguished from pure strains, as the number of heterozygous sites in these samples was very low (between 0 and 2 sites across all 56 samples).
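A simple sampling argument helps explain why low minority proportions are hard to detect: if reads are drawn independently, the number carrying the minor strain's allele at a site is binomial, so at modest depth a 5% minority often yields too few reads to pass a site-calling threshold. A sketch under that independence assumption; the depth and minimum-read values are hypothetical, and real detection also depends on how many SNPs separate the two strains.

```python
from math import comb


def prob_minor_detected(depth: int, minor_prop: float, min_reads: int) -> float:
    """P(at least min_reads of depth reads carry the minor allele), binomial sampling."""
    return sum(comb(depth, k) * minor_prop ** k * (1 - minor_prop) ** (depth - k)
               for k in range(min_reads, depth + 1))


# Hypothetical depth of 50 reads and a 5-read minimum for calling the minor allele
for p in (0.05, 0.10, 0.25):
    print(p, round(prob_minor_detected(50, p, 5), 3))
```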
- … tuberculosis isolates from the Karonga Prevention Study in Malawi with both the heterozygous sites and Bayesian clustering methods, we found evidence of mixed infection in 9.5–9.9% of the population.
- The frequency of mixed infection found in Malawi is lower than that identified in samples from Cape Town, South Africa (19% between Beijing and non-Beijing strains) [32], consistent with the much higher incidence of tuberculosis in South Africa [18, 33], as TB incidence has been suggested to be linked to the rate of mixed infection [6, 7].
- Additionally, the rate of mixed infection in South Africa was estimated using RFLP and spoligotype analysis directly from sputum, whereas our methods used whole genome sequence data from isolates grown in solid culture.
- Fig. 5: A closer inspection of samples identified as pure with the Bayesian clustering approach but mixed with the heterozygous sites approach.
- (a) A frequency histogram of heterozygous sites in Malawi samples identified as mixed infection or pure strains with the Bayesian clustering approach.
- (b) The plotted allele frequencies of reads at heterozygous sites for samples identified as mixed using the heterozygous sites approach but as pure strains with the Bayesian clustering approach, with sample ERR323056 shown first.
- Although there is some evidence of the characteristic pattern of mixed infection in some samples, the signal from heterozygous sites is insufficient to identify these strains as mixed infections.
- … sequence data at a suitable depth of coverage for the application of our methods for detecting mixed infection [34].
- The methods detailed here for identifying mixed infections can be extended to estimate an approximation of the parental strain genomes in mixtures by imputing the nucleotide base call that has come from major and minor strains in a mixed infection at each heterozygous site.
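The imputation idea can be sketched as assigning, at each heterozygous site, the allele with more read support to the major strain and the next allele to the minor strain. This is a minimal illustration under that assumption, not the authors' implementation; it breaks down when the two strains are near 50:50 or when allele counts tie, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def split_alleles(allele_counts: Dict[str, int]) -> Tuple[str, str]:
    """Return (major-strain allele, minor-strain allele) ranked by read support."""
    ranked = sorted(allele_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        raise ValueError("site is not heterozygous")
    return ranked[0][0], ranked[1][0]


def impute_parents(het_sites: Dict[int, Dict[str, int]]) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Build position -> allele maps for the approximate major and minor parental genomes."""
    major, minor = {}, {}
    for pos, counts in het_sites.items():
        major[pos], minor[pos] = split_alleles(counts)
    return major, minor


major, minor = impute_parents({101: {"A": 90, "G": 10}, 202: {"T": 12, "C": 88}})
print(major, minor)
```

At non-heterozygous positions both parents would simply share the consensus base.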
- It may prove more challenging to confidently detect mixed infection in organisms other than M. tuberculosis …
- One solution is to use the levels of heterozygosity at the gene-level or in larger genomic regions to look for the signatures of mixed infection.
- We found that these characteristic patterns of mixed infection are present in certain Mycobacterium Regions of Difference (RDs) in some mixed samples (Additional files 2 and 3), and so the methodologies described here could be applied to similar diagnostic marker regions in other taxa to estimate the presence of mixed infection.
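Applying the approach at the gene or region level starts by grouping heterozygous-site allele frequencies by genomic interval, so each region's frequencies can be inspected or passed to a clustering step. A minimal sketch; the region names and coordinates are hypothetical placeholders, not actual RD coordinates.

```python
from typing import Dict, List, Tuple


def freqs_by_region(het_freqs: Dict[int, float],
                    regions: Dict[str, Tuple[int, int]]) -> Dict[str, List[float]]:
    """Collect allele frequencies of heterozygous sites inside each inclusive (start, end) region."""
    out: Dict[str, List[float]] = {name: [] for name in regions}
    for pos, freq in sorted(het_freqs.items()):
        for name, (start, end) in regions.items():
            if start <= pos <= end:
                out[name].append(freq)
    return out


# Hypothetical regions and heterozygous sites (position -> allele frequency)
regions = {"r1": (10, 20), "r2": (30, 35), "r3": (1, 5)}
print(freqs_by_region({5: 0.2, 15: 0.9, 16: 0.1, 40: 0.5}, regions))
```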
- Although we have found the rate of mixed infection in our clinical dataset of Malawian isolates to be relatively high (around 10%), this is still likely to be lower than the true rate of mixed infection, as only sputum samples were taken and many were subcultured.
- mixed infection.
- Nine individuals with mixed infections based on heterozygous sites but not with the Bayesian clustering method were excluded.
- … in a sample will be more evident, further increasing the number of mixed infections identified.
- Lineage, total number of SNPs, number of heterozygous sites and the mixture analysis result for both Bayesian clustering and heterozygous sites approaches are included for each sample.
- Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission.
- Molecular detection of mixed infections of Mycobacterium tuberculosis strains in sputum samples from patients in Karonga District, Malawi.
- Evidence of exogenous reinfection and mixed infection with more than one strain of Mycobacterium tuberculosis among Spanish HIV-infected inmates.
- Mixed infection and clonal representativeness of a single sputum sample in tuberculosis patients from a penitentiary hospital in Georgia.
- Deep whole-genome sequencing to detect mixed infection of Mycobacterium tuberculosis
