Colombia, an unknown genetic diversity in the era of Big Data



- Background: Latin America harbors some of the most biodiverse countries in the world, including Colombia.
- Despite the increasing use of cutting-edge technologies in genomics and bioinformatics in several biological science fields around the world, the region has fallen behind in the inclusion of these approaches in biodiversity studies.
- We aimed to determine how much of the Colombian biodiversity is contained in genetic data stored in these public databases and how much of this information has been generated by national institutions.
- Results: In Nucleotide, we found that 66.84% of total records for Colombia have been published at the national level, and this data represents less than 5% of the total number of species reported for the country.
- This number of species reported for Colombia spans approximately 0.46% of the total biodiversity reported for the country (56,343 species).
- Finally, in the PATRIC database, 13.25% of the reported sequences were contributed by national institutions.
- Conclusions: Our findings show gaps in the representation of the Colombian biodiversity at the molecular and genetic levels in widely consulted public databases.
- This fact should be taken as an opportunity to foster the development of collaborative projects between research groups in the Latin American region to study the vast biodiversity of these countries using ‘omics’ technologies.
- 1 Bioinformatics Unit, Centro de Bioinformática y Biología Computacional de Colombia – BIOS, Manizales, Colombia.
- Full list of author information is available at the end of the article.
- 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0.
- Colombia is one of the top countries that harbor the greatest diversity worldwide, due to high species richness for various taxonomic groups [1, 2].
- study the philopatry of species, distribution and local adaptations by comparing neutral or conserved variations in the genome [13], as well as generate animal and plant breeding programs based on genetic markers [14].
- Some of the achievements of both molecular biology and bioinformatics include functional genomics, where it is possible to study genes, proteins, and protein function, gene and protein expression in a cell under given conditions, 3D model generation in order to predict protein.
- In Latin America, DNA sequence information generation and bioinformatics have advanced slowly compared to other regions of the world [23].
- In this study, we aimed to determine the amount of sequencing data of the Colombian biodiversity submitted by national institutions that is available in four main genetic sequence databases: Nucleotide and BioProject of the NCBI [28], the Pathosystems Resource Integration Center (PATRIC) bacterial bioinformatics database [29], and Barcode of Life Data (BOLD) Systems [30].
- Furthermore, in order to obtain a broad and comparative view of the status of genetic diversity knowledge generation in Latin America, we compared this information for Colombia with other countries of high biodiversity in the region.
- We determined the level of representation of the Colombian biodiversity in the molecular data stored in public genetic sequence databases, and compared these findings with data for Latin American countries such as Argentina, Brazil, Costa Rica, Mexico, and Peru, as these countries also harbor high biodiversity.
- Searches in the Nucleotide and BioProject databases were carried out using the Entrez Direct utility on the UNIX command line, which allowed data retrieval and formatting to generate customized downloads.
- esearch -db nucleotide -query "Name of the country"
- esearch -db bioproject -query "Name of the country"
- Once the records for each country were retrieved from each of the databases, the data processing step involved counting the entries per country, using customized scripts written in the awk programming language.
- We considered an entry to be published at the national level if the name of a national entity was mentioned in the respective search field in the databases.
- unknown collections were not taken into account, because of the lack of information about their origin.
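The counting and classification step described above can be sketched in Python instead of awk. This is a minimal illustration under stated assumptions, not the script used in the study: the record keys (`country`, `submitter`), the `NATIONAL_INSTITUTIONS` set, and the sample rows are all hypothetical, and it is assumed here that entries of unknown origin are dropped from both counts.

```python
from collections import Counter

# Hypothetical set of national entities; the study matched institution
# names against the relevant search field of each database.
NATIONAL_INSTITUTIONS = {
    "Universidad de los Andes",
    "Universidad Nacional de Colombia",
}


def count_entries(records, national_institutions):
    """Count total and nationally published entries per country.

    Each record is a dict with the hypothetical keys 'country' and
    'submitter'. Records with an empty or 'unknown' submitter are skipped
    (an assumption about how unknown collections were excluded).
    """
    totals = Counter()
    national = Counter()
    for record in records:
        submitter = (record.get("submitter") or "").strip()
        if not submitter or submitter.lower() == "unknown":
            continue  # origin cannot be established
        country = record["country"]
        totals[country] += 1
        # An entry counts as national if a national entity is named.
        if any(name.lower() in submitter.lower() for name in national_institutions):
            national[country] += 1
    return totals, national


# Made-up sample rows, for illustration only.
sample = [
    {"country": "Colombia", "submitter": "Universidad de los Andes, Bogota"},
    {"country": "Colombia", "submitter": "Joint Genome Institute"},
    {"country": "Colombia", "submitter": "unknown"},
]
totals, national = count_entries(sample, NATIONAL_INSTITUTIONS)
print(totals["Colombia"], national["Colombia"])
```

Substring matching on institution names is only one possible criterion; a curated mapping of name variants would likely be needed for real submitter fields.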
- For the search term “Colombia”, 479,319 total records were found, with 320,420 sequences (66.84%) published in the country.
- Among these, 253,006 entries belong to the transcriptome assembly project titled “Transcriptome analysis of the Caribbean reef-building coral Pseudodiploria strigosa reveals a complex immune repertoire” [31].
- Among the 67 national institutes identified, Universidad de Sao Paulo has provided most of the records (70.13.
- Nearly 3100 species were identified in the total records, of which the best-represented organisms were uncultured bacteria (108,611 records), Acinetobacter baumannii (22,319 records), Klebsiella pneumoniae (21,559 records), Enterococcus faecalis (15,136 records), and Escherichia coli (9941 records).
- For Mexico, 157,797 records out of 1,349,367 were submitted by 63 national institutions, representing 11.69% of the total records.
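The percentages in this section are the ratio of nationally submitted records to total records. As a quick check, the following Python sketch recomputes the Mexican figure from the counts given in the text; the helper name `national_share` is hypothetical and not taken from the article.

```python
def national_share(national_records: int, total_records: int) -> float:
    """Return the percentage of records submitted by national institutions."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * national_records / total_records


# Counts for Mexico in the Nucleotide database, as reported in the text.
mexico = national_share(157_797, 1_349_367)
print(f"{mexico:.2f}%")  # 11.69%
```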
- Argentina showed the highest number of records retrieved in the search, where 57 national institutes have submitted 4.72% of the records.
- Overall, eight institutes have submitted data for 705 species, led by the Instituto Nacional de Biodiversidad, publishing most of the records (3880).
- Finally, 1.01% of the total records for Peru (645,753) have been deposited by national institutes.
- Fig. 1 Main Colombian institutes that submit data to the Nucleotide (NCBI) database (Release 219.0 of April 15 of 2017).
- Table 2 shows the percentages of mammal, bird, reptile, amphibian, and vascular plant species representation based on nationally submitted genetic data available for each country in the Nucleotide database compared to referenced species diversity values.
- For the search term “Colombia”, 193 records were found, of which have been reported by Colombian institutions..
- For Brazil, of the total 558 records reported, 317 have been generated by national institutions, representing 56.8%.
- Costa Rica shows the fewest records (40) in the BioProject database, and only three have been published by national institutions.
- representation at a national level for each country compared to the referenced values of species diversity shown.
- Interestingly, the Joint Genome Institute (JGI) (USA) has provided the majority of records (25%) available for Costa Rica, through data of the termite gut metagenome (BioProject accession: PRJNA .
- represented for the different countries.
- The search term “Colombia” retrieved a total 6457 entries, with submitted by national institutions.
- Universidad de los Andes and the Instituto de Investigación de Recursos Biológicos Alexander von Humboldt have provided the majority of records for the country, while there were no submissions found from institutes of the Eje Cafetero region.
- Fig. 2 Colombian institutes that submit data to the BioProject (NCBI) database (Consulted June of 2017).
- sequences than any of the other countries surveyed, and these records represent 92.2% of the total records for the country.
- Likewise, Argentina has a high representation of insect barcode records (79,298), which represent 88.1% of the national total records, and also shows the highest number of national records for bird barcode sequences.
- Fig. 3 Number of records for each taxonomic supergroup in the BioProject (NCBI) database (Consulted June of 2017) for six Latin American countries surveyed.
- Overall, compared to the other countries, except for Peru, Colombia shows a low data availability in this database.
- “Colombia”, with 44 records published by national.
- Universidad de Sao Paulo, with 128 records, has provided most of the records, followed by Universidad Federal de Para, Universidad Estatal de Campinas, and
- Table 3 Number of DNA barcode sequence records deposited at a national level in BOLD Systems database (Consulted June of 2017), classified by taxonomic group, for six Latin American countries surveyed.
- Fig. 5 Colombian institutes that submit data to the PATRIC database (Consulted June of 2017).
- The best-represented organisms belong to the bacterial genera Acinetobacter and Streptococcus.
- Data belonging to Rhizobium represent 30% of the total of records.
- For Argentina, 487 records were obtained, and national institutions have contributed 9.65% of the data.
- However, for Colombia, approximately 78% of national records belong to a single transcriptomic study, also referenced in the Bioproject database [Accession:.
- This is a clear example of the relevance of databases such as BioProject, since it can gather all information of a project in a single place, allowing easy access to all of the data generated in the project without the need to search individually for sequences, and avoiding missing information.
- Finally, of the total biodiversity values reported in Biodiversity Information System (BIS) for Colombia, less than 5% of the Colombian species richness is represented at a molecular level in this database, even considering microbial genetic data [7].
- Colombia’s history has likely played a role in limiting molecular data generation, due to diverse factors.
- One of them relates to the permits on access to genetic resources.
- In Colombia, genetic resources are property of the state; they are inalienable, imprescriptible and non-releasable, and access to them is regulated by Andean Decision 391 (“Régimen Común sobre Acceso a los Recursos Genéticos”), whereby whoever wishes to access them in the form of genes or derived products, according to the terms established in Decision 391, must request authorization from the state [37].
- A study carried out by Quintero et al. in 2013 [14] regarding the status of access to genetic resources for Colombian research groups registered in Colciencias found that numerous research projects were being carried out without the necessary licenses, suggesting that a high percentage of national research was conducted illegally under the framework of the Colombian legislation.
- Also, the study mentioned that there was no relation between contract times and the time established to carry out the research, which generates a negative perception among both national and international scientists about undertaking studies in the Colombian territory.
- In Colombia, there are several first-, second-, and third-generation sequencers located in both public and private centers, including Universidad de los Andes, Instituto de Genética of the Universidad Nacional de Colombia, CorpoGen, International Center for Tropical Agriculture (CIAT), Corporación Colombiana de Investigación Agropecuaria (Corpoica), Universidad El Bosque, Universidad EAFIT, and Corporación para Investigaciones Biológicas (CIB), among others.
- In Colombia, interest in biotechnology, bioinformatics, and the “omics sciences” has risen in the last decade.
- However, support by the government is often one of the limiting factors for the development of related projects, because scarce funding is contested by a large number of national researchers.
- Nevertheless, in 2016, this investment decreased to demonstrating that the science and research sector has yet to become one of the national priorities.
- However, due to the lack of financial resources dedicated to the STI sector in the country, which have always been less than 0.8% of the GDP [43], there are not many research projects that can benefit from this governmental aid.
- One of those is the post-conflict scenario, an outcome of the peace agreement, in which some areas that were once considered dangerous due to armed forces have now been declared free zones, providing an opportunity to explore these places that have been off-limits for years.
- This endeavour is carried out by a large number of Colombian research centers and academia in collaboration with renowned international institutes, such as Kew Gardens in the UK.
- In addition, another program, namely BRIDGE Colombia [19], is also taking advantage of the new scenario, proposing to explore broader geographic areas and aiming to increase our biodiversity figures, including genetic and taxonomic studies.
- Colombia has a better species representation in the Nucleotide NCBI database in comparison to other Latin American countries such as Costa Rica and Peru (Table 2).
- Yet, for Colombia, most of the records belong to a single project for the species Pseudodiploria strigosa.
- During the last decade, some disciplines derived from molecular and genetic data have been gaining strength in Latin American countries, mainly because of the benefits they can bring commercially.
- From 2005 to 2015, R&D investment in Latin America doubled, with Argentina, Brazil, and Mexico accounting for 91% of the total.
- Meanwhile, in the science sector, the country has gone through a funding issue where the investment is exclusively targeted towards research and not towards hiring researchers.
- Overall, our findings show that the Colombian genetic biodiversity is under-represented in the consulted databases and that a large number of records have been published by international centers and institutions.
- Furthermore, the existing conflicts and obstacles in the granting of contracts of access to genetic resources can greatly limit the publication of genetic information generated in the country.
- In order to improve collaborative work, it would be necessary to raise greater interest and knowledge in the scientific and general community regarding the importance of describing our species, mostly at the molecular level.
- This will represent new resources for us as a society that can enable us to advance economically, for instance, to be used in biotechnology, or in favor of the conservation of our biodiversity, which is also a priority for sustainable development.
- Projects where one party can contribute skilled personnel, another equipment, and another a regulatory framework that helps with obtaining and releasing data will foster collaboration in the country and benefit all parties associated, and progressively we could gain better academic and industrial relationships with developed countries.
- This fact shows how the region has lagged behind in development of NGS and bioinformatics technologies to study the large number of species that are only present in this region of the world.
- To accomplish this, a greater number of collaborative efforts between Latin American and international scientists in regional and worldwide partnerships must be developed to promote research and publications in genomics fields and related science fields in the region.
- Funding for this research was provided by the Centro de Bioinformática y Biología Computacional de Colombia – BIOS, Manizales, Colombia and the Caldas Bioregión Royalities Project of the Governorate de Caldas.
- Publication charges were provided by the Centro de Bioinformática y Biología Computacional de Colombia – BIOS, Manizales, Colombia.
- The datasets supporting the conclusions of this article are available in the Genbank (release 219.0), BioProject, BOLD Systems, and PATRIC repositories, [https://www.ncbi.nlm.nih.gov/genbank/release/219/, https://www.ncbi.nlm.nih.gov/.
- The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/.
- Revision of the status of bird species occurring or reported in Colombia 2016 and assessment of BirdLife International’s new parrot taxonomy.
- Metagenomic analysis indicates that stressors induce production of herpes-like viruses in the coral Porites compressa.
- A review of the application of molecular genetics for fisheries management and conservation of sharks and rays.
- Database resources of the National Center for Biotechnology Information.
- The immunotranscriptome of the Caribbean reef-building coral Pseudodiploria strigosa.
- Transcriptome analysis of the entomopathogenic fungus Beauveria bassiana grown on cuticular extracts of the coffee berry borer.
