
Protein sequences


Found 20+ results for the keyword "Protein sequences"

A context-free encoding scheme of protein sequences for predicting antigenicity of diverse influenza A viruses

tailieu.vn

CFreeEnS, based on published similarity matrices of amino acids, can capture the most important properties regarding the similarity of sequence pairs without designing features case by case. … l) of the two sequences, respectively. … stands for a gap in the aligned protein sequences. Therefore, we assume that the HA1 protein sequences of the existing influenza viruses are available.
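As a rough illustration of encoding an aligned sequence pair column by column with an amino acid similarity matrix, the sketch below looks up each aligned pair of residues in a substitution matrix and assigns a fixed value to columns that contain a gap. The matrix entries, the gap character `-` and the gap value are made-up placeholders (not BLOSUM or any published matrix), and `encode_aligned_pair` is a hypothetical name; this is not the CFreeEnS implementation.

```python
from typing import Dict, List, Tuple

# Toy, made-up similarity scores for a few residue pairs (NOT a published matrix).
TOY_MATRIX: Dict[Tuple[str, str], float] = {
    ("A", "A"): 4.0, ("A", "G"): 0.0, ("G", "G"): 6.0,
    ("K", "K"): 5.0, ("K", "R"): 2.0, ("R", "R"): 5.0,
}

GAP = "-"          # assumed gap character in the alignment
GAP_VALUE = -4.0   # hypothetical value for any column that contains a gap


def pair_score(a: str, b: str, matrix: Dict[Tuple[str, str], float]) -> float:
    """Look up a symmetric score; each unordered pair is stored only once."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if the pair is missing entirely


def encode_aligned_pair(seq1: str, seq2: str,
                        matrix: Dict[Tuple[str, str], float] = TOY_MATRIX) -> List[float]:
    """Encode two aligned, equal-length sequences as one score per column."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    features = []
    for a, b in zip(seq1, seq2):
        if a == GAP or b == GAP:
            features.append(GAP_VALUE)
        else:
            features.append(pair_score(a, b, matrix))
    return features


print(encode_aligned_pair("AKG-", "GRGK"))  # → [0.0, 2.0, 6.0, -4.0]
```

Each column is scored independently of its neighbours, so no hand-designed, context-specific features are needed; swapping in a real published matrix only changes `TOY_MATRIX`.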

Assessment of databases to determine the validity of β- and γ-carbonic anhydrase sequences from vertebrates

tailieu.vn

The BLASTP program from the NCBI database identified β-CA protein sequences from some vertebrates, including Lipotes vexillifer (XP…), Pantholops hodgsonii (XP…), Homo sapiens (SJM31717.1), and Oncorhynchus tshawytscha (XP_…). In addition, the TBLASTN program of the Ensembl genome browser (release 95) identified the genomic location of a β-CA gene in M. … The aforementioned methods identified γ-CA protein sequences from some vertebrates, including L. …

Improving protein domain classification for third-generation sequencing reads using deep learning

tailieu.vn

In the 3-branch model, each branch models one translated protein sequence separately until the merging layer placed right before the two-layer classifier. In contrast, in our 3-frame encoding, all three translated protein sequences were processed together and combined by the 3-channel convolution filter in the first convolutional layer.
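A minimal sketch of the 3-frame idea, assuming forward-strand frames only, the standard genetic code, and a 21-symbol one-hot alphabet (20 amino acids plus `*` for stop): the read is translated in frames 0, 1 and 2, each translation is one-hot encoded and zero-padded to a common length, and the three matrices are stacked as channels. The function names are hypothetical, and the paper's exact alphabet and padding scheme may differ.

```python
from itertools import product
from typing import List

# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One-hot alphabet: 20 amino acids plus the stop symbol (assumed choice).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY*"


def translate(dna: str) -> str:
    """Translate codon by codon; codons with unknown bases (e.g. N) become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def three_frame_translations(read: str) -> List[str]:
    """Translate the read in forward frames 0, 1 and 2."""
    return [translate(read[offset:]) for offset in range(3)]


def one_hot(protein: str, length: int) -> List[List[int]]:
    """One-hot encode, zero-padded to `length`; 'X' and padding give all-zero rows."""
    rows = []
    for pos in range(length):
        row = [0] * len(ALPHABET)
        if pos < len(protein) and protein[pos] in ALPHABET:
            row[ALPHABET.index(protein[pos])] = 1
        rows.append(row)
    return rows


def three_frame_encoding(read: str) -> List[List[List[int]]]:
    """Stack the three one-hot translations as channels: [frame][position][residue]."""
    proteins = three_frame_translations(read)
    length = max(len(p) for p in proteins)
    return [one_hot(p, length) for p in proteins]


print(three_frame_translations("ATGGCC"))  # → ['MA', 'W', 'G']
```

In a deep learning framework, this `[channel][position][feature]` tensor would be fed to a first convolutional layer with three input channels, so a single filter sees all three frames at once. The 3-branch alternative would instead pass each frame through its own sub-network.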

Complete genome sequences of Streptomyces spp. isolated from disease-suppressive soils

tailieu.vn

This analysis is only expected to reveal the correct sequence variant when (i) the indel is present within a coding DNA sequence (CDS), (ii) correct protein sequences of close homologs are present in GenBank, and (iii) the 300-base-pair window that is searched is narrow enough that the top BLAST hits align to the translated query in the region of the variant locus (i.e., at the center of the query, not at the edges). General characteristics of the genome sequences.

ExUTR: A novel pipeline for large-scale prediction of 3′-UTR sequences from NGS data

tailieu.vn

Although ExUTR enables genome-wide prediction of 3′-UTR sequences from large-scale RNA-Seq data without well-assembled and well-annotated genomes, it does depend on genomic resources, particularly well-defined protein sequences. However, ExUTR takes only well-assembled transcripts as input and outputs the corresponding predicted 3′-UTR candidates; it cannot automatically report alternative 3′-UTR sequences.

ProTstab – predictor for cellular protein stability

tailieu.vn

These tools are based on different principles, including amino acid sequences [1, 2], protein chain lengths [3, 4], physicochemical features [5], … Some of the predictions are rather simple to calculate, such as the lengths of protein sequences. A substantially larger number of prediction methods forecast the effects of single amino acid substitutions on protein stability.

Analysis of Theileria orientalis draft genome sequences reveals potential species-level divergence of the Ikeda, Chitose and Buffeli genotypes

tailieu.vn

Predicted protein sequences were examined with blastp-fast (e-value cutoff 1e-5) against a local version of the non-redundant protein database (downloaded …). To characterise T. orientalis genes that are present in the sequences generated by this study but absent from the Shintoku genome sequence, translated Robertson, Fish Creek and Goon Nure proteins …
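If BLAST hits are saved in the BLAST+ tabular format (`-outfmt 6`, whose default columns end with `evalue` and `bitscore`), an e-value cutoff such as 1e-5 can also be checked or re-applied afterwards. The sketch below parses such lines and keeps hits at or below the cutoff; the function names and the idea of post-filtering are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn tab-separated BLAST lines into dicts keyed by column name."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        hits.append(dict(zip(COLUMNS, line.split("\t"))))
    return hits


def filter_by_evalue(hits: List[Dict[str, str]], cutoff: float = 1e-5) -> List[Dict[str, str]]:
    """Keep hits whose e-value is at or below the cutoff."""
    return [h for h in hits if float(h["evalue"]) <= cutoff]


example = [
    "prot1\tsubjA\t85.2\t300\t40\t2\t1\t300\t5\t304\t1e-30\t520",
    "prot2\tsubjB\t30.1\t80\t50\t3\t10\t90\t12\t92\t0.002\t35.0",
]
kept = filter_by_evalue(parse_outfmt6(example))
print([h["qseqid"] for h in kept])  # → ['prot1']
```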

ReVac: A reverse vaccinology computational pipeline for prioritization of prokaryotic protein vaccine candidates

tailieu.vn

This component is an adaptation of the one above that runs against a non-redundant database of a commensal organism of choice. … [45], which looks for Simple Sequence Repeats of up to 10 base pairs in length in DNA coding sequences and in the 500 bp upstream of the gene. … 0.01 times the total length of the repeat. The above script was adapted to run on protein sequences, looking for repeats of up to 20 amino acids in length, which could allow conformational changes in the protein.
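A minimal sketch of scanning a protein sequence for exact tandem repeats with unit lengths from 1 up to 20 residues. The minimum copy number (3) is a hypothetical threshold, and the sketch ignores the mismatch tolerance suggested by the "0.01 times the total length" fragment, which is unclear in the snippet. `find_tandem_repeats` is a made-up name, not the script cited as [45].

```python
from typing import List, Tuple


def find_tandem_repeats(seq: str, max_unit: int = 20,
                        min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for exact tandem repeats of unit length 1..max_unit."""
    found = []
    n = len(seq)
    for k in range(1, max_unit + 1):
        i = 0
        while i + k <= n:
            unit = seq[i:i + k]
            copies = 1
            # Extend while the next k residues repeat the same unit exactly.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                found.append((i, unit, copies))
                i += copies * k  # jump past this repeat run
            else:
                i += 1
    return found


print(find_tandem_repeats("MKAKAKAL", max_unit=2))  # → [(1, 'KA', 3)]
```

The same region can be reported at several unit lengths (for example, a poly-Q run also appears as "QQ" repeats), so a real pipeline would likely collapse such overlaps.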

Comprehensive genome-wide identification of angiosperm upstream ORFs with peptide sequences conserved in various taxonomic ranges using a novel pipeline, ESUCA

tailieu.vn

One major problem in identifying CPuORFs is that a uORF found in the 5′-UTR of a transcript is sometimes fused to the mORF in an isoform of the transcript, and in some of these cases, such uORF sequences are conserved because they actually encode parts of the mORF-encoded protein sequence. Such an ORF can be extracted as a CPuORF if the amino acid sequence in the N-terminal region of the protein is evolutionarily conserved.

A new estimation of protein-level false discovery rate

tailieu.vn

In this framework, we permute protein sequences and search the dataset against these fake sequences to obtain the corresponding protein-level null distribution before calculating p-values and the FDR. More importantly, once the null (permutation) distribution is available, we can calculate p-values and the FDR without searching a decoy database.
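A rough sketch of the permutation idea under simplifying assumptions: each protein is shuffled residue by residue to make "fake" sequences, a scoring callable (a stand-in for the real protein-level search score) is applied to them to build a null distribution, empirical p-values are computed against that null, and Benjamini-Hochberg is used to turn p-values into FDR estimates. Benjamini-Hochberg is a swapped-in standard procedure; the paper's own FDR estimator may differ, and all function names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permute_sequence(seq: str, rng: random.Random) -> str:
    """Shuffle residues, keeping length and amino acid composition."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def null_scores(proteins: Sequence[str], score: Callable[[str], float],
                n_perm: int = 100, seed: int = 0) -> List[float]:
    """Score permuted ('fake') versions of every protein to build a null distribution."""
    rng = random.Random(seed)
    return [score(permute_sequence(seq, rng))
            for _ in range(n_perm) for seq in proteins]


def empirical_pvalues(scores: Sequence[float], null: Sequence[float]) -> List[float]:
    """p = (1 + number of null scores >= observed) / (1 + number of null scores)."""
    n = len(null)
    return [(1 + sum(1 for x in null if x >= s)) / (n + 1) for s in scores]


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, keeping the q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


print(empirical_pvalues([5, 0], [1, 2, 3]))  # → [0.25, 1.0]
```

Because the null is built once from permuted sequences, new observed scores can be converted to p-values and FDR estimates without a separate decoy-database search, which is the advantage the snippet highlights.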

Genome-wide identification and expression profile under abiotic stress of the barley non-specific lipid transfer protein gene family and its Qingke Orthologues

tailieu.vn

First, 109 amino acid sequences were obtained from a BLASTP analysis on the IPK Barley BLAST Server, using previously reported nsLTP protein sequences of Arabidopsis (79), maize (63), cabbage (63) and rice (77) as queries (Table S1), and redundancy was checked. Then, each of the deduced protein sequences was manually assessed by analysing the eight-cysteine motif (8CM), and 107 proteins lacking the Cys residues were omitted from the …

A partially function-to-topic model for protein function prediction

tailieu.vn

First, the text dataset consists of several documents numbered D1 to Dn, and the protein function dataset consists of several protein sequences numbered P1 to Pn. … ‘table’ and ‘database’. Amino acid blocks, such as ‘MS’ and ‘TS’, are the main components of a protein sequence. Likewise, for the protein function data, an amino acid block-protein sequence matrix is computed to construct the protein bag of words (BoW).
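To make the block-protein matrix concrete, the sketch below treats overlapping two-residue windows as the "amino acid blocks" (an assumption; the paper may define blocks differently) and counts how often each block occurs in each protein. Rows are blocks, columns are proteins, which mirrors a term-document matrix in text BoW. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def blocks(seq: str, k: int = 2) -> List[str]:
    """Split a sequence into overlapping k-residue blocks (assumed block definition)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def block_protein_matrix(proteins: Dict[str, str],
                         k: int = 2) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (block vocabulary, protein IDs, counts[block][protein])."""
    ids = list(proteins)
    counts = {pid: Counter(blocks(proteins[pid], k)) for pid in ids}
    vocab = sorted(set().union(*counts.values()))
    matrix = [[counts[pid][b] for pid in ids] for b in vocab]
    return vocab, ids, matrix


vocab, ids, matrix = block_protein_matrix({"P1": "MSTS", "P2": "TSMS"})
print(vocab)   # → ['MS', 'SM', 'ST', 'TS']
print(matrix)  # → [[1, 1], [0, 1], [1, 0], [1, 1]]
```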

In-depth comparative analysis of malaria parasite genomes reveals protein-coding genes linked to human disease in Plasmodium falciparum genome

tailieu.vn

Each protein sequence from the six Plasmodium species was used as a query and searched with phmmer against the complete set of protein sequences of these species. … A further decrease in the threshold led to a slight increase in the number of components. An increase in the cut-off value from 0.4 to 0.5 led to a significant drop in the number of clusters, implying that many cluster structures were not well identified.

Genome-wide identification of lysin motif containing protein family genes in eight rosaceae species, and expression analysis in response to pathogenic fungus Botryosphaeria dothidea in Chinese white pear

tailieu.vn

After removing redundant and incomplete gene sequences, the longest transcript of each gene was retained. The PbrLYP genes showed a random distribution on eight of the 17 chromosomes and three unanchored scaffolds (scaffolds681.0, scaffolds831.0, and scaffolds897.0) in pear (Supplementary Fig. …). Phylogenetic analyses of the LYP protein sequences were performed to classify the LYP genes and investigate their evolutionary relationships.
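The filtering step can be sketched as follows, assuming each record is a `(gene_id, transcript_id, protein_sequence)` tuple. The completeness rule used here (starts with Met, no internal stop codon) is a hypothetical stand-in for whatever criteria the authors applied, and the function names are illustrative.

```python
from typing import Dict, Iterable, Tuple


def is_complete(protein: str) -> bool:
    """Hypothetical completeness check: starts with M and has no internal stop ('*')."""
    body = protein[:-1] if protein.endswith("*") else protein
    return protein.startswith("M") and "*" not in body


def longest_per_gene(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Keep the longest complete (transcript_id, sequence) per gene; first one wins ties."""
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, transcript_id, seq in records:
        if not is_complete(seq):
            continue  # drop incomplete sequences before choosing the longest
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return best


records = [("g1", "t1", "MKV"), ("g1", "t2", "MKVLL"), ("g2", "t3", "KVL"), ("g2", "t4", "MA")]
print(longest_per_gene(records))  # → {'g1': ('t2', 'MKVLL'), 'g2': ('t4', 'MA')}
```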

Protein targets of thiazolidinone derivatives in Toxoplasma gondii and insights into their binding to ROP18

tailieu.vn

Alignment of the protein sequences of TgPDI and the A chain of hPDI. Alignment of the protein sequences of TgRNR2 and the A chain of PvRNR2. Representation of the motions explained by the 1st (a), 2nd (b), 3rd (c), and 4th (d) PC modes obtained from the concatenated PCA. The authors EC, AH, JEG, DM, CRR and LP participated in the design of the research. Flexibility of the P-loop of Pim-1 kinase: observation of a novel conformation induced by interaction with an inhibitor.

A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms

tailieu.vn

G3PO is a new gene prediction benchmark containing 1793 orthologous sequences from 20 different protein families, designed to be as representative of the living world as possible. The quality of the protein sequences in the benchmark was ensured by excluding sequences containing potential annotation errors, including deletions, insertions and mismatched segments.

Parallel identification of novel antimicrobial peptide sequences from multiple anuran species by targeted DNA sequencing

tailieu.vn

Thirteen of the peptides (10% of the total) showed significant BLAST e-values but < 80% identity with AMPs deposited in protein sequence databases (see Table 3 and Additional file 7).

In silico characterization of multiple genes encoding the GP63 virulence protein from Leishmania braziliensis: Identification of sources of variation and putative roles in immune evasion

tailieu.vn

All of the protein sequences derived from genes that met these inclusion criteria were analysed with the OrthoMCL program [48] and grouped by homology using the Markov Cluster algorithm [49]. Eight of the most variable paralogs from different L. … Specific regions of the protein models were then evaluated using the initial alignment information, highlighting the non-conserved regions, which were characterized by amino acid substitutions.

Cloning a lysine rich protein gene from potato cultivar Thuong Tin and construction of the expression vector

tailieu.vn

STtLR gene and protein sequence analyses. The STtLR gene showed 94% and 99% similarity to the genes encoding potato lysine-rich proteins registered in GenBank under accession numbers AY377987.1 and KU987257.1, respectively. The SmartBLAST search revealed high similarity between the STtLR protein and a lysine-rich protein from Solanum tuberosum (Accession No. …).

Genome-wide analysis of the serine carboxypeptidase-like protein family in Triticum aestivum reveals TaSCPL184-6D is involved in abiotic stress response

tailieu.vn

We downloaded the protein sequences from the Ensembl Plants database (http://plants.ensembl.org/index.html) [66] and obtained the Hidden Markov Model (HMM) profile of the SCPL conserved domain (PF00450) from Pfam (https://pfam.xfam.org) [67–69]. … After this verification, all of the candidate SCPL genes were identified in the wheat genome.