Databases @ CAPS:

STIFDB-Arabidopsis Stress Responsive Transcription Factor DataBase.

STIFDB is a database of abiotic stress responsive genes and their predicted abiotic stress responsive transcription factor binding sites in Arabidopsis thaliana. We integrated 2269 genes upregulated in different stress-related microarray experiments and surveyed their 1000 bp and 100 bp upstream regions and 5'UTR regions using the STIF algorithm. The putative abiotic stress responsive transcription factor binding sites identified in this survey are compiled in STIFDB. STIFDB provides extensive information about various stress responsive genes and stress inducible transcription factors of Arabidopsis thaliana. It will be a useful resource for researchers seeking to understand the abiotic stress regulome and transcriptome of this important model plant system.
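The upstream windows mentioned above can be illustrated with a minimal Python sketch. This is not the STIF algorithm: the function names, the 0-based coordinate convention, and the exact-match motif scan are all assumptions made for illustration, and a real analysis would use annotated gene coordinates and a proper binding site model.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def upstream_region(chrom_seq, tss, strand, length):
    """Return up to `length` bases immediately upstream of a gene start.

    Assumption: `tss` is the 0-based index of the first transcribed base
    on the forward-strand chromosome sequence `chrom_seq`.
    """
    if strand == "+":
        start = max(0, tss - length)  # truncate at the chromosome start
        return chrom_seq[start:tss].upper()
    if strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")


def find_motif(region, motif):
    """Return all start positions of exact (possibly overlapping) motif matches.

    Hypothetical stand-in for a binding site search; exact matching only.
    """
    positions = []
    start = region.find(motif)
    while start != -1:
        positions.append(start)
        start = region.find(motif, start + 1)
    return positions
```

For a given gene, calling `upstream_region` with `length=1000` and `length=100` would yield the two window sizes described above, and each window could then be passed to whatever site-detection method is in use.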

3PFDB-a database of best representative PSSM profiles (BRPs) of protein families generated using a novel data mining approach.

We designed a novel data mining approach that assesses individual sequences from a protein family to identify a single Best Representative PSSM profile (BRP) per protein family. Using this approach, we developed 3PFDB, a database of protein family-specific best representative PSSM profiles. PSSM profiles in 3PFDB are curated using the performance of individual sequences as a reference in a rigorous scoring and coverage analysis with FASSM. We assessed the suitability of 1,085,588 sequences derived from seed or full alignments reported in the Pfam database (Version 22). Coverage analysis using the FASSM method serves as the filtering step to identify the best representative sequence, starting from full-length or domain sequences, from which the final profile for a given family is generated. 3PFDB is a collection of best representative PSSM profiles of 8,524 protein families from the Pfam database.
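For readers unfamiliar with PSSM profiles, the following Python sketch shows what such a profile is in its simplest form: a per-column table of log-odds scores built from an alignment. It is only an illustration under stated assumptions (uniform background frequencies, a fixed pseudocount, ungapped scoring); it is not the 3PFDB procedure or FASSM, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Build a simple log-odds PSSM from equal-length aligned sequences.

    Assumptions: uniform background unless one is supplied; gap and
    non-standard characters are ignored when counting.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in background)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Sum per-position scores for an ungapped sequence against the profile."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, seq))
```

A sequence that matches the conserved columns of the alignment scores higher than an unrelated one, which is the property that profile-based searches rely on.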

MegaMotifbase [Structural Motifs Database]

MegaMotifbase is a database of structural motifs for protein structures related at the family and/or superfamily level. Such motifs among structurally aligned proteins are recognized by the conservation of amino acid preference and solvent inaccessibility, and are examined for the conservation of other important structural features such as secondary structural content, hydrogen bonding pattern and residue packing. These motifs may form the common core by maintaining a particular spatial orientation pattern when compared across different proteins belonging to the same family or superfamily. Such motifs can also be employed to design and rationalize protein engineering and folding experiments. MegaMotifbase can therefore be a useful resource for understanding the structure-function relationships of proteins.

SMoS [Structural Motifs of Superfamilies]

SMoS is a database of structural motifs of aligned protein domain superfamilies. These structural motifs, along with their sequence and spatial orientation, represent the conserved core structure of each superfamily fold.

SUPFAM

Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure.

iMOTdb: Spatially Interacting Motifs in Proteins

The interacting motif database, or iMOTdb, lists interacting motifs identified for all structural entries in the PDB. The conserved patterns, or fingerprints, are identified for individual structural entries and are also grouped together to report the common motifs shared among all superfamily members. The iMOT package (Bhaduri et al., 2004) has been employed to identify the motifs in the database. Interacting motifs have been shown to assist our understanding of protein structure and function, and information on such motifs should be valuable in protein folding, modeling and engineering experiments. As shown in previous studies, conserved spatially interacting motifs act as important constraints in pattern-based remote homology search methods (Bhaduri et al., 2004). The interacting motifs representing the superfamilies of proteins are derived from structural alignments obtained from PASS2 (Bhaduri et al., 2004). These motifs are fingerprints for a given protein family and provide useful insights into the structural and functional roles of the protein. The pseudo potential evaluated between the various pairs of motifs reflects the strength of interaction between the regions and highlights the thermodynamic stability of the local substructure. The database should thus provide useful insight into folding and structural modeling, and help in planning mutational experiments on a given polypeptide.

DDBASE [Protein domain database]

A database of globular domains, derived from a non-redundant set of proteins, is useful for the sequence analysis of aligned domains, for structural comparisons, for understanding domain stability and flexibility, and for fold recognition procedures. Domains are defined by the program DIAL and classified structurally using the procedure SEA.

DSDBASE [Protein disulphide database]

DSDBASE is a database of disulphide bonds in proteins that provides information on native disulphides and also on those that are stereochemically possible between pairs of residues in a protein. This database has potential applications in protein engineering and in structural biology.
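To give a feel for what "stereochemically possible" can mean in practice, here is a minimal Python sketch that flags residue pairs whose backbone geometry might accommodate a disulphide bridge. It is a simplified illustration, not the method used to build DSDBASE: the distance cut-offs (7.0 Å between Cα atoms, 4.5 Å between Cβ atoms) are assumed values chosen for the example, and the function names and input layout are hypothetical.

```python
import math
from itertools import combinations


def may_form_disulphide(ca1, cb1, ca2, cb2, max_ca=7.0, max_cb=4.5):
    """Crude geometric test on C-alpha and C-beta coordinates (in Angstrom).

    The default cut-offs are illustrative assumptions, not published criteria.
    """
    return math.dist(ca1, ca2) <= max_ca and math.dist(cb1, cb2) <= max_cb


def candidate_pairs(residues, max_ca=7.0, max_cb=4.5):
    """Return residue-id pairs passing the distance test.

    `residues` maps a residue id to a (ca_xyz, cb_xyz) tuple of 3D coordinates.
    """
    pairs = []
    for (id1, (ca1, cb1)), (id2, (ca2, cb2)) in combinations(sorted(residues.items()), 2):
        if may_form_disulphide(ca1, cb1, ca2, cb2, max_ca, max_cb):
            pairs.append((id1, id2))
    return pairs
```

A fuller treatment would also check side-chain torsion angles and steric clashes; the distance filter here only narrows down the candidate pairs.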

GenDiS: Genomic Distribution of Protein Structural Superfamilies

Several proteins that have substantially diverged during evolution retain similar three-dimensional structures and biological function in spite of poor sequence identity. The database on Genomic Distribution of protein structural domain Superfamilies (GenDiS) provides a record of the distribution of 4001 protein domains, organized as 1194 structural superfamilies, across 18,997 genomes at various levels of the taxonomic hierarchy. GenDiS provides a survey of protein domains listed in sequence databases, employing a 3-fold sequence search approach. Lineage-specific literature is obtained from the taxonomy database for individual protein members to provide a platform for performing genomic and phyletic studies across organisms. The database documents residual properties and provides alignments for the various superfamily members in genomes, offering insights into the rational design of experiments and a better understanding of a superfamily.

The current version contains 628 multi-member superfamilies and 566 structure-based, sequence-annotated single-member superfamilies. Sequence members for the superfamilies in different genomes have been listed and aligned. Links have been provided to the conserved interacting motifs and the hidden Markov models for the different superfamilies present in PASS2. Sequence alignment with PASS2 members is possible using MALIGN and JOY. Sequence searches against the PASS2 database using PSI-BLAST and PHI-BLAST can also be pursued, as can prediction of a superfamily based on structural compatibility. Each entry of PASS2 has been linked to other databases and tools for easy access to information. Structure-based alignments provide a firm basis for understanding and predicting amino acid substitution in superfamilies and for developing methods of fold recognition. Analysis of such sequence alignments provides a means of understanding structural and functional similarities in protein superfamilies and of interpreting additional information when structures of new members of a superfamily are determined.

Protein superfamily alignment database (PASS2):

PASS2 is an automated version of the original superfamily alignment database, CAMPASS (CAMbridge database of Protein Alignments organised as Structural Superfamilies, Sowdhamini et al., 1998). PASS2 contains alignments of protein structures at the superfamily level and is in direct correspondence with the SCOP 1.63 release (Structural Classification Of Proteins, Murzin et al., 1995).