Title: Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes.
Publication Type: Journal Article
Year of Publication: 2018
Authors: Iyer MS, Joshi AG, Sowdhamini R
Journal: Mol Omics
Date Published: 2018 Jul 04
ISSN: 2515-4184
Abstract

Domains are the basic building blocks of proteins and can combine to give rise to different domain architectures. Annotation of domains in a sequence is the first step towards understanding its biological function. Since there are a limited number of folds and evolutionarily related proteins have a similar structure, function can be inferred through remote homology. Computational sequence searches were performed for remote homologues on genomes of ∼160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies present in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied, revealing that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues, and these data were compiled for each superfamily and organism.

DOI: 10.1039/c8mo00008e
Alternate Journal: Mol Omics
PubMed ID: 29971307