| Term |
Definition |
| Family |
An InterPro family is a group of evolutionarily related proteins that
share one or more domains/repeats in common. An InterPro entry of type=family
may contain a signature for a small conserved region that is representative
of the family, and therefore need not cover the whole protein. |
| Domain |
An InterPro domain is an independent structural unit which can be found
alone or in conjunction with other domains or repeats. Domains are evolutionarily
related. An InterPro entry of type=domain is diagnostic for a domain
but does not necessarily define the domain boundaries exactly. |
| Repeat |
An InterPro repeat is a region that is not expected to fold into a globular
domain on its own. For example, 6-8 copies of the WD40 repeat are needed
to form a single globular domain. There are also many other short repeat motifs
with type=repeat that probably do not form a globular fold. |
| Post-translational modification |
Post-translational modifications include, for example, an N-glycosylation
site. The sequence motif is defined by the molecular recognition of this
region in a cell. An entry of this type may group together proteins that need
not be evolutionarily related. |
| Term |
Definition |
| Domain |
Conserved structural entities with distinctive secondary structure content
and a hydrophobic core. In small disulphide-rich and Zn2+-binding or Ca2+-binding
domains the hydrophobic core may be provided by cystines and metal
ions, respectively. Homologous domains with common functions usually show
sequence similarities. |
| Domain composition |
Proteins with the same domain composition have at least one copy of
each of the domains of the query. |
| Domain organization |
Proteins having all the domains of the query in the same order (additional
domains are allowed). |
| Motif |
Sequence motifs are short conserved regions of polypeptides. Sets of
sequence motifs do not necessarily represent homologues. |
| Profile |
A profile is a table of position-specific scores and gap penalties,
representing a homologous family, that may be used to search sequence
databases (Bork and Gibson, 1996). |
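The "Domain composition" and "Domain organization" definitions above can be read as two simple tests on lists of domain names. The following is a minimal sketch under that reading, not the implementation used by any particular database. The function names and the example domain lists are hypothetical and chosen only for illustration.

```python
def same_domain_composition(query, candidate):
    """True if the candidate has at least one copy of each query domain."""
    return set(query) <= set(candidate)


def same_domain_organization(query, candidate):
    """True if all query domains occur in the candidate in the same order.

    Additional domains in the candidate are allowed, so this is an
    ordered-subsequence test rather than an equality test.
    """
    remaining = iter(candidate)
    # "domain in remaining" advances the iterator, which enforces the order.
    return all(domain in remaining for domain in query)


# Hypothetical domain architectures, listed from N- to C-terminus.
query = ["SH3", "SH2", "TyrKc"]
extended = ["SH3", "SH2", "PH", "TyrKc"]   # same order, one extra domain
reordered = ["SH2", "SH3", "TyrKc"]        # same domains, different order

print(same_domain_composition(query, reordered))   # composition matches
print(same_domain_organization(query, reordered))  # organization does not
print(same_domain_organization(query, extended))   # extra domains allowed
```

Under these definitions, every protein that matches the query's domain organization also matches its domain composition, but not the other way around.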
| property |
Classical method |
Example |
| amino acid motifs |
|
PDZ domain (e.g. nitric oxide synthase), coiled coil domain (e.g. hemagglutinin,
syntaxin, SNAP-25, myosin) |
| isoelectric point (pI) |
derived from isoelectric focusing |
|
| molecular weight |
derived from Stokes radius and sedimentation coefficient |
|
| posttranslational modifications: phosphorylation |
Enzymatic analyses |
synapsin |
| posttranslational modifications: glycosylation |
Enzymatic analyses |
nerve growth factor, neural cell adhesion molecule |
| posttranslational modifications: isoprenylation |
|
lamin B, G protein γ subunits, rab3A |
| posttranslational modifications: palmitoylation |
|
β-adrenergic receptor, GAP-43, insulin receptor, rhodopsin, nAChR |
| posttranslational modifications: myristoylation |
|
PKA, Giα-subunit, MARCKS protein, calcineurin |
| posttranslational modifications: GPI-anchored proteins |
Enzymatic analyses |
alkaline phosphatase, thy-1, prion protein, 5′-nucleotidase, uromodulin |
| sedimentation coefficient |
derived from sucrose density gradients |
|
| Stokes radius |
derived from gel filtration |
|
| transmembrane domain |
derived from subcellular fractionation |
|
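The table states that molecular weight can be derived from the Stokes radius and the sedimentation coefficient but does not give the formula. One standard relation, obtained by combining the Svedberg and Stokes equations, is M = 6πηN·a·s / (1 − v̄ρ). The sketch below applies it in CGS units. The default viscosity and density (water at about 20 °C), the partial specific volume of 0.73 cm³/g, and the input values are all assumptions made for illustration, not figures from the table.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molecular_weight(stokes_radius_cm, sedimentation_sec,
                     viscosity_poise=0.01002,        # assumed: water, ~20 C
                     partial_specific_volume=0.73,   # assumed: typical protein, cm^3/g
                     solvent_density=0.9982):        # assumed: water, ~20 C, g/cm^3
    """Estimate molecular weight (g/mol) from hydrodynamic measurements.

    M = 6 * pi * eta * N * a * s / (1 - vbar * rho)
    """
    buoyancy = 1.0 - partial_specific_volume * solvent_density
    friction = 6.0 * math.pi * viscosity_poise * stokes_radius_cm
    return friction * AVOGADRO * sedimentation_sec / buoyancy


# Illustrative inputs: Stokes radius 3.5 nm, sedimentation coefficient 4.3 S.
radius_cm = 3.5e-7          # 1 nm = 1e-7 cm
s_seconds = 4.3e-13         # 1 Svedberg = 1e-13 s
estimate = molecular_weight(radius_cm, s_seconds)
print(f"Estimated molecular weight: {estimate / 1000:.0f} kDa")
```

The estimate is sensitive to the partial specific volume, which is why that value is exposed as a parameter rather than fixed.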
| Abbreviation |
Evidence code |
Example(s) |
| IC |
Inferred by curator |
A protein is annotated as having the function of a “transcription factor.”
A curator may then infer that the localization is “nucleus” |
| IDA |
Inferred from direct assay |
An enzyme assay (for function); immunofluorescence microscopy (for cellular
component) |
| IEA |
Inferred from electronic annotation |
Annotations based on “hits” in searches such as BLAST (but without confirmation
by a curator; compare ISS) |
| IEP |
Inferred from expression pattern |
Transcript levels (e.g. based on Northern blotting or microarrays)
or protein levels (e.g. from Western blots) |
| IGI |
Inferred from genetic interaction |
Suppressors; genetic lethals; complementation assays; experiments in
which one gene provides information about the function, process, or component
of another gene |
| IMP |
Inferred from mutant phenotype |
Gene mutation; gene knockout; overexpression; antisense assays |
| IPI |
Inferred from physical interaction |
Yeast two-hybrid assays; copurification; co-immunoprecipitation; binding
assays |
| ISS |
Inferred from sequence or structural similarity |
Sequence similarity; domains; BLAST results that are reviewed for accuracy
by a curator |
| NAS |
Non-traceable author statement |
Database entries, such as a SwissProt record, that do not cite a published
paper |
| ND |
No biological data available |
Corresponds to “unknown” molecular function, biological process, or
cellular component |
| TAS |
Traceable author statement |
Information in a review article or dictionary |
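The evidence codes in the table above lend themselves to a simple lookup. As a minimal sketch, the code below stores the abbreviations exactly as listed and drops annotations tagged IEA, the one code the table describes as lacking curator confirmation. The function names and the example annotation tuples (gene, term, evidence code) are hypothetical.

```python
# Abbreviation -> evidence code, copied from the table above.
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data available",
    "TAS": "Traceable author statement",
}


def describe(code):
    """Return the full evidence code name for an abbreviation."""
    return EVIDENCE_CODES[code.upper()]


def drop_electronic(annotations):
    """Keep only annotations that are not purely electronic (IEA)."""
    return [a for a in annotations if a[2] != "IEA"]


# Hypothetical annotations: (gene, annotated term, evidence code).
annotations = [
    ("geneA", "nucleus", "IC"),
    ("geneB", "kinase activity", "IEA"),
    ("geneC", "kinase activity", "IDA"),
]

reviewed = drop_electronic(annotations)
for gene, term, code in reviewed:
    print(f"{gene}: {term} ({describe(code)})")
```

Whether to exclude IEA annotations depends on the analysis; they are often the majority of available annotations, so filtering them trades coverage for confidence.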
| General category |
Function |
COGs |
domains |
| Information storage and processing |
|
|
|
| |
Translation, ribosomal structure and biogenesis |
217 |
6,449 |
| |
Transcription |
133 |
5,442 |
| |
DNA replication, recombination and repair |
184 |
5,337 |
| Cellular processes |
|
|
|
| |
Cell division and chromosome partitioning |
32 |
842 |
| |
Posttranslational modification, protein turnover, chaperones |
109 |
3,155 |
| |
Cell envelope biogenesis, outer membrane |
155 |
4,079 |
| |
Cell motility and secretion |
133 |
3,110 |
| |
Inorganic ion transport and metabolism |
160 |
5,112 |
| |
Signal transduction mechanisms |
96 |
3,623 |
| Metabolism |
|
|
|
| |
Energy production and conversion |
223 |
5,584 |
| |
Carbohydrate transport and metabolism |
170 |
5,257 |
| |
Amino acid transport and metabolism |
233 |
8,383 |
| |
Nucleotide transport and metabolism |
85 |
2,364 |
| |
Coenzyme metabolism |
154 |
4,057 |
| |
Lipid metabolism |
75 |
2,609 |
| |
Secondary metabolites biosynthesis, transport and catabolism |
62 |
2,754 |
| Poorly characterized |
|
|
|
| |
General function prediction only |
449 |
11,948 |
| |
Function unknown |
752 |
6,431 |
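To compare the four general categories in the table above, the per-function counts can be summed. The sketch below copies the COG and domain counts from the table and totals them per category. The variable and function names are hypothetical, and the sums are plain row totals: they make no correction for any COG that might be assigned to more than one function.

```python
# (function, COGs, domains) rows copied from the table above.
COG_TABLE = {
    "Information storage and processing": [
        ("Translation, ribosomal structure and biogenesis", 217, 6449),
        ("Transcription", 133, 5442),
        ("DNA replication, recombination and repair", 184, 5337),
    ],
    "Cellular processes": [
        ("Cell division and chromosome partitioning", 32, 842),
        ("Posttranslational modification, protein turnover, chaperones", 109, 3155),
        ("Cell envelope biogenesis, outer membrane", 155, 4079),
        ("Cell motility and secretion", 133, 3110),
        ("Inorganic ion transport and metabolism", 160, 5112),
        ("Signal transduction mechanisms", 96, 3623),
    ],
    "Metabolism": [
        ("Energy production and conversion", 223, 5584),
        ("Carbohydrate transport and metabolism", 170, 5257),
        ("Amino acid transport and metabolism", 233, 8383),
        ("Nucleotide transport and metabolism", 85, 2364),
        ("Coenzyme metabolism", 154, 4057),
        ("Lipid metabolism", 75, 2609),
        ("Secondary metabolites biosynthesis, transport and catabolism", 62, 2754),
    ],
    "Poorly characterized": [
        ("General function prediction only", 449, 11948),
        ("Function unknown", 752, 6431),
    ],
}


def category_totals(table):
    """Sum COG and domain counts over the functions in each category."""
    return {
        category: (sum(r[1] for r in rows), sum(r[2] for r in rows))
        for category, rows in table.items()
    }


totals = category_totals(COG_TABLE)
for category, (cogs, domains) in totals.items():
    print(f"{category}: {cogs} COGs, {domains} domains")
```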
| Metabolic Pathways |
|
| |
Carbohydrate Metabolism |
| |
Energy Metabolism |
| |
Lipid Metabolism |
| |
Nucleotide Metabolism |
| |
Amino Acid Metabolism |
| |
Metabolism of Other Amino Acids |
| |
Metabolism of Complex Carbohydrates |
| |
Metabolism of Complex Lipids |
| |
Metabolism of Cofactors and Vitamins |
| |
Biosynthesis of Secondary Metabolites |
| |
Biodegradation of Xenobiotics |
| Regulatory Pathways: Genetic Information Processing |
|
| |
Transcription |
| |
Translation |
| |
Sorting and Degradation |
| |
Replication and Repair |
| Regulatory Pathways: Environmental Information Processing |
|
| |
Membrane Transport |
| |
Signal Transduction |
| |
Ligand-Receptor Interaction |
| Regulatory Pathways: Cellular Processes |
|
| |
Cell Motility |
| |
Cell Growth and Death |
| |
Cell Communication |
| |
Development |
| |
Behavior |
| Regulatory Pathways: Human Diseases |
|
| |
Neurodegenerative Disorders |
| Database |
Comment |
| BIND (The Biomolecular Interaction
Network Database) |
database designed to store full descriptions of interactions, molecular
complexes and pathways. |
| Cellzome |
|
| DIP (The Database
of Interacting Proteins) |
|
| DLRP (Database
of Ligand-Receptor Partners) |
database of protein ligand and protein receptor pairs that are known
to interact with each other |
| FlyBase |
See Jacq (2001) |
| FlyNets |
See Jacq (2001) |
| KEGG (Kyoto Encyclopedia
of Genes and Genomes) |
|
| GeNet
(Gene Networks database) |
See Jacq (2001) |
| ProNet |
From Doubletwist, Inc. and Myriad Genetics |
| STKE (Signal Transduction Knowledge
Environment) |
See Jacq (2001) |
| Transfac
(Transcription factor database) |
|
| YPD, PombePD,
WormPD |
Proteome, Inc. databases |