KO Database of Molecular Functions

The KO (KEGG Orthology) database is a database of molecular functions represented in terms of functional orthologs. A functional ortholog is manually defined in the context of KEGG molecular networks, namely, KEGG pathway maps, BRITE hierarchies and KEGG modules. Each node of the network, such as a box in the KEGG pathway map, is given a KO identifier (called K number) as a functional ortholog defined from experimentally characterized genes and proteins in specific organisms, which are then used to assign orthologous genes in other organisms based on sequence similarity. The granularity of "function" is context-dependent, so a KO grouping may correspond to a group of highly similar sequences within a limited organism group or to a more divergent group.

The KO system is a network-based classification of KOs shown below:
00001 KEGG Orthology (KO)
It consists of six top categories (09100 to 09160) for KEGG pathway maps and one top category (09180) for BRITE hierarchies, as well as one top category (09190) for those KOs that are not yet included in either of them. The category numbers for these top categories and the second-level categories under metabolism (09101 to 09112) are used to define color coding of functions (see KEGG Color Codes).
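
The rule for choosing which category number drives color coding can be sketched in Python. This is a minimal illustration, assuming a KO's position in the 00001 hierarchy is given as a list of five-digit category numbers, top level first; the function name `color_key` is hypothetical, the top-category check is a simplified range test, and the actual colors defined in KEGG Color Codes are not reproduced.

```python
def color_key(path: list[str]) -> str:
    """Return the category number used for color coding (illustrative sketch).

    `path` lists 5-digit category numbers from the top level down,
    e.g. ["09100", "09101"] for a KO under a metabolism subcategory.
    """
    top = path[0]
    n = int(top)  # int() accepts the leading zero in a string such as "09100"
    # Simplified check for top categories: 09100-09160 (pathway maps),
    # 09180 (BRITE hierarchies), 09190 (not included in either).
    if not ((9100 <= n <= 9160 and n % 10 == 0) or n in (9180, 9190)):
        raise ValueError(f"not a top category: {top}")
    # Under metabolism (09100), the second-level category 09101-09112 is the key.
    if n == 9100 and len(path) > 1 and 9101 <= int(path[1]) <= 9112:
        return path[1]
    # Everywhere else, the top category itself is the key.
    return top
```

For example, `color_key(["09100", "09112"])` returns `"09112"`, while `color_key(["09190"])` returns `"09190"`.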

Major efforts have been made to associate each KO entry with experimental evidence of functionally characterized sequence data as shown in the SEQUENCE subfield of the REFERENCE field. In many cases such data are not found in the complete genomes of KEGG organisms. The addendum category of the GENES database allows functionally characterized individual protein sequences to be included in KEGG. As a byproduct of these efforts, sequence data have also been associated with EC numbers in Enzyme Nomenclature.

Genome Annotation in KEGG

Genome annotation in KEGG has two distinctive aspects, KO assignment and KEGG mapping, as summarized below.

KO assignment
  • Molecular functions are stored in the KO (KEGG Orthology) database containing orthologs of experimentally characterized genes/proteins.
  • Genome annotation in KEGG consists of assigning KO identifiers (or K numbers) to individual genes in the genome, rather than giving text descriptions of their functions.
KEGG mapping
  • Cellular and organism-level functions are stored in the PATHWAY, BRITE and MODULE databases in terms of the molecular networks, which are all created as networks of K number nodes.
  • The KO assignment procedure converts a gene set in the genome to a K number set and leads to automatic reconstruction of KEGG pathways and other networks by the process called KEGG mapping, enabling interpretation of high-level functions.
See more details in:
KEGG Annotation
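
The two steps summarized above, converting a gene set into a K number set and then mapping that set onto networks of K number nodes, can be sketched in Python. The gene identifiers, K numbers and network contents are hypothetical placeholders, and the coverage fraction returned by `map_pathways` is only an illustrative summary of which nodes are present, not a measure defined by KEGG.

```python
from typing import Dict, Optional, Set


def to_k_set(assignments: Dict[str, Optional[str]]) -> Set[str]:
    """KO assignment result (gene -> K number or None) -> K number set."""
    # Genes without a KO assignment contribute nothing to the K number set.
    return {k for k in assignments.values() if k is not None}


def map_pathways(k_set: Set[str], networks: Dict[str, Set[str]]) -> Dict[str, float]:
    """Mapping sketch: fraction of each network's K number nodes found in k_set."""
    return {
        net_id: len(nodes & k_set) / len(nodes)
        for net_id, nodes in networks.items()
        if nodes  # skip empty node sets to avoid division by zero
    }


# Placeholder data: gene IDs, K numbers and network contents are made up.
genes = {"geneA": "K00001", "geneB": "K00002", "geneC": None, "geneD": "K00001"}
networks = {"mapX": {"K00001", "K00002", "K00003", "K00004"}, "mapY": {"K00005"}}
print(map_pathways(to_k_set(genes), networks))  # {'mapX': 0.5, 'mapY': 0.0}
```

Because every network is defined on K number nodes, the same K number set can be mapped onto pathway maps, BRITE hierarchies and modules without any organism-specific network definitions.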

KO Assignment Tools

KO assignment in the GENES database is performed by the previously developed KOALA (KEGG Orthology And Links Annotation) tool and the newly developed KoAnn (KO Annotation) tool, both of which process GFIT tables generated from the SSDB database of SSEARCH computation results for all pairwise genome comparisons. KOALA is now used only for the initial automatic KO assignment of new genomes. Both automatic and manual versions of KoAnn are used for all other annotations. BlastKOALA is a web server for automatic KO assignment using a KOALA-like algorithm for BLAST search against a reduced database.

| | KOALA / KoAnn | BlastKOALA (new version) |
|---|---|---|
| Purpose | Internal GENES annotation | Genome annotation service |
| Search program | SSEARCH | BLASTP |
| Scoring | Weighted sum of SW scores (KOALA scoring) or identity scores (KoAnn scoring) | Weighted sum of BLAST bit scores (KoAnn scoring) |
| Database | Entire GENES database sequences | KEGG Reference genomes and functionally characterized sequences linked from KO references |
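
The idea of a weighted score sum can be illustrated with a simplified sketch: each hit to a reference sequence votes for that sequence's K number with its score multiplied by a weight, and the K number with the largest total is selected. The per-hit weights, the `min_score` cutoff, the hit layout and the function name `assign_ko` are hypothetical; the actual KOALA and KoAnn weighting schemes are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical hit layout: (reference gene ID, alignment score, weight).
Hit = Tuple[str, float, float]


def assign_ko(
    hits: List[Hit], ref_ko: Dict[str, str], min_score: float = 0.0
) -> Optional[str]:
    """Pick the K number whose weighted score sum is largest (simplified sketch)."""
    totals: Dict[str, float] = defaultdict(float)
    for ref_gene, score, weight in hits:
        ko = ref_ko.get(ref_gene)
        if ko is not None:  # hits to reference genes without a K number are ignored
            totals[ko] += weight * score
    if not totals:
        return None
    # On ties, max() keeps the K number that was accumulated first.
    best = max(totals, key=lambda k: totals[k])
    return best if totals[best] > min_score else None
```

With hits to g1 (score 300, weight 1.0), g2 (250, 0.5) and g3 (200, 1.0), where g2 and g3 share a K number, that shared K number is chosen with a total of 325 against 300 for g1's K number. Summing over all hits, rather than taking only the single best hit, lets several moderately similar reference sequences outweigh one slightly better isolated match.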

KOALA scoring includes: SW (Smith-Waterman) score, best-best flag, overlap of alignment, ratio of query and DB sequences, taxonomic category and Pfam domains.
KoAnn scoring includes: identity score, sequence length and best-best flag.
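
The best-best flag can be illustrated with bidirectional best hits between two genomes, assuming that is what the flag denotes. The sketch below takes a single table of pairwise scores between genes of genome A and genes of genome B (Smith-Waterman scores with a symmetric substitution matrix are the same in both directions, so one table serves both); the function name and data layout are hypothetical, and ties are resolved by keeping the first pair seen.

```python
from typing import Dict, Set, Tuple


def best_best_pairs(scores: Dict[Tuple[str, str], float]) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's top hit and a is b's top hit."""
    best_for_a: Dict[str, Tuple[str, float]] = {}  # gene in A -> (best gene in B, score)
    best_for_b: Dict[str, Tuple[str, float]] = {}  # gene in B -> (best gene in A, score)
    for (a, b), s in scores.items():
        # Strict ">" keeps the first pair seen when scores tie.
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    # Keep only pairs that are each other's best hit.
    return {(a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a}
```

For example, if a3's best hit is b1 but b1 scores higher with a1, the pair (a3, b1) is not flagged, while (a1, b1) is.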


Last updated: April 1, 2023