
KEGG Mapping

KEGG mapping as a set operation

KEGG mapping is the process of mapping molecular objects (genes, proteins, small molecules, etc.) to molecular network objects (KEGG pathway maps, BRITE hierarchies, and KEGG modules). It is not simply an enrichment process; rather, it is a set operation that generates a new set. From the beginning of the KEGG project, the basic idea has been to automatically generate organism-specific pathways by a set operation between manually annotated genome data and manually created pathway maps. Thus, the KEGG mapping set operation has served to extend the KEGG knowledge base. It has also played another important role: assisting the integration and interpretation of users' datasets, especially large-scale datasets generated by high-throughput technologies (see: KEGG Mapper tools).
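The set operation described above can be illustrated with a minimal Python sketch. It is not KEGG's actual implementation: the function name, the gene identifiers, and the K number K99999 are made up for illustration, and the data are tiny hand-written stand-ins for a reference map and a genome annotation.

```python
# Hypothetical input: K numbers that appear on one reference pathway map.
reference_map_kos = {"K00844", "K12407", "K00845"}

# Hypothetical genome annotation: gene identifier -> assigned K number.
# The gene identifiers and K99999 are invented for this example.
genome_annotation = {
    "org:gene1": "K00844",
    "org:gene2": "K00845",
    "org:gene3": "K99999",  # annotated, but not drawn on this map
}


def map_genes_to_pathway(map_kos, annotation):
    """Return {K number: sorted gene IDs} for K numbers found in both inputs."""
    mapped = {}
    for gene, ko in annotation.items():
        if ko in map_kos:
            mapped.setdefault(ko, []).append(gene)
    return {ko: sorted(genes) for ko, genes in mapped.items()}


organism_map = map_genes_to_pathway(reference_map_kos, genome_annotation)
print(organism_map)
```

The result is a new set (here, a mapping) rather than a score: only K numbers present both on the map and in the genome annotation survive, each carrying the genes that would be drawn in the organism-specific map.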

This page explains the network objects of pathway maps and BRITE hierarchies.

KEGG Pathway Maps

Graphical map objects
The KEGG pathway map is a molecular interaction/reaction network diagram represented in terms of the KEGG Orthology (KO) groups, so that experimental evidence in specific organisms can be generalized to other organisms through genomic information. Each map is manually drawn with in-house software called KegSketch, which generates the KGML+ file. This file is an SVG file containing graphics objects that are associated with KEGG objects (see KEGG object identifiers). Basic graphics objects in the reference KEGG pathway maps are:
  • boxes - ortholog (KO) groups identified by K numbers and, in metabolic maps, reactions identified by R numbers as well
  • circles - other molecules, usually chemical compounds identified by C numbers, but including glycans identified by G numbers
  • lines - reactions identified by R numbers in metabolic maps; ortholog (KO) groups identified by K numbers in global metabolism maps
and in organism-specific pathway maps, which are computationally generated:
  • boxes - genes or gene products identified by the combination of the KEGG organism code and gene identifiers
These map objects can be searched in the search box at the top of the KEGG PATHWAY page, in the search boxes of the pathway map viewer, and by the KEGG Mapper tools.
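As a rough illustration of how the identifiers listed above distinguish map objects, the following Python sketch classifies an identifier by its form. The function name is hypothetical, and the patterns are assumptions: a single letter (K, R, C, or G) followed by five digits, and a lowercase organism code followed by a colon and a gene identifier.

```python
import re

# Letter prefix -> kind of map object, following the list above.
OBJECT_KINDS = {
    "K": "ortholog (KO) group",
    "R": "reaction",
    "C": "compound",
    "G": "glycan",
}

# Assumed forms: "K00844"-style numbers and "hsa:3101"-style gene identifiers.
_NUMBERED = re.compile(r"^([KRCG])\d{5}$")
_GENE = re.compile(r"^[a-z]{3,4}:\S+$")


def classify_map_object(identifier):
    """Return a short description of what kind of object an identifier names."""
    match = _NUMBERED.match(identifier)
    if match:
        return OBJECT_KINDS[match.group(1)]
    if _GENE.match(identifier):
        return "gene"
    return "unknown"


for example in ["K00844", "C00031", "hsa:3101"]:
    print(example, "->", classify_map_object(example))
```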
Convention of map number prefix
Each pathway map is identified by a combination of a 2-4 letter prefix code and a 5-digit number (see KEGG Identifiers). The prefix has the following meaning:
  • map - Reference pathway
  • ko - Reference pathway (KO)
  • ec - Reference pathway (EC)
  • rn - Reference pathway (Reaction)
  • org - Organism-specific pathway map
Only the first of these, the "map" reference pathway, is manually drawn; all other maps are computationally generated. For metabolic pathways, each box (or line) in the reference map is linked to the K number (KO identifier), the EC number, and the R number (reaction identifier). The KO, EC, and reaction maps are each linked to only the corresponding one of these. For all metabolic and non-metabolic maps, K numbers are converted to gene identifiers in each organism to generate organism-specific pathways.
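The prefix convention can be sketched in Python as follows. This is an illustrative helper, not part of any KEGG tool: the function name is hypothetical, and treating every prefix other than map, ko, ec, and rn as an organism code is an assumption made for simplicity.

```python
import re

# Prefix meanings taken from the list above.
REFERENCE_PREFIXES = {
    "map": "Reference pathway",
    "ko": "Reference pathway (KO)",
    "ec": "Reference pathway (EC)",
    "rn": "Reference pathway (Reaction)",
}

# A 2-4 letter code followed by a 5-digit number, e.g. "hsa00010".
_PATHWAY_ID = re.compile(r"^([a-z]{2,4})(\d{5})$")


def describe_pathway_id(pathway_id):
    """Split a pathway identifier into (prefix, number, meaning of prefix)."""
    match = _PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not a pathway identifier: {pathway_id!r}")
    prefix, number = match.groups()
    # Assumption: any other prefix is treated as an organism code.
    meaning = REFERENCE_PREFIXES.get(prefix, "Organism-specific pathway map")
    return prefix, number, meaning


for example in ["map00010", "ko00010", "hsa00010"]:
    print(describe_pathway_id(example))
```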

[Figure: pathway 00010 shown as map00010 (map), ko00010 (ko), and hsa00010 (hsa)]

As shown here, "map" pathways are not colored, "ko/ec/rn" pathways are colored blue, and organism-specific pathways are colored green, where coloring indicates that map objects exist and are linked to corresponding entries.

For global metabolism maps, "map" pathways are fully colored, and "ko/ec/rn" pathways and organism-specific pathways are generated by removing the coloring wherever corresponding entries are absent.
About KGML files
KGML is an exchange format for KEGG pathway maps. It is meant for outside users and is not used in any service or database update procedure within KEGG. KGML files, which are computationally generated from the manually defined KGML+ file, contain information about entries (KEGG objects) and two types of relationships:
  • relations - relationships between boxes
  • reactions - relationships between circles
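The following Python sketch shows one way entries and the two relationship types might be read from a KGML-like XML document. The sample fragment is hand-written for illustration: its element and attribute names (entry, relation, reaction, substrate, product, entry1, entry2) are assumptions to be checked against the KGML specification, and the identifiers and their relationships do not represent real KEGG data.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative fragment; NOT real KEGG data.
SAMPLE_KGML = """\
<pathway name="path:ko00010" org="ko" number="00010">
  <entry id="1" name="ko:K00001" type="ortholog"/>
  <entry id="2" name="ko:K00002" type="ortholog"/>
  <entry id="3" name="cpd:C00001" type="compound"/>
  <entry id="4" name="cpd:C00002" type="compound"/>
  <relation entry1="1" entry2="2" type="ECrel"/>
  <reaction id="5" name="rn:R00001" type="irreversible">
    <substrate id="3" name="cpd:C00001"/>
    <product id="4" name="cpd:C00002"/>
  </reaction>
</pathway>
"""


def summarize_kgml(xml_text):
    """Return (relations, reactions) extracted from a KGML-like document."""
    root = ET.fromstring(xml_text)
    # Map entry id -> entry name so relations can be reported by name.
    names = {entry.get("id"): entry.get("name") for entry in root.findall("entry")}
    relations = [
        (names[rel.get("entry1")], names[rel.get("entry2")], rel.get("type"))
        for rel in root.findall("relation")
    ]
    reactions = [
        (
            rxn.get("name"),
            [s.get("name") for s in rxn.findall("substrate")],
            [p.get("name") for p in rxn.findall("product")],
        )
        for rxn in root.findall("reaction")
    ]
    return relations, reactions


relations, reactions = summarize_kgml(SAMPLE_KGML)
print(relations)
print(reactions)
```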

BRITE Functional Hierarchies

BRITE hierarchy files
The KEGG BRITE database is a collection of BRITE hierarchy files, called htext (hierarchical text) files, with additional files for binary relations. The htext file is manually created with in-house software called KegHierEditor. Each line of the htext file begins with "A", "B", "C", etc. in the first column to indicate the hierarchy level.
A Metabolism
B   Carbohydrate Metabolism
C     00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D       K00844  HK; hexokinase [EC:2.7.1.1]
D       K12407  GCK; glucokinase [EC:2.7.1.2]
D       K00845  glk; glucokinase [EC:2.7.1.2]
D       ......
Each BRITE hierarchy file represents a classification system of KEGG objects identified by the KEGG Identifiers; for example, pathway-based gene classification or protein family classification by the K numbers, compound classification by C numbers, drug classification by D numbers, and disease classification by H numbers.
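A minimal Python sketch of reading the level letters in an htext file is given below, using the lines of the example above. The function names are hypothetical, and the sketch assumes that any line not starting with an uppercase letter (such as header or comment lines) can be skipped.

```python
def parse_htext(lines):
    """Yield (depth, text) per hierarchy line; depth is 0 for 'A', 1 for 'B', ..."""
    for line in lines:
        # Assumption: lines not starting with an uppercase letter are not entries.
        if not line or not ("A" <= line[0] <= "Z"):
            continue
        depth = ord(line[0]) - ord("A")
        text = line[1:].strip()
        if text:
            yield depth, text


def hierarchy_paths(lines):
    """Return, for every entry, the tuple of its ancestors followed by itself."""
    ancestors = []
    paths = []
    for depth, text in parse_htext(lines):
        del ancestors[depth:]  # drop entries at this level or deeper
        ancestors.append(text)
        paths.append(tuple(ancestors))
    return paths


sample = [
    "A Metabolism",
    "B   Carbohydrate Metabolism",
    "C     00010 Glycolysis / Gluconeogenesis [PATH:ko00010]",
    "D       K00844  HK; hexokinase [EC:2.7.1.1]",
    "D       K12407  GCK; glucokinase [EC:2.7.1.2]",
    "D       K00845  glk; glucokinase [EC:2.7.1.2]",
]

paths = hierarchy_paths(sample)
for path in paths:
    print(" > ".join(path))
```

Keeping a running list of ancestors means each K number line is reported together with the full classification path above it, which is the information the level letters encode.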

The binary relation files contain the relationship between KEGG objects and attributes, which can be dynamically added to the hierarchy file as additional columns using the join feature of the Brite hierarchy viewer. Many binary relation files are computationally generated from the KEGG database contents and shown in the left panel of the Brite hierarchy viewer.
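The idea of joining a binary relation onto a hierarchy as an extra column can be sketched in Python as follows. This does not reproduce the Brite hierarchy viewer: the function names are hypothetical, the two-column tab-separated layout of the relation text is an assumption, and the attribute values are placeholders.

```python
def parse_binary_relation(text):
    """Parse 'identifier<TAB>attribute' lines into a dict (assumed layout)."""
    relation = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        key, value = line.split("\t", 1)
        relation[key.strip()] = value.strip()
    return relation


def join_column(entries, relation, missing=""):
    """Append the related attribute to each (identifier, label) entry."""
    return [
        (entry_id, label, relation.get(entry_id, missing))
        for entry_id, label in entries
    ]


# Hierarchy entries taken from the htext example; attributes are placeholders.
entries = [("K00844", "HK; hexokinase"), ("K12407", "GCK; glucokinase")]
relation = parse_binary_relation("K00844\tattribute-1\nK00845\tattribute-2\n")

rows = join_column(entries, relation)
for row in rows:
    print(row)
```

Entries without a matching identifier in the relation keep an empty column, so the hierarchy itself is never filtered by the join.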

The KEGG objects of BRITE files can be searched in the search box at the top of the KEGG BRITE page, in the search boxes of the Brite hierarchy viewer, and by the KEGG Mapper tools.
Convention of brite number prefix
Each BRITE hierarchy file is identified by a combination of a 2-4 letter prefix code and a 5-digit number (see KEGG Identifiers). The prefix has the following meaning:
  • ko - Reference hierarchy (KO)
  • org - Organism-specific hierarchy
  • br - Non-KO hierarchy
  • jp - Non-KO hierarchy in Japanese
Thus, the "ko" hierarchy file is manually created for the functional classifications of genes and proteins using the K numbers. Organism-specific hierarchy files are then computationally generated by converting K numbers to gene identifiers in each organism. The "br" hierarchy file is created for the functional classifications of chemical compounds, reactions, drugs, diseases, organisms, etc. using the KEGG identifiers other than the K numbers.

Last updated: January 1, 2024

Copyright 1995-2024 Kanehisa Laboratories