Computational immunology

In academia, computational immunology is a field of science that encompasses high-throughput genomic and bioinformatics approaches to immunology. The field's main aim is to convert immunological data into computational problems, solve these problems using mathematical and computational approaches, and then convert the results into immunologically meaningful interpretations.


The immune system is a complex system of the human body, and understanding it is one of the most challenging topics in biology. Immunology research is important for understanding the mechanisms underlying the defense of the human body, for developing drugs for immunological diseases, and for maintaining health. Recent advances in genomic and proteomic technologies have transformed immunology research drastically. Sequencing of the human genome and the genomes of other model organisms has produced large volumes of data relevant to immunology research, and functional and clinical data are widely reported in the scientific and clinical literature. Recent advances in bioinformatics or computational biology have helped to understand and organize these large-scale data and have given rise to a new area called computational immunology or immunoinformatics.

Computational immunology is a branch of bioinformatics and is based on similar concepts and tools, such as sequence alignment and protein structure prediction tools. Immunomics is a discipline like genomics and proteomics. It is a science that specifically combines immunology with computer science, mathematics, chemistry, and biochemistry for large-scale analysis of immune system functions. It aims to study the complex protein-protein interactions and networks, and allows a better understanding of immune responses and their role during normal, diseased and reconstitution states. Computational immunology is a part of immunomics that is focused on analyzing large-scale experimental data. [1] [2]


Computational immunology began over 90 years ago with the theoretical modeling of malaria epidemiology. At that time, the focus was on the use of mathematics to guide the study of disease transmission. Since then, the field has expanded to cover all other aspects of immune system processes and diseases. [3]

Immunological database

Following recent advances in sequencing and proteomics technology, the generation of molecular and immunological data has increased manyfold. The data are so diverse that they can be categorized into different databases according to their use in research. There are 31 different immunological databases noted in the Nucleic Acids Research (NAR) Database Collection, which are given in the following table, together with some more immune-related databases. [4] The information given in the table is taken from the database descriptions in the NAR Database Collection.

Database Description
ALPSbase Autoimmune lymphoproliferative syndrome database
AntigenDB Sequence, structure, and other data on pathogen antigens. [5]
AntiJen Quantitative binding data for peptides and proteins of immunological interest. [6]
BCIpep B-cell epitopes of antigenic proteins. It is a curated database in which information on the epitopes is collected and compiled from published literature and existing databases. It covers a wide range of pathogenic organisms such as viruses, bacteria, protozoa and fungi. Each entry provides the amino acid sequence, source of the antigenic protein, immunogenicity, model organism, and antibody generation/neutralization test. [7]
dbMHC dbMHC provides access to HLA sequences, tools to support genetic testing of HLA loci, HLA allele and haplotype frequencies of over 90 populations worldwide, and clinical data on hematopoietic stem cell transplantation, insulin-dependent diabetes mellitus (IDDM), rheumatoid arthritis (RA), narcolepsy and spondyloarthropathy.
DIGIT Database of ImmunoGlobulin sequences and Integrated Tools. [8]
FIMM FIMM is an integrated database of functional molecular immunology that focuses on the T-cell response to disease-specific antigens. It contains data on HLA, peptides, T-cell epitopes, antigens and diseases, and is intended as a backbone of future computational immunology research. Antigen protein data has been enriched with more than 27,000 sequences derived from the non-redundant SwissProt-TREMBL-TREMBL_NEW (SPTR) database of antigens similar or related to FIMM antigens across various species, to facilitate a comprehensive analysis of conserved or variable T-cell epitopes. [9]
GPX-Macrophage Expression Atlas The GPX Macrophage Expression Atlas (GPX-MEA) is an online resource for expression-based studies of a range of macrophage cell types following treatment with pathogens and immune modulators. GPX-MEA follows the MIAME standard. It places special emphasis on rigorously capturing the experimental design and enables the analysis of data from different microarray experiments. This is the first example of a macrophage gene expression resource that allows efficient identification of transcriptional patterns, which provides novel insights into the biology of this cell system. [10]
HaptenDB A comprehensive database of hapten molecules. It is a curated database in which information is collected and compiled from published literature and web resources. Each entry provides the following information: i) nature of the hapten; ii) methods of anti-hapten antibody production; iii) information about the carrier protein; iv) coupling method; v) assay method (used for characterization); and vi) specificities of antibodies. HaptenDB covers a wide array of haptens, ranging from antibiotics of biomedical importance to pesticides. This database will be useful for studying serological reactions and production of antibodies. [11]
HPTAA HPTAA is a database of potential tumor-associated antigens that uses expression data from various expression platforms. [12]
IEDB-3D Structural data within the Immune Epitope Database. [13]
IL2Rgbase X-linked severe combined immunodeficiency mutations. [14]
IMGT IMGT is an integrated knowledge resource specialized in IG, TR, MHC, the IG superfamily, the MHC superfamily and related proteins of the immune system of human and other vertebrate species. IMGT includes 6 databases, 15 online tools for sequence, gene and 3D structure analysis, and more than 10,000 pages of web resources. Data standardization, based on IMGT-ONTOLOGY, has been approved by WHO/IUIS. [15]
IMGT_GENE-DB IMGT/GENE-DB is the IMGT® comprehensive genome database for immunoglobulin (IG) and T cell receptor (TR) genes from human and mouse, and, in development, from other vertebrate species (e.g. rat). IMGT/GENE-DB is part of IMGT®, the international ImMunoGeneTics information system®, the high-quality integrated knowledge resource specialized in IG, TR, major histocompatibility complex (MHC) of human and other vertebrate species, and related proteins of the immune system (RPI) that belong to the immunoglobulin superfamily (IgSF) and to the MHC superfamily (MhcSF). [16]
IMGT/HLA There are currently over 1600 officially recognized HLA alleles, and these sequences are made available to the scientific community through the IMGT/HLA database. The IMGT/HLA database was publicly released in 1998. Since then, the database has grown to be the primary source of information for the study of the sequences of the human major histocompatibility complex. The initial release of the database contained reports, alignment tools, submission tools and detailed descriptions of the source cells. On average, an additional 75 new and confirmatory sequences are included in each quarterly release. [17]
IMGT/LIGM-DB IMGT/LIGM-DB is the IMGT® comprehensive database of immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences from human and other vertebrate species, with translation for fully annotated sequences. It was created in 1989 by LIGM, Montpellier, France, and has been on the Web since July 1995. IMGT/LIGM-DB is the first and the largest database of IMGT®, the international ImMunoGeneTics information system®, the high-quality integrated knowledge resource specialized in IG, TR, major histocompatibility complex (MHC) of human and other vertebrate species, and related proteins of the immune system (RPI) that belong to the immunoglobulin superfamily (IgSF) and to the MHC superfamily (MhcSF). IMGT/LIGM-DB sequence data are identified by the EMBL/GenBank/DDBJ accession number. The only source of data for IMGT/LIGM-DB is EMBL, which shares data with GenBank and DDBJ. [18]
Interferon Stimulated Gene Database Interferons (IFN) are a family of multifunctional cytokines that activate transcription of a subset of genes. The gene products induced by IFN are responsible for the antiviral, antiproliferative and immunomodulatory properties of this cytokine. The genes in this subset are known as interferon stimulated genes (ISGs). To facilitate the dissemination of this information, the database assigns the ISGs to functional categories. The database is fully searchable and contains links to UniGene information. The database and the array data are accessible via the World Wide Web. The maintainers intend to add further ISGs in order to compile a complete list of ISGs.
IPD-ESTDAB The Immuno Polymorphism Database (IPD) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD-ESTDAB is a database of immunologically characterized melanoma cell lines. The database works in conjunction with the European Searchable Tumour Line Database (ESTDAB) cell bank, which is housed in Tübingen, Germany, and provides immunologically characterized tumor cells. [19] [20]
IPD-HPA – Human Platelet Antigens Human platelet antigens are alloantigens expressed only on platelets, specifically on platelet membrane glycoproteins. These platelet-specific antigens are immunogenic and can result in pathological reactions to transfusion therapy. The IPD-HPA section contains nomenclature information and additional background material about human platelet antigens. For the various genes in the HPA system, currently only single nucleotide polymorphisms (SNPs) are used to determine alleles. This information is presented in a grid of SNPs for each gene. The IPD and the HPA nomenclature committee hope to expand this. [19] [20]
IPD-KIR – Killer-cell Immunoglobulin-like Receptors The Killer-cell Immunoglobulin-like Receptors (KIR) are members of the immunoglobulin superfamily (IgSF) and were formerly called Killer-cell Inhibitory Receptors. KIRs have been shown to be highly polymorphic at both the allelic and haplotypic levels. They are composed of two or three Ig domains, a transmembrane region and a cytoplasmic tail, which can be either short (activatory) or long (inhibitory). The Leukocyte Receptor Complex (LRC), which encodes KIR genes, has been shown to be polymorphic, polygenic and complex in a manner similar to the MHC. The IPD-KIR Sequence Database contains the most up-to-date nomenclature and sequence alignments. [19] [20]
IPD-MHC MHC sequences have been reported for many species, with different nomenclature systems used to name and identify new genes and alleles in each species. The sequences of the major histocompatibility complex are highly conserved between species. By bringing together the work of different nomenclature committees and the sequences of different species, it is hoped to provide a basis for further comparison. The first release of the IPD-MHC database involved the work of groups specializing in non-human primates, canines (DLA) and felines (FLA), and incorporated all data previously available in the IMGT/MHC database. This release included data from five species of ape, sixteen species of New World monkey, seventeen species of Old World monkey, and different canines and felines. Since the first release, sequences from cattle (BoLA), swine (SLA), and rats (RT1) have been added, and work to include MHC sequences from chickens and horses (ELA) is ongoing. [19] [20]
MHCBN MHCBN is a comprehensive database comprising 23,000 peptide sequences whose binding affinity with MHC or TAP molecules has been assayed experimentally. It is a curated database where entries are compiled from published literature and public databases. Each entry provides information such as the sequence, its MHC or TAP binding specificity, and the source protein for a peptide whose binding affinity (IC50) has been determined experimentally. MHCBN has a number of web-based tools for the analysis and retrieval of information. All database entries are hyperlinked to major databases such as SWISS-PROT, PDB, IMGT/HLA-DB, PubMed and OMIM to provide information beyond the scope of MHCBN. The current version of MHCBN contains 1053 entries of TAP binding peptides. [21]
MHCPEP This database contains the list of MHC-binding peptides. [22]
MPID-T2 MPID-T2 is a highly curated database for sequence-structure-function information on MHC-peptide interactions. It contains structures of major histocompatibility complex proteins (MHCs) containing bound peptides, with emphasis on the structural characterization of these complexes. Database entries have been grouped into fully referenced redundant and non-redundant categories. The MHC-peptide interactions are presented in terms of a set of sequence and structural parameters representative of molecular recognition. MPID will facilitate the development of algorithms to predict whether a peptide sequence will bind to a specific MHC allele. MPID data has been sorted primarily on the basis of MHC class, followed by organism (source of the MHC), then by allele type and finally by the length of the peptide in the binding groove (peptide residues within 5 Å of the MHC). Data on inter-molecular hydrogen bonds, gap volume and gap index are also available. The available MHC-peptide databases have addressed sequence information as well as binding (or the lack thereof) of peptide sequences. [23]
MUGEN Mouse Database Murine models of immunological processes and immunological diseases. [24]
Protegen Protective antigen database and analysis system. [25]
SuperHapten SuperHapten is a hapten database integrating information from literature and web resources. The current version of the database compiles 2D/3D structures, physicochemical properties and references for about 7,500 haptens and 25,000 synonyms. Commercial availability is documented for about 6,300 haptens and 450 related antibodies, enabling experimental approaches on cross-reactivity. The haptens are classified by origin: pesticides, herbicides, insecticides, drugs, natural compounds, etc. Queries allow identification of haptens and associated antibodies according to functional class, carrier protein, chemical scaffold, composition or structural similarity. [26]
The Immune Epitope Database (IEDB) The Immune Epitope Database (IEDB) provides a catalog of experimentally characterized B and T cell epitopes, as well as data on MHC binding and MHC ligand elution experiments. The database represents the molecular structures recognized by adaptive immune receptors and the experimental contexts in which these molecules were determined to be immune epitopes. Epitopes recognized in humans, non-human primates, rodents, pigs, and other tested species are included. Both positive and negative experimental results are captured. Over the course of four years, the data from 180,978 experiments were compiled from the literature, covering about 99% of all publicly available information on peptide epitopes mapped in infectious agents (excluding HIV) and 93% of those mapped in allergens. [27]
TmaDB To analyze tissue microarray (TMA) output, a relational database (known as TmaDB) has been developed to collate all aspects of information relating to TMAs. These data include the TMA construction protocol, the experimental protocol and the results from the various immunocytological and histochemical staining experiments, including the scanned images for each of the TMA cores. In addition, the database records the location of the TMAs and of the individual specimen blocks (from which cores were taken) in the laboratory, and their current status. [28]
VBASE2 VBASE2 is an integrative database of germ-line V genes from the immunoglobulin loci of human and mouse. It presents V gene sequences from the EMBL database and Ensembl together with the corresponding links to the data source. The VBASE2 dataset is generated in an automatic process based on a BLAST search of V genes against the EMBL and Ensembl datasets. The BLAST hits are evaluated with the DNAPLOT program, which allows immunoglobulin sequence alignment and comparison, RSS recognition and analysis of the V(D)J rearrangements. As a result of the BLAST hit evaluation, VBASE2 entries are assigned to classes. Class 1 holds sequences for which both a genomic reference and a rearranged sequence are known. Class 2 contains sequences that have not been found in a rearrangement, thus lacking evidence of functionality. Class 3 contains sequences that have been found in different V(D)J rearrangements but lack a genomic reference. All VBASE2 sequences are compared with the datasets from the VBASE, IMGT and KABAT databases (latest published versions), and the respective references are provided in each VBASE2 sequence entry. The VBASE2 database can be accessed by the DNAPLOT program. A DAS server shows the VBASE2 dataset within the Ensembl Genome Browser and links to the database. [29]
Epitome Epitome is a database of all known antigenic residues and the antibodies that interact with them, including a detailed description of the residues involved in the interaction and their sequence/structure environments. Each entry in the database describes an interaction between a residue on an antigenic protein and a residue on an antibody chain. Each interaction is described by the following parameters: PDB identifier, antigen chain ID, PDB position of the antigenic residue, type of antigenic residue and its sequence environment, secondary structure state of the antigen residue, solvent accessibility of the antigen residue, antibody chain ID, type of antibody chain (heavy or light), CDR number, PDB position of the antibody residue, and type of antibody residue and its sequence environment. [30]
ImmGen The Immunological Genome Consortium Database includes more than 250 types of cells and several data types. [31]

Online resources for allergy information are also available. Such data is valuable for the investigation of cross-reactivity between known allergens and the analysis of potential allergenicity in proteins. The Structural Database of Allergen Proteins (SDAP) stores information on allergenic proteins. The Food Allergy Research and Resource Program (FARRP) Protein Allergen-Online Database contains sequences of known and putative allergens derived from scientific literature and public databases. Allergome emphasizes the annotation of allergens that result in an IgE-mediated disease.


A variety of computational, mathematical and statistical methods are available and have been reported. These tools are helpful for the collection, analysis, and interpretation of immunological data. They include text mining, [32] information management, [33] [34] sequence analysis, analysis of molecular interactions, and mathematical models that enable advanced simulations of the immune system and immunological processes. [35] [36] Attempts are being made to extract interesting and complex patterns from non-structured text documents in the immunological domain, such as categorization of allergen cross-reactivity information, [32] identification of cancer-associated gene variants and the classification of immune epitopes.

Immunoinformatics uses basic bioinformatics tools such as ClustalW, [37] BLAST, [38] and TreeView, as well as specialized immunoinformatics tools, such as EpiMatrix, [39] [40] IMGT/V-QUEST for IG and TR sequence analysis, and IMGT/Collier-de-Perles and IMGT/StructuralQuery [41] for IG variable domain structure analysis. [42] Methods that rely on sequence comparison have been used to analyze HLA sequence conservation, to help verify the origins of human immunodeficiency virus (HIV) sequences, and to construct homology models for the analysis of hepatitis B virus polymerase resistance to lamivudine and emtricitabine.
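As an illustration of what a sequence-comparison analysis of conservation can look like, the following minimal sketch scores each column of a set of pre-aligned sequences by its Shannon entropy (0 bits means the position is fully conserved). It is not the method of any tool named above; the function names and the toy sequences are assumptions made for this example, and the input is assumed to be already aligned (for instance by a tool such as ClustalW).

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues found in one alignment column."""
    total = len(column)
    counts = Counter(column)
    # p * log2(1/p) summed over the observed residues; 0.0 for a conserved column.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(aligned: list[str]) -> list[float]:
    """Per-position entropy for equal-length, pre-aligned sequences.

    Hypothetical helper for illustration only: gap characters, if present,
    are simply treated as one more symbol.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    return [
        column_entropy("".join(seq[pos] for seq in aligned))
        for pos in range(length)
    ]


if __name__ == "__main__":
    # Toy sequences, not real HLA data.
    toy_alignment = ["ACD", "ACE", "ACD", "ACE"]
    print(conservation_profile(toy_alignment))
```

Low-entropy positions are candidates for conserved regions; a real analysis would also need to handle gaps, sequence weighting and sampling bias, which this sketch ignores.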

There are also some computational models that focus on protein-protein interactions and networks, as well as tools used for B cell epitope mapping, proteasomal cleavage site prediction, and TAP-peptide prediction. [43] Experimental data are essential for designing and validating models that predict various molecular targets. Computational immunology therefore depends on an interplay between experimental data and mathematically designed computational tools.



Allergies, while a critical subject of immunology, also vary considerably between individuals. The assessment of protein allergenic potential focuses on three main aspects: (i) immunogenicity; (ii) cross-reactivity; and (iii) clinical symptoms. [44] Immunogenicity is due to responses of an IgE antibody-producing B cell and/or of a T cell to a particular allergen. Therefore, immunogenicity studies focus mainly on identifying the recognition sites of B cells and T cells for allergens. The three-dimensional structural properties of allergens control their allergenicity.

Immunoinformatics tools can be useful for predicting protein allergenicity and will become increasingly important in the screening of novel foods before their release for human use. Thus, there are major efforts under way to build reliable allergy databases and to combine them with validated prediction tools in order to identify potential allergens in genetically modified drugs and foods. The World Health Organization and the Food and Agriculture Organization have proposed guidelines for the evaluation of the allergenicity of genetically modified foods. According to the Codex Alimentarius, [45] a protein is potentially allergenic if it possesses an identity of ≥6 contiguous amino acids or ≥35% sequence similarity over an 80-amino-acid window with a known allergen. Though these rules exist, their inherent limitations have been reported. [46]
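The two sequence criteria described above can be sketched in code. The example below is a simplified illustration and not an official implementation of the guideline: it uses ungapped position-by-position identity as a stand-in for the alignment-based similarity score used in practice, the function names are hypothetical, and the window length is a parameter (80 residues is the figure usually quoted for this rule).

```python
def shares_kmer(query: str, allergen: str, k: int = 6) -> bool:
    """True if any k contiguous residues of the query occur exactly in the allergen."""
    if len(query) < k or len(allergen) < k:
        return False
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in allergen_kmers for i in range(len(query) - k + 1))


def best_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Highest ungapped identity fraction between any query window and the allergen.

    Assumption for this sketch: no gaps are allowed, and the window shrinks to
    the shorter sequence when either sequence is shorter than `window`, which
    inflates scores for very short inputs.
    """
    w = min(window, len(query), len(allergen))
    if w == 0:
        return 0.0
    best_matches = 0
    for q_start in range(len(query) - w + 1):
        q_win = query[q_start:q_start + w]
        for a_start in range(len(allergen) - w + 1):
            a_win = allergen[a_start:a_start + w]
            matches = sum(1 for x, y in zip(q_win, a_win) if x == y)
            best_matches = max(best_matches, matches)
    return best_matches / w


def flag_potential_allergen(query: str, allergen: str, k: int = 6,
                            window: int = 80, threshold: float = 0.35) -> bool:
    """Apply both criteria: a shared k-mer, or window identity at or above the threshold."""
    return (shares_kmer(query, allergen, k)
            or best_window_identity(query, allergen, window) >= threshold)


if __name__ == "__main__":
    # Toy peptides, not real allergen sequences.
    print(flag_potential_allergen("AAAKLMNPQRTT", "GGKLMNPQGG"))
```

The brute-force window scan is quadratic in sequence length, which is acceptable for single proteins but not for database-wide screening; production tools use heuristic alignment instead.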

Infectious diseases and host responses

Mathematical and computer models are of great help in the study of infectious diseases and host responses. These models have been very useful in characterizing the behavior and spread of infectious diseases, by helping to understand the dynamics of the pathogen in the host and the mechanisms of pathogen persistence. Examples include Plasmodium falciparum [47] and nematode infection in ruminants. [48]
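To give a sense of what such a model looks like, the sketch below integrates the classic susceptible-infected-recovered (SIR) compartmental model with a simple Euler scheme. This is a generic textbook model chosen for illustration, not one of the models from the studies cited above; the function name, parameter values and step size are assumptions.

```python
def simulate_sir(beta: float, gamma: float, s0: float, i0: float, r0: float,
                 days: int, dt: float = 0.1) -> list[tuple[int, float, float, float]]:
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I. Returns one (day, S, I, R) tuple per simulated day."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r                       # total population, constant in this model
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only: transmission rate 0.3/day, recovery rate 0.1/day.
    for day, s, i, r in simulate_sir(0.3, 0.1, 990, 10, 0, days=160)[::20]:
        print(f"day {day:3d}  S={s:7.1f}  I={i:7.1f}  R={r:7.1f}")
```

In this model an outbreak grows only while beta*S/N exceeds gamma; within-host pathogen dynamics are typically described with analogous compartmental equations for pathogen load and immune cell populations.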

Much has been done to understand immune responses to various pathogens by integrating genomics and proteomics with bioinformatics strategies. Many exciting developments in large-scale screening of pathogens are currently taking place. The National Institute of Allergy and Infectious Diseases (NIAID) has initiated an endeavor for systematic mapping of category A-C pathogens. These pathogens include Bacillus anthracis (anthrax), Clostridium botulinum toxin (botulism), Variola major (smallpox), Francisella tularensis (tularemia), viral hemorrhagic fevers, Burkholderia pseudomallei, Staphylococcus enterotoxin B, yellow fever, influenza, rabies, Chikungunya virus, etc. Rule-based systems have been reported for the automated extraction and curation of influenza A records. [49]

This work could lead to algorithms that help identify conserved regions of pathogen sequences, which in turn would be useful for vaccine development and would help limit the spread of infectious disease. Examples include the identification of vaccine targets from protein regions of conserved HLA binding [50] and the computational assessment of cross-reactivity of broadly neutralizing antibodies against viral pathogens. [51] These examples illustrate the power of immunoinformatics applications to help solve complex problems in public health. Immunoinformatics could accelerate the discovery process and potentially shorten the time required for vaccine development.

Immune system function

Using these approaches, it is possible to model the behavior of the immune system. They have been used to model T-cell-mediated suppression, [52] peripheral lymphocyte migration, [53] T-cell memory, [54] tolerance, [55] thymic function, [56] and antibody networks. [57] Models are helpful in predicting the dynamics of pathogenicity and T-cell memory in response to different stimuli. There are also several models that are helpful in understanding the nature of specific immunity and immunogenicity.

For example, modeling was useful for examining the functional relationship between TAP peptide transport and HLA class I antigen presentation. [58] TAP is a transmembrane protein responsible for the transport of antigenic peptides into the endoplasmic reticulum, where MHC class I molecules can bind them and present them to T cells. As TAP does not bind all peptides equally, TAP-binding affinity could influence the ability of a particular peptide to gain access to the MHC class I pathway. An artificial neural network (ANN), a type of computer model, was used to study peptide binding to human TAP and its relationship with MHC class I binding. Using this method, the affinity of HLA-binding peptides for TAP was found to differ according to the HLA supertype concerned. This research could have important implications for the design of peptide-based immunotherapeutic drugs and vaccines. It shows the power of the modeling approach for understanding complex immune interactions. [58]

There are also methods that integrate peptide prediction tools with computer simulations, which can provide detailed information on the immune response dynamics specific to a given pathogen's peptides. [59]

Cancer informatics

Cancer is the result of somatic mutations that provide cancer cells with a selective growth advantage. Identifying novel mutations has recently become very important. Genomics and proteomics techniques are used to identify the mutations related to each specific cancer and its treatment. Computational tools are used to predict growth and surface antigens on cancerous cells. There are publications explaining a targeted approach for assessing mutations and cancer risk. The CanPredict algorithm has been used to indicate how closely a specific gene resembles known cancer-causing genes. [60] Cancer immunology has received considerable attention. Protein-protein interaction networks provide valuable information on tumorigenesis in humans. Cancer proteins exhibit a network topology that is different from that of normal proteins in the human interactome. [61] [62] Immunoinformatics has been useful in increasing the success of tumor vaccination. Recently, pioneering work has been conducted to analyze host immune system dynamics in response to artificial immunity induced by vaccination strategies. [63] [64] [65] Other simulation tools use predicted cancer peptides to forecast immune-specific anticancer responses that depend on the specified HLA. [36] These resources are likely to grow significantly in the future and will become more important in this area.