Group leader: B. Habermann
The Computational Biology group addresses biological problems using computational methodologies. We have two major research directions: first, we are interested in large-scale data integration, focusing on mitochondrial function in development and disease; second, we are developing methods to work with remote sequence similarities, with a focus on de novo prediction of short, functional motifs in proteins.
Computer-driven analysis has complemented biological research for a long time. With the advent of sequencing and the deciphering of the genetic code, methods were developed to help analyze this type of data. The sequencing of parts of genomes or of entire genomes has, for instance, enabled us to establish a fine-grained view of the evolution of species.
In recent years, large-scale screens and next-generation sequencing have been generating a tremendous amount of data, which cannot be analyzed – or understood – without the help of computational techniques. Our lab works on the computer-assisted analysis of biological data.
Our team works on two aspects in computational biology:
First, we want to make use of the vast amount of biological information available to date and put it into the perspective of biological systems in development or disease. We have chosen mitochondria to demonstrate the usefulness of large-scale data integration for understanding a biological system. Mitochondria are the so-called powerhouses of the cell. They not only provide energy to the cell, but are also closely involved in a multitude of its metabolic functions. Their central role in cellular development and homeostasis makes the mitochondrial system well suited for investigating how its function changes. We develop methods for the interpretation and integration of biological data and apply them to understand the function of mitochondria, and how it changes, in a developing tissue or in a disease such as Parkinson’s disease or cancer.
Second, we are working with protein sequence similarities in the so-called midnight zone: two related proteins have undergone so many mutations that it is difficult to detect the similarity in their sequences. We have developed methods for detecting remote homologs, remotely conserved functional domains, as well as remotely conserved orthologs, that is, proteins in different species that descend from a single ancestral gene through speciation. Currently, we are working on methods for the de novo prediction of short, functional motifs in proteins: can we, without any prior information, identify a short stretch of sequence in a protein that performs a specific function, such as binding another protein or a ligand? We are approaching this problem using either only the protein sequence or only its structure in three-dimensional space.
The Computational Biology group is actively involved in two main research directions: the integration of large-scale data and working with remote protein sequence similarities.
Integration of large-scale data
We work on the integration of large-scale data from different sources to extract meaningful biological information. We mostly use NGS data, integrating differential expression with ChIP-seq or interactome data to provide biologists with testable hypotheses for further experimental studies. To this end, we develop data integration methods that are easy to use for non-experts.
To show feasibility of our methods, we have chosen the mitochondrial system, as it represents the central organelle for metabolic functions and energy production in the cell. It is experimentally very well characterized in terms of protein content and enzymatic pathways. Therefore, it enables us to look at changes in mitochondrial function in differing cellular conditions.
MitoXplore – understanding mitochondrial function in health and disease
We are developing the MitoXplore platform, a web tool that integrates large-scale expression and mutation data with the mitochondrial interactome and mitochondrial pathways. Using specialized pipelines for NGS data analysis, we extract mutation and expression data for all proteins localized to mitochondria, irrespective of whether they are encoded in the mitochondrial or the nuclear genome. We integrate expression and mutation data with a manually assembled and curated mitochondrial interactome and visualize observed changes in different experimental or disease conditions. This enables us to rapidly and visually compare different data sets with respect to their mitochondrial functions.
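The basic idea of overlaying expression changes on an interactome can be illustrated with a minimal sketch. This is not the MitoXplore implementation: the function name, the log2 fold-change threshold, the gene identifiers and all values below are hypothetical placeholders chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple


def changed_interactions(
    log2_fold_changes: Dict[str, float],
    interactions: List[Tuple[str, str]],
    mito_genes: Set[str],
    threshold: float = 1.0,  # assumed cut-off, not taken from the platform
) -> List[Tuple[str, str, float, float]]:
    """Return interactions between mitochondrial genes whose partners both
    show an absolute log2 fold change of at least `threshold`."""
    hits = []
    for gene_a, gene_b in interactions:
        # Keep only interactions where both partners are mitochondrial.
        if gene_a not in mito_genes or gene_b not in mito_genes:
            continue
        change_a = log2_fold_changes.get(gene_a)
        change_b = log2_fold_changes.get(gene_b)
        # Skip partners without an expression measurement.
        if change_a is None or change_b is None:
            continue
        if abs(change_a) >= threshold and abs(change_b) >= threshold:
            hits.append((gene_a, gene_b, change_a, change_b))
    return hits


# Toy data (invented): GENE_D is not mitochondrial, GENE_C barely changes.
expression = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 0.2, "GENE_D": 3.0}
interactome = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_B", "GENE_D")]
mitochondrial = {"GENE_A", "GENE_B", "GENE_C"}

result = changed_interactions(expression, interactome, mitochondrial)
print(result)
```

In this toy example only the GENE_A/GENE_B interaction is reported, because it is the only pair in which both partners are mitochondrial and both pass the threshold.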
This project is supported by DFG grant ‘Systems biological analysis of cancer genomes using deductive databases’.
AnnoMiner – integrating ChIP-seq data
AnnoMiner integrates peak profiles emanating from ChIP-seq experiments using heuristic overlap criteria. Our algorithm can be used to annotate ChIP-seq peak profiles with genomic features like genes, or compare ChIP-seq profiles with each other. A second feature of AnnoMiner is to use available ChIP-seq profiles for enrichment analysis of large-scale expression studies. Just like looking for enriched binding site motifs in regulatory elements of differentially expressed genes, we can look for enrichment of ChIP-seq peaks from public repositories to find possible regulatory factors involved in differential expression. AnnoMiner is available for all model organisms and will be provided as a web-tool.
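The overlap-based annotation of peaks with genomic features can be sketched as follows. This is an illustration under stated assumptions, not AnnoMiner's algorithm: the single criterion used here (a minimum fraction of the peak covered by the feature), the half-open coordinate convention, and all names and coordinates are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def annotate_peaks(
    peaks: List[Interval],
    features: List[Interval],
    min_fraction: float = 0.5,  # assumed criterion: share of the peak covered
) -> Dict[str, List[str]]:
    """Map each peak name to the features covering at least `min_fraction` of it."""
    annotations: Dict[str, List[str]] = {}
    for peak in peaks:
        peak_length = peak.end - peak.start
        annotations[peak.name] = [
            feature.name
            for feature in features
            if peak_length > 0
            and overlap_length(peak, feature) / peak_length >= min_fraction
        ]
    return annotations


# Toy coordinates (invented for illustration).
peaks = [
    Interval("chr1", 100, 200, "peak1"),
    Interval("chr1", 450, 550, "peak2"),
    Interval("chr2", 100, 200, "peak3"),
]
genes = [
    Interval("chr1", 150, 400, "geneA"),
    Interval("chr1", 500, 900, "geneB"),
]

annotations = annotate_peaks(peaks, genes)
print(annotations)
# {'peak1': ['geneA'], 'peak2': ['geneB'], 'peak3': []}
```

The nested loop keeps the sketch short; for genome-scale peak sets an interval tree or a sorted sweep would be the usual choice.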
Biological networks for data analysis, integration and visualization
Biological networks such as protein-protein interaction networks or gene regulatory networks are integral to understanding biological systems. We use such networks to interpret and integrate large-scale data coming from expression studies. We have developed several algorithms and tools for 1) the generation of non-redundant protein interaction networks (miMerge, miScore (Villaveces, et al., Database, 2015)), 2) the visualization and integration of pathway data (KEGGviewer (Villaveces, et al., F1000Res 3:43, 2014), PsiquicGraph (Villaveces, et al., F1000Res 3:44, 2014)), both available via the BioJS platform, as well as 3) Cytoscape plugins for focus network generation and pathway enrichment (viPEr & PEANuT (Garmhausen et al., BMC Genomics 16:790, 2015)).
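What "non-redundant" means for an interaction network can be shown with a small sketch: the same undirected interaction reported by several sources, possibly with the partners in a different order, is collapsed into one entry that records its supporting sources. This is a simplified illustration and not the miMerge or miScore procedure; the function name, source names and protein identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def merge_interactions(sources: Dict[str, Iterable[Pair]]) -> Dict[Pair, List[str]]:
    """Collapse interactions from several sources into unique undirected pairs,
    keeping the sorted list of sources that support each pair."""
    merged = defaultdict(set)
    for source_name, pairs in sources.items():
        for protein_a, protein_b in pairs:
            if protein_a == protein_b:
                continue  # self-interactions are ignored in this sketch
            # Sorting makes (A, B) and (B, A) map to the same key.
            key = tuple(sorted((protein_a, protein_b)))
            merged[key].add(source_name)
    return {pair: sorted(names) for pair, names in merged.items()}


# Toy input (invented): "db2" reports the P1-P2 interaction in reverse order.
network = merge_interactions({
    "db1": [("P1", "P2"), ("P2", "P3")],
    "db2": [("P2", "P1")],
})
print(network)
```

The number of supporting sources per pair could then serve as a crude confidence value; this stands in for, and is much simpler than, a dedicated scoring scheme.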
Working with remote sequence similarities – de novo motif prediction and orthology detection in the midnight zone of sequence similarity
The Darwinian view of evolution holds that evolution is the result of random changes in the genetic material combined with the process of natural selection. Many small changes over a long period of time have a major evolutionary impact. As a result, even true orthologs can share only low sequence similarity, which we refer to as conservation in the twilight or midnight zone.
Our group is interested in detecting sequence relationships in the twilight and midnight zone.
Remote orthology detection
We are interested in discovering remote orthologs. Identifying orthologous proteins is one of the key tasks in computational biology: we need to know a protein’s orthologs to understand its evolution. Orthologs also tell us whether the process a protein is involved in is conserved beyond model species and across kingdoms.
Orthologs are equally important for wet-lab research: we transfer functional information across orthologous proteins and can therefore provide testable hypotheses about the function of uncharacterized proteins.
However, the level of sequence conservation, even between orthologs, is sometimes below the detection limit of standard software and settings.
We have addressed this problem and developed a web-based method, morFeus (Wagner, et al., BMC Bioinformatics 15 (1), 263, 2014; free usage at http://bio.biochem.mpg.de/morfeus/) for the detection of orthologs in the twilight and midnight zone of sequence similarity.
We compare weighted, binary representations of sequence alignments from a relaxed BLAST search and cluster hits based on their similarity to the query. Iterative reciprocal BLAST searches are carried out to verify orthology. Not only the query, but also other verified orthologs can establish orthology and include further hits for back-BLASTs. In a final step, a network of orthology (see figure) is created and a score independent of the BLAST E-value is calculated for putative orthologs using centrality scoring. We have tested morFeus against the state-of-the-art resources HomoloGene and Inparanoid and achieve significantly higher sensitivity with equal specificity.
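The reciprocal-search and network-scoring steps can be illustrated with a strongly simplified sketch. It is not the morFeus implementation: the alignment weighting and clustering are omitted, BLAST results are replaced by a hypothetical precomputed table of best hits, and degree centrality is used as an assumed stand-in because the text above does not specify which centrality measure is applied. All identifiers are invented.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical lookup: best_hits[(protein, target_species)] -> best-scoring hit.
BestHits = Dict[Tuple[str, str], str]


def is_reciprocal(p: str, h: str, species: Dict[str, str], best_hits: BestHits) -> bool:
    """True if p and h are each other's best hit in the other's species."""
    return (best_hits.get((p, species[h])) == h
            and best_hits.get((h, species[p])) == p)


def orthology_network(query: str, candidates: List[str],
                      species: Dict[str, str], best_hits: BestHits):
    """Grow a network from the query: a candidate is linked to any already
    verified member with which it forms a reciprocal best-hit pair."""
    verified: Set[str] = {query}
    edges: Set[FrozenSet[str]] = set()
    changed = True
    while changed:  # repeat until no new edge can be added
        changed = False
        for cand in candidates:
            for member in list(verified):
                if cand == member or species[cand] == species[member]:
                    continue
                edge = frozenset((cand, member))
                if edge not in edges and is_reciprocal(member, cand, species, best_hits):
                    edges.add(edge)
                    verified.add(cand)
                    changed = True
    return verified, edges


def degree_centrality(nodes: Set[str], edges: Set[FrozenSet[str]]) -> Dict[str, float]:
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: sum(1 for e in edges if node in e) / (n - 1) for node in nodes}


# Toy data: "b" is not a reciprocal best hit of the query "q", but of "a".
species = {"q": "sp1", "a": "sp2", "b": "sp3", "c": "sp3"}
best_hits = {
    ("q", "sp2"): "a", ("a", "sp1"): "q",  # q <-> a reciprocal
    ("a", "sp3"): "b", ("b", "sp2"): "a",  # a <-> b reciprocal
    ("q", "sp3"): "c", ("c", "sp1"): "x",  # c's back-search does not return q
}
verified, edges = orthology_network("q", ["a", "b", "c"], species, best_hits)
scores = degree_centrality(verified, edges)
print(sorted(verified), scores)
```

In this toy example, "b" enters the network only through the verified ortholog "a", which mirrors the idea that verified orthologs can establish orthology for further hits, while "c" is rejected because its back-search does not return the query.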
De novo motif discovery in protein sequences
Protein motifs are defined as self-sufficient functional units. They are typically only between three and 23 amino acids long and have various functions in proteins. They can serve as cleavage sites, are required for proteasomal degradation, are involved in docking and ligand binding, or serve as signals for post-translational modification or subcellular localization. Their shortness and the fact that they typically lack substantial sequence conservation make them very difficult to find de novo, i.e. without prior information on the location or nature of the motif. We aim to identify short functional motifs in proteins de novo. We perform evolutionarily restricted profile comparisons to detect common motifs in a set of unrelated proteins. In collaboration with wet-lab researchers, we experimentally test our predicted motifs.
Discovery of structural short motifs in protein 3D structures
In addition to sequence-based approaches, we are interested in finding structural motifs in proteins. Can we predict patches on the surface of 3D structures that are responsible for protein-protein or protein-ligand interactions? We employ statistical methods to identify potentially functional structural motifs in protein 3D structures.
December 18th, 2015
Human Holliday junction resolvase GEN1 uses a chromodomain for efficient DNA recognition and cleavage.
October 14th, 2015
Virtual pathway explorer (viPEr) and pathway enrichment analysis tool (PEANuT): creating and analyzing focus networks to identify cross-talk between molecules and pathways.
September 18th, 2015
mRNA export through an additional cap-binding complex consisting of NCBP1 and NCBP3.
June 4th, 2015
Tools for visualization and analysis of molecular networks, pathways, and -omics data.
February 16th, 2015
The RNA-binding protein Arrest (Bruno) regulates alternative splicing to enable myofibril maturation in Drosophila flight muscle.
February 4th, 2015
DNA-protein crosslink repair: proteases as DNA repair enzymes.
February 4th, 2015