Group leader: B. Habermann
The Computational Biology group addresses biological problems using computational methodologies. We have two major research directions: first, we are interested in large-scale data integration, focusing on mitochondrial function in development and disease; second, we are developing methods to work with remote sequence similarities, with a focus on the de novo prediction of short, functional motifs in proteins.
Computer-driven analysis has complemented biological research for a long time. With the advent of sequencing and the deciphering of the genetic code, methods were developed to help analyze this type of data. The sequencing of partial or entire genomes has, for instance, enabled us to establish a fine-grained view of the evolution of species.
In recent years, large-scale screens and next-generation sequencing have been generating a tremendous amount of data, which cannot be analyzed, or understood, without the help of computational techniques. Our lab works on the computer-assisted analysis of biological data.
Our team works on two aspects of computational biology:
First, we want to make use of the vast amount of biological information available to date and put it into the perspective of biological systems in development or disease. We have chosen mitochondria to demonstrate the usefulness of large-scale data integration for understanding a biological system. Mitochondria are the so-called powerhouses of the cell. They not only provide energy to the cell, but are also closely involved in a multitude of its metabolic functions. Their central role in cellular development and homeostasis makes them an ideal system for investigating how mitochondrial function changes. We develop methods for the interpretation and integration of biological data and apply them to understand the function of mitochondria, and changes in that function, in a developing tissue or in a disease such as Parkinson’s disease or cancer.
Second, we are working with protein sequence similarities in the so-called midnight zone, where two related proteins have undergone so many mutations that the similarity in their sequences is difficult to detect. We have developed methods for detecting remote homologs, remotely conserved functional domains, and remotely conserved orthologs, that is, proteins in different species that derive from the same ancestral protein through speciation. Currently, we are working on methods for the de novo prediction of short, functional motifs in proteins: can we, without any prior information, identify a short stretch of sequence in a protein that performs a specific function, such as binding another protein or a ligand? We approach this problem using either the protein sequence alone or the three-dimensional structure alone.
The Computational Biology group is actively involved in two main research directions: the integration of large-scale data and working with remote protein sequence similarities.
Integration of large-scale data
We work on the integration of large-scale data from different sources to extract meaningful biological information. We mostly use NGS data, integrating differential expression with ChIP-seq or interactome data to provide biologists with testable hypotheses for further experimental studies. To this end, we develop data integration methods that are easy for non-experts to use.
To demonstrate the feasibility of our methods, we have chosen the mitochondrial system, as it represents the central organelle for metabolic functions and energy production in the cell. It is experimentally very well characterized in terms of protein content and enzymatic pathways. It therefore enables us to look at changes in mitochondrial function under different cellular conditions.
MitoXplore – understanding mitochondrial function in health and disease
We are developing the MitoXplore platform, a web tool that integrates large-scale expression and mutation data with the mitochondrial interactome and mitochondrial pathways. Using specialized pipelines for NGS data analysis, we extract mutation and expression data for all proteins localized to mitochondria, irrespective of whether they are encoded in the mitochondrial or the nuclear genome. We integrate expression and mutation data with a manually assembled and curated mitochondrial interactome and visualize the changes observed in different experimental or disease conditions. This enables us to rapidly and visually compare different data sets with respect to their mitochondrial functions.
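As a rough illustration of what such an integration step can look like, the following Python sketch restricts an expression table to mitochondrial genes and averages the log2 fold changes per pathway. It is not the MitoXplore implementation: the function name, the pathway labels and all numeric values are hypothetical toy inputs chosen for the example.

```python
from statistics import mean


def summarize_by_pathway(log2_fold_changes, gene_to_pathway):
    """Average log2 fold change per pathway.

    Genes without a pathway annotation (here: non-mitochondrial genes)
    are ignored.
    """
    grouped = {}
    for gene, value in log2_fold_changes.items():
        pathway = gene_to_pathway.get(gene)
        if pathway is None:
            continue  # gene is not part of the mitochondrial annotation
        grouped.setdefault(pathway, []).append(value)
    return {pathway: mean(values) for pathway, values in grouped.items()}


# Toy input: invented log2 fold changes, not measured data.
log2_fold_changes = {"NDUFA1": -1.0, "NDUFB2": -2.0, "TOMM20": 0.5, "GAPDH": 3.0}

# Hypothetical pathway labels; GAPDH is deliberately left unannotated.
gene_to_pathway = {
    "NDUFA1": "oxidative phosphorylation",
    "NDUFB2": "oxidative phosphorylation",
    "TOMM20": "protein import",
}

summary = summarize_by_pathway(log2_fold_changes, gene_to_pathway)
print(summary)
```

Summaries of this kind can then be compared between conditions, or mapped onto the nodes of an interaction network for visualization.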
This project is supported by DFG grant ‘Systems biological analysis of cancer genomes using deductive databases’.
AnnoMiner – integrating ChIP-seq data
AnnoMiner integrates peak profiles from ChIP-seq experiments using heuristic overlap criteria. Our algorithm can be used to annotate ChIP-seq peak profiles with genomic features such as genes, or to compare ChIP-seq profiles with each other. A second feature of AnnoMiner is the use of available ChIP-seq profiles for the enrichment analysis of large-scale expression studies. Just as one looks for enriched binding site motifs in the regulatory elements of differentially expressed genes, we can look for enrichment of ChIP-seq peaks from public repositories to find possible regulatory factors involved in differential expression. AnnoMiner is available for all model organisms and will be provided as a web tool.
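The idea of annotating peaks by overlap can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not AnnoMiner's algorithm: the `Interval` type, the `annotate_peaks` function and the 50% overlap threshold are hypothetical, and the coordinates are toy values.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlap_bp(a, b):
    """Number of shared base pairs (0 if disjoint or on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def annotate_peaks(peaks, genes, min_fraction=0.5):
    """Map each peak to the genes that cover at least min_fraction of the peak.

    min_fraction is an assumed, illustrative overlap criterion.
    """
    annotation = {}
    for peak in peaks:
        length = peak.end - peak.start
        annotation[peak.name] = [
            gene.name
            for gene in genes
            if length > 0 and overlap_bp(peak, gene) / length >= min_fraction
        ]
    return annotation


# Toy coordinates, invented for the example.
peaks = [
    Interval("chr1", 100, 200, "peak1"),
    Interval("chr1", 950, 1050, "peak2"),
    Interval("chr2", 100, 200, "peak3"),
]
genes = [
    Interval("chr1", 150, 900, "geneA"),
    Interval("chr1", 1000, 2000, "geneB"),
]

result = annotate_peaks(peaks, genes)
print(result)
```

The nested loop is quadratic in the number of intervals; for genome-scale data, a sorted sweep or an interval tree would be the usual choice.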
Biological networks for data analysis, integration and visualization
Biological networks such as protein-protein interaction networks or gene regulatory networks are integral to understanding biological systems. We use such networks to interpret and integrate large-scale data coming from expression studies. We have developed several algorithms and tools for 1) the generation of non-redundant protein interaction networks (miMerge, miScore (Villaveces et al., Database, 2015)), 2) the visualization and integration of pathway data (KEGGViewer (Villaveces et al., F1000Res 3:43, 2014), PsicquicGraph (Villaveces et al., F1000Res 3:44, 2014)), both available via the BioJS platform, and 3) focus network generation and pathway enrichment in the form of Cytoscape plugins (viPEr & PEANuT (Garmhausen et al., BMC Genomics 16:790, 2015)).
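To make the notion of a non-redundant interaction network concrete, the Python sketch below merges interaction lists from several sources so that each undirected protein pair appears only once, and records which sources support it. It does not reproduce the published miMerge or miScore methods: the function names, the database labels and the simple "fraction of supporting sources" score are assumptions made for illustration.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge interactions from several sources into a non-redundant network.

    sources maps a source name to a list of (protein_a, protein_b) pairs.
    A-B and B-A are treated as the same undirected interaction.
    """
    merged = defaultdict(set)
    for source_name, pairs in sources.items():
        for a, b in pairs:
            key = tuple(sorted((a, b)))  # canonical order removes A-B / B-A duplicates
            merged[key].add(source_name)
    return {pair: sorted(names) for pair, names in merged.items()}


def support_score(network, n_sources):
    """Hypothetical confidence score: fraction of sources reporting the pair."""
    return {pair: len(names) / n_sources for pair, names in network.items()}


# Toy input with invented protein and database names.
sources = {
    "db1": [("P1", "P2"), ("P2", "P3")],
    "db2": [("P2", "P1"), ("P3", "P4")],
}

network = merge_interactions(sources)
scores = support_score(network, len(sources))
print(network)
print(scores)
```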
Working with remote sequence analysis – motif de novo prediction and orthology detection in the midnight zone of sequence similarity
According to the Darwinian view, evolution is the result of random changes in the genetic code combined with the process of natural selection. Many small changes over a long period of time have a major evolutionary impact. As a result, even true orthologs may share only low sequence similarity, which we refer to as conservation in the twilight or midnight zone.
Our group is interested in detecting sequence relationships in the twilight and midnight zone.
Remote orthology detection
We are interested in discovering remote orthologs. Identifying orthologous proteins is one of the key tasks in computational biology: we need to know a protein’s orthologs to understand its evolution. Orthologs also tell us whether the process a protein is involved in is conserved beyond model species and across kingdoms.
Orthologs are equally important for wet-lab research: we transfer functional information between orthologous proteins and can therefore provide testable hypotheses about the function of uncharacterized proteins.
However, the level of sequence conservation, even between orthologs, is sometimes below the detection limit of standard software and settings.
We have addressed this problem and developed a web-based method, morFeus (Wagner et al., BMC Bioinformatics 15 (1), 263, 2014; freely available at http://bio.biochem.mpg.de/morfeus/), for the detection of orthologs in the twilight and midnight zone of sequence similarity.
We compare weighted, binary representations of sequence alignments from a relaxed BLAST search and cluster hits based on their similarity to the query. Iterative reciprocal BLAST searches are carried out to verify orthology. Orthology can be established not only via the query but also via other verified orthologs, which allows further hits to be included in the back-BLAST searches. In a final step, a network of orthology (see figure) is created and a score independent of the BLAST E-value is calculated for putative orthologs using centrality scoring. We have tested morFeus against the state-of-the-art resources HomoloGene and Inparanoid and achieved significantly higher sensitivity with equal specificity.
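Two of the ideas above, reciprocal best hits and centrality scoring in a network, can be sketched in Python as follows. This is a simplified illustration and not the morFeus implementation: it uses plain degree centrality as a stand-in for the published scoring, and the function names, protein names and scores are hypothetical toy values.

```python
def best_hit(hits, query):
    """Return the highest-scoring target for query, or None if it has no hits."""
    targets = hits.get(query, {})
    return max(targets, key=targets.get) if targets else None


def reciprocal_pairs(forward, backward):
    """Pairs (q, t) where t is the best hit of q and q is the best hit of t.

    forward maps each query protein to {target: score}; backward holds the
    scores of the reverse (back-BLAST) searches.
    """
    pairs = set()
    for query in forward:
        target = best_hit(forward, query)
        if target is not None and best_hit(backward, target) == query:
            pairs.add((query, target))
    return pairs


def degree_centrality(edges):
    """Degree of each node divided by the number of other nodes in the network."""
    nodes = {node for edge in edges for node in edge}
    denominator = max(len(nodes) - 1, 1)
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: count / denominator for node, count in degree.items()}


# Toy similarity scores, invented for the example.
forward = {"humanA": {"flyA": 80.0, "flyB": 40.0}, "humanB": {"flyA": 55.0}}
backward = {"flyA": {"humanA": 78.0, "humanB": 50.0}, "flyB": {"humanA": 35.0}}
pairs = reciprocal_pairs(forward, backward)

# Toy orthology network: "query" is linked to three candidate orthologs.
edges = [("query", "a"), ("query", "b"), ("a", "b"), ("query", "c")]
centrality = degree_centrality(edges)
print(pairs)
print(centrality)
```

In this toy network, candidates that are connected to several other verified members score higher than a candidate linked only to the query, which is the intuition behind scoring orthologs independently of the E-value.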
De novo motif discovery in protein sequences
Protein motifs are defined as self-sufficient functional units. They are typically only between 3 and 23 amino acids long and have various functions in proteins: they can serve as cleavage sites, are required for proteasomal degradation, are involved in docking and ligand binding, or serve as signals for post-translational modification or subcellular localization. Their shortness and the fact that they typically lack substantial sequence conservation make them very difficult to find de novo, i.e. without prior information on the location or nature of the motif. We aim to identify short functional motifs in proteins de novo. We perform evolutionarily restricted profile comparisons to detect common motifs in a set of unrelated proteins. In collaboration with wet-lab researchers, we experimentally test our predicted motifs.
Discovery of structural short motifs in protein 3D structures
In addition to sequence-based approaches, we are interested in finding structural motifs in proteins. Can we predict potential patches on the surface of 3D structures that are responsible for protein-protein or protein-ligand interactions? We employ statistical methods to identify potentially functional structural motifs in protein 3D structures.
April 29th, 2017
HH-MOTiF: de novo detection of short linear motifs in proteins by Hidden Markov Model comparisons
March 27th, 2017
Revision and reannotation of the Halomonas elongata DSM 2581T genome.
March 9th, 2017
A Guide to Computational Methods for Predicting Mitochondrial Localization.
September 21st, 2016
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity
October 14th, 2015
Virtual pathway explorer (viPEr) and pathway enrichment analysis tool (PEANuT): creating and analyzing focus networks to identify cross-talk between molecules and pathways.
June 4th, 2015
Tools for visualization and analysis of molecular networks, pathways, and -omics data.
February 4th, 2015
Merging and scoring molecular interactions utilising existing community standards: tools, use-cases and a case study.
August 6th, 2014
morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring.
February 13th, 2014
KEGGViewer, a BioJS component to visualize KEGG Pathways.
February 13th, 2014
PsicquicGraph, a BioJS component to visualize molecular interactions from PSICQUIC servers.
August 29th, 2012
Designing efficient and specific endoribonuclease-prepared siRNAs.
March 10th, 2011
HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.
June 5th, 2010
SeLOX--a locus of recombination site search tool for the detection and directed evolution of site-specific recombination systems.
March 11th, 2007
Genome-wide resources of endoribonuclease-prepared short interfering RNAs for specific loss-of-function studies.
October 23rd, 2006
ProFAT: a web-based tool for the functional annotation of protein sequences.
August 13th, 2004
An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries.
July 1st, 2004
DEQOR: a web-based tool for the design and quality control of siRNAs.
March 5th, 2004
The BAR-domain family of proteins: a case of bending and binding?
March 1st, 2004
The power and the limitations of cross-species protein identification by mass spectrometry-driven sequence similarity searches.
July 7th, 2016
Structure of a Cytoplasmic 11-Subunit RNA Exosome Complex.
May 9th, 2016
Secretory cargo sorting by Ca2+-dependent Cab45 oligomerization at the trans-Golgi network.
December 18th, 2015
Human Holliday junction resolvase GEN1 uses a chromodomain for efficient DNA recognition and cleavage.
February 16th, 2015
The RNA-binding protein Arrest (Bruno) regulates alternative splicing to enable myofibril maturation in Drosophila flight muscle.
April 2nd, 2013