Gene Function Prediction

Next-generation sequencing delivers enormous amounts of DNA or RNA sequence data, but working out what the genes in these data do is far from easy. We offer advanced computational methods to annotate gene functions and to help predict causal candidate genes for traits of interest. We developed a method that predicts gene functions from both gene sequence and gene expression information. This overcomes the limitation that sequence similarity alone (e.g. BLAST) often fails to predict gene function reliably.

Figure: Network information is used to predict the biological processes in which genes are involved. Circles indicate genes; lines indicate connections between genes based on, for example, co-expression. Using partial knowledge about gene functions (“training data”), biological processes can be predicted for genes with completely unknown function (grey in the partially labelled network).
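The idea in the figure can be illustrated with a small sketch. It uses simple neighbour voting ("guilt by association") on a toy co-expression network, which is an illustrative stand-in and not necessarily the method described here. The function name `predict_functions`, the gene identifiers and the process labels are all hypothetical.

```python
from collections import defaultdict


def predict_functions(edges, known_labels):
    """Score unlabelled genes for each biological process by neighbour voting.

    edges: iterable of (gene_a, gene_b) pairs, e.g. co-expressed genes.
    known_labels: dict mapping a gene to the set of processes it is known
        to be involved in (the "training data").
    Returns a dict mapping each unlabelled gene to {process: score}, where
    the score is the fraction of its neighbours annotated with that process.
    """
    # Build an undirected adjacency map from the edge list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    predictions = {}
    for gene, nbrs in neighbours.items():
        if gene in known_labels:
            continue  # only predict for genes of unknown function
        votes = defaultdict(int)
        for n in nbrs:
            for process in known_labels.get(n, ()):
                votes[process] += 1
        predictions[gene] = {p: count / len(nbrs) for p, count in votes.items()}
    return predictions


# Toy network (hypothetical data): g3 and g4 have no known function.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
known = {
    "g1": {"photosynthesis"},
    "g2": {"photosynthesis"},
    "g5": {"root development"},
}
predictions = predict_functions(edges, known)
for gene, scores in sorted(predictions.items()):
    print(gene, scores)
```

In this toy network, g3 is connected to two photosynthesis genes and so receives a high score for that process. A real application would typically weight edges by co-expression strength and combine the network with sequence-based evidence.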

Building on this gene function prediction approach, we developed a novel method that automatically prioritizes potential causal genes underlying quantitative trait loci (QTLs) for a variety of traits. QTL data indicate genome regions linked to traits such as yield. A bottleneck in using QTL data is that these regions can be quite large, containing hundreds of genes, so finding the most likely causal genes in them can be time-consuming. Our approach to pinpointing these genes automatically was validated on a large set of rice QTL data [5]. We are currently developing the method further and applying it to specific crops and traits of interest.
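As a rough illustration of what prioritizing genes within a QTL region could look like, the sketch below ranks the genes in a region by their predicted involvement in processes assumed to be relevant to the trait. This is a simplified assumption for illustration only and not the validated method from [5]. The functions `genes_in_region` and `prioritize`, the coordinates, the scores and the trait-relevant process are all hypothetical.

```python
def genes_in_region(gene_positions, chrom, start, end):
    """Return the genes whose position falls inside the QTL region."""
    return [
        gene
        for gene, (gene_chrom, pos) in gene_positions.items()
        if gene_chrom == chrom and start <= pos <= end
    ]


def prioritize(region_genes, predictions, trait_processes):
    """Rank genes by their best predicted score for any trait-relevant process.

    predictions: dict mapping a gene to {process: score}, e.g. the output of
        a gene function prediction step.
    trait_processes: processes assumed to be relevant to the trait.
    Genes without a relevant prediction get a score of 0.0.
    """
    scored = []
    for gene in region_genes:
        scores = predictions.get(gene, {})
        best = max((scores.get(p, 0.0) for p in trait_processes), default=0.0)
        scored.append((gene, best))
    # Highest score first; ties are broken alphabetically by gene name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical gene coordinates and function prediction scores.
gene_positions = {
    "gA": ("chr1", 1200),
    "gB": ("chr1", 5400),
    "gC": ("chr1", 9800),
    "gD": ("chr2", 3000),
}
function_scores = {
    "gA": {"grain filling": 0.2},
    "gB": {"grain filling": 0.9, "stress response": 0.4},
    "gC": {"stress response": 0.7},
}

region = genes_in_region(gene_positions, "chr1", 1000, 10000)
ranking = prioritize(region, function_scores, {"grain filling"})
for gene, score in ranking:
    print(gene, score)
```

Here gB ranks first because it has the strongest predicted link to the assumed trait-relevant process, while gD is excluded because it lies outside the region. In practice, deciding which processes are relevant to a trait is itself part of the problem and would not simply be supplied by hand.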