Featured Websites

Magallanes: a web services discovery and automatic workflow composition tool.

October 5, 2014

BACKGROUND: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data a...

Identifying dispersed epigenomic domains from ChIP-Seq data.

October 5, 2014

MOTIVATION: Increasing rates of publication and DNA sequencing make the problem of finding relevant articles for a particular gene or genomic region more challenging than ever. Existing text-mining approaches focus on finding gene names or identifiers in English text. These are often not unique and ...

Prediction of the binding affinities of peptides to class II MHC using a regularized thermodynamic model.

October 5, 2014

SUMMARY: The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers o...

Robust synthetic biology design: stochastic game theory approach.

October 5, 2014

SUMMARY: LINKDATAGEN is a perl tool that generates linkage mapping input files for five different linkage mapping tools using data from all 11 HAPMAP Phase III populations. It provides rudimentary error checks and is easily amended for personal linkage mapping preferences. Availability and Implement...

Detection of DNA copy number alterations using penalized least squares regression.

October 5, 2014

MOTIVATION: Genomic DNA copy number alterations are characteristic of many human diseases including cancer. Various techniques and platforms have been proposed to allow researchers to partition the whole genome into segments where copy numbers change between contiguous segments, and subsequently to ...

NCBI Viral genomes

October 5, 2014

Viral genome resource at NCBI ...

Estimation of GFP-tagged RNA numbers from temporal fluorescence intensity data.

October 5, 2014

Merging the forward and reverse reads from paired-end sequencing is a critical task that can significantly improve the performance of downstream tasks, such as genome assembly and mapping, by providing them with virtually elongated reads. However, due to the inherent limitations of most paired-end s...

SCAN: SNP and copy number annotation.

October 5, 2014

Data fusion methods are powerful tools for evaluating experiments designed to discover measurable features of directly unobservable systems. We describe an interactive software platform, Visual Integration for Bayesian Evaluation, that ingests or creates bayesian posterior probability matrices, perf...

MDB: a database system utilizing automatic construction of modules and STAR-derived universal language.

October 5, 2014

MOTIVATION: The value of information greatly increases if stored in databases. The objective was to construct a multi-purpose database system primarily designed to store and provide access to three-dimensional structures of biological molecules including theoretical models. RESULTS: A dictionary def...

Robust and accurate data enrichment statistics via distribution function of sum of weights.

October 5, 2014

SUMMARY: Enumeration of the dense sub-graphs of a graph is of interest in community discovery and membership problems, including dense sub-graphs that overlap each other. Described herein is ODES (Overlapping DEnse Sub-graphs), pthreads parallelized software to extract all overlapping maximal sub-gr...


Protein-protein binding affinity prediction on a diverse set of structures.

MOTIVATION: Accurate binding free energy functions for protein-protein interactions are imperative for a wide range of purposes. Their construction is predicated upon ascertaining the factors that influence binding and their relative importance. A recent benchmark of binding affinities has allowed, for the first time, the evaluation and construction of binding free energy models using a diverse set of complexes, and a systematic assessment of our ability to model the energetics of conformational changes. RESULTS: We construct a large set of molecular descriptors using commonly available tools, introducing the use of energetic factors associated with conformational changes and disorder-to-order transitions, as well as features calculated on structural ensembles. The descriptors are used to train and test a binding free energy model using a consensus of four machine learning algorithms, whose performance constitutes a significant improvement over the other state-of-the-art empirical free energy functions tested. The internal workings of the learners show how the descriptors are used, illuminating the determinants of protein-protein binding. AVAILABILITY: The molecular descriptor set and descriptor values for all complexes are available in the supplementary material. A web server for the learners and coordinates for the bound and unbound structures can be accessed from the website: http://bmm.cancerresearchuk.org/%7EAffinity CONTACT: paul.bates@cancer.org.uk.

Reconstructing transcription factor activities in hierarchical transcription network motifs.

MOTIVATION: Knowledge of the dynamics of transcription factors is fundamental to understanding the mechanisms of transcriptional regulation. Measuring transcription factor activities experimentally in vivo remains a challenge. Several methods have been developed to infer these activities from easily measurable quantities such as mRNA expression of target genes. A limitation of these methods is that they rely on very simple single-layer structures, typically consisting of one or more transcription factors regulating a number of target genes. RESULTS: We present a novel statistical inference methodology to reverse engineer the dynamics of transcription factors in hierarchical network motifs such as feed-forward loops. The approach is based on a continuous-time representation of the system in which the high-level master transcription factor is represented as a two-state Markov jump process driving a system of differential equations. We solve the inference problem using an efficient variational approach and demonstrate our method on simulated data and two real datasets. The results on real data show that the predictions of our approach capture biological behaviours more effectively than single-layer models of transcription, and can lead to novel biological insights. AVAILABILITY: http://homepages.inf.ed.ac.uk/gsanguin/software.html CONTACT: g.sanguinetti@ed.ac.uk.
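
The forward model named in the abstract above (a two-state Markov jump process driving a differential equation) can be illustrated with a small simulation. The sketch below is not the authors' model or their variational inference: it only simulates a single on/off process and feeds it into one linear ODE. The ODE form dx/dt = basal + sensitivity * mu(t) - decay * x and every function and parameter name here are assumptions chosen for illustration.

```python
import random


def simulate_two_state_process(rate_on, rate_off, t_end, rng):
    """Simulate a two-state (0/1) Markov jump process on [0, t_end).

    Starts in state 0. Returns the jump times and the state entered at
    each of those times. Waiting times are exponential with rate
    rate_on (leaving state 0) or rate_off (leaving state 1).
    """
    t, state = 0.0, 0
    times, states = [0.0], [0]
    while True:
        rate = rate_on if state == 0 else rate_off
        t += rng.expovariate(rate)
        if t >= t_end:
            break
        state = 1 - state
        times.append(t)
        states.append(state)
    return times, states


def integrate_target(times, states, basal, sensitivity, decay,
                     t_end, dt=0.01, x0=0.0):
    """Forward-Euler integration of the assumed linear ODE
    dx/dt = basal + sensitivity * mu(t) - decay * x,
    where mu(t) is the piecewise-constant 0/1 path given by times/states.
    """
    x = x0
    xs = [x0]
    k = 0  # index of the current segment of the jump path
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        t = step * dt
        # Advance to the segment that contains time t.
        while k + 1 < len(times) and times[k + 1] <= t:
            k += 1
        mu = states[k]
        x += dt * (basal + sensitivity * mu - decay * x)
        xs.append(x)
    return xs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    times, states = simulate_two_state_process(0.5, 0.5, 20.0, rng)
    xs = integrate_target(times, states, basal=0.1, sensitivity=2.0,
                          decay=1.0, t_end=20.0)
    print(len(times), "segments;", "final x =", round(xs[-1], 3))
```

Inference in the paper runs in the opposite direction (recovering the hidden on/off path from observed expression), so this sketch only shows what kind of data such a model would generate.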

survcomp: an R/Bioconductor package for performance assessment and comparison of survival models.

SUMMARY: The survcomp package provides functions to assess and statistically compare the performance of survival/risk prediction models. It implements state-of-the-art statistics to (i) measure the performance of risk prediction models, (ii) combine these statistical estimates from multiple datasets using a meta-analytical framework, and (iii) statistically compare the performance of competing models. AVAILABILITY: The R/Bioconductor package survcomp is provided open source under the Artistic-2.0 License with a user manual containing installation, operating instructions and use case scenarios on real datasets. survcomp requires R version 2.13.0 or higher. URL: http://bioconductor.org/packages/release/bioc/html/survcomp.html CONTACT: Benjamin Haibe-Kains <bhaibeka@jimmy.harvard.edu>, Markus Schröder <mschroed@jimmy.harvard.edu>
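
To make steps (i) and (ii) of the survcomp summary concrete, here is a minimal Python sketch of two standard statistics of that kind: a concordance index for risk scores on right-censored data, and a fixed-effect inverse-variance combination of estimates from several datasets. This is not survcomp's code or API (survcomp is an R package). The function names, the simplified handling of tied times, and the choice of a fixed-effect combination are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def concordance_index(times, events, risks):
    """Concordance index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher = expected earlier event)

    A pair is comparable when the subject with the shorter time had an
    observed event. Pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored: ordering unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def combine_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) combination.

    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, sqrt(1.0 / total)


if __name__ == "__main__":
    c = concordance_index([5, 8, 12, 20], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1])
    pooled, se = combine_fixed_effect([0.62, 0.70], [0.05, 0.04])
    print("C-index:", c)
    print("pooled estimate:", round(pooled, 3), "SE:", round(se, 3))
```

A concordance index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why pooled estimates from several datasets are usually compared against 0.5 or against a competing model's estimate.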