PhyloDet: a scalable visualization tool for mapping multiple traits to large evolutionary trees.

October 5, 2014

Evolutionary biologists are often interested in finding correlations among biological traits across a number of species, as such correlations may lead to testable hypotheses about the underlying function. Because some species are more closely related than others, computing and visualizing these corr...

Simultaneous Recognition and Segmentation of Cells: Application in C. elegans.

October 5, 2014

MOTIVATION: Automatic recognition of cell identities is critical for quantitative measurement, targeting, and manipulation of cells of model animals at single-cell resolution. It has been shown to be a powerful tool for studying gene expression and regulation, cell lineages, and cell fates. Existing...

Reconstruction of highly heterogeneous gene-content evolution across the three domains of life.

October 5, 2014

MOTIVATION AND RESULTS: Motivated by the recent rise of interest in small regulatory RNAs, we present Locomotif--a new approach for locating RNA motifs that goes beyond the previous ones in three ways: (1) motif search is based on efficient dynamic programming algorithms, incorporating the establish...

A mixture model-based approach to the clustering of microarray expression data.

October 5, 2014

MOTIVATION: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cl...

The Sequence Alignment/Map format and SAMtools.

October 5, 2014

SUMMARY: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and...

The most informative spacing test effectively discovers biologically relevant outliers or multiple modes in expression.

October 5, 2014

The development of bioinformatic solutions for microbial ecology in Perl is limited by the lack of modules to represent and manipulate microbial community profiles from amplicon and meta-omics studies. Here we introduce Bio-Community, an open-source, collaborative toolkit that extends BioPerl. Bio-C...

On the inference of spatial structure from population genetics data.

October 5, 2014

FBA-SimVis is a VANTED plug-in for the constraint-based analysis of metabolic models with special focus on the visual exploration of metabolic flux data resulting from model analysis. The program provides a user-friendly environment for model reconstruction, constraint-based model analysis, and inte...

IDEA: Interactive Display for Evolutionary Analyses.

October 5, 2014

SUMMARY: The R package SIMoNe (Statistical Inference for MOdular NEtworks) enables inference of gene-regulatory networks based on partial correlation coefficients from microarray experiments. Modelling gene expression data with a Gaussian graphical model (hereafter GGM), the algorithm estimates non-...

A censored beta mixture model for the estimation of the proportion of non-differentially expressed genes.

October 5, 2014

The Microbial Proteomic Resource (MPR) is a repository service that contains non-redundant protein databases of related bacterial strains, which were generated through an in-house developed software called Multi-Strain Mass Spectrometry Prokaryotic DataBase Builder (MSMSpdbb). MSMSpdbb merges and cl...

Cross-species queries of large gene expression databases.

October 5, 2014

High-throughput screening (HTS) is a common technique for both drug discovery and basic research, but researchers often struggle with how best to derive hits from HTS data. While a wide range of hit identification techniques exist, little information is available about their sensitivity and specific...

Protein-protein binding affinity prediction on a diverse set of structures.

MOTIVATION: Accurate binding free energy functions for protein-protein interactions are imperative for a wide range of purposes. Their construction is predicated upon ascertaining the factors that influence binding and their relative importance. A recent benchmark of binding affinities has allowed, for the first time, the evaluation and construction of binding free energy models using a diverse set of complexes, and a systematic assessment of our ability to model the energetics of conformational changes. RESULTS: We construct a large set of molecular descriptors using commonly available tools, introducing the use of energetic factors associated with conformational changes and disorder-to-order transitions, as well as features calculated on structural ensembles. The descriptors are used to train and test a binding free energy model using a consensus of four machine learning algorithms, whose performance constitutes a significant improvement over the other state-of-the-art empirical free energy functions tested. The internal workings of the learners show how the descriptors are used, illuminating the determinants of protein-protein binding. AVAILABILITY: The molecular descriptor set and descriptor values for all complexes are available in the supplementary material. A web server for the learners and coordinates for the bound and unbound structures can be accessed from the website: http://bmm.cancerresearchuk.org/%7EAffinity CONTACT: paul.bates@cancer.org.uk.

Reconstructing transcription factor activities in hierarchical transcription network motifs.

MOTIVATION: Knowledge of the dynamics of transcription factors is fundamental to understanding the mechanisms of transcriptional regulation. Measuring transcription factor activities in vivo remains experimentally challenging. Several methods have been developed to infer these activities from easily measurable quantities such as mRNA expression of target genes. A limitation of these methods is that they rely on very simple single-layer structures, typically consisting of one or more transcription factors regulating a number of target genes. RESULTS: We present a novel statistical inference methodology to reverse engineer the dynamics of transcription factors in hierarchical network motifs such as feed-forward loops. The approach is based on a continuous-time representation of the system in which the high-level master transcription factor is represented as a two-state Markov jump process driving a system of differential equations. We solve the inference problem using an efficient variational approach and demonstrate our method on simulated data and two real datasets. The results on real data show that the predictions of our approach capture biological behaviours more effectively than single-layer models of transcription, and can lead to novel biological insights. AVAILABILITY: http://homepages.inf.ed.ac.uk/gsanguin/software.html CONTACT: g.sanguinetti@ed.ac.uk.
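
As an illustration of the model class named in the abstract, the following minimal sketch simulates a two-state Markov jump process that drives a single differential equation. It covers only the forward (generative) direction, not the variational inference the authors describe. The linear form dx/dt = A*mu(t) + b - lam*x, the function name `simulate_telegraph_ode`, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import random


def simulate_telegraph_ode(k_on, k_off, A, b, lam, t_end, dt=0.01, seed=0):
    """Simulate a two-state jump process mu(t) in {0, 1} driving
    dx/dt = A * mu(t) + b - lam * x  (assumed linear form, Euler steps)."""
    rng = random.Random(seed)
    mu = 0                      # master factor starts in the "off" state
    x = b / lam                 # start at the steady state of the "off" regime
    next_switch = rng.expovariate(k_on)  # waiting time until the first off->on jump
    ts, mus, xs = [0.0], [mu], [x]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        t = step * dt
        # Apply every state switch that occurred up to the current time.
        while t >= next_switch:
            mu = 1 - mu
            rate = k_off if mu == 1 else k_on  # rate of leaving the new state
            next_switch += rng.expovariate(rate)
        # Forward Euler update of the driven variable.
        x += dt * (A * mu + b - lam * x)
        ts.append(t)
        mus.append(mu)
        xs.append(x)
    return ts, mus, xs


# Hypothetical parameter values, chosen only to produce a few switches.
ts, mus, xs = simulate_telegraph_ode(
    k_on=0.5, k_off=0.5, A=2.0, b=0.1, lam=1.0, t_end=20.0
)
```

In this sketch the driven variable relaxes towards b/lam while the master factor is off and towards (A + b)/lam while it is on, so the trajectory stays between those two levels as long as dt*lam is below 1.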

survcomp: an R/Bioconductor package for performance assessment and comparison of survival models.

SUMMARY: The survcomp package provides functions to assess and statistically compare the performance of survival/risk prediction models. It implements state-of-the-art statistics to (i) measure the performance of risk prediction models, (ii) combine these statistical estimates from multiple datasets using a meta-analytical framework, and (iii) statistically compare the performance of competing models. AVAILABILITY: The R/Bioconductor package survcomp is provided open source under the Artistic-2.0 License with a user manual containing installation, operating instructions and use case scenarios on real datasets. survcomp requires R version 2.13.0 or higher. URL: http://bioconductor.org/packages/release/bioc/html/survcomp.html CONTACT: Benjamin Haibe-Kains <bhaibeka@jimmy.harvard.edu>, Markus Schröder <mschroed@jimmy.harvard.edu>
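
To make concrete what "measuring the performance of a risk prediction model" can mean, here is a minimal Python sketch of one widely used statistic, a Harrell-style concordance index. It is not survcomp's implementation and does not use its API; the function name `concordance_index` and the toy data are hypothetical, and tied survival times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable subject pairs in which the subject with the
    shorter observed time was assigned the higher predicted risk.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose earlier subject was censored,
        # because the true ordering of their events is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third subject is censored at time 12.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
c = concordance_index(times, events, risks)
```

A value of 1.0 indicates that every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 to a fully reversed ranking.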