Featured Websites

CryptoDB - Cryptosporidium database

October 5, 2014

Cryptosporidium parvum genome database ...

PEAKS: identification of regulatory motifs by their position in DNA sequences.

October 5, 2014

Many DNA functional motifs tend to accumulate or cluster at specific gene locations. These locations can be detected, in a group of gene sequences, as high frequency peaks with respect to a reference position, such as the transcription start site (TSS). We have developed a web tool for the identific...

ISOL@: an Italian SOLAnaceae genomics resource.

October 5, 2014

Computer simulations play an important role in studies of non-random mating populations. Because of implementation difficulties, only very limited types of non-random mating schemes are provided in the currently available simulation programs. Starting with version 0.8.5, simuPOP provides a few matin...

LegumeTFDB: an integrative database of Glycine max, Lotus japonicus and Medicago truncatula transcription factors.

October 5, 2014

We have established a database named LegumeTFDB to provide access to transcription factor (TF) repertoires of three major legume species: soybean (Glycine max), Lotus japonicus and Medicago truncatula. LegumeTFDB integrates unique information for each TF gene and family, including sequence features,...

A genotype calling algorithm for the Illumina BeadArray platform.

October 5, 2014

SUMMARY: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++. This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Window...

uBioRSS: tracking taxonomic literature using RSS.

October 5, 2014

Web content syndication through standard formats such as RSS and ATOM has become an increasingly popular mechanism for publishers, news sources and blogs to disseminate regularly updated content. These standardized syndication formats deliver content directly to the subscriber, allowing them to loca...

NGSANE: a lightweight production informatics framework for high-throughput data analysis.

October 5, 2014

The development of bioinformatic solutions for microbial ecology in Perl is limited by the lack of modules to represent and manipulate microbial community profiles from amplicon and meta-omics studies. Here we introduce Bio-Community, an open-source, collaborative toolkit that extends BioPerl. Bio-C...

ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking.

October 5, 2014

Unsupervised class discovery is a highly useful technique in cancer research, where intrinsic groups sharing biological characteristics may exist but are unknown. The consensus clustering (CC) method provides quantitative and visual stability evidence for estimating the number of unsupervised classe...

Medline search engine for finding genetic markers with biological significance.

October 5, 2014

MOTIVATION: Genome-wide high density SNP association studies are expected to identify various SNP alleles associated with different complex disorders. Understanding the biological significance of these SNP alleles in the context of existing literature is a major challenge since existing search engin...

Analysis of segmental duplications via duplication distance.

October 5, 2014

In the Arabidopsis thaliana regulatory element analyzer (AtREA) server, we have integrated sequence data, genome-wide expression data and functional annotation data in three application modules which will be useful to identify major regulatory targets of a user-provided cis-regulatory element (CRE),...


Protein-protein binding affinity prediction on a diverse set of structures.

MOTIVATION: Accurate binding free energy functions for protein-protein interactions are imperative for a wide range of purposes. Their construction is predicated upon ascertaining the factors that influence binding and their relative importance. A recent benchmark of binding affinities has allowed, for the first time, the evaluation and construction of binding free energy models using a diverse set of complexes, and a systematic assessment of our ability to model the energetics of conformational changes. RESULTS: We construct a large set of molecular descriptors using commonly available tools, introducing the use of energetic factors associated with conformational changes and disorder-to-order transitions, as well as features calculated on structural ensembles. The descriptors are used to train and test a binding free energy model using a consensus of four machine learning algorithms, whose performance constitutes a significant improvement over the other state-of-the-art empirical free energy functions tested. The internal workings of the learners show how the descriptors are used, illuminating the determinants of protein-protein binding. AVAILABILITY: The molecular descriptor set and descriptor values for all complexes are available in the supplementary material. A web server for the learners and coordinates for the bound and unbound structures can be accessed from the website: http://bmm.cancerresearchuk.org/%7EAffinity CONTACT: paul.bates@cancer.org.uk.

Reconstructing transcription factor activities in hierarchical transcription network motifs.

MOTIVATION: Knowledge of the dynamics of transcription factors is fundamental to understanding the mechanism of transcriptional regulation. Measuring transcription factor activities in vivo remains an experimental challenge. Several methods have been developed to infer these activities from easily measurable quantities such as mRNA expression of target genes. A limitation of these methods is that they rely on very simple single-layer structures, typically consisting of one or more transcription factors regulating a number of target genes. RESULTS: We present a novel statistical inference methodology to reverse engineer the dynamics of transcription factors in hierarchical network motifs such as feed-forward loops. The approach is based on a continuous-time representation of the system in which the high-level master transcription factor is represented as a two-state Markov jump process driving a system of differential equations. We solve the inference problem using an efficient variational approach and demonstrate our method on simulated data and two real datasets. The results on real data show that the predictions of our approach capture biological behaviours more effectively than single-layer models of transcription, and can lead to novel biological insights. AVAILABILITY: http://homepages.inf.ed.ac.uk/gsanguin/software.html CONTACT: g.sanguinetti@ed.ac.uk.
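
To make the modelling idea concrete, the following is a minimal forward-simulation sketch of a two-state Markov jump process driving a single differential equation. It is not the authors' software and does not perform the variational inference described above. The linear production-and-decay form dx/dt = basal + gain*mu(t) - decay*x, the function names and all parameter values are assumptions chosen for illustration.

```python
import random


def sample_switch_times(t_end, rate_on, rate_off, start_state=0, seed=0):
    """Sample the jump times of a two-state Markov process on [0, t_end).

    rate_on is the 0 -> 1 switching rate, rate_off the 1 -> 0 rate
    (both hypothetical parameters for this sketch).
    """
    rng = random.Random(seed)
    t, state, times = 0.0, start_state, []
    while True:
        rate = rate_on if state == 0 else rate_off
        t += rng.expovariate(rate)  # exponential waiting time in current state
        if t >= t_end:
            return times
        times.append(t)
        state = 1 - state


def tf_state(t, switch_times, start_state=0):
    """State (0 or 1) of the master transcription factor at time t."""
    flips = sum(1 for s in switch_times if s <= t)
    return start_state if flips % 2 == 0 else 1 - start_state


def simulate_target(switch_times, t_end, dt=0.01, basal=0.1, gain=1.0,
                    decay=0.5, x0=0.0, start_state=0):
    """Forward-Euler integration of dx/dt = basal + gain*mu(t) - decay*x."""
    n_steps = int(round(t_end / dt))
    x, trajectory = x0, [x0]
    for k in range(n_steps):
        mu = tf_state(k * dt, switch_times, start_state)
        x += dt * (basal + gain * mu - decay * x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    jumps = sample_switch_times(t_end=20.0, rate_on=0.3, rate_off=0.3, seed=1)
    traj = simulate_target(jumps, t_end=20.0)
    print(len(jumps), "jumps; final downstream level:", round(traj[-1], 3))
```

In a hierarchical motif such as a feed-forward loop, the simulated quantity x would itself regulate further targets; the sketch covers only the first layer.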

survcomp: an R/Bioconductor package for performance assessment and comparison of survival models.

SUMMARY: The survcomp package provides functions to assess and statistically compare the performance of survival/risk prediction models. It implements state-of-the-art statistics to (i) measure the performance of risk prediction models, (ii) combine these statistical estimates from multiple datasets using a meta-analytical framework, and (iii) statistically compare the performance of competitive models. AVAILABILITY: The R/Bioconductor package survcomp is provided open source under the Artistic-2.0 License with a user manual containing installation, operating instructions and use case scenarios on real datasets. survcomp requires R version 2.13.0 or higher. URL: http://bioconductor.org/packages/release/bioc/html/survcomp.html CONTACT: Benjamin Haibe-Kains <bhaibeka@jimmy.harvard.edu>, Markus Schröder <mschroed@jimmy.harvard.edu>
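
As an illustration of what measuring the performance of a risk prediction model can involve, the sketch below computes a concordance index (Harrell's C), a widely used statistic for survival models. It is written in plain Python for illustration only and is not survcomp's R implementation. The function name and the handling of ties (tied risks count one half, tied times are skipped) are assumptions of this sketch.

```python
from itertools import combinations


def concordance_index(risk, time, event):
    """Fraction of comparable subject pairs that the risk score orders correctly.

    risk  : predicted risk scores (higher means earlier expected event)
    time  : observed follow-up times
    event : 1 if the event was observed, 0 if the subject was censored
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        # A pair is comparable only if the times differ and the subject
        # with the shorter time actually experienced the event.
        if time[a] == time[b] or not event[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented for this example.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    scores = [0.9, 0.7, 0.5, 0.1]
    print(concordance_index(scores, times, events))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the comparable pairs.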