Featured Websites

BLAST2GENE: a comprehensive conversion of BLAST output into independent genes and gene fragments.

October 5, 2014

SUMMARY: BLAST2GENE is a program that enables detailed analysis of genomic regions containing completely or partially duplicated genes. From a BLAST (or BL2SEQ) comparison of a protein or nucleotide query sequence with any genomic region of interest, BLAST2GENE processes all high-scoring pairwise a...

Detection and correction of probe-level artefacts on microarrays.

October 5, 2014

We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions...

CCDB - The CyberCell Database

October 5, 2014

E. coli database at the University of Alberta ...

PACdb: PolyA Cleavage Site and 3'-UTR Database.

October 5, 2014

SUMMARY: The PolyA Cleavage Site and 3'-UTR Database (PACdb) is a web-accessible database that catalogs putative 3'-processing sites and 3'-UTR sequences for multiple organisms. Sites have been identified primarily via expressed sequence tag-genome alignments, enabling delineation of both the specifici...

The Cell Cycle DB

October 5, 2014

Genes and proteins involved in the human and yeast cell cycles ...

Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants.

October 5, 2014

BACKGROUND: Peptide Mass Fingerprinting (PMF) is a widely used mass spectrometry (MS) method for analyzing proteins and peptides. It relies on the comparison between experimentally determined and theoretical mass spectra. The PMF process requires calibration, usually performed with external or inte...

SuperDrug: a conformational drug database.

October 5, 2014

MOTIVATION: Various resources exist for experimentally determined and computed three-dimensional (3D) structures of low-molecular-weight compounds, but for approved drugs no free, publicly accessible source of 3D structures and conformers is available. Furthermore, for selection purposes or for c...

DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

October 5, 2014

DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the fi...
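Under gene tree parsimony, a candidate species tree is scored by the number of gene duplications needed to reconcile every gene tree with it, and the best-scoring tree is kept. The Python sketch below assumes rooted binary trees written as nested tuples, gene-tree leaves labelled with their species, LCA reconciliation and a duplication-only cost; it scores a short hypothetical list of candidate trees by brute force and does not reproduce DupTree's search heuristic.

```python
from typing import FrozenSet, List, Sequence, Tuple, Union

# A rooted binary tree: a leaf is a species name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set (cluster) of every node of a rooted binary tree."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = walk(node[0]) | walk(node[1])
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Count duplication nodes when gene_tree is reconciled with species_tree by LCA mapping."""
    # Clusters containing a given species set form a chain, so the smallest one is the LCA.
    species_clusters = sorted(clusters(species_tree), key=len)

    def lca(species: FrozenSet[str]) -> FrozenSet[str]:
        for c in species_clusters:
            if species <= c:
                return c
        raise ValueError(f"species missing from species tree: {set(species)}")

    count = 0

    def walk(node: Tree) -> FrozenSet[str]:
        nonlocal count
        if isinstance(node, str):
            return frozenset([node])
        left, right = walk(node[0]), walk(node[1])
        here = left | right
        # A gene node is a duplication if it maps to the same species node as one of its children.
        if lca(here) in (lca(left), lca(right)):
            count += 1
        return here

    walk(gene_tree)
    return count


def parsimony_score(gene_trees: Sequence[Tree], species_tree: Tree) -> int:
    """Gene tree parsimony score: total duplications over all gene trees."""
    return sum(duplications(g, species_tree) for g in gene_trees)


if __name__ == "__main__":
    genes = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
    candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
    best = min(candidates, key=lambda s: parsimony_score(genes, s))
    print(best, parsimony_score(genes, best))  # (('A', 'B'), 'C') 1
```

The candidate scores here are 1, 2 and 3, so ((A, B), C) is preferred; a real search would explore tree space heuristically rather than enumerate it.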

Error correction of high-throughput sequencing datasets with non-uniform coverage.

October 5, 2014

MOTIVATION: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a mu...

ssSNPer: identifying statistically similar SNPs to aid interpretation of genetic association studies.

October 5, 2014

ssSNPer is a novel, user-friendly web interface that makes it easy to determine the number and location of untested HapMap SNPs in the region surrounding a tested HapMap SNP that are statistically similar to it and would thus produce comparable, and perhaps more significant, association results. Ident...
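One common way to quantify how "statistically similar" two SNPs are is pairwise linkage disequilibrium r². The Python sketch below assumes that measure, phased haplotypes coded 0/1 and a hypothetical cut-off of 0.8; the truncated summary does not state ssSNPer's actual criterion, and the SNP names are placeholders.

```python
from typing import Dict, List, Sequence


def r_squared(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Pairwise LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(h1)
    if n == 0 or n != len(h2):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(h1) / n                               # frequency of allele 1 at the first SNP
    p_b = sum(h2) / n                               # frequency of allele 1 at the second SNP
    p_ab = sum(a & b for a, b in zip(h1, h2)) / n   # frequency of haplotypes carrying both
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                  # monomorphic SNP: r^2 undefined, treated as 0 here
    d = p_ab - p_a * p_b                            # linkage disequilibrium coefficient D
    return d * d / denom


def similar_snps(tested: str, haplotypes: Dict[str, Sequence[int]],
                 threshold: float = 0.8) -> List[str]:
    """Untested SNPs whose r^2 with the tested SNP reaches the (hypothetical) threshold."""
    ref = haplotypes[tested]
    return sorted(snp for snp, h in haplotypes.items()
                  if snp != tested and r_squared(ref, h) >= threshold)


if __name__ == "__main__":
    haps = {
        "snp1": [1, 1, 0, 0, 1, 0],
        "snp2": [1, 1, 0, 0, 1, 0],   # identical to snp1
        "snp3": [0, 0, 1, 1, 0, 1],   # perfect negative association, still r^2 = 1
        "snp4": [1, 0, 1, 0, 1, 0],   # weakly associated, r^2 = 1/9
    }
    print(similar_snps("snp1", haps))  # ['snp2', 'snp3']
```

Because r² ignores the sign of D, a SNP tagging the opposite allele is reported as just as similar, which is what matters for reproducing an association signal.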


Protein-protein binding affinity prediction on a diverse set of structures.

MOTIVATION: Accurate binding free energy functions for protein-protein interactions are imperative for a wide range of purposes. Their construction is predicated upon ascertaining the factors that influence binding and their relative importance. A recent benchmark of binding affinities has allowed, for the first time, the evaluation and construction of binding free energy models using a diverse set of complexes, and a systematic assessment of our ability to model the energetics of conformational changes. RESULTS: We construct a large set of molecular descriptors using commonly available tools, introducing the use of energetic factors associated with conformational changes and disorder-to-order transitions, as well as features calculated on structural ensembles. The descriptors are used to train and test a binding free energy model using a consensus of four machine learning algorithms, whose performance constitutes a significant improvement over the other state-of-the-art empirical free energy functions tested. The internal workings of the learners show how the descriptors are used, illuminating the determinants of protein-protein binding. AVAILABILITY: The molecular descriptor set and descriptor values for all complexes are available in the Supplementary Material. A web server for the learners and coordinates for the bound and unbound structures can be accessed from the website: http://bmm.cancerresearchuk.org/%7EAffinity CONTACT: paul.bates@cancer.org.uk.
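The consensus step can be sketched as combining the individual learners' predictions into one estimate. This minimal Python sketch assumes an unweighted mean and treats each trained learner as a plain function of a descriptor vector; the four linear "learners" in the example are hypothetical stand-ins, not the trained models behind the web server.

```python
from statistics import mean
from typing import Callable, Sequence

# A trained learner maps a descriptor vector to a predicted binding free energy.
Learner = Callable[[Sequence[float]], float]


def consensus_prediction(learners: Sequence[Learner],
                         descriptors: Sequence[float]) -> float:
    """Consensus = unweighted mean of the individual learners' predictions (assumed rule)."""
    if not learners:
        raise ValueError("at least one learner is required")
    return mean(learner(descriptors) for learner in learners)


if __name__ == "__main__":
    # Four hypothetical linear learners over a two-descriptor vector.
    learners = [
        lambda d: -2.0 * d[0] - 1.0 * d[1],
        lambda d: -1.5 * d[0] - 2.0 * d[1],
        lambda d: -2.5 * d[0],
        lambda d: -2.0 * d[0] - 1.0,
    ]
    print(consensus_prediction(learners, [3.0, 1.0]))  # -7.0
```

Averaging tends to cancel the idiosyncratic errors of individual learners; a weighted mean or a median would be equally simple alternatives if one learner were known to be more reliable.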

Reconstructing transcription factor activities in hierarchical transcription network motifs.

MOTIVATION: Knowledge of the dynamics of transcription factors is fundamental to understanding the mechanism of transcriptional regulation. Measuring transcription factor activities experimentally in vivo remains a challenge. Several methods have been developed to infer these activities from easily measurable quantities such as the mRNA expression of target genes. A limitation of these methods is that they rely on very simple single-layer structures, typically consisting of one or more transcription factors regulating a number of target genes. RESULTS: We present a novel statistical inference methodology to reverse-engineer the dynamics of transcription factors in hierarchical network motifs such as feed-forward loops. Our approach is based on a continuous-time representation of the system in which the high-level master transcription factor is represented as a two-state Markov jump process driving a system of differential equations. We solve the inference problem using an efficient variational approach and demonstrate our method on simulated data and two real datasets. The results on real data show that the predictions of our approach can capture biological behaviours more effectively than single-layer models of transcription, and can lead to novel biological insights. AVAILABILITY: http://homepages.inf.ed.ac.uk/gsanguin/software.html CONTACT: g.sanguinetti@ed.ac.uk.
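To make the generative model concrete, the following Python sketch simulates it forward; it does not perform the variational inference described above. It assumes a master TF that switches between off (0) and on (1) with hypothetical rates k_on and k_off, and a hypothetical feed-forward loop in which the master TF drives an intermediate TF y, while both together drive a target gene x through linear production and decay terms. The functional forms, parameter names and forward-Euler integration are illustrative choices, not the authors' model.

```python
import bisect
import random
from typing import List, Tuple


def sample_master_tf(k_on: float, k_off: float, t_end: float,
                     s0: int = 0, seed: int = 0) -> List[Tuple[float, int]]:
    """Sample a two-state (0 = off, 1 = on) Markov jump process on [0, t_end).

    Returns (time, state) pairs: the initial state at t = 0 followed by every switch.
    """
    if k_on <= 0 or k_off <= 0:
        raise ValueError("switching rates must be positive")
    rng = random.Random(seed)
    t, s = 0.0, s0
    path = [(0.0, s0)]
    while True:
        t += rng.expovariate(k_on if s == 0 else k_off)  # exponential holding time
        if t >= t_end:
            return path
        s = 1 - s
        path.append((t, s))


def simulate_ffl(path: List[Tuple[float, int]], t_end: float, dt: float = 0.01,
                 a_y: float = 1.0, b_y: float = 1.0,
                 a_x: float = 1.0, b_x: float = 1.0) -> Tuple[List[float], List[float]]:
    """Forward-Euler integration of a hypothetical feed-forward loop driven by s(t):

        dy/dt = a_y * s(t) - b_y * y        (intermediate TF)
        dx/dt = a_x * s(t) * y - b_x * x    (target gene, needs both regulators)
    """
    times = [p[0] for p in path]
    y = x = 0.0
    ys, xs = [y], [x]
    for i in range(int(round(t_end / dt))):
        s = path[bisect.bisect_right(times, i * dt) - 1][1]  # piecewise-constant master TF state
        y, x = y + dt * (a_y * s - b_y * y), x + dt * (a_x * s * y - b_x * x)
        ys.append(y)
        xs.append(x)
    return ys, xs


if __name__ == "__main__":
    path = sample_master_tf(k_on=0.5, k_off=0.5, t_end=50.0, seed=1)
    ys, xs = simulate_ffl(path, t_end=50.0)
    print(len(path), max(ys), max(xs))
```

With a constant "on" master TF, y relaxes toward a_y / b_y and x toward a_x * y / b_x (both 1.0 with the defaults), which is a quick sanity check. With a sampled path, x responds to the master TF switching on only after y has accumulated, giving the delayed response expected of this kind of feed-forward loop.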

survcomp: an R/Bioconductor package for performance assessment and comparison of survival models.

SUMMARY: The survcomp package provides functions to assess and statistically compare the performance of survival/risk prediction models. It implements state-of-the-art statistics to (i) measure the performance of risk prediction models, (ii) combine these statistical estimates from multiple datasets using a meta-analytical framework, and (iii) statistically compare the performance of competing models. AVAILABILITY: The R/Bioconductor package survcomp is provided open source under the Artistic-2.0 License with a user manual containing installation and operating instructions and use-case scenarios on real datasets. survcomp requires R version 2.13.0 or higher. URL: http://bioconductor.org/packages/release/bioc/html/survcomp.html CONTACT: Benjamin Haibe-Kains <bhaibeka@jimmy.harvard.edu>, Markus Schröder <mschroed@jimmy.harvard.edu>
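Items (i) and (ii) can be illustrated with two standard statistics. The following is a minimal Python sketch, not survcomp's implementation or R API: it assumes Harrell's concordance index as the performance measure and a fixed-effect, inverse-variance weighted average as the meta-analytical combination, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs whose predicted risks are correctly ordered.

    A pair (i, j) is comparable when subject i had an event (events[i] == 1) strictly
    before subject j's observed time; pairs tied in time are skipped in this sketch.
    Ties in predicted risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def combine_fixed_effect(estimates: Sequence[float],
                         std_errors: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) pooling of per-dataset estimates.

    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, (1.0 / total) ** 0.5
```

For example, with times [2, 4, 6, 8], events [1, 1, 0, 1] and risks [0.9, 0.7, 0.4, 0.2], every comparable pair is correctly ordered and the C-index is 1.0; a value of 0.5 corresponds to random ranking. Per-dataset C-indices and their standard errors could then be pooled with combine_fixed_effect, so that datasets with more precise estimates carry more weight.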