Software and Databases
CellBase is a NoSQL database that integrates the most relevant biological information about genomic features, proteins, gene expression, regulation, functional annotation, genomic variation and systems biology. Data are imported from several sources, such as Ensembl, UniProt, ClinVar and IntAct. CellBase also has a built-in variant annotation component that provides Ensembl VEP-compatible annotation. All data are available through a command line interface, a Java API or RESTful web services. CellBase is hosted at https://github.com/opencb/cellbase.
OpenCGA provides a scalable and high-performance solution for genomics big data analysis and visualization. OpenCGA integrates several OpenCB projects and additionally implements the following components:
- a Storage Engine framework to store and index NGS read alignments and genomic variants in different NoSQL databases such as MongoDB or Hadoop HBase; the current implementation can efficiently store thousands of gVCF files while remaining responsive to queries
- a Catalog that keeps track of users, studies, files, samples, jobs, etc., and also provides authentication and authorization capabilities
- an Analysis and Execution Engine to run genomic analyses on a traditional HPC cluster or on Hadoop
OpenCGA provides a command line interface and RESTful web services to manage and query all the data.
OpenCGA is freely available on GitHub at https://github.com/opencb/opencga.
Genome Maps is an open-source, modern and high-performance web-based HTML5 genome browser. Genome Maps can browse genomic and annotation data from CellBase and display large remote datasets, such as BAM and VCF files, from an OpenCGA server. Genome Maps has been implemented to be very fast and designed in a modular way to ease its integration with other web applications. It is the genome browser component of OpenCB for the CellBase and OpenCGA projects, and it is also used by other projects such as ICGC and Babelomics.
HPG BigData implements many NGS functionalities and big data databases to work at petabyte scale using Apache Hadoop. The most relevant features include NGS data format converters to Avro or Parquet, quality control and statistics calculation, and indexing and storage for real-time and interactive analysis and visualization. HPG BigData has been implemented as a Java library and a command line interface.
HPG BigData is freely available on GitHub at https://github.com/opencb/hpg-bigdata.
BioNetDB implements a GraphDB-based NoSQL storage engine to integrate and analyse biological networks in Systems Biology. BioNetDB integrates most of the relevant biological network repositories from different data sources such as Reactome or IntAct. It is also possible to integrate data from CellBase, as well as users' own expression and variant data. BioNetDB data can be queried through either a command line interface or RESTful web services.
BioNetDB is freely available on GitHub at https://github.com/opencb/bionetdb.
We also collaborate in the development of other bioinformatics tools.
The European Variation Archive (EVA) from EMBL-EBI accepts submission of, and provides access to, all types of genetic variants from any species, observed in germline or somatic sources, ranging from SNVs to large structural variants. All data is open access via direct query through web and/or programmatic interfaces. The main submission format is a VCF file. VCF files should provide either genotypes from individual samples or aggregate summary information, such as allele frequencies, and be accompanied by descriptive metadata.
EMBL-EBI EVA is available at http://www.ebi.ac.uk/eva.
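The EVA description above distinguishes VCF files that carry per-sample genotypes from those that carry aggregate summaries such as allele frequencies. As a minimal sketch of how the second can be derived from the first, the snippet below counts alleles in VCF-style `GT` strings. The function name `allele_frequencies` and the sample data are hypothetical illustrations, not part of EVA or of any VCF library.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele_index: frequency} from VCF GT strings such as '0/1' or '1|1'.

    Missing alleles ('.') are ignored, so they do not count towards the total.
    """
    counts = Counter()
    for gt in genotypes:
        # VCF separates alleles with '/' (unphased) or '|' (phased).
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[int(allele)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}


# Hypothetical genotypes for four samples at one biallelic site.
samples = ["0/0", "0/1", "1|1", "./."]
freqs = allele_frequencies(samples)
print(freqs)  # {0: 0.5, 1: 0.5}
```

Here allele `0` is the reference and `1` the first alternate allele; the sample with a missing genotype contributes nothing to either count.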
Babelomics is an integrative platform for the analysis of Transcriptomics, Proteomics and Genomics data with advanced functional profiling. This new version of Babelomics integrates primary (normalization, calls, etc.) and secondary (signatures, predictors, associations, TDTs, clustering, etc.) analysis tools within an environment that allows relating genomic data and/or interpreting them by means of different functional enrichment or gene set methods. Such interpretation is made using functional definitions, protein-protein interactions...
Babelomics is available at http://babelomics.org/.
BiERapp is an interactive web application that assists gene prioritization in whole exome sequencing (WES) experiments. BiERapp is mainly oriented to disease gene finding in Mendelian disorders, although it can be applied to other contexts, such as case-control comparisons. BiERapp accepts uploads of standard VCF files and provides different filtering options for the variants based on known population frequencies, predicted pathogenic effect, consequence types, etc. Its most interesting feature is an intuitive filter that can reproduce any family pedigree with any inheritance model (including incomplete penetrance) and facilitates the selection of variants (and genes with deleterious variants) that segregate within the family. Case-control studies and sporadic diseases caused by de novo mutations can also be analysed in this framework. BiERapp also manages missing values efficiently.
BiERapp is available at http://bierapp.babelomics.org/.
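The pedigree filter described above can be illustrated with a small sketch of a segregation check: a variant is kept only if the genotypes of affected and unaffected family members are compatible with the chosen inheritance model. This is not BiERapp's implementation. The function names, the two simplified models, the sample data and the rule that a missing genotype never excludes a variant are all assumptions made for illustration.

```python
def alt_count(gt):
    """Number of non-reference alleles in a VCF GT string, or None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


def segregates(genotypes, affected, model, incomplete_penetrance=False):
    """Check whether one variant is compatible with the pedigree under a model.

    genotypes: {sample_name: GT string}; affected: set of affected sample names.
    model: 'dominant' (one alternate allele suffices) or 'recessive' (two needed).
    """
    needed = {"dominant": 1, "recessive": 2}[model]
    for sample, gt in genotypes.items():
        n = alt_count(gt)
        if n is None:
            continue  # assumption: a missing genotype never excludes a variant
        carries = n >= needed
        if sample in affected and not carries:
            return False  # an affected member lacks the causal genotype
        if sample not in affected and carries and not incomplete_penetrance:
            return False  # a healthy member carries it under full penetrance
    return True


# Hypothetical trio with one affected child.
variants = {
    "chr1:1000 A>G": {"father": "0/1", "mother": "0/1", "child": "1/1"},
    "chr2:2000 C>T": {"father": "0/1", "mother": "0/0", "child": "0/1"},
    "chr3:3000 G>A": {"father": "0/0", "mother": "./.", "child": "0/1"},
}
affected = {"child"}

recessive = [v for v, g in variants.items() if segregates(g, affected, "recessive")]
dominant = [v for v, g in variants.items() if segregates(g, affected, "dominant")]
print(recessive)  # ['chr1:1000 A>G']
print(dominant)   # ['chr3:3000 G>A']
```

Passing `incomplete_penetrance=True` relaxes the check on unaffected carriers, so under the dominant model all three hypothetical variants would then be retained.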