The Computational Biology Lab of the HPC Service focuses on developing advanced computing solutions, using modern HPC and big data technologies, to address current challenges in genomic data analysis and visualization. These challenges include data volume, performance, scalability, security, data integration, collaboration, knowledge bases, variant annotation and sample annotation, among others.
We develop new algorithms and bioinformatics tools for the analysis of genomic data that help researchers understand which biological processes, genes or variants are involved in different phenotypes or diseases. All our software is developed within an open-source initiative called OpenCB and is freely available on GitHub at https://github.com/opencb. Our main lines of research cover:
- Characterization and analysis of genomic variants
- Next-Generation Sequencing (NGS) data processing and analysis
- Computational Systems Biology
- HPC and big data (Hadoop, Spark) software development for genome-scale data analysis
- Machine Learning and Data Mining
- Cloud-based solutions to process and manage large amounts of data
- Databases and genome-scale data visualization
A few years ago, biology entered the big data era. Many other fields of science have worked with big data for years, but this is a new scenario for biology, and most bioinformatics software was not designed to handle hundreds of terabytes or petabytes of data. We aim to close this gap by using the most advanced computing technologies to provide a new generation of HPC and big data software. These technologies include:
- Big data technologies such as Hadoop or Spark to process and analyse big datasets
- High-Performance Computing (HPC) technologies such as AVX2 vector instructions and GPUs to speed up computation
- NoSQL databases such as MongoDB, Solr or HBase to store and index hundreds of TBs of data for analysis and visualization
- RESTful web services to make all this data available to applications
- HTML5 and SVG standards to deliver high-end web applications and visualization solutions
Efficiency matters. As data volumes in biology grow, producing efficient software is becoming almost as important as extracting value from the analysis itself. Inefficient computation imposes an economic penalty on science in time, effort and hardware.
The lab was founded in 2015 and is led by Ignacio Medina, who began much of this work in Joaquin Dopazo's group at CIPF. In 2006 he joined the Department of Bioinformatics and Genomics at the Prince Felipe Research Center (CIPF) as a bioinformatician and researcher, and in 2010 he became project manager for several clinical and software development projects. In 2014 he joined EMBL-EBI as a Project Manager and Senior Software Architect in the EBI Variation Team, and in 2015 he joined the University of Cambridge as Head of the Computational Biology Lab in the HPC Service. His current research interests include HPC and big data software development for genome-scale data analysis and visualization. He participates in several international projects, such as EVA and Genomics England, and has authored more than 40 papers in international peer-reviewed journals.