LCSB R³
Responsible and Reproducible Research

IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses

Authors

Shaman Narayanasamy, Yohan Jarosz, Emilie E. L. Muller, Anna Heintz-Buschart, Malte Herold, Anne Kaysen, Cédric C. Laczny, Nicolás Pinel, Patrick May and Paul Wilmes*

Abstract

Existing workflows for the analysis of multi-omic microbiome datasets are lab-specific and often result in sub-optimal data usage. Here we present IMP, a reproducible and modular pipeline for the integrated and reference-independent analysis of coupled metagenomic and metatranscriptomic data. IMP incorporates robust read preprocessing, iterative co-assembly, analyses of microbial community structure and function, automated binning, as well as genomic signature-based visualizations. The IMP-based data integration strategy enhances data usage, output volume, and output quality as demonstrated using relevant use-cases. Finally, IMP is encapsulated within a user-friendly implementation using Python and Docker. IMP is available at http://r3lab.uni.lu/web/imp/ (MIT license).

Please cite the article in Genome Biology.

Website

The IMP website, hosted within the R3lab frame, is available at https://r3lab.uni.lu/web/imp.

Source code

The source code of IMP is available on the LCSB GitLab, where you can trace back what has been done by the authors.

Additional source code

Some additional analyses made for the paper are also available on the LCSB GitLab.

Workflow

IMP uses Snakemake, a reproducible workflow engine that allows IMP users to execute the same workflow for each analysis. From Köster, Johannes and Rahmann, Sven. “Snakemake - A scalable bioinformatics workflow engine”. Bioinformatics 2012.

Tool repository

The IMP tool repository is hosted on the LCSB WebDAV storage. There you will find some of the tools installed within IMP. IMP versions are stored under the dist folder.

Environment

All tool dependencies of IMP are frozen inside a Docker container. The files used to build the container can be found on GitLab, and versioned tarballs of the containers are found in the LCSB WebDAV folder.

R toolkit

We use the checkpoint library to ensure that all of the necessary R packages are installed with the correct versions.

Report

The IMP report contains all the information needed to reproduce the same workflow, from the version of IMP used for the analysis to the configuration file used to start the workflow. Everything is condensed and organized within an HTML report. Some reports can be found on the LCSB WebDAV.
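
To illustrate the idea of a report that records both the pipeline version and the configuration, here is a minimal Python sketch. It is not IMP's actual report code, which is not shown on this page: the function name `render_report`, the version string, and the configuration keys are all hypothetical and chosen only for illustration.

```python
import html
import json


def render_report(pipeline_version, config):
    """Return a small HTML page recording a pipeline version and its configuration.

    Hypothetical helper for illustration only; not part of IMP.
    """
    # Serialize with sorted keys so the same configuration always yields the same text.
    config_text = json.dumps(config, indent=2, sort_keys=True)
    return (
        "<html><body>\n"
        "<h1>Analysis report</h1>\n"
        f"<p>Pipeline version: {html.escape(pipeline_version)}</p>\n"
        f"<pre>{html.escape(config_text)}</pre>\n"
        "</body></html>\n"
    )


# Assumed example values; a real run would take them from the workflow itself.
example = render_report("1.0.0", {"threads": 4, "sample": "demo"})
```

Because the configuration is serialized with sorted keys, two runs started with the same settings produce byte-identical report text, which makes reports easy to compare.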