Impy is a command-line wrapper that translates your command lines into calls to the underlying Docker container.
Each time an IMP command is run, it prints the docker command it used. If you understand how Docker works, you can also run that docker command directly; it gives you more control, but the
impy command line should be sufficient for most cases.
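As a rough sketch, the printed docker command has the shape below. The image name, tag, and mount points here are illustrative assumptions (loosely based on the home-folder layout described under interactive mode), not impy's literal output; the command is only echoed, not executed, so the sketch is safe to run anywhere.

```shell
# Hypothetical shape of the docker command impy prints for a run.
# Image name, tag, and mount points are assumptions, not real output.
OUTPUT=/home/user/imp-output
DATABASES=/home/user/imp-databases

# Echo instead of run, so nothing is actually started.
echo docker run --rm \
    -v "$OUTPUT":/home/imp/output \
    -v "$DATABASES":/home/imp/databases \
    imp/imp:latest
```

The `-v host:container` mounts are what makes root inside the container problematic: files written to the mounted volumes end up owned by root on the host.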
It is recommended to run the impy command as a non-root user. Since impy runs a Docker container and mounts local data volumes into it, permissions become difficult to handle if the container runs as root. Please follow the official Docker documentation on managing Docker as a non-root user. You may encounter difficulties when running the impy command as root.
Before anything else, IMP needs databases that are not shipped inside the container, as they would make it too large. You therefore need a working internet connection to download the databases.
impy init # or impy -d /path/to/database init
You only have to run this the first time. Once all databases are downloaded, you can start using the IMP script.
If you have changed the default database path with the
-d option, don't forget to pass the same option and path to every subsequent impy command.
Use a different filter
During the preprocessing step, IMP can filter reads against a FASTA file. By default it uses hg38. You can change this with the --index parameter:
impy init --index <path/to/filter.fa>
If you change this, don't forget to update the
filtering: filter parameter in the configuration file (see config) to
match the name of the filter you use. For instance, if you run
impy init --index hg19.fa, you have to
change the configuration parameter to hg19.
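A minimal sketch of the relevant entry, assuming a YAML-style config file (the actual layout of the IMP user config may differ; only the filtering/filter key names come from the text above):

```yaml
# Hypothetical excerpt of the IMP user config file (-c option).
# Key names follow the "filtering: filter" parameter; the
# surrounding structure is an assumption.
filtering:
    filter: hg19   # must match the index given to `impy init --index`
```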
- Default run to the end of the workflow
impy run -m <input/MG.R1.fq> -m <input/MG.R2.fq> -t <input/MT.R1.fq> -t <input/MT.R2.fq> -o <output directory>
<output directory> is the directory where the output files will be written.
The <input> files are the metagenomic and metatranscriptomic paired-end FASTQ files.
- Run in single omics mode
impy run -t <input/MT.R1.fq> -t <input/MT.R2.fq> -o <output directory> --single-omics
Here only metatranscriptomic data is provided (the same works for metagenomic data with the
-m option instead of -t).
Description of command line parameters
-h: Get help.
-a: Change the assembler to use: idba/megahit are currently supported. Default to
-c: Set the path to the user config file. Default to
-d: Set the path to the databases. Default to
-o: Set the path to the output. Default to
-s: Set the path to the source code to use (the base Snakemake workflow). By default it uses the one shipped inside the container.
--threads: Number of threads to use.
--memtotal: Memory cap for megahit, in GB.
--memcore: Memory allowed per core for samtools, in GB.
--enter: Enter the container. See snakemake usage for more info.
--image-name: Docker image name.
--image-tag: Docker image version.
--image-repo: Repository of the images (can be a URL or a path).
Some parameters can also be changed in a configuration file (
-c option). Please refer to the configuration section.
Run a specific part of the workflow
- Run only preprocessing
impy preprocessing -m <input/MG.R1.fq> -m <input/MG.R2.fq> -o <output directory> --single-omics --single-step
Here the pipeline will stop after preprocessing.
- Run from assembly
Default run to the end of the workflow:
impy assembly -m <input/MG.R1.fq> -m <input/MG.R2.fq> -m <input/MG.SE.fq> -t <input/MT.R1.fq> -t <input/MT.R2.fq> -t <input/MT.SE.fq> -o <output directory>
Here three input files are provided for each of the MG and MT data: the first two are the preprocessed paired-end MG/MT files, and the third contains the remaining single-end reads.
You can also run it in single-omics mode and/or stop the workflow at the end of the assembly step:
impy <step> -m <input/MG.R1.fq> -m <input/MG.R2.fq> -o <output directory> --single-omics --single-step
- Run from analysis
As of now, translation of the analysis parameters is not yet implemented. To run only the analysis step, you need to provide the output directory of a previous IMP run as input.
impy analysis --data-dir <output-directory> --single-step
- Run from binning
As of now, translation of the binning parameters is not yet implemented. To run only the binning step, you need to provide the output directory of a previous IMP run as input.
impy binning --data-dir <output-directory> --single-step
# or, to change the binning method:
impy -b maxbin binning --data-dir <output-directory> --single-step
With the --enter flag of impy, you can enter the container and run the pipeline in interactive mode:
impy --enter ... <command> ...
You will find yourself in the
~/code directory, where the IMP codebase is located.
All the relevant paths are in your home folder:
~/output: where the output is located. Maps to the path given with the -o option.
~/data: where the input data is located. Maps to the input files given with -m/-t.
~/code: where the IMP source code is located. Maps to the path given with the -s option.
~/databases: where the IMP databases are located. Maps to the path given with the -d option.
At any moment, you can exit the container with the exit command.
Use different image and source code
To use the latest development version of IMP, you have to use a different code base than the one shipped inside the container. To do so, clone the latest version from the IMP repository and specify the directory when you launch
impy; this overrides the source code used inside the container.
cd /some/path
git clone https://git-r3lab.uni.lu/IMP/IMP.git
impy -s /some/path/IMP ... <command> ...
If you want to use a different image or a different version, you can also specify it on the command line.
impy --image-tag 1.4 ... <command> ...