Impy

Impy is a command-line wrapper that translates your command lines into the corresponding commands for the underlying Docker container.

Each time an IMP command is run, it prints the docker command it uses. If you understand how Docker works, you can run that docker command directly instead. This gives you more control, but the impy command line should be sufficient for most cases.

It is recommended to run the impy command as a non-root user. Because impy runs a Docker container and mounts local data volumes into it, file permissions become difficult to handle if the container runs as root, and you may encounter difficulties. Please follow the official documentation to learn how to proceed.
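As a minimal sketch of the recommendation above, a script can check the current user before calling impy. The function name `is_non_root` is hypothetical and not part of impy; the check only looks at the numeric user ID.

```shell
# Return 0 when the given UID (default: the current user) is not root (UID 0).
is_non_root() {
    uid="${1:-$(id -u)}"
    [ "$uid" -ne 0 ]
}

if is_non_root; then
    echo "OK: running as a non-root user"
else
    echo "Warning: running as root; permissions on mounted volumes may be hard to manage" >&2
fi
```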

Initialization

Before anything else, IMP needs databases that are not shipped inside the container, because including them would make the container too large. You therefore need a working internet connection to download the databases.

impy init

# or

impy -d /path/to/database init

You only have to run this the first time. Once all databases are downloaded, you can start using IMP. If you changed the default database path with the -d option, don’t forget to pass the same option and path to all subsequent impy commands.
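One way to avoid forgetting the -d option is a small wrapper that always passes the same database path. This is only a sketch: `impy_db`, `IMP_DB`, and `IMPY` are hypothetical names introduced here, and the default path simply mirrors the documented default of imp-db/ in the current directory.

```shell
# Hypothetical helper, not part of impy: always pass the same database path.
IMP_DB="${IMP_DB:-$PWD/imp-db}"   # database directory used by every call
IMPY="${IMPY:-impy}"              # set IMPY=echo to print the arguments instead of running impy

impy_db() {
    "$IMPY" -d "$IMP_DB" "$@"
}

# Usage (commented out so that sourcing this file does not start a download):
# impy_db init
# impy_db run -m <input/MG.R1.fq> -m <input/MG.R2.fq> -o <output directory> --single-omics
```

Setting IMPY=echo gives a dry run that shows the arguments impy would receive.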

Use a different filter

When performing the preprocessing step, IMP can filter against a FASTA file. By default it uses hg38. You can change this with the --index option:

impy init --index <path/to/filter.fa>

If you change this, don’t forget to update the filtering: filter parameter in the configuration file (see config) so that it matches the name of the filter you use. For instance, if you run impy init --index hg19.fa, you have to set the configuration parameter to hg19.
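Following the hg19.fa example, the filter name appears to be the file name without its directory and .fa suffix. The sketch below derives it that way; `filter_name` is a hypothetical helper, and the naming rule for other extensions (such as .fasta) is an assumption that is not covered here.

```shell
# Derive the filter name from a FASTA path by dropping the directory and the .fa suffix.
# Assumption: the configuration expects exactly this name, as in the hg19.fa example.
filter_name() {
    basename "$1" .fa
}

filter_name /path/to/hg19.fa
```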

Running

  • Default run to the end of the workflow
impy run -m <input/MG.R1.fq> -m <input/MG.R2.fq> -t <input/MT.R1.fq> -t <input/MT.R2.fq> -o <output directory>

<output directory> is the directory where output files will be located.

The <input> files are the paired metagenomics (MG) and metatranscriptomics (MT) input FASTQ files.

  • Run in single omics mode
impy run -t <input/MT.R1.fq> -t <input/MT.R2.fq> -o <output directory> --single-omics

Here only metatranscriptomics data is provided. A metagenomics-only run works the same way, using the -m option instead of -t.
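Before launching a run, it can help to verify that every input file exists, since a missing file would otherwise only be noticed after the container has started. The following is a sketch; `check_inputs` is a hypothetical helper and the file names are the placeholders used in the commands above.

```shell
# Return 0 only if every listed input file exists and is non-empty.
check_inputs() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty input: $f" >&2
            return 1
        fi
    done
}

# Usage with placeholder paths (commented out; replace with your own files):
# check_inputs input/MT.R1.fq input/MT.R2.fq &&
#     impy run -t input/MT.R1.fq -t input/MT.R2.fq -o output --single-omics
```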

Description of command line parameters

  • -h: get help.
  • -a: Set the assembler to use; idba and megahit are currently supported. Defaults to idba.
  • -c: Set the path to the user config file. Defaults to current_directory/userconfig.imp.json.
  • -d: Set the path to the databases. Defaults to current_directory/imp-db/.
  • -o: Set the path to the output. Defaults to current_directory/imp-output/.
  • -s: Set the path to the source code to use (the base Snakemake workflow). By default, the one shipped inside the container is used.
  • --threads: Number of threads to use.
  • --memtotal: Cap of memory to use for megahit in GB.
  • --memcore: Memory allowed per core for samtools in GB.
  • --enter: Enter the container. See snakemake usage for more info.
  • --image-name: Docker image name.
  • --image-tag: Docker image version.
  • --image-repo: Repository of the images (can be a URL or a path).

Configuration

Some parameters can also be changed in a configuration file (-c option). Please refer to the configuration section.

Run a specific part of the workflow

  • Run only preprocessing
impy preprocessing -m <input/MG.R1.fq> -m <input/MG.R2.fq> -o <output directory> --single-omics --single-step

Here the pipeline will stop after preprocessing.

  • Run from assembly

Default run to the end of the workflow:

impy assembly -m <input/MG.R1.fq> -m <input/MG.R2.fq> -m <input/MG.SE.fq> -t <input/MT.R1.fq> -t <input/MT.R2.fq> -t <input/MT.SE.fq> -o <output directory>

Here three inputs are provided for each of the MG and MT data. The first two are the preprocessed MG/MT paired files, and the third contains the remaining single-end reads.

You can also run it in single-omics mode and/or stop the workflow at the end of the assembly step.

impy <step> -m <input/MG.R1.fq> -m <input/MG.R2.fq> -o <output directory> --single-omics --single-step
  • Run from analysis

For now, translation of command-line parameters is not yet implemented for the analysis step. To run only the analysis step, you need to provide the output directory of a previous IMP run as input.

impy analysis --data-dir <output-directory> --single-step
  • Run from binning

For now, translation of command-line parameters is not yet implemented for the binning step. To run only the binning step, you need to provide the output directory of a previous IMP run as input.

impy binning --data-dir <output-directory> --single-step
# or to change the binning method
impy -b maxbin binning --data-dir <output-directory> --single-step

Docker

By using impy, you can enter the container and run the pipeline in interactive mode:

impy --enter ... <command> ...

You will find yourself in the ~/code directory, where the IMP codebase is located. All relevant paths are in your home folder:

  • ~/output: where the output is located. Maps to the -o option.

  • ~/data: where the input data is located. Maps to the -m and -t options.

  • ~/code: where the IMP source code is located. Maps to the -s option.

  • ~/databases: where the IMP databases are located. Maps to the -d option.

At any time, you can exit the container with the exit command.

Use different image and source code

To use the latest development version of IMP, you have to use a different code base than the one shipped inside the container. To do so, clone the latest version from the IMP repository and specify its directory when you launch impy. This overrides the source code used inside the container.

cd /some/path
git clone https://git-r3lab.uni.lu/IMP/IMP.git
impy -s /some/path/IMP ... <command> ...

If you want to use a different image or a different version, you can also specify it on the command line.

impy --image-tag 1.4 ... <command> ...