# README - Project Container
Author: Vi Varga
Last Modified: 19.02.2024
## Introduction
This README.ipynb file provides a brief guide to using the container that has been prepared for student projects in BBT045. A README.md file will also be provided, containing the same text in Markdown format.
The `bbt045-projects.sif` container has been created for students to run their projects in, so that they do not have to install their own software on the Vera cluster. Please contact the teaching staff (especially Vi) if you would like access to a program that is not included in the container, or if something malfunctions.
If you do not have it already, you can download this information in Jupyter Notebook format from here.
## Installed software
The full list of programs installed in the `bbt045-projects.sif` container can be found in the `bbt045-projects.yml` and `conda_environment_args_proj.def` files included in the same directory as the container (`/cephyr/NOBACKUP/groups/bbt045_2024/ProjectSoftware/`). Below is a list of the most important software:
- FastQC
- TrimGalore!
- Trimmomatic
- MetaCompass
- SPAdes (including metaSPAdes)
- Prokka
- CD-HIT
- MetaPhlAn2
- Bowtie2
- Python
- Biopython
- Jupyter
- matplotlib, seaborn
- numpy, pandas
- scipy
## Using the container
In order to use the `bbt045-projects.sif` container, please use the `run_jupyter_proj.sh` script found in the same directory as the container, and modify the time requirement and ID as you have done for the `run_jupyter.sh` script before. Alternatively, you can continue using your copy of the `run_jupyter.sh` script and simply change the path to the container to read:
```
container=/cephyr/NOBACKUP/groups/bbt045_2024/ProjectSoftware/bbt045-projects.sif
```
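As a point of reference, the relevant parts of such a job script might look roughly like the sketch below. This is only a sketch: the `#SBATCH` values are placeholders, and the rest of the script (the lines that actually start Jupyter inside the container) should be kept exactly as they are in your existing `run_jupyter.sh`.

```
#!/bin/bash
# Sketch only -- adapt your own copy of run_jupyter.sh rather than copying this.
#SBATCH -A C3SE2024-X-YY     # placeholder: use the project/account ID given for BBT045
#SBATCH -t 04:00:00          # placeholder: adjust the time limit to how long you plan to work

# Point the job at the projects container instead of the exercise container:
container=/cephyr/NOBACKUP/groups/bbt045_2024/ProjectSoftware/bbt045-projects.sif

# Keep the remaining lines from your run_jupyter.sh unchanged.
```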
Of the programs mentioned above, all but MetaCompass have been installed using `conda`. All programs installed via `conda` can be run directly from within your Jupyter Notebook, like so:
```
! metaspades.py -h
```
```
SPAdes genome assembler v3.15.5 [metaSPAdes mode]

Usage: spades.py [options] -o <output_dir>

Basic options:
  -o <output_dir>             directory to store all the resulting files (required)
  --iontorrent                this flag is required for IonTorrent data
  --test                      runs SPAdes on toy dataset
  -h, --help                  prints this usage message
  -v, --version               prints version

Input data:
  --12 <filename>             file with interlaced forward and reverse paired-end reads
  -1 <filename>               file with forward paired-end reads
  -2 <filename>               file with reverse paired-end reads
  -s <filename>               file with unpaired reads
  --merged <filename>         file with merged forward and reverse paired-end reads
  --pe-12 <#> <filename>      file with interlaced reads for paired-end library number <#>.
                              Older deprecated syntax is -pe<#>-12 <filename>
  --pe-1 <#> <filename>       file with forward reads for paired-end library number <#>.
                              Older deprecated syntax is -pe<#>-1 <filename>
  --pe-2 <#> <filename>       file with reverse reads for paired-end library number <#>.
                              Older deprecated syntax is -pe<#>-2 <filename>
  --pe-s <#> <filename>       file with unpaired reads for paired-end library number <#>.
                              Older deprecated syntax is -pe<#>-s <filename>
  --pe-m <#> <filename>       file with merged reads for paired-end library number <#>.
                              Older deprecated syntax is -pe<#>-m <filename>
  --pe-or <#> <or>            orientation of reads for paired-end library number <#> (<or> = fr, rf, ff).
                              Older deprecated syntax is -pe<#>-<or>
  --s <#> <filename>          file with unpaired reads for single reads library number <#>.
                              Older deprecated syntax is --s<#> <filename>
  --pacbio <filename>         file with PacBio reads
  --nanopore <filename>       file with Nanopore reads

Pipeline options:
  --only-error-correction     runs only read error correction (without assembling)
  --only-assembler            runs only assembling (without read error correction)
  --checkpoints <last or all> save intermediate check-points ('last', 'all')
  --continue                  continue run from the last available check-point (only -o should be specified)
  --restart-from <cp>         restart run with updated options and from the specified check-point
                              ('ec', 'as', 'k<int>', 'mc', 'last')
  --disable-gzip-output       forces error correction not to compress the corrected reads
  --disable-rr                disables repeat resolution stage of assembling

Advanced options:
  --dataset <filename>        file with dataset description in YAML format
  -t <int>, --threads <int>   number of threads. [default: 16]
  -m <int>, --memory <int>    RAM limit for SPAdes in Gb (terminates if exceeded). [default: 250]
  --tmp-dir <dirname>         directory for temporary files. [default: <output_dir>/tmp]
  -k <int> [<int> ...]        list of k-mer sizes (must be odd and less than 128) [default: 'auto']
  --phred-offset <33 or 64>   PHRED quality offset in the input reads (33 or 64), [default: auto-detect]
  --custom-hmms <dirname>     directory with custom hmms that replace default ones, [default: None]
```
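For real analyses you would of course replace the `-h` call with your actual input. As an example, a quality check with FastQC (also installed via `conda`) could look roughly like the cell below; the read file names are placeholders for your own data:

```
! mkdir -p fastqc_results
! fastqc sample_R1.fastq.gz sample_R2.fastq.gz -o fastqc_results
```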
MetaCompass does not have a `conda` package available, so it has been installed in the container from source. In order to use it, you must call the program using the full path to the executable, like so:
```
! /opt/MetaCompass-2.0-beta/go_metacompass.py -h
```
```
MetaCompass metagenome assembler version 2.0.0
by Victoria Cepeda (vcepeda@cs.umd.edu)

usage: go_metacompass.py [-h] [-c [CONFIG]] [-1 [FORWARD]] [-2 [REVERSE]] [-U [UNPAIRED]]
                         [-r [REF]] [-s [REFSEL]] [-p [PICKREF]] [-m [MINCOV]] [-g [MINCTGLEN]]
                         [-l [READLEN]] [-b] -o [OUTDIR] [-k] [-t [THREADS]] -y [MEMORY]
                         [--Force] [--unlock] [--nolock] [--verbose] [--reason] [--dryrun]

snakemake and metacompass params

options:
  -h, --help            show this help message and exit

required:
  -c [CONFIG], --config [CONFIG]
                        config (json) file, set read length etc
  -1 [FORWARD], --forward [FORWARD]
                        Provide comma separated list of forward paired-end reads
  -2 [REVERSE], --reverse [REVERSE]
                        Provide comma separated list of reverse paired-end reads
  -U [UNPAIRED], --unpaired [UNPAIRED]
                        Provide comma separated list of unpaired reads (r1.fq,r2.fq,r3.fq)

metacompass:
  -r [REF], --ref [REF]
                        reference genomes
  -s [REFSEL], --refsel [REFSEL]
                        reference selection [tax/all]
  -p [PICKREF], --pickref [PICKREF]
                        depth or breadth
  -m [MINCOV], --mincov [MINCOV]
                        min coverage to assemble
  -g [MINCTGLEN], --minctglen [MINCTGLEN]
                        min contig length
  -l [READLEN], --readlen [READLEN]
                        max read length

output:
  -b, --clobber         clobber output directory (if exists?)
  -o [OUTDIR], --outdir [OUTDIR]
                        output directory? (cwd default)
  -k, --keepoutput      keep all output generated (default is to delete all but final fasta files)

performance:
  -t [THREADS], --threads [THREADS]
                        num threads
  -y [MEMORY], --memory [MEMORY]
                        memory

snakemake:
  --Force               force snakemake to rerun
  --unlock              unlock snakemake locks
  --nolock              remove stale locks
  --verbose             verbose
  --reason              reason
  --dryrun              dryrun
```
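A run on paired-end reads could then look roughly like the cell below. Treat this as a sketch only: the read file names, read length, thread count, and memory values are placeholders, and you should check the MetaCompass documentation for the exact set of options your analysis needs.

```
! /opt/MetaCompass-2.0-beta/go_metacompass.py \
    -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
    -l 150 -t 4 -y 16 \
    -o metacompass_out
```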
And of course, you can run Python code directly from within the code cells of your Jupyter Notebook, like so:
print("Jupyter Notebook cells are Python cells by default.")
```
Jupyter Notebook cells are Python cells by default.
```
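This also means the Python libraries listed above (Biopython, pandas, matplotlib, etc.) are available for downstream analysis. For instance, a cell like the following could summarize contig lengths from an assembly; the FASTA file name is just a placeholder for your own output:

```
# Summarize contig lengths from an assembly FASTA file ("contigs.fasta" is a placeholder)
import pandas as pd
from Bio import SeqIO

contig_lengths = pd.Series(
    {record.id: len(record.seq) for record in SeqIO.parse("contigs.fasta", "fasta")},
    name="length",
)
print(contig_lengths.describe())
```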