# README - Project Container

Author: Vi Varga

Last Modified: 19.02.2024


## Introduction

This README explains how to use the container that has been prepared for students in BBT045 to use for their projects. It is written as a Jupyter Notebook (README.ipynb); a README.md file containing all of the same markdown-formatted text will also be provided.

The `bbt045-projects.sif` container has been created so that students can run their projects without having to install their own software on the Vera cluster. Please contact the teaching staff (especially Vi) if you would like access to a program that is not included in the container, or if something malfunctions.

If you do not have it already, you can download this information in Jupyter Notebook format from [here](README.ipynb).


## Installed software

The full list of programs installed in the `bbt045-projects.sif` container can be found in the `bbt045-projects.yml` and `conda_environment_args_proj.def` files included in the same directory as the container (`/cephyr/NOBACKUP/groups/bbt045_2024/ProjectSoftware/`). Below is a list of the most important software: 
 - FastQC
 - TrimGalore!
 - Trimmomatic
 - MetaCompass
 - SPAdes (including metaSPAdes)
 - Prokka
 - CD-HIT
 - MetaPhlAn2
 - Bowtie2
 - Python
    - Biopython
    - Jupyter
    - matplotlib, seaborn
    - numpy, pandas
    - scipy


## Using the container

To use the `bbt045-projects.sif` container, please use the `run_jupyter_proj.sh` script found in the same directory as the container, and modify the time requirement and ID as you have done for the `run_jupyter.sh` script before. Alternatively, you can continue using your copy of the `run_jupyter.sh` script, and simply change the path to the container to read:

```bash
container=/cephyr/NOBACKUP/groups/bbt045_2024/ProjectSoftware/bbt045-projects.sif
```

Of the programs mentioned above, all but MetaCompass have been installed using `conda`. All programs installed via `conda` can be run directly from within your Jupyter Notebook, like so: 

```python
! metaspades.py -h
```

```
SPAdes genome assembler v3.15.5 [metaSPAdes mode]

Usage: spades.py [options] -o <output_dir>

Basic options:
  -o <output_dir>             directory to store all the resulting files (required)
  --iontorrent                this flag is required for IonTorrent data
  --test                      runs SPAdes on toy dataset
  -h, --help                  prints this usage message
  -v, --version               prints version

Input data:
  --12 <filename>             file with interlaced forward and reverse paired-end reads
  -1 <filename>               file with forward paired-end reads
  -2 <filename>               file with reverse paired-end reads
  -s <filename>               file with unpaired reads
  --merged <filename>         file with merged forward and reverse paired-end reads
  --pe-12 <#> <filename>      file with interlaced reads for paired-end library number <#>.
                              Older deprecated syntax is -pe<#>-12 <filename>
  --pe-1 <#> <filename>       file w
```
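
If you are unsure whether your notebook is actually running inside the project container, a quick check is to see which of the conda-installed programs are on your `PATH`. The sketch below uses only the Python standard library. The executable names in `EXPECTED_TOOLS` are assumptions based on how these tools are commonly packaged, not names taken from the container definition, so adjust them if a program is reported as missing but still runs for you.

```python
import shutil

# Assumed executable names for some of the conda-installed programs.
# These are illustrative; check bbt045-projects.yml if one seems wrong.
EXPECTED_TOOLS = [
    "fastqc",
    "trim_galore",
    "trimmomatic",
    "spades.py",
    "metaspades.py",
    "prokka",
    "cd-hit",
    "bowtie2",
]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Print one line per tool so missing programs are easy to spot.
for name, path in find_tools(EXPECTED_TOOLS).items():
    print(f"{name:15s} {path or 'NOT FOUND'}")
```

If most of the programs are reported as `NOT FOUND`, the notebook is probably not using the `bbt045-projects.sif` container.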

MetaCompass does not have a `conda` package available, so it has been installed in the container from source. In order to use it, you must call the program using the full path to the executable, like so: 

```python
! /opt/MetaCompass-2.0-beta/go_metacompass.py -h
```

```
MetaCompass metagenome assembler version 2.0.0 by Victoria Cepeda (vcepeda@cs.umd.edu)

usage: go_metacompass.py [-h] [-c [CONFIG]] [-1 [FORWARD]] [-2 [REVERSE]]
                         [-U [UNPAIRED]] [-r [REF]] [-s [REFSEL]]
                         [-p [PICKREF]] [-m [MINCOV]] [-g [MINCTGLEN]]
                         [-l [READLEN]] [-b] -o [OUTDIR] [-k] [-t [THREADS]]
                         -y [MEMORY] [--Force] [--unlock] [--nolock]
                         [--verbose] [--reason] [--dryrun]

snakemake and metacompass params

options:
  -h, --help            show this help message and exit

required:
  -c [CONFIG], --config [CONFIG]
                        config (json) file, set read length etc
  -1 [FORWARD], --forward [FORWARD]
                        Provide comma separated list of forward paired-end
                        reads
  -2 [REVERSE], --reverse [REVERSE]
                        Provide comma separated list of reverse paired-end
                        reads
  -U [
```
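
If you would rather call MetaCompass from Python code than with the `!` shell escape, one possible approach is sketched below. It stores the full path from this README in a variable and uses the standard-library `subprocess` module. The helper name `run_help` is hypothetical, introduced only for this example. The sketch assumes the script is executable, as it is when called with `!` above.

```python
import os
import subprocess

# Full path to the MetaCompass executable, as given in this README.
METACOMPASS = "/opt/MetaCompass-2.0-beta/go_metacompass.py"


def run_help(executable):
    """Run `<executable> -h` and return its text output.

    Returns None if the file does not exist, which usually means the
    notebook is not running inside the project container.
    """
    if not os.path.isfile(executable):
        return None
    result = subprocess.run([executable, "-h"], capture_output=True, text=True)
    # Some programs print help to stderr, so return both streams.
    return result.stdout + result.stderr


help_text = run_help(METACOMPASS)
if help_text is None:
    print(f"{METACOMPASS} not found; are you inside the container?")
else:
    print(help_text)
```

Keeping the path in one variable means that you only have to type it once per notebook.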

And of course, you can run Python code directly from within the code cells of your Jupyter Notebook, like so: 

```python
print("Jupyter Notebook cells are Python cells by default.")
```

```
Jupyter Notebook cells are Python cells by default.
```
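
In the same way, you can check from a code cell that the Python libraries listed under "Installed software" are importable. This is a minimal sketch using only the standard library. The helper name `available` is hypothetical, and the import names in `PACKAGES` are the usual ones for these libraries (Biopython is imported as `Bio`).

```python
import importlib.util

# Import names for the Python libraries listed in this README.
# Note that Biopython is imported as `Bio`.
PACKAGES = ["Bio", "matplotlib", "seaborn", "numpy", "pandas", "scipy"]


def available(packages):
    """Map each top-level package name to True if it can be imported."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


# Print one line per package so a missing library is easy to spot.
for pkg, ok in available(PACKAGES).items():
    print(f"{pkg:12s} {'ok' if ok else 'MISSING'}")
```

If a library is reported as `MISSING`, first make sure your notebook is using the project container before contacting the teaching staff.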
