Set up a Snakemake profile
This is not essential, but it will make running the pipeline much easier by submitting jobs to the cluster automatically with pre-set parameters.
Note: Cookiecutter is required for this step. It can be installed with pip install cookiecutter.
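For example, assuming pip is available in the active environment:

# install cookiecutter and confirm the CLI is on your PATH
pip install cookiecutter
cookiecutter --version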
For SLURM-based clusters:
# create the config directory that snakemake searches for profiles (or use another location)
profile_dir="${HOME}/.config/snakemake"
mkdir -p "$profile_dir"
# use cookiecutter to create the profile in the config directory
template="gh:Snakemake-Profiles/slurm"
cookiecutter --output-dir "$profile_dir" "$template"
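Cookiecutter will prompt for the profile settings. Once it finishes, Snakemake can use the profile by name; a minimal sketch, assuming the profile was created with the default name slurm:

# run the pipeline, submitting jobs to SLURM via the profile
snakemake --profile slurm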
For SGE-based clusters:
Warning: This has not been tested.
mkdir -p ~/.config/snakemake
cd ~/.config/snakemake
# generate the SGE profile directly in the config directory
cookiecutter https://github.com/Snakemake-Profiles/sge.git
Example SLURM profile:
/home/a/asmith/.config/snakemake/slurm/
├── config.yaml
├── CookieCutter.py
├── __pycache__
│ ├── CookieCutter.cpython-310.pyc
│ ├── CookieCutter.cpython-311.pyc
│ ├── slurm_utils.cpython-310.pyc
│ └── slurm_utils.cpython-311.pyc
├── settings.json
├── slurm-jobscript.sh
├── slurm-sidecar.py
├── slurm-status.py
├── slurm-submit.py
└── slurm_utils.py
settings.json:
{
"SBATCH_DEFAULTS": "--partition=short --time=0-01:00:00 --mem=3G",
"CLUSTER_NAME": "",
"CLUSTER_CONFIG": ""
}
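SBATCH_DEFAULTS is a space-separated list of sbatch arguments applied to every submitted job unless a rule overrides them. As a rough illustration, the defaults above mean each job is submitted approximately like this (the profile actually renders a per-job script from slurm-jobscript.sh):

# approximate sbatch call implied by the defaults above
sbatch --partition=short --time=0-01:00:00 --mem=3G slurm-jobscript.sh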
config.yaml:
cluster-sidecar: "slurm-sidecar.py"
cluster-cancel: "scancel"
restart-times: "0"
jobscript: "slurm-jobscript.sh"
cluster: "slurm-submit.py"
cluster-status: "slurm-status.py"
max-jobs-per-second: "10"
max-status-checks-per-second: "10"
local-cores: 1
latency-wait: "5"
use-conda: "True"
use-singularity: "False"
singularity-args: "-B /ceph -B /databank -B $TMPDIR --cleanenv"
jobs: "50"
printshellcmds: "True"
retries: 3
# Example resource configuration
# default-resources:
# - runtime=100
# - mem_mb=6000
# - disk_mb=1000000
# # set-threads: map rule names to threads
# set-threads:
# - single_core_rule=1
# - multi_core_rule=10
# # set-resources: map rule names to resources in general
# set-resources:
# - high_memory_rule:mem_mb=12000
# - long_running_rule:runtime=1200
Note: The singularity-args are required to mount the data directories into the container. For example:

singularity-args: "-B /ceph -B /databank"

gives the container access to the /ceph and /databank directories on the cluster. The current working directory is mounted into the container by default; add more -B flags for any other directories you need. These paths are cluster-specific, so you will need to set your own defaults. $TMPDIR is also mounted because jobs can fail without it, and the --cleanenv flag prevents the container from inheriting environment variables from the host.
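To sanity-check the bind mounts outside Snakemake, the same flags can be passed to singularity exec directly; a sketch, where container.sif stands in for whichever image the pipeline uses:

# list a mounted directory from inside the container (container.sif is a placeholder)
singularity exec -B /ceph -B /databank --cleanenv container.sif ls /ceph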