Pipeline¶
Configuration¶
The pipeline is configured using a YAML file, e.g. config_atac.yml or config_chip.yml. We highly recommend using the seqnado-config command to generate the configuration file as this will prompt the user for the required information and ensure that the configuration file is valid. The configuration file can be edited manually if required e.g. using nano or the VS Code text editor.
Generate the working directory and configuration file¶
The following command will generate the working directory and configuration file for the ChIP-seq pipeline:
seqnado-config chip
# options
-r, --rerun # Re-run the config
-g, --genome [dm6|hg19|hg38|hg38_dm6|hg38_mm39|hg38_spikein|mm10|mm39|other] # Genome to use if genome preset is configured
You should get something like this:
$ seqnado-config chip
What is your project name? [cchahrou_project]: TEST
What is your genome name? [other]: hg38
Path to Bowtie2 genome indices: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/bt2_index/hg38
Path to chromosome sizes file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/sequence/hg38.chrom.sizes
Path to GTF file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/genes/hg38.ncbiRefSeq.gtf
Path to blacklist bed file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/hg38-blacklist.v2.bed.gz
Do you want to remove blacklist regions? (yes/no) [yes]: yes
Remove PCR duplicates? (yes/no) [yes]: yes
Remove PCR duplicates method: [picard]: picard
Do you have spikein? (yes/no) [no]: yes
Normalisation method: [orlando/with_input]: orlando
Reference genome: [hg38]: hg38
Spikein genome: [dm6]: dm6
Path to fastqscreen config: [/ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf]: /ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf
Do you want to make bigwigs? (yes/no) [no]: yes
Pileup method: [deeptools/homer]: deeptools
Do you want to make heatmaps? (yes/no) [no]: yes
Do you want to call peaks? (yes/no) [no]: yes
Peak caller: [lanceotron/macs/homer]: lanceotron
Do you want to make a UCSC hub? (yes/no) [no]: yes
UCSC hub directory: [/path/to/ucsc_hub/]: /project/milne_group/datashare/etc
What is your email address? [cchahrou@example.com]: email for UCSC
Color by (for UCSC hub): [samplename]: samplename
Directory '2024-01-26_chip_TEST' has been created with the 'config_chip.yml' file.
This will generate the following files (the directory name includes the creation date and the project name, so yours will differ):
$ tree 2024-01-13_chip_test/
2024-01-13_chip_test/
├── config_chip.yml
└── readme_test.md
0 directories, 2 files
Edit the configuration file (if required)¶
The configuration file can be edited manually if required e.g. using nano or the VS Code text editor. Use this if you have made an error in the configuration file or if you want to change it for any other reason.
Warning
If you edit the configuration file manually, you must ensure that it remains valid YAML (do not delete any colons or commas, and do not change the indentation). You can open the file for editing using the following command:
nano config_chip.yml # To exit nano, press ctrl+x, then "y" followed by "enter" to save
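Note that nano only opens the file; it does not validate it. One way to check the syntax after editing is to try loading the file with a YAML parser. The sketch below assumes that `python3` and the PyYAML package are available in the current environment (an assumption, not something this guide states), and `check_yaml` is a hypothetical helper, not part of seqnado. It checks YAML syntax only, not whether the values are ones the pipeline accepts.

```bash
# Hypothetical helper (not part of seqnado): succeeds if the file parses as YAML.
# Assumes python3 and the PyYAML package are installed.
check_yaml() {
    python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$1" \
        && echo "$1: valid YAML"
}

# Usage, from inside the working directory:
# check_yaml config_chip.yml
```

If the file is malformed, the parser error reports the line and column where parsing failed.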
Create a design file (optional)¶
Infer sample names from fastq file names¶
If the fastq files are named in a way that seqnado can infer the sample names, then a design file will be generated automatically:
For ChIP-seq:
- samplename1_Antibody_R1.fastq.gz
- samplename1_Antibody_R2.fastq.gz
- samplename1_Input_1.fastq.gz
- samplename1_Input_2.fastq.gz
For ATAC-seq:
- sample-name-1_R1.fastq.gz
- sample-name-1_R2.fastq.gz
- sample-name-1_1.fastq.gz
- sample-name-1_2.fastq.gz
For RNA-seq:
- sample-name-1_R1.fastq.gz
- sample-name-1_R2.fastq.gz
- sample-name-1_1.fastq.gz
- sample-name-1_2.fastq.gz
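To illustrate the naming patterns listed above, the sketch below strips the read suffix (`_R1`, `_R2`, `_1` or `_2`) and the `.fastq.gz` extension to recover a sample name. This is an assumption about how the file names map to sample names, not seqnado's actual parsing code, and `sample_name` is a hypothetical helper.

```bash
# Hypothetical helper (not part of seqnado): derive a sample name from a FASTQ file name.
sample_name() {
    # Drop any directory part, then the read suffix and the extension.
    basename "$1" | sed -E 's/_(R?[12])\.fastq\.gz$//'
}

sample_name sample-name-1_R1.fastq.gz          # prints: sample-name-1
sample_name /path/to/sample-name-1_2.fastq.gz  # prints: sample-name-1
```

Running this over your own file names is a quick way to spot files whose names do not end in one of the expected suffixes, since those are printed unchanged.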
Use seqnado-design to generate a design file¶
If the fastq files are not named in a way that seqnado can infer the sample names, then a design file can be generated using the seqnado-design command. You'll need to enter the working directory and generate a design file:
cd 2024-01-13_chip_test/
seqnado-design chip /path/to/fastq/files/* # Note that you can use tab completion to complete the path to the fastq files
This will generate a design file called design.csv in the working directory.
Warning
You must list the fastq files to use for design generation on the command line, e.g. for files in the current working directory:
```bash
seqnado-design chip *.fastq.gz
```
ATAC-seq or RNA-seq design file¶
An ATAC-seq or RNA-seq design file should look something like this:
sample,r1,r2
rna,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna_2.fastq.gz
Note
The design file is a CSV file with the following columns:
* sample - The sample name. Altering this will change the name of the output files, so it can be useful for renaming samples.
* r1 - The path to the read 1 fastq file
* r2 - The path to the read 2 fastq file
ChIP-seq design file¶
A ChIP assay design file should look something like this:
sample,ip_r1,ip_r2,control_r1,control_r2,ip,control
CTCF,CTCF_CTCF_1.fastq.gz,CTCF_CTCF_2.fastq.gz,CTCF_input_1.fastq.gz,CTCF_input_2.fastq.gz,CTCF,input
Note
The design file is a CSV file with the following columns:
* sample - The sample name. Altering this will change the name of the output files, so it can be useful for renaming samples.
* ip_r1 - The path to the IP read 1 fastq file
* ip_r2 - The path to the IP read 2 fastq file
* control_r1 - The path to the control read 1 fastq file
* control_r2 - The path to the control read 2 fastq file
* ip - The name of the IP sample
* control - The name of the control sample
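Because a design file is only a list of paths, a mistyped or moved fastq file is easy to miss until the pipeline fails. The sketch below lists every field ending in `.fastq.gz` that does not exist on disk. `check_design` is a hypothetical helper, not part of seqnado, and it assumes a simple CSV with no quoted commas. Relative paths are resolved from the current directory, so run it from the working directory.

```bash
# Hypothetical helper (not part of seqnado): report FASTQ paths listed in a
# design file that do not exist. Assumes a simple CSV with no quoted commas.
check_design() {
    # Skip the header, put one field per line, keep only FASTQ paths,
    # and collect those that are not present on disk.
    missing=$(tail -n +2 "$1" | tr ',' '\n' | grep '\.fastq\.gz$' |
        while IFS= read -r f; do
            [ -e "$f" ] || echo "$f"
        done)
    if [ -n "$missing" ]; then
        echo "Missing FASTQ files:"
        echo "$missing"
        return 1
    fi
    echo "All FASTQ files listed in $1 exist"
}

# Usage:
# check_design design.csv
```

The same function works for both layouts described above, because it ignores any field that is not a `.fastq.gz` path.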
RNA-seq design file¶
An RNA-seq design file should look something like this:
sample,r1,r2
rna,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna_2.fastq.gz
If you want to run DESeq2, then you will need to add an additional column to the design file to indicate which group (e.g. control or treated) each sample belongs to:
sample,r1,r2,deseq2
rna1,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna1_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna1_2.fastq.gz,control
rna2,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna2_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna2_2.fastq.gz,control
rna3,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna3_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna3_2.fastq.gz,control
rna4,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna4_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna4_2.fastq.gz,treated
rna5,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna5_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna5_2.fastq.gz,treated
rna6,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna6_1.fastq.gz,/tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna6_2.fastq.gz,treated
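Before running the pipeline, it can be worth confirming that the group labels are spelled consistently, since a typo such as `contol` would create an extra group. The sketch below counts samples per label in the `deseq2` column. `count_groups` is a hypothetical helper, not part of seqnado, and it assumes a simple CSV with no quoted commas and Unix line endings.

```bash
# Hypothetical helper (not part of seqnado): count samples per label in the
# column named "deseq2". Assumes a simple CSV with no quoted commas.
count_groups() {
    awk -F, '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "deseq2") col = i; next }
        col && $col != "" { n[$col]++ }
        END { for (g in n) print g, n[g] }
    ' "$1" | sort
}

# Usage:
# count_groups design.csv
```

For the six-sample design shown above, this would print `control 3` and `treated 3`.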
Running the pipeline¶
Ensure files are in the correct location¶
Before running the pipeline, ensure that the fastq files and the design file are in the correct location:
# Fastq files (link each file into the working directory)
ln -s /path/to/fastq_files/* /path/to/working-directory/made-by-seqnado-config/
# Design
mv /path/to/design.csv /path/to/working-directory/made-by-seqnado-config/
Check you are in the correct directory¶
$ ls -l
-rw-r--r-- 1 asmith asmithgrp 1845 Jan 13 10:50 config_rna.yml
-rw-r--r-- 1 asmith asmithgrp 14784 Jan 13 10:50 deseq2_test.qmd
-rw-r--r-- 1 asmith asmithgrp 155 Jan 13 14:40 design.csv
-rw-r--r-- 1 asmith asmithgrp 3813176 Jan 13 10:50 rna_1.fastq.gz
-rw-r--r-- 1 asmith asmithgrp 3836966 Jan 13 10:50 rna_2.fastq.gz
Ensure that the pipeline will not stop when you log out¶
tmux new -s NAME_OF_SESSION
# or
screen -S NAME_OF_SESSION
# to detach from a tmux session, press
ctrl+b d
# to detach from a screen session, press
ctrl+a d
Check you have activated the conda environment¶
conda activate seqnado
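To confirm that the activation worked, you can inspect the `CONDA_DEFAULT_ENV` variable, which conda sets to the name of the active environment. The sketch below assumes the environment is named `seqnado`, as in the command above, and `in_seqnado_env` is a hypothetical helper.

```bash
# Hypothetical helper: succeeds if the active conda environment is named "seqnado".
in_seqnado_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "seqnado" ]
}

if in_seqnado_env; then
    echo "seqnado environment is active"
else
    echo "seqnado environment is NOT active; run: conda activate seqnado"
fi
```

This is mainly useful inside a new tmux or screen session, where the environment may need to be activated again.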
Run the pipeline¶
The pipeline can be run using the following command:
seqnado [atac|chip|rna|snp] -c <number of cores> --preset [ss|ls]
An actual example would be:
seqnado rna -c 8 --preset ss
Note
- To visualise which tasks the pipeline will perform before running it, render the task graph as an image (this requires the Graphviz dot command):
seqnado atac -c 1 --preset ss --dag | dot -Tpng > dag.png
Pipeline Errors¶
Check the log file for errors:
# View the whole error log
cat seqnado_error.log
# Search the error log for exceptions, with 10 lines of context on either side
grep -A 10 -B 10 exception seqnado_error.log
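The search above is case-sensitive, so it would miss lines containing `Exception` or `ERROR`. As a hedged alternative, the sketch below matches both words regardless of case and prints line numbers so that you can jump to the relevant part of the log. `find_errors` is a hypothetical helper, and the choice of search terms is an assumption about what the log contains.

```bash
# Hypothetical helper: list lines mentioning "error" or "exception" (any case),
# prefixed with their line numbers. Defaults to seqnado_error.log.
find_errors() {
    log="${1:-seqnado_error.log}"
    if grep -i -n -E 'error|exception' "$log"; then
        return 0
    fi
    echo "no matches in $log"
}

# Usage:
# find_errors
# find_errors /path/to/another.log
```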