Pipeline¶
Configuration¶
The pipeline is configured using a YAML file, e.g. config_atac.yml or config_chip.yml. We highly recommend using the seqnado-config command to generate the configuration file, as it prompts for the required information and ensures that the resulting file is valid. The configuration file can be edited manually if required, e.g. using nano or the VS Code text editor.
Generate the working directory and configuration file¶
The following command will generate the working directory and configuration file for the ChIP-seq pipeline:
seqnado-config chip
# options
-r, --rerun # Re-runs the config in existing seqnado directory
You should get something like this:
$ seqnado-config chip
What is your project name? [cchahrou_project]: cchahrou_project
What is the genome? [hg38]: hg38
Perform fastqscreen? (yes/no) [no]: yes
Path to fastqscreen config: [/PATH/TO/fastq_screen.conf]: /PATH/TO/fastq_screen.conf
Do you want to remove blacklist regions? (yes/no) [yes]: yes
Remove PCR duplicates? (yes/no) [yes]: yes
Remove PCR duplicates method: [picard/samtools]: picard
Calculate library complexity? (yes/no) [no]: yes
Do you have spikein? (yes/no) [no]: yes
Normalisation method: [orlando/with_input]: orlando
Reference genome: [hg38]: hg38
Spikein genome: [dm6]: dm6
Do you want to make bigwigs? (yes/no) [no]: yes
Pileup method: [deeptools/homer]: deeptools
Do you want to make heatmaps? (yes/no) [no]: yes
Do you want to call peaks? (yes/no) [no]: yes
Peak caller: [lanceotron/macs/homer/seacr]: lanceotron
Do you want to make a UCSC hub? (yes/no) [no]: yes
UCSC hub directory: [seqnado_output/hub/]: seqnado_output/hub/
What is your email address? [email@example.com]: email for UCSC
Color by (for UCSC hub): [samplename]: samplename
Generate GEO submission files (MD5Sums, read count summaries...)? (yes/no) [no]: yes
Perform plotting? (yes/no) [no]: yes
Path to bed file with coordinates for plotting [None]:
Path to bed file with genes. [None]:
Directory '2025-02-11_chip_cchahrou_project/' has been created with the 'config_chip.yml' file.
This will generate the following files:
$ tree 2025-02-11_chip_cchahrou_project/
2025-02-11_chip_cchahrou_project/
├── config_chip.yml
└── fastq/
1 directory, 1 file
Check the tool options¶
In the newly created configuration YAML file, check the tool options, especially those for RNA quantification.
Edit the configuration file (if required)¶
The configuration file can be edited manually if required, e.g. using nano or the VS Code text editor. Do this if you have made an error in the configuration file or if you want to change it for any other reason.
Warning
If you edit the configuration file manually, you must ensure that it remains valid YAML (do not delete any colons or commas, and do not change the indentation).
nano config_chip.yml # To exit nano, press ctrl+x, then "y" followed by "enter" to save
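One easy way to break the indentation is to introduce tab characters, which YAML does not permit for indentation. The following is a minimal sketch of a check for that one problem only; `check_yaml_tabs` is a hypothetical helper name, not a seqnado command, and it does not validate anything else about the file.

```bash
# Hypothetical helper (not part of seqnado): print any lines that are indented
# with a tab character, which YAML does not allow. Returns non-zero if found.
check_yaml_tabs() {
    _tab=$(printf '\t')
    # Match lines that begin with optional spaces followed by a tab
    if grep -n "^ *${_tab}" "$1"; then
        echo "Tab indentation found in $1: replace the tabs with spaces" >&2
        return 1
    fi
    echo "No tab indentation found in $1"
}

# Usage: check_yaml_tabs config_chip.yml
```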
Organise your fastq files¶
Create symlinks to your raw data files:
cd 2025-02-11_chip_cchahrou_project/fastq
ln -s /path/to/raw_data/SampleName_S1_L001_R1_001.fastq.gz samplename1_Antibody_R1.fastq.gz
ln -s /path/to/raw_data/SampleName_S1_L001_R2_001.fastq.gz samplename1_Antibody_R2.fastq.gz
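With many samples, typing one `ln -s` per file becomes tedious. The sketch below loops over Illumina-style file names like those above and drops the `_S<n>_L<lane>` and `_001` parts when naming the link. `link_fastqs` is a hypothetical helper, not part of seqnado, and it assumes that the part of the raw file name before `_S<n>` is already the sample name you want.

```bash
# Hypothetical helper (not part of seqnado): symlink every Illumina-style
# fastq in RAW_DIR into DEST_DIR, dropping the _S<n>_L<lane> and _001 parts.
link_fastqs() (
    raw_dir=$(cd "$1" && pwd) || exit 1   # absolute path so the links resolve
    dest_dir="$2"
    mkdir -p "$dest_dir"
    for fq in "$raw_dir"/*_R[12]_001.fastq.gz; do
        [ -e "$fq" ] || continue          # the glob matched nothing
        new_name=$(basename "$fq" |
            sed -E 's/_S[0-9]+_L[0-9]+_(R[12])_001\.fastq\.gz$/_\1.fastq.gz/')
        ln -s "$fq" "$dest_dir/$new_name"
    done
)

# Usage: link_fastqs /path/to/raw_data 2025-02-11_chip_cchahrou_project/fastq
```

For ChIP-seq the link name also needs to carry the antibody (as in samplename1_Antibody_R1.fastq.gz above), so the links may still need renaming by hand if the raw file names do not include it.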
Infer sample names from fastq file names¶
If the fastq files are named in a way that seqnado can infer the sample names, then a design file will be generated automatically:
For ChIP-seq:
- samplename1_Antibody_R1.fastq.gz
- samplename1_Antibody_R2.fastq.gz
- samplename1_Input_1.fastq.gz
- samplename1_Input_2.fastq.gz
For ATAC-seq:
- sample-name-1_R1.fastq.gz
- sample-name-1_R2.fastq.gz
- sample-name-1_1.fastq.gz
- sample-name-1_2.fastq.gz
For RNA-seq:
- sample-name-1_R1.fastq.gz
- sample-name-1_R2.fastq.gz
- sample-name-1_1.fastq.gz
- sample-name-1_2.fastq.gz
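Before relying on automatic inference, it can help to list any files that do not follow the endings shown in these examples (`_R1`/`_R2` or `_1`/`_2` before `.fastq.gz`). This is only a sketch: `check_fastq_names` is a hypothetical helper, and the pattern is derived from the examples above, so seqnado's actual inference rules may be stricter or looser.

```bash
# Hypothetical helper (not part of seqnado): report fastq files whose names do
# not end in _R1/_R2 or _1/_2 before .fastq.gz. Returns non-zero if any do not.
check_fastq_names() (
    status=0
    for fq in "$1"/*.fastq.gz; do
        [ -e "$fq" ] || continue          # the glob matched nothing
        name=$(basename "$fq")
        if ! printf '%s\n' "$name" | grep -Eq '_R?[12]\.fastq\.gz$'; then
            echo "Unexpected name: $name" >&2
            status=1
        fi
    done
    exit "$status"
)

# Usage: check_fastq_names fastq
```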
Design file¶
If the fastq files are not named in a way that seqnado can infer the sample names, then a design file can be generated using the seqnado-design command. Change to the working directory and generate the design file:
cd ..
seqnado-design chip fastq/*
# Note that you can use tab completion to complete the path to the fastq files
This will generate a design file called design.csv in the working directory.
Warning
You must specify on the command line the fastq files to use for generating the design, e.g. for files in the current working directory:
```bash
seqnado-design chip *.fastq.gz
```
Merging replicates or samples¶
To merge samples for counting or for bigwig/peak generation, add a merge column to the design file:
sample_name,r1,r2,scale_group,merge
atac,/ceph/project/milne_group/cchahrou/software/SeqNado/2025-02-11_chip_cchahrou_project/atac_1.fastq.gz,/ceph/project/milne_group/cchahrou/software/SeqNado/2025-02-11_chip_cchahrou_project/atac_2.fastq.gz,all,merge_group
atac2,/ceph/project/milne_group/cchahrou/software/SeqNado/2025-02-11_chip_cchahrou_project/atac_1.fastq.gz,/ceph/project/milne_group/cchahrou/software/SeqNado/2025-02-11_chip_cchahrou_project/atac_2.fastq.gz,all,merge_group
This will merge both samples to produce a single merge_group bigwig and peak file.
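If every sample in an existing design file should go into the same group, the column can be appended with awk rather than by hand. A minimal sketch, assuming a plain comma-separated file with a header row; `add_merge_column` is a hypothetical helper, not part of seqnado.

```bash
# Hypothetical helper (not part of seqnado): append a "merge" column to a
# design file, assigning every sample to the same group.
add_merge_column() {
    awk -v group="$2" 'BEGIN { FS = OFS = "," }
        NR == 1 { print $0, "merge"; next }   # header row
        NF > 0  { print $0, group }           # sample rows
    ' "$1"
}

# Usage: add_merge_column design.csv merge_group > design_merged.csv
```

The result is written to a new file so that the original design is left untouched. Designs with more than one merge group still need the column edited per sample.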
ATAC|RNA-seq design file¶
An ATAC-seq or RNA-seq design file should look something like this:
sample,r1,r2
rna,/path/to/fastq/rna_1.fastq.gz,/path/to/fastq/rna_2.fastq.gz
Note
The design file is a CSV file with the following columns:
* sample: The sample name. Altering this will change the name of the output files, so it can be useful for renaming samples.
* r1: The path to the read 1 fastq file.
* r2: The path to the read 2 fastq file.
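Because the design file stores paths, a mistyped path is easy to miss until the pipeline fails. The sketch below reports any fastq path in a design file that does not exist on disk. `check_design_paths` is a hypothetical helper, not part of seqnado; it assumes paths contain no commas, and relative paths are resolved from the directory it is run in.

```bash
# Hypothetical helper (not part of seqnado): report fastq paths listed in a
# design file that do not exist on disk. Returns non-zero if any are missing.
check_design_paths() (
    # Skip the header, put each field on its own line, keep the fastq paths
    tail -n +2 "$1" | tr ',' '\n' | grep '\.fastq\.gz$' | {
        missing=0
        while IFS= read -r path; do
            if [ ! -e "$path" ]; then
                echo "Missing: $path" >&2
                missing=1
            fi
        done
        exit "$missing"
    }
)

# Usage (from the working directory): check_design_paths design.csv
```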
ChIP-seq design file¶
A ChIP-seq design file should look something like this:
sample_name,ip,control,ip_r1,ip_r2,control_r1,control_r2,scale_group
chip-rx,MLL,input,fastq/chip-rx_MLL_1.fastq.gz,fastq/chip-rx_MLL_2.fastq.gz,fastq/chip-rx_input_1.fastq.gz,fastq/chip-rx_input_2.fastq.gz,all
Note
The design file is a CSV file with the following columns:
* sample_name: The sample name. Altering this will change the name of the output files, so it can be useful for renaming samples.
* ip: The name of the IP sample.
* control: The name of the control sample.
* ip_r1: The path to the IP read 1 fastq file.
* ip_r2: The path to the IP read 2 fastq file.
* control_r1: The path to the control read 1 fastq file.
* control_r2: The path to the control read 2 fastq file.
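With this many columns, a dropped or extra comma after manual editing is a common slip. A minimal sketch of a check that every row has the same number of comma-separated fields as the header; `check_design_columns` is a hypothetical helper, not part of seqnado, and it assumes no field contains a quoted comma.

```bash
# Hypothetical helper (not part of seqnado): check that every row of a design
# file has the same number of comma-separated fields as the header row.
check_design_columns() {
    awk -F, '
        NR == 1 { expected = NF; next }   # remember the header width
        NF == 0 { next }                  # ignore blank lines
        NF != expected {
            printf "Line %d has %d fields, expected %d\n", NR, NF, expected
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: check_design_columns design.csv
```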
RNA-seq design file¶
An RNA-seq design file should look something like this:
sample,r1,r2
rna,/path/to/fastq/rna_1.fastq.gz,/path/to/fastq/rna_2.fastq.gz
If you want to run DESeq2, then you will need to add an additional column to the design file to indicate which samples are in the control group:
sample,r1,r2,deseq2
rna1,/path/to/fastq/rna1_1.fastq.gz,/path/to/fastq/rna1_2.fastq.gz,control
rna2,/path/to/fastq/rna2_1.fastq.gz,/path/to/fastq/rna2_2.fastq.gz,control
rna3,/path/to/fastq/rna3_1.fastq.gz,/path/to/fastq/rna3_2.fastq.gz,control
rna4,/path/to/fastq/rna4_1.fastq.gz,/path/to/fastq/rna4_2.fastq.gz,treated
rna5,/path/to/fastq/rna5_1.fastq.gz,/path/to/fastq/rna5_2.fastq.gz,treated
rna6,/path/to/fastq/rna6_1.fastq.gz,/path/to/fastq/rna6_2.fastq.gz,treated
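A quick tally of the deseq2 column helps confirm that each group contains the samples you expect and that no label is misspelt. The sketch below finds the column by its header name and counts samples per value; `count_deseq2_groups` is a hypothetical helper, not part of seqnado.

```bash
# Hypothetical helper (not part of seqnado): count samples per value of the
# deseq2 column, locating the column by its header name.
count_deseq2_groups() {
    awk -F, '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "deseq2") col = i
            if (!col) { print "No deseq2 column found" > "/dev/stderr"; exit }
            next
        }
        NF > 0 { n[$col]++ }                   # tally each group label
        END { for (g in n) print g, n[g] }
    ' "$1" | sort                              # sort for a stable order
}

# Usage: count_deseq2_groups design.csv
```

For the six-sample design above this would list `control 3` and `treated 3`; a misspelt label would show up as an extra group.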
Running the pipeline¶
Ensure files are in the correct location¶
Before running the pipeline, ensure that the fastq files and the design file are in the correct location:
# Fastq files
ln -s /path/to/fastq_files/ /path/to/working-directory/made-by-seqnado-config/
# Design
mv /path/to/design.csv /path/to/working-directory/made-by-seqnado-config/
Check you are in the correct directory¶
$ ls -l
-rw-r--r-- 1 asmith asmithgrp 1845 Jan 13 10:50 config_rna.yml
-rw-r--r-- 1 asmith asmithgrp 14784 Jan 13 10:50 deseq2_test.qmd
-rw-r--r-- 1 asmith asmithgrp 155 Jan 13 14:40 design.csv
-rw-r--r-- 1 asmith asmithgrp 3813176 Jan 13 10:50 rna_1.fastq.gz
-rw-r--r-- 1 asmith asmithgrp 3836966 Jan 13 10:50 rna_2.fastq.gz
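The same check can be scripted so that it fails loudly when run from the wrong directory. This is a sketch only: `preflight_check` is a hypothetical helper, not part of seqnado. It treats a missing configuration file as an error and a missing design.csv as a warning, since the design may be generated from the fastq file names.

```bash
# Hypothetical helper (not part of seqnado): confirm that a directory holds a
# config_<assay>.yml file before the pipeline is started.
preflight_check() (
    cd "${1:-.}" || exit 1
    set -- config_*.yml                   # $1 stays a literal pattern if no match
    if [ ! -e "$1" ]; then
        echo "No config_*.yml found in $(pwd)" >&2
        exit 1
    fi
    if [ ! -e design.csv ]; then
        echo "Warning: no design.csv found in $(pwd)" >&2
    fi
    echo "Found $1"
)

# Usage: preflight_check            (checks the current directory)
#        preflight_check /path/to/working-directory
```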
Ensure that the pipeline will not stop when you log out¶
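screen -S NAME_OF_SESSION
# to detach from the screen session, press ctrl+a then d
# to reattach later: screen -r NAME_OF_SESSION
# or
tmux new -s NAME_OF_SESSION
# to detach from the tmux session, press ctrl+b then d
# to reattach later: tmux attach -t NAME_OF_SESSION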
Check you have activated the conda environment¶
conda activate seqnado
Run the pipeline¶
The pipeline can be run using the following command:
seqnado [atac|chip|rna|snp] -c <number of cores> --preset [ss|ls]
# additional options
--queue/-q [short|long] --scale-resource/-s <factor to multiply resources>
An actual example would be:
seqnado rna -c 8 --preset ss -q short
Note
To visualise which tasks the pipeline will perform before running it:
seqnado atac -c 1 --preset ss --dag | dot -Tpng > dag.png
Pipeline Errors¶
Check the log file for errors:
# Look at the whole log file
cat seqnado_error.log
# Look for exceptions in the log file, with 10 lines of context either side
grep -A 10 -B 10 exception seqnado_error.log
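The search above is case-sensitive, so it would miss lines such as `Exception` or `Error`. A minimal sketch of a broader search that ignores case, matches both words and prints line numbers; `show_errors` is a hypothetical helper, not part of seqnado, and the log file name is taken from the commands above.

```bash
# Hypothetical helper (not part of seqnado): print lines mentioning "error" or
# "exception" (any case) with 10 lines of context either side.
# grep returns non-zero when nothing matches.
show_errors() {
    grep -i -E -n -C 10 'error|exception' "${1:-seqnado_error.log}"
}

# Usage: show_errors                 (reads seqnado_error.log)
#        show_errors other.log | less
```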