Nanopore sequencing analysis
Author: Magali Hennion.
Last update: July 2023.
Collaboration between Laure Ferry (EpiG) and Magali Hennion (BiBs), with the help of Olivier Kirsh and Elouan Bethuel, Epigenetics and Cell Fate lab.
Run
We followed this protocol.
Basecalling
After the run, open the Windows Command Prompt (“invite de commandes”) and start the basecalling with the “dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg” model.
> "C:\Program Files\OxfordNanopore\MinKNOW\guppy\bin\guppy_basecaller.exe" --input_path E:\data\Nanopore\RUNS\2023XXX_runID\no_sample\PATH2\pod5 --recursive --save_path E:\data\Nanopore\RUNS\2023XXX_runID\basecalled --config dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg --align_ref C:\data\Nanopore\References\mm39.fa --compress_fastq --bam_out --device cuda:all --chunks_per_runner 256
Replace 2023XXX_runID with the name of your run folder.
Options:
--input_path -> path to the pod5 files
--recursive -> use all pod5 files in all the folders contained in input_path
--save_path -> path to the folder for the results
--config -> configuration file
--align_ref -> path to the reference FASTA file
--compress_fastq -> compress FASTQ files to save space
--bam_out -> do the mapping and output BAM files (with the methylation)
--device cuda:all -> use the GPU
--chunks_per_runner 256 -> configuration for GPU usage (optimized for Angus)
Other configuration (.cfg) files can be used. Check the folder C:\Program Files\OxfordNanopore\MinKNOW\guppy\data\ to see what is available.
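To avoid retyping the long command for each run, the call above can be assembled from the run folder name. The sketch below is only an illustration: the helper name build_guppy_command is hypothetical, the PATH2 placeholder is kept exactly as in the command above, and only the options already listed are used.

```python
import subprocess
from pathlib import PureWindowsPath

# Paths copied from the command above; adapt them to your installation.
GUPPY = PureWindowsPath(r"C:\Program Files\OxfordNanopore\MinKNOW\guppy\bin\guppy_basecaller.exe")
RUNS = PureWindowsPath(r"E:\data\Nanopore\RUNS")


def build_guppy_command(run_id,
                        config="dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg",
                        reference=r"C:\data\Nanopore\References\mm39.fa"):
    """Return the guppy_basecaller call as an argument list (hypothetical helper)."""
    run_dir = RUNS / run_id
    return [
        str(GUPPY),
        # PATH2 is a placeholder taken from the example command; adapt it to your run.
        "--input_path", str(run_dir / "no_sample" / "PATH2" / "pod5"),
        "--recursive",
        "--save_path", str(run_dir / "basecalled"),
        "--config", config,
        "--align_ref", reference,
        "--compress_fastq",
        "--bam_out",
        "--device", "cuda:all",
        "--chunks_per_runner", "256",
    ]


if __name__ == "__main__":
    # list2cmdline quotes the arguments that contain spaces (here the Guppy path).
    print(subprocess.list2cmdline(build_guppy_command("2023XXX_runID")))
```

The printed line can be pasted into the Command Prompt. Passing another file name as config switches the basecalling model.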
Look at ONT resources for the analysis:
https://nanoporetech.com/support/nanopore-sequencing-data-analysis.
Basic QC on IFB cluster
See the introduction to the IFB cluster. Connect to the Jupyter Hub. The analysis needs 20 GB of RAM, so increase the memory when starting your Jupyter session.
For now, we work in the edc_nanopore project.
Add missing libraries (only once)
Open a new notebook in Python 3.9.
Type the following commands:
Run the analysis
The file necessary for the basic QC is sequencing_summary.txt, obtained AFTER basecalling. Upload this file to the cluster.
Open Template_basicQC_v1.1.ipynb and save it as RUNID_basicQC_v1.1.ipynb. Then run the cells, adapting the path to your sequencing_summary.txt and choosing the name of your HTML report.
Template_basicQC_v1.1.ipynb can be downloaded here.
For adaptive sampling, use Template_QC_adaptive_v1.0.ipynb.
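Before opening the notebook, a quick sanity check of sequencing_summary.txt can be done with a few lines of Python. This is a minimal sketch and not the content of the template notebooks. It assumes the file is tab-separated and contains the columns sequence_length_template and passes_filtering, as in the Guppy summary files we have seen; check the header of your own file. The function names are hypothetical.

```python
import csv


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads


def summarize(handle):
    """Basic statistics from an open, tab-separated sequencing_summary.txt."""
    lengths = []
    passed = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        # Column names are assumptions, see the paragraph above.
        lengths.append(int(row["sequence_length_template"]))
        if row["passes_filtering"].upper() == "TRUE":
            passed += 1
    return {
        "reads": len(lengths),
        "passed_reads": passed,
        "bases": sum(lengths),
        "n50": n50(lengths),
    }
```

In a notebook cell, calling summarize(open("sequencing_summary.txt", newline="")) returns the read count, the number of pass reads, the total number of bases and the read N50.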
Methylation analysis
Copy the processed data (BAM files only) to the cluster
Open an Ubuntu shell and type:
cd /mnt/e/Data/Nanopore/RUNS
Copy the BAM files to the cluster (here, only the pass files):
rsync -av /mnt/e/Data/Nanopore/RUNS/2023XXX_runID/basecalled/pass/*bam ferry@ipop-up.rpbs.univ-paris-diderot.fr:/shared/projects/nano4edc/BAM_files/2023XXX_runID
Run Methylator
As the workflow is under development, get the latest version before running… [Note: for now, it is in the WGBSflow folder].
Modify the configuration files (config_nanopore.yaml and metadata.tsv in the configs folder). You can rename metadata.tsv (here to metadata_rrms.tsv), provided that you adjust the path in config_nanopore.yaml.
Here is an example of config_nanopore.yaml:
# ============================================================================= #
# ========= Methylator Workflow configuration file (Nanopore data) ============ #
# ============================================================================= #
# Please check the parameters and adjust them to your needs
# Project name
PROJECT: RRMS
## paths for intermediate and final results
BIGDATAPATH: Big_Data # for big files
RESULTPATH: Results
## genome files
GENOMEPATH: /shared/banks/mus_musculus/mm39/fasta/mm39.fa # path to the reference genome's folder
## genome
ASSEMBLY: mm39 # name of the assembly used for the analysis (previously mm10; now use mm39, the new version)
## maximum number of cores you want to allocate to one job of the workflow (mapping and feature counting)
NCORE: 32
## maximum number of jobs running in parallel
NJOBS: 100
# ===================== Configuration for Nanopore data ==================== #
# ========================================================================== #
DATA: "NANOPORE" # do not touch!
METAFILE: configs/metadata_rrms.tsv
NANO_BAM_PATH: /shared/projects/nano4edc/BAM_files/
COMPARISON: [["WT-E14-rrms-1","WT-E14-rrms-2"]] # [["WT","TKO"], ["WT","WT2"], ["WT2","TKO"]]
# ===================== Configuration for processing BAM files ================== #
# =========================================================================== #
MERGE: yes # if yes, all BAM files of the same condition are merged; useful to increase the coverage
BAMTOBED: yes # if yes, convert BAM to BED files (necessary for the statistical analysis); use no if your files are already in BED format
# =================== Configuration for Statistical Analysis ================ #
# =========================================================================== #
DMT: yes # if yes, perform a differential methylation by tiles (DMT) analysis; if no, perform a differential methylation by cytosines (DMC) analysis
EDMR: no # if yes, perform a differential methylation analysis by region (Empirical Differentially Methylated Regions, EDMR); only possible after a DMC analysis
TILESIZE: 250
STEPSIZE: 1 # Tiles relative step size
# ===== Exploratory analysis ===== #
## params
MINCOV: 10 # int, minimum coverage for the analysis
NB_CPG_TILES: 1
COV.PERC: 99.9 # for the coverage filter, choose the percentile used to remove the top ..%
UNITE: all # 'all' or 'one' (at least one per group)
QVALUE: 0.05 # QValue
# ===== Differential analysis ===== #
## params
SIGNIDIF: 10 # SigDiffMeth in %
DIFFCPG: 25
QVALCPG: 0.05
# ======================= Configuration for annotations ===================== #
# =========================================================================== #
# ===== Standard annotations ===== #
## GTF
GTFPATH: /shared/banks/mus_musculus/mm39/gtf/gencode.vM27.annotation.gtf
## CPG Bed
BEDPATH: /shared/projects/nano4edc/Methylator/cpgIslandExt.mm39.bed
# ===== Custom annotations ===== #
CUSTOM_ANNOT: no
METAFILE_ANNOT: configs/metadata_annot.tsv
CUSTOM_ANNOT_PATH: "/shared/projects/wgbs_flow/Elouan/Custom_tracks/"
Here is an example of metadata_rrms.tsv:
sample group
20230608_E14_mouse_ES_WT_RRMS WT-E14-rrms-1
20230626_E14_mouse_ES_WT_RRMS_150ng_8kb_pass WT-E14-rrms-2
Each value in the sample column must be the name of the folder containing the BAM files.
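Because a mismatch between the metadata and the folder names is easy to miss, it can be worth checking before submitting the job. The sketch below is not part of Methylator; the function names are hypothetical. It assumes that the metadata file is tab-separated with the header columns sample and group, that each sample folder sits directly under NANO_BAM_PATH, and that the BAM files have the .bam extension. It also checks that every group named in COMPARISON exists in the metadata.

```python
import csv
from pathlib import Path


def read_metadata(metafile):
    """Return {sample: group} from a tab-separated file with 'sample' and 'group' columns."""
    with open(metafile, newline="") as handle:
        return {row["sample"]: row["group"]
                for row in csv.DictReader(handle, delimiter="\t")}


def check_metadata(metafile, bam_root, comparisons):
    """Return a list of problems; an empty list means all checks passed."""
    samples = read_metadata(metafile)
    problems = []
    for sample in samples:
        folder = Path(bam_root) / sample
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
        elif not any(folder.glob("*.bam")):
            problems.append(f"no BAM file in: {folder}")
    groups = set(samples.values())
    for pair in comparisons:
        for group in pair:
            if group not in groups:
                problems.append(f"group not in metadata: {group}")
    return problems
```

With the example files above, the call would be check_metadata("configs/metadata_rrms.tsv", "/shared/projects/nano4edc/BAM_files/", [["WT-E14-rrms-1", "WT-E14-rrms-2"]]).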
When the configuration is ready, start the workflow with the command sbatch WGBSworkflow.sh nanopore.
For instance:
[hennion @ ipop-up 14:29]$ WGBSflow : sbatch WGBSworkflow.sh nanopore
Submitted batch job 1159358
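If the submission is scripted, for example from a notebook, the job ID can be read from the line printed by sbatch. The following is a minimal sketch with hypothetical function names. It assumes that sbatch is available on the machine, that the code is run from the workflow folder, and that the output has the form shown above.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from a line such as 'Submitted batch job 1159358'."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def submit_workflow(script="WGBSworkflow.sh", mode="nanopore"):
    """Submit the workflow with sbatch and return the job ID."""
    result = subprocess.run(["sbatch", script, mode],
                            capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```

The returned ID can then be used to follow the job, for instance with squeue -j followed by the ID.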
Cross your fingers.
See the documentation.