Nanopore sequencing analysis

Author: Magali Hennion.
Last update: July 2023.
Collaboration between Laure Ferry (EpiG) and Magali Hennion (BiBs), with the help of Olivier Kirsh and Elouan Bethuel, Epigenetics and Cell Fate lab.

Table of contents

Run
Basecalling
- Options:
Basic QC on IFB cluster
- Add missing libraries (only once)
- Run the analysis
Methylation analysis
- Copy the processed data (bam files only) to the cluster
- Run Methylator

Run

We followed this protocol.

Basecalling

After the run, open the Windows terminal (“invite de commandes”, i.e. the Command Prompt) and start the basecalling using the “dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg” model.

> "C:\Program Files\OxfordNanopore\MinKNOW\guppy\bin\guppy_basecaller.exe" --input_path E:\data\Nanopore\RUNS\2023XXX_runID\no_sample\PATH2\pod5 --recursive --save_path  E:\data\Nanopore\RUNS\2023XXX_runID\basecalled --config dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg --align_ref C:\data\Nanopore\References\mm39.fa --compress_fastq --bam_out --device cuda:all --chunks_per_runner 256

Replace 2023XXX_runID with the name of your run folder.

Options:

--input_path -> path to the pod5 files
--recursive -> use all pod5 files in all the subfolders of input_path
--save_path -> path to the folder for the results
--config -> configuration file (basecalling model)
--align_ref -> path to the reference FASTA file
--compress_fastq -> compress the FASTQ files to save space
--bam_out -> do the mapping and output BAM files (with the methylation)
--device cuda:all -> use the GPU(s)
--chunks_per_runner 256 -> configuration for GPU usage (optimized for Angus)
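
To avoid retyping the long command for each run, the call can be assembled from the run folder name. The following is a minimal sketch, not part of the original procedure: the helper name build_guppy_command and its arguments are hypothetical, the paths and options are the ones shown above, and the PATH2 placeholder is kept as-is.

```python
import subprocess
from pathlib import PureWindowsPath

# Locations taken from the command above; adapt them to your machine.
GUPPY_EXE = PureWindowsPath(
    r"C:\Program Files\OxfordNanopore\MinKNOW\guppy\bin\guppy_basecaller.exe"
)
RUNS_DIR = PureWindowsPath(r"E:\data\Nanopore\RUNS")


def build_guppy_command(
    run_id,
    pod5_subdir,
    config="dna_r10.4_e8.1_modbases_5mc_cg_sup.cfg",
    reference=r"C:\data\Nanopore\References\mm39.fa",
):
    """Return the guppy_basecaller call as an argument list (hypothetical helper)."""
    run_dir = RUNS_DIR / run_id
    return [
        str(GUPPY_EXE),
        "--input_path", str(run_dir / pod5_subdir),
        "--recursive",
        "--save_path", str(run_dir / "basecalled"),
        "--config", config,
        "--align_ref", reference,
        "--compress_fastq",
        "--bam_out",
        "--device", "cuda:all",
        "--chunks_per_runner", "256",
    ]


# Build the command for a placeholder run and print it for inspection.
cmd = build_guppy_command("2023XXX_runID", r"no_sample\PATH2\pod5")
print(subprocess.list2cmdline(cmd))
```

The printed line can be pasted into the Windows terminal. On the sequencing computer, subprocess.run(cmd, check=True) would launch the basecalling directly.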

Other configuration (.cfg) files can be used. Check the folder C:\Program Files\OxfordNanopore\MinKNOW\guppy\data\ to see what is available.

Look at ONT resources for the analysis: https://nanoporetech.com/support/nanopore-sequencing-data-analysis.

Basic QC on IFB cluster

See the introduction to the IFB cluster. Connect to the Jupyter Hub. The analysis needs 20 GB of RAM, so increase the memory when starting your Jupyter session. For now we work in the edc_nanopore project.

Add missing libraries (only once)

Open a new notebook in Python 3.9. Type the following commands:

!pip install aplanat

!pip install epi2melabs

Run the analysis

The file necessary for the basic QC is sequencing_summary.txt obtained AFTER basecalling. Upload this file to the cluster.

Open Template_basicQC_v1.1.ipynb and save as RUNID_basicQC_v1.1.ipynb. Then run the cells adapting the path to your sequencing_summary.txt and choosing the name of your HTML report.
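
Before running the full notebook, a quick sanity check of sequencing_summary.txt can be useful. The sketch below is independent of the template notebook and only illustrates the kind of statistics a basic QC reports. It assumes a tab-separated file with the columns passes_filtering and sequence_length_template, as usually written by the basecaller; check the header of your own file. The function names are hypothetical.

```python
import csv


def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads


def summarize(path):
    """Count the reads and compute yield and N50 for the reads that pass filtering."""
    n_reads = 0
    pass_lengths = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            n_reads += 1
            # Assumed column names, see the note above.
            if row["passes_filtering"].strip().upper() == "TRUE":
                pass_lengths.append(int(row["sequence_length_template"]))
    return {
        "reads": n_reads,
        "pass_reads": len(pass_lengths),
        "pass_bases": sum(pass_lengths),
        "pass_n50": n50(pass_lengths),
    }
```

In a notebook cell, summarize("sequencing_summary.txt") returns a small dictionary that can be compared with the figures in the HTML report.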

Template_basicQC_v1.1.ipynb can be downloaded here.

For adaptive sampling, use Template_QC_adaptive_v1.0.ipynb.

Methylation analysis

Copy the processed data (bam files only) to the cluster

Open Ubuntu shell and type:

cd /mnt/e/Data/Nanopore/RUNS

Copy the BAM files to the cluster (here, only the pass files):

rsync -av /mnt/e/Data/Nanopore/RUNS/2023XXX_runID/basecalled/pass/*bam ferry@ipop-up.rpbs.univ-paris-diderot.fr:/shared/projects/nano4edc/BAM_files/2023XXX_runID
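
The source and destination both depend on the run folder name, so the rsync line can be generated rather than edited by hand. This is only a sketch: build_rsync_command is a hypothetical helper, the paths and host are those of the command above, and the user name is whatever account you have on the cluster.

```python
from pathlib import PurePosixPath

# Locations taken from the rsync command above.
LOCAL_RUNS = PurePosixPath("/mnt/e/Data/Nanopore/RUNS")
REMOTE_BAM_DIR = PurePosixPath("/shared/projects/nano4edc/BAM_files")
REMOTE_HOST = "ipop-up.rpbs.univ-paris-diderot.fr"


def build_rsync_command(run_id, user):
    """Return the rsync command line copying the pass BAM files of one run (hypothetical helper)."""
    # The *bam pattern is left unexpanded on purpose: the shell expands it when the line is run.
    source = LOCAL_RUNS / run_id / "basecalled" / "pass" / "*bam"
    target = f"{user}@{REMOTE_HOST}:{REMOTE_BAM_DIR / run_id}"
    return f"rsync -av {source} {target}"


print(build_rsync_command("2023XXX_runID", "ferry"))
```

The printed line is meant to be pasted into the Ubuntu shell.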

Run Methylator

As the workflow is under development, get the latest version before running it (note: for now, it is in the WGBSflow folder).

Modify the configuration files (config_nanopore.yaml and metadata.tsv in the configs folder). You can rename metadata.tsv (here to metadata_rrms.tsv); in that case, adjust the METAFILE path in config_nanopore.yaml accordingly.

Here is an example of config_nanopore.yaml:

# ============================================================================= #
# ========= Methylator Workflow configuration file (Nanopore data) ============ #
# ============================================================================= #

# Please check the parameters, and adjust them according to your circumstance

# Project name
PROJECT: RRMS

## paths for intermediate and final results
BIGDATAPATH: Big_Data # for big files
RESULTPATH: Results

## genome files
GENOMEPATH: /shared/banks/mus_musculus/mm39/fasta/mm39.fa  # path to the reference genome's folder 

## genome
ASSEMBLY: mm39 # mm10 name of the assembly used for the analysis (now use mm39, new version)

## maximum number of cores you want to allocate to one job of the workflow (mapping and feature counting)
NCORE: 32 

## maximum number of jobs running in parallel
NJOBS: 100



# ===================== Configuration for Nanopore data ==================== #
# ========================================================================== #
DATA: "NANOPORE" # do not touch!
METAFILE: configs/metadata_rrms.tsv
NANO_BAM_PATH: /shared/projects/nano4edc/BAM_files/
COMPARISON: [["WT-E14-rrms-1","WT-E14-rrms-2"]]    # [["WT","TKO"], ["WT","WT2"], ["WT2","TKO"]]



# # ===================== Configuration for process BAM files  ================== #
# =========================================================================== #

MERGE: yes # if yes, all BAM files of the same condition are merged, useful to increase the coverage
BAMTOBED: yes # if yes, convert BAM to BED files, necessary for the statistical analysis; use no if your files are already in BED format


# =================== Configuration for Statistical Analysis ================ #
# =========================================================================== #


DMT: yes # if yes, perform a differential methylation by tiles (DMT) analysis; if no, perform a differential methylation by cytosines (DMC) analysis
EDMR: no # if yes, perform a differential methylation analysis by region (Empirical Differentially Methylated Regions, EDMR); only possible with a previous DMC analysis
TILESIZE: 250 
STEPSIZE: 1  # Tiles relative step size

# ===== Exploratory analysis ===== #

## params 
MINCOV: 10  # int, minimum coverage for the analysis
NB_CPG_TILES: 1 
COV.PERC: 99.9 # coverage filter: percentile used to remove the top ..% of the coverage
UNITE: all # 'all' or 'one' (at least one per group)
QVALUE: 0.05  # QValue

# ===== Differential analysis ===== #

## params
SIGNIDIF: 10  # SigDiffMeth in %
DIFFCPG: 25
QVALCPG: 0.05


# ======================= Configuration for annotations ===================== # 
# =========================================================================== #

# ===== Standard annotations  ===== #
## GTF 
GTFPATH: /shared/banks/mus_musculus/mm39/gtf/gencode.vM27.annotation.gtf

## CPG Bed 
BEDPATH: /shared/projects/nano4edc/Methylator/cpgIslandExt.mm39.bed

# ===== Customs Annotations ===== #
CUSTOM_ANNOT: no 
METAFILE_ANNOT: configs/metadata_annot.tsv
CUSTOM_ANNOT_PATH: "/shared/projects/wgbs_flow/Elouan/Custom_tracks/"

Here is an example of metadata_rrms.tsv.

sample	group
20230608_E14_mouse_ES_WT_RRMS	WT-E14-rrms-1
20230626_E14_mouse_ES_WT_RRMS_150ng_8kb_pass	WT-E14-rrms-2

The sample column must contain the name of the folder containing the BAM files.
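
A mismatch between the metadata and the folders on the cluster is easy to miss, so it can be checked before submitting the job. The sketch below is not part of Methylator: check_metadata is a hypothetical helper that assumes the two-column layout (sample, group) shown above and BAM files stored directly in NANO_BAM_PATH/sample. It also checks that every group named in COMPARISON exists in the metadata; the comparisons are passed as a Python list here rather than read from the YAML file.

```python
import csv
from pathlib import Path


def check_metadata(metadata_path, bam_root, comparisons):
    """Return (samples without BAM files, comparison groups absent from the metadata)."""
    with open(metadata_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))

    # A sample is reported when its folder is absent or holds no .bam file.
    missing = []
    for row in rows:
        folder = Path(bam_root) / row["sample"]
        if not folder.is_dir() or not any(folder.glob("*.bam")):
            missing.append(row["sample"])

    # Every group used in a comparison must appear in the group column.
    groups = {row["group"] for row in rows}
    requested = {group for pair in comparisons for group in pair}
    unknown = sorted(requested - groups)
    return missing, unknown
```

With the example files above, the call would be check_metadata("configs/metadata_rrms.tsv", "/shared/projects/nano4edc/BAM_files/", [["WT-E14-rrms-1", "WT-E14-rrms-2"]]); two empty lists mean the configuration is consistent.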

When the configuration is fine, start the workflow using the command sbatch WGBSworkflow.sh nanopore.

For instance:

[hennion @ ipop-up 14:29]$ WGBSflow : sbatch WGBSworkflow.sh nanopore
Submitted batch job 1159358
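
The job number printed by sbatch is what you need to follow the job afterwards. As a small illustration (parse_job_id is a hypothetical helper, not part of the workflow), it can be extracted from the sbatch message like this:

```python
import re


def parse_job_id(sbatch_output):
    """Return the job number from an sbatch message, or None if there is none."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


print(parse_job_id("Submitted batch job 1159358"))
```

The number can then be passed to the scheduler, for instance squeue -j 1159358, to see whether the job is pending or running.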

Cross your fingers.

See the documentation.

BiBs 2024 parisepigenetics
https://github.com/parisepigenetics/bibs
