
denoiser.py – Remove noise from 454 sequencing data

Description:

The denoiser removes sequencing noise characteristic of pyrosequencing by clustering flowgrams. For a detailed explanation of the underlying algorithm, see Reeder and Knight, Nature Methods 7(9), 2010.

Usage: denoiser.py [options]

Input Arguments:

[REQUIRED]

-i, --input_files
Path to flowgram files (.sff.txt), comma separated

[OPTIONAL]

-f, --fasta_fp
Path to fasta input file. Reads not in the fasta file are filtered out before denoising. File format is as produced by split_libraries.py [default: None]
-o, --output_dir
Path to output directory [default: random dir in ./]
-c, --cluster
Use cluster/multiple CPUs for flowgram alignments [default: False]
-p, --preprocess_fp
Skip preprocessing (phase I); instead, use already preprocessed data in PREPROCESS_FP
--checkpoint_fp
Resume denoising from checkpoint. Be careful when changing parameters for a resumed run. Requires -p option. [default: None]
-s, --squeeze
Use run-length encoding for prefix filtering in phase I [default: False]
-S, --split
Split input into per library sets and denoise separately [default: False]
--force
Force overwrite of existing directory [default: False]
--primer
Primer sequence [default: CATGCTGCCTCCCGTAGGAGT]
-n, --num_cpus
Number of CPUs; requires -c [default: 1]
-m, --max_num_iterations
Maximum number of iterations in phase II. None means unlimited iterations [default: None]
-b, --bail_out
Stop clustering in phase II when clusters are smaller than or equal to BAIL_OUT [default: 1]
--percent_id
Sequence similarity clustering threshold [default: 0.97]
--low_cut-off
Low clustering threshold for phase II [default: 3.75]
--high_cut-off
High clustering threshold for phase III [default: 4.5]
--low_memory
Use slower, low memory method [default: False]
-e, --error_profile
Path to error profile [default: /Users/caporaso/code/qiime/qiime/support_files/denoiser/Data/FLX_error_profile.dat]
--titanium
Shortcut for -e /Users/caporaso/code/qiime/qiime/support_files/denoiser/Data//Titanium_error_profile.dat --low_cut-off=4 --high_cut-off=5. Warning: overwrites all previous cut-off values [default: False]
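
The -s/--squeeze option mentions run-length encoding. As a general illustration only (this is not the denoiser's own code, and the function names are made up for this sketch), run-length encoding replaces each run of identical bases with the base and the length of the run. Collapsing the runs entirely makes two reads that differ only in homopolymer length look identical:

```python
from itertools import groupby


def run_length_encode(seq):
    """Turn each run of identical bases into a (base, run_length) pair."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def squeeze(seq):
    """Keep a single base per run and discard the run lengths."""
    return "".join(base for base, _ in groupby(seq))


print(run_length_encode("AAACCGTTTT"))  # [('A', 3), ('C', 2), ('G', 1), ('T', 4)]
print(squeeze("AAACCGTTTT"))            # ACGT
print(squeeze("AACCCGTT"))              # ACGT as well: only the run lengths differ
```

How exactly the denoiser applies this during prefix filtering is not described on this page.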

Output:

centroids.fasta: The representative sequence (centroid) of each cluster

singletons.fasta: contains all unclustered reads

denoiser_mapping.txt: This file contains the actual clusters. The cluster centroid is given first,
the cluster members follow after the ‘:’.
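
The mapping format described above can be read with a few lines of Python. This is a minimal sketch, not part of the denoiser: the function name is hypothetical, and it assumes one cluster per line, whitespace-separated member IDs, and read IDs that do not themselves contain ':'.

```python
def parse_denoiser_mapping(lines):
    """Map each centroid ID to the list of member IDs that follow the ':'.

    Assumes one cluster per line and whitespace-separated member IDs.
    """
    clusters = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        centroid, _, members = line.partition(":")
        clusters[centroid.strip()] = members.split()
    return clusters


# Made-up read IDs, for illustration only.
example = ["read_1:\tread_7\tread_9", "read_2:"]
for centroid, members in parse_denoiser_mapping(example).items():
    # Cluster size counts the centroid itself plus its members.
    print(centroid, 1 + len(members))
```

With a real run, the function can be given an open file handle, for example `parse_denoiser_mapping(open("Outdir/denoiser_mapping.txt"))`.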

checkpoints/ : directory with checkpoints

Note that the centroid and singleton files are disjoint. For most downstream analyses, the two files should be concatenated (for example with cat).
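
Where a shell is not at hand, the same concatenation can be scripted. The sketch below assumes the output file names listed above; the function name and the combined file name denoised_seqs.fasta are arbitrary choices for this example, not something the denoiser produces.

```python
from pathlib import Path


def combine_denoiser_output(output_dir, combined_name="denoised_seqs.fasta"):
    """Write centroids.fasta followed by singletons.fasta into one file."""
    output_dir = Path(output_dir)
    combined = output_dir / combined_name
    with combined.open("w") as out:
        for name in ("centroids.fasta", "singletons.fasta"):
            text = (output_dir / name).read_text()
            out.write(text)
            # Guard against a missing final newline, which would otherwise
            # glue the last sequence line to the next FASTA header.
            if text and not text.endswith("\n"):
                out.write("\n")
    return combined
```

Calling `combine_denoiser_output("Outdir")` after a finished run would then write Outdir/denoised_seqs.fasta.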

Example:

Run denoiser on flowgrams in 454Reads.sff.txt with read-to-barcode mapping in seqs.fna, put results into Outdir, log progress in Outdir/denoiser.log

denoiser.py -i 454Reads.sff.txt -f seqs.fna -v -o Outdir

Multiple sff.txt files:

Run denoiser on the two flowgram files 454Reads_1.sff.txt and 454Reads_2.sff.txt with read-to-barcode mapping in seqs.fna, put results into Outdir, log progress in Outdir/denoiser.log

denoiser.py -i 454Reads_1.sff.txt,454Reads_2.sff.txt -f seqs.fna -v -o Outdir

Denoise multiple libraries separately:

Run denoiser on flowgrams in 454Reads.sff.txt with read-to-barcode mapping in seqs.fna, split input files into libraries and process each library separately, put results into Outdir, log progress in Outdir/denoiser.log

denoiser.py -S -i 454Reads.sff.txt -f seqs.fna -v -o Outdir

Resuming a failed run:

Resume a previous denoiser run from the checkpoint stored in Outdir_from_failed_run/checkpoints/checkpoint100.pickle. The checkpoint option requires the -p or --preprocess_fp option, which can usually be set to the output directory of the failed run. All other arguments must be identical to those of the failed run.

denoiser.py -i 454Reads.sff.txt -f seqs.fna -v -o Outdir_resumed -p Outdir_from_failed_run --checkpoint_fp Outdir_from_failed_run/checkpoints/checkpoint100.pickle
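
To find a checkpoint to resume from, a small helper can list the checkpoints directory of the failed run. This is a sketch under two assumptions that this page does not confirm: that checkpoint files are always named checkpoint<N>.pickle, as in the example above, and that a higher N means a later point in the run. The function name is hypothetical.

```python
import re
from pathlib import Path


def latest_checkpoint(run_dir):
    """Return the highest-numbered checkpoint<N>.pickle in run_dir/checkpoints.

    Returns None if the directory is missing or holds no such file.
    """
    checkpoint_dir = Path(run_dir) / "checkpoints"
    if not checkpoint_dir.is_dir():
        return None
    pattern = re.compile(r"checkpoint(\d+)\.pickle$")
    numbered = []
    for path in checkpoint_dir.glob("checkpoint*.pickle"):
        match = pattern.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    if not numbered:
        return None
    # Compare numerically so that checkpoint100 sorts after checkpoint50.
    return max(numbered, key=lambda item: item[0])[1]
```

The returned path can then be passed to the checkpoint option together with -p, as in the command above.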
