sampledoc

denoiser_preprocess.py – Run phase of denoiser algorithm: prefix clustering

Description:

The script denoiser_preprocess.py runs the first clustering phase of the denoiser, which groups reads that share a common prefix.
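
To make the idea of prefix clustering concrete, here is a minimal sketch in Python. It is a simplified illustration, not the code used by denoiser_preprocess.py. It assumes that a read joins a cluster when its sequence is an exact prefix of the cluster centroid, and that the longest read in a group serves as the centroid. The function name prefix_cluster and the dict-of-sequences input are hypothetical choices made for this example.

```python
def prefix_cluster(reads):
    """Simplified prefix clustering (illustrative sketch only).

    reads: dict mapping read id -> nucleotide sequence.
    Returns a dict mapping centroid id -> list of member ids
    (the centroid itself is not repeated in its member list).
    """
    # Visit the longest reads first so that every centroid is at least
    # as long as any read assigned to it.
    order = sorted(reads, key=lambda rid: len(reads[rid]), reverse=True)
    clusters = {}
    for rid in order:
        seq = reads[rid]
        # Attach the read to the first existing centroid it is a prefix of.
        for centroid in clusters:
            if reads[centroid].startswith(seq):
                clusters[centroid].append(rid)
                break
        else:
            # No centroid extends this read, so it starts a new cluster.
            clusters[rid] = []
    return clusters


reads = {"r1": "ACGTAC", "r2": "ACGT", "r3": "TTGA", "r4": "ACG"}
print(prefix_cluster(reads))  # {'r1': ['r2', 'r4'], 'r3': []}
```

The nested loop is quadratic and a read that is a prefix of several centroids simply goes to the first one found; a real implementation on large 454 runs would likely use sorting or a trie to avoid the pairwise scan.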

Usage: denoiser_preprocess.py [options]

Input Arguments:

Note

[REQUIRED]

-i, --input_file
Path to flowgram file [REQUIRED]

[OPTIONAL]

-f, --fasta_file
Path to fasta input file; reads not present in this file are removed [default: None]
-s, --squeeze
Use run-length encoding for prefix filtering [default: False]
-l, --log_file
Path to log file [default: preprocess.log]
-p, --primer
Primer sequence used for the amplification [default: CATGCTGCCTCCCGTAGGAGT]
-o, --output_dir
Path to output directory [default: /tmp/]
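
The -s, --squeeze option enables run-length encoding for prefix filtering. The sketch below assumes that this means collapsing each homopolymer run to a single base before sequences are compared, which makes prefix matching insensitive to homopolymer-length miscalls, the dominant error type in 454 pyrosequencing. The helper names squeeze, run_length_encode and is_squeezed_prefix are hypothetical and are not taken from the QIIME code base.

```python
from itertools import groupby


def squeeze(seq):
    """Collapse each homopolymer run to one base, e.g. 'AAACCGT' -> 'ACGT'."""
    return "".join(base for base, _ in groupby(seq))


def run_length_encode(seq):
    """Return (base, run length) pairs, e.g. 'AAACCGT' -> [('A', 3), ('C', 2), ...]."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def is_squeezed_prefix(short, long):
    """True if `short` is a prefix of `long` once both are squeezed."""
    return squeeze(long).startswith(squeeze(short))


# A plain prefix test fails because of the extra C in the homopolymer run,
# but the squeezed comparison treats the two reads as prefix-compatible.
print("ACCGTT".startswith("ACGT"))          # False
print(is_squeezed_prefix("ACGT", "ACCGTT"))  # True
```

Squeezing trades specificity for robustness: reads that differ only in homopolymer lengths fall into the same prefix cluster, which is why it is an option (off by default) rather than the standard behavior.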

Output:

prefix_dereplicated.sff.txt: Human-readable sff file containing the flowgram of the
cluster representative of each cluster.

prefix_dereplicated.fasta: Fasta file containing the cluster representative of each cluster.

prefix_mapping.txt: This file contains the actual clusters. The cluster centroid is given first,
followed by the cluster members after the ‘:’.
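
A small reader for prefix_mapping.txt can help when inspecting cluster sizes after this phase. The sketch below assumes one cluster per line, the centroid ID before the ‘:’, and member IDs separated by whitespace after it; the function name parse_prefix_mapping is hypothetical.

```python
def parse_prefix_mapping(lines):
    """Parse 'centroid: member1 member2 ...' lines into {centroid: [members]}.

    Assumes whitespace-separated member IDs; blank lines are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        centroid, _, members = line.partition(":")
        mapping[centroid.strip()] = members.split()
    return mapping


example = ["r1: r2 r4", "r3:", ""]
clusters = parse_prefix_mapping(example)
print(clusters)  # {'r1': ['r2', 'r4'], 'r3': []}

# Cluster size counts the centroid plus its members.
print({c: 1 + len(m) for c, m in clusters.items()})  # {'r1': 3, 'r3': 1}
```

With a real output directory, the same function can be applied to an open file handle, for example `parse_prefix_mapping(open("/tmp/prefix_mapping.txt"))` when the default -o value is used.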

Example:

Run the program on the flowgrams in 454Reads.sff.txt, remove reads that are not in split_lib_filtered_seqs.fasta, and remove the primer CATGCTGCCTCCCGTAGGAGT from the reads before running phase I:

denoiser_preprocess.py -i 454Reads.sff.txt -f split_lib_filtered_seqs.fasta -p CATGCTGCCTCCCGTAGGAGT
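
The filtering and primer-removal steps in this example can be pictured with the following sketch. It assumes exact primer matching at the 5' end of each read and that the set of read IDs to keep has already been extracted from the fasta file; it is not how denoiser_preprocess.py handles flowgrams internally. The names strip_primer and filter_and_strip are hypothetical.

```python
DEFAULT_PRIMER = "CATGCTGCCTCCCGTAGGAGT"  # default value of -p, --primer


def strip_primer(seq, primer=DEFAULT_PRIMER):
    """Remove `primer` from the 5' end of `seq` on an exact match; else return `seq` unchanged."""
    if seq.startswith(primer):
        return seq[len(primer):]
    return seq


def filter_and_strip(reads, keep_ids, primer=DEFAULT_PRIMER):
    """Drop reads whose id is not in `keep_ids`, then strip the primer from the rest."""
    return {
        rid: strip_primer(seq, primer)
        for rid, seq in reads.items()
        if rid in keep_ids
    }


reads = {
    "a": DEFAULT_PRIMER + "ACGT",
    "b": "TTTT",
    "c": DEFAULT_PRIMER + "GG",
}
print(filter_and_strip(reads, keep_ids={"a", "b"}))  # {'a': 'ACGT', 'b': 'TTTT'}
```

Exact matching keeps the sketch short; reads carrying sequencing errors inside the primer would pass through unchanged here.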


