sampledoc

Denoising and chimera detection usage differences in QIIME

This tutorial covers the main differences in how the denoising and chimera detection tools implemented in QIIME are used.

The overview tutorial describes the steps one would use to process 454 data without denoising or chimera detection. The data processing can be roughly summarized as follows:

  1. SFF (raw 454 data) -> 2. fasta/qual files -> 3. demultiplexing/quality filtering -> 4. OTU picking -> 5. representative sequences -> 6. taxonomic assignments/tree building -> 7. OTU table and downstream processing

Each section below describes how one denoising or chimera detection tool integrates into QIIME by showing where its pipeline differs from the default pipeline listed above.
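One way to make such comparisons concrete is to write each pipeline as an ordered list of steps and report which steps were dropped and which were introduced. The sketch below does this for the default pipeline and the Ampliconnoise pipeline described in the next section. The step names are transcribed from this tutorial; the helper `pipeline_changes` is a hypothetical name used only for illustration and is not part of QIIME.

```python
# Step lists transcribed from this tutorial.
DEFAULT = [
    "SFF (raw 454 data)",
    "fasta/qual files",
    "demultiplexing/quality filtering",
    "OTU picking",
    "representative sequences",
    "taxonomic assignments/tree building",
    "OTU table and downstream processing",
]

AMPLICONNOISE = [
    "SFF (raw 454 data)",
    "flowgram (sff.txt)",
    "ampliconnoise.py (plus suggested step of reverse primer removal)",
    "OTU picking",
    "representative sequences",
    "taxonomic assignments/tree building",
    "OTU table and downstream processing",
]


def pipeline_changes(default, variant):
    """Return (removed, added): steps only in `default` and steps only in
    `variant`, each kept in pipeline order. Hypothetical helper, not QIIME."""
    removed = [step for step in default if step not in variant]
    added = [step for step in variant if step not in default]
    return removed, added


removed, added = pipeline_changes(DEFAULT, AMPLICONNOISE)
print("replaced:", removed)
print("with:    ", added)
```

The same helper can be applied to any of the other pipelines in this tutorial by writing out its steps as a list.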

Ampliconnoise

Ampliconnoise uses flowgram files generated from SFF files to denoise 454 data and, optionally, to detect chimeras. For script details, see ampliconnoise.py.

Ampliconnoise effectively replaces the demultiplexing/quality filtering step above, making the pipeline this:

  1. SFF (raw 454 data) -> 2. flowgram (sff.txt) -> 3. ampliconnoise.py (plus suggested step of reverse primer removal) -> 4. OTU picking -> 5. representative sequences -> 6. taxonomic assignments/tree building -> 7. OTU table and downstream processing

Barcodes and forward primers are removed by ampliconnoise.py. Reverse primers at the end of the sequence may be retained, however, so it is strongly recommended to run truncate_reverse_primer.py immediately after ampliconnoise.py so that the reverse primer and any sequence following it do not interfere with downstream steps.
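To illustrate what reverse primer truncation means, here is a minimal sketch of the idea, not the implementation used by truncate_reverse_primer.py. It assumes that the reverse primer is supplied in 5'->3' orientation and therefore appears in the read as its reverse complement, that only the unambiguous bases A, C, G and T are present, and that an exact match is enough (a real tool would typically tolerate mismatches). The function names are hypothetical.

```python
# Complement table for unambiguous DNA bases only (an assumption of this sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def truncate_at_reverse_primer(read, reverse_primer):
    """Cut `read` at the first exact occurrence of the reverse primer's
    reverse complement, dropping the primer and everything after it.
    The read is returned unchanged if the primer is not found."""
    target = reverse_complement(reverse_primer.upper())
    pos = read.upper().find(target)
    return read if pos == -1 else read[:pos]


# Made-up read: amplicon + reverse-complemented primer + trailing bases.
read = "ACGTACGTTTGACC" + "GTAGTCC" + "TTAA"
trimmed = truncate_at_reverse_primer(read, "GGACTAC")
print(trimmed)
```

Reads in which the primer is not found are left untouched in this sketch, so downstream steps would still see their full length.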

Denoiser

Denoiser also uses flowgram files to detect and correct sequencing errors (but not chimeras). However, it only tests the sequences present in the fasta file generated by split_libraries.py, so demultiplexing and quality filtering must be run first. Reverse primer removal with truncate_reverse_primer.py is also strongly encouraged. The steps involved are:

  1. SFF (raw 454 data) -> 2. fasta/qual/flowgram (sff.txt) files -> 3. split_libraries.py -> 4. denoise_wrapper.py (plus suggested step of reverse primer removal) -> 5. OTU picking -> 6. representative sequences -> 7. taxonomic assignments/tree building -> 8. OTU table and downstream processing
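The restriction of flowgrams to the reads that survived split_libraries.py can be pictured roughly as follows. This is a sketch of the idea only, not Denoiser's code. It assumes fasta headers of the form `>SampleID_n ReadID ...`, so that the original read ID is the second whitespace-separated field, and it represents flowgrams as a plain dictionary keyed by read ID. The function names, sample names and read IDs are all made up for illustration.

```python
def read_ids_from_split_libraries(fasta_lines):
    """Collect original read IDs from fasta header lines.
    Assumes headers look like '>SampleID_n ReadID key=value ...'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if len(fields) > 1:
                ids.add(fields[1])
    return ids


def restrict_flowgrams(flowgrams, keep_ids):
    """Keep only the flowgrams whose read ID is in `keep_ids`."""
    return {rid: flow for rid, flow in flowgrams.items() if rid in keep_ids}


# Hypothetical demultiplexed output: READ0002 failed quality filtering.
fasta_lines = [
    ">sampleA_1 READ0001 orig_bc=ACAG new_bc=ACAG bc_diffs=0",
    "ACGTACGT",
    ">sampleB_2 READ0003 orig_bc=TTGA new_bc=TTGA bc_diffs=0",
    "TTGACCGA",
]
flowgrams = {
    "READ0001": [1.02, 0.01, 0.98],
    "READ0002": [0.55, 0.49, 1.51],
    "READ0003": [0.97, 1.03, 0.02],
}

kept = restrict_flowgrams(flowgrams, read_ids_from_split_libraries(fasta_lines))
print(sorted(kept))
```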

USEARCH

Usearch performs several tasks in a single step: de novo chimera detection based on cluster abundance, reference-based chimera detection against a reference sequence set, cluster size filtering (similar to filtering singletons, a rough but fast way to remove noise from data), and clustering of sequences into OTUs. Usearch is used after demultiplexing sequences, so the steps for processing data are:

  1. SFF (raw 454 data) -> 2. fasta/qual files -> 3. demultiplexing/quality filtering -> 4. OTU picking/chimera detection/low abundance cluster filtering with usearch implementation in pick_otus.py -> 5. representative sequences -> 6. taxonomic assignments/tree building -> 7. OTU table and downstream processing
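The cluster size filtering idea can be sketched in a few lines. This is an illustration of the concept, not of how usearch or pick_otus.py implement it: clusters are modeled as a dictionary mapping a cluster ID to its member read IDs, and the threshold `min_size` is a hypothetical parameter chosen for the example. With `min_size=2` the filter is equivalent to discarding singleton clusters.

```python
def filter_small_clusters(clusters, min_size=2):
    """Drop clusters with fewer than `min_size` member reads.
    `clusters` maps cluster ID -> list of read IDs (sketch data model)."""
    return {
        cluster_id: members
        for cluster_id, members in clusters.items()
        if len(members) >= min_size
    }


# Made-up clusters: 'c1' is a singleton and is treated as likely noise.
clusters = {
    "c0": ["r1", "r2", "r3"],
    "c1": ["r4"],
    "c2": ["r5", "r6"],
}

kept_clusters = filter_small_clusters(clusters, min_size=2)
print(sorted(kept_clusters))
```

Raising the threshold removes more low-abundance clusters, trading sensitivity to rare sequences for a cleaner data set.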

ChimeraSlayer

ChimeraSlayer utilizes a reference dataset to detect potential chimeras in a representative sequence set. The processing pipeline is:

  1. SFF (raw 454 data) -> 2. fasta/qual files -> 3. demultiplexing/quality filtering -> 4. OTU picking -> 5. representative sequences -> 6. chimera detection with identify_chimeric_seqs.py -> 7. filtering of the identified chimeras -> 8. taxonomic assignments/tree building -> 9. OTU table and downstream processing
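Conceptually, the filtering step removes every representative sequence whose ID was flagged as chimeric. The sketch below shows that idea under stated assumptions: the chimera list is treated as text lines whose first tab-separated field is a sequence ID, and the representative set is a dictionary of ID to sequence. Both function names and all IDs are hypothetical, and this is not the QIIME filtering script itself.

```python
def load_chimera_ids(lines):
    """Return the set of sequence IDs found in the first tab-separated
    field of each non-empty line (assumed file layout)."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line.split("\t")[0])
    return ids


def filter_chimeras(rep_seqs, chimera_ids):
    """Return the representative sequences that were not flagged."""
    return {
        seq_id: seq
        for seq_id, seq in rep_seqs.items()
        if seq_id not in chimera_ids
    }


# Made-up data: 'seq2' was flagged, with two putative parents listed.
chimera_lines = ["seq2\tparentA\tparentB\n", "\n"]
rep_seqs = {"seq1": "ACGT", "seq2": "TTGA", "seq3": "GGCC"}

clean = filter_chimeras(rep_seqs, load_chimera_ids(chimera_lines))
print(sorted(clean))
```

Only the filtered set would then be passed on to taxonomic assignment and tree building.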
