QIIME provides three high-level protocols for OTU picking: de novo, closed-reference, and open-reference. They are accessible through pick_de_novo_otus.py, pick_closed_reference_otus.py, and pick_open_reference_otus.py, respectively. Each of these protocols is briefly described in this document; for a more detailed discussion, please see Rideout et al. (2014).
Open-reference OTU picking with pick_open_reference_otus.py is the preferred strategy for OTU picking among the QIIME developers.
Note
QIIME does not actually implement OTU picking algorithms, but rather wraps external OTU clustering tools. For this reason, it is important to cite the OTU clustering tools that you used, in addition to citing QIIME. A number of OTU clustering tools are available through QIIME’s workflows, including open source tools (e.g., SortMeRNA, SUMACLUST, and swarm) and closed source tools (e.g., uclust and usearch). uclust is the default OTU clustering tool in QIIME’s workflows. We are currently evaluating whether to change the default to one of the open source alternatives in future versions of QIIME.
In a de novo OTU picking process, reads are clustered against one another without any external reference sequence collection. pick_de_novo_otus.py is the primary interface for de novo OTU picking in QIIME, and includes taxonomy assignment, sequence alignment, and tree-building steps. A benefit of de novo OTU picking is that all reads are clustered. A drawback is that there is no existing support for running this in parallel in QIIME, so it can be too slow to apply to large datasets (e.g., more than 10 million reads).
You must use de novo OTU picking if:
You cannot use de novo OTU picking if:
Pros:
Cons:
In a closed-reference OTU picking process, reads are clustered against a reference sequence collection, and any reads that do not hit a sequence in that collection are excluded from downstream analyses. pick_closed_reference_otus.py is the primary interface for closed-reference OTU picking in QIIME. If the user provides taxonomic assignments for the sequences in the reference database, those assignments are applied to the resulting OTUs.
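As a rough illustration of the paragraph above, a closed-reference run might look like the following sketch. The file names (seqs.fna, refseqs.fna, taxonomy.txt) and the output directory are placeholders, and the -r (reference sequences) and -t (reference taxonomy) options reflect our understanding of the script's interface; confirm them with pick_closed_reference_otus.py -h before relying on them. When QIIME is not on the PATH, the guard prints the command instead of running it.

```shell
# Placeholder paths (assumptions for illustration): adjust to your own data.
SEQS="$PWD/seqs.fna"
REFSEQS="$PWD/refseqs.fna"
TAXONOMY="$PWD/taxonomy.txt"
OUT_DIR="$PWD/closed_ref_otus/"

if command -v pick_closed_reference_otus.py >/dev/null 2>&1; then
    # Cluster reads against the reference; reads that fail to hit are discarded.
    pick_closed_reference_otus.py -i "$SEQS" -r "$REFSEQS" -t "$TAXONOMY" -o "$OUT_DIR"
    STATUS="ran"
else
    # QIIME is not installed here, so only show what would be executed.
    echo "QIIME not found on PATH; would run:"
    echo "pick_closed_reference_otus.py -i $SEQS -r $REFSEQS -t $TAXONOMY -o $OUT_DIR"
    STATUS="skipped"
fi
```

Omitting -t should leave the OTUs without taxonomic assignments, which matches the "if the user provides" wording above.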
You must use closed-reference OTU picking if:
You cannot use closed-reference OTU picking if:
Pros:
Cons:
In an open-reference OTU picking process, reads are clustered against a reference sequence collection and any reads which do not hit the reference sequence collection are subsequently clustered de novo. pick_open_reference_otus.py is the primary interface for open-reference OTU picking in QIIME, and includes taxonomy assignment, sequence alignment, and tree-building steps.
Open-reference OTU picking with pick_open_reference_otus.py is the preferred strategy for OTU picking among the QIIME developers.
You cannot use open-reference OTU picking if:
Pros:
Cons:
Please refer to the script usage examples in pick_de_novo_otus.py, pick_closed_reference_otus.py, and pick_open_reference_otus.py, and the QIIME Illumina Overview Tutorial and the QIIME 454 Overview Tutorial for examples of how to use QIIME’s OTU picking workflows.
Dereplicating sequences is a special case of de novo clustering in which the similarity threshold is 100%. If dereplication is all you need from your OTU picking process, you can do the following:
pick_de_novo_otus.py -i $PWD/seqs.fna -o $PWD/derep_uc/ -p $PWD/dereplication_params.txt
where the following is in $PWD/dereplication_params.txt:
pick_otus:similarity 1.0
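To get a feel for what dereplication should produce, the sketch below counts distinct sequences in a small FASTA file using exact string matching. It is only an approximate sanity check and not what QIIME runs internally: a clustering tool working at 100% similarity may treat sequences of different lengths or letter case differently, so the numbers may not match exactly. The file toy_seqs.fna and its contents are made up for illustration.

```shell
# Hypothetical toy input; in practice you would point awk at your seqs.fna.
cat > toy_seqs.fna <<'EOF'
>s1
ACGT
>s2
AC
GT
>s3
TTGA
EOF

# Join wrapped sequence lines per record, then count distinct sequences.
unique_count=$(awk '
    /^>/ { if (seq != "") seen[seq]++; seq = ""; next }   # header: store previous record
         { seq = seq $0 }                                 # sequence line: append
    END  { if (seq != "") seen[seq]++                     # store the final record
           n = 0; for (s in seen) n++; print n }
' toy_seqs.fna)

echo "distinct sequences: $unique_count"
```

Here s1 and s2 carry the same sequence (s2 is merely wrapped over two lines), so the three records collapse to two distinct sequences.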
If you’re interested in running the usearch OTU pickers in size-order mode (meaning that accepts are prioritized by the size of the cluster rather than the percent identity), add the following lines to a parameters file:
pick_otus:otu_picking_method usearch61
pick_otus:sizeorder True
pick_otus:maxaccepts 16
pick_otus:maxrejects 64
Pass this parameters file via -p to any of the three OTU picking workflows in QIIME.
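The steps above can be sketched end to end as follows: write the four lines to a parameters file, then hand that file to a workflow with -p. The parameters file name, the input file, and the output directory are hypothetical, and pick_de_novo_otus.py is used only as one of the three possible workflows. When QIIME is not on the PATH, the command is printed rather than run.

```shell
# Hypothetical file name for the parameters file.
PARAMS="$PWD/usearch_sizeorder_params.txt"

# Write the size-order settings, one "script:option value" pair per line.
cat > "$PARAMS" <<'EOF'
pick_otus:otu_picking_method usearch61
pick_otus:sizeorder True
pick_otus:maxaccepts 16
pick_otus:maxrejects 64
EOF

if command -v pick_de_novo_otus.py >/dev/null 2>&1; then
    pick_de_novo_otus.py -i "$PWD/seqs.fna" -o "$PWD/usearch_sizeorder_otus/" -p "$PARAMS"
else
    # QIIME is not installed here, so only show what would be executed.
    echo "QIIME not found on PATH; would run:"
    echo "pick_de_novo_otus.py -i $PWD/seqs.fna -o $PWD/usearch_sizeorder_otus/ -p $PARAMS"
fi
```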