News and Announcements » |
This document describes how to perform chained or multi-step OTU picking. This is relevant, for example, when you have a very large collection of sequences, and you first want to apply a fast but rough OTU picker (e.g., PrefixSuffix) and then want to apply a slow but better OTU picker (e.g., cdhit).
This example illustrates how to chain two OTU pickers, which is probably the most common usage. It is possible however to chain an arbitrary number of OTU pickers.
pick_otus.py -m prefix_suffix -u 0 -i seqs.fasta -o prefix_picked_otus
The resulting OTU map will look something like:
0 seq1 seq2 seq5
1 seq3 seq4
2 seq6 seq7 seq8
where 0, 1, and 2 are OTU ids, and seq* are sequence ids.
pick_rep_set.py -i prefix_picked_otus/seqs_otus.txt -f seqs.fasta -o prefix_picked_otus/repr_set.fasta
The resulting fasta file will look something like:
>0
ACCGGTAGAGAGATTAG
>1
ACGGTGCAGGGA
>2
CCTGAGGGGTTTGGAAAAAAG
pick_otus.py -m cdhit -i prefix_picked_otus/repr_set.fasta -o prefix_picked_otus/cdhit_picked_otus/
The resulting OTU map will look something like:
0 0 2
1 1
where the first column (containing 0 and 1) are the OTU ids and the remaining values on each line are the OTU ids from the first round of OTU picking.
You’ll need to provide the OTU map filepaths generated above in the order that they were generated! The files paths (passed via -i) should be comma-separated, with no spaces.
merge_otu_maps.py -i prefix_picked_otus/seqs_otus.txt,prefix_picked_otus/cdhit_picked_otus/rep_set_otus.txt -o otus.txt
The resulting OTU map will look something like:
0 seq1 seq2 seq5 seq6 seq7 seq8
1 seq3 seq4
where 0 and 1 are the OTU ids from step 3 (i.e., the second round of OTU picking), and seq* are the original sequence identifiers.
At this stage, you’ll pass the final OTU map (otus.txt) and the original sequence collection (seqs.fasta).
pick_rep_set.py -i otus.txt -f seqs.fasta -o repr_set.fasta