This script should be applied to generate a useful tree when sequences have been aligned against a template alignment (e.g., with PyNAST). It removes positions that are gaps in every sequence (common with PyNAST, since typical sequences cover only 200-400 bases but are aligned against the full 16S gene). Additionally, the user can supply a lanemask file that defines which positions should be included when building the tree and which should be ignored. Typically, this will differentiate between non-conserved positions, which are uninformative for tree building, and conserved positions, which are informative for tree building. FILTERING ALIGNMENTS WHICH WERE BUILT WITH PYNAST AGAINST THE GREENGENES CORE SET ALIGNMENT SHOULD BE CONSIDERED AN ESSENTIAL STEP.
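To make the two kinds of positional filtering concrete, here is a minimal sketch in plain Python. It is not the script's actual implementation: the function names are hypothetical, the alignment is assumed to be a dict mapping sequence IDs to equal-length aligned strings, gaps are assumed to be written as "-" or ".", and the lanemask is assumed to be a string of "1" (keep) and "0" (ignore) characters, one per alignment column.

```python
def remove_all_gap_columns(aligned, gap_chars="-."):
    """Drop every column in which all sequences have a gap character.

    `aligned` maps sequence IDs to equal-length aligned strings
    (an assumed in-memory representation, for illustration only).
    """
    seqs = list(aligned.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length)
            if any(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}


def apply_lanemask(aligned, lanemask):
    """Keep only the columns flagged "1" in the lanemask string.

    The "1"/"0" encoding is an assumption made for this sketch.
    """
    keep = [i for i, flag in enumerate(lanemask) if flag == "1"]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}
```

For example, `remove_all_gap_columns({"a": "A-C-G", "b": "A-T-G"})` drops the second and fourth columns, because both sequences have a gap there.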
Usage: filter_alignment.py [options]
The output of filter_alignment.py consists of a single FASTA file, which ends with “pfiltered.fasta”, where the “p” stands for positional filtering of the columns.
As a simple example, the user can run the following command, which takes an input FASTA file (i.e., the file produced by align_seqs.py) and writes to the output directory “filtered_alignment/”:
filter_alignment.py -i seqs_rep_set_aligned.fasta -o filtered_alignment/
Apply the same filtering as above, but additionally remove sequences whose distance from the majority consensus sequence is more than 3 standard deviations above the mean (the number of standard deviations can be changed by passing --threshold):
filter_alignment.py -i seqs_rep_set_aligned.fasta -o filtered_alignment/ --remove_outliers
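The outlier rule described above can be sketched as follows. This is an illustration under stated assumptions, not the script's own code: the function names are hypothetical, the distance is assumed to be the fraction of columns at which a sequence differs from the majority consensus, and the population standard deviation is used. The script itself may define distance differently.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_consensus(seqs):
    """Return the most common character in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def remove_outliers(aligned, threshold=3.0):
    """Drop sequences whose distance to the consensus exceeds
    mean + threshold * standard deviation of all distances.

    Distance here is the fraction of mismatching columns, which is an
    assumption made for this sketch.
    """
    if not aligned:
        return {}
    consensus = majority_consensus(list(aligned.values()))
    if not consensus:
        return dict(aligned)
    dists = {
        name: sum(a != b for a, b in zip(s, consensus)) / len(consensus)
        for name, s in aligned.items()
    }
    cutoff = mean(dists.values()) + threshold * pstdev(dists.values())
    return {name: s for name, s in aligned.items() if dists[name] <= cutoff}
```

With a single divergent sequence among n, its distance lies about sqrt(n - 1) population standard deviations above the mean, so a threshold of 3 can only flag it when the alignment contains more than ten sequences.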
Alternatively, if the user would like to use a different gap fraction threshold (“-g”), they can use the following command:
filter_alignment.py -i seqs_rep_set_aligned.fasta -o filtered_alignment/ -g 0.95
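As a rough illustration of what a gap fraction threshold does, the sketch below removes every column whose fraction of gap characters exceeds the threshold. The function name and the dict-of-strings alignment are hypothetical, and the exact comparison (strictly greater than the threshold) is an assumption about how “-g” is interpreted.

```python
def filter_gappy_columns(aligned, allowed_gap_frac=0.95, gap_chars="-."):
    """Remove columns where more than `allowed_gap_frac` of the
    sequences have a gap character (assumed semantics of the threshold).
    """
    seqs = list(aligned.values())
    if not seqs:
        return {}
    n = len(seqs)
    keep = []
    for i in range(len(seqs[0])):
        gaps = sum(1 for s in seqs if s[i] in gap_chars)
        # Keep the column unless its gap fraction exceeds the threshold.
        if gaps / n <= allowed_gap_frac:
            keep.append(i)
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}
```

Under this reading, lowering the threshold is stricter: at 0.95, a column is dropped as soon as more than 95% of the sequences have a gap in it, rather than only when every sequence does.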